[Bug rtl-optimization/113280] Strange error for empty inline assembly with +X constraint

2024-01-09 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113280

--- Comment #12 from David Brown  ---
(In reply to Segher Boessenkool from comment #11)
> (In reply to David Brown from comment #8)
> > As for using "=X" in the "opt == 3" case, I worry that that could lead to
> > errors as the two assembly lines are independent.  The first says "put X
> > anywhere", and the second - if it had "=X" - says "take X from anywhere". 
> > These don't have to be the same "anywhere" unless the input and output are
> > in the same statement - and if the compiler picked different anywheres, the
> > code would not work.
> 
> This cannot lead to errors.  The compiler knows where "x" is (it put it there
> itself!)

If I write :

int regtest(int a, int x) {
(void) a;// Ignore first parameter
return x;
}

I get the assembly (x86-64) :

movl %esi, %eax
ret

or (arm 32-bit)

mov r0, r1
bx lr


That's all fine, and as expected.  But with this :

int regtest(int a, int x) {
(void) a;// Ignore first parameter
asm("" :: "X" (x);
asm("" : "=X" (x);
return x;
}

the result is just a "ret" or a "bx lr" instruction.  The register move is
missing, because for the first assembly the "anywhere" is picked as esi/r1, and
for the second assembly the "anywhere" is picked as eax/r0.  This is a
perfectly valid choice of registers for the compiler, but it is not what we
want in this use-case.

It is only if I use "+X" (or add a "0" (x) input), rather than "=X", that the
compiler picks the same register for input and output, because it is within the
one assembly instruction.

(The same applies to "=g" / "+g".)

I don't understand why having a "asm("" :: "X" (x))" seems to be necessary
sometimes for "asm("" : "+X" (x))" to work the way I expect, without a compiler
error, but I /do/ understand why "asm("" :: "X" (x))" followed by "asm("" :
"=X" (x))" will sometimes not give the results I am looking for.

[Bug rtl-optimization/113280] Strange error for empty inline assembly with +X constraint

2024-01-09 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113280

--- Comment #8 from David Brown  ---
(In reply to Segher Boessenkool from comment #4)
> Nothing has changed here.
> 
> opt == 2 and opt == 3 should use "=X", not "+X", btw.
> 

I realise (since you told me - thanks) that

asm ("" : "+X" (x))

and

asm ("" : "=X" (x) : "0" (x))

are the same.  However, it seems that 

asm ("" : "+X" (x) : "0" (x))

is different.  If I change "opt == 2" to use "=X", then it acts exactly like
the "opt == 1" version (as expected), but having "+X" in "opt == 2" gives
different results.  I don't know /why/ this seems to be treated differently,
since it is just repeating the input in the same target, but it is.

As for using "=X" in the "opt == 3" case, I worry that that could lead to
errors as the two assembly lines are independent.  The first says "put X
anywhere", and the second - if it had "=X" - says "take X from anywhere". 
These don't have to be the same "anywhere" unless the input and output are in
the same statement - and if the compiler picked different anywheres, the code
would not work.

[Bug rtl-optimization/113280] Strange error for empty inline assembly with +X constraint

2024-01-09 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113280

--- Comment #7 from David Brown  ---
Yes, the goal is an optimisation barrier with the least possible impact on the
code.  For most uses, asm("" : "+g" (x)) has been ideal, as far as I have
tested.  Typically it ensures "x" is evaluated in a register before the
assembly "executes", and uses that register for "x" afterwards.  But "g" also
matches memory, so if "x" happens to be in a stack slot, it gets left there -
there is no extra movement in and out of a register.

But "g" does not match floating point or SIMD registers.  So if "x" is in one
of these registers, extra moves are generated, and I'd like to avoid them.  So
the ideal constraint would be one that matches memory or any register that can
contain data.  That's not quite the same as "match anything" - it should not
match immediate values, nor "special" registers such as PC, SP, or a flags
register, or the extra registers many processors have for floating point
settings, link registers, interrupt state registers, and so on.  But it might
include registers for other accelerators that some chips have.
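As a concrete sketch of the intended use (the macro name "opt_barrier" is mine,
purely for illustration):

#define opt_barrier(x) asm ("" : "+g" (x))

int scaled(int a, int b) {
    int t = a * b;          /* forced to be evaluated before the barrier... */
    opt_barrier(t);
    return t + 1;           /* ...and re-used, not re-derived, afterwards   */
}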

[Bug c/113280] New: Strange error for empty inline assembly with +X constraint

2024-01-08 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113280

Bug ID: 113280
   Summary: Strange error for empty inline assembly with +X
constraint
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

I am getting strange results with an empty inline assembly line, used as an
optimisation barrier.  This is the test code I've tried in different setups:

#define opt 1

typedef int T;
T test(T a, T b) {
T x = a + b;
#if opt == 1
asm ("" : "+X" (x));   
#endif
#if opt == 2
asm ("" : "+X" (x) : "0" (x));
#endif
#if opt == 3
asm ("" :: "X" (x));
asm ("" : "+X" (x));   
#endif
return x - b;
} 

The idea of the assembly line is to be an optimisation barrier - it forces the
compiler to calculate the value of "x" before the assembly line, and has to use
that value (rather than any pre-calculated information) after the assembly
line.  This makes it something like an "asm("" ::: "memory")" barrier, but
focused on a single variable, which may not need to be moved out of registers. 
I've found this kind of construct "asm("" : "+g" (x))" useful to control
ordering of code around disabled interrupt sections of embedded code, and it
can also force the order of floating point calculations even while -ffast-math
is used.  Usually I've used "g" as the operand constraint, and that has worked
perfectly for everything except floating point types on targets that have
dedicated floating point registers.

I had expected "+X" (option 1 above) to work smoothly for floating point types.
 And it does, sometimes - other times I get an error:

error: inconsistent operand constraints in an 'asm'

I've tested mainly on x86-64 (current trunk, which will be 14.0) and ARM 32-bit
(13.2), with -O2 optimisation.  I get no errors with -O0, but there is no need
of an optimisation barrier if optimisation is not enabled!  The exact compiler
version does not seem to make a difference, but the target does, as does the
type T.

I can't see why there should be a difference in the way these three variations
work, or why some combinations give errors.  When there is no compiler error,
the code (in all my testing) has been optimal as intended.

Some test results:

T = float :
  opt 1   arm-32 fail  x86-64 fail
  opt 2   arm-32 fail  x86-64 ok
  opt 3   arm-32 ok  x86-64 ok

T = int :
  opt 1   arm-32 fail  x86-64 fail
  opt 2   arm-32 fail  x86-64 ok
  opt 3   arm-32 ok  x86-64 ok

T = double :
  opt 1   arm-32 fail  x86-64 fail
  opt 2   arm-32 fail  x86-64 ok
  opt 3   arm-32 ok  x86-64 ok

T = long long int :
  opt 1   arm-32 ok  x86-64 fail
  opt 2   arm-32 ok  x86-64 ok
  opt 3   arm-32 ok  x86-64 ok

T = _Complex double :
  opt 1   arm-32 ok  x86-64 ok
  opt 2   arm-32 fail  x86-64 fail
  opt 3   arm-32 ok  x86-64 ok

T = long double :
  (arm-32 has 64-bit long double, so the results are not relevant)
  opt 1 gives "inconsistent operand constraints in an asm"
  opt 2 and 3 give "output constraint 0 must specify a single register", which
is an understandable error.


It looks like the "+X" constraint only works reliably if preceded by the
assembly line with the "X" input constraint.  For my own use, I can live with
that - it all goes in a macro anyway.  But maybe it points to a deeper issue,
or at least a potential for neatening things up.

[Bug middle-end/56888] memcpy implementation optimized as a call to memcpy

2023-12-19 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888

--- Comment #51 from David Brown  ---
(In reply to M Welinder from comment #48)
> It's your (1).  gcc is changing a program that can rely on errno not being
> changed to one where the C library can change it.  (The current C library or
> any future library that the resulting binary may be dynamically linked
> against.)
> 
> Is there any real-world situation that benefits from introducing these
> calls?  It has the feel of optimizing for a benchmark.

There are several real-world benefits from transforming back and forth between
library calls for this kind of small standard library function.  One is that
turning explicit code into library calls can give smaller code - often of
importance in small embedded targets.  Sometimes it can also result in run-time
improvements, especially for larger data sizes - user-written code might just
copy byte by byte, while the library implementation uses more efficient larger
blocks.

Another is that turning library calls into inlined code can speed up code by
using additional knowledge of sizes, alignment, etc., to get faster results. 
This is most obvious for calls to memcpy() or memmove(), which can sometimes be
required to get the semantics correct for type manipulation, but may generate
no actual code at all.
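For example (a sketch of both directions): a hand-written byte copy is typically
recognised and turned into a memcpy call or an inline block copy, while a memcpy
used only for type punning usually compiles to no call at all:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Likely recognised by the compiler and replaced with a memcpy call
   or an inline block copy.  */
void copy_bytes(char *dst, const char *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* The memcpy is needed for well-defined type punning, but usually
   generates a single register move and no call.  */
float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}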

A "C implementation" consists of a compiler and a standard library in tandem. 
The C library can make use of its knowledge of the C compiler, and any special
features, in its implementation.  (This is, in fact, required - some things in
the standard library cannot be implemented in "pure" C.)  The C compiler can
make use of its knowledge of the library implementation in its code generation
or analysis.  For the most part, compilers only make use of their knowledge of
the specifications of standard library functions, but they might also use
implementation details.

This means it is quite legitimate for the GCC team to say that gcc requires a C
library that does not set errno except for functions that explicitly say so in
their specifications.  Users don't get to mix and match random compilers and
random standard libraries and assume they form a conforming C implementation -
the pair must always be checked for compatibility.

The question then is if this would be an onerous requirement for standard
library implementations - do common existing libraries set errno in functions
that don't require it?  I cannot say, but I would be very surprised if they
did.  Modern thought, AFAIUI, considers errno to be a bad idea which should be
avoided whenever possible - it is a hindrance to optimisation, analysis, and
parallelisation of code, as well as severely limiting C++ constexpr and other
compile-time calculations.

My thoughts here are that GCC should make this library requirement explicit and
public, after first confirming with some "big name" libraries like glibc,
newlib and musl.  They could also add a flag "-funknown-stdlib" to disable any
transforms back or forth between standard library calls, and assume nothing
about the calls (not even what is given in the standards specifications).


(As a note - the paragraph 7.5p3 allowing standard library functions to set
errno is still in the current draft of C23.)

[Bug middle-end/56888] memcpy implementation optimized as a call to memcpy

2023-12-18 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888

--- Comment #47 from David Brown  ---
(In reply to M Welinder from comment #46)
> Should "-std=c99" imply turning off these optimizations?
> 
> Creating calls to, say, strlen is incompatible with the C99 standard and
> perhaps better limited to "-std=gnu-something" or an opt-in f-flag.

How is it incompatible with C99 to create calls to library functions?  I can
think of two possibilities:

1. If the function implementation plays with errno (allowed in 7.5p3), in a way
that is visible to the code.

2. If the function is called with parameters that may invoke undefined
behaviour (such as calling "strlen" without being sure that the parameter
points to a null-terminated string), where such undefined behaviour is not
already present.

If the user writes code that acts like a call to strlen (let's assume the
implementation knows strlen does not change errno), then the compiler can
replace it with a library call.  Similarly, if the user writes a call to
strlen, then the compiler can replace it with inline code.
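For instance (a sketch), this hand-written scan may legitimately be turned into a
call to strlen, given the assumption above about errno:

#include <stddef.h>

size_t my_len(const char *s) {
    size_t n = 0;
    while (s[n] != '\0')     /* same observable behaviour as strlen(s) */
        n++;
    return n;
}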

As long as there is no difference in the observable behaviour, the
transformation is allowed.

Or am I missing something here?

[Bug c/96284] Outdated C features should be made errors with newer standards

2023-12-01 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96284

--- Comment #16 from David Brown  ---
Thank you for making these changes.  There's always a trade-off between
supporting code that "has always compiled fine and works in testing", and
making it harder for people to write new poor quality code with a high risk of
bugs.  I believe, however, that changes like this help all developers - and
more importantly, help all the end-users by reducing software bug rates, even
if it is just by a very small step.

[Bug c/111769] Annotate function definitions and calls to facilitate link-time checking

2023-10-12 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111769

--- Comment #4 from David Brown  ---
(In reply to Richard Biener from comment #1)
> If you compile with debug info enabled the info should be already there,
> just nothing looks at this (and mismatches) at link time.

Perhaps I should file this as an enhancement request for binutils?  If there is
enough information already generated by gcc, then only the linker part needs to
be implemented.  (It could also be checked by an external program, but that
would be a lot more inconvenient for the user.)

Will the binutils folk be familiar with the debug information and its layout,
or is there a document I can point them to?  I assume this kind of information
will be stable and documented, because gdb will need it.  (I can read through
the material for the details, but a rough pointer to get me started would
help.)

[Bug c/111769] Annotate function definitions and calls to facilitate link-time checking

2023-10-12 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111769

--- Comment #3 from David Brown  ---
(In reply to Andrew Pinski from comment #2)
> IIRC there was a bug about this specific thing which was closed as fixed
> with the use of LTO ...

Certainly if you use LTO, then this is not necessary.  But LTO use is far from
universal, and there can be good reasons not to use it.  (It is rarely seen in
small embedded systems, for various reasons - some good, some less good.)  The
thought here is for a very low cost (for users) enhancement - the check could
be introduced without any change to the users' existing build process, code, or
linker scripts.

[Bug c/111769] New: Annotate function definitions and calls to facilitate link-time checking

2023-10-11 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111769

Bug ID: 111769
   Summary: Annotate function definitions and calls to facilitate
link-time checking
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

This is an enhancement idea, rather than a bug, and would require linker
support as well as compiler support.

In C, it is possible to define a function with one set of parameters (number
and type) in one translation unit, and declare and use it with a different set
in a different translation unit.  This is undefined behaviour, and generally a
really bad idea, but it can't be spotted by the toolchain unless you use LTO.
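A minimal illustration of the mismatch (the file and function names are made up
for the example):

/* file1.c */
int sum(int a, int b) { return a + b; }

/* file2.c - wrong declaration, but the program links without complaint */
int sum(long a);
int call_it(void) { return sum(42L); }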

In C++, you don't see the same effect as different name mangling would result
in linker failures.

Would it be possible for the C compiler, when emitting an external linkage
function definition and the use (or possibly declaration) of a function (for
calling or taking its address), to add a directive or debug info section entry
giving the function type in active use?  These could then be checked at link
time, even without LTO, to catch mismatches.

My suggestion would be to use the existing C++ name-mangling scheme to encode
the function types.  Thus the declaration and definition of "int square(int
num)" would be annotated with a note that the function "square" has signature
"_Z6squarei".

[Bug c++/111399] Bogus -Wreturn-type diagnostic

2023-09-18 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111399

--- Comment #2 from David Brown  ---
Would it be possible to have the "-Wreturn-type" warning pass not issue a
warning immediately, but inject a warning into the code that could then be
removed later by optimisation?

What I mean, is to have something roughly like this :

void __attribute__((warning("Reached return in non-void function")))
__ReturnNonVoid(void);

int sign(int x) {
if (x < 0) return -1;
if (x == 0) return 0;
if (x > 0) return 1;
__ReturnNonVoid();
}


Instead of issuing a warning straight away, if the compiler had added a call to
a non-existent "__ReturnNonVoid" function (just like the sanitizer adds a call
to "__ubsan_handle_missing_return"), this would be optimised away in functions
like this sample and the false positive warning would not be issued.  (Ideally,
of course, it would be suitable GIMPLE, or appropriate internal representation
for that pass, rather than a call to a non-existent function.)

[Bug c/111400] Missing return sanitization only works in C++

2023-09-13 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111400

--- Comment #4 from David Brown  ---
(In reply to Andreas Schwab from comment #3)
> You already have -W[error=]return-type.

Yes, and that is what I normally use - I am a big fan of gcc's static warnings.

Sometimes, however, there are false positives, or perhaps other reasons why the
programmer thinks it is safe to ignore the warning in a particular case.  Then
sanitizers can be a useful run-time fault-finding aid.  There's certainly a lot
of overlap in the kinds of mistakes that can be found with -Wreturn-type and
with -fsanitize=return, but there are still benefits in having both.  (You
have both in C++, just not in C.)

[Bug c/111400] Missing return sanitization only works in C++

2023-09-13 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111400

--- Comment #2 from David Brown  ---
(In reply to Richard Biener from comment #1)
> Confirmed.  Note C17 disallows a return wotihout an expression for a funcion
> that returns a value, not sure if that means falling off the function
> without a return (value) is still OK, it at least feels inconsistent.

This has all remained unchanged from C99 to C23 (draft), I believe, which makes
things easier!

As far as I can tell, the relevant point in the standards is 6.9.1p12,
"Function definitions", which says "Unless otherwise specified, if the } that
terminates a function is reached, and the value of the function call is used by
the caller, the behaviour is undefined".  

So while a non-void function cannot have a return statement without an
expression (6.8.6.4p1), control flow /can/ run off the terminating }.  I think
this is perhaps a concession to older pre-void C code, when a function that
does not have a return value would still be declared to return "int".

Thus I think gcc's lack of a sanitizer here is technically accurate - but not
helpful, unless you are working with 35 year old code!
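For illustration, the sort of old-style code that this concession permits (still
valid today, as long as no caller ever uses the return value):

#include <stdio.h>

/* Pre-void style: declared to return "int" but used only for its side
   effect; falling off the closing } is permitted by 6.9.1p12 because no
   caller reads the (nonexistent) value.  */
int log_message(const char *msg) {
    if (!msg)
        return 0;
    puts(msg);
}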

[Bug c/111400] New: Missing return sanitization only works in C++

2023-09-13 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111400

Bug ID: 111400
   Summary: Missing return sanitization only works in C++
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

With C++ and -fsanitize=return, the code :

int foo(void) { }

generates a call to __ubsan_handle_missing_return.

For C, there is no sanitizer call - just a simple "ret" instruction.

This is, of course, because in C (unlike C++), falling off the end of a
non-void function is legal and defined behaviour, as long as caller code does
not try to use the non-existent return value.  But just like in C++, it is
almost certainly an error in the C code if control flow ever falls off the end
of a non-void function.

Could -fsanitize=return be added to C?  It should not be included by
-fsanitize=undefined in C, since the behaviour is actually allowed, but it
would still be a useful option that could be enabled individually.

[Bug c++/111399] New: Sanitizer code generation smarter than warnings

2023-09-13 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111399

Bug ID: 111399
   Summary: Sanitizer code generation smarter than warnings
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

Given this code :

int sign(int x) {
if (x < 0) return -1;
if (x == 0) return 0;
if (x > 0) return 1;
}


and compiled with "-O2 -Wall", gcc is unable to see that all possible cases for
"x" are covered, so it generates a "control reaches end of non-void function
[-Wreturn-type]" warning.  It would be nice if gcc could see this is a false
positive, but analysis and warnings can't be perfect.

However, if I add the flag "-fsanitize=undefined", the compiler is smart enough
to see that all cases are covered, and there is no call to
__ubsan_handle_missing_return generated.

If the sanitizer code generation can see that all cases are covered, why can't
the -Wreturn-type warning detection?  I'm guessing it comes down to the
ordering of compiler passes and therefore the level of program analysis
information available at that point.  But perhaps the -Wreturn-type pass could
be done later when the information is available?

[Bug c/110249] __builtin_unreachable helps optimisation at -O1 but not at -O2

2023-06-14 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110249

--- Comment #4 from David Brown  ---
Yes, __builtin_assume_aligned is the best way to write things in this
particular case (and optimises fine for -O1 and -O2).  It is also clearer in
the source code (IMHO), as it shows the programmer's intention better.
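For reference, the __builtin_assume_aligned form of the same function (a sketch;
the name "read64a" is just for the example) - this is the variant that optimises
as intended at both -O1 and -O2:

#include <stdint.h>
#include <string.h>

uint64_t read64a(const uint64_t * p) {
    const uint64_t * q = __builtin_assume_aligned(p, 8);
    uint64_t value;
    memcpy(&value, q, sizeof(uint64_t));
    return value;
}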

I am just a little concerned that optimisation hints provided by the programmer
via __builtin_unreachable are getting lost at -O2.  When the compiler knows
that something cannot happen - whether by __builtin_unreachable or by other
code analysis - it can often use that information to generate more efficient
code.  If that information can be used for better code at -O1, but is lost at
-O2, then something is wrong in the way that information is collected or passed
on inside the compiler.  Any case of -O1 generating more efficient code than
-O2 is a sign of a problem or limitation.

So I am not particularly bothered about this one piece of code - I am reporting
the issue because it might be an indication of a wider problem.

[Bug c/110249] __builtin_unreachable helps optimisation at -O1 but not at -O2

2023-06-14 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110249

--- Comment #2 from David Brown  ---
My apologies - it does not optimise the code to a single aligned load at -O1
(at least, not with the combinations I have now tried).  The code was
originally C++, with a reference, which /does/ exhibit this behaviour of having
better code at -O1 than -O2.  I had translated the reference to a pointer to
get more generic code that is valid as C and C++, but I don't see the same
effect with the pointer.

For a reference, the issue is as I first described:

#include <stdint.h>
#include <string.h>

uint64_t read64r(const uint64_t &x) {
    if ((uint64_t) &x % 8 ) {
        __builtin_unreachable();
    }
    uint64_t value;
    memcpy( &value, &x, sizeof(uint64_t) );
    return value;
}

I tested with 64-bit RISC-V and 32-bit ARM using trunk (gcc 13.1, I think) on
godbolt.org.  The only relevant flag was the optimisation level.



[Bug c/110249] New: __builtin_unreachable helps optimisation at -O1 but not at -O2

2023-06-14 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110249

Bug ID: 110249
   Summary: __builtin_unreachable helps optimisation at -O1 but
not at -O2
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

Sometimes it can be useful to use __builtin_unreachable() to give the compiler
hints that can improve optimisation.  For example, it can be used here to tell
the compiler that the parameter is always 8-byte aligned:

#include <stdint.h>
#include <string.h>

uint64_t read64(const uint64_t * p) {
    if ((uint64_t) p % 8 ) {
        __builtin_unreachable();
    }
    uint64_t value;
    memcpy( &value, p, sizeof(uint64_t) );
    return value;
}

For some targets, such as 32-bit ARM and especially RISC-V, this can make a
difference to the generated code.  In testing, when given -O1 the compiler
takes advantage of the explicit undefined behaviour to see that the pointer is
aligned, and generates a single 64-bit load.  With -O2, however, it seems that
information is lost - perhaps due to earlier optimisation passes - and now slow
unaligned load code is generated.

Ideally, such optimisation information from undefined behaviour (explicit via a
builtin, or implicit via other code) should be kept - -O2 should have at least
as much information as -O1.  An alternative would be the addition of a more
directed "__builtin_assume" function that could be used.

(I know that in this particular case, __builtin_assume_aligned is available and
works exactly as intended at -O1 and -O2, but I think this is a more general
issue than just alignments.)

[Bug libstdc++/109631] New: Simple std::optional types returned on stack, not registers

2023-04-26 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109631

Bug ID: 109631
   Summary: Simple std::optional types returned on stack, not
registers
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

A std::optional<T> is fundamentally like a std::pair<T, bool>, but with nice
access features, type safety, etc.  But for simple functions it is
significantly less efficient, because in gcc it is returned on the stack even
when T fits in a single register.

<https://godbolt.org/z/6hx6vxaG7>

On targets such as x86-64 and AARCH64, simple structs and pairs that fit in two
registers, are returned in two registers - there is no need for the caller to
reserve stack space and pass a hidden pointer to the callee function.  On clang
with its standard library, this also applies to std::optional for suitable
types T such as "int", but on gcc and its standard C++ library, the stack is
used for returning the std::optional.  Since both clang and gcc follow the
same calling conventions, this must (I believe) be a result of different
implementations of std::optional<> in the C++ libraries.

(I note that code for std::pair<bool, int> is more efficient here than using
std::pair<int, bool>.  But perhaps the details of std::optional<> require that
the contained element comes first.)



#include <optional>
#include <utility>

using A = std::optional<int>;
using B = std::pair<int, bool>;
using C = std::pair<bool, int>;

A foo_A(int x) {
return x;
}

B foo_b(int x) {
B y { x, true };
return y;
}

C foo_c(int x) {
C y { true, x };
return y;
}

[Bug target/105523] Wrong warning array subscript [0] is outside array bounds

2023-04-25 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105523

--- Comment #25 from David Brown  ---
(In reply to Andrew Pinski from comment #24)
> (In reply to LIU Hao from comment #22)
> > Yes, GCC should be told to shut up about dereferencing artificial address
> > values.
> 
> NO.
> Take:
> ```
> static inline int f(int *a)
> {
>   return a[10];
> }
> 
> int g()
> {
>   return f(0);
> }
> ```
> The warning is there for the above case really (and similar ones with struct
> offsets). Where you originally have a null pointer and have an offset from
> there; by the time the warning happens, the IR does not know if it was
> originally from an offset of a null pointer or if the value was written in.
> The paramater is there to "tune" the heurstic to figure out if it is null
> pointer deference or if it is some real (HW) address. Maybe
> -fno-delete-null-pointer-checks should imply --param=min-pagesize=0, though
> some folks want the warning indepdent of trying to delete null pointer
> checks.

It is worth noting, I think, that although on a target like the AVR (and most
embedded systems without an MMU) the address 0 is a real part of memory, and
can really be read and/or written, any code that tries to dereference a 0
pointer is almost always wrong.  I don't want gcc to consider 0 as an
acceptable address on these targets - I want it to warn me if it sees a null
pointer dereference.  If I really want to target address 0, as I might
occasionally do, I'll use a pointer to volatile - /then/ I'd like gcc to
believe me without question.

I don't know if every embedded developer feels the same way.  (Georg-Johann
could chime in with his opinion.)

[Bug target/105523] Wrong warning array subscript [0] is outside array bounds

2023-04-25 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105523

--- Comment #23 from David Brown  ---
(In reply to LIU Hao from comment #22)
> Yes, GCC should be told to shut up about dereferencing artificial address
> values.

One possibility is to have the warnings disabled whenever you are using a
volatile pointer.  After all, "volatile" is an indication that the compiler
doesn't know everything, and it has to trust the programmer.

[Bug target/105523] Wrong warning array subscript [0] is outside array bounds

2023-04-25 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105523

--- Comment #21 from David Brown  ---
(In reply to Andrew Pinski from comment #1)
> --param=min-pagesize= should be set to 0 for avr as zero is a valid address.

Is there any convenient description of "min-pagesize" ?  The user manual is
unhelpful: "Minimum page size for warning purposes."  Your comment here
suggests it is connected to whether zero is a valid address or not, which does
not sound at all related to the issue here.  (But you are correct that for the
AVR, and many embedded systems, accessing memory at address zero is valid.)

Testing (on godbolt) shows that setting "--param=min-pagesize=0" does stop this
warning triggering.  But this is an issue that affects all systems for which
the user might want to convert a fixed value to an address - i.e., all embedded
systems, and perhaps device drivers or other low-level code on bigger systems. 
Is setting "min-pagesize" to 0 actually a fix, or is it a workaround, and will
it have other effects on static warning checks and/or code generation?

A workaround is, of course, helpful - especially since this can be added to
people's CFLAGS lists.

[Bug target/105523] Wrong warning array subscript [0] is outside array bounds

2023-04-25 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105523

--- Comment #20 from David Brown  ---
(In reply to Georg-Johann Lay from comment #19)
> Created attachment 54912 [details]
> pr105532.diff: Proposed patch for the AVR backend
> 
> Here is a proposed, untested patch.
> 
> gcc/
>   PR target/105532
>   * config/avr/avr.cc (opts.h): Include it.
>   (avr_option_override): Set --param_min_pagesize=0.
> 
> gcc/testsuite/
>   PR target/105532
>   * gcc.target/avr/torture/pr105532.c: New test.

This is not an AVR backend issue - it is much wider than that.  It is perhaps
reasonable to test a patch just on the AVR, but this needs to be fixed in the
core of gcc.

[Bug target/105523] Wrong warning array subscript [0] is outside array bounds

2023-04-24 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105523

David Brown  changed:

   What|Removed |Added

 CC||david at westcontrol dot com

--- Comment #18 from David Brown  ---
This issue does not appear to be related to any particular backend.

Changing the initial code to remove the volatile :

void m(void) {
    uint8_t * ptr2 = (uint8_t *)(0x0030);
    *ptr2 = 0xd8;
}

gives a warning in gcc 11 as well - "warning: writing 1 byte into a region of
size 0".  (gcc 12 and 13 give the "array subscript [0] is outside array bounds"
warning.)


It is standard practice in embedded development to cast an integer value (often
a compile-time constant, but not always) to a pointer and use that to access
memory-mapped registers or other data at fixed addresses.  Typically the
pointers are pointer-to-volatile.  Sometimes the data type in question is a
"uintN_t" type, sometimes it is a struct covering a range of registers.  There
are other ways to specify fixed addresses, such as using "extern" data that is
defined in a linker file or an assembly file - but I have not seen other
methods used much in recent decades.
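A typical instance of that practice (the address and register name here are
invented for the example):

#include <stdint.h>

/* Hypothetical memory-mapped peripheral register at a fixed address.  */
#define UART_STATUS  (*(volatile uint32_t *)0x40002400u)

static inline int uart_ready(void) {
    return (UART_STATUS & 0x01u) != 0;
}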

The compiler knows nothing about the size of the region pointed to here.  It is
no more appropriate for it to guess the size is 0 than to guess it is 42.  It
should either assume the size is one single object of the named type, or an
array of unknown size of the named type.  (I'd prefer the first, but the second
is also a reasonable choice.)

As an embedded developer, I don't want to have to turn off useful warnings to
avoid false positives on common and essential constructs.

[Bug c/82922] Request: add -Wstrict-prototypes to -Wextra as K style is obsolescent

2022-11-08 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82922

--- Comment #9 from David Brown  ---
Could -Wstrict-prototypes be added to -Wall, or even considered enabling by
default?  The next C standard will make "void foo()" mean the same as "void
foo(void)", like in C++, which makes the scope for confusion high.

[Bug ipa/101279] Function attributes often block inlining

2022-06-28 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101279

--- Comment #7 from David Brown  ---
(In reply to rguent...@suse.de from comment #6)
> > On 28.06.2022 at 14:53, david at westcontrol dot com wrote:
> > 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101279
> > 
> > --- Comment #5 from David Brown  ---
> > (In reply to Richard Biener from comment #4)
> >> (In reply to frankhb1989 from comment #3)
> >>> There is a more specific instance here: can_inline_edge_by_limits_p in
> >>> ipa-inline.cc treats flags and "optimize" attributes differently.
> >> 
> >> A bit up there's a blacklist we maintain where inlining is not OK because 
> >> it
> >> results in semantic differences.
> >> 
> >> Generally we it's hard to second-guess the users intention when looking
> >> at an inline edge with different optimization settings of caller and 
> >> callee.
> >> For C++ comdats there might be even multiple variants with different
> >> optimization level (but we only keep one, special-casing this a bit).
> > 
> > I appreciate the "err on the side of caution" attitude.  Perhaps there 
> > could be
> > an extra "I know what I'm doing" attribute that lets you override the
> > blacklisting in a particular case.  This would only really make sense in 
> > cases
> > where the attribute can be attached to the expressions and statements within
> > the function (I think "-fwrapv" would be in this category).  In cases where
> > this is not possible, an error or warning message would be in order as the
> > compiler can't do what the programmer is asking.
> 
> Can you provide a specific example that you would allow this way?
> 
> 

I'd go back to my original example :


  __attribute__((optimize("-fwrapv")))
  static inline int wrapped_add(int a, int b) {
  return a + b;
  }

  int foo(int x, int y, int z) {
  return wrapped_add(wrapped_add(x, y), z);
  }

If you want to disable inlining of "wrapped_add" due to the change in the
semantics of integer arithmetic, that's fair enough.  But if I /really/ want
inlining, and write something like :

  __attribute__((optimize("-fwrapv"), always_inline))
  static inline int wrapped_add(int a, int b) {
  return a + b;
  }

then I want the code treated as :

  return (__wrapping_int) a + (__wrapping_int) b;

or

  return __wrapping_add_int(a, b);

If the way gcc marks and handles "-fwrapv" arithmetic does not support
something like that, which would allow inline mixing with other code, then that
would result in a compiler warning or error message.


It might be best to have a new attribute name here rather than using
"always_inline" - I have not thought through the consequences.

It might also be that if the compiler can be changed to support inlining of a
given optimisation attribute, then the attribute in question can be whitelisted
for inlining for everyone.  I suppose what I am saying is that if the compiler
can't be sure that inlining is safe, then it could be cautious by default while
letting the programmer override that caution explicitly.

[Bug ipa/101279] Function attributes often block inlining

2022-06-28 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101279

--- Comment #5 from David Brown  ---
(In reply to Richard Biener from comment #4)
> (In reply to frankhb1989 from comment #3)
> > There is a more specific instance here: can_inline_edge_by_limits_p in
> > ipa-inline.cc treats flags and "optimize" attributes differently.
> 
> A bit up there's a blacklist we maintain where inlining is not OK because it
> results in semantic differences.
> 
> Generally we it's hard to second-guess the users intention when looking
> at an inline edge with different optimization settings of caller and callee.
> For C++ comdats there might be even multiple variants with different
> optimization level (but we only keep one, special-casing this a bit).

I appreciate the "err on the side of caution" attitude.  Perhaps there could be
an extra "I know what I'm doing" attribute that lets you override the
blacklisting in a particular case.  This would only really make sense in cases
where the attribute can be attached to the expressions and statements within
the function (I think "-fwrapv" would be in this category).  In cases where
this is not possible, an error or warning message would be in order as the
compiler can't do what the programmer is asking.

[Bug c/105343] New: Inefficient initialisation in some kinds of structs

2022-04-22 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105343

Bug ID: 105343
   Summary: Inefficient initialisation in some kinds of structs
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

struct S { int a[1000]; };
struct X { struct S s; int b[2];};

extern int foobar(struct X * p);

int foo(struct S *s)
{
struct X x = { *s };
return foobar(&x);
}


When the size of the array "a" is small enough that the compiler does the
initialisation inline, the code is fine.  With a bigger array it uses memset
and memcpy, either as calls to external functions or inline loops depending on
details of the version of gcc and the target.  (This too is appropriate.)

However, it does that by turning the code into the equivalent of :

memset(&x, 0, sizeof(struct X));
memcpy(&x, s, sizeof(struct S));

It /should/ be doing :

memset(&x.b, 0, sizeof(x.b));
memcpy(&x, s, sizeof(struct S));

In other words, it is first zeroing out the entire X structure, then copying
from *s into the structure.  Only the extra part of X, the array "b", needs to
be zero'ed.

[Bug c/103660] New: Sub-optimal code with relational operators

2021-12-11 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103660

Bug ID: 103660
   Summary: Sub-optimal code with relational operators
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

I recently looked at some of gcc's "if-conversions" and other optimisations of
expressions involving relational operators - something people might use when
trying to write branchless code.  I know that it is often best to write in the
clearest way, including branches, and let the compiler handle the optimisation.
 But people do try to do this kind of thing by hand.

I tested 6 examples of ways to write a simple "min" function:


int min1(int a, int b) {
if (a < b) return a; else return b;
}

int min2(int a, int b) {
return (a < b) ? a : b;
}

int min3(int a, int b) {
return (a < b) * a | (a >= b) * b;
}

int min4(int a, int b) {
return (a < b) * a + (a >= b) * b;
}

int min5(int a, int b) {
const int c = a < b;
return c * a + (1 - c) * b;
}

int min6(int a, int b) {
const bool c = a < b;
return c * a + !c * b;
}


gcc happily optimises the first two versions.  For the next two, it uses
conditional moves for each half of the expression, then combines them with "or"
or "add".  For version 5, it generates two multiply instructions, and version 6
is even worse in trunk (gcc 12).  This last one is a regression - gcc 11
generates the same code for version 6 as for version 4 (not optimal, but not as
bad).

For comparison, clang 5+ generates optimal code for all versions.  I have tried
a number of different targets (godbolt is wonderful for this stuff), with
similar results.

[Bug c/102427] -Woverflow only works in constant expressions

2021-09-21 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102427

--- Comment #2 from David Brown  ---
Runtime detection is good - compile-time detection is much, much better when it
is possible.  (IMHO, of course.)

[Bug c/102427] New: -Woverflow only works in constant expressions

2021-09-21 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102427

Bug ID: 102427
   Summary: -Woverflow only works in constant expressions
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

The "-Woverflow" warning only works with constant expressions.  Should it not
also work in cases where the compiler knows the value at compile time, even if
it is not a constant expression?

int foo(void) {
const int a = 534567891;
int b = a * 6;
return b;
} 

(With gcc -O2 -Wall -Wextra)

When "a" is declared "const", the compiler warns about the overflow.  If the
"const" is removed, the compiler still does the same calculations at compile
time, but does not give a warning even though it knows there was a signed
integer overflow.

Could "-Woverflow" be changed to warn here, when there is clearly undefined
behaviour?

[Bug c/101279] New: Function attributes often block inlining

2021-07-01 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101279

Bug ID: 101279
   Summary: Function attributes often block inlining
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

When using many function attributes, the attributes only apply when the
function is not inlined.  And in some cases, they actively disable inlining
which would normally be expected.

For example, a programmer may have occasional need for two's complement
wrapping semantics for signed integers.  They don't want to use "-fwrapv"
globally, because that would lose some optimisation opportunities for the rest
of the code.  So they define functions like "wrapped_add" :

  __attribute__((optimize("-fwrapv")))
  static inline int wrapped_add(int a, int b) {
  return a + b;
  }

  int foo(int x, int y, int z) {
  return wrapped_add(wrapped_add(x, y), z);
  }

When used in a function like "foo" which does not have "-fwrapv" active,
function calls are made rather than the expected inlining.  The result is
significantly poorer code than the programmer would predict.

If "foo" also has "-fwrapv" from a compiler flag, attribute or pragma, the
generated code is inlined nicely.


The same issues apply to a range of function attributes.  (Obviously some, such
as "section", make no sense to inline to functions that don't share the same
attribute.)  For some attributes, such as "access", inlining disables the
attribute rather than the attribute disabling inlining.

Steadily more code is inlined in modern programming - helped by LTO.  Often the
code works the same whether code is inlined or not, with only minor differences
in speed.  But in template-heavy C++ code, inlining is essential as you have
large numbers of functions that generate very little (if any) code - failure to
inline as expected can have severe efficiency effects.  Conversely, if inlining
disables checking attributes like "access" or "nonnull" the result is a false
sense of security in the programmer.


I think it is important for the manual to be updated to reflect these
limitations as a first step.

Then of course it would be best to remove the limitations!

Ref. mailing list conversation
<https://gcc.gnu.org/pipermail/gcc/2021-June/236592.html> and Martin Sebor's
blog article
<https://developers.redhat.com/articles/2021/06/25/use-source-level-annotations-help-gcc-detect-buffer-overflows>.

(Thanks to Martin for his article and discussion that made me aware of these
points.)

[Bug c/100890] New: The "div" standard library functions could be optimised as builtins

2021-06-03 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100890

Bug ID: 100890
   Summary: The "div" standard library functions could be
optimised as builtins
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

The C standard library has a family of functions for calculating the division
and remainder as a pair - "div", "ldiv", "lldiv", "imaxdiv".  These could be
optimised by the compiler using inline code, in the same manner as many other
library functions such as the "abs" family (listed under "Other builtins" in
the manual).

For example, "div" can be implemented as:

div_t div(int a, int b) {
div_t d;
d.quot = a / b;
d.rem = a % b;
return d;
}

Inlining this in place of a call to "div" results in code that is smaller and
faster on most targets, as well as providing far more optimisation
opportunities (constant propagation, re-arranging with other code, etc.).
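As a sketch of the opportunity (not current behaviour), inlining would let
constant operands fold away completely:

#include <stdlib.h>

/* With div() expanded inline, this could fold to "return 14;" at
   compile time instead of making a library call.  */
int quot_100_by_7(void) {
    div_t d = div(100, 7);
    return d.quot;
}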

[Bug c/100702] Strict overflow warning regression in gcc 8 onwards

2021-05-20 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100702

--- Comment #2 from David Brown  ---
Runtime diagnostics can be very useful - but they are a different kind of
warning.  In particular, they only show what errors have occurred during your
testing - they don't show what errors /might/ occur.

There is a general rule that the earlier you find your problems, the cheaper,
faster and easier they are to handle.  Compile-time checks are better than
run-time checks when all else is equal.  Clearly compile-time checks have more
risk of false-positives - that's why it's important that most are optional or
can be disabled by pragmas or particular code constructs.  But when they /can/
be used, they are preferable.

In this particular example, there is no doubt of the undefined behaviour and
the infinite loop.  The compiler knows about them.  It would not be a false
positive to issue a warning in such cases.


Another limitation of runtime diagnostics is their use in embedded systems. 
gcc is heavily used in microcontrollers, where you often do not have much in
the way of a console output, no file system, etc.  Runtime diagnostic
opportunities are far more limited in such cases.


I fully appreciate that compile-time warnings are difficult, especially
avoiding false positives, and if you say that a warning here would be too niche
or too difficult to justify the effort, I am happy to accept that.  But
run-time diagnostics are not a replacement - they are an additional and
complementary tool.  Please do not consider sanitizers as a substitute for
static analysis and compile-time error checking and warnings.

[Bug c/100702] New: Strict overflow warning regression in gcc 8 onwards

2021-05-20 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100702

Bug ID: 100702
   Summary: Strict overflow warning regression in gcc 8 onwards
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

Incorrect code like this example relies on two's complement wrapping to do what
the programmer wanted:

void foo(int *a)
{
  int i;
  for (i=0; i

[Bug target/98884] Implement empty struct optimisations on ARM

2021-02-01 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884

--- Comment #6 from David Brown  ---
(In reply to Jakub Jelinek from comment #3)
> If GCC and Clang are ABI incompatible on this, then one of the two compilers
> is buggy.  So, it is needed to look at the EABI and find out which case it
> is.

I've had a look at the ARM C++ ABI, to the best of my abilities:



Section 4.1 has this to say:

GC++ABI §2.27 POD Data Types

The GC++ABI defines the way in which empty class types are laid out.  For the
purposes of parameter passing in [AAPCS], a parameter whose type is an empty
class shall be treated as if its type were an aggregate with a single member of
type unsigned byte.

Note: Of course, the single member has undefined content.



(This references the Generic C++ ABI.)


If my reading is correct, then gcc is correct and clang is wrong here - empty
classes are treated as containing a single unsigned byte, and then expanded to
a 32-bit type before passing.  (There is still no need to put a zero in these
parameters, as the value is unspecified.)

It may be that the x86 gcc port is wrong here, but I haven't looked at the
details of x86 calling conventions.


I hope someone can check this out, and perhaps file a bug report for clang so
that they can correct it.  (Alternatively, file a bug report with ARM so that
they can change the ABI!)


However, in this particular case, if clang is wrong then I don't want to be
right.  I can see no benefit, and significant cost, in passing zeros for these
empty tag structs.  I'd be quite happy with an explicitly non-conforming switch
to enable such optimisations (just like "-fshort-enums" or other switches that
mess with caller and callee registers).  Or I'd be even happier to find that
clang is wrong and gcc ARM gets optimised without a flag :-)

[Bug target/98884] Implement empty struct optimisations on ARM

2021-02-01 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884

--- Comment #5 from David Brown  ---
(In reply to Jakub Jelinek from comment #4)
> Note, for ABI compatibility or incompatibility it might be better to check
> what happens when some argument is passed after the empty structs.  Because
> at least in some ABIs one could get away with just pretending the stack
> slots (or registers) are there even when they aren't actually allocated on
> the stack, but one would need to have guarantees the callee e.g. will never
> modify those stack slots (in most ABIs the call argument stack slot are
> owned by the callee).

Good point.  I tried with:

void needs_tags(int x, Tag1 t1, Tag2 t2, Tag3 t3, Tag4 t4, Tag5 t5, int y);

and

needs_tags(12345, t1, t2, t3, t4, t5, 200);

gcc (trunk - gcc 11) on x86 puts 12345 in edi and 200 in esi, just as if the
empty tags didn't exist.

So does clang for the ARM (putting them in r0 and r1 respectively).

gcc 9 for the ARM generates code as though t1 .. t5 were type "int" and value
0.

[Bug target/98884] Implement empty struct optimisations on ARM

2021-01-29 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884

--- Comment #2 from David Brown  ---
Yes, ABI issues were my initial thought too.  If so, then optimising away the
assignments while leaving the stack manipulation (and possibly register
allocations) in place would still be a significant improvement.

However, I note that clang has no problem with generating ideal code here for
the ARM - it is not bothered by the ABI.  There could be several reasons for
that.  Perhaps the clang folk got the ABI wrong and the optimisation is not
valid for the ARM EABI.  Maybe the EABI used on the "arm (none)" target doesn't
specify these details, meaning the optimisation is valid there even if it is
not valid on "arm (linux)" targets.  I don't know the details of the ABIs well
enough to answer.

If it is an ABI issue, then I'd be quite happy with an ARM-specific flag to
enable a variation on the ABI that lets the compiler skip empty types
entirely.  When compiling for the Cortex-M devices, you rarely link much to
pre-compiled code (other than the C library) and it's usually fine to break
standards a bit to get more optimal code (like using "-ffast-math").

It would of course be best to have a general solution that works for all ARM
users (and ideally other targets too) without needing a flag.

[Bug c++/98884] New: Implement empty struct optimisations on ARM

2021-01-29 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884

Bug ID: 98884
   Summary: Implement empty struct optimisations on ARM
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

Empty "tag" structs (or classes) are useful for strong typing, function
options, and so on.  The rules of C++ require these to have a non-zero size (so
that addresses of different instances are valid and distinct), but they contain
no significant data.  Ideally, therefore, the compiler will not generate code
that sets values or copies values when passing around such types. 
Unfortunately, that is not quite the case.

Consider these two examples, with foo1 creating a tag type, and foo2 passing it
on:

struct Tag {
friend Tag make_tag();
private:
Tag() {}
};

Tag make_tag() { 
return Tag{}; 
};

void needs_tag(Tag);

void foo1(void) {
Tag t = make_tag();
needs_tag(t);
}


struct Tag1 {};
struct Tag2 {};
struct Tag3 {};
struct Tag4 {};
struct Tag5 {};

void needs_tags(int x, Tag1 t1, Tag2 t2, Tag3 t3, Tag4 t4, Tag5 t5);

void foo2(Tag1 t1, Tag2 t2, Tag3 t3, Tag4 t4, Tag5 t5)
{
needs_tags(12345, t1, t2, t3, t4, t5);
}


(Here is a godbolt link for convenience: <https://godbolt.org/z/o5K78h>)

On x86, since gcc 8, this has been quite efficient (this is all with -O2):

make_tag():
xor eax, eax
ret
foo1():
jmp needs_tag(Tag)
foo2(Tag1, Tag2, Tag3, Tag4, Tag5):
mov edi, 12345
jmp needs_tags(int, Tag1, Tag2, Tag3, Tag4, Tag5)

The contents of the tag instances are basically ignored.  The exception is on
"make_tag", where the return is given the value 0 unnecessarily.

But on ARM it is a different matter.  This is for the Cortex-M4:


make_tag():
mov r0, #0
bx  lr
foo1():
mov r0, #0
b   needs_tag(Tag)
foo2(Tag1, Tag2, Tag3, Tag4, Tag5):
push {lr}
sub sp, sp, #12
mov r2, #0
mov r3, r2
strb r2, [sp, #4]
strb r2, [sp]
mov r1, r2
movw r0, #12345
bl  needs_tags(int, Tag1, Tag2, Tag3, Tag4, Tag5)
add sp, sp, #12
ldr pc, [sp], #4

The needless register and stack allocations, initialisations and copying mean
that this technique has a significant overhead for something that should really
"disappear in the compilation".

The x86 port manages this well.  Is it possible to get such optimisations into
the ARM port too?


Oh, and for comparison, clang with the same options (-std=c++17 -Wall -Wextra
-O2 -mcpu=cortex-m4) gives:

make_tag():
bx  lr
foo1():
b   needs_tag(Tag)
foo2(Tag1, Tag2, Tag3, Tag4, Tag5):
movw r0, #12345
b   needs_tags(int, Tag1, Tag2, Tag3, Tag4, Tag5)

[Bug c/98313] New: Allow __attribute__((cleanup)) in types

2020-12-16 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98313

Bug ID: 98313
   Summary: Allow __attribute__((cleanup)) in types
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

The gcc __attribute__((cleanup)) feature can be useful for making the
equivalent of a C++ destructor in C code, as a way of ensuring resources are
freed even if you exit a function early with return.  Typical use-cases are
malloc'ed memory and locks.

In many situations, attaching the cleanup attribute to a type would give neater
code with fewer chances of error.  For example:

void indirect_free(void ** p)  {
free(*p);
*p = 0;
}

extern void do_something(int * p);

void foo(void) {
void * p = malloc(16);
do_something(p);
free(p);
}

void foo2(void) {
__attribute__((cleanup(indirect_free)))
void * p = malloc(16);
do_something(p);
}

Here, the cleanup attribute in "foo2" means you don't have to remember to free
p after calling do_something() - and if the function has multiple allocations,
this results in neater code with lower risk of bugs than "traditional"
goto-based error handling.  But you still have to remember to put the cleanup
attribute on each relevant pointer.

If instead you had:

typedef __attribute__((cleanup(indirect_free))) void* auto_voidp;

auto_voidp auto_malloc(size_t size) {
return malloc(size);
}


then your code could be:

void foo3(void) {
void * p = auto_malloc(16);
do_something(p);
}


It would not remove all possibilities of error - copying an auto_voidp here
could easily lead to double-free errors.  (But this is C, not C++ - we need to
leave /some/ fun for the people debugging the code...)


Basically, all I'd like here is that if you attach a cleanup attribute to a
type (struct, typedef, etc.), then the attribute is automatically applied to
any definitions of variables of that type.
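
For illustration, the requested behaviour might look something like this (a
sketch of the proposal only, not current gcc behaviour; the lock functions are
hypothetical):

struct lock { int id; };

void unlock(struct lock *l);          /* cleanup handler */
struct lock acquire(void);            /* hypothetical helper */

typedef __attribute__((cleanup(unlock))) struct lock scoped_lock;  /* proposed */

void critical(void) {
    scoped_lock l = acquire();   /* unlock(&l) would run automatically at scope exit */
    /* ... use the locked resource ... */
}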

[Bug c/96284] Outdated C features should be made errors with newer standards

2020-07-28 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96284

--- Comment #6 from David Brown  ---
(In reply to Eric Gallager from comment #5)
> (In reply to David Brown from comment #0)

> > Could this be made an error by default
> > (-Werror=implicit-function-declarations) ?  Let those who want to compile
> > old code with implicit function declarations, do so with an explicit flag.
> 
> I think Florian Weimer tried this and it broke the majority of configure
> scripts in existence...

Fixing existing code and build systems is always hard - backwards compatibility
is C's biggest strength and its biggest weakness.

I'm not bothered about my own code - I have makefiles with the relevant options
set in case I make mistakes.  My hope is for gcc to be able to have stricter
warnings to reduce bugs in code written now and in the future.  But I
understand that it's also important not to cause trouble for existing code and
build systems.

[Bug c/61579] -Wwrite-strings does not behave as a warning option

2020-07-22 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61579

David Brown  changed:

   What|Removed |Added

 CC||david at westcontrol dot com

--- Comment #7 from David Brown  ---
Could "-Wwrite-strings" be split into two options?  The warning could remain
(and become part of -Wall for C as well as C++) if the compiler can spot and
warn about attempts to write to string literals, while keeping these of type
"char[len]" as required by C.

A new option "-fconst-strings" could be put under "Code Gen Options" which
makes C string literals be type "const char[len]" for those that want it,
encouraging a slightly safer code style that is not standard C.
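
To illustrate the split (a sketch of the intent only, not current gcc
behaviour):

char *s = "hello";   /* valid C today; under the proposed -fconst-strings the
                        literal would have type const char[6], so this line
                        would need a cast or get a diagnostic */

void f(void) {
    s[0] = 'H';      /* undefined behaviour - writing to a string literal;
                        this is what the warning part of -Wwrite-strings
                        would aim to catch */
}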

[Bug c/96284] New: Outdated C features should be made errors with newer standards

2020-07-22 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96284

Bug ID: 96284
   Summary: Outdated C features should be made errors with newer
standards
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

While C has tried to remain backwards compatible with each new standards
revision, some changes have been made so that particularly unsafe features from
old code are no longer supported.  gcc has (reasonably enough) tried to keep
support for old features, but when something has been deprecated for decades,
perhaps it is time for it to be treated as an error by default and require an
explicit flag.  (This is in the same style as bug 85678 making "-fno-common"
the default.)

For example, implicit function declarations from K&R C were made obsolescent in
C90, and removed from the language in C99.  But by default, they still only
cause a warning (-Wimplicit-function-declaration) in gcc, no matter what
standard is picked.

Could this be made an error by default (-Werror=implicit-function-declaration)?
Let those who want to compile old code with implicit function declarations do
so with an explicit flag.
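
For example (function name made up), this still compiles today with only a
warning:

/* bar() is never declared; C90's "implicit int" rule is applied and gcc
   merely warns with -Wimplicit-function-declaration. */
int caller(void)
{
    return bar(42);
}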

[Bug c/85678] -fno-common should be default

2020-07-22 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85678

David Brown  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #15 from David Brown  ---
This has been implemented in gcc 10, and -fno-common is now the default.  This
"bug" can presumably now be closed.

Many thanks to the gcc developers here.

[Bug c++/92811] New: Odd optimisation differences with lambdas

2019-12-04 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92811

Bug ID: 92811
   Summary: Odd optimisation differences with lambdas
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

While playing with lambdas for higher order functions (in particular, functions
that manipulate functions and return new ones), I saw some differences in the
optimisations depending on seemingly minor changes to how functions are
defined.  This is my sample code:

template <typename F>
auto twice(F f)
{
return [f](int v) { return f(f(v)); };
};

int g1(int i)
{
return i + 3;
};

auto g2 = [](int i) { return i + 3; };

int test(int i) {
auto tw = twice(g1);
return tw(i);
}

int test2(int i) {
auto tw = twice(g1);
auto tw2 = twice(tw);
return tw2(i);
}

int test3(int i) {
auto tw = twice(g2);
return tw(i);
}

int test4(int i) {
auto tw = twice(g2);
auto tw2 = twice(tw);
return tw2(i);
}


The generated assembly (x86-64) with -O2 or -O3 is:

g1(int):
leal    3(%rdi), %eax
ret
test(int):
leal    6(%rdi), %eax
ret
test2(int):
addl    $6, %edi
call    g1(int)
movl    %eax, %edi
jmp     g1(int)
test3(int):
leal    6(%rdi), %eax
ret
test4(int):
leal    12(%rdi), %eax
ret

With -O1, test3 and test4 (where the original function is a lambda) were still
fully optimised, while test and test2 had 2 and 4 calls to g1 respectively.

(For comparison, clang requires -O2 to get good code, but fully optimises all 4
test functions.)

[Bug c++/92727] Idea for better error messages

2019-11-29 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92727

--- Comment #8 from David Brown  ---
I don't think there is anything more I can add at the moment.  Thank you for
your efforts.

[Bug c++/92727] Idea for better error messages

2019-11-29 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92727

--- Comment #6 from David Brown  ---
I can see it is a challenge to get enough detail in the messages to be good for
the more advanced users, and still simple enough and clear enough for more
casual or inexperienced users.

The static assertion looks good to me, and is clearly the easiest way to
implement such a feature.  But I appreciate that it would not be ideal for some
people, and would hide useful messages from them.

Perhaps a better place to focus would be on IDEs, especially with
parsing of the new JSON format, to help people see exactly where the problem
lies in their code.

[Bug c++/92727] Idea for better error messages

2019-11-29 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92727

--- Comment #3 from David Brown  ---
I may not have been very clear here.  Let me try to take a step back.

From the user's viewpoint, the problem is that they have made a class that
can't be copied, and they have written code that tries to copy it.  With the
current error messages, that point does come across - but it is hidden amongst
a lot of messages about "allocators" that will be unfamiliar to many users, and
unhelpful to most of them.

I fully appreciate that the problem occurs when the allocator tries to use the
copy constructor.  But the user has not specified an allocator, or picked an
allocator, or mentioned allocators in their code - and perhaps does not even
know what they are.  Most users are happy to accept "it goes in memory
somewhere", or "it is on the heap", without worrying about the details.  What I
am hoping for here is some way to help the user get the part of the error
information that is important to them, and will help them fix their code,
without unnecessary detail.

What I meant about default template parameters is that - as you say - some
people /do/ want all the details of the messages about the allocator.  These
are mostly in two groups - people using non-default allocators, and people who
want to deal with the details of the standard library.  The first group can be
identified by their use of explicit allocators in the template instantiation
rather than using the defaults, and the second could perhaps use an additional
compiler switch (a little like the current "-Wsystem-headers" switch).

So yes, I /am/ suggesting that the error location should be given as the
"v.push_back(x);" line.  And yes, I am aware that it would not be ideal for
experts - but I think it would be clearer to a majority of users.  And that
would be novel!  Part of the job of a compiler is to help people write correct
code, and clear error and warning messages are essential for that.  (Let me
also say that gcc does a fine job here, and has improved enormously over the
years that I have used it - though it could still be better.)

Perhaps this could be better handled using concepts?  Maybe "push_back" could
be declared to take a parameter with a concept requiring either a type that
could be copied into the vector or a type that is a compatible rvalue
reference?  Or perhaps concepts and "if constexpr" in C++17 would make it
practical to make the "push_back(const T&)" overload unavailable if T is not
copyable?  That would get you straight to a "no matching function call for
std::vector<X>::push_back(X&)" message on the offending line.
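
Very roughly, the idea would be something like this (a sketch only, using C++20
syntax - not the real libstdc++ declarations):

#include <concepts>

template <typename T>
class vector_like {
public:
    // Only offer the copying overload when T can actually be copied, so a
    // non-copyable T gives "no matching function" at the call site instead
    // of an error deep inside the allocator machinery.
    void push_back(const T& x) requires std::copy_constructible<T>;
    void push_back(T&& x)      requires std::move_constructible<T>;
};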

[Bug c++/92727] New: Idea for better error messages

2019-11-29 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92727

Bug ID: 92727
   Summary: Idea for better error messages
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

Consider this code:

#include <vector>
#include <memory>

class Y {int i;};

class X {
std::unique_ptr<Y> sp;
int i;
};

int main() {
std::vector<X> v;
X x;
v.push_back(x);// Line A
// v.push_back(std::move(x));  // Line B
}


It is a simple mistake - class X has a unique_ptr and can't be copied, but the
push_back tries to copy it.

The errors generated, however, contain a lot of unhelpful information about
allocators for the vector.  They hide the useful part of the messages, which
is:

note: 'X::X(const X&)' is implicitly deleted because the default definition
would be ill-formed

It is an extra note, rather than the error message, but is key for the user to
identify the problem.

Would it be possible to hide the messages concerning the allocator here?  In
general, would it be possible to hide the messages that deal with template
parameters that are unchanged from the default?  That would let the user see
messages that are relevant to the code /they/ have written, rather than
implementation details in the libraries and headers.

Even better, in this case, would be a hint that the user needs a std::move, as
shown in line B.


(Yes, I know I may be asking for the impossible here, or at least for the very
difficult.  But it is the kind of thing that makes C++ development harder than
it should be.)

[Bug c++/92659] Suggestions for bitshift

2019-11-26 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92659

--- Comment #6 from David Brown  ---
(In reply to Xi Ruoyao from comment #3)
> (In reply to Jonny Grant from comment #2)
> > (In reply to Xi Ruoyao from comment #1)
> > > Is it appropriate?
> > > 
> > > Though on both 32-bit and 64-bit x86 "1ul" is good for a size_t, but I
> > > believe there is some platform where "1ull" is necessary.
> > > 
> > > Maybe I'm wrong.  But if I'm correct, suggesting "1ul" is encouraging bad
> > > code.  I'll use "(size_t) 1 << 32" for this.
> > 
> > UL means Unsigned Long, so if that type is also 64bit like size_t, then it
> > is fine.
> 
> That is true on *your platform*.
> 
> I can't find any specification in C standard saying "the bitwidth of long
> should >= the bitwidth of size_t".  So at least theoretically it may be
> insufficient.
> 
> Writing unportable thing is OK (if you don't care about other platforms) but
> *suggesting* unportable thing is bad.

There is no requirement for "size_t" to be of any particular size in relation
to the other integer types.  On 64-bit Windows, for example, size_t is 64 bits
but unsigned long is 32 bits.

So there is no portable integer constant suffix that would suit for "size_t".

However, the "size_t" here is a red herring - the problem is that the
expression "1 << 32" overflows, but it would not overflow if the "1" was of a
64 bit type.  The most portable choice here would be "1ull" or "1ll" (the
compiler has no way to guess which you want).  But that would not work with
pre-C99 standards.

All in all, the whole idea sounds counter-productive to me.  If you need
spoon-feeding about the details of C here, you would be better off reading a
book on the language than using trial and error and guessing from compiler
messages.  And if you don't need spoon-feeding but just made a little mistake
in your code (as we all do on occasion), then the current warning is fine.

[Bug c++/92659] Suggestions for bitshift

2019-11-26 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92659

David Brown  changed:

   What|Removed |Added

 CC||david at westcontrol dot com

--- Comment #4 from David Brown  ---
(In reply to Jonny Grant from comment #2)
> (In reply to Xi Ruoyao from comment #1)
> > Is it appropriate?
> > 
> > Though on both 32-bit and 64-bit x86 "1ul" is good for a size_t, but I
> > believe there is some platform where "1ull" is necessary.
> > 
> > Maybe I'm wrong.  But if I'm correct, suggesting "1ul" is encouraging bad
> > code.  I'll use "(size_t) 1 << 32" for this.
> 
> UL means Unsigned Long, so if that type is also 64bit like size_t, then it
> is fine.
> 
> 
> I would rather use the real type, if the compiler is too stupid to start
> with a type big enough...  the same code with 5147483647 works fine, because
> the compiler starts with the number as a 'long int' which is already 64bit,
> no suffix required.
> 

I recommend you learn the details of how C works before declaring the compiler
"stupid".  This sort of thing is not up to the compiler.  When you write "1 <<
32", the "1" is of type "int".  The compiler is not allowed to choose a
different type - the best it can do is give you a warning.  And the compiler
already /does/ give a warning - a perfectly good warning.  It is not the
compiler's job to teach you how to program in C.

[Bug c/85678] -fno-common should be default

2019-11-25 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85678

--- Comment #11 from David Brown  ---
Reliance on -fcommon has not been correct or compatible with any C standard (I
don't think it was even right for K&R C).  If the code is written correctly
(with at most one definition of all externally linked symbols) then -fcommon
does not make it incorrect or conflict with the standards.  But if your code
requires -fcommon to compile correctly, it does not conform to C89 or anything
newer.

Personally, I'd like to see many old and dangerous C practices give errors by
default.  Along with common symbols, I'd include non-prototype function
declarations and definitions, and calling functions without declaring them. 
These are the kind of thing you'd expect to see in pre-ANSI C - other than
that, they  are probably errors in the code.  It is good that gcc still
supports such code, but I think making them errors by default will reduce bugs
in current code.  And surely that is worth the cost of adding a "-fold-code"
flag to ancient build recipes?  (The "-fold-code" flag could probably also
imply "-fno-strict-aliasing -fwrapv" to deal with other assumptions sometimes
made in older code.)

There is precedent for this.  The default standard for gcc changed from
"gnu89" to "gnu11".  While most "gnu89" code will compile with the same
semantics in "gnu11" mode, there are a fair number of incompatibilities. 
Changing the default to "-fno-common" (and ideally "-Werror=strict-prototypes
-Werror=old-style-declaration -Werror=missing-parameter-type") would have a lot
smaller impact than changing the default standard.

[Bug c/92507] False positives with -Wsign-compare

2019-11-14 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92507

--- Comment #2 from David Brown  ---
I am aware that "n" here is not a constant integer expression - it should not
need to be for a compiler warning.  What the compiler needs in order to
eliminate the false positive is a knowledge of the ranges of "i" and "n" here -
that they are both within the common overlap between "int" and "unsigned int".

I don't know if the compiler has such range information available at the time
the -Wsign-compare warning is checked.  If it does not, then I can see this
issue being difficult to fix.

[Bug c/92507] New: False positives with -Wsign-compare

2019-11-14 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92507

Bug ID: 92507
   Summary: False positives with -Wsign-compare
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

The -Wsign-compare warning is useful for spotting potentially dangerous code. 
It is not always easy to know how the operands of a comparison will be
promoted, and whether the comparison will have the same meaning in the C
language as it has mathematically.  But if the values on both sides are
non-negative, then there is never going to be a problem.

This pattern here occurs in a lot of code, where you have an "int" loop
variable and an unsigned type for the limit - perhaps coming from a sizeof
expression:

void foo(void) {
const unsigned int n = 10;
for (int i = 0; i < n; i++) { }
} 


Typically, the compiler knows the value of the limit at compile time. 
Certainly it knows it is non-negative (it is an unsigned type).  And the
compiler can see that the loop variable is always non-negative (as long as
-fwrapv is not in effect).

This kind of code is a big cause of false positives for -Wsign-compare, and
means you either need to use more unsigned types than might be natural, or add
casts, or disable the warning.

Another possible way to handle this is for the compiler to treat "n" in the
example above as an "int" for the comparison.  The compiler knows the value of
"n", and can see that this transformation would be safe.  (And if "n" is too
big to fit in an "int", then we really do want a warning!)
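
For comparison, the cast workaround mentioned above looks like this (safe here
only because "n" is known to fit in an "int"):

void foo_cast(void) {
    const unsigned int n = 10;
    for (int i = 0; i < (int) n; i++) { }
}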

[Bug c/60523] Warning flag for octal literals [-Woctal-literals]

2019-10-29 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60523

--- Comment #10 from David Brown  ---
(In reply to Eric Gallager from comment #9)
> (In reply to Eric Gallager from comment #8)
> > *** Bug 70952 has been marked as a duplicate of this bug. ***
> 
> While this was a mistake, it still might be worth grouping the flag proposed
> in that bug, -Woctal-escapes, and the flag proposed in this bug,
> -Woctal-literals, under an umbrella flag called just -Woctal

That makes a lot of sense.  I expect users who want one of these warnings would
want both, so combining them would save effort for users and gcc developers.

[Bug c/90404] No warning on attempts to modify a const object

2019-05-09 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90404

--- Comment #3 from David Brown  ---
Yes, false positives are always a risk with warnings.  We already have a
warning here that would catch pretty much any case, but with a big risk of
false positives - "-Wcast-qual".  My hope is for a warning with low risk of
false positive, just catching any "you are definitely doing something wrong"
cases, which the optimiser has already spotted here.

I do understand that sometimes optimiser passes and warning passes are done at
different times, so it is not always a matter of "the optimiser sees this, so
it should be easy to get a warning".

And there is always the caveat that even though a function has clear and
unavoidable undefined behaviour (and thus a warning seems appropriate), the
compiler might not know that the function will actually be called.  (It is fine
to have "x / 0" in code that is never run.)  But I feel that with -Wall, and
certainly with -Wextra, the user is asking the compiler to be generous with
warnings, and it should warn on functions that are defined, even if they may
not be called.

[Bug c/90404] New: No warning on attempts to modify a const

2019-05-09 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90404

Bug ID: 90404
   Summary: No warning on attempts to modify a const
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

The compiler knows that you can't change an object declared "const", even via a
pointer cast.  And it (rightfully and helpfully) uses that knowledge for
optimisation.  But it should also be able to warn about it.

const int c = 20;

int foo(void) {
*((int*) &c) = 10;

return c;
}


Current gcc, with -O2, compiles this to:

foo:
mov DWORD PTR c[rip], 10
mov eax, 20
ret


The write to c is unnecessary, but could trigger a helpful fault message on
many target systems (and who cares about the efficiency of incorrect code?). 
And the compiler correctly optimises the return value to 20 as it knows the
write to "c" cannot succeed.

However, there is no warning or error message here - even with -Wall -Wextra.

With -Wcast-qual, you get a warning - but that will have false positives in
cases like :

int v = 20;

int foo2(void) {
const int* p = &v;
*((int*) p) = 10;

return v;
}

foo2:
mov DWORD PTR v[rip], 10
mov eax, 10
ret


The optimiser can see the difference here - I would like warnings to show the
difference too.

[Bug c/89035] Request - builtins for unspecified and arbitrary values

2019-01-24 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89035

--- Comment #2 from David Brown  ---
Yes, "int x = x;" does give an unspecified value without a warning.  But to me,
this looks much more like a workaround - while "int x =
__builtin_unspecified();" is clear in its intentions.

[Bug c/89035] New: Request - builtins for unspecified and arbitrary values

2019-01-24 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89035

Bug ID: 89035
   Summary: Request - builtins for unspecified and arbitrary
values
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

Occasionally it is useful to have an unspecified or arbitrary value in C (and
C++) programming.  For example, you might want to have a function that
calculates a particular value for valid inputs, and returns an unspecified
value on invalid inputs.

At the moment, gcc has __builtin_unreachable() to indicate intentional manually
created undefined behaviour - it is useful for getting optimal code, and also
for documenting code.  And it provides a reliable way of saying "this can't
happen", unlike other undefined behaviour (such as writing "1/0;").

Unspecified and arbitrary values are different from undefined values - the
compiler is free to pick them in any way it likes, but the values are valid and
non-trapping, and operations using them are defined behaviour, though with
possibly unspecified or arbitrary results.

An unspecified value can be created by making a local variable and leaving it
uninitialised - but that will usually be flagged by a warning from gcc, and it
is not clear in the code.  An arbitrary value can be generated by inline
assembly, such as " int x;  asm("" : "=r" (x)); " which generates no code. 
However, that does not give the compiler freedom to pick a different arbitrary
value of its liking.

I would like to suggest there be builtin functions __builtin_unspecified() and
__builtin_arbitrary().  They would both return an "int" - I don't think there
is need for any other types.

The difference between this can be seen with code like this:

int x = __builtin_arbitrary();
int y = x - x;// y is guaranteed to be 0.

int x = __builtin_unspecified();
int y = x - x;// y is unspecified.

An initial implementation could be very simple.

__builtin_unspecified() can use the existing unspecified value logic in the
compiler, or could be viewed as __builtin_arbitrary().

__builtin_arbitrary() can work like the inline assembly shown above, or just
return 0.

Of course, a better implementation would allow more optimisations - the
compiler could pick different values that result in more efficient code later.

[Bug c/88391] New: Request - attribute for adding tags to functions

2018-12-06 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88391

Bug ID: 88391
   Summary: Request - attribute for adding tags to functions
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

Sometimes it would be very useful to have "tags" on functions, which can be
used to restrict the kinds of functions that can call them or be called by
them.  I can think of two use-cases from the embedded world, and one more
general case:

1. A function that can only be called when interrupts are disabled, has special
powers (such as direct access to locked or atomic data, at least on single-core
systems) and special responsibilities (it should be fast, should not block,
etc.).  It should only be called by a function that disables interrupts.

2. A function that needs to run from ram should not call other functions that
are not also in ram.  This might be for programming code flash, or perhaps to
minimise speed variation in a real-time system with slow external flash.

3. A function that should only be called when you have a certain lock or
resource.

There are many other possible uses.  I think that a few general attributes
could allow this to be handled in the C (and C++) front-end, with the compiler
issuing warnings or errors when the "tag" rules are broken.  It would not
affect code generation, name mangling, etc. - it's just a mechanism to help
spot incorrect code.  I am not suggesting that we have attributes that actually
disable interrupts, or place code in ram (though these might be nice!) - all
embedded targets already have mechanisms for that, such as "section"
attributes, and we don't need to duplicate functionality that can be handled by
a macro or two.


My suggestions for function attributes are:

1. __attribute__((tagged("tagname")))

Mark a function as being tagged with "tagname" (or multiple tag names).

2. __attribute__((caller_tag("tagname")))

Mark a function as only being callable by a function with this tag.

3. __attribute__((callee_tag("tagname")))

Mark a function as only being allowed to call functions with this tag.

4. __attribute__((caller_notag("tagname")))

Mark a function as not being callable by a function with this tag.

5. __attribute__((callee_notag("tagname")))

Mark a function as not being allowed to call functions with this tag.


There would also need to be a statement attribute:

__attribute__((tag("tagname")))

Mark the statement as having this tag (and therefore being allowed to call
matching "caller_tag" functions, and not allowed to call matching
"caller_notag" functions).



Example 1
=

__attribute__((caller_tag("irq_disabled"), tagged("irq_disabled")))
void criticalFunction(void)
{
   // Some code that should never be interrupted
}

void normalFunction(void) {
   // Normal code
   disableInterrupts();
   __attribute__((tag("irq_disabled"))) criticalFunction();
   enableInterrupts();
}


Example 2
=

__attribute__((tagged("in_ram"), callee_tag("in_ram")))
__attribute__((section("ramcode")))
void flashCommand(uint32_t cmd)
{
   // Write command to flash programming peripheral
}

__attribute__((tagged("in_ram"), callee_tag("in_ram")))
__attribute__((section("ramcode")))
void flashErase(void)
{
   flashCommand(0x12345678);
}

void updateSoftware(void) {
downloadNewImage();
flashErase();
flashProgram();
}


Example 3
=

__attribute__((caller_notag("got_lock")))
void getLock(void)
{
mtx_lock();
}

__attribute__((caller_tag("got_lock")))
void releaseLock(void)
{
mtx_unlock();
}

__attribute__((caller_tag("got_lock"), tagged("got_lock")))
void accessData(void)
{
// Use critical data
}

void normalCode(void)
{
// Collect data
getLock();
__attribute__((tag("got_lock"))) {
accessData();
releaseLock();  // Still tag "got_lock"
}
}



One big question is the compiler target library.  In small embedded systems,
apparently simple operations like "x = y / z;" may result in a library call if
the processor does not have hardware instructions for the operation.  It is
important in "callee_tag" functions that any such library calls also issue a
warning.

[Bug c/87248] New: Bad code for masked operations involving signed ints

2018-09-07 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87248

Bug ID: 87248
   Summary: Bad code for masked operations involving signed ints
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

I am always wary of saying there might be a compiler bug - usually it is a bug
in the user code.  But this time I am very suspicious.  The example here comes
from a discussion in the comp.lang.c Usenet group.

Here is the code I have been testing:


unsigned char foo_u(unsigned int v) {
return (v & 0x7f) | (v & ~0x7f ? 0x80 : 0);
}

unsigned char foo_s(int v) {
return (v & 0x7f) | (v & ~0x7f ? 0x80 : 0);
}

unsigned char bar_s(int v) {
int x = v & ~0x7f;
return (v & 0x7f) | (x ? 0x80 : 0);
}


int test_u_1(void) {
return foo_u(0x01);// Expect 0x01 = 1
}

int test_u_2(void) {
return foo_u(0x1001);// Expect 0x81 = 129
}

int test_s_1(void) {
return foo_s(0x01);// Expect 0x01 = 1
}

int test_s_2(void) {
return foo_s(0x1001);// Expect 0x81 = 129
}


The assembly code generated for this is:

foo_u(unsigned int):
mov eax, edi
mov edx, edi
or  edx, -128
and eax, 127
and edi, -128
cmovne  eax, edx
ret
foo_s(int):
mov eax, edi
ret
bar_s(int):
mov eax, edi
mov edx, edi
or  edx, -128
and eax, 127
and edi, 127
cmovne  eax, edx
ret
test_u_1():
mov eax, 1
ret
test_u_2():
mov eax, 129
ret
test_s_1():
mov eax, 1
ret
test_s_2():
mov eax, 1
ret



When the code uses "int" rather than "unsigned int", in "foo_s", the compiler
thinks the function can be optimised to a simple byte extraction.  I cannot see
how that interpretation is valid.  And gcc cannot see it either, if there is an
intermediate variable (in "bar_s").  (The type of the intermediate "x" in
"bar_s" does not matter - "int", "unsigned int", "auto", "__auto_type", const
or not).

This effect is happening in the front-end.  It is independent of optimisation
levels (-O gives the same results), the target processor (I tried a half-dozen
targets), the compiler version (from gcc 4.4 upwards - gcc 4.1 gives the same
code for foo_s as foo_u.  I haven't tried gcc 4.2 or 4.3).  There are minor
variations in the generated code (such as the use of conditional move
instructions) between different versions - those don't affect the results.  The
"-fwrapv" flag has no effect either, nor does the choice of C or C++.

The results from the compiler evaluations in the "test_" functions show that
this happens in the compiler analysis - it is not a code generation issue.

(<https://godbolt.org> is great for this kind of testing!)

(This bug report was first sent to the mailing list
<https://gcc.gnu.org/ml/gcc/2018-09/msg00028.html> - the report here has a few
minor corrections.)

[Bug c/85709] New: Consistent variable, type and function attributes across ports

2018-05-09 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85709

Bug ID: 85709
   Summary: Consistent variable, type and function attributes
across ports
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

gcc supports many different targets.  For some of these, there can be special
attributes (variable, type, function) for particular features.  This is
important to programming low-level and bare-metal code.  Often these are for
particular features of the particular architecture, and that's fine.

However, there can also be more general attributes that might be of interest or
use across a range of ports.  There are also cases where different ports have
attributes for the same feature, but are inconsistent about naming or other
details.

Would it be possible to tidy this up a little, giving more features and greater
consistency for people who work with multiple gcc targets?

Examples of attributes that could be generalised and used on many ports :

"naked" function attribute, for SPU, RISC-V, MSP430, AVR, and a few others.

"critical" function attribute for MSP430, which disables global interrupts
during the function.  (Clearly this is not appropriate for all targets - but it
would be useful on many more devices than just the MSP430.)

"noinit" variable attribute for MSP430 that puts the variable in a different
section (not .bss) which is not cleared at startup.  This can improve startup
times when you have large static arrays, and lets you keep data across resets. 
It would be useful on many microcontrollers and small systems.  
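
Roughly, the usage being described is as follows (the "noinit" spelling is the
MSP430 port's; the section-based equivalent assumes a ".noinit" section in the
linker script, so both names here are illustrative):

__attribute__((noinit)) unsigned char log_buffer[4096];                /* MSP430 today */

__attribute__((section(".noinit"))) unsigned char trace_buffer[4096];  /* other ports */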


Examples of inconsistent attributes :

"interrupt" function attributes handled in different ways in different ports,
with different naming and details.

Function attributes to explicitly save all volatile registers (useful for
functions called from assembly or interrupt handlers), or to say that no
registers need to be saved (useful for RTOS thread task functions) - some ports
have these, with different names, but they are inconsistent and could be useful
on many targets.

[Bug c/85678] New: -fno-common should be default

2018-05-07 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85678

Bug ID: 85678
   Summary: -fno-common should be default
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

The use of a "common" section to match up tentative object definitions in
different translation units in a program is contrary to the C standards
(section 6.9p5 in C99 and C11, section 6.7 in C90), gives a high risk of error,
may lead to poorer quality code, and provides no benefits except backwards
compatibility with ancient code.

Unix C compilers have traditionally allowed "int x;" in file a.c to be
allocated by the linker to the same storage as "int x;" in file b.c when these
two C files are linked together.  gcc supports this if "-fcommon" is enabled,
which is the default on most targets.  But it is contrary to the C (and C++)
standards that say there shall be exactly /one/ definition of each object.  And
it is very easy to have errors in code by accidentally having duplicate names
in different files (if the programmer has not used "static"), perhaps even of
different types.  With "-fno-common", such errors are caught at link time.

Also, data in common sections will hinder some types of optimisation, such as
"-fsection-anchors".

Surely it is time to make "-fno-common" the default, at least when a modern C
standard is specified indicating that the code is modern?  People who need the
old behaviour can always get it with "-fcommon".
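
The classic example of the problem is an accidental name clash between two
translation units:

/* a.c */
int x;        /* tentative definition */

/* b.c */
int x;        /* another tentative definition - with -fcommon the linker
                 silently merges the two; with -fno-common the link fails
                 with a multiple-definition error, exposing the clash */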

[Bug c/85676] New: Obsolete function declarations should have warnings by default

2018-05-07 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85676

Bug ID: 85676
   Summary: Obsolete function declarations should have warnings by
default
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

There are a number of features of C that are still legal code even in C11, but
have been obsolete for many generations of C because they are unnecessary and
have a high risk of errors.  gcc has to be able to support old code, but it is
perfectly valid to give warnings on such old code.

A prime example is non-prototype function declarations, such as "void foo()". 
This has been obsolete since C89, almost 30 years ago, yet if you have:

// a.c
void foo();

gcc -std=c11 -Wall -Wextra -pedantic -c a.c

it compiles without warning.  You need "-Wstrict-prototypes" to get a warning. 
Surely this warning ought to be part of -Wall, or enabled by default when a C11
or C99 standard is chosen?

The same applies to K&R style function definitions.
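
For reference, that means definitions of this form, which are still accepted
silently unless -Wold-style-definition (or similar) is given:

int add(a, b)
int a;
int b;
{
    return a + b;
}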

[Bug c/65892] gcc fails to implement N685 aliasing of union members

2018-04-21 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892

--- Comment #37 from David Brown  ---
(In reply to Martin Sebor from comment #35)
> Here are the proposed changes:
> 
> Pointer Provenance:
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2219.htm#proposed-technical-
> corrigendum
> 
> Trap Representations:
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2220.htm#proposed-technical-
> corrigendum
> 
> Unspecified Values:
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2221.htm#proposed-technical-
> corrigendum

I am a little unsure of the suggestions for unspecified values here.  Can I
give some examples, to see if my interpretation is correct?

Let's use a new gcc builtin "__builtin_unspecified()" that returns an
unspecified value of int type, with no possible traps (no gcc target has trap
representations for int types, AFAIK).

int x = __builtin_unspecified();
int y = __builtin_unspecified();

if (x == y) doThis();  // The compiler can skip doThis()
if (x != y) doThat();  // The compiler can skip doThat() too

if (x == y) doThis(); else doThat();  
// The compiler can choose to doThis() or doThat(),
// but must do one or the other

if (x == x) doThis();  // The compiler must doThis()
if (x != x) doThat();  // The compiler cannot doThat()

if (x == 3) doThis();  // The compiler can choose to doThis()
// if the compiler does choose to doThis() then it fixes the value of x as 3

if (x & 0x01) doThis(); else doThat();
// The compiler can choose to doThis() or doThat(),
// but that choice fixes the LSB of x

This could allow for a range of possible optimisations, especially if there is
a nice way to make unspecified values like __builtin_unspecified(). 
(Unspecified values of other types could be made by casts.)  For example:

struct opt_int { bool valid; int value; };
struct opt_int safe_sqrt(struct opt_int x) {
struct opt_int y;
if (!x.valid || x.value < 0) {
y.valid = false;
y.value = __builtin_unspecified();
} else {
y.valid = true;
y.value = unsafe_sqrt(x.value);
}
return y;
 }

This kind of structure would mean minimal effort when you only need part of a
struct to contain specified values.

[Bug c/65892] gcc fails to implement N685 aliasing of union members

2018-04-21 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892

--- Comment #36 from David Brown  ---
(In reply to Martin Sebor from comment #34)

> I think in the use case below:
> 
>struct { int i; char buf[4]; } s, r;
>*(float *)s.buf = 1.;
>r = s;
> 
> the aggregate copy has to be viewed as a recursive copy of each of its
> members and copying buf[4] must be viewed as a memcpy,  Char is definitely
> special (it can accesses anything with impunity, even indeterminate values).
> That said, I don't think the rules allow char arrays to be treated as
> allocated storage so while the store to s.buf via float* may be valid it
> doesn't change the effective type of s.buf and so the only way to read the
> float value stored in it is to copy it byte-by-byte (i.e., copy the float
> representation) to an object whose effective type is float.  Some of the
> papers that deal with the effective type rules might touch on this (e.g., DR
> 236, Clark's N1520

In bare metal embedded development, it is common to have to have a way to treat
static declared storage (like a char[] array) as a pool for dynamic storage. 
Often you don't want to use standard library malloc() because of requirements
on deterministic timing, etc.  What you are saying here is that this is not
possible - meaning there is no way to write such a malloc replacement in normal C
code.  (It is possible, I think, to use gcc extensions such as the "may_alias"
type attribute and the "malloc" function attribute.  And -fno-strict-aliasing is
always a safe resort.)  It would be /very/ nice if there were a way to declare
statically allocated pools of memory that could be doled out by user-made
functions and - like malloc'ed memory - take their effective type when used.

It would be even better if there were a standard way to say that the initial
value of such memory is "unspecified".  The compiler and linker could give such
memory a static allocation (essential for small embedded systems with limited
memory, so that you can be sure of your memory usage) but there would be no
need for zeroing the memory at startup.
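
As a sketch of the kind of pool meant here (names and sizes are made up; the
"malloc" attribute is the gcc extension mentioned above, and the effective-type
question discussed in this bug is exactly what makes the plain-C version
questionable):

#include <stddef.h>

static unsigned char pool[4096];
static size_t pool_used;

__attribute__((malloc))
void *pool_alloc(size_t n)
{
    n = (n + 7u) & ~(size_t)7u;       /* keep 8-byte alignment */
    if (pool_used + n > sizeof pool)
        return NULL;                  /* pool exhausted */
    void *p = &pool[pool_used];
    pool_used += n;
    return p;
}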

[Bug c/84646] New: Missed optimisation for hoisting conditions outside nested loops

2018-03-01 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84646

Bug ID: 84646
   Summary: Missed optimisation for hoisting conditions outside
nested loops
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

This is a missed optimisation opportunity.  In a discussion about the "best"
way to break out of a nested loop, I tested this code with gcc:

int foo(const int * p, const int * q, int m, int n) {
int sum = 0;
bool running = true;
const int max = 2;
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
if (running) {
sum += (p[i] * q[j]);
if (sum >= max) {
running = false;
sum = max;
}
}
}
}
return sum;
}


The test for "running" is hoisted outside the inner loop, so that the generated
code is changed to approximately:

int foo(const int * p, const int * q, int m, int n) {
int sum = 0;
bool running = true;
const int max = 2;
for (int i = 0; i < n; i++) {
loop:
if (running) {
for (int j = 0; j < m; j++) {
sum += (p[i] * q[j]);
if (sum >= max) {
running = false;
sum = max;
goto loop;
}
}
}
}
return sum;
}

This is definitely a good step - avoiding the check for "running" in the inner
loop, and breaking out of it when the max condition is reached is a clear win. 
But it would be even nicer if the check could be hoisted further - the
"running" flag could be completely eliminated to give the transformation:

int foo(const int * p, const int * q, int m, int n) {
int sum = 0;
//bool running = true;   // eliminated entirely
const int max = 2;
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
sum += (p[i] * q[j]);
if (sum >= max) {
//running = false;
sum = max;
goto exit;
}
}
}
exit:
return sum;
}


Testing was done with -O2 and -O3, on a variety of gcc versions and targets
(thanks, godbolt.org!) up to version 8.0 (trunk at this time).  The generated
code showed approximately the same transformations and optimisations.

[Bug c/70952] Missing warning for likely-erroneous octal escapes in string literals

2018-01-29 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70952

David Brown  changed:

   What|Removed |Added

 CC||david at westcontrol dot com

--- Comment #6 from David Brown  ---
(In reply to Eric Gallager from comment #4)
> Actually dup of bug 60523
> 
> *** This bug has been marked as a duplicate of bug 60523 ***

Bug 60523 is a feature request for a warning on code that is valid but
suspicious (using octal constants other than 0), while this is a bug in the
diagnosis of invalid code (octal constants with digits 8 or 9 in them).

This one is a bug that should be fixed - 60523 is a feature request for a new
warning.

[Bug c/82845] -ftree-loop-distribute-patterns creates recursive loops on function called "memset"

2017-11-05 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82845

David Brown  changed:

   What|Removed |Added

 Resolution|DUPLICATE   |FIXED

--- Comment #2 from David Brown  ---
(In reply to Marc Glisse from comment #1)
> Richard's patch seems to have been forgotten :-(
> 
> *** This bug has been marked as a duplicate of bug 56888 ***

Sorry for the noise.  (Unless it encourages the fix from bug 56888 to be
implemented!)

[Bug c/82845] New: -ftree-loop-distribute-patterns creates recursive loops on function called "memset"

2017-11-05 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82845

Bug ID: 82845
   Summary: -ftree-loop-distribute-patterns creates recursive
loops on function called "memset"
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

(I have tested this using several gcc versions, including x86_64 and arm ports,
up to an 8.0.0 trunk test version.)

The -ftree-loop-distribute-patterns optimisation spots constructs that look
like memset, memcpy, etc., calls - and generates library calls to these
functions to give fast code using the library routines rather than generating
large amounts of inline code (so that the set or copy operations can be done in
lumps larger than one byte at a time).

But if you have your own implementation of one of these functions (as may
be done for tracing calls, or in some embedded systems - or simply because you
are writing a C library), the call does not go to the library version - it will
go to the new local definition of the function.  For example, given this simple
memset implementation:

void *memset(void *s, int c, size_t n)
{
char *p = s;
while (n--) *p++ = c;
return s;
}

gcc on x86_64 with -O2 gives simple, solid code:

memset:
testq   %rdx, %rdx
movq    %rdi, %rax
je      .L6
addq    %rdi, %rdx
movq    %rdi, %rcx
.L3:
addq    $1, %rcx
movb    %sil, -1(%rcx)
cmpq    %rdx, %rcx
jne     .L3
.L6:
rep ret


With -O3, which enables -ftree-loop-distribute-patterns, it gives:

memset:
testq   %rdx, %rdx
je      .L6
subq    $8, %rsp
movsbl  %sil, %esi
call    memset
addq    $8, %rsp
ret
.L6:
movq    %rdi, %rax
ret

gcc spots that the loop is like an memset, and replaces it with a call to
memset.  But now that leads to infinite recursion.
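
The usual workarounds (not part of this report) are to build such files with
-ffreestanding or -fno-tree-loop-distribute-patterns, or to pin the option on
the function itself:

#include <stddef.h>

/* Disable the loop-to-memset pattern replacement for just this function. */
__attribute__((optimize("no-tree-loop-distribute-patterns")))
void *memset(void *s, int c, size_t n)
{
    char *p = s;
    while (n--) *p++ = c;
    return s;
}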

[Bug rtl-optimization/82602] IRA considers volatile asm to be moveable

2017-10-19 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602

--- Comment #16 from David Brown  ---
(In reply to Xi Ruoyao from comment #8)
> (In reply to David Brown from comment #7)
> 
> > There is no intention to make "asm volatile" a barrier, as you get with a
> > memory clobber.  The intention is to stop it moving past other volatile code
> > (such as other asm volatiles, and volatile memory accesses).  An "asm
> > volatile" statement should still be moveable across other "normal" code.
> 
> I see... But then your code in the maillist
> 

Yes, of course if the compiler knows the contents of foo() (from LTO, or from a
definition in the same file), then it can re-arrange any non-volatile
statements there.  That is a separate issue, and a known issue.  For most
purposes this can be handled just by using memory clobbers or appropriate use
of volatile variables.  For more advanced uses, fake dependencies in asm
statements can also be used.  This is all known territory.

In order to get correct code, you need control of particular memory accesses
within a critical region like this - execution and calculations don't matter,
nor do local variables.  That's why memory clobbers are nice - they may lead to
some sub-optimal code, but they are an easy way to get it correct.

For the best possible code, you may want control of calculations and execution
as well - in particular, you want to minimise the processor time within the
critical region.  That is a lot harder, and there is currently no way to
specify things fully (though with careful use of asm dependencies, you can get
close).  But that is another issue entirely, and it is a matter of code
efficiency rather than code correctness.

However, correctness here depends on the compiler honouring the ordering of
volatile asm statements with respect to other volatile code, and this patch
fixes that.

[Bug rtl-optimization/82602] IRA considers volatile asm to be moveable

2017-10-18 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602

--- Comment #7 from David Brown  ---
(In reply to Xi Ruoyao from comment #3)
There is no intention to make "asm volatile" a barrier, as you get with a
memory clobber.  The intention is to stop it moving past other volatile code
(such as other asm volatiles, and volatile memory accesses).  An "asm volatile"
statement should still be moveable across other "normal" code.

(In reply to Xi Ruoyao from comment #4)
As for the comment in the kernel code, the gcc documentation says that an "asm"
statement with no output is automatically considered as though it were "asm
volatile".  So it should not be necessary to write "volatile" in the memory
barrier here, as the compiler should treat it that way anyway.  If there is a
compiler bug there, it should be fixed - or the documentation could be changed.
 There is certainly no harm in having the "volatile" explicit in the barrier()
definition.
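
For reference, the barrier being discussed is essentially the usual gcc
memory-clobber idiom:

#define barrier() __asm__ __volatile__("" : : : "memory")

It has no outputs, so by the documented rule it should be treated as volatile
even without the explicit keyword.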

[Bug rtl-optimization/82602] IRA considers volatile asm to be moveable

2017-10-18 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602

David Brown  changed:

   What|Removed |Added

 CC||david at westcontrol dot com

--- Comment #2 from David Brown  ---
For reference, the issue is also discussed here:

https://bugs.launchpad.net/gcc-arm-embedded/+bug/1722849

For this particular case, there are usable workarounds (with extra dependencies
or memory clobbers).  But "asm volatile" statements should not be moveable
across other asm volatile statements, volatile memory accesses, or other
observable behaviour.  (It is fine that they can be moved across other code,
just like other volatile accesses.)

[Bug c/65892] gcc fails to implement N685 aliasing of union members

2017-09-27 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892

David Brown  changed:

   What|Removed |Added

 CC||david at westcontrol dot com

--- Comment #26 from David Brown  ---
(In reply to rguent...@suse.de from comment #24)
> On Wed, 2 Nov 2016, txr at alumni dot caltech.edu wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892
> > 
> > --- Comment #22 from Tim Rentsch  ---
> > [responding to comments from rguent...@suse.de in Comment 20]
> > 
> > > GCC already implements this if you specify -fno-strict-aliasing.
> > 
> > The main point of my comments is that the ISO C standard requires
> > the behavior in this case (and similar cases) be defined and not
> > subject to any reordering.  In other words the result must be the
> > same as an unoptimized version.  If a -fstrict-aliasing gcc /does/
> > transform the code so that the behavior is not the same as an
> > unoptimized version, then gcc is not a conforming implementation.
> 
> GCC has various optimization options that make it a not strictly
> conforming implementation (-ffast-math for example), various
> GNU extensions to the language, etc.
> 
> > Or is it your position that gcc is conforming only when operated
> > in the -fno-strict-aliasing mode?  That position seems contrary to
> > the documented description of the -fstrict-aliasing option.
> 
> Well, N685 is still disputed in this bug.  I was just pointing out
> that GCC has a switch to make it conforming to your interpretation
> of the standard (and this switch is the default at -O0 and -O1).

A key difference with non-conformance options like -ffast-math is that these
are not default options.  A user must actively choose to use them.  A user
should not need particular options in order to get correct object code from
their correct source code - or at least the user should get obvious error
messages when using default options but where their source code hits an oddity
in gcc (as they would get if they happened to use a gcc extension keyword like
"asm" as an identifier in conforming C code).  What should not happen is for
the compiler to silently break good code unless the user has given specific
flags.

I am not sure whether this particular case really is a bug or not.  However, I
wonder if there has been too much emphasis on trying to understand exactly what
the standards say.  If the gcc developers here, who are amongst the most
knowledgeable C and C++ experts around, have trouble with the details - then
consider the position of the average C developer.  Maybe it is better to try to
see it from their viewpoint - would a programmer expect these accesses to alias
or not?  If it is likely that programmers would expect aliasing here, and see
that behaviour in other compilers, then the /useful/ default behaviour for gcc
would be to treat code in the way programmers expect - even with -O3.  Then
have a "-fI-know-what-I-am-doing" flag for those that want to squeeze out the
last bit of performance.

[Bug c/80872] New: There is no warning on accidental infinite loops

2017-05-24 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80872

Bug ID: 80872
   Summary: There is no warning on accidental infinite loops
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

Would it be possible to add warnings on accidental infinite loops, such as:

void foo(void) {
  for (int i = 0; i <= 0x7fffffff; i++) {
 // ...
  }
}

The compiler (correctly) translates this to an infinite loop, in all versions
of gcc that I tested, and in both C and C++, with optimisation enabled.  But no
warning is given, even with -Wall -Wextra.  That includes the -Wtype-limits
warning, which I thought should trigger here.  Perhaps the order of passes is
such that the code is simplified to an infinite loop before the type limits
checking is done?

Replacing "<= 0x7fffffff" with "< 0x80000000" triggers -Wsign-compare but not
-Wtype-limits, which is relevant because in C -Wsign-compare is in -Wextra but
not -Wall.  Exceeding the limits of unsigned int in the literal does correctly
trigger -Wtype-limits.
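Written out, that variant looks like this (a sketch, assuming 32-bit int):

void foo2(void) {
  for (int i = 0; i < 0x80000000; i++) {
     // The literal does not fit in an int, so it is unsigned: -Wsign-compare
     // fires (with -Wextra), but -Wtype-limits stays silent.
  }
}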

[Bug preprocessor/79346] -Wundef should not warn for standard macros

2017-02-03 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79346

--- Comment #1 from David Brown  ---
Some more white-list suggestions for C++:

__STDCPP_DEFAULT_NEW_ALIGNMENT__
__STDCPP_STRICT_POINTER_SAFETY__
__STDCPP_THREADS__


C++14 also says "Whether __STDC__ is predefined and if so, what its value is,
are implementation-defined", which makes it a candidate for the white-list.
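For reference, the kind of use that currently warns (a minimal sketch):

/* Testing an optional standard macro - warns under -Wundef on
   implementations that do not predefine __STDCPP_THREADS__. */
#if __STDCPP_THREADS__
#define HAVE_THREADS 1
#else
#define HAVE_THREADS 0
#endif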

[Bug c/60523] New: Warning flag for octal literals

2014-03-14 Thread david at westcontrol dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60523

Bug ID: 60523
   Summary: Warning flag for octal literals
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com

It would be nice if there were a warning flag that triggered on octal literals. 

Octal literals are rarely used in modern C and C++ code, but can easily be
introduced by mistake - "int x = 050;" appears at first reading to set x to 50,
but the leading 0 means the literal is interpreted as octal, so x is actually
set to 40 (decimal).
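A typical way this slips in is when literals are zero-padded for visual
alignment (a small sketch):

static const int limits[] = {
    100,
    050,    /* looks like fifty, but is octal - the value is 40 */
    007,    /* value 7, harmless here, but still an octal literal */
};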

As octal literals are seldom useful, and often confusing or accidental, they
are banned by coding standards like MISRA.  But as they are part of the
language defined by the C and C++ standards, and are used in some existing
code, they obviously cannot be removed.  As a compromise, I would like to
propose a warning flag, -Woctal, which would produce a diagnostic message
whenever a literal is interpreted as an octal number.  Preferably, this would
be included in -Wextra (it probably should not be in -Wall, since octal
literals are valid C and C++).


[Bug c/60523] Warning flag for octal literals

2014-03-14 Thread david at westcontrol dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60523

--- Comment #5 from David Brown david at westcontrol dot com ---
I agree that warnings to match something like the MISRA coding standards would
be best done as a plugin.

But I believe that in this case, warning on octal literals would be quite a
small addition to the main gcc code.  I am not familiar enough with the gcc
source code to implement it myself (like many people, I blame lack of time), so
I registered this enhancement request in the hope that someone who /is/
familiar with this part of gcc would be able to make the change quickly and
easily.


[Bug rtl-optimization/34791] [avr] optimisation of 8-bit logic sometimes fails

2010-02-16 Thread david at westcontrol dot com


--- Comment #1 from david at westcontrol dot com  2010-02-16 10:46 ---
There are many other cases where 8-bit optimisation is missing - C's integer
promotion gets in the way.  This is particularly common when dealing with a
compile-time constant - there is no way in C to say that 0x3f is 8-bit rather
than a 16-bit int.

Another example of code with this problem is:

void foo(void) {
static unsigned char count;

if (++count & 0x3f) {
PORTC &= ~0x01;
} else {
PORTC |= 0x01;
}
}

Both the & and the comparison with zero are done as 16-bit.


One work-around is to use this macro:

#define const_byte(x) ({ static const __attribute__((__progmem__)) \
 unsigned char v = x; v; })

Then we can write:

uint8_t bar3(uint8_t x, uint8_t y) {
return data[y ^ (x & const_byte(0x0f))];
} 

147 bar3:
 148/* prologue: function */
 149/* frame size = 0 */
 150 008c 8F70  andi r24,lo8(15) ;  tmp45,
 151 008e 8627  eor r24,r22  ;  tmp45, y
 152 0090 E0E0  ldi r30,lo8(data);  tmp48,
 153 0092 F0E0  ldi r31,hi8(data);  tmp48,
 154 0094 E80F  add r30,r24  ;  tmp48, tmp45
 155 0096 F11D  adc r31,__zero_reg__ ;  tmp48
 156 0098 8081  ld r24,Z ; , data
 157/* epilogue start */
 158 009a 0895  ret
 160 


As far as I can see, this generated code is optimal.

The macro works because it forces the value to be 8-bit, rather than a 16-bit
compile-time constant.  However, the compiler is still smart enough to see that
since it is a const with a known value, its value can be used directly.  As a
side effect, the static variable must be created somewhere - by using
__progmem__, we create it in flash rather than wasting RAM.  Even that waste
could be avoided by garbage-collection linking, or by using a dedicated segment
rather than .progmem.data.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34791



[Bug c/34789] New: Missed optimisation on avr - sometimes the compiler keeps addresses in registers unnecessarily

2008-01-15 Thread david at westcontrol dot com
On RISC cpus with many registers, it is often helpful to keep constants (such
as addresses) in registers for later use, since loading a register with a
constant takes longer than moving data between registers, and since
register-constant operations are seldom available.  On the AVR, however,
register-constant operations are just as small and fast as register-register
operations (excluding the movw operation), so it is often better to use the
constants directly - it saves a few instructions (time and code space), and
avoids tying up a register pair unnecessarily.

Example:

extern uint16_t data[64];
extern uint16_t foo(uint16_t x);
extern uint16_t a;

uint16_t bar(void) {
uint16_t x;
x = foo(data[a]);
return foo(data[x]);
}

In this case, the compiler caches the address of data in r16:r17 :

  59bar:
  60/* prologue: frame size=0 */
  61 001a 0F93  push r16
  62 001c 1F93  push r17
  63/* prologue end (size=2) */
  64 001e E091  lds r30,a;  a, a
  65 0022 F091  lds r31,(a)+1;  a, a
  66 0026 00E0  ldi r16,lo8(data);  tmp45,
  67 0028 10E0  ldi r17,hi8(data);  tmp45,
  68 002a EE0F  lsl r30  ;  a
  69 002c FF1F  rol r31  ;  a
  70 002e E00F  add r30,r16  ;  a, tmp45
  71 0030 F11F  adc r31,r17  ;  a, tmp45
  72 0032 8081  ld r24,Z ; , data
  73 0034 9181  ldd r25,Z+1  ; , data
  74 0036 0E94  call foo ; 
  75 003a 880F  lsl r24  ;  tmp50
  76 003c 991F  rol r25  ;  tmp50
  77 003e 080F  add r16,r24  ;  tmp45, tmp50
  78 0040 191F  adc r17,r25  ;  tmp45, tmp50
  79 0042 F801  movw r30,r16 ; , tmp45
  80 0044 8081  ld r24,Z ; , data
  81 0046 9181  ldd r25,Z+1  ; , data
  82 0048 0E94  call foo ; 
  83/* epilogue: frame size=0 */
  84 004c 1F91  pop r17
  85 004e 0F91  pop r16
  86 0050 0895  ret

Better code could be generated by using data directly:

; Prologue avoided - saved 2 words code
lds r30, a
lds r31, (a) + 1
; Load of r16:r17 avoided - saved 2 words code
lsl r30 ; a
rol r31 ; a
subi r30, lo8(-(data))
sbci r31, hi8(-(data))
ld r24, Z
ldd r25, Z+1
call foo
lsl r24 ; x
rol r25 ; x
; Using constant for data is just as small as register
subi r24, lo8(-(data))
sbci r25, hi8(-(data))
movw r30, r24
ld r24, Z
ldd r25, Z+1
call foo
; Epilogue avoided - saved 2 words code
ret

This saves 6 code words, and corresponding time.


-- 
   Summary: Missed optimisation on avr - sometimes the compiler
keeps addresses in registers unnecessarily
   Product: gcc
   Version: 4.2.2
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: david at westcontrol dot com
GCC target triplet: avrgcc


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34789



[Bug c/34791] New: Missed optimisation on avr - optimisation of 8-bit logic sometimes fails

2008-01-15 Thread david at westcontrol dot com
As the avr is an 8-bit processor, it is important to handle data as 8-bit when
possible, even though the C standards force 8-bit data to be promoted to 16-bit
ints in many situations.  Common cases are when doing logic operations on chars
- when legally valid (i.e., the results are the same as for 16-bit operations),
the generated code should use 8-bit operations.  In particular, logic
operations using constants such as 0 and 0xff should be optimised.

Optimisation works well in many cases, but fails when the expressions get
slightly more complex:

extern uint8_t data[64];

uint8_t bar(uint8_t x, uint8_t y) {
return data[y ^ (x & 0x0f)];
}

uint8_t bar2(uint8_t x, uint8_t y) {
return data[(y ^ x) & 0x0f];
}

  40bar:
  41/* prologue: frame size=0 */
  42/* prologue end (size=0) */
  43  E82F  mov r30,r24  ;  x, x
  44 0002 F0E0  ldi r31,lo8(0)   ;  x,
  45 0004 EF70  andi r30,lo8(15) ;  x,
  46 0006 F070  andi r31,hi8(15) ;  x,
  47 0008 70E0  ldi r23,lo8(0)   ;  y,
  48 000a E627  eor r30,r22  ;  x, y
  49 000c F727  eor r31,r23  ;  x, y
  50 000e E050  subi r30,lo8(-(data));  x,
  51 0010 F040  sbci r31,hi8(-(data));  x,
  52 0012 8081  ld r24,Z ;  tmp51, data
  53 0014 90E0  ldi r25,lo8(0)   ;  result,
  54/* epilogue: frame size=0 */
  55 0016 0895  ret
  56/* epilogue end (size=1) */
  57/* function bar size 12 (11) */

  61bar2:
  62/* prologue: frame size=0 */
  63/* prologue end (size=0) */
  64 0018 E62F  mov r30,r22  ;  y, y
  65 001a E827  eor r30,r24  ;  y, x
  66 001c F0E0  ldi r31,lo8(0)   ; ,
  67 001e EF70  andi r30,lo8(15) ;  tmp46,
  68 0020 F070  andi r31,hi8(15) ;  tmp46,
  69 0022 E050  subi r30,lo8(-(data));  tmp46,
  70 0024 F040  sbci r31,hi8(-(data));  tmp46,
  71 0026 8081  ld r24,Z ;  tmp50, data
  72 0028 90E0  ldi r25,lo8(0)   ;  result,
  73/* epilogue: frame size=0 */
  74 002a 0895  ret
  75/* epilogue end (size=1) */
  76/* function bar2 size 10 (9) */

The first function, bar, has several clear missed optimisations - it does all
the logic operations as 16-bit.  In the second case, the eor is nicely handled
as 8-bit, but the & 0x0f is done as 16-bit - there is an unnecessary
"andi r31, hi8(15)" instruction.


-- 
   Summary: Missed optimisation on avr - optimisation of 8-bit logic
sometimes fails
   Product: gcc
   Version: 4.2.2
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: david at westcontrol dot com
GCC target triplet: avrgcc


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34791



[Bug c/34790] New: Missed optimisation on avr - no sibling call optimisation

2008-01-15 Thread david at westcontrol dot com



-- 
   Summary: Missed optimisation on avr - no sibling call
optimisation
   Product: gcc
   Version: 4.2.2
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: david at westcontrol dot com
GCC target triplet: avrgcc


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34790



[Bug c/34790] Missed optimisation on avr - no sibling call optimisation

2008-01-15 Thread david at westcontrol dot com


--- Comment #1 from david at westcontrol dot com  2008-01-15 08:34 ---
(Sorry - I submitted the bug before entering the description!)

Sibling calls are not optimised on avrgcc, even with -foptimize-sibling-calls
enabled (such as for -Os):

extern void foo(void);
void bar(void) {
foo();
}

  40bar:
  41/* prologue: frame size=0 */
  42/* prologue end (size=0) */
  43  0E94  call foo ; 
  44/* epilogue: frame size=0 */
  45 0004 0895  ret
  46/* epilogue end (size=1) */
  47/* function bar size 3 (2) */

Optimising the "call foo; ret" sequence to "jmp foo" would reduce the code size
by 1 word, the stack usage by 2 bytes (on most AVRs), and save 5 cycles.
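For comparison, the expected output would be along these lines (a sketch, not
actual compiler output):

bar:
/* prologue: frame size=0 */
  jmp foo   ; tail call - no return address pushed, nothing to pop
/* no epilogue needed */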


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34790



[Bug c++/34792] New: Missed optimisation on avr - c++ worse than c compiler at 8-bit optimisations

2008-01-15 Thread david at westcontrol dot com
As noted in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34791 , it is important
on the avr to use 8-bit logic whenever possible.  In some cases, the c++
compiler will produce worse code than the c compiler for exactly the same
source:

extern uint8_t data[64];

uint8_t bar2(uint8_t x, uint8_t y) {
return data[(y ^ x) & 0x0f];
}

Compiled as C, this gives:

  61bar2:
  62/* prologue: frame size=0 */
  63/* prologue end (size=0) */
  64 0018 E62F  mov r30,r22  ;  y, y
  65 001a E827  eor r30,r24  ;  y, x
  66 001c F0E0  ldi r31,lo8(0)   ; ,
  67 001e EF70  andi r30,lo8(15) ;  tmp46,
  68 0020 F070  andi r31,hi8(15) ;  tmp46,
  69 0022 E050  subi r30,lo8(-(data));  tmp46,
  70 0024 F040  sbci r31,hi8(-(data));  tmp46,
  71 0026 8081  ld r24,Z ;  tmp50, data
  72 0028 90E0  ldi r25,lo8(0)   ;  result,
  73/* epilogue: frame size=0 */
  74 002a 0895  ret
  75/* epilogue end (size=1) */
  76/* function bar2 size 10 (9) */

Here there is only one extra instruction, the "andi r31, hi8(15)".  But
compiling as C++, with the same -Os option, gives:


  61_Z4bar2hh:
  62/* prologue: frame size=0 */
  63/* prologue end (size=0) */
  64 0018 E62F  mov r30,r22  ;  y, y
  65 001a F0E0  ldi r31,lo8(0)   ;  y,
  66 001c 90E0  ldi r25,lo8(0)   ;  x,
  67 001e E827  eor r30,r24  ;  y, x
  68 0020 F927  eor r31,r25  ;  y, x
  69 0022 EF70  andi r30,lo8(15) ;  y,
  70 0024 F070  andi r31,hi8(15) ;  y,
  71 0026 E050  subi r30,lo8(-(data));  y,
  72 0028 F040  sbci r31,hi8(-(data));  y,
  73 002a 8081  ld r24,Z ;  tmp51, data
  74 002c 90E0  ldi r25,lo8(0)   ;  result,
  75/* epilogue: frame size=0 */
  76 002e 0895  ret
  77/* epilogue end (size=1) */
  78/* function uint8_t bar2(uint8_t, uint8_t) size 12 (11)
*/

This time, the compiler has clearly missed several opportunities to use 8-bit
logic instead of 16-bit logic.


-- 
   Summary: Missed optimisation on avr - c++ worse than c compiler
at 8-bit optimisations
   Product: gcc
   Version: 4.2.2
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: david at westcontrol dot com
GCC target triplet: avrgcc


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34792