[Bug middle-end/93487] Missed tail-call optimizations

2024-05-26 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93487

--- Comment #5 from Petr Skocik  ---
Another case of a missed tailcall which might warrant a separate mention:

struct big{ long _[10]; };
void takePtr(void *);
void takeBigAndPassItsAddress(struct big X){ takePtr(&X); }

This should ideally compile to just `lea 8(%rsp), %rdi; jmp takePtr;`.

The compiler might be tempted to treat taking the address of a local as a
reason not to tail-call (clang misses this optimization too, probably for that
reason), but tail-calling here is fine: this particular local isn't allocated
by the function itself but rather by the caller as part of making the call.

Icc does do this optimization: https://godbolt.org/z/a6coTzPjz

[Bug c/90181] Feature request: provide a way to explicitly select specific named registers in constraints

2024-04-17 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90181

Petr Skocik  changed:

   What|Removed |Added

 CC||pskocik at gmail dot com

--- Comment #16 from Petr Skocik  ---
The current way of loading stuff into regs that don't have a specific
constraint for them also breaks on gcc (but not on clang) if the variable is
marked const.
https://godbolt.org/z/1PvYsrqG9

[Bug middle-end/112844] Branches under -Os (unlike -O{1, 2, 3}) do not respect __builtin_expect hints

2024-03-30 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112844

--- Comment #2 from Petr Skocik  ---
(In reply to Jakub Jelinek from comment #1)
> With -Os you ask the code to be small.  So, while internally the hint is
> still present in edge probabilities, -Os is considered more important and
> certain code changes based on the probabilities aren't done if they are
> known or expected to result in larger code.

Thanks. I very much like the codegen I get with gcc -Os, often better than what
I get with clang. But the sometimes counter-obvious branch layout at -Os is
annoying to me, especially considering I've measured it a couple of times as
being the source of a slowdown.
Sure, you can save a (more often than not 2-byte) jump by conditionally jumping
over an unlikely branch instead of conditionally jumping to an unlikely branch
placed after the ret and having it jump back into the function body (the latter
is what all the other compilers do at -Os), but I'd rather have the code spend
the extra two bytes and have my happy paths be fall-through, as they should be.

[Bug target/114097] Missed register optimization in _Noreturn functions

2024-02-25 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097

--- Comment #4 from Petr Skocik  ---
Excellent! Thank you very much. Didn't realize the functionality was already
there; it just didn't work without an explicit __attribute((noreturn)). Now I
can get rid of my most complex assembly function, which I stupidly (back then I
thought cleverly) wrote. :)

[Bug rtl-optimization/10837] noreturn attribute causes no sibling calling optimization

2024-02-25 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837

Petr Skocik  changed:

   What|Removed |Added

 CC||pskocik at gmail dot com

--- Comment #19 from Petr Skocik  ---
(In reply to Xi Ruoyao from comment #16)

> In practice most _Noreturn functions are abort, exit, ..., i.e. they are
> only executed one time so optimizing against a cold path does not help much.
> I don't think it's a good idea to encourage people to construct some fancy
> code by a recursive _Noreturn function (why not just use a loop?!)  And if
> you must write such fancy code anyway IMO musttail attribute (PR83324) will
> be a better solution.

There's also longjmp, which may not be all that super cold and may be executed
multiple times. And while, yeah, nobody will notice a single call-vs-jmp time
save against a process spawn/exit, for a longjmp wrapper it'll make it a few %
faster (as would utilizing _Noreturn attributes for better register allocation:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097, which would also save a
bit of codesize). Tailcalls can also save a bit of codesize if the target
is near.

[Bug c/114097] New: Missed register optimization in _Noreturn functions

2024-02-25 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097

Bug ID: 114097
   Summary: Missed register optimization in _Noreturn functions
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

Consider a never-returning function such as this:

#include <setjmp.h>
#include <stdio.h>
//_Noreturn
void noret(unsigned A, unsigned B, unsigned C, unsigned D, unsigned E, jmp_buf Jb){

for(;A--;) puts("A");
for(;B--;) puts("B");
for(;C--;) puts("C");
for(;D--;) puts("D");
for(;E--;) puts("E");

longjmp(Jb,1);
}

https://godbolt.org/z/35YjrhjYq

In its prologue, gcc saves the arguments in call-preserved registers to
preserve them around the puts calls, and it does so the usual way: by (1)
pushing the old values of the call-preserved registers to the stack and (2)
actually moving the arguments into the call-preserved registers.

pushq   %r15
movq    %r9, %r15
pushq   %r14
movl    %edi, %r14d
pushq   %r13
movl    %esi, %r13d
pushq   %r12
movl    %edx, %r12d
pushq   %rbp
movl    %ecx, %ebp
pushq   %rbx
movl    %r8d, %ebx
pushq   %rax
//...

Since this function demonstrably never returns, step 1 can be entirely elided
as the old values of the call-preserved registers won't ever need to be
restored
(desirably, gcc does not generate the would-be-dead restoration code):


movq    %r9, %r15
movl    %edi, %r14d
movl    %esi, %r13d
movl    %edx, %r12d
movl    %ecx, %ebp
movl    %r8d, %ebx
pushq   %rax
//...

(Also desirable would be the currently unrealized tail-call optimization of the
longjmp call in this case: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837)

[Bug c/114011] New: Feature request: __goto__

2024-02-20 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114011

Bug ID: 114011
   Summary: Feature request: __goto__
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

Gcc has __volatile__.
I can only assume the rationale for it is so that inline asm macros can
do __asm __volatile__ and not have to worry about user-redefines of the
volatile keyword (which while not quite approved by the standard, is sometimes
practically useful).
While the __asm syntax also allows the goto keyword, there's currently no
__goto__ counterpart to __volatile__, which could similarly protect against
goto redefines.
Adding it is trivial and consistent with the already existing
volatile/__volatile__ pair. Would you consider it?

(
Why am I redefining goto? I'm basically doing it within the confines of a macro
framework to force a static context check on gotos to prevent gotos out of
scopes where doing it would be an error.
Something like:

enum { DISALLOW_GOTO_HERE = 0 }; //normally, goto is allowed
#define goto while(_Generic((int(*)[!DISALLOW_GOTO_HERE])0, int(*)[1]:1)) goto
//statically checked goto
int main(void){
goto next; next:; //OK, not disallowed in this context

#if 0 //would fail to compile
enum {DISALLOW_GOTO_HERE=1}; //disallowed in this context
goto next2; next2:;
#endif
}

While this redefine does not syntactically disturb C, it does disturb `__asm
goto()`, which I, unfortunately, have one very frequently used instance of, and
since there's no way to suppress an object macro redefine, I'd like to be able
to change it to `__asm __goto__` and have it peacefully coexist with the goto
redefine.
)

[Bug c/112844] New: Branches under -Os (unlike -O{1,2,3}) do not respect __builtin_expect hints

2023-12-04 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112844

Bug ID: 112844
   Summary: Branches under -Os (unlike -O{1,2,3}) do not respect
__builtin_expect hints
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

A simple example that demonstrates this is:

int test(void);
void yes(void);
void expect_yes(void){ if (__builtin_expect(test(),1)) yes(); else {} }
void expect_no(void){ if (__builtin_expect(test(),0)) yes(); else {} }

For optimized x86-64 output, one would expect:
   - a fall-through to a yes() tailcall for the expect_yes() case, preceded by
a conditional jump to code doing a plain return
   - a fall-through to a plain return for the expect_no() case, preceded by a
conditional jump to a yes() tailcall (or even more preferably: a conditional
tailcall to yes(), with the needed stack adjustment done once before the test
instead of being duplicated in each branch after the test)

Indeed, that's how gcc lays it out for -O{1,2,3}
(https://godbolt.org/z/rG3P3d6f7) as does clang at -O{1,2,3,s}
(https://godbolt.org/z/EcKbrn1b7) and icc at -O{1,2,3,s}
(https://godbolt.org/z/Err73eGsb).

But gcc at -Os seems to have a very strong preference for falling through to
call yes(), even in

void expect_no(void){ if (__builtin_expect(test(),0)) yes(); else {} }

and even in

void expect_no2(void){ if (__builtin_expect(!test(),1)){} else yes(); }

essentially completely disregarding any user attempts at controlling the branch
layout of the output.

[Bug ipa/106116] Missed optimization: in no_reorder-attributed functions, tail calls to the subsequent function could just be function-to-function fallthrough

2023-08-07 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106116

--- Comment #4 from Petr Skocik  ---
It would be interesting to do this at the assembler level, effectively turning
what's equivalent to `jmp 1f; 1:` into nothing. This would also be in line with
the GNU assembler's apparent philosophy that jmp is a high-level
variable-length instruction (either jmp or jmpq, whichever fits first => this
could become: nothing, jmp, or jmpq).

I have a bunch of multiparam functions with supporting default-argument
wrappers structured as follows:

void func_A(int A){ func_AB(A,DEFAULT_B); }
void func_AB(int A, int B){ func_ABC(A,B,DEFAULT_C); }
void func_ABC(int A, int B, int C){ func_ABCD(A,B,C,DEFAULT_D); }
void func_ABCD(int A, int B, int C, int D){
   //...
}
which could size-wise benefit from eliding the jumps, turning them into
fallthroughs this way, but yeah, probably not worth the effort (unless somebody
knows how to easily hack gas to do it).

[Bug middle-end/109766] New: Passing doubles through the stack generates a stack adjustment per each such argument at -Os/-Oz.

2023-05-08 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109766

Bug ID: 109766
   Summary: Passing doubles through the stack generates a stack
adjustment per each such argument at -Os/-Oz.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

/*
 Passing doubles through the stack generates a stack adjustment per each such
argument at -Os/-Oz.
 These stack adjustments are only coalesced at -O1/-O2/-O3, leaving -Os/-Oz
with larger code.
*/
#define $expr(...) (__extension__({__VA_ARGS__;}))
#define $regF0 $expr(register double x __asm("xmm0"); x)
#define $regF1 $expr(register double x __asm("xmm1"); x)
#define $regF2 $expr(register double x __asm("xmm2"); x)
#define $regF3 $expr(register double x __asm("xmm3"); x)
#define $regF4 $expr(register double x __asm("xmm4"); x)
#define $regF5 $expr(register double x __asm("xmm5"); x)
#define $regF6 $expr(register double x __asm("xmm6"); x)
#define $regF7 $expr(register double x __asm("xmm7"); x)

void func(char const*Fmt, ...);
void callfunc(char const*Fmt, double D0, double D1, double D2, double D3,
double D4, double D5, double D6, double D7){
func(Fmt,$regF0,$regF1,$regF2,$regF3,$regF4,$regF5,$regF6,$regF7,
D0,D1,D2,D3,D4,D5,D6,D7);
/*
//gcc @ -Os/-Oz
0000000000000000 <callfunc>:
   0:   50  push   %rax
   1:   b0 08   mov$0x8,%al
   3:   48 8d 64 24 f8  lea-0x8(%rsp),%rsp
   8:   66 0f d6 3c 24  movq   %xmm7,(%rsp)
   d:   48 8d 64 24 f8  lea-0x8(%rsp),%rsp
  12:   66 0f d6 34 24  movq   %xmm6,(%rsp)
  17:   48 8d 64 24 f8  lea-0x8(%rsp),%rsp
  1c:   66 0f d6 2c 24  movq   %xmm5,(%rsp)
  21:   48 8d 64 24 f8  lea-0x8(%rsp),%rsp
  26:   66 0f d6 24 24  movq   %xmm4,(%rsp)
  2b:   48 8d 64 24 f8  lea-0x8(%rsp),%rsp
  30:   66 0f d6 1c 24  movq   %xmm3,(%rsp)
  35:   48 8d 64 24 f8  lea-0x8(%rsp),%rsp
  3a:   66 0f d6 14 24  movq   %xmm2,(%rsp)
  3f:   48 8d 64 24 f8  lea-0x8(%rsp),%rsp
  44:   66 0f d6 0c 24  movq   %xmm1,(%rsp)
  49:   48 8d 64 24 f8  lea-0x8(%rsp),%rsp
  4e:   66 0f d6 04 24  movq   %xmm0,(%rsp)
  53:   e8 00 00 00 00  callq  58 
54: R_X86_64_PLT32  func-0x4
  58:   48 83 c4 48 add$0x48,%rsp
  5c:   c3  retq
$sz(callfunc)=93

//clang @ -Os/-Oz
0000000000000000 <callfunc>:
   0:   48 83 ec 48 sub$0x48,%rsp
   4:   f2 0f 11 7c 24 38   movsd  %xmm7,0x38(%rsp)
   a:   f2 0f 11 74 24 30   movsd  %xmm6,0x30(%rsp)
  10:   f2 0f 11 6c 24 28   movsd  %xmm5,0x28(%rsp)
  16:   f2 0f 11 64 24 20   movsd  %xmm4,0x20(%rsp)
  1c:   f2 0f 11 5c 24 18   movsd  %xmm3,0x18(%rsp)
  22:   f2 0f 11 54 24 10   movsd  %xmm2,0x10(%rsp)
  28:   f2 0f 11 4c 24 08   movsd  %xmm1,0x8(%rsp)
  2e:   f2 0f 11 04 24  movsd  %xmm0,(%rsp)
  33:   b0 08   mov$0x8,%al
  35:   e8 00 00 00 00  callq  3a 
36: R_X86_64_PLT32  func-0x4
  3a:   48 83 c4 48 add$0x48,%rsp
  3e:   c3  retq   
$sz(callfunc)=63


//gcc @ -O1
0000000000000000 <callfunc>:
   0:   48 83 ec 48 sub$0x48,%rsp
   4:   f2 0f 11 7c 24 38   movsd  %xmm7,0x38(%rsp)
   a:   f2 0f 11 74 24 30   movsd  %xmm6,0x30(%rsp)
  10:   f2 0f 11 6c 24 28   movsd  %xmm5,0x28(%rsp)
  16:   f2 0f 11 64 24 20   movsd  %xmm4,0x20(%rsp)
  1c:   f2 0f 11 5c 24 18   movsd  %xmm3,0x18(%rsp)
  22:   f2 0f 11 54 24 10   movsd  %xmm2,0x10(%rsp)
  28:   f2 0f 11 4c 24 08   movsd  %xmm1,0x8(%rsp)
  2e:   f2 0f 11 04 24  movsd  %xmm0,(%rsp)
  33:   b8 08 00 00 00  mov$0x8,%eax
  38:   e8 00 00 00 00  callq  3d 
39: R_X86_64_PLT32  func-0x4
  3d:   48 83 c4 48 add$0x48,%rsp
  41:   c3  retq   
$sz(callfunc)=66
*/
}

https://godbolt.org/z/d8T3hxqWK

[Bug preprocessor/109704] New: #pragma {push,pop}_macro broken for identifiers that contain dollar signs at nonfirst positions

2023-05-02 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109704

Bug ID: 109704
   Summary: #pragma {push,pop}_macro broken for identifiers that
contain dollar signs at nonfirst positions
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: preprocessor
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

The following dollar-sign-less example compiles fine, as expected:

#define MACRO 1
_Static_assert(MACRO,"");
#pragma push_macro("MACRO")
#undef MACRO
#define MACRO 0
_Static_assert(!MACRO,"");
#pragma pop_macro("MACRO")
_Static_assert(MACRO,""); //OK


Substituting $MACRO for MACRO still works, but with MACRO$ or M$CRO the final
assertions fail: https://godbolt.org/z/n1EoGao74

[Bug tree-optimization/93265] memcmp comparisons of structs wrapping a primitive type not as compact/efficient as direct comparisons of the underlying primitive type under -Os

2023-05-01 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93265

--- Comment #3 from Petr Skocik  ---
Here's another example (which may summarize it more nicely):

struct a{ char _[4]; };
#include 
int cmp(struct a A, struct a B){ return !!memcmp(&A,&B,4); }

Expected x86-64 codegen (✓ for gcc -O2/-O3 and for clang -Os/-O2/-O3)   
xor eax, eax
cmp edi, esi
setne   al
ret

gcc -Os codegen:
subq    $24, %rsp
movl    $4, %edx
movl    %edi, 12(%rsp)
leaq    12(%rsp), %rdi
movl    %esi, 8(%rsp)
leaq    8(%rsp), %rsi
call    memcmp
testl   %eax, %eax
setne   %al
addq    $24, %rsp
movzbl  %al, %eax
ret

https://godbolt.org/z/G5eE5GYv4

[Bug c/94379] Feature request: like clang, support __attribute((__warn_unused_result__)) on structs, unions, and enums

2023-04-23 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94379

--- Comment #2 from Petr Skocik  ---
Excellent!

For optional super extra coolness, this might also work (clang doesn't do this)
with statement expressions, so that statement-expression-based macros could be
marked warn_unused_result through it too.


typedef struct __attribute((__warn_unused_result__)) { int x; } wur_retval_t;

wur_retval_t foo(void){ int x=41; return (wur_retval_t){x+1}; }
#define foo_macro() ({  int x=41; (wur_retval_t){x+1}; })

void use(void){
foo();  //warn unused result ✓
foo_macro(); //perhaps should "warn unused result" too?
}

[Bug c/109567] New: Useless stack adjustment by 16 around calls with odd stack-argument counts on SysV x86_64

2023-04-20 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109567

Bug ID: 109567
   Summary: Useless stack adjustment by 16 around calls with odd
stack-argument counts on SysV x86_64
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

For function calls with odd stack argument counts, gcc generates a useless `sub
$16, %rsp` at the beginning of the calling function.

Example (https://godbolt.org/z/Y4ErE8ee9):
#include 
int callprintf_0stk(char const*Fmt){ return printf(Fmt,0,0,0,0,0),0; }
int callprintf_1stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0, 1),0; } 
//useless sub $0x10,%rsp
int callprintf_2stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0, 1,2),0; }
int callprintf_3stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0, 1,2,3),0; }
//useless sub $0x10,%rsp
int callprintf_4stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0, 1,2,3,4),0;
}
int callprintf_5stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0,
1,2,3,4,5),0; } //useless sub $0x10,%rsp
int callprintf_6stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0,
1,2,3,4,5,6),0; }
int callprintf_7stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0,
1,2,3,4,5,6,7),0; } //useless sub $0x10,%rsp

[Bug middle-end/108799] Improper deprecation diagnostic for rsp clobber

2023-04-01 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108799

Petr Skocik  changed:

   What|Removed |Added

 CC||pskocik at gmail dot com

--- Comment #3 from Petr Skocik  ---
Very good question. The deprecation of SP clobbers could use some explanation
if there are indeed good reasons for it. 

IMO, if listing the SP as a clobber both (1) forces a frame pointer with
frame-pointer-relative addressing of spills (and the frame pointer isn't
clobbered too) and (2) avoids the use of the red zone (and it absolutely should
continue to do both of these things in my opinion) then gcc shouldn't need to
care about redzone clobbers (as in the `pushf;pop` example) or even a wide
class of stack pointer changes (assembly-made stack allocation and frees) just
as long as no spills made by the compiler are clobbered (or opened to being
clobbered from signal handlers) by such head-of-the-stack manipulation. Even
with assembly-less standard C that uses VLAs or allocas, gcc cannot count on
being in control of the stack pointer anyway, so why be so fussy about it when
something as expert-oriented as inline assembly tries to manipulate it?

[Bug c/108194] GCC won't treat two compatible function types as compatible if any of them (or both of them) is declared _Noreturn

2022-12-22 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108194

--- Comment #6 from Petr Skocik  ---
(In reply to Petr Skocik from comment #5)
> (In reply to Andrew Pinski from comment #4)
> > Invalid as mentioned in r13-3135-gfa258f6894801a .
> 
> I believe it's still a bug for pre-c2x __typeof.
> While it is GCC's prerogative to include _Noreturn/__attribute((noreturn))
> into the type for its own __typeof (which, BTW, I think is better design
> than the standardized semantics), I think two otherwise compatible function
> types should still remain compatible if they both either have or don't have
> _Noreturn/__attribute((noreturn)). But treating `_Noreturn void
> NR_FN_A(void);` 
> as INcompatible with `_Noreturn void NR_FN_B(void);` that's just wonky, IMO.

OK, the bug was MINE after all.

For bug report archeologists: I was doing what was meant to be a full
(qualifier-including) type comparison wrong. While something like
_Generic((__typeof(type0)*)0, __typeof(type1)*:1, default:0) suffices to get
around _Generic dropping qualifiers (const/volatile/_Atomic) in its controlling
expression, for function pointer types at a single layer of pointer indirection
the _Noreturn attribute will still get dropped in the controlling expression of
_Generic (I guess that makes sense because function pointers are much more
closely related to functions than another pointer type would be to its target
type), so another layer of pointer indirection is required, as in
`_Generic((__typeof(type0)**)0, __typeof(type1)**:1, default:0)`.

Thank you all very much, especially jos...@codesourcery.com, who pointed me
(pun intended) to the right solution over email. :)

[Bug c/108194] GCC won't treat two compatible function types as compatible if any of them (or both of them) is declared _Noreturn

2022-12-21 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108194

Petr Skocik  changed:

   What|Removed |Added

 Resolution|INVALID |FIXED

--- Comment #5 from Petr Skocik  ---
(In reply to Andrew Pinski from comment #4)
> Invalid as mentioned in r13-3135-gfa258f6894801a .

I believe it's still a bug for pre-c2x __typeof.
While it is GCC's prerogative to include _Noreturn/__attribute((noreturn)) into
the type for its own __typeof (which, BTW, I think is better design than the
standardized semantics), I think two otherwise compatible function types should
still remain compatible if they both either have or don't have
_Noreturn/__attribute((noreturn)). But treating `_Noreturn void NR_FN_A(void);` 
as INcompatible with `_Noreturn void NR_FN_B(void);` that's just wonky, IMO.

[Bug c/108194] New: GCC won't treat two compatible function types as compatible if any of them (or both of them) is declared _Noreturn

2022-12-21 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108194

Bug ID: 108194
   Summary: GCC won't treat two compatible function types as
compatible if any of them (or both of them) is
declared _Noreturn
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

(same with __attribute((noreturn))) Example (https://godbolt.org/z/ePGd95sWz):


void FN_A(void);
void FN_B(void);
_Noreturn void NR_FN_A(void);
_Noreturn void NR_FN_B(void);

_Static_assert(_Generic((__typeof(*(FN_A))*){0}, __typeof(*(FN_B))*: 1), "");
//OK ✓
_Static_assert(_Generic((__typeof(*(NR_FN_A))*){0}, __typeof(*(NR_FN_B))*: 1),
""); //ERROR ✗
_Static_assert(_Generic((__typeof(*(FN_A))*){0}, __typeof(*(NR_FN_B))*: 1),
""); //ERROR ✗

As you can see from the Compiler Explorer link, clang accepts all three, which
is as it should be as per the standard, where _Noreturn is a function specifier
(https://port70.net/~nsz/c/c11/n1570.html#6.7.4), which means it shouldn't even
go into the type.

(Personally, I don't even mind it going into the type, just as long as two
otherwise identical _Noreturn function declarations are deemed as having the
same type.)

Regards,
Petr Skocik

[Bug c/107831] Missed optimization: -fclash-stack-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page

2022-12-17 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831

--- Comment #9 from Petr Skocik  ---
Regarding the size of alloca/VLA-generated code under -fstack-clash-protection.
I've played with this a little bit and while I love the feature, the code size
increases seem quite significant and unnecessarily so.

Take a simple

void ALLOCA_C(size_t Sz){ char buf[Sz]; asm volatile ("" : : "r"(&buf[0])); }

gcc -fno-stack-clash-protection: 17 bytes
gcc -fstack-clash-protection: 72 bytes

clang manages with less of an increase:

-fno-stack-clash-protection: 26 bytes
-fstack-clash-protection: 45 bytes

Still, this could be as low as 11 bytes for the -fstack-clash-protection
version (less than for the unprotected one!), all by using a simple call to an
assembly function, whose code can be no-clobber without much extra effort.

Linked in compiler explorer is a crack at the idea along with benchmarks: 
https://godbolt.org/z/f8rhG1ozs

The performance impact of the call seems negligible (practically less than 1ns,
though in the above quick-and-dirty benchmark it fluctuates a tiny bit,
sometimes even giving the non-inline version an edge).

I originally suggested popping the return address off the stack and repushing
it before returning. Ended up just repushing -- the old return address becomes
part of the alloca allocation. The concern that this could mess up the return
stack buffer of the CPU seems valid, but all the benchmarks indicate it
doesn't--not even when the ret address is popped--just as long as the return
target address is the same.

(When it isn't, the performance penalty is rather significant: I measured a
19-times slowdown for that case for comparison (it's also in the linked
benchmarks).)

The (x86-64) assembly function:
#define STR(...) STR__(__VA_ARGS__) //{{{
#define STR__(...) #__VA_ARGS__ //}}}
asm(STR(
.global safeAllocaAsm;
safeAllocaAsm: //no clobber, though does expect 16-byte aligned at entry as
usual
push %r10;
cmp $16, %rdi;
ja .LsafeAllocaAsm__test32;
push 8(%rsp);
ret;
.LsafeAllocaAsm__test32:
push %r10;
push %rdi;
mov %rsp, %r10;
sub $17, %rdi;
and $-16, %rdi; //(-32+15)&(-16) //subtract the 32 and 16-align, rounding up
jnz .LsafeAllocaAsm__probes;
.LsafeAllocaAsm__ret:
lea (3*8)(%r10,%rdi,1), %rdi;
push (%rdi);
mov -8(%rdi), %r10;
mov -16(%rdi), %rdi;
ret;
.LsafeAllocaAsm__probes:
sub %rdi, %r10;  //r10 is the desired rsp
.LsafeAllocaAsm__probedPastDesiredSpEh:
cmp %rsp, %r10; jge .LsafeAllocaAsm__pastDesiredSp;
orl $0x0,(%rsp);
sub $0x1000,%rsp;
jmp .LsafeAllocaAsm__probedPastDesiredSpEh;
.LsafeAllocaAsm__pastDesiredSp:
mov %r10, %rsp; //set the desired sp
jmp .LsafeAllocaAsm__ret;
.size safeAllocaAsm, .-safeAllocaAsm;
));

Cheers, 
Petr Skocik

[Bug c/107831] Missed optimization: -fclash-stack-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page

2022-11-24 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831

--- Comment #7 from Petr Skocik  ---
(In reply to Jakub Jelinek from comment #4)
> Say for
> void bar (char *);
> void
> foo (int x, int y)
> {
>   __attribute__((assume (x < 64)));
>   for (int i = 0; i < y; ++i)
> bar (__builtin_alloca (x));
> }
> all the alloca calls are known to be small, yet they can quickly cross pages.
> Similarly:
> void
> baz (int x)
> {
>   if (x >= 512) __builtin_unreachable ();
>   char a[x];
>   bar (a);
>   char b[x];
>   bar (b);
>   char c[x];
>   bar (c);
>   char d[x];
>   bar (d);
>   char e[x];
>   bar (e);
>   char f[x];
>   bar (f);
>   char g[x];
>   bar (g);
>   char h[x];
>   bar (h);
>   char i[x];
>   bar (i);
>   char j[x];
>   bar (j);
> }
> All the VLAs here are small, yet together they can cross a page.
> So, we'd need to punt for dynamic allocations in loops and for others
> estimate
> the maximum size of all the allocations together (+ __builtin_alloca
> overhead + normal frame size).

I think this shouldn't need probes either (unless you tried to coalesce the
allocations) on architectures where making a function call touches the stack.
Also, allocas of less than or equal to half a page intertwined with writes
anywhere into the allocated blocks should always be safe (but I guess I'll just
turn stack-clash-protection off in the one file where I'm making such clearly
safe dynamic stack allocations).

[Bug c/107831] Missed optimization: -fclash-stack-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page

2022-11-23 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831

--- Comment #6 from Petr Skocik  ---
(In reply to Jakub Jelinek from comment #2)
> (In reply to Petr Skocik from comment #1)
> > Sidenote regarding the stack-allocating code for cases when the size is not
> > known to be less than pagesize: the code generated for those cases is quite
> > large. It could be replaced (at least under -Os) with a call to a special
> > assembly function that'd pop the return address (assuming the target machine
> > pushes return addresses to the stack), adjust the stack size in a piecemeal
> > fashion so as to not skip guard pages, then repush the return address and
> > return to the caller with the stack size expanded.
> 
> You certainly don't want to kill the return stack the CPU has, even if it
> results in a few saved bytes for -Os.

That's a very interesting point  because I have written x86_64 assembly
"functions" that  did pop the return address, pushed something to the stack,
and then repushed the return address and returned. In a loop, it doesn't seem
to perform badly compared to inline code, so I figure it shouldn't be messing
with the return stack buffer. After all, even though the return happens through
a different place in the callstack, it's still returning to the original
caller. The one time I absolutely must have accidentally messed with the return
stack buffer was when I wrote a context switching routine and originally tried
to "ret" to the new context. It turned out to be very measurably many times
slower than `pop %rcx; jmp *%rcx;` (also measured in a loop), so that's why I
think popping a return address, allocating on the stack, and then pushing and
returning is not really a performance killer (on my Intel CPU anyway). If it
was messing with the return stack buffer, I think I would be getting similar
slowdowns to what I got with the context switching code trying to `ret`.

[Bug c/107831] Missed optimization: -fclash-stack-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page

2022-11-23 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831

--- Comment #1 from Petr Skocik  ---
Sidenote regarding the stack-allocating code for cases when the size is not
known to be less than pagesize: the code generated for those cases is quite
large. It could be replaced (at least under -Os) with a call to a special
assembly function that'd pop the return address (assuming the target machine
pushes return addresses to the stack), adjust the stack size in a piecemeal
fashion so as to not skip guard pages, then repush the return address and
return to the caller with the stack size expanded.

[Bug c/107831] New: Missed optimization: -fclash-stack-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page

2022-11-23 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831

Bug ID: 107831
   Summary: Missed optimization: -fstack-clash-protection causes
unnecessary code generation for dynamic stack
allocations that are clearly less than a page
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

I'm talking allocations such as

char buf [ (uint8_t)size ];

The resulting code for this should ideally be the same with or without
-fstack-clash-protection as this can clearly never skip a whole page.

But gcc generates a big loop trying to touch every page-sized subpart of that
allocation.

https://godbolt.org/z/G8EbzbshK
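A minimal sketch of why the bound holds (`vla_len` is a hypothetical helper; `sizeof` of a VLA is evaluated at run time, so it reports the actual allocation size):

```c
#include <stdint.h>
#include <stddef.h>

/* The (uint8_t) cast bounds the VLA to at most 255 bytes -- always less
 * than a page -- so a probing loop over the allocation is unnecessary.
 * Call with a size whose low byte is nonzero (a zero-length VLA is UB). */
size_t vla_len(unsigned size)
{
    char buf[(uint8_t)size];
    return sizeof buf;  /* sizeof of a VLA is computed at run time */
}
```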

[Bug c/106116] New: Missed optimization: in no_reorder-attributed functions, tail calls to the subsequent function could just be function-to-function fallthrough

2022-06-28 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106116

Bug ID: 106116
   Summary: Missed optimization: in no_reorder-attributed
functions, tail calls to the subsequent function could
just be function-to-function fallthrough
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

Example:

__attribute((noinline,no_reorder))
int fnWithExplicitArg(int ExplicitArg);

__attribute((noinline,no_reorder))
int fnWithDefaultArg(void){ return fnWithExplicitArg(42); }

int fnWithExplicitArg(int ExplicitArg){
int useArg(int);
return 12+useArg(ExplicitArg);
}

Generated fnWithDefaultArg:

fnWithDefaultArg:
        mov     edi, 42
        jmp     fnWithExplicitArg
fnWithExplicitArg:
        //...

Desired fnWithDefaultArg


fnWithDefaultArg:
        mov     edi, 42
        //fallthru
fnWithExplicitArg:
        //...

https://gcc.godbolt.org/z/Ph3onxoh9
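For completeness, a runnable version of the example (with a stand-in definition for the external `useArg`, which the report leaves undefined):

```c
/* Stand-in for the external useArg from the report. */
static int useArg(int x) { return x * 2; }

__attribute__((noinline, no_reorder))
int fnWithExplicitArg(int ExplicitArg);

__attribute__((noinline, no_reorder))
int fnWithDefaultArg(void) { return fnWithExplicitArg(42); }

int fnWithExplicitArg(int ExplicitArg)
{
    return 12 + useArg(ExplicitArg);
}
```

With no_reorder keeping the two definitions adjacent and in source order, the `jmp fnWithExplicitArg` in fnWithDefaultArg targets the very next instruction and could be dropped.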

[Bug target/85927] ud2 instruction generated starting with gcc 8

2021-12-31 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85927

Petr Skocik  changed:

           What    |Removed                     |Added
----------------------------------------------------------------
                 CC|                            |pskocik at gmail dot com

--- Comment #5 from Petr Skocik  ---
I think it'd be more welcome if gcc just put nothing there like clang does.

[Bug c/102096] New: Gcc unnecessarily initializes indeterminate variables passed across function boundaries

2021-08-27 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102096

Bug ID: 102096
   Summary: Gcc unnecessarily initializes indeterminate variables
passed across function boundaries
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

Compared to clang where:

long ret_unspec(void){ auto long rv; return rv; }

void take6(long,long,long,long,long,long);

void call_take6(void)
{
//6 unnecessary XORs on GCC
auto long id0; //indeterminate
auto long id1; //indeterminate
auto long id2; //indeterminate
auto long id3; //indeterminate
auto long id4; //indeterminate
auto long id5; //indeterminate
take6(id0,id1,id2,id3,id4,id5);
}

yields (x86_64):

ret_unspec:                             # @ret_unspec
        retq
call_take6:                             # @call_take6
        jmp     take6

(1+5 bytes), GCC compiles the above to
ret_unspec:
        xorl    %eax, %eax
        ret
call_take6:
        xorl    %r9d, %r9d
        xorl    %r8d, %r8d
        xorl    %ecx, %ecx
        xorl    %edx, %edx
        xorl    %esi, %esi
        xorl    %edi, %edi
        jmp     take6

(3+19 bytes), unnecessarily 0-initializing the indeterminate
return-value/arguments.

Type-casting the called function can often be hackishly used to get the same
assembly, but doing so is technically UB and not as generic as supporting the
passing of unspecified arguments/return values, which can be used to omit
argument-register initializations not just for arguments at the end of an
argument pack but also in the middle.

TL;DR: Allowing indeterminate variables to be passed/returned without
generating initializing code for them would be nice. Clang already does it.
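The cast hack mentioned above can be sketched as follows. Note that this calls a function through an incompatible pointer type, which is undefined behavior per C11 6.3.2.3p8; it merely tends to work on common ABIs when the callee never reads the garbage argument registers. `take6_impl` is a hypothetical stand-in:

```c
static int was_called;

/* Stand-in callee that ignores all six arguments. */
static void take6_impl(long a, long b, long c, long d, long e, long f)
{
    (void)a; (void)b; (void)c; (void)d; (void)e; (void)f;
    was_called = 1;
}

void call_take6_hack(void)
{
    /* Casting the function to a zero-argument type stops the caller
     * from zeroing the argument registers -- but the mismatched call
     * itself is technically UB. */
    ((void (*)(void))take6_impl)();
}
```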

[Bug c/98418] Valid integer constant expressions based on expressions that trigger -Wshift-overflow are treated as non-constants

2020-12-24 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98418

pskocik at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #3 from pskocik at gmail dot com ---
You're right. The bug was in my code.

struct foo { unsigned bit: (0xll<<40)!=0; };

is indeed UB due to http://port70.net/~nsz/c/c11/n1570.html#6.5.7p4, but

struct foo { unsigned bit: (0xull<<40)!=0; };

isn't and GCC accepts it without complaint.

Apologies for the false alarm.
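The distinction can be checked directly. `0xFFFFFFFFull` below stands in for the constant elided above; the point is that an unsigned left shift simply discards overflowing bits, with no UB, so the bit-field width is a valid integer constant expression:

```c
/* (0xFFFFFFFFull << 40) pushes bits past position 63; for an unsigned
 * type they are discarded modulo 2^64 with fully defined behavior, so
 * the result is nonzero and the bit-field width is the constant 1.
 * With a signed ll literal the same shift would overflow -> UB
 * (C11 6.5.7p4). */
struct foo { unsigned bit : (0xFFFFFFFFull << 40) != 0; };

int foo_bit_roundtrip(void)
{
    struct foo f = { 1 };
    return f.bit;
}
```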

[Bug c/98418] New: Valid integer constant expressions based on expressions that trigger -Wshift-overflow are treated as non-constants

2020-12-22 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98418

Bug ID: 98418
   Summary: Valid integer constant expressions based on
expressions that trigger -Wshift-overflow are treated
as non-constants
   Product: gcc
   Version: 6.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

This causes things like:

struct foo { unsigned bit: (0xll<<40)!=0; };

to elicit a -pedantic warning about the bitfield width not being a proper
integer constant expression, even though it is.

In other contexts, a complete compilation error might ensue:

extern int bar[ (0xll<<40)!=0 ]; //seen as an invalid VLA


https://gcc.godbolt.org/z/7zfz96

Neither clang nor gcc <= 5 appear to have this bug.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93241 seems related.