[Bug middle-end/93487] Missed tail-call optimizations

2024-05-26 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93487

--- Comment #5 from Petr Skocik  ---
Another case of a missed tailcall which might warrant a separate mention:

struct big{ long _[10]; };
void takePtr(void *);
void takeBigAndPassItsAddress(struct big X){ takePtr(&X); }

This should ideally compile to just `lea 8(%rsp), %rdi; jmp takePtr;`.

The compiler might be tempted to treat the taking of a local's address as a
reason not to tail call (clang misses this optimization too, probably for this
reason), but tail calling here is fine because this particular local isn't
allocated by the function itself; it is allocated by the caller as part of the
call.

Icc does do this optimization: https://godbolt.org/z/a6coTzPjz
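A runnable sketch of the same pattern, assuming a hypothetical callee `sumBig` that only reads through the pointer. Under the SysV x86-64 ABI an 80-byte struct argument is passed in memory, in the caller's outgoing-argument area (at 8(%rsp) on entry), so `&X` stays valid even if the call to `sumBig` becomes a plain `jmp`, which is what the `lea 8(%rsp), %rdi; jmp` sequence relies on.

```c
/* Same shape as takeBigAndPassItsAddress, but with a concrete callee so it runs.
   sumBig is a hypothetical stand-in for takePtr. */
struct big { long _[10]; };

long sumBig(void *P) {
    const struct big *B = P;
    long S = 0;
    for (int i = 0; i < 10; i++) S += B->_[i];
    return S;
}

/* X lives in the caller's argument area, not in this function's frame,
   so this call remains a valid tail-call candidate despite taking &X. */
long takeBigAndSumIt(struct big X) { return sumBig(&X); }
```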

[Bug c/90181] Feature request: provide a way to explicitly select specific named registers in constraints

2024-04-17 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90181

Petr Skocik  changed:

   What|Removed |Added

 CC||pskocik at gmail dot com

--- Comment #16 from Petr Skocik  ---
The current way of loading values into registers that have no dedicated
constraint (local register variables) also breaks on gcc (but not on clang) if
the variable is marked const.
https://godbolt.org/z/1PvYsrqG9
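For reference, the workaround in question: a GCC local register variable (`register T v __asm__("reg")`) used as an asm input operand is the documented way to get a value into a register that has no constraint letter. A minimal x86-64 sketch with %r10 (the helper name is hypothetical); per the comment above, it is declaring such a variable `const` that trips up gcc.

```c
#include <stdint.h>

/* Hypothetical helper: route V through %r10, a register with no dedicated
   constraint letter, via a GCC local register variable. */
uint64_t roundTripViaR10(uint64_t V) {
#if defined(__x86_64__)
    register uint64_t R10 __asm__("r10") = V; /* deliberately not const */
    uint64_t Out;
    /* An "r" operand bound to a local register variable is placed in that register. */
    __asm__ volatile("mov %%r10, %0" : "=r"(Out) : "r"(R10));
    return Out;
#else
    return V; /* the sketch only exercises the register on x86-64 */
#endif
}
```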

[Bug middle-end/112844] Branches under -Os (unlike -O{1, 2, 3}) do not respect __builtin_expect hints

2024-03-30 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112844

--- Comment #2 from Petr Skocik  ---
(In reply to Jakub Jelinek from comment #1)
> With -Os you ask the code to be small.  So, while internally the hint is
> still present in edge probabilities, -Os is considered more important and
> certain code changes based on the probabilities aren't done if they are
> known or expected to result in larger code.

Thanks. I very much like the codegen I get with gcc -Os, which is often better
than what I get with clang. But the sometimes counterintuitive branch layout at
-Os annoys me, especially since I've measured it a couple of times as being the
source of a slowdown.
Sure, you can save a (more often than not 2-byte) jump by conditionally jumping
over an unlikely branch instead of conditionally jumping to an unlikely branch
placed after the ret and having it jump back into the function body (the latter
is what all the other compilers do at -Os), but I'd rather have the code spend
the extra two bytes and have my happy paths fall through as they should.

[Bug target/114097] Missed register optimization in _Noreturn functions

2024-02-25 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097

--- Comment #4 from Petr Skocik  ---
Excellent! Thank you very much. I didn't realize the functionality was already
there but just didn't kick in without an explicit __attribute((noreturn)). Now I
can get rid of my most complex assembly function, which I stupidly (back then I
thought cleverly) wrote. :)

[Bug rtl-optimization/10837] noreturn attribute causes no sibling calling optimization

2024-02-25 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837

Petr Skocik  changed:

   What|Removed |Added

 CC||pskocik at gmail dot com

--- Comment #19 from Petr Skocik  ---
(In reply to Xi Ruoyao from comment #16)

> In practice most _Noreturn functions are abort, exit, ..., i.e. they are
> only executed one time so optimizing against a cold path does not help much.
> I don't think it's a good idea to encourage people to construct some fancy
> code by a recursive _Noreturn function (why not just use a loop?!)  And if
> you must write such fancy code anyway IMO musttail attribute (PR83324) will
> be a better solution.

There's also longjmp, which may not be all that cold and may be executed
multiple times. And while, yes, nobody will notice the time saved by a single
jmp instead of a call next to a process spawn/exit, for a longjmp wrapper it'll
make things a few % faster (as would utilizing _Noreturn attributes for better
register allocation: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097, which
would also save a bit of code size). Tail calls can also save a bit of code size
if the target is near.
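A minimal sketch of the kind of `_Noreturn` longjmp wrapper meant here (all names are hypothetical). The final `longjmp` call inside `throwCode` is the one that could be emitted as a `jmp` instead of a `call`.

```c
#include <setjmp.h>

static int LastCode; /* hypothetical side channel for the thrown code */

/* Wrapper whose last action is a call that never returns: a tail-call candidate. */
_Noreturn void throwCode(jmp_buf Jb, int Code) {
    LastCode = Code;
    longjmp(Jb, 1);
}

int catchCode(int Code) {
    jmp_buf Jb;
    if (setjmp(Jb) == 0) /* setjmp used in a context the C standard permits */
        throwCode(Jb, Code);
    return LastCode;     /* reached only via the longjmp */
}
```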

[Bug c/114097] New: Missed register optimization in _Noreturn functions

2024-02-25 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097

Bug ID: 114097
   Summary: Missed register optimization in _Noreturn functions
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

Consider a never-returning function such as this one:

#include <stdio.h>
#include <setjmp.h>
//_Noreturn
void noret(unsigned A, unsigned B, unsigned C, unsigned D, unsigned E,
           jmp_buf Jb){

for(;A--;) puts("A");
for(;B--;) puts("B");
for(;C--;) puts("C");
for(;D--;) puts("D");
for(;E--;) puts("E");

longjmp(Jb,1);
}

https://godbolt.org/z/35YjrhjYq

In its prologue, gcc saves the arguments in call-preserved registers to
preserve them around the puts calls, and it does so the usual way: by (1)
pushing the old values of the call-preserved registers to the stack and (2)
actually moving the arguments into the call-preserved registers.

pushq   %r15
movq    %r9, %r15
pushq   %r14
movl    %edi, %r14d
pushq   %r13
movl    %esi, %r13d
pushq   %r12
movl    %edx, %r12d
pushq   %rbp
movl    %ecx, %ebp
pushq   %rbx
movl    %r8d, %ebx
pushq   %rax
//...

Since this function demonstrably never returns, step 1 can be entirely elided
as the old values of the call-preserved registers won't ever need to be
restored
(desirably, gcc does not generate the would-be-dead restoration code):


movq    %r9, %r15
movl    %edi, %r14d
movl    %esi, %r13d
movl    %edx, %r12d
movl    %ecx, %ebp
movl    %r8d, %ebx
pushq   %rax
//...

(Also desirable would be the currently unrealized tail-call optimization of the
longjmp call in this case: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837)

[Bug c/114011] New: Feature request: __goto__

2024-02-20 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114011

Bug ID: 114011
   Summary: Feature request: __goto__
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

GCC has __volatile__.
I can only assume the rationale for it is so that inline asm macros can
do __asm __volatile__ and not have to worry about user redefinitions of the
volatile keyword (which, while not quite sanctioned by the standard, are
sometimes practically useful).
While the __asm syntax also allows the goto keyword, there's currently no
__goto__ counterpart to __volatile__ that could similarly protect against
redefinitions of goto.
Adding it is trivial and consistent with the already existing
volatile/__volatile__ pair. Would you consider it?

(
Why am I redefining goto? I'm basically doing it within the confines of a macro
framework to force a static context check on gotos to prevent gotos out of
scopes where doing it would be an error.
Something like:

enum { DISALLOW_GOTO_HERE = 0 }; //normally, goto is allowed
#define goto while(_Generic((int(*)[!DISALLOW_GOTO_HERE])0, int(*)[1]:1)) goto
//statically checked goto
int main(void){
goto next; next:; //OK, not disallowed in this context

#if 0 //would fail to compile
enum {DISALLOW_GOTO_HERE=1}; //disallowed in this context
goto next2; next2:;
#endif
}

While this redefinition does not syntactically disturb C, it does disturb
`__asm goto()`, of which I unfortunately have one very frequently used
instance, and since there's no way to suppress the expansion of an object-like
macro, I'd like to be able to change it to `__asm __goto__` and have it
peacefully coexist with the goto redefinition.
)
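The existing precedent the request leans on fits in a few lines: with `volatile` redefined as an empty macro (something ISO C does not sanction, and undefined if a standard header is included while it is in effect), `__volatile__` still reaches the compiler intact. The function name and the redefinition are illustrative only.

```c
/* Illustration only: a user redefinition of the volatile keyword. */
#define volatile /* expands to nothing */

int keepValue(int X) {
    /* The reserved spelling is immune to the macro above, so this asm
       statement stays volatile regardless of the redefinition. */
    __asm__ __volatile__("" : "+r"(X));
    return X;
}

#undef volatile /* restore the keyword before any further code */
```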

[Bug c/112844] New: Branches under -Os (unlike -O{1,2,3}) do not respect __builtin_expect hints

2023-12-04 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112844

Bug ID: 112844
   Summary: Branches under -Os (unlike -O{1,2,3}) do not respect
__builtin_expect hints
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

A simple example that demonstrates this is:

int test(void);
void yes(void);
void expect_yes(void){ if (__builtin_expect(test(),1)) yes(); else {} }
void expect_no(void){ if (__builtin_expect(test(),0)) yes(); else {} }

For an optimized x86-64 output, one should expect:
   - a fall-through to a yes() tail call for the expect_yes() case, preceded by
a conditional jump to code doing a plain return
   - a fall-through to a plain return for the expect_no() case, preceded by a
conditional jump to a yes() tail call (or, even more preferably, a conditional
tail call to yes() with the needed stack adjustment done once before the test
instead of being duplicated in each branch after the test)

Indeed, that's how gcc lays it out for -O{1,2,3}
(https://godbolt.org/z/rG3P3d6f7) as does clang at -O{1,2,3,s}
(https://godbolt.org/z/EcKbrn1b7) and icc at -O{1,2,3,s}
(https://godbolt.org/z/Err73eGsb).

But gcc at -Os seems to have a very strong preference for falling through to
call yes() even in

void expect_no(void){ if (__builtin_expect(test(),0)) yes(); else {} }

and even in

void expect_no2(void){ if (__builtin_expect(!test(),1)){} else yes(); }

essentially completely disregarding any user attempts at controlling the branch
layout of the output.
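For reference, such hints are usually wrapped in macros like the ones below (the macro and function names are hypothetical). They only feed branch probabilities to the optimizer and never change what the code computes, which is why the report is purely about layout.

```c
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

int SlowPathHits; /* counts how often the cold branch ran */

int doubleNonNegative(int V) {
    if (UNLIKELY(V < 0)) { /* cold path: ideally laid out out of line */
        SlowPathHits++;
        return 0;
    }
    return V * 2;          /* hot path: ideally the fall-through */
}
```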

[Bug ipa/106116] Missed optimization: in no_reorder-attributed functions, tail calls to the subsequent function could just be function-to-function fallthrough

2023-08-07 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106116

--- Comment #4 from Petr Skocik  ---
It would be interesting to do this at the assembler level, effectively turning
what's equivalent to `jmp 1f; 1:` into nothing. This would also be in line with
the GNU assembler's apparent philosophy that jmp is a high-level variable-length
instruction (either jmp or jmpq, whichever is possible first => this could
become: nothing, jmp, or jmpq).

I have a bunch of multi-parameter functions with supporting default-argument
wrappers structured as follows:

void func_A(int A){ func_AB(A,DEFAULT_B); }
void func_AB(int A, int B){ func_ABC(A,B,DEFAULT_C); }
void func_ABC(int A, int B, int C){ func_ABCD(A,B,C,DEFAULT_D); }
void func_ABCD(int A, int B, int C, int D){
   //...
}
which could size-wise benefit from eliding the jumps, turning them into
fallthroughs this way, but yeah, it's probably not worth the effort (unless
somebody knows how to easily hack gas to do it).
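A compilable version of that chain, with hypothetical default values and `int` returns so the result can be checked. Each wrapper ends in a call to the function defined right after it, which is exactly the `jmp` that could collapse into a fall-through if the layout were preserved (e.g. via `no_reorder`).

```c
enum { DEFAULT_B = 2, DEFAULT_C = 3, DEFAULT_D = 4 }; /* hypothetical defaults */

int func_AB(int A, int B);
int func_ABC(int A, int B, int C);
int func_ABCD(int A, int B, int C, int D);

/* Each wrapper tail-calls the function laid out immediately after it. */
int func_A(int A) { return func_AB(A, DEFAULT_B); }
int func_AB(int A, int B) { return func_ABC(A, B, DEFAULT_C); }
int func_ABC(int A, int B, int C) { return func_ABCD(A, B, C, DEFAULT_D); }

/* The real worker; here it just encodes its arguments as decimal digits. */
int func_ABCD(int A, int B, int C, int D) { return A * 1000 + B * 100 + C * 10 + D; }
```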

[Bug middle-end/109766] New: Passing doubles through the stack generates a stack adjustment pear each such argument at -Os/-Oz.

2023-05-08 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109766

Bug ID: 109766
   Summary: Passing doubles through the stack generates a stack
adjustment pear each such argument at -Os/-Oz.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

/*
 Passing doubles through the stack generates a stack adjustment per such
argument at -Os/-Oz.
 These stack adjustments are only coalesced at -O1/-O2/-O3, leaving -Os/-Oz
with larger code.
*/
#define $expr(...) (__extension__({__VA_ARGS__;}))
#define $regF0 $expr(register double x __asm("xmm0"); x)
#define $regF1 $expr(register double x __asm("xmm1"); x)
#define $regF2 $expr(register double x __asm("xmm2"); x)
#define $regF3 $expr(register double x __asm("xmm3"); x)
#define $regF4 $expr(register double x __asm("xmm4"); x)
#define $regF5 $expr(register double x __asm("xmm5"); x)
#define $regF6 $expr(register double x __asm("xmm6"); x)
#define $regF7 $expr(register double x __asm("xmm7"); x)

void func(char const*Fmt, ...);
void callfunc(char const*Fmt, double D0, double D1, double D2, double D3,
double D4, double D5, double D6, double D7){
func(Fmt,$regF0,$regF1,$regF2,$regF3,$regF4,$regF5,$regF6,$regF7,
D0,D1,D2,D3,D4,D5,D6,D7);
/*
//gcc @ -Os/-Oz
<callfunc>:
   0:   50                      push   %rax
   1:   b0 08                   mov    $0x8,%al
   3:   48 8d 64 24 f8          lea    -0x8(%rsp),%rsp
   8:   66 0f d6 3c 24          movq   %xmm7,(%rsp)
   d:   48 8d 64 24 f8          lea    -0x8(%rsp),%rsp
  12:   66 0f d6 34 24          movq   %xmm6,(%rsp)
  17:   48 8d 64 24 f8          lea    -0x8(%rsp),%rsp
  1c:   66 0f d6 2c 24          movq   %xmm5,(%rsp)
  21:   48 8d 64 24 f8          lea    -0x8(%rsp),%rsp
  26:   66 0f d6 24 24          movq   %xmm4,(%rsp)
  2b:   48 8d 64 24 f8          lea    -0x8(%rsp),%rsp
  30:   66 0f d6 1c 24          movq   %xmm3,(%rsp)
  35:   48 8d 64 24 f8          lea    -0x8(%rsp),%rsp
  3a:   66 0f d6 14 24          movq   %xmm2,(%rsp)
  3f:   48 8d 64 24 f8          lea    -0x8(%rsp),%rsp
  44:   66 0f d6 0c 24          movq   %xmm1,(%rsp)
  49:   48 8d 64 24 f8          lea    -0x8(%rsp),%rsp
  4e:   66 0f d6 04 24          movq   %xmm0,(%rsp)
  53:   e8 00 00 00 00          callq  58
                        54: R_X86_64_PLT32      func-0x4
  58:   48 83 c4 48             add    $0x48,%rsp
  5c:   c3                      retq
$sz(callfunc)=93

//clang @ -Os/-Oz
<callfunc>:
   0:   48 83 ec 48             sub    $0x48,%rsp
   4:   f2 0f 11 7c 24 38       movsd  %xmm7,0x38(%rsp)
   a:   f2 0f 11 74 24 30       movsd  %xmm6,0x30(%rsp)
  10:   f2 0f 11 6c 24 28       movsd  %xmm5,0x28(%rsp)
  16:   f2 0f 11 64 24 20       movsd  %xmm4,0x20(%rsp)
  1c:   f2 0f 11 5c 24 18       movsd  %xmm3,0x18(%rsp)
  22:   f2 0f 11 54 24 10       movsd  %xmm2,0x10(%rsp)
  28:   f2 0f 11 4c 24 08       movsd  %xmm1,0x8(%rsp)
  2e:   f2 0f 11 04 24          movsd  %xmm0,(%rsp)
  33:   b0 08                   mov    $0x8,%al
  35:   e8 00 00 00 00          callq  3a
                        36: R_X86_64_PLT32      func-0x4
  3a:   48 83 c4 48             add    $0x48,%rsp
  3e:   c3                      retq
$sz(callfunc)=63

//gcc @ -O1
<callfunc>:
   0:   48 83 ec 48             sub    $0x48,%rsp
   4:   f2 0f 11 7c 24 38       movsd  %xmm7,0x38(%rsp)
   a:   f2 0f 11 74 24 30       movsd  %xmm6,0x30(%rsp)
  10:   f2 0f 11 6c 24 28       movsd  %xmm5,0x28(%rsp)
  16:   f2 0f 11 64 24 20       movsd  %xmm4,0x20(%rsp)
  1c:   f2 0f 11 5c 24 18       movsd  %xmm3,0x18(%rsp)
  22:   f2 0f 11 54 24 10       movsd  %xmm2,0x10(%rsp)
  28:   f2 0f 11 4c 24 08       movsd  %xmm1,0x8(%rsp)
  2e:   f2 0f 11 04 24          movsd  %xmm0,(%rsp)
  33:   b8 08 00 00 00          mov    $0x8,%eax
  38:   e8 00 00 00 00          callq  3d
                        39: R_X86_64_PLT32      func-0x4
  3d:   48 83 c4 48             add    $0x48,%rsp
  41:   c3                      retq
$sz(callfunc)=66
*/
}

https://godbolt.org/z/d8T3hxqWK
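Some background for the listings: under the SysV x86-64 ABI the first eight floating-point arguments travel in %xmm0-%xmm7 and the rest go on the stack, which is where the eight stores of %xmm0-%xmm7 come from; for a variadic callee, %al also carries an upper bound on the number of vector registers used (the `mov $0x8,%al`). A hypothetical variadic consumer that reads 16 doubles, the last eight of them from the stack:

```c
#include <stdarg.h>

/* Sums Count variadic doubles; with 16 arguments, eight arrive in xmm
   registers and eight on the stack (SysV x86-64). */
double sumDoubles(int Count, ...) {
    va_list Ap;
    va_start(Ap, Count);
    double Sum = 0;
    for (int i = 0; i < Count; i++) Sum += va_arg(Ap, double);
    va_end(Ap);
    return Sum;
}

double callWith16(void) {
    return sumDoubles(16, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0,
                      9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0);
}
```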

[Bug preprocessor/109704] New: #pragma {push,pop}_macro broken for identifiers that contain dollar signs at nonfirst positions

2023-05-02 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109704

Bug ID: 109704
   Summary: #pragma {push,pop}_macro broken for identifiers that
contain dollar signs at nonfirst positions
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: preprocessor
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

The following dollar-sign-free example compiles fine, as expected:

#define MACRO 1
_Static_assert(MACRO,"");
#pragma push_macro("MACRO")
#undef MACRO
#define MACRO 0
_Static_assert(!MACRO,"");
#pragma pop_macro("MACRO")
_Static_assert(MACRO,""); //OK


Substituting $MACRO for MACRO still works, but with MACRO$ or M$CRO the final
assertions fail: https://godbolt.org/z/n1EoGao74

[Bug tree-optimization/93265] memcmp comparisons of structs wrapping a primitive type not as compact/efficient as direct comparisons of the underlying primitive type under -Os

2023-05-01 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93265

--- Comment #3 from Petr Skocik  ---
Here's another example (which may summarize it more nicely):

struct a{ char _[4]; };
#include <string.h>
int cmp(struct a A, struct a B){ return !!memcmp(&A,&B,4); }

Expected x86-64 codegen (✓ for gcc -O2/-O3 and for clang -Os/-O2/-O3)   
xor eax, eax
cmp edi, esi
setne   al
ret

gcc -Os codegen:
subq    $24, %rsp
movl    $4, %edx
movl    %edi, 12(%rsp)
leaq    12(%rsp), %rdi
movl    %esi, 8(%rsp)
leaq    8(%rsp), %rsi
call    memcmp
testl   %eax, %eax
setne   %al
addq    $24, %rsp
movzbl  %al, %eax
ret

https://godbolt.org/z/G5eE5GYv4
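One possible source-level workaround until -Os catches up (a sketch, not something the report proposes; function names are hypothetical): copy both 4-byte structs into `uint32_t` with `memcpy` and compare the integers. Fixed-size 4-byte `memcpy` calls are normally folded into plain register moves.

```c
#include <stdint.h>
#include <string.h>

struct a { char _[4]; };
_Static_assert(sizeof(struct a) == 4, "struct a must be exactly 4 bytes");

/* As in the report: compare via memcmp. */
int cmpMemcmp(struct a A, struct a B) { return !!memcmp(&A, &B, 4); }

/* Workaround sketch: compare the four bytes as one 32-bit integer. */
int cmpU32(struct a A, struct a B) {
    uint32_t X, Y;
    memcpy(&X, &A, sizeof X);
    memcpy(&Y, &B, sizeof Y);
    return X != Y;
}
```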

[Bug c/94379] Feature request: like clang, support __attribute((__warn_unused_result__)) on structs, unions, and enums

2023-04-23 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94379

--- Comment #2 from Petr Skocik  ---
Excellent!

For optional super extra coolness, this might also work (clang doesn't do
this) with statement expressions, so that statement-expression-based macros
could be marked warn_unused_result through it too.


typedef struct  __attribute((__warn_unused_result__)) { int x; } 
wur_retval_t;

wur_retval_t foo(void){ int x=41; return (wur_retval_t){x+1}; }
#define foo_macro() ({  int x=41; (wur_retval_t){x+1}; })

void use(void){
foo();  //warn unused result ✓
foo_macro(); //perhaps should "warn unused result" too?
}

[Bug c/109567] New: Useless stack adjustment by 16 around calls with odd stack-argument counts on SysV x86_64

2023-04-20 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109567

Bug ID: 109567
   Summary: Useless stack adjustment by 16 around calls with odd
stack-argument counts on SysV x86_64
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

For function calls with odd stack argument counts, gcc generates a useless `sub
$16, %rsp` at the beginning of the calling function.

Example (https://godbolt.org/z/Y4ErE8ee9):
#include <stdio.h>
int callprintf_0stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0),0; }
int callprintf_1stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0, 1),0; } //useless sub $0x10,%rsp
int callprintf_2stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0, 1,2),0; }
int callprintf_3stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0, 1,2,3),0; } //useless sub $0x10,%rsp
int callprintf_4stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0, 1,2,3,4),0; }
int callprintf_5stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0, 1,2,3,4,5),0; } //useless sub $0x10,%rsp
int callprintf_6stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0, 1,2,3,4,5,6),0; }
int callprintf_7stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0, 1,2,3,4,5,6,7),0; } //useless sub $0x10,%rsp
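The arithmetic behind "useless": on entry %rsp is 8 modulo 16 (the caller's call pushed a return address), each stack argument takes 8 bytes, and %rsp must be 16-byte aligned at the call. A small model of the padding a function with no locals or saved registers would need (the function name is hypothetical):

```c
/* Padding (in bytes) a SysV x86-64 function with no locals or saved registers
   needs below its entry %rsp before pushing NStackArgs 8-byte arguments, so
   that %rsp is 16-byte aligned at its own call instruction. */
unsigned callPadding(unsigned NStackArgs) {
    /* entry %rsp == 8 (mod 16); with P bytes of padding plus the pushes:
       8 - P - 8*N == 0 (mod 16)  =>  P == (8 + 8*N) mod 16 */
    return (8u + 8u * NStackArgs) % 16u;
}
```

Under this model an odd count needs no padding at all and an even count needs 8 bytes, so the extra 16 bytes gcc reserves in the odd cases buy nothing.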

[Bug middle-end/108799] Improper deprecation diagnostic for rsp clobber

2023-04-01 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108799

Petr Skocik  changed:

   What|Removed |Added

 CC||pskocik at gmail dot com

--- Comment #3 from Petr Skocik  ---
Very good question. The deprecation of SP clobbers could use some explanation
if there are indeed good reasons for it. 

IMO, if listing the SP as a clobber both (1) forces a frame pointer with
frame-pointer-relative addressing of spills (and the frame pointer isn't
clobbered too) and (2) avoids the use of the red zone (and it absolutely should
continue to do both of these things in my opinion) then gcc shouldn't need to
care about redzone clobbers (as in the `pushf;pop` example) or even a wide
class of stack pointer changes (assembly-made stack allocation and frees) just
as long as no spills made by the compiler are clobbered (or opened to being
clobbered from signal handlers) by such head-of-the-stack manipulation. Even
with assembly-less standard C that uses VLAs or allocas, gcc cannot count on
being in control of the stack pointer anyway, so why be so fussy about it when
something as expert-oriented as inline assembly tries to manipulate it?

[Bug c/108194] GCC won't treat two compatible function types as compatible if any of them (or both of them) is declared _Noreturn

2022-12-22 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108194

--- Comment #6 from Petr Skocik  ---
(In reply to Petr Skocik from comment #5)
> (In reply to Andrew Pinski from comment #4)
> > Invalid as mentioned in r13-3135-gfa258f6894801a .
> 
> I believe it's still a bug for pre-c2x __typeof.
> While it is GCC's prerogative to include _Noreturn/__attribute((noreturn))
> into the type for its own __typeof (which, BTW, I think is better design
> than the standardized semantics), I think two otherwise compatible function
> types should still remain compatible if they both either have or don't have
> _Noreturn/__attribute((noreturn)). But treating `_Noreturn void
> NR_FN_A(void);` 
> as INcompatible with `_Noreturn void NR_FN_B(void);` that's just wonky, IMO.

OK, the bug was MINE after all.

For bug report archeologists: I was doing what was meant to be a full
(qualifier-including) type comparison wrong. While something like
_Generic((__typeof(type0)*)0, __typeof(type1)*:1, default:0) suffices to get
around _Generic dropping qualifiers (const/volatile/_Atomic) in its controlling
expression, for function pointer types at a single layer of pointer
indirection the _Noreturn attribute will still get dropped in the controlling
expression of _Generic (I guess that makes sense because function pointers are
much more closely related to functions than any other pointer type is related
to its target type), so another layer of pointer indirection is required, as in
`_Generic((__typeof(type0)**)0, __typeof(type1)**:1, default:0)`.

Thank you all very much, especially jos...@codesourcery.com, who pointed me
(pun intended) to the right solution over email. :)
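The comparison idiom described above, written as a macro (the name `SAME_TYPE` is hypothetical). Wrapping both types in two pointer levels keeps `_Generic`'s lvalue conversion of the controlling expression from stripping anything from the compared types; since whether a noreturn marking is part of a function type is compiler-specific, the checks below stick to qualifiers.

```c
/* 1 if T0 and T1 are the same type including qualifiers, else 0.
   Relies on the GCC/clang __typeof extension, which accepts type names. */
#define SAME_TYPE(T0, T1) \
    _Generic((__typeof(T0) **)0, __typeof(T1) **: 1, default: 0)
```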

[Bug c/108194] GCC won't treat two compatible function types as compatible if any of them (or both of them) is declared _Noreturn

2022-12-21 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108194

Petr Skocik  changed:

   What|Removed |Added

 Resolution|INVALID |FIXED

--- Comment #5 from Petr Skocik  ---
(In reply to Andrew Pinski from comment #4)
> Invalid as mentioned in r13-3135-gfa258f6894801a .

I believe it's still a bug for pre-c2x __typeof.
While it is GCC's prerogative to include _Noreturn/__attribute((noreturn)) into
the type for its own __typeof (which, BTW, I think is better design than the
standardized semantics), I think two otherwise compatible function types should
still remain compatible if they both either have or don't have
_Noreturn/__attribute((noreturn)). But treating `_Noreturn void NR_FN_A(void);` 
as INcompatible with `_Noreturn void NR_FN_B(void);` that's just wonky, IMO.

[Bug c/108194] New: GCC won't treat two compatible function types as compatible if any of them (or both of them) is declared _Noreturn

2022-12-21 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108194

Bug ID: 108194
   Summary: GCC won't treat two compatible function types as
compatible if any of them (or both of them) is
declared _Noreturn
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

(same with __attribute((noreturn))) Example (https://godbolt.org/z/ePGd95sWz):


void FN_A(void);
void FN_B(void);
_Noreturn void NR_FN_A(void);
_Noreturn void NR_FN_B(void);

_Static_assert(_Generic((__typeof(*(FN_A))*){0}, __typeof(*(FN_B))*: 1), "");
//OK ✓
_Static_assert(_Generic((__typeof(*(NR_FN_A))*){0}, __typeof(*(NR_FN_B))*: 1),
""); //ERROR ✗
_Static_assert(_Generic((__typeof(*(FN_A))*){0}, __typeof(*(NR_FN_B))*: 1),
""); //ERROR ✗

As you can see from the Compiler Explorer link, clang accepts all three, which
is as it should be as per the standard, where _Noreturn is a function specifier
(https://port70.net/~nsz/c/c11/n1570.html#6.7.4), which means it shouldn't even
go into the type.

(Personally, I don't even mind it going into the type, just as long as two
otherwise identical _Noreturn function declarations are deemed to have the
same type).

Regards,
Petr Skocik

[Bug c/107831] Missed optimization: -fclash-stack-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page

2022-12-17 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831

--- Comment #9 from Petr Skocik  ---
Regarding the size of alloca/VLA-generated code under -fstack-clash-protection.
I've played with this a little bit and while I love the feature, the code size
increases seem quite significant and unnecessarily so.

Take a simple

void ALLOCA_C(size_t Sz){ char buf[Sz]; asm volatile ("" : : "r"(&buf[0])); }

gcc -fno-stack-clash-protection: 17 bytes
gcc -fstack-clash-protection: 72 bytes

clang manages with less of an increase:

-fno-stack-clash-protection: 26 bytes
-fstack-clash-protection: 45 bytes

Still, this could be as low as 11 bytes for the -fstack-clash-protection
version (less than for the unprotected one!), simply by using a call to an
assembly function whose code can be made no-clobber without much extra effort.

Linked in compiler explorer is a crack at the idea along with benchmarks: 
https://godbolt.org/z/f8rhG1ozs

The performance impact of the call seems negligible (practically less than 1ns,
though in the above quick-and-dirty benchmark it fluctuates a tiny bit,
sometimes even giving the non-inline version an edge).

I originally suggested popping the return address off the stack and repushing
it before returning. I ended up just repushing it; the old return address
becomes part of the alloca allocation. The concern that this could mess up the
return stack buffer of the CPU seems valid, but all the benchmarks indicate it
doesn't (not even when the return address is popped) as long as the return
target address is the same.

(When it isn't, the performance penalty is rather significant: I measured a 19x
slowdown in that case for comparison; it's also in the linked benchmarks.)

The (x86-64) assembly function:
#define STR(...) STR__(__VA_ARGS__) //{{{
#define STR__(...) #__VA_ARGS__ //}}}
asm(STR(
.global safeAllocaAsm;
safeAllocaAsm: //no clobber, though does expect 16-byte aligned at entry as usual
push %r10;
cmp $16, %rdi;
ja .LsafeAllocaAsm__test32;
push 8(%rsp);
ret;
.LsafeAllocaAsm__test32:
push %r10;
push %rdi;
mov %rsp, %r10;
sub $17, %rdi;
and $-16, %rdi; //(-32+15)&(-16) //subtract the 32 and 16-align, rounding up
jnz .LsafeAllocaAsm__probes;
.LsafeAllocaAsm__ret:
lea (3*8)(%r10,%rdi,1), %rdi;
push (%rdi);
mov -8(%rdi), %r10;
mov -16(%rdi), %rdi;
ret;
.LsafeAllocaAsm__probes:
sub %rdi, %r10;  //r10 is the desired rsp
.LsafeAllocaAsm__probedPastDesiredSpEh:
cmp %rsp, %r10; jge .LsafeAllocaAsm__pastDesiredSp;
orl $0x0,(%rsp);
sub $0x1000,%rsp;
jmp .LsafeAllocaAsm__probedPastDesiredSpEh;
.LsafeAllocaAsm__pastDesiredSp:
mov %r10, %rsp; //set the desired sp
jmp .LsafeAllocaAsm__ret;
.size safeAllocaAsm, .-safeAllocaAsm;
));
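To make the probing part easier to follow, here is a C model of the `.LsafeAllocaAsm__probes` loop only (the function name and the integer-address framing are illustrative; nothing is actually touched): while the stack pointer is still above the desired one, probe the current top of stack and step down a page, then set the stack pointer to the target.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns how many probes the loop issues when moving the stack pointer from
   Sp down to DesiredSp with the given page size. Assumes Sp >= Page so the
   subtraction cannot wrap (true for real stack addresses). */
size_t probeCount(uintptr_t Sp, uintptr_t DesiredSp, size_t Page) {
    size_t Probes = 0;
    while (Sp > DesiredSp) { /* cmp %rsp, %r10; jge .LsafeAllocaAsm__pastDesiredSp */
        Probes++;            /* orl $0x0, (%rsp): touch the current top of stack */
        Sp -= Page;          /* sub $0x1000, %rsp */
    }
    return Probes;           /* afterwards, mov %r10, %rsp sets the exact target */
}
```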

Cheers, 
Petr Skocik

[Bug c/107831] Missed optimization: -fclash-stack-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page

2022-11-24 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831

--- Comment #7 from Petr Skocik  ---
(In reply to Jakub Jelinek from comment #4)
> Say for
> void bar (char *);
> void
> foo (int x, int y)
> {
>   __attribute__((assume (x < 64)));
>   for (int i = 0; i < y; ++i)
> bar (__builtin_alloca (x));
> }
> all the alloca calls are known to be small, yet they can quickly cross pages.
> Similarly:
> void
> baz (int x)
> {
>   if (x >= 512) __builtin_unreachable ();
>   char a[x];
>   bar (a);
>   char b[x];
>   bar (b);
>   char c[x];
>   bar (c);
>   char d[x];
>   bar (d);
>   char e[x];
>   bar (e);
>   char f[x];
>   bar (f);
>   char g[x];
>   bar (g);
>   char h[x];
>   bar (h);
>   char i[x];
>   bar (i);
>   char j[x];
>   bar (j);
> }
> All the VLAs here are small, yet together they can cross a page.
> So, we'd need to punt for dynamic allocations in loops and for others
> estimate
> the maximum size of all the allocations together (+ __builtin_alloca
> overhead + normal frame size).

I think this shouldn't need probes either (unless you tried to coalesce the
allocations) on architectures where making a function call touches the stack.
Also alloca's of less than or equal to half a page intertwined with writes
anywhere to the allocated blocks should be always safe (but I guess I'll just
turn stack-clash-protection off in the one file where I'm making such clearly
safe dynamic stack allocations).

[Bug c/107831] Missed optimization: -fclash-stack-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page

2022-11-23 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831

--- Comment #6 from Petr Skocik  ---
(In reply to Jakub Jelinek from comment #2)
> (In reply to Petr Skocik from comment #1)
> > Sidenote regarding the stack-allocating code for cases when the size is not
> > known to be less than pagesize: the code generated for those cases is quite
> > large. It could be replaced (at least under -Os) with a call to a special
> > assembly function that'd pop the return address (assuming the target machine
> > pushes return addresses to the stack), allocate adjust and allocate the
> > stack size in a piecemeal fashion so as to not skip guard pages, the repush
> > the return address and return to caller with the stacksize expanded.
> 
> You certainly don't want to kill the return stack the CPU has, even if it
> results in a few saved bytes for -Os.

That's a very interesting point, because I have written x86_64 assembly
"functions" that did pop the return address, pushed something to the stack,
and then repushed the return address and returned. In a loop, it doesn't seem
to perform badly compared to inline code, so I figure it shouldn't be messing
with the return stack buffer. After all, even though the return happens through
a different place in the call stack, it's still returning to the original
caller. The one time I absolutely must have accidentally messed with the return
stack buffer was when I wrote a context-switching routine and originally tried
to "ret" to the new context. It turned out to be very measurably, many times
slower than `pop %rcx; jmp *%rcx;` (also measured in a loop), so that's why I
think popping a return address, allocating on the stack, and then pushing and
returning is not really a performance killer (on my Intel CPU anyway). If it
were messing with the return stack buffer, I think I would be getting similar
slowdowns to what I got with the context-switching code trying to `ret`.

[Bug c/107831] Missed optimization: -fclash-stack-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page

2022-11-23 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831

--- Comment #1 from Petr Skocik  ---
Sidenote regarding the stack-allocating code for cases when the size is not
known to be less than the page size: the code generated for those cases is
quite large. It could be replaced (at least under -Os) with a call to a special
assembly function that'd pop the return address (assuming the target machine
pushes return addresses to the stack), adjust and allocate the stack in a
piecemeal fashion so as not to skip guard pages, then repush the return address
and return to the caller with the stack expanded.

[Bug c/107831] New: Missed optimization: -fclash-stack-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page

2022-11-23 pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831

Bug ID: 107831
   Summary: Missed optimization: -fclash-stack-protection causes
unnecessary code generation for dynamic stack
allocations that are clearly less than a page
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

I'm talking about allocations such as

char buf [ (uint8_t)size ];

The resulting code for this should ideally be the same with or without
-fstack-clash-protection, as such an allocation can clearly never skip a whole page.

But gcc generates a big loop that tries to touch every page-sized subpart of that
allocation.

https://godbolt.org/z/G8EbzbshK

[Bug c/106116] New: Missed optimization: in no_reorder-attributed functions, tail calls to the subsequent function could just be function-to-function fallthrough

2022-06-28 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106116

Bug ID: 106116
   Summary: Missed optimization: in no_reorder-attributed
functions, tail calls to the subsequent function could
just be function-to-function fallthrough
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

Example:

__attribute((noinline,no_reorder))
int fnWithExplicitArg(int ExplicitArg);

__attribute((noinline,no_reorder))
int fnWithDefaultArg(void){ return fnWithExplicitArg(42); }

int fnWithExplicitArg(int ExplicitArg){
int useArg(int);
return 12+useArg(ExplicitArg);
}

Generated fnWithDefaultArg:

fnWithDefaultArg:
mov edi, 42
jmp fnWithExplicitArg
fnWithExplicitArg:
//...

Desired fnWithDefaultArg:

fnWithDefaultArg:
mov edi, 42
//fallthru
fnWithExplicitArg:
//...

https://gcc.godbolt.org/z/Ph3onxoh9

[Bug target/85927] ud2 instruction generated starting with gcc 8

2021-12-31 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85927

Petr Skocik  changed:

   What|Removed |Added

 CC||pskocik at gmail dot com

--- Comment #5 from Petr Skocik  ---
I think it would be more welcome if gcc just emitted nothing there, like clang does.

[Bug c/102096] New: Gcc unnecessarily initializes indeterminate variables passed across function boundaries

2021-08-27 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102096

Bug ID: 102096
   Summary: Gcc unnecessarily initializes indeterminate variables
passed across function boundaries
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

Compared to clang, where:

long ret_unspec(void){ auto long rv; return rv; }

void take6(long,long,long,long,long,long);

void call_take6(void)
{
//6 unnecessary XORs on GCC
auto long id0; //indeterminate
auto long id1; //indeterminate
auto long id2; //indeterminate
auto long id3; //indeterminate
auto long id4; //indeterminate
auto long id5; //indeterminate
take6(id0,id1,id2,id3,id4,id5);
}

yields (x86_64):

ret_unspec: # @ret_unspec
retq
call_take6: # @call_take6
jmp take6

(1+5 bytes), GCC compiles the above to:

ret_unspec:
xorl %eax, %eax
ret
call_take6:
xorl %r9d, %r9d
xorl %r8d, %r8d
xorl %ecx, %ecx
xorl %edx, %edx
xorl %esi, %esi
xorl %edi, %edi
jmp take6

(3+19 bytes), unnecessarily zero-initializing the indeterminate
return value/arguments.

Casting the called function to a different type can often be (hackishly) used
to get the same assembly, but doing so is technically UB and is not as general
as supporting the passing of unspecified arguments/return values, which would
allow omitting argument-register initializations not just for arguments at the
end of an argument pack but also for ones in the middle.

TL;DR: Allowing indeterminate variables to be passed/returned without generating
initialization code for them would be nice. Clang already does it.

[Bug c/98418] Valid integer constant expressions based on expressions that trigger -Wshift-overflow are treated as non-constants

2020-12-24 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98418

pskocik at gmail dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #3 from pskocik at gmail dot com ---
You're right. The bug was in my code.

struct foo { unsigned bit: (0xll<<40)!=0; };

is indeed UB due to http://port70.net/~nsz/c/c11/n1570.html#6.5.7p4, but

struct foo { unsigned bit: (0xull<<40)!=0; };

isn't and GCC accepts it without complaint.

Apologies for the false alarm.
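The distinction can be illustrated with a concrete constant. Assumption: `0xFFFFFF` stands in for the constant that is garbled in the report above. Shifting it left by 40 needs all 64 bits, so with a signed `long long` operand the result is not representable and the shift is UB, whereas the `unsigned long long` shift is well defined and remains an integer constant expression.

```c
/* Assumption: 0xFFFFFF stands in for the constant garbled in the report.
   0xFFFFFFll << 40 would overflow signed long long (C11 6.5.7p4), but the
   unsigned long long shift below is well defined. */
#define MASK_U 0xFFFFFFull

_Static_assert((MASK_U << 40) == 0xFFFFFF0000000000ull,
               "the unsigned shift keeps every bit");

/* Both uses below require an integer constant expression. */
struct foo { unsigned bit : (MASK_U << 40) != 0; };  /* 1-bit bit-field */
int bar[(MASK_U << 40) != 0];                        /* one element */
```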

[Bug c/98418] Valid integer constant expressions based on expressions that trigger -Wshift-overflow are treated as non-constants

2020-12-24 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98418

--- Comment #2 from pskocik at gmail dot com ---
You're right. The bug was in my code.

struct foo { unsigned bit: (0xll<<40)!=0; };

is indeed UB due to http://port70.net/~nsz/c/c11/n1570.html#6.5.7p4, but

struct foo { unsigned bit: (0xull<<40)!=0; };

isn't and GCC accepts it without complaint.

Apologies for the false alarm.

[Bug c/98418] New: Valid integer constant expressions based on expressions that trigger -Wshift-overflow are treated as non-constants

2020-12-22 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98418

Bug ID: 98418
   Summary: Valid integer constant expressions based on
expressions that trigger -Wshift-overflow are treated
as non-constants
   Product: gcc
   Version: 6.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

This causes things like:

struct foo { unsigned bit: (0xll<<40)!=0; };

to elicit a -pedantic warning about the bitfield width not being a proper
integer constant expression, even though it is.

In other contexts, an outright compilation error might ensue:

extern int bar[ (0xll<<40)!=0 ]; //seen as an invalid VLA


https://gcc.godbolt.org/z/7zfz96

Neither clang nor gcc <= 5 appear to have this bug.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93241 seems related.

[Bug c/96625] New: Unnecessarily large assembly generated when a bit-offsetted higher-end end of a uint64_t-backed bitfield is shifted toward the high end (left) by its bit-offset

2020-08-15 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96625

Bug ID: 96625
   Summary: Unnecessarily large assembly generated when a
bit-offsetted higher-end end of a uint64_t-backed
bitfield is shifted toward the high end (left) by its
bit-offset
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

(Bitfields backed by 32-bit unsigneds are handled well.)

My example (https://gcc.godbolt.org/z/Yac38T):

#include <stdint.h>
#define FRONTSZ 3
#define UTYPE uint64_t
struct s{ union {
UTYPE whole;
struct {
UTYPE front:FRONTSZ,
  tail:8*sizeof(UTYPE)-FRONTSZ; };
};};

UTYPE hiShifted_tail(struct s X) { return X.tail<>FRONTSZ< (14 bytes):
   0:   48 b8 f8 ff ff ff ff ff ff 1f   movabs rax,0x1ffffffffffffff8
   a:   48 21 f8                        and    rax,rdi
   d:   c3                              ret


  (8 bytes):
   0:   48 89 f8                        mov    rax,rdi
   3:   48 83 e0 f8                     and    rax,0xfffffffffffffff8
   7:   c3                              ret

The codegen follows the same pattern for other front sizes.
hiShifted_tail() on clang (regardless of whether uint64_t or uint32_t is used
as the backing type), and on gcc with uint32_t rather than uint64_t as the
bitfield-backing type, follows the smaller codegen pattern of
hiShifted_tail{2,3}.

[Bug c/96420] New: -Wsign-extensions warnings are generated from system header macros

2020-08-02 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96420

Bug ID: 96420
   Summary: -Wsign-extensions warnings are generated from system
header macros
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

GCC doesn't silence -Wsign-conversion warnings in the expansion of
system-header macros (e.g., in the expansion of Musl's/Cygwin's FD_SET), unlike
other warnings in system-header macros.

E.g.,

#include <sys/select.h>
void f(int X)
{
fd_set set;
FD_ZERO(&set);
FD_SET(X,&set);
FD_CLR(X+1,&set);
(void)FD_ISSET(X+2,&set);
}

generates -Wsign-conversion warnings when compiled with musl-gcc or with gcc on
Cygwin.

Arguably, this should be fixed in the respective c libs, but the treatment of
-Wsign-conversion in system-header macro expansion does seem inconsistent with
that of other warnings in that context.
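Until that is resolved, one user-side workaround is to scope the suppression to the macro uses with GCC's diagnostic pragmas. This is a sketch: `mark_fd` is a hypothetical function, and whether any warning appears in the first place depends on the C library's FD_* definitions. The fd must be in the range `[0, FD_SETSIZE)`.

```c
#include <sys/select.h>

/* Suppress -Wsign-conversion only around the FD_* macro uses. */
int mark_fd(int fd)
{
    fd_set set;
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wsign-conversion"
    FD_ZERO(&set);
    FD_SET(fd, &set);
    int is_set = FD_ISSET(fd, &set) != 0;
    FD_CLR(fd, &set);
    int cleared = FD_ISSET(fd, &set) == 0;
#pragma GCC diagnostic pop
    return is_set && cleared;
}
```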

[Bug c/95857] New: Silencing an unused label warning with (void)& can make gcc segfault

2020-06-24 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95857

Bug ID: 95857
   Summary: Silencing an unused label warning with (void)&
can make gcc segfault
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

Created attachment 48777
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48777=edit
preprocessed reproducer that crashes gcc >= 8.1 at -O2/-O3/-Os

In certain more complex contexts and with optimization on (>= -O2), silencing
-Wunused-label warnings with
(void)&&label will make gcc segfault.

The attached example (https://gcc.godbolt.org/z/iEhgL2), obtained with creduce,
crashes gcc >= 8.1 when compiled at -O2/-O3/-Os.

I haven't observed the bug in older versions of gcc.
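For reference, the idiom itself looks like this in a trivial function. This is a GNU C sketch with a hypothetical function name, not the crashing reproducer (which is in the attachment): taking a label's address with the `&&` operator counts as a use of the label, and the `(void)` cast discards the resulting pointer.

```c
/* GNU C: &&label yields the label's address and counts as a use of the
   label, so this silences -Wunused-label. */
int label_idiom(int x)
{
unused_label:
    (void)&&unused_label;
    return x + 1;
}
```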

[Bug c/95126] New: Missed opportunity to turn static variables into immediates

2020-05-14 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95126

Bug ID: 95126
   Summary: Missed opportunity to turn static variables into
immediates
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

Example:

For:

struct small{ short a,b; signed char c; };

void call_func(void)
{
extern int func(struct small X);
static struct small const s = { 1,2,0 };
func(s);
}

clang renders (x86_64):

 :
   0:   bf 01 00 02 00          mov    edi,0x20001
   5:   e9 00 00 00 00          jmp    a
                        6: R_X86_64_PLT32       func-0x4

whereas gcc renders:

 :
   0:   0f b7 3d 00 00 00 00    movzx  edi,WORD PTR [rip+0x0]        # 7
                        3: R_X86_64_PC32        .rodata-0x2
   7:   0f b7 05 00 00 00 00    movzx  eax,WORD PTR [rip+0x0]        # e
                        a: R_X86_64_PC32        .rodata-0x4
   e:   48 c1 e7 10             shl    rdi,0x10
  12:   48 09 f8                or     rax,rdi
  15:   0f b7 3d 00 00 00 00    movzx  edi,WORD PTR [rip+0x0]        # 1c
                        18: R_X86_64_PC32       .rodata
  1c:   48 c1 e7 20             shl    rdi,0x20
  20:   48 09 c7                or     rdi,rax
  23:   e9 00 00 00 00          jmp    28
                        24: R_X86_64_PLT32      func-0x4


https://gcc.godbolt.org/z/Qxq6Rh
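The immediate clang uses can be reproduced by packing the struct's bytes into a 64-bit integer. This sketch assumes an x86_64-style little-endian layout (a in bytes 0-1, b in bytes 2-3, c in byte 4); `small_as_register` is a hypothetical name. Only the bytes up to and including `c` are copied, so the trailing padding byte does not affect the result.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct small { short a, b; signed char c; };

/* Rebuild the value clang loads into %edi for func(s). */
uint64_t small_as_register(void)
{
    static const struct small s = { 1, 2, 0 };
    uint64_t r = 0;
    memcpy(&r, &s, offsetof(struct small, c) + sizeof s.c);
    return r;   /* bytes 01 00 02 00 00 -> 0x20001 on little-endian */
}
```

Because every byte of `s` is known at compile time, nothing prevents the compiler from emitting this value as a single `mov edi,0x20001`.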

[Bug middle-end/94703] Small-sized memcpy leading to unnecessary register spillage unless done through a dummy union

2020-05-13 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94703

--- Comment #11 from pskocik at gmail dot com ---
Thanks for the shot at a fix, Richard Biener.

Since I reported this, I think I should mention a related suboptimality
that should probably be fixed alongside it (if this one is
getting fixed), namely that while


int64_t zextend_int_to_int64_nospill(int *X)
{
union { int64_t _; } r = {0}; return memcpy(&r._,X,sizeof(*X)),r._;
}

(and hopefully later even

int64_t zextend_int_to_int64_spill(int *X) { int64_t r = {0}; return
memcpy(&r,X,sizeof(*X)),r; }
)

generates, on x86_64, the optimal

zextend_int_to_int64_nospill:
mov eax, DWORD PTR [rdi]
ret

for zero-extending promotions of sub-int types, an extra xor instruction gets
generated, e.g.:


int64_t zextend_short_to_int64_nospill_but_suboptimal(short *X) 
{
union { int64_t _; } r = {0}; return memcpy(&r._,X,sizeof(*X)),r._;
}

=>

zextend_short_to_int64_nospill_but_suboptimal:
xor eax, eax
mov ax, WORD PTR [rdi]
ret

which was surprising to me because it doesn't happen with zero-extending
memcpy-based promotion from {,u}ints to larger types ({,u}{,l}longs).

https://gcc.godbolt.org/z/ZjXaCw

[Bug c/94703] New: Small-sized memcpy leading to unnecessary register spillage unless done through a dummy union

2020-04-21 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94703

Bug ID: 94703
   Summary: Small-sized  memcpy leading to unnecessary register
spillage unless done through a dummy union
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

The problem, demonstrated in code examples below, can be suppressed by
memcpying into a union (possibly just a one-member union), but that seems like
a silly workaround that shouldn't be required.

Examples:

#include <stdint.h>
#include <string.h>

uint64_t get4_1(void const *X)
{
//spills
uint64_t r = 0; memcpy(&r,X,4); return r;
}

uint64_t get4_nospill(void const *X)
{
//doesn't spill
union { uint64_t u64; } u = {0};
memcpy(&u,X,sizeof(uint32_t));
return u.u64;
}

uint64_t get2_1(void const *X)
{
//spills
uint64_t r = 0; memcpy(&r,X,2); return r;
}


uint64_t get2_nospill(void const *X)
{
//doesn't spill
union { uint64_t u64; } u = {0};
memcpy(&u,X,sizeof(uint16_t));
return u.u64;
}

void backend(void const*Src, size_t Sz);
static inline void valInPtrInl(void *Src, size_t Sz)
{
if(Sz<=sizeof(void const*)){
#if 1 //spills
void const*inlSrc; 
memcpy(&inlSrc,Src,Sz);
backend(inlSrc,Sz); return;
#else
//doesn't spill
union{ void const*inlSrc; } u;
memcpy(&u,Src,Sz);
backend(u.inlSrc,Sz); return;
#endif
}

backend(Src,Sz);
return;

}
void valInPtr(int X) { valInPtrInl(&X,sizeof(X)); }

GCC 9.3 output on x86_64:

get4_1:
mov QWORD PTR [rsp-8], 0
mov eax, DWORD PTR [rdi]
mov DWORD PTR [rsp-8], eax
mov rax, QWORD PTR [rsp-8]
ret
get4_nospill:
mov eax, DWORD PTR [rdi]
ret
get2_1:
mov QWORD PTR [rsp-8], 0
movzx   eax, WORD PTR [rdi]
mov WORD PTR [rsp-8], ax
mov rax, QWORD PTR [rsp-8]
ret
get2_nospill:
xor eax, eax
mov ax, WORD PTR [rdi]
ret
valInPtr:
mov DWORD PTR [rsp-16], edi
mov rdi, QWORD PTR [rsp-16]
mov esi, 4
jmp backend

Clang 3.1 output on x86_64:

get4_1: # @get4_1
mov EAX, DWORD PTR [RDI]
ret

get4_nospill:   # @get4_nospill
mov EAX, DWORD PTR [RDI]
ret

get2_1: # @get2_1
movzx   EAX, WORD PTR [RDI]
ret

get2_nospill:   # @get2_nospill
movzx   EAX, WORD PTR [RDI]
ret

valInPtr:   # @valInPtr
mov EDI, EDI
mov ESI, 4
jmp backend # TAILCALL


https://gcc.godbolt.org/z/rwq2UY
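A related idiom that sidesteps the partial-object memcpy altogether is to copy into an exactly-sized temporary and let the ordinary integer conversion perform the zero-extension. This is an alternative sketch, not something from the report, and the resulting gcc codegen is not verified here. The function names are hypothetical. Unlike the copy-into-the-low-bytes variants above, it does not depend on byte order.

```c
#include <stdint.h>
#include <string.h>

/* Copy exactly as many bytes as the temporary holds, then widen. */
uint64_t get4_widen(void const *X)
{
    uint32_t t;
    memcpy(&t, X, sizeof t);
    return t;   /* zero-extended to 64 bits by the conversion */
}

uint64_t get2_widen(void const *X)
{
    uint16_t t;
    memcpy(&t, X, sizeof t);
    return t;
}
```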

[Bug middle-end/93487] Missed tail-call optimizations

2020-04-20 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93487

--- Comment #3 from pskocik at gmail dot com ---
The gist of this, along with https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93540,
is "please make trivial aggregates (i.e., aggregates that ultimately wrap a
single native type) a true zero-cost abstraction".

I feel like we shouldn't have to pay for `struct Int { int _; };` (or a union
of int with some types no larger than int) over `int`, but on gcc (in contrast
with clang), you effectively have to:

/
int intfunc(void);
int intfuncwrap(void) { return intfunc(); }

=>
jmp5 

/
struct Int { int x; };
struct Int intfuncwrap2(void) { return (struct Int){intfunc()}; }

=>
push   rax
call   6 
poprdx
ret
/

Clang has been doing this right since clang 3 (and Compiler Explorer doesn't
have an older version):  https://gcc.godbolt.org/z/VSUHs_ .

Here's a related, but opposite, example where a trivial (one-member) union gets
optimized better than its contained type when used directly: 
https://gcc.godbolt.org/z/egXRjJ . 

These trivial type wrappings shouldn't affect codegen positively or negatively,
but they do on gcc.

[Bug c/66425] (void) cast doesn't suppress __attribute__((warn_unused_result))

2020-03-28 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66425

pskocik at gmail dot com changed:

   What|Removed |Added

 CC||pskocik at gmail dot com

--- Comment #38 from pskocik at gmail dot com ---
I like this behavior. I use (void) casts to suppress warnings about unused
parameters and variables, but I'd rather suppressing WUR weren't that simple,
because for functions whose return value represents an allocated resource
(allocated memory, a FILE, a file descriptor, etc.), such a suppression
is erroneous in 99% of cases.

Of course, WUR is also useful as an aid in enforcing consistent error checking,
but a codebase using WUR like that might as well define a custom IGNORE macro
(which assigns the result to a properly typed temporary and then voids it) and
make sure such a macro only works on return values which are truly safe to
ignore (e.g., rather than returning plain int, long, etc., you might return
struct ignorable_int { int ignorable_retval; };, struct ignorable_long { long
ignorable_retval; }, etc. and have your IGNORE macro try to access the
specifically named member).

(An ability to directly attach WUR to such types, which clang has but gcc
currently doesn't (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94379), would
also go nicely with this un-void-able WUR feature (although WURs are void-able on
clang)).
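The IGNORE convention described above might look like the following sketch. All names are hypothetical, and it relies on GNU C statement expressions and `__typeof__`, so it assumes gcc or clang. Because the macro names the member `ignorable_retval`, applying it to a call returning plain `int` fails to compile.

```c
/* Results that are safe to drop are wrapped in a dedicated struct. */
struct ignorable_int { int ignorable_retval; };

/* Assign the result to a properly typed temporary, then void the member. */
#define IGNORE(Expr) __extension__({                  \
        __typeof__(Expr) ignore_tmp_ = (Expr);        \
        (void)ignore_tmp_.ignorable_retval;           \
    })

static int calls;

__attribute__((warn_unused_result))
struct ignorable_int bump(void)
{
    calls++;
    return (struct ignorable_int){ calls };
}

int run_twice(void)
{
    IGNORE(bump());   /* OK: bump() returns an ignorable wrapper */
    IGNORE(bump());
    return calls;
}
```

`__typeof__` does not evaluate its operand here, so each IGNORE calls `bump` exactly once.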

[Bug c/94379] New: Feature request: like clang, support __attribute((__warn_unused_result__)) on structs, unions, and enums

2020-03-28 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94379

Bug ID: 94379
   Summary: Feature request: like clang, support
__attribute((__warn_unused_result__)) on structs,
unions, and enums
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

Clang supports applying the warn_unused_result attribute to enums, structs, and
unions, which has the effect that functions returning such an attributed
enum/struct/union behave as if they themselves had the warn_unused_result attribute.


Example:

typedef struct __attribute__((__warn_unused_result__)) aStructType{ int x; }
aStructType;
aStructType getStruct(void);

typedef union __attribute__((__warn_unused_result__)) aUnionType{ int x; }
aUnionType;
aUnionType getUnion(void);

typedef enum __attribute__((__warn_unused_result__)) anEnumType{
anEnumarationConstant } anEnumType;
anEnumType getEnum(void);

int main()
{
getEnum();
getStruct();
getUnion();
}
// https://gcc.godbolt.org/z/jyHhLx

I find this to be a very useful feature, and it would be nice if gcc had it
(along with its current un-void-able warn_unused_result
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66425)).

[Bug tree-optimization/87313] attribute malloc not used for alias analysis when it could be

2020-02-16 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87313

pskocik at gmail dot com changed:

   What|Removed |Added

 CC||pskocik at gmail dot com

--- Comment #4 from pskocik at gmail dot com ---
If (when?) this optimization is implemented, it would also be great if
returning `type *restrict`, `struct somestruct { /*...*/ type *restrict p;
/*...*/ }`, or an equivalent of these via a pointer (e.g., as in `void
my_malloc(void *restrict*Result, size_t Sz);`) resulted in the same
optimization being applied (unless I'm mistaken in thinking that `restrict`
applied in these contexts implies the same (__attribute((malloc))-like) semantics).

[Bug c/93540] New: Attributes pure and const not working with aggregate return types, even trivial ones

2020-02-01 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93540

Bug ID: 93540
   Summary: Attributes pure and const not working with aggregate
return types, even trivial ones
   Product: gcc
   Version: 9.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

Example:

#define SIMPLE 0
#include <stdlib.h>
#if SIMPLE
typedef int TYPE;
#else
typedef struct TYPE { int a; } TYPE;
#endif

//__attribute((pure))
__attribute((const))
TYPE get(void);

void TEST(void)
{
#if !SIMPLE  // :( generates repeated calls
if(get().a==0) abort();
if(get().a==0) abort();
if(get().a==0) abort();
if(get().a==0) abort();
#else  //OK, 1 call
if(get()==0) abort();
if(get()==0) abort();
if(get()==0) abort();
if(get()==0) abort();
#endif
}
///

https://gcc.godbolt.org/z/N79MCx

[Bug c/93487] New: Missed tail-call optimizations

2020-01-29 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93487

Bug ID: 93487
   Summary: Missed tail-call optimizations
   Product: gcc
   Version: 9.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

Given, for example:

#include <stdint.h>
typedef union lp_tp {
long l;
void *p;
} lp_tp ;
typedef union ilp_tp{
int i;
long l;
void *p;
lp_tp lp;
} ilp_tp;

long lcallee(void);
int icallee(void);
void *pcallee(void);
lp_tp lpcallee(void);

//tail calls on clang but not on gcc
int l2i(void){ return lcallee(); }
ilp_tp l_caller(void) { long rc = lcallee(); return (ilp_tp){.l=rc}; }
ilp_tp p_caller(void) { void *rc= pcallee(); return (ilp_tp){.p=rc}; }
ilp_tp lp_caller(void) { lp_tp rc = lpcallee(); return (ilp_tp){.lp =
rc}; }
ilp_tp lp_caller2(void) { lp_tp rc = lpcallee(); return (ilp_tp){.p =
rc.p}; }

struct foo* p2p_caller(void) { return pcallee(); } //optimized on both
uintptr_t p2up_caller(void) { return (uintptr_t)pcallee(); }
//optimized on both

//not optimized by either
ilp_tp i_caller(void) { int rc = icallee(); ilp_tp r; r.i=rc; return r;
}

clang (x86_64) is able to turn all of these calls except the last one into tail
calls but gcc tailcall-optimizes only the pointer-to-pointer conversions.

https://gcc.godbolt.org/z/Lw9-D2

[Bug tree-optimization/93447] New: Value range propagation not working at -Os

2020-01-26 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93447

Bug ID: 93447
   Summary: Value range propagation not working at -Os
   Product: gcc
   Version: 9.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

I have a lot of cases where I'd like to translate boolean conditions to negated
errno codes (possibly wrapped in a struct or a trivial union).

If this translation is inlinable, I'd expect that undoing it to get a boolean
again (bool => negated errno => bool) would eliminate the roundtrip.

GCC indeed does this at -O2 and -O3, but the optimization fails to kick in
at -Os, leading to code-size increases.

Clang succeeds at eliminating the roundtrip at -Os (and it does this
optimization already at -O1).

A simple example that generates unnecessary code at -Os:

#include <errno.h>

_Bool addb_simple(int A, int B, int *Rp)
{
int ec=0;
if(__builtin_add_overflow(A,B,Rp)) 
ec = -ERANGE;
return !!ec;
}

https://gcc.godbolt.org/z/tGkbtD
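The roundtrip can be spelled out with two inlinable helpers (hypothetical names). Since `-ERANGE` is a nonzero constant, the composed function returns exactly the overflow flag, which is what the optimizer is expected to reduce it to.

```c
#include <errno.h>

/* bool -> negated errno -> bool, as two trivially inlinable steps. */
static inline int bool_to_negerrno(_Bool failed) { return failed ? -ERANGE : 0; }
static inline _Bool negerrno_to_bool(int ec) { return ec != 0; }

_Bool addb_roundtrip(int A, int B, int *Rp)
{
    return negerrno_to_bool(bool_to_negerrno(__builtin_add_overflow(A, B, Rp)));
}

/* What the roundtrip should fold to. */
_Bool addb_direct(int A, int B, int *Rp)
{
    return __builtin_add_overflow(A, B, Rp);
}
```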


Thanks for looking into it!

[Bug c/93441] New: _Generic selections ought to be treated as parenthesized expressions as far as -Wlogical-not-parentheses is concerned

2020-01-26 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93441

Bug ID: 93441
   Summary: _Generic selections ought to be treated as
parenthesized expressions as far as
-Wlogical-not-parentheses is concerned
   Product: gcc
   Version: 9.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

int x [ _Generic(0,int: !0) < 10 ]; //falsely triggers -Wlogical-not-parentheses
int y [ (_Generic(0,int: !0)) < 10 ]; //OK

[Bug middle-end/26724] __builtin_constant_p fails to recognise function with constant return

2020-01-22 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26724

pskocik at gmail dot com changed:

   What|Removed |Added

 CC||pskocik at gmail dot com

--- Comment #3 from pskocik at gmail dot com ---
I don't know if this is related, but what's been bugging me is that gcc's
__builtin_constant_p (clang's also, but in different situations) fails to
recognize the constness/non-constness of a `memcmp` call when it's used for
equality comparisons.

In such situations, I would like to use
__builtin_constant_p(!memcmp(...))?!memcmp(...):!my_memcmp(...) to call a
custom backend function (one not named `memcmp`) if real `memcmp` would be
called, but inline a constant otherwise.

Unfortunately, there seem to be edge cases where this doesn't work: while the
assembly for a !memcmp(...) expression shows it's been folded to a constant, a
__builtin_constant_p around such an expression doesn't reflect that.

E.g., in functions eq_eh{2,3}_cexprEh in 
https://gcc.godbolt.org/z/6oefRX (on clang the correspondence breaks in
eq_eh1_cexprEh).
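The dispatch pattern described above can be written as a macro. This is a sketch: `my_memcmp` and `MEM_EQ` are hypothetical names, and both branches must agree on the result, so the constant branch only changes how the answer is obtained. The arguments appear more than once in the expansion, so avoid side effects in them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical out-of-line backend, deliberately not named memcmp. */
int my_memcmp(void const *A, void const *B, size_t N)
{
    unsigned char const *a = A, *b = B;
    for (size_t i = 0; i < N; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}

/* Use the folded constant when the compiler knows it, else the backend. */
#define MEM_EQ(A, B, N)                               \
    (__builtin_constant_p(!memcmp((A), (B), (N)))     \
        ? !memcmp((A), (B), (N))                      \
        : !my_memcmp((A), (B), (N)))
```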

[Bug target/91298] $ at the beginging causing Error: junk `(%rip)' after expression

2020-01-22 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91298

--- Comment #7 from pskocik at gmail dot com ---
(In reply to CVS Commits from comment #6)
> The master branch has been updated by Jakub Jelinek :

Thank you for the fix!

[Bug target/91298] $ at the beginging causing Error: junk `(%rip)' after expression

2020-01-16 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91298

pskocik at gmail dot com changed:

   What|Removed |Added

 CC||pskocik at gmail dot com

--- Comment #4 from pskocik at gmail dot com ---
Related https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45591 .

I've played with it and this simple patch

diff --git a/gcc/final.c b/gcc/final.c
index fefc4874b24a..ba7425afa667 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -4087,11 +4087,20 @@ output_addr_const (FILE *file, rtx x)
 case SYMBOL_REF:
   if (SYMBOL_REF_DECL (x))
assemble_external (SYMBOL_REF_DECL (x));
-#ifdef ASM_OUTPUT_SYMBOL_REF
-  ASM_OUTPUT_SYMBOL_REF (file, x);
-#else
-  assemble_name (file, XSTR (x, 0));
-#endif
+
+ {
+ bool dollar_eh = XSTR(x,0)[0] == '$';
+ if (dollar_eh) fputc('(',file);
+
+   #ifdef ASM_OUTPUT_SYMBOL_REF
+ ASM_OUTPUT_SYMBOL_REF (file, x);
+   #else
+ assemble_name (file, XSTR (x, 0));
+   #endif
+
+ if (dollar_eh) fputc(')',file);
+ }
+
   break;

 case LABEL_REF:

seems to fix it, at least for x86-64. Basically, you need parentheses around
names of globals (at least those that start with `$`) when they're used as
operands.

Wrapping the name in parentheses is what clang does.

Both clang and tinycc have no problem with this. It would be great if gcc could
catch up.
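The core of the patch, reduced to a standalone function for illustration. `format_symbol_operand` is hypothetical and not GCC code. The assumption, consistent with the error message in this bug, is that a bare leading `$` makes the assembler parse the name as an immediate, so the parenthesized form is emitted instead.

```c
#include <stdio.h>

/* Wrap symbol names that start with '$' in parentheses, as the patch does.
   Returns snprintf's result (the length of the full output). */
int format_symbol_operand(char *buf, size_t n, const char *name)
{
    if (name[0] == '$')
        return snprintf(buf, n, "(%s)", name);
    return snprintf(buf, n, "%s", name);
}
```

With this, a `$foo` global used as a RIP-relative operand would be printed as `($foo)(%rip)`.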

[Bug c/93239] Enhancement: allow unevaluated statement expressions at filescope

2020-01-16 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93239

--- Comment #1 from pskocik at gmail dot com ---
Fixing this seems as simple as removing/commenting-out:

gcc/c/c-parser.c:8195 /* If we've not yet started the current function's statement list,
gcc/c/c-parser.c:8196 or we're in the parameter scope of an old-style function
gcc/c/c-parser.c:8197 declaration, statement expressions are not allowed.  */
gcc/c/c-parser.c:8198 if (!building_stmt_list_p () || old_style_parameter_scope ())
gcc/c/c-parser.c:8199   {
gcc/c/c-parser.c:8200 error_at (loc, "braced-group within expression allowed "
gcc/c/c-parser.c:8201   "only inside a function");
gcc/c/c-parser.c:8202 parser->error = true;
gcc/c/c-parser.c:8203 c_parser_skip_until_found (parser, CPP_CLOSE_BRACE, NULL);
gcc/c/c-parser.c:8204 c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
gcc/c/c-parser.c:8205 expr.set_error ();
gcc/c/c-parser.c:8206 break;
gcc/c/c-parser.c:8207   }

This would be both very useful and make all kinds of sense, because other
expression constructs (function calls, comma expressions, ...) aren't
restricted syntactically either (just semantically), which means they _can_ be
inside untaken branches of constant-forming _Generic/__builtin_choose_expr, and
I can think of no good reason why statement expressions shouldn't be allowed
there too.

[Bug c/92935] typeof() on an atomic type doesn't always return the corresponding unqualified type

2020-01-14 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92935

--- Comment #3 from pskocik at gmail dot com ---
jos...@codesourcery.com, that's interesting, but it seems like an unnecessary,
weird special case, considering that gcc already has a qualifier-dropping
mechanism that doesn't necessitate special-casing __typeof for
_Atomic-qualified types.

Casting an expression to its own type (which on gcc works for aggregates too)
does it.

Compilable example:

#if __clang__
#define DROP_Q(X) ((void)0,X) //clang rejects the casts for aggregate types
#else
#define DROP_Q(X) (__extension__({ (__typeof(X))(X) ; }))
//__extension__ so aggregates are accepted
//even under -pedantic
#endif
int main(void)
{
#define TEST_RVAL_CONV(Tp) \
do{ \
_Atomic const volatile  Tp abar; \
const volatile  Tp bar; \
__typeof(DROP_Q(bar)) noqualif_bar; \
__typeof(DROP_Q(abar)) noqualif_abar; \
_Generic(&noqualif_bar, Tp*: (void)0); \
_Generic(&noqualif_abar, Tp*: (void)0); \
}while(0)

TEST_RVAL_CONV(int);
TEST_RVAL_CONV(__typeof(int*));
typedef struct s_tp { int x; } s_tp;
TEST_RVAL_CONV(s_tp);
}

https://gcc.godbolt.org/z/UtMyxM

I think all lvalue-ness-dropping expressions (e.g., the comma operator or ?:)
ought to drop top-level qualifiers too (and they do on clang); such a
qualifier-dropping operation then wouldn't depend on the gcc extension that
allows casts to non-scalar types. Unfortunately, gcc does not drop top-level
qualifiers in rvalue conversions, which means the clang implementation of the
qualifier-dropping macro doesn't work on gcc.

[Bug c/92935] typeof() on an atomic type doesn't always return the corresponding unqualified type

2020-01-14 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92935

pskocik at gmail dot com changed:

   What|Removed |Added

 CC||pskocik at gmail dot com

--- Comment #1 from pskocik at gmail dot com ---
I don't think typeof is supposed to lose qualifiers. _Generic(Expr,...) loses
them for Expr in an rvalue conversion (it also decays arrays to pointers), but
__typeof is supposed to preserve everything; it does preserve qualifiers in
other compilers (clang/tinycc) and in gcc:

_Atomic const int aci=0;
_Generic(&aci, _Atomic const int*: (void)0); //ok
_Atomic typeof(const int) aci2=0;
_Generic(&aci2, _Atomic const int*: (void)0); //ok

but there does seem to be a bug in how gcc's typeof combines with pointer
declarators (*) and other qualifiers: gcc curiously drops all
qualifiers if (and only if) one of the original qualifiers was _Atomic

_Generic((typeof(aci)*)0, _Atomic const int*: (void)0); //gcc error (int*), ok on clang
_Generic((typeof(aci2)*)0, _Atomic const int*: (void)0); //gcc error (int*), ok on clang
_Generic((typeof(aci2) volatile*)0, _Atomic const volatile int*: (void)0); //gcc error (int volatile*), ok on clang


Clang doesn't do this, and neither gcc's nor clang's typeof drops any qualifiers if
there's no _Atomic among them:

 //no qualifs dropped if no _Atomic was involved
 const int ci=0;
_Generic(&ci,  const int*: (void)0); //ok
 typeof(const int) ci2=0;
_Generic(&ci2,  const int*: (void)0); //ok


_Generic((typeof(ci)*)0,  const int*: (void)0); //ok
_Generic((typeof(ci2)*)0,  const int*: (void)0);  //ok
 _Generic((typeof(ci2) volatile*)0,  const volatile int*: (void)0); //ok

https://gcc.godbolt.org/z/TwtEGP

[Bug c/93265] New: memcmp comparisons of structs wrapping a primitive type not as compact/efficient as direct comparisons of the underlying primitive type under -Os

2020-01-14 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93265

Bug ID: 93265
   Summary: memcmp comparisons of structs wrapping a primitive
type not as compact/efficient as direct comparisons of
the underlying primitive type under -Os
   Product: gcc
   Version: 9.2.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

`memcmp` comparisons of types such as `struct Int { int x; };` generate full
`memcmp` calls under `-Os` (also `-O1`).
These are much larger and less efficient than the direct primitive-type
comparisons that could have been used.

Example code:

#include <string.h>
//a contiguous struct wrapping a primitive type
typedef struct a_tp { int x; }a_tp; 
_Static_assert(sizeof(a_tp)==sizeof(int),"");

//compare a contiguous lvalue
#define CONTIG_EQ_EH(Ap,Bp) (!memcmp(Ap,Bp,sizeof(*(1?(Ap):(Bp)))))

//>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
//A FULL MEMCMP CALL :( under -Os (and -O1)
_Bool a_is42(a_tp X) {return CONTIG_EQ_EH(,&(a_tp const){42});}
//>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

_Bool i_is42(int X) {return X==42; } //direct cmp
_Bool i2_is42(a_tp X) {return X.x==42; } //same
_Bool i3_is42(a_tp X) {return CONTIG_EQ_EH(,&(int const){42});} //still a direct cmp

https://gcc.godbolt.org/z/BC_QsN
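Until the memcmp case is folded, one possible source-level workaround is to copy the struct's bytes into the underlying primitive and compare that. This is a sketch, not the reporter's code: the function names are hypothetical, it assumes the wrapper has no padding (as the report's _Static_assert checks), and it relies on gcc usually turning a fixed-size 4-byte memcpy into a plain load.

```c
#include <string.h>

typedef struct a_tp { int x; } a_tp;
_Static_assert(sizeof(a_tp) == sizeof(int), "a_tp must have no padding");

/* Byte-wise comparison as in the report; may stay a memcmp call at -Os. */
_Bool a_is42_memcmp(a_tp X)
{
    a_tp const k = {42};
    return !memcmp(&X, &k, sizeof X);
}

/* Same bytes reinterpreted as the wrapped int, then a direct comparison. */
_Bool a_is42_load(a_tp X)
{
    int v;
    memcpy(&v, &X, sizeof v);
    return v == 42;
}
```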

[Bug c/93241] New: _Bool casts in dead branches of integer constant expressions cause undesirable warnings under -pedantic iff the dead branch contains overflow

2020-01-12 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93241

Bug ID: 93241
   Summary: _Bool casts in dead branches of integer constant
expressions cause undesirable warnings under -pedantic
iff the dead branch contains overflow
   Product: gcc
   Version: 5.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

About the simplest example of this I could get:

#include <limits.h>

//erroneously warns about non-constness under -pedantic
_Static_assert( 0? (_Bool)(INT_MAX+1) : 1 ,"");

https://gcc.godbolt.org/z/W_tvTS

The problem seems to have existed since gcc 5.

[Bug c/93239] New: Enhancement: allow unevaluated statement expressions at filescope

2020-01-12 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93239

Bug ID: 93239
   Summary: Enhancement: allow unevaluated statement expressions
at filescope
   Product: gcc
   Version: 7.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

I've noticed gcc seems to syntactically disallow statement expressions at
filescope even in contexts where they wouldn't be evaluated such as:

   1) inside sizeof/__typeof/_Alignof
   2) inside _Generic branches that aren't taken

I'm currently trying to use 2) to implement some generic numerical macros that
evaluate to an integer constant expression iff their arguments are integer
constant expressions and at the same time don't double-evaluate their arguments
(if those aren't integer constant expressions).

A simple example would be:

#define SQ(X) _Generic(0?(void*)((X)*0):(int*)0, \
    int*: /*isconstexpr(X)==1*/ (X)*(X), \
    void *: /*isconstexpr(X)==0*/ (__extension__({ \
        __typeof(X) SQ = (X); SQ *= (X); }))  )


Interestingly, this works on tinycc (a much more primitive compiler), where it
can be used at file scope to give enum values, array sizes, or bit-field widths,
or inside static asserts at file scope; on gcc/clang, all of these uses must be
inside a function.

Of course, this can be worked around by using an (inline) function for each
integer type and a second _Generic in the non-constexpr branch of the macro
that enumerates the helper functions, but that seems like a rather bloated
workaround, necessitated only by what seems to be an unnecessary restriction in
the compiler.
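The workaround described above might look roughly like the sketch below. The sq_* helpers and SQ_FN are hypothetical, only int, long and unsigned are covered, and the constness test casts to long before multiplying by 0 (the same null-pointer-constant trick as the report, with the cast added to avoid int-to-pointer size warnings). A function call, unlike a statement expression, is accepted in the unselected branch at file scope.

```c
/* Hypothetical per-type helpers for the non-constant branch. */
static inline int      sq_int(int x)           { return x * x; }
static inline long     sq_long(long x)         { return x * x; }
static inline unsigned sq_unsigned(unsigned x) { return x * x; }

/* Second _Generic: picks the helper by type; X is evaluated once, as the argument. */
#define SQ_FN(X) _Generic((X), int: sq_int, long: sq_long, unsigned: sq_unsigned)(X)

/* (void *)((long)(X) * 0L) is a null pointer constant only when X is an
   integer constant expression, so the ?: has type int * for constants
   (select X*X, still a constant) and void * otherwise (call the helper). */
#define SQ(X) _Generic(0 ? (void *)((long)(X) * 0L) : (int *)0, \
        int *: (X) * (X),                                       \
        void *: SQ_FN(X))

/* File-scope uses that require integer constant expressions. */
enum { SQ5 = SQ(5) };
static int sq_arr[SQ(3)];

int sq_arr_len(void) { return (int)(sizeof sq_arr / sizeof sq_arr[0]); }
```

Unselected _Generic associations are not evaluated, so `SQ(i++)` increments `i` only once.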

[Bug c/93180] const function pointers placed in a custom section are causing that custom section to become writable

2020-01-07 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93180

--- Comment #5 from pskocik at gmail dot com ---
Jakub Jelinek, I later asked how this worked on Stack Overflow
(https://stackoverflow.com/questions/59629946/why-do-gcc-and-clang-place-custom-sectioned-const-funcptr-symbols-into-writable).
Got no answer there (yet), but your comment explains it nicely! Thanks!

[Bug c/93180] const function pointers placed in a custom section are causing that custom section to become writable

2020-01-06 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93180

--- Comment #3 from pskocik at gmail dot com ---
Thanks for explaining. Yes, -fPIC does cause the section to become writable on
clang.

I'm currently toying with using a custom section to gather const
function-pointers, but this -fPIC stuff is causing these const-pointers to be
effectively writable via __start_mysection/__stop_mysection, which is weird.

I thought the const data would get relocated all at once at load time and then
become read-only, but it stays writable with -fPIC.

Anyway, apologies for the false alarm.

Best regards,
Petr Skocik

[Bug c/93180] New: const function pointers placed in a custom section are causing that custom section to become writable

2020-01-06 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93180

Bug ID: 93180
   Summary: const function pointers placed in a custom section are
causing that custom section to become writable
   Product: gcc
   Version: 7.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

__attribute((__section__("mysection"))) int const cx = -42;

(or with multiple const data variables in the `mysection` section) results in
assembly output containing

.section mysection,"a",@progbits

Adding a const function pointer as in

#include 
__attribute((__section__("mysection"))) int (* const p)(char const*) = 

causes the section to be mapped to a writable segment

.section mysection,"aw",@progbits

Since the pointer is const, I think the section ought to remain read-only (on
clang it does).
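For context, the usual way to consume such a section is to iterate over the pointers between the linker-provided __start_/__stop_ symbols (the follow-up comments mention __start_mysection/__stop_mysection). This is a sketch assuming an ELF target with GNU ld or a compatible linker, which defines those symbols for sections whose names are valid C identifiers; the section name "myhandlers" and all function names are hypothetical.

```c
#include <stddef.h>

typedef int (*handler_fn)(int);

static int add_one(int x) { return x + 1; }
static int twice(int x)   { return x * 2; }

/* __used__ keeps these otherwise-unreferenced statics from being discarded. */
__attribute__((__section__("myhandlers"), __used__))
static handler_fn const reg_add_one = add_one;
__attribute__((__section__("myhandlers"), __used__))
static handler_fn const reg_twice = twice;

/* Provided by the linker for the "myhandlers" output section. */
extern handler_fn const __start_myhandlers[];
extern handler_fn const __stop_myhandlers[];

size_t handler_count(void)
{
    return (size_t)(__stop_myhandlers - __start_myhandlers);
}

/* Sum of every registered handler applied to x (independent of entry order). */
int apply_all(int x)
{
    int sum = 0;
    for (handler_fn const *p = __start_myhandlers; p != __stop_myhandlers; ++p)
        sum += (*p)(x);
    return sum;
}
```

Under -fPIC/-fPIE the section is emitted with the "aw" flags discussed above, so the const qualification here is only enforced by the compiler, not by the page protections.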

[Bug c/91669] New: #pragma's and _Pragma's work but _Pragma's used in an equivalent macro don't

2019-09-05 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91669

Bug ID: 91669
   Summary: #pragma's and _Pragma's work but _Pragma's used in an
equivalent macro don't
   Product: gcc
   Version: 5.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

The problem appears to exist on all gcc versions.

Example Code:

#define BX_gcc_push(Category,...) BX_pragma(GCC Category push ) BX_pragma(GCC Category __VA_ARGS__)
#define BX_gcc_pop(Category) BX_pragma(GCC Category pop)
#define BX_nodiag_push(DiagStr) BX_gcc_push(diagnostic, ignored DiagStr)
#define BX_nodiag_pop() BX_gcc_pop(diagnostic)
#define BX_pragma(...) _Pragma(#__VA_ARGS__)


int foo(void)
{

//This silences -Wreturn-type on the closing curly as it should
BX_nodiag_push("-Wreturn-type")
}
BX_nodiag_pop()



#define BX_retundef(Rbr) /*{{{*/ \
BX_nodiag_push("-Wreturn-type") \
Rbr \
BX_nodiag_pop()
/*}}}*/

int bar(void)
{
//This FAILS to silence -Wreturn-type on the closing curly
//(it works on clang, and the code obtained by text-expanding the macro
//(gcc -E) works on gcc too)
BX_retundef(})

[Bug c/90552] New: attribute((optimize(3))) not overriding -Os

2019-05-20 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90552

Bug ID: 90552
   Summary: attribute((optimize(3))) not overriding -Os
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

I test-compiled ( https://gcc.godbolt.org/z/8bhbNa ):

__attribute((optimize(3))) int div(int X) { return X/3; }

with -O{0,1,2,3,s}, expecting to get the same assembly in all cases, but
__attribute((optimize(3))) is failing to override the last case, namely -Os.

(I'd like the function to not use the idiv instruction even if the rest of the
file is compiled with -Os).

Please correct me if I'm wrong to expect `__attribute((optimize(3)))` to be
able to override `-Os`.

This behavior appears to exist on all gcc versions.
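For reference, the idiv-free code that the higher optimization levels typically use for X/3 amounts to multiplying by a "magic" reciprocal. The following is a hand-written C sketch of that standard technique (the name div3_magic is hypothetical), not gcc's actual output; it assumes >> on a negative value is an arithmetic shift, which gcc guarantees.

```c
#include <stdint.h>

/* Signed 32-bit x / 3 without a divide instruction.
   0x55555556 == ceil(2^32 / 3). For x >= 0 the high 32 bits of the 64-bit
   product equal x / 3; for negative x they are one less than the
   truncated quotient, so subtracting (x >> 31), i.e. -1 for negative x
   and 0 otherwise, yields C's round-toward-zero result. */
int32_t div3_magic(int32_t x)
{
    int64_t prod = (int64_t)x * 0x55555556LL;
    int32_t hi = (int32_t)(prod >> 32);
    return hi - (x >> 31);
}
```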

[Bug c/39985] Type qualifiers not actually ignored on function return type

2019-03-15 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39985

pskocik at gmail dot com changed:

   What|Removed |Added

 CC||pskocik at gmail dot com

--- Comment #8 from pskocik at gmail dot com ---
(In reply to Eric Gallager from comment #6)
> (In reply to jos...@codesourcery.com from comment #5)
> > In C, in C11 mode, type qualifiers are completely ignored on function 
> > return types, including not affecting type compatibility, after my commit:
> > 
> > r236231 | jsm28 | 2016-05-13 21:35:39 + (Fri, 13 May 2016) | 46 lines
> > 
> > Implement C11 DR#423 resolution (ignore function return type qualifiers).
> 
> So can this be closed then?

As of 8.2, it doesn't appear to work properly yet.

It looks like the top level qualifs on the return type aren't being ignored
if the return type is sealed in a typedef or __typeof.

typedef int const ic_tp;
int const f(); //ignores the const here
ic_tp f(); //breaks because the const isn't ignored here

Same with:

int const f(); //ignored here
__typeof(int const) f(); //not ignored here

The examples in Godbolt: https://gcc.godbolt.org/z/GVvkmJ

[Bug c/65455] typeof _Atomic fails

2019-03-01 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65455

pskocik at gmail dot com changed:

   What|Removed |Added

 CC||pskocik at gmail dot com

--- Comment #22 from pskocik at gmail dot com ---
(In reply to Marek Polacek from comment #18)
> So this looks like a dup of PR39985.  It seems that, if anything, we should
> modify __typeof to drop all qualifiers.  I.e. that all of the following
> __typeofs yield "int":
> 
> const int a;
> volatile int b;
> const volatile c;
> _Atomic int d;
> int *restrict e;
> __typeof (a) x;
> __typeof (b) y;
> __typeof (c) q;
> __typeof (d) r;
> __typeof (const int) z;
> __typeof (volatile const int) w;
> __typeof (volatile int) v;
> __typeof (_Atomic volatile int) t;
> __typeof (*e) *s;
> 
> Or is that not so?
> 
> What should we do for C++?

As a user, I can always force top-level-qualifier dropping by rvalue conversion
(e.g., with the comma operator or `?:`).
(In reply to Jens Gustedt from comment #20)
> I would be much happier with a generic operator that makes any object into
> an rvalue. One way that comes close would be `1 ? (X) : (X)`. This is an
> expression that transforms any expression `X` that is not a narrow integer
> type into an rvalue. 
> 
> Unfortunately it is too ugly that anybody ever will systematically write
> `__typeof__(1?(X):(X))`. But a macro
> 
> #define __typeof_unqual__(X) __typeof__(1?(X):(X))
> 
> could do. (And one could fix the finite number of cases that are not covered
> with `_Generic`.)
> 
> I'd like to have prefix `+` for that. This could be useful in `__typeof__`
> but also in `_Generic`. Maybe gcc could extend that operator to be
> applicable to all types.

I agree __typeof should keep all top-level qualifiers (clang's __typeof does).
But I'd rather the unary + were not extended to non-numeric types. I frequently
rely on it to throw compile-time errors when applied to non-numerics. I think the
comma should be able to accomplish the job (__typeof(0,X)) with brevity similar
to that of the unary +.

[Bug c/66918] Disable "inline function declared but never defined" warning

2019-02-19 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66918

--- Comment #8 from pskocik at gmail dot com ---
I'd also very much welcome a way to silence this (like with
-Wno-undefined-inline on clang).

My reason for wanting it is I'd like to prototype a non-static inline function
in one header (a fast-to-include header), define it in another (a
slower-to-parse header that might not always be needed), and have both headers
includable in the same translation unit.

Dummy example:

/*first.h*/
inline void f(void);

/*second.h*/
//#include "first.h"
inline void f(void){}

Unfortunately, if only the first header is included, gcc generates this
unsilenceable warning. I could drop the `inline` from the prototype, but if I
then also include the second header with the definition, the prototype without
the inline turns the definition into an unwanted instantiation, causing linker
errors down the road.
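For context, the standard C99 inline model that this header split relies on can be sketched as follows, collapsed into one translation unit for illustration. The header roles mirror the dummy example above; the return value 42 is an illustrative change from the original `void f(void)`. In a real project, the `extern inline` redeclaration would live in exactly one .c file.

```c
/* first.h: cheap declaration, inline, no body */
inline int f(void);

/* second.h: the definition; on its own this is only an inline definition */
inline int f(void) { return 42; }

/* exactly one .c file: a declaration with extern makes this translation unit
   emit the external definition that other units link against */
extern inline int f(void);
```

If every translation unit instead carried a non-inline declaration next to the definition, each would emit an external definition, which produces the duplicate-symbol linker errors described above.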

[Bug c/66918] Disable "inline function declared but never defined" warning

2019-02-19 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66918

pskocik at gmail dot com changed:

   What|Removed |Added

 CC||pskocik at gmail dot com

--- Comment #7 from pskocik at gmail dot com ---
I'd also very much welcome a way to silence this (like with
-Wno-undefined-inline on clang).

My reason for wanting it is I'd like to prototype a non-static inline function
in one header (a fast-to-include header), define it in another (a
slower-to-parse header that might not always be needed), and have both headers
includable in the same translation unit.

Dummy example:

/*first.h*/
inline void f(void);

/*second.h*/
//#include "first.h"
inline void f(void){}

Unfortunately, if only the first header is included, gcc generates this
unsilenceable warning. I could drop the `inline` from the prototype, but if I
then also include the second header with the definition, the prototype without
the inline turns the definition into an unwanted instantiation, causing linker
errors down the road.

[Bug c/89264] New: Incorrect bitfield type in -Wconversion warnings

2019-02-09 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89264

Bug ID: 89264
   Summary: Incorrect bitfield type in -Wconversion warnings
   Product: gcc
   Version: 7.4.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

void f() { struct{ unsigned x:1; }x = { (unsigned){0} }; }

warns about a conversion to `unsigned char:1`. It should say `unsigned int:1`.

[Bug c/89265] New: Incorrect bitfield type in -Wconversion warnings

2019-02-09 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89265

Bug ID: 89265
   Summary: Incorrect bitfield type in -Wconversion warnings
   Product: gcc
   Version: 7.4.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

void f() { struct{ unsigned x:1; }x = { (unsigned){0} }; }

warns about a conversion to `unsigned char:1`. It should say `unsigned int:1`.

[Bug target/45591] gcc generates illegal asm at -O2 with -fdollars-in-identifiers

2019-02-05 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45591

pskocik at gmail dot com changed:

   What|Removed |Added

 CC||pskocik at gmail dot com

--- Comment #3 from pskocik at gmail dot com ---
I think I've run into the same problem.

If I compile

int $ident(int X) { return X; }
int main() { return $ident(1); }

the generated assembly won't translate.

gcc generates 

call $ident

where clang would have parenthesized the $-containing identifier. The missing
parens result in the assembler error "operand type mismatch for call".

[Bug c/88301] New: Optimization regression with undefined unsigned overflow

2018-12-01 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88301

Bug ID: 88301
   Summary: Optimization regression with undefined unsigned
overflow
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

I noticed gcc 7.* did a really nice optimization that allowed me to communicate
that I want even some unsigned overflows to be undefined:

#define ADD_NW(A,B) (__extension__({ __typeof(A+B) R; \
    if(__builtin_add_overflow(A,B,)) __builtin_unreachable(); R ;}))
_Bool a_b(unsigned A,  unsigned B) { return A+B >= B; }
_Bool a_b2(unsigned A,  unsigned B) { return ADD_NW(A,B) >= B; }

resulted in:

a_b:
add edi, esi
setnc   al
ret
a_b2:
mov eax, 1
ret

But on gcc 8.* it's

a_b:
add edi, esi
setnc   al
ret
a_b2:
add edi, esi
setnc   al
ret

again.
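The reason a_b2 can fold to a constant: for unsigned operands, A+B >= B holds exactly when the addition does not wrap, so once the overflow path of __builtin_add_overflow is marked unreachable, the comparison is always true. A small sketch of that equivalence (add_wraps is a hypothetical helper):

```c
#include <limits.h>
#include <stdbool.h>

/* True exactly when A+B does not wrap around. */
bool a_b(unsigned A, unsigned B) { return A + B >= B; }

/* __builtin_add_overflow stores the wrapped sum and returns whether it wrapped. */
bool add_wraps(unsigned A, unsigned B)
{
    unsigned r;
    return __builtin_add_overflow(A, B, &r);
}
```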

[Bug c/88131] New: `gcc -S pp_assembly.S - o OutputFile.s` writes to STDOUT instead of `OutputFile.s`

2018-11-21 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88131

Bug ID: 88131
   Summary: `gcc -S pp_assembly.S - o OutputFile.s` writes to
STDOUT instead of `OutputFile.s`
   Product: gcc
   Version: 7.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

`gcc -S pp_assembly.S -o OutFile.s` and
`gcc -S pp_assembly.sx -o OutFile.s` should behave the same as
`gcc -E pp_assembly.S -o OutFile.s` and `gcc -E pp_assembly.sx -o OutFile.s`,
respectively, but in the `-S` case the `-o` option is ignored and the output
goes to stdout. (Clang does it correctly.)

[Bug preprocessor/82335] Incorrect _Pragma expansion in complex macro expressions

2018-11-16 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82335

--- Comment #1 from pskocik at gmail dot com ---
This problem still persists in gcc 7.3.0. It appears that pasting a macro
containing `_Pragma`s into another macro is what's causing the displacement of
the generated `#pragma`s.

I've cleaned up the example to make it clearer:


#define PRAGMA(...) _Pragma(#__VA_ARGS__)
#define PASTE(Expr)  Expr
#define PUSH_IGN(X) PRAGMA(GCC diagnostic push) PRAGMA(GCC diagnostic ignored X)
#define POP() PRAGMA(GCC diagnostic pop)

#define SIGNED_EH(X) \
({ PUSH_IGN("-Wtype-limits") \
_Bool SIGNED_EH = ((__typeof(X))-1 < 0); \
 POP() \
 SIGNED_EH; })

int main()
{
unsigned x;
SIGNED_EH(x);   //OK; #pragmas generated around the assignment:
#if 0 //generated:
 ({
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wtype-limits"
  _Bool SIGNED_EH = ((__typeof(x))-1 < 0);
#pragma GCC diagnostic pop
  SIGNED_EH; });
#endif


PASTE(SIGNED_EH(x)); //OOPS generates:

#if 0 //generated:
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wtype-limits"
#pragma GCC diagnostic pop
 ({ _Bool SIGNED_EH = ((__typeof(x))-1 < 0); SIGNED_EH; });
#endif
}

Clang's preprocessor generates correct code even for the `PASTE(SIGNED_EH(x))`
case.

[Bug c/82335] New: Incorrect _Pragma expansion with in complex macro expressions

2017-09-26 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82335

Bug ID: 82335
   Summary: Incorrect _Pragma expansion with in complex macro
expressions
   Product: gcc
   Version: 5.4.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

Created attachment 42239
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42239=edit
reproducer for "Incorrect _Pragma expansion with in complex macro expressions";
compile with -Wall -Wextra

Basically, _Pragma's expansions (as shown with gcc -E) seem to get shifted out
of some slightly "more complex" macro expressions, which renders them
ineffective.

Attached is one piece of code that reproduces the problem. When compiled with
`-Wall -Wextra`, the warning which should've been silenced isn't, because the
#pragma push-pop pair gets shifted out of the expression.

It's hard to pinpoint exactly what causes this, and the problem goes away with
minor complexity reductions (such as replacing a macro (e.g., the tof macro)
with what it expands to), but it seems to stick in more complex contexts.

clang handles everything fine.

[Bug pch/15351] Add option for caching headers

2017-07-16 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=15351

pskocik at gmail dot com changed:

   What|Removed |Added

 CC||pskocik at gmail dot com

--- Comment #3 from pskocik at gmail dot com ---
From my reading of the manual, which talks about a *.gch precompiled-header
directory, it seems to me like gcc would be in the perfect position to manage
that directory itself. If the directory has a header matching the current
compiler config, it should use it, otherwise, it should create a new entry and
use that.

The user could simply tell gcc which header it wants precompiled and gcc could
take care of creating the precompiled versions in the appropriate gch directory
as needed.

(If gcc were to manage the *.gch directory itself, it also wouldn't need to try
all directory entries until a match is found; it could aim directly, based on
its established naming system for the entries. The naming system could be
designed so that entries from uninstalled compiler versions could be
automatically or manually deleted.)

The cache directories could be in the same directory as the found header, or in
a per-user system directory that mirrors the path of the found header in case
the directory of the found header isn't writable by the current user.

[Bug c/78036] New: -MM suppresses error detection

2016-10-19 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78036

Bug ID: 78036
   Summary: -MM suppresses error detection
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

Example:

touch in.h
gcc -x c -include in.h - -MD -MF /dev/stdout <<<'int main(){x; return 42;}'

fails as it should.

Changing -MD to -MM causes the failure to go undetected (no stderr output, no
nonzero exit status), making it look as if the compilation succeeded.

(Notes: Changing -MF /dev/stdout to -MF regular_file makes no difference.
Clang has this behavior too)

[Bug c/77487] gcc reports "file shorter than expected" for regular files on stdin when the offset of fd 0 isn't 0

2016-09-05 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77487

pskocik at gmail dot com changed:

   What|Removed |Added

 CC||pskocik at gmail dot com

--- Comment #1 from pskocik at gmail dot com ---
Created attachment 39564
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39564=edit
example test

A self-compiling C script: it passes itself to gcc at an offset and then runs
the resulting a.out.

it should print:

c: hello world

but instead, there's also the 

cc1: warning:  is shorter than expected [enabled by default]

line in there

[Bug c/77487] New: gcc reports "file shorter than expected" for regular files on stdin when the offset of fd 0 isn't 0

2016-09-05 Thread pskocik at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77487

Bug ID: 77487
   Summary: gcc reports "file shorter than expected" for regular
files on stdin when the offset of fd 0 isn't 0
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pskocik at gmail dot com
  Target Milestone: ---

My program calls `gcc -x c - ` with the offset of file descriptor 0 being
larger than 0.

Consequently I get:

  cc1: warning:  is shorter than expected [enabled by default]

(clang reports no errors).
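The mismatch is easy to reproduce outside gcc: fstat reports the whole file's size regardless of the descriptor's current offset, while reading from that offset yields only the remainder. A minimal POSIX sketch (stat_size and bytes_until_eof are hypothetical helpers):

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* File size as reported by fstat; ignores the descriptor's current offset. */
long stat_size(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 ? (long)st.st_size : -1;
}

/* Bytes actually obtainable by reading fd from its current offset to EOF. */
long bytes_until_eof(int fd)
{
    char buf[256];
    long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;
    return n < 0 ? -1 : total;
}
```

With a descriptor positioned at byte 6 of an 11-byte file, fstat still reports 11 while only 5 bytes can be read, which is the gap the check quoted below treats as a truncated file.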

I think this is caused by:

libcpp/files.c:741

  if (regular && total != size && STAT_SIZE_RELIABLE (file->st))
cpp_error_at (pfile, CPP_DL_WARNING, loc,
   "%s is shorter than expected", file->path);

not taking the file descriptor offset into account and that changing it to

if (regular && total != size && STAT_SIZE_RELIABLE (file->st) -
lseek(file->fd, 0, SEEK_CUR) /*should always succeed?*/ )
cpp_error_at (pfile, CPP_DL_WARNING, loc,
   "%s is shorter than expected", file->path);

should fix it.

I hope I'm making sense.

Best regards,
Petr Skocik