[Bug target/55522] -funsafe-math-optimizations is unexpectedly harmful, especially w/ -shared

2016-02-02 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522

Andy Lutomirski  changed:

   What|Removed |Added

   Severity|enhancement |major

--- Comment #9 from Andy Lutomirski  ---
I'm changing this from an enhancement to a bug.  It's a poor design, no one
seems to think it's a good idea, and it's broken more than one thing in real
life.

[Bug target/70146] missed-optimization: i386 hidden references should use PC32 relocations instead of GOTOFF

2016-03-10 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70146

--- Comment #3 from Andy Lutomirski  ---
I think a similar optimization could be applied to default-visibility
references as well.  For example, gcc can generate things like this:

call__x86.get_pc_thunk.ax
addl$_GLOBAL_OFFSET_TABLE_, %eax
movlfoo@GOT(%eax), %eax

The addl is unnecessary.

[Bug target/70146] missed-optimization: i386 hidden references should use PC32 relocations instead of GOTOFF

2016-03-11 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70146

--- Comment #6 from Andy Lutomirski  ---
The PLT case could be reduced significantly by -fno-plt, but the TLS issue is
just sad.  I don't suppose it could be fixed with just toolchain changes while
retaining compatibility with existing libc versions?

[Bug target/70146] New: missed-optimization: i386 hidden references should use PC32 relocations instead of GOTOFF

2016-03-08 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70146

Bug ID: 70146
   Summary: missed-optimization: i386 hidden references should use
PC32 relocations instead of GOTOFF
   Product: gcc
   Version: 5.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: luto at kernel dot org
  Target Milestone: ---

If I compile:

extern char foo[] __attribute__((visibility("hidden")));
char *get_foo(void)
{
return foo;
}

with gcc -m32 -O2 -fPIC, I get (annocataions trimmed for brevity):

get_foo:
call__x86.get_pc_thunk.ax
addl$_GLOBAL_OFFSET_TABLE_, %eax
lealfoo@GOTOFF(%eax), %eax
ret

I think this should generate:

get_foo:
call__x86.get_pc_thunk.ax
.Lpcbase:
lealfoo-.Lpcbase(%eax), %eax
ret

That would be smaller, faster, and result in fewer relocations and thus less
time spend linking.

[Bug target/70148] New: Feature request: allow overriding the SSP canary location

2016-03-08 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70148

Bug ID: 70148
   Summary: Feature request: allow overriding the SSP canary
location
   Product: gcc
   Version: 5.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: luto at kernel dot org
  Target Milestone: ---

The 32-bit x86 Linux kernel jumps through ridiculous hoops to but the SSP
canary where GCC expects it.  (If I had been paying attention when that code
was written, I would have NAKed it outright -- it's a disaster.)  The 64-bit
kernel jumps through less ridiculous hoops.

Can we have the ability to tell GCC what segment override to use and what
offset to use?  It would be even better if we could tell GCC what segment
override to use and make GCC emit an absolute symbol reference for the offset
so we could relocate it.

(The 32-bit issue is that GCC insists on using %gs, and Linux would *much*
prefer %fs.)

[Bug rtl-optimization/66795] Incorrect and missed optimizations of __builtin_frame_address

2016-07-18 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66795

--- Comment #2 from Andy Lutomirski  ---
(In reply to Andrew Pinski from comment #1)
> The code below is invalid as __builtin_frame_address is only for reading
> from the current frame.

If this code is invalid, then what exactly is valid code that uses
__builtin_frame_address?  This is way outside the scope of the C standard. 
Apparently one cannot reliably write to pointers derived from
__builtin_frame_address?  Can one reliably *read* them?

What I'm trying to do is actually the most well-defined think I can think of to
do with __builtin_frame_address.  Roughly:

__attribute__((externally_visible)) void func(void)
{
unsigned long *ptr = (unsigned long *)__builtin_frame_address(0) - 2;
*ptr = some_other_value;
}

pushq some_value
call func
; the top of the stack should now contain some_other_value

As it stands, gcc can't usefully compile this because the store is deleted.  

> Also -O2 enables -fomit-frame-pointer which might
> cause the frame pointer to go away.

It's the other way around.  Using __builtin_frame_address(0) prevents the frame
pointer from being omitted.  This is an unnecessary constraint, at least for my
use case.  If __builtin_frame_pointer(0) returned a pointer to the stack one
slot below the saved return address regardless of whether the frame pointer
exists, it would work for me.

Would it make sense to add a new, well-defined intrinsic that returns the
address of the current function's return address?  As long as the function is
noinline, this would have sensible semantics.  (Arguably it should be an error
to use such an intrinsic in a function that isn't either noinline or forcibly
inlined.)

[Bug target/81708] The x86 stack canary location should be customizable

2017-08-03 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708

--- Comment #2 from Andy Lutomirski  ---
(In reply to H. Peter Anvin from comment #1)
> Well, you can choose between "__stack_chk_guard(%rip)" and "%gs:40"... :)

Wow, I guess I didn't even consider the former.  That would be option 5:
symbol(%rip).  Let's not use it for the kernel.

For x86_32, we want %fs:symbol.  %fs:constant would be tolarable, too.  For
x86_64, it depends on the end game for percpu access, I suppose.

[Bug target/81708] New: The x86 stack canary location should be customizable

2017-08-03 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708

Bug ID: 81708
   Summary: The x86 stack canary location should be customizable
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: luto at kernel dot org
  Target Milestone: ---

There are four credible ways to find the stack canary:

1. %fs:symbol
2. %fs:symbol(%rip) [with a PCREL relocation]
3. %gs:symbol
4. %gs:symbol(%rip)

(Obviously the %rip variants only work on x86_64.)

The current code is roughly equivalent to (1) or (3) where symbol is an
absolute symbol equal to 0x28 or similar.

Please give a command line option to choose any of the four modes and specify
the symbol name.  (Or just hardcode the symbol name to __gcc_stack_canary or
whatever if the option is set.)

My perferred solution would be -mstack-protector-cookie=gs:symname or
-mstack-protector-cookie=gs:symname(%rip) or -mstack-protector-cookie=gs:0x28
depending on what's desired.

I personally consider it to have been a mistake for Linux to support a stack
canary without insisting that GCC fix this issue first.  The x86_32 case, in
particular, is a collossal mess in the kernel that slows kernel entries down
and seriously overcomplicates the kernel code because the stack canary
addressing mode that GCC chooses is nonsensical.

[Bug target/81490] x86: Handling of symbol ranges for __seg_fs/__seg_gs

2017-08-03 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81490

--- Comment #21 from Andy Lutomirski  ---
Re: the stack canary, I filed bug 81708.  It seems to me that __seg_gs is
analogous and users should be able to directly specify the addressing mode.

[Bug target/81490] x86: Handling of symbol ranges for __seg_fs/__seg_gs

2017-08-03 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81490

--- Comment #20 from Andy Lutomirski  ---
We have issues putting modules more than 2G from the main kernel no matter
what, but I don't see what this has to do with %gs addressing.

I still think that GCC should let us directly control the addressing mode for
__seg accesses.

Can someone elaborate on what the @GPREL suffix actually means?

[Bug target/81825] New: x86_64 stack realignment code is suboptimal

2017-08-11 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81825

Bug ID: 81825
   Summary: x86_64 stack realignment code is suboptimal
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: luto at kernel dot org
  Target Milestone: ---

I compiled this:

void func()
{
int var __attribute__((aligned(32)));
asm volatile ("" :: "m" (var));
}

using gcc (GCC) 7.1.1 20170622 (Red Hat 7.1.1-3).  I got (after stripping CFI
stuff):

func:
leaq8(%rsp), %r10
andq$-32, %rsp
pushq   -8(%r10)
pushq   %rbp
movq%rsp, %rbp
pushq   %r10
popq%r10
popq%rbp
leaq-8(%r10), %rsp
ret

I have three objections to this code.

1. The push and immediate pop of %r10 seems pointless.  Maybe it's due to some
weird DWARF limitation?  A register allocation limitation sounds more likely,
though.

2. The addressing modes used for r10 are suboptimal.  Shouldn't the first
instruction be just movq %rsp, %r10?  By my count, this would save 12 bytes of
text.

3. Couldn't the whole thing just be:

pushq %rbp
movq %rsp, %rbp
andq $-32, %rsp

function body here.  rbp can't be used to locate stack variables, but rsp can.

leaveq
ret

[Bug target/81490] x86: Handling of symbol ranges for __seg_fs/__seg_gs

2017-07-20 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81490

--- Comment #14 from Andy Lutomirski  ---
(In reply to H. Peter Anvin from comment #13)
> On July 20, 2017 10:47:54 AM PDT, ubizjak at gmail dot com
>  wrote:
> >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81490
> >OTOH, we can avoid RIP relative addresses for non-default address
> >spaces, when
> >-mcmodel=kernel compile flag is in effect.
> 
> It's not just about the kernel model.  The example I have is for the
> small-pic model.  We are considering moving away from the kernel model.

The kernel, and possibly future userspace, can use very strange relocations for
non-standard segments.  The actual percpu space is some contiguous (presumably)
block that starts at some offset from the actual GS base, but that offset is
flexible.  There are several possible models:

a) Percpu data starts at %gs:0 (or %gs:small number) and access uses absolute
addressing.  This means that percpu symbols will resolve to numbers between 0
and 2G and do *not* change if the kernel is relocated.

b) Percpu data starts at %gs:0 (or %gs:small number) and access uses
rip-relative addressing.  This requires that the kernel be located at a very
high address so that the percpu data is reachable.  Percpu symbols are largish
numbers but still less than 2G.  The symbols do change if the kernel is
relocated (which is totally the opposite of normal PIC addressing, where PIC
offsets are not relocated).

(a) and (b) could plausibly coexist with appropriate magic in the symbol
tables.

c) Percpu data starts at %gs:x where x is the kernel load address plus a
smallish constant.  Access is like %gs:symbol(%rip).  No dynamic relocations. 
Also no constraints on where the kernel is loaded.

d) Percpu data starts at %gs:x where x is the kernel load address plus a
smallish constant.  Access is like %gs:symbol.  There are dynamic relocations.

Different choices here have different tradeoffs, and, for any given kernel code
model, more than one of these is possible.  I don't think -mcmodel should be
the determining factor.

(Models c and d would require gcc to allow a flexible stack cookie location.)

[Bug target/81490] x86: Handling of symbol ranges for __seg_fs/__seg_gs

2017-07-19 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81490

Andy Lutomirski  changed:

   What|Removed |Added

 CC||luto at kernel dot org

--- Comment #8 from Andy Lutomirski  ---
(In reply to H. Peter Anvin from comment #7)
> Thinking about this some more, this is really not an aspect of __seg_* but
> rather the section the symbol is placed in.  An embedded system kernel, for
> example, could quite possibly want to access an absolute section while the
> kernel itself is position-independent; maybe it is even running XIP.
> 
> As such perhaps a better solution would be to have attributes that control
> these specific parameters.  A quick off-the-top-of-my-head proposal:
> 
> __attribute__((absolute))
> 
> ... this symbol should be accessed using absolute address references.
> 
> __attribute__((range(from,to)))
> 
> ... specifies the valid address range for this symbol.

I would like to see something along these lines but even stronger: a way to
instruct GCC to use a particular type of access and relocation.  For example, I
think I should be able to ask GCC to reference symbol "foo" using a PC-relative
relocation or using an absolute relocation at my option.

[Bug target/81490] x86: Handling of symbol ranges for __seg_fs/__seg_gs

2017-08-04 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81490

--- Comment #28 from Andy Lutomirski  ---
hjl, I don't understand what problem you're trying to solve here.  Your patch
makes it relatively straightforward to access global variables using __seg_gs
if the environment sets up the %gs base exactly the way GCC wants it to be set
up.

Why is this useful?

As far as I know, anyone using __seg_gs is trying to do something that is
outside the scope of what normal C code does.  They have some addressing model
that they want to use, and they're probably trying to access data that is *not*
a normal global variable.  On Linux, we want to use __seg_gs to access *percpu*
data, which is allocated by a fancy percpu allocator and relocated by the
kernel's internal relocation processing.  The latter is indeed based on data
emitted by binutils, but it is *not* a conventional ELF relocation engine.  It
will, of course, adapt as needed.

When I do:

extern int __seg_gs [insert more addressing details here?] variable;

int val = variable;

I am *not* trying to access something that would necessarily be credibly
defined by:

int __seg_gs [whatever] variable;

I'm trying to access something that is defined by a mechanism that is
completely invisible to gcc, may involve linker script magic, and may well
involve magic page table manipulation.

Also, please do not consider the stack cookie at %gs:0x28 to be a design
constraint in any respect.  We're talking about using a new GCC version to
solve problems that aren't cleanly solved on existing GCC.  That new GCC can
and should fix the silly stack cookie addressing, too, so it stops getting in
the way.

[Bug target/81708] The x86 stack canary location should be customizable

2017-08-08 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708

--- Comment #7 from Andy Lutomirski  ---
Hmm.  This is a big improvement, but it's still going to be awkward to use --
if we want to use a normal Linux percpu variable, we're stuck putting it in a
fixed location that's known at compile time as opposed to just at link time. 
It also still prevents Linux from switching to PC-relative percpu addressing to
reduce text size.

Can we do -mstack-protector-offset=[symbol name]?  And could it be extended to
ask for PC-relative addressing?

[Bug target/68491] libgcc calls __get_cpuid with 0 level breaks on early 486

2017-06-27 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68491

Andy Lutomirski  changed:

   What|Removed |Added

 CC||luto at kernel dot org

--- Comment #9 from Andy Lutomirski  ---
I'm a bit late to the party, but this patch seems dubious to me. 
__get_cpuid_max() fails to distinguish between CPUs that have max level 0
(although I doubt that any such CPUs exist) and CPUs that don't have CPUID at
all.

Shouldn't __get_cpuid_max return -1 if CPUID isn't there or, alternatively,
just generally return the last level + 1 (and 0 if CPUID isn't there)?

[Bug target/84908] retpoline weirdness: 7.3.0-1 and 4.8.5-16: with -fPIC: R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax'

2018-08-14 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908

--- Comment #13 from Andy Lutomirski  ---
I find this whole discussion very confusing.  The problem has nothing to do
with relocations AFAICT.  The problem is that gcc is (as requested) generating
retpolines, and it's set up to do it by calling __x86_indirect_thunk_*, and
those helpers don't exist in the vDSO.

I also think that static inlines have anything to do with it.  Nor so I see why
any function attributes make any sense.

The trivial fix is:

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 261802b1cc50..8140176b8b41 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -74,7 +74,7 @@ CFL := $(PROFILING) -mcmodel=small -fPIC -O2
-fasynchronous-unwind-tables -m64 \
-fno-omit-frame-pointer -foptimize-sibling-calls \
-DDISABLE_BRANCH_PROFILING -DBUILD_VDSO

-$(vobjs): KBUILD_CFLAGS := $(filter-out
$(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS)) $(CFL)
+$(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS)
$(RETPOLINE_CFLAGS),$(KBUILD_CFLAGS)) $(CFL)

 #
 # vDSO code runs in userspace and -pg doesn't help with profiling anyway.
@@ -138,6 +138,7 @@ KBUILD_CFLAGS_32 := $(filter-out
-mcmodel=kernel,$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 := $(filter-out -fno-pic,$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 := $(filter-out -mfentry,$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 := $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS_32))
+KBUILD_CFLAGS_32 := $(filter-out $(RETPOLINE_CFLAGS),$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 += -m32 -msoft-float -mregparm=0 -fpic
 KBUILD_CFLAGS_32 += $(call cc-option, -fno-stack-protector)
 KBUILD_CFLAGS_32 += $(call cc-option, -foptimize-sibling-calls)

but it might be better to use the internal thunk mechanism.

You said that Andi Kleen had a comment.  Can you point me to it?

[Bug target/84908] retpoline weirdness: 7.3.0-1 and 4.8.5-16: with -fPIC: R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax'

2018-08-14 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908

Andy Lutomirski  changed:

   What|Removed |Added

 CC||luto at kernel dot org

--- Comment #9 from Andy Lutomirski  ---
I haven't fully dug into this, but I do one one immediate question: why is GCC
generating a jump table for a five-entry switch statement if retpolines are on?
 This has got to be a *huge* performance loss.  The retpoline sequence is very,
very slow, and branches aren't that slow.  A five-entry switch is only three
branches deep.

[Bug libstdc++/91357] New: _GLIBCXX_ASSERTIONS rejects possibly-valid code

2019-08-05 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91357

Bug ID: 91357
   Summary: _GLIBCXX_ASSERTIONS rejects possibly-valid code
   Product: gcc
   Version: 9.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: luto at kernel dot org
  Target Milestone: ---

Created attachment 46672
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46672=edit
Example

For a non-empty vector v, calling v.operator[](v.size()) fires this assertion:

/usr/include/c++/9/bits/stl_vector.h:1042: std::vector<_Tp, _Alloc>::reference
std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with
_Tp = int; _Alloc = std::allocator; std::vector<_Tp, _Alloc>::reference =
int&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion
'__builtin_expect(__n < this->size(), true)' failed.

After some digging into the standard, I'm not entirely sure whether that
particular invocation of operator[] is UB or otherwise invalid.  It's certainly
poor form -- the right way in modern C++ to form a one-past-the-end pointer to
a vector is (v.data() + v.size()), but, if it's indeed legal, then
_GLIBCXX_ASSERTIONS should not be firing.

I suggest one of the following resolutions:

1. Determine that merely calling v.operator[](v.size()) is UB or that it is
otherwise permissible for a conforming implementation to crash when this
happens.  This would require some careful standard reading and maybe even a
change to the standard.

2. Weaken the assertion to only crash if the index is strictly greater than the
size.  Calling v.operator[](v.size() + 1) is definitely UB (except perhaps in
the pathological case where v.size() + 1 == 0, but I doubt it's possible to
make a vector that large on any supported platform).

3. Use compiler magic to only fire the assertion if the pointer returned by
operator[] is dereferenced for real, e.g. dereferenced and converted to an
lvalue.  Maybe gcc could gain a builtin like
__builtin_return_value_is_definitely_dereferenced().

4. Improve the assertion so that, when this assertion fires, a very clear
message is printed to stderr saying something like "This assertion fired due to
the formation of a one-past-the-end reference.  This may indicate a stylistic
issue with the program but is not necessarily a bug."

[Bug c++/57998] Unhelpful error message when a class has no move constructor

2020-03-19 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57998

Andy Lutomirski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Andy Lutomirski  ---
Indeed.  That's slightly embarrassing.  I admit that after more than six years
I don't remember how I made this mistake :)

*** This bug has been marked as a duplicate of bug 53234 ***

[Bug c++/53234] [c++0x] unfriendly error message for missing move constructor

2020-03-19 Thread luto at kernel dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53234

--- Comment #2 from Andy Lutomirski  ---
*** Bug 57998 has been marked as a duplicate of this bug. ***

[Bug rtl-optimization/48877] Inline asm for rdtsc generates silly code

2020-12-25 Thread luto at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48877

Andy Lutomirski  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #3 from Andy Lutomirski  ---
(In reply to Ivan Sorokin from comment #2)
> Modern GCC doesn't generate excessive moves for this example. It looks like
> the problem was fixed in 4.9.0: https://godbolt.org/z/MqE7sP .
> 
> I think the bug can be closed now.

Indeed.

[Bug target/98997] New: Linking mismatched -fcf-protection objects generates incorrect code

2021-02-07 Thread luto at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98997

Bug ID: 98997
   Summary: Linking mismatched -fcf-protection objects generates
incorrect code
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: luto at kernel dot org
  Target Milestone: ---

$ gcc --version
gcc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9)
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ cat cet1.c
#include 

void test(jmp_buf buf)
{
__builtin_longjmp(buf, 1);
}
$ cat cet2.c
#include 

void test(jmp_buf buf);

int main()
{
jmp_buf buf;
if (__builtin_setjmp(buf) == 0)
test(buf);
return 0;
}
$ gcc -c -O2 cet1.c -fcf-protection=full
$ gcc -c -O2 cet2.c
$ gcc -o cet -O2 cet1.o cet2.o
$ ./cet
Illegal instruction (core dumped)

Presumably the magic CET fixup in the builtin setjmp/longjmp requires that
matching code is generated for both.  I'm testing on a non-CET system.

I haven't even tried to figure out if anything sensible happens to the CET GNU
property, but I think the current GCC behavior is impolite at best.  I think
GCC should do one of a few things:

1. Generate working code.

2. Fail to link.

3. At least document this pitfall very clearly.

I really hope this doesn't affect shared objects.

[Bug target/98997] Linking mismatched -fcf-protection objects generates incorrect code

2021-02-08 Thread luto at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98997

--- Comment #4 from Andy Lutomirski  ---
I should add: on brief inspection, that patch looks like an ABI break for
-fcf-protection=none

[Bug target/98997] Linking mismatched -fcf-protection objects generates incorrect code

2021-02-08 Thread luto at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98997

--- Comment #3 from Andy Lutomirski  ---
What is -fcf-protection=stack actually supposed to do as compared to
-fcf-protection=none?  Is it valid to run code compiled with
-fcf-protection=none with SHSTK enabled?  If so, then I wonder why
-fcf-protection=stack exists at all.  If not, then I'm wondering why your patch
seems to be effectively hardcoding "stack" mode for SJLJ?

You could probably fix this bug differently by changing __builtin_setjmp() to
store 0 for SSP in "none" mode.  Then at least -fcf-protection=none wouldn't
emit CET code.

[Bug target/55522] -funsafe-math-optimizations is unexpectedly harmful, especially w/ -shared

2022-09-08 Thread luto at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522

--- Comment #16 from Andy Lutomirski  ---
(In reply to Andrew Pinski from comment #13)
> Patient: Doctor it hurts when I do this.
> Doctor: then don't do that and if you read the instructions I gave you I
> told you I would hurt this way.

I think this analogy is off.  Maybe for most questionable optimization or
codegen options it would apply, but -ffast-math is more like:

Patient: Doctor, it hurts me when my neighbor does this.
Doctor: Then tell your neighbor not to read that and to read the fine print on
the box.

If nitroglycerin pills have side effects and patients need to read the label,
maybe fine.  But if they start exploding and breaking other peoples' windows,
that's a whole different problem.