[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 --- Comment #22 from rguenther at suse dot de --- On Mon, 13 Dec 2021, hubicka at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 > > --- Comment #20 from Jan Hubicka --- > I really think with -fdelete-null-pointer-checks we should optimize away the > pointer adjustment relying on the fact that program will segfault. > > I was wondering, -fdelete-null-pointer-checks currently requires pointer to be > precisely 0. We are already iffy here since the access is at non-0 offset, > but > since infer_nonnull_range_by_dereference uses check_loadstore: > > static bool > check_loadstore (gimple *, tree op, tree, void *data) > { > if (TREE_CODE (op) == MEM_REF || TREE_CODE (op) == TARGET_MEM_REF) > { > /* Some address spaces may legitimately dereference zero. */ > addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (op)); > if (targetm.addr_space.zero_address_valid (as)) > return false; > > return operand_equal_p (TREE_OPERAND (op, 0), (tree)data, 0); > } > return false; > } > > which completely ignores MEM_REF_OFFSET we actually turn into trap accesses > that are arbitrarily far from NULL. We also ignore handled components so we I think MEM_REF[(void *)0 + 4] is non-canonical (IIRC we "simplify" that to MEM_REF[(void *)4])
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 --- Comment #21 from Jeffrey A. Law --- I don't think there's anything inherently wrong with treating 0 + small offset similarly to 0 when it comes to -fdelete-null-pointer-checks. I suspect it'll break some poorly written low level code, though we fixed up some of that a year or so ago in response to some of Martin's diagnostic work which was flagging 0 + offset as problematical.
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 --- Comment #20 from Jan Hubicka --- I really think with -fdelete-null-pointer-checks we should optimize away the pointer adjustment relying on the fact that program will segfault. I was wondering, -fdelete-null-pointer-checks currently requires pointer to be precisely 0. We are already iffy here since the access is at non-0 offset, but since infer_nonnull_range_by_dereference uses check_loadstore: static bool check_loadstore (gimple *, tree op, tree, void *data) { if (TREE_CODE (op) == MEM_REF || TREE_CODE (op) == TARGET_MEM_REF) { /* Some address spaces may legitimately dereference zero. */ addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (op)); if (targetm.addr_space.zero_address_valid (as)) return false; return operand_equal_p (TREE_OPERAND (op, 0), (tree)data, 0); } return false; } which completely ignores MEM_REF_OFFSET we actually turn into trap accesses that are arbitrarily far from NULL. We also ignore handled components so we miss this for example for variable array accesses I think. However if we had --param null-pointer-zone defaulting to say 4k (a page size) we could optimize all accesses that are near the NULL pointer. This would let us to optimize this correctly and also i.e. simplify ipa-pta that currently special cases 0 as null but thinks that any other constat may point to any global variable. Small constants are common so this should optimize noticeably. For clang binary there are really many traps added by this logic that makes code noticeably uglier than what clang generates on its own sources.
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #19 from Jan Hubicka --- *** Bug 103674 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 --- Comment #18 from Jeffrey A. Law --- Jon, there's no way for the optimizers to improve the to_derived_bad case as there's nothing in the IL after we leave the front-end that's useful. In the .original dump we have: ;; Function derived& to_derived_bad(base2*) (null) ;; enabled by -tree-original return = b != 0B ? (struct derived &) b + 18446744073709551612 : 0; There's just nothing the optimizers can do with that. The front-end would have to provide more information or remove the check itself (as is done for the to_derived_good case which has this .original dump): ;; Function derived& to_derived_good(base2*) (null) ;; enabled by -tree-original return = (struct derived &) NON_LVALUE_EXPR + 18446744073709551612;
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 Jonathan Wakely changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2021-01-07 Status|UNCONFIRMED |NEW --- Comment #17 from Jonathan Wakely --- Copied from the PR 98501 dup: Consider this code: struct base1 { int a; }; struct base2 { int b; }; struct derived : base1, base2 {}; derived& to_derived_bad(base2* b) { return *static_cast(b); } derived& to_derived_good(base2* b) { return static_cast(*b); } I believe both of these functions are functionally equivalent and should generate the same code. Both functions cast pointer from base to derived if it is not nullptr and both cause undefined behavior if it is nullptr. GCC optimizes to_derived_good() to a single subtraction, but it inserts nullptr-check into to_derived_bad(): to_derived_good(base2*): lea rax, [rdi-4] ret to_derived_bad(base2*): lea rax, [rdi-4] test rdi, rdi mov edx, 0 cmove rax, rdx ret Could GCC omit the nullptr-check in to_derived_bad?
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 Jonathan Wakely changed: What|Removed |Added CC||vanyacpp at gmail dot com --- Comment #16 from Jonathan Wakely --- *** Bug 98501 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 --- Comment #15 from Jakub Jelinek --- In the particular case we are talking about, security/non-security, it doesn't make sense to do anything but optimize it into straight line code, any __builtin_trap or similar will just hurt. If you feel e.g. by default adding __builtin_unreachable is too dangerous in some cases, it can just handle similar cases manually and optimize away the conditional if there are no side-effect statements in between. We are talking about: [local count: 1073741824]: if (base_2(D) != 0B) goto ; [70.00%] else goto ; [30.00%] [local count: 751619281]: iftmp.1_3 = base_2(D) + (sizetype)-4; [local count: 1073741824]: # iftmp.1_1 = PHI _5 = MEM[(const struct Derived *)iftmp.1_1].D.2340.y; where without -fno-delete-null-pointer-checks and without -fwrapv-pointer, we can assume: 1) pointers in valid programs don't wrap 2) the first page is not mapped As offsetof (Derived, D.2340.y) is >= 4 and < 4096 here, we don't need to even care about pointer wrapping, just rely on accesses to 0 .. 4095 offsets to trap. If the offsetof would be 0, it would be about pointer wrapping, whether we are ok if instead of dereferencing *(int *)0 we dereference *(int *)-4 instead.
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 --- Comment #14 from Jeffrey A. Law --- The reason for turning the dereference into a trap is because we can clean up ancillary computations -- the address computation, the RHS of a store, and the like via standard dead code elimination algorithms.
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 --- Comment #13 from Jeffrey A. Law --- Marc, Yes, absolutely. In fact, I think that falls out of the work Martin S is doing in this space. Conceptually we're looking to generalize that code so that we can route more cases where the compiler detects undefined behavior through the path isolation routines. Within those commonized routines we want to have a knob which selects different behavior from the compiler when undefined behavior is detected which could potentially include issuing the RTL equivalent of __builtin_unreachable vs trap vs warn, but leave the code alone, try to mitigate, etc. Where I think we've differed in the past is what to do with conditional branch upon which the undefined behavior is control dependent upon. As you may remember, the original submission of path isolation would turn that conditional into an unconditional branch to the valid path. That's not correct because there can be observable behavior that occurs on the path from the conditional, but before the undefined behavior triggers. Having a knob to twiddle *that* may or may not be a good idea.
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 --- Comment #12 from Marc Glisse --- (In reply to Jeffrey A. Law from comment #10) > __builtin_trap emits an actual trap into the instruction stream which halts > the process immediately which is *much* better from a security standpoint Regardless of what the default is, I think we should be able to agree that there are uses where we want to favor hardening/security (public facing servers, web browsers), and others where performance is more important (scientific simulations), and it would be nice to give users a choice. (I think sanitizers already provide a way to turn __builtin_unreachable into __builtin_trap, but that's more meant for explicit __builtin_unreachable in user code)
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 --- Comment #11 from John Zwinck --- Jeffrey, when I compile with -fno-isolate-erroneous-paths-dereference I am asking the compiler not to insert traps. I filed this issue in hopes that GCC can better optimize when it is told not to insert traps. I think the default behavior of inserting a trap is sort of OK ("sort of" because it seems pointless to trap when not trapping would dereference 0x0, which in my mind is also a good way to end a program that dereferences a null pointer). But when I tell GCC not to insert traps, I definitely want the simplest, fastest code.
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 Jeffrey A. Law changed: What|Removed |Added CC||law at redhat dot com --- Comment #10 from Jeffrey A. Law --- trap is much stronger than an unreachable. If you hit a gcc_unreachable location at runtime, execution just continues onward with no indication something terrible has happened. It's literally just a marker in our IL and results in no generated code. I think it's fundamentally broken from a security standpoint. __builtin_trap emits an actual trap into the instruction stream which halts the process immediately which is *much* better from a security standpoint
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #9 from Jakub Jelinek --- (In reply to rguent...@suse.de from comment #8) > -fdelete-null-pointer-checks optimizes NULL pointer checks _after_ a > dereference. This case is first checking and then dereferencing. > GCC sees -fdelete-null-pointer-checks is just an option that says that it is ok to optimize null pointer dereferences. So, I don't see why we couldn't optimize this out. -fisolate-erroneous-paths-dereference is documented to add __builtin_trap in there instead of __builtin_unreachable, I'd say we want (by default) something that will add __builtin_unreachable in those cases instead.
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 --- Comment #8 from rguenther at suse dot de --- On Mon, 15 Jun 2020, redi at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 > > --- Comment #7 from Jonathan Wakely --- > So yes, the static_cast should evaluate to zero, Thanks for the clarification. > but if it's followed by a > dereference then it seems reasonable to expect -fdelete-null-pointer-checks to > optimize away the handling for zero. -fdelete-null-pointer-checks optimizes NULL pointer checks _after_ a dereference. This case is first checking and then dereferencing. GCC sees [local count: 1073741824]: if (base_2(D) != 0B) goto ; [70.00%] else goto ; [30.00%] [local count: 751619278]: iftmp.1_3 = base_2(D) + 18446744073709551612; [local count: 1073741824]: # iftmp.1_1 = PHI _5 = MEM[(int *)iftmp.1_1 + 4B]; in getter(). So this is more about if-conversion or in the GCC case path isolation where I'd actually question that MEM[0B + 4B] is an "obvious" null pointer dereference ;) It's more obvious in the IL for field(): [local count: 1073741824]: if (base_2(D) != 0B) goto ; [70.00%] else goto ; [30.00%] [local count: 751619278]: iftmp.0_3 = base_2(D) + 18446744073709551612; [local count: 1073741824]: # iftmp.0_1 = PHI _5 = iftmp.0_1->D.2309.y; but my point is I guess that the C++ FE has a more "native" view of this and should maybe elide the NULL pointer check when the static_cast is dereferenced. Because eliding the check and isolating the (not NULL!) dereference is sth different.
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 --- Comment #7 from Jonathan Wakely --- So yes, the static_cast should evaluate to zero, but if it's followed by a dereference then it seems reasonable to expect -fdelete-null-pointer-checks to optimize away the handling for zero.
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 --- Comment #6 from Jonathan Wakely --- Dereferencing in the null case is undefined.
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 --- Comment #5 from rguenther at suse dot de --- On Mon, 15 Jun 2020, redi at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 > > --- Comment #3 from Jonathan Wakely --- > Or to be more clear: > > struct Large { > char pad[1024*1024]; > int x; > }; > > Large* p = 0; > int i = p->x; Sure, but this isn't the same if C++ mandates the static_cast of a null evaluates to null and not the offset of the base class. So what clang does is not unsafe but wrong since the offset is missing and it returns Base1::x instead of Base2::y? Note for getter and clang I see _Z6getterP5Base2: # @_Z6getterP5Base2 .cfi_startproc # %bb.0: leaq-4(%rdi), %rax testq %rdi, %rdi cmoveq %rdi, %rax movl4(%rax), %eax retq So either static_cast(base) should evaluate to zero or not. If it does then clang dereferences the wrong address in the null case (but only in 'field'). So, what does C++ say here?
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 --- Comment #4 from John Zwinck --- Richard Biener said: > Note it will make a difference for very large objects (and thus very large > offsets added) which may end up accessing actually mapped memory so IMHO what > clang does by default is a security risk. I am a bit confused by this statement, as GCC turns what would have been a load from address zero into a load from a non-zero address. Here's a demo inspired by Mr Wakely's example: https://godbolt.org/z/EvMpyz Maybe I'm misreading the output, or you, but Clang's generated code looks safer to me.
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 --- Comment #3 from Jonathan Wakely --- Or to be more clear: struct Large { char pad[1024*1024]; int x; }; Large* p = 0; int i = p->x;
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 --- Comment #2 from Jonathan Wakely --- (In reply to Richard Biener from comment #1) > I suppose the C++ standard says static_cast(nullptr) == nullptr > and > we literally follow that. Note it will make a difference for very large > objects (and thus very large offsets added) which may end up acccessing > actually > mapped memory so IMHO what clang does by default is a security risk. Is that any worse than this though? LargeObject* p = nullptr; p->foo(); Adding a static_cast doesn't seem to be the problem here.
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 Richard Biener changed: What|Removed |Added CC||law at gcc dot gnu.org, ||rguenth at gcc dot gnu.org Component|c++ |tree-optimization Keywords||missed-optimization Target||x86_64-*-* i?86-*-* --- Comment #1 from Richard Biener --- I suppose the C++ standard says static_cast(nullptr) == nullptr and we literally follow that. Note it will make a difference for very large objects (and thus very large offsets added) which may end up acccessing actually mapped memory so IMHO what clang does by default is a security risk. Now, what we should eventually improve is the code generated in the isolated path. On GIMPLE we retain the load: [count: 0]: _7 ={v} MEM[(struct Derived *)0B].D.2340.y; __builtin_trap (); (because it can trap). The use of the cold section for the ud2 is probably also bad since it will cause a larger jump instruction where very likely testq %rdi, %rdi jne .L2 ud2 .L2: movl (%rdi), ... would be both faster and smaller. For reference the generated code: _Z5fieldP5Base2: .LFB1: .cfi_startproc testq %rdi, %rdi je .L2 movl(%rdi), %eax ret .cfi_endproc .section.text.unlikely .cfi_startproc .type _Z5fieldP5Base2.cold, @function _Z5fieldP5Base2.cold: .LFSB1: .L2: movl4, %eax ud2 CCing Jeff for the RTL side representation - IIRC we have some special CFG magic for gcc_unreachable, not sure if what we end up with trap() matches that or if we should adjust this somehow. Currently DCE marks the load as always necessary because it seems isolate-paths makes the load volatile: /* We want the NULL pointer dereference to actually occur so that code that wishes to catch the signal can do so. ... fair enough - but as you see above we dereference not NULL but some derived constant which might not actually trap. I wonder if it is more useful/safe to replace the load with a plain *(char *)0? Note I don't think what clang does here is reasonable.