[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions
rjmccall added a comment. That is an extremely Google-specific analysis, actually; AFAIK almost nobody else uses static linking all the way down to and including the C and C++ standard libraries unless they're literally building an executable for a fully-static environment, like the kernel. The C and C++ language standards state that `operator delete` and `free` are independently overridable by just defining those functions outside the stdlib, so they generally cannot be resolved as direct calls without the sort of whole-program analysis that the linker can only do when linking the final executable. I think a more reasonable benchmark would be to build a standard Linux executable that dynamically links libc and lib{std,}c++, or perhaps something with the ADK or Darwin. I'm quite open to the idea that the right thing to do is just to do this in all modes, but I do think we should understand the cost a little better. (Xcode defaults release builds to `-Os`, so in practice my proposal of "-Os or -O0" or would enable this by default for almost all builds for us at Apple.) You can check for -Os by just checking `getCodeGenOpts().OptimizeSize`. It should be quite easy to collect null-deletion counts by (1) enabling your patch and (2) writing an `operator delete` that counts nulls before calling `free` and reports that count in a global destructor. Then you just need to pick a C++-heavy program to count them in. :) Clang compiling 403.gcc isn't an unreasonable choice, although LLVM's use of allocation pools does mean that we're likely to have fewer `delete` calls than you might think. Repository: rC Clang https://reviews.llvm.org/D43430 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions
ahh added a comment. > If the pointer is not null, the runtime overhead of the null check is pretty > negligible next to the cost of actually doing the allocation. If the pointer > is null, the runtime overhead of making at least one unnecessary call — > probably two, if 'operator delete' doesn't do its own null check before > calling 'free', and probably one that crosses image boundaries — is not > negligible at all. So the relative impact on code that does end up destroying > a trivial value is outsized. In a reply of mine that I think got eaten by list moderation, I looked into this and benchmarked the cost of ::operator delete; with our tcmalloc, the cost of deleting null is about 8 cycles (compared to an empty loop.) (I don't really know how to benchmark the version with an if around it, but if we assume that's free, 8 cycles is still very cheap.) I suppose this might go up somewhat in an environment where we have to make some sort of PLT call or even two. My Google centric response is "don't do that"--linking directly against any modern malloc should avoid any jump in `::operator delete` and our environment make direct calls quite fast; I'm not sure how generalizable that is. (The linking is I think universally good advice; I'm not sure who runs in an environment that cannot make efficient far calls. But point is that: while your statement is true, the penalty for getting this wrong seems very small, and as you say any programmer can if around it at a "hot" null delete, no? This is one of the few aspects of malloc calls that I don't have near-infinite telemetry for (our sampling architecture doesn't easily collect it.) So I cannot give you a hard number of the fraction of deletes that are of null pointers, but I am convinced that is very small. Would more (Google-internal, obviously) data on this make a decision easier? I could see why maybe this could be gated on -Os, but I didn't do this for two reasons: - I am new at Clang development and wasn't sure how to put that sort of a check in :) Though I can learn if this is a hard requirement. - From our perspective, I think Google would want this flag in non-size optimization (-O2 or whatever.) We delete null infrequently enough that I'd expect this to be a pure cycle win (if a very small one) and even though (because?) we don't optimize for size, we have a number of very large binaries, and reducing icache hit can help a lot. I'm unsure exactly how to make progress here, since for one thing I'm unsure how strongly you feel about the potential cost/benefits. Guidance would be greatly appreciated! Repository: rC Clang https://reviews.llvm.org/D43430 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions
rjmccall added a comment. Discourse nitpick: I would encourage you to just use the ordinary phrase "null pointer", or just "null", when referring to a pointer value that happens to be null and to reserve "nullptr" for *statically* null pointers, especially the `nullptr` expression. If the pointer is not null, the runtime overhead of the null check is pretty negligible next to the cost of actually doing the allocation. If the pointer is null, the runtime overhead of making at least one unnecessary call — probably two, if 'operator delete' doesn't do its own null check before calling 'free', and probably one that crosses image boundaries — is not negligible at all. So the relative impact on code that does end up destroying a trivial value is outsized. On the other hand, if the programmer adds an explicit null-check, it's unlikely to be optimized away; that means that if we did this automatically, there would still be an avenue for them to get the null check back. The code-size argument against doing the null check seems strong, however. Have you considered just doing this in the code-size-sensitive modes, in particular -Os/-Oz (for obvious reasons) and -O0 (because less code size == faster compiles, especially when it involves control flow)? Repository: rC Clang https://reviews.llvm.org/D43430 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions
ahh updated this revision to Diff 134846. ahh added a comment. Fix indentation Repository: rC Clang https://reviews.llvm.org/D43430 Files: lib/CodeGen/CGCXXABI.h lib/CodeGen/CGExprCXX.cpp test/CodeGenCXX/cxx2a-destroying-delete.cpp test/CodeGenCXX/delete-two-arg.cpp test/CodeGenCXX/delete.cpp Index: test/CodeGenCXX/delete.cpp === --- test/CodeGenCXX/delete.cpp +++ test/CodeGenCXX/delete.cpp @@ -9,7 +9,12 @@ }; // POD types. +// CHECK-LABEL: define void @_Z2t3P1S void t3(S *s) { + // CHECK-NOT: icmp eq {{.*}}, null + // CHECK-NOT: br i1 + // CHECK: call void @_ZdlPv + delete s; } @@ -99,6 +104,8 @@ namespace test3 { void f(int a[10][20]) { +// CHECK-NOT: icmp eq {{.*}}, null +// CHECK-NOT: br i1 // CHECK: call void @_ZdaPv(i8* delete a; } @@ -113,6 +120,8 @@ // CHECK-LABEL: define void @_ZN5test421global_delete_virtualEPNS_1XE void global_delete_virtual(X *xp) { +// CHECK: icmp eq {{.*}}, null +// CHECK: br i1 // Load the offset-to-top from the vtable and apply it. // This has to be done first because the dtor can mess it up. // CHECK: [[T0:%.*]] = bitcast [[X:%.*]]* [[XP:%.*]] to i64** Index: test/CodeGenCXX/delete-two-arg.cpp === --- test/CodeGenCXX/delete-two-arg.cpp +++ test/CodeGenCXX/delete-two-arg.cpp @@ -9,8 +9,8 @@ // CHECK-LABEL: define void @_ZN5test11aEPNS_1AE( void a(A *x) { // CHECK: load -// CHECK-NEXT: icmp eq {{.*}}, null -// CHECK-NEXT: br i1 +// CHECK-NOT: icmp eq {{.*}}, null +// CHECK-NOT: br i1 // CHECK: call void @_ZN5test11AdlEPvj(i8* %{{.*}}, i32 4) delete x; } Index: test/CodeGenCXX/cxx2a-destroying-delete.cpp === --- test/CodeGenCXX/cxx2a-destroying-delete.cpp +++ test/CodeGenCXX/cxx2a-destroying-delete.cpp @@ -15,9 +15,8 @@ void delete_A(A *a) { delete a; } // CHECK-LABEL: define {{.*}}delete_A // CHECK: %[[a:.*]] = load -// CHECK: icmp eq %{{.*}} %[[a]], null -// CHECK: br i1 -// +// CHECK-NOT: icmp eq %{{.*}} %[[a]], null +// CHECK-NOT: br i1 // Ensure that we call the destroying delete and not the destructor. // CHECK-NOT: call // CHECK-ITANIUM: call void @_ZN1AdlEPS_St19destroying_delete_t(%{{.*}}* %[[a]]) @@ -60,8 +59,8 @@ // CHECK: %[[castbase:.*]] = bitcast {{.*}} %[[base]] // // CHECK: %[[a:.*]] = phi {{.*}} %[[castbase]] -// CHECK: icmp eq %{{.*}} %[[a]], null -// CHECK: br i1 +// CHECK-NOT: icmp eq %{{.*}} %[[a]], null +// CHECK-NOT: br i1 // // CHECK-NOT: call // CHECK-ITANIUM: call void @_ZN1AdlEPS_St19destroying_delete_t(%{{.*}}* %[[a]]) Index: lib/CodeGen/CGExprCXX.cpp === --- lib/CodeGen/CGExprCXX.cpp +++ lib/CodeGen/CGExprCXX.cpp @@ -1971,26 +1971,57 @@ CGF.PopCleanupBlock(); } -void CodeGenFunction::EmitCXXDeleteExpr(const CXXDeleteExpr *E) { - const Expr *Arg = E->getArgument(); - Address Ptr = EmitPointerWithAlignment(Arg); +// Will we read through the deleted pointer? If so, +// we must first check it is not null. +static bool DeleteMightAccessObject(CodeGenFunction , +const CXXDeleteExpr *E, QualType DeleteTy) { - // Null check the pointer. - llvm::BasicBlock *DeleteNotNull = createBasicBlock("delete.notnull"); - llvm::BasicBlock *DeleteEnd = createBasicBlock("delete.end"); + if (E->getOperatorDelete()->isDestroyingOperatorDelete()) { +// It is safe to call destroying operator delete with nullptr arguments +// ([expr.delete] tells us it is unspecified whether a deallocation +// function is called) but a virtual destructor must be resolved +// to find the right function, which we can't do on nullptr. +auto *Dtor = DeleteTy->getAsCXXRecordDecl()->getDestructor(); +return Dtor && Dtor->isVirtual(); + } - llvm::Value *IsNull = Builder.CreateIsNull(Ptr.getPointer(), "isnull"); + if (E->isArrayForm()) { +return CGF.CGM.getCXXABI().requiresArrayCookie(E, DeleteTy); + } - Builder.CreateCondBr(IsNull, DeleteEnd, DeleteNotNull); - EmitBlock(DeleteNotNull); + // Otherwise, we should avoid invoking any nontrivial destructor on + // a null object. + return DeleteTy.isDestructedType(); +} +void CodeGenFunction::EmitCXXDeleteExpr(const CXXDeleteExpr *E) { + const Expr *Arg = E->getArgument(); + Address Ptr = EmitPointerWithAlignment(Arg); QualType DeleteTy = E->getDestroyedType(); + // Null check the pointer, unless the destructor is trivial. In that case, + // all we'll be doing is passing Ptr to ::operator delete(), which is + // well formed for nullptr arguments (and allowed by [expr.delete.7] + // The overwhelming majority of deletes are of non-nullptr, so there's + // no efficiency gain to be had by skipping the very rare exceptions, and + // it bleeds
[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions
rsmith added a comment. LGTM, but I'd also like @rjmccall's opinion. Repository: rC Clang https://reviews.llvm.org/D43430 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions
rsmith added a comment. In https://reviews.llvm.org/D43430#1011269, @kimgr wrote: > I wonder if this could have negative effects for frequent deletion of > nullptrs (e.g. a sometimes-allocated member of a heavily used value type). For that to be better, I think we'd need one of two things to happen: 1. The compiler can statically detect that the pointer is null, and remove the call to `operator delete` and potentially other code too. (This happens, eg, when inlining `vector::push_back` on an empty `vector`.) 2. The condition cannot be determined statically, but dynamically it turns out that the pointer is very frequently null, so that the cost of the extra checks in the non-null case are cheaper than the cost of the function call in the null case. For case 1, the optimizer already knows that it can remove calls to usual `operator delete` functions on a null pointer, so that optimization should not be inhibited by this change. For case 2, it seems to me that our default assumption should probably be that most deleted pointers are not null. But I don't have measurements to back that up. If the user knows that their pointers are usually null, they can express that knowledge with an `if`, but if we always generate the branch on null here, then there would be no easy way for the programmer to express their intent that the pointer is usually not null. Comment at: lib/CodeGen/CGExprCXX.cpp:1977-1978 +static bool DeleteMightAccessObject(CodeGenFunction , + const CXXDeleteExpr *E, + QualType DeleteTy) { Reindent. Repository: rC Clang https://reviews.llvm.org/D43430 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
Re: [PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions
On Sat, Feb 17, 2018 at 1:41 AM Kim Gräsman via Phabricator < revi...@reviews.llvm.org> wrote: > kimgr added a comment. > > Peanut gallery observation: there was a discussion on the Boost list years > and years ago where someone made the case that `if (x != nullptr) delete > x;` was measurably faster than just calling `delete x;` I can't find it > now, but I think it might have been in the context of their > `checked_delete` library. Anyway, the reasoning was that with an external > nullptr check, you'd pay for one comparison, but without it you'd always > pay for a jump + a comparison. I suppose that only holds true for null > pointers, for non-null pointers the extra check is just waste. > > This is all correct, to some extent; that said, the net difference will be a call to a reasonably hot function and an extra branch. For grins, I benchmarked the cost of ::operator delete (this is slightly challenging because clang currently (mis?) compiles ::operator delete(nullptr) away, but this is solvable...) With our tcmalloc, the cost of deleting null is about 8 cycles (compared to an empty loop.) (I don't really know how to benchmark the version with an if around it, but if we assume that's free, 8 cycles is still very cheap.) > It looks to me like the compiler inserts an external null check, and > you're now removing it in select cases, did I understand that right? Yes, precisely. > I wonder if this could have negative effects for frequent deletion of > nullptrs (e.g. a sometimes-allocated member of a heavily used value type). > > In principle yes. I would be very surprised if any binary deletes null pointers often enough for this to be a significant concern (other than statically-provable ones, which are still free.) You'd have to be doing it a lot to notice those 8 cycles. ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions
kimgr added a comment. Peanut gallery observation: there was a discussion on the Boost list years and years ago where someone made the case that `if (x != nullptr) delete x;` was measurably faster than just calling `delete x;` I can't find it now, but I think it might have been in the context of their `checked_delete` library. Anyway, the reasoning was that with an external nullptr check, you'd pay for one comparison, but without it you'd always pay for a jump + a comparison. I suppose that only holds true for null pointers, for non-null pointers the extra check is just waste. It looks to me like the compiler inserts an external null check, and you're now removing it in select cases, did I understand that right? I wonder if this could have negative effects for frequent deletion of nullptrs (e.g. a sometimes-allocated member of a heavily used value type). That said, I'm not sure how valid the observation back then still is. Repository: rC Clang https://reviews.llvm.org/D43430 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions
ahh added a comment. On my workstation's checkout of head, one test fails (Clang :: Driver/response-file.c) both with and without this change; everything else appears to pass. I believe that between the tests I add to delete.cpp and the ones that are already there (and destroying-delete.cpp) we cover every case that has to get a nullptr check, and pretty much every one that should *not*. Name of the helper function is, uh, good enough for me, but no objections to changing it. Repository: rC Clang https://reviews.llvm.org/D43430 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions
ahh created this revision. ahh added a reviewer: rsmith. Herald added a subscriber: cfe-commits. [expr.delete] is pretty crystal clear that it's OK to invoke a deallocation-function on a nullptr delete-expression: "If the value of the operand of the delete-expression is a null pointer value, it is unspecified whether a deallocation function will be called as described above." Even so, we currently check against nullptr unconditionally. This is wasteful for anything with a trivial destructor; deleting nullptr is very rare so it's not worth saving the call to ::operator delete, and this is a useless branch (and waste of code size) in the common case. Condition emission of the branch on us needing to actually look through the pointer for a vtable, size cookie, or nontrivial destructor. (In principle a nontrivial destructor with no side effects that didn't touch the object would also be safe to run unconditionally, but I don't know how to test we have one of those and who in the world would write one?) On an important and very large (~500 MiB) Google search binary, this saves approximately 32 KiB of text. Okay, it's not impressive in a relative sense, but it's still the right thing to do. A note on optimization: we still successfully elide delete-expressions of (literal) nullptr. Where before they were stuck behind a never-taken branch, now they reduce (effectively) to calls to __builtin_operator_delete(nullptr), which EarlyCSE is capable of optimizing out. So this has no cost in the already well-optimized case. Repository: rC Clang https://reviews.llvm.org/D43430 Files: lib/CodeGen/CGCXXABI.h lib/CodeGen/CGExprCXX.cpp test/CodeGenCXX/cxx2a-destroying-delete.cpp test/CodeGenCXX/delete-two-arg.cpp test/CodeGenCXX/delete.cpp Index: test/CodeGenCXX/delete.cpp === --- test/CodeGenCXX/delete.cpp +++ test/CodeGenCXX/delete.cpp @@ -9,7 +9,12 @@ }; // POD types. +// CHECK-LABEL: define void @_Z2t3P1S void t3(S *s) { + // CHECK-NOT: icmp eq {{.*}}, null + // CHECK-NOT: br i1 + // CHECK: call void @_ZdlPv + delete s; } @@ -99,6 +104,8 @@ namespace test3 { void f(int a[10][20]) { +// CHECK-NOT: icmp eq {{.*}}, null +// CHECK-NOT: br i1 // CHECK: call void @_ZdaPv(i8* delete a; } @@ -113,6 +120,8 @@ // CHECK-LABEL: define void @_ZN5test421global_delete_virtualEPNS_1XE void global_delete_virtual(X *xp) { +// CHECK: icmp eq {{.*}}, null +// CHECK: br i1 // Load the offset-to-top from the vtable and apply it. // This has to be done first because the dtor can mess it up. // CHECK: [[T0:%.*]] = bitcast [[X:%.*]]* [[XP:%.*]] to i64** Index: test/CodeGenCXX/delete-two-arg.cpp === --- test/CodeGenCXX/delete-two-arg.cpp +++ test/CodeGenCXX/delete-two-arg.cpp @@ -9,8 +9,8 @@ // CHECK-LABEL: define void @_ZN5test11aEPNS_1AE( void a(A *x) { // CHECK: load -// CHECK-NEXT: icmp eq {{.*}}, null -// CHECK-NEXT: br i1 +// CHECK-NOT: icmp eq {{.*}}, null +// CHECK-NOT: br i1 // CHECK: call void @_ZN5test11AdlEPvj(i8* %{{.*}}, i32 4) delete x; } Index: test/CodeGenCXX/cxx2a-destroying-delete.cpp === --- test/CodeGenCXX/cxx2a-destroying-delete.cpp +++ test/CodeGenCXX/cxx2a-destroying-delete.cpp @@ -15,9 +15,8 @@ void delete_A(A *a) { delete a; } // CHECK-LABEL: define {{.*}}delete_A // CHECK: %[[a:.*]] = load -// CHECK: icmp eq %{{.*}} %[[a]], null -// CHECK: br i1 -// +// CHECK-NOT: icmp eq %{{.*}} %[[a]], null +// CHECK-NOT: br i1 // Ensure that we call the destroying delete and not the destructor. // CHECK-NOT: call // CHECK-ITANIUM: call void @_ZN1AdlEPS_St19destroying_delete_t(%{{.*}}* %[[a]]) @@ -60,8 +59,8 @@ // CHECK: %[[castbase:.*]] = bitcast {{.*}} %[[base]] // // CHECK: %[[a:.*]] = phi {{.*}} %[[castbase]] -// CHECK: icmp eq %{{.*}} %[[a]], null -// CHECK: br i1 +// CHECK-NOT: icmp eq %{{.*}} %[[a]], null +// CHECK-NOT: br i1 // // CHECK-NOT: call // CHECK-ITANIUM: call void @_ZN1AdlEPS_St19destroying_delete_t(%{{.*}}* %[[a]]) Index: lib/CodeGen/CGExprCXX.cpp === --- lib/CodeGen/CGExprCXX.cpp +++ lib/CodeGen/CGExprCXX.cpp @@ -1971,26 +1971,58 @@ CGF.PopCleanupBlock(); } -void CodeGenFunction::EmitCXXDeleteExpr(const CXXDeleteExpr *E) { - const Expr *Arg = E->getArgument(); - Address Ptr = EmitPointerWithAlignment(Arg); +// Will we read through the deleted pointer? If so, +// we must first check it is not null. +static bool DeleteMightAccessObject(CodeGenFunction , + const CXXDeleteExpr *E, + QualType DeleteTy) { - // Null check the pointer. - llvm::BasicBlock *DeleteNotNull = createBasicBlock("delete.notnull"); - llvm::BasicBlock