[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions

2018-02-24 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

That is an extremely Google-specific analysis, actually; AFAIK almost nobody 
else uses static linking all the way down to and including the C and C++ 
standard libraries unless they're literally building an executable for a 
fully-static environment, like the kernel.  The C and C++ language standards 
state that `operator delete` and `free` are independently overridable by just 
defining those functions outside the stdlib, so they generally cannot be 
resolved as direct calls without the sort of whole-program analysis that the 
linker can only do when linking the final executable.

I think a more reasonable benchmark would be to build a standard Linux 
executable that dynamically links libc and lib{std,}c++, or perhaps something 
with the ADK or Darwin.

I'm quite open to the idea that the right thing to do is just to do this in all 
modes, but I do think we should understand the cost a little better.  (Xcode 
defaults release builds to `-Os`, so in practice my proposal of "-Os or -O0" or 
would enable this by default for almost all builds for us at Apple.)

You can check for -Os by just checking `getCodeGenOpts().OptimizeSize`.

It should be quite easy to collect null-deletion counts by (1) enabling your 
patch and (2) writing an `operator delete` that counts nulls before calling 
`free` and reports that count in a global destructor.  Then you just need to 
pick a C++-heavy program to count them in. :)  Clang compiling 403.gcc isn't an 
unreasonable choice, although LLVM's use of allocation pools does mean that 
we're likely to have fewer `delete` calls than you might think.


Repository:
  rC Clang

https://reviews.llvm.org/D43430



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions

2018-02-23 Thread Andrew Hunter via Phabricator via cfe-commits
ahh added a comment.

> If the pointer is not null, the runtime overhead of the null check is pretty 
> negligible next to the cost of actually doing the allocation. If the pointer 
> is null, the runtime overhead of making at least one unnecessary call — 
> probably two, if 'operator delete' doesn't do its own null check before 
> calling 'free', and probably one that crosses image boundaries — is not 
> negligible at all. So the relative impact on code that does end up destroying 
> a trivial value is outsized.

In a reply of mine that I think got eaten by list moderation,  I looked into 
this and benchmarked the cost of ::operator delete; with our tcmalloc, the cost 
of deleting null is about 8 cycles (compared to an empty loop.) (I don't really 
know how to benchmark the version with an if around it, but if we assume that's 
free, 8 cycles is still very cheap.)

I suppose this might go up somewhat in an environment where we have to make 
some sort of PLT call or even two. My Google centric response is "don't do 
that"--linking directly against any modern malloc should avoid any jump in 
`::operator delete` and our environment make direct calls quite fast; I'm not 
sure how generalizable that is. (The linking is I think universally good 
advice; I'm not sure who runs in an environment that cannot make efficient far 
calls. But point is that: while your statement is true, the penalty for getting 
this wrong seems very small, and as you say any programmer can if around it at 
a "hot" null delete, no?

This is one of the few aspects of malloc calls that I don't have near-infinite 
telemetry for (our sampling architecture doesn't easily collect it.)  So I 
cannot give you a hard number of the fraction of deletes that are of null 
pointers, but I am convinced that is very small.  Would more (Google-internal, 
obviously) data on this make a decision easier?

I could see why maybe this could be gated on -Os, but I didn't do this for two 
reasons:

- I am new at Clang development and wasn't sure how to put that sort of a check 
in :) Though I can learn if this is a hard requirement.
- From our perspective, I think Google would want this flag in non-size 
optimization (-O2 or whatever.)  We delete null infrequently enough that I'd 
expect this to be a pure cycle win (if a very small one) and even though 
(because?) we don't optimize for size, we have a number of very large binaries, 
and reducing icache hit can help a lot.

I'm unsure exactly how to make progress here, since for one thing I'm unsure 
how strongly you feel about the potential cost/benefits. Guidance would be 
greatly appreciated!


Repository:
  rC Clang

https://reviews.llvm.org/D43430



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions

2018-02-19 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

Discourse nitpick: I would encourage you to just use the ordinary phrase "null 
pointer", or just "null", when referring to a pointer value that happens to be 
null and to reserve "nullptr" for *statically* null pointers, especially the 
`nullptr` expression.

If the pointer is not null, the runtime overhead of the null check is pretty 
negligible next to the cost of actually doing the allocation.  If the pointer 
is null, the runtime overhead of making at least one unnecessary call — 
probably two, if 'operator delete' doesn't do its own null check before calling 
'free', and probably one that crosses image boundaries — is not negligible at 
all.  So the relative impact on code that does end up destroying a trivial 
value is outsized.

On the other hand, if the programmer adds an explicit null-check, it's unlikely 
to be optimized away; that means that if we did this automatically, there would 
still be an avenue for them to get the null check back.

The code-size argument against doing the null check seems strong, however.  
Have you considered just doing this in the code-size-sensitive modes, in 
particular -Os/-Oz (for obvious reasons) and -O0 (because less code size == 
faster compiles, especially when it involves control flow)?


Repository:
  rC Clang

https://reviews.llvm.org/D43430



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions

2018-02-18 Thread Andrew Hunter via Phabricator via cfe-commits
ahh updated this revision to Diff 134846.
ahh added a comment.

Fix indentation


Repository:
  rC Clang

https://reviews.llvm.org/D43430

Files:
  lib/CodeGen/CGCXXABI.h
  lib/CodeGen/CGExprCXX.cpp
  test/CodeGenCXX/cxx2a-destroying-delete.cpp
  test/CodeGenCXX/delete-two-arg.cpp
  test/CodeGenCXX/delete.cpp

Index: test/CodeGenCXX/delete.cpp
===
--- test/CodeGenCXX/delete.cpp
+++ test/CodeGenCXX/delete.cpp
@@ -9,7 +9,12 @@
 };
 
 // POD types.
+// CHECK-LABEL: define void @_Z2t3P1S
 void t3(S *s) {
+  // CHECK-NOT: icmp eq {{.*}}, null
+  // CHECK-NOT: br i1
+  // CHECK: call void @_ZdlPv
+
   delete s;
 }
 
@@ -99,6 +104,8 @@
 
 namespace test3 {
   void f(int a[10][20]) {
+// CHECK-NOT: icmp eq {{.*}}, null
+// CHECK-NOT: br i1
 // CHECK: call void @_ZdaPv(i8*
 delete a;
   }
@@ -113,6 +120,8 @@
 
   // CHECK-LABEL: define void @_ZN5test421global_delete_virtualEPNS_1XE
   void global_delete_virtual(X *xp) {
+// CHECK: icmp eq {{.*}}, null
+// CHECK: br i1
 //   Load the offset-to-top from the vtable and apply it.
 //   This has to be done first because the dtor can mess it up.
 // CHECK:  [[T0:%.*]] = bitcast [[X:%.*]]* [[XP:%.*]] to i64**
Index: test/CodeGenCXX/delete-two-arg.cpp
===
--- test/CodeGenCXX/delete-two-arg.cpp
+++ test/CodeGenCXX/delete-two-arg.cpp
@@ -9,8 +9,8 @@
   // CHECK-LABEL: define void @_ZN5test11aEPNS_1AE(
   void a(A *x) {
 // CHECK:  load
-// CHECK-NEXT: icmp eq {{.*}}, null
-// CHECK-NEXT: br i1
+// CHECK-NOT: icmp eq {{.*}}, null
+// CHECK-NOT: br i1
 // CHECK:  call void @_ZN5test11AdlEPvj(i8* %{{.*}}, i32 4)
 delete x;
   }
Index: test/CodeGenCXX/cxx2a-destroying-delete.cpp
===
--- test/CodeGenCXX/cxx2a-destroying-delete.cpp
+++ test/CodeGenCXX/cxx2a-destroying-delete.cpp
@@ -15,9 +15,8 @@
 void delete_A(A *a) { delete a; }
 // CHECK-LABEL: define {{.*}}delete_A
 // CHECK: %[[a:.*]] = load
-// CHECK: icmp eq %{{.*}} %[[a]], null
-// CHECK: br i1
-//
+// CHECK-NOT: icmp eq %{{.*}} %[[a]], null
+// CHECK-NOT: br i1
 // Ensure that we call the destroying delete and not the destructor.
 // CHECK-NOT: call
 // CHECK-ITANIUM: call void @_ZN1AdlEPS_St19destroying_delete_t(%{{.*}}* %[[a]])
@@ -60,8 +59,8 @@
 // CHECK: %[[castbase:.*]] = bitcast {{.*}} %[[base]]
 //
 // CHECK: %[[a:.*]] = phi {{.*}} %[[castbase]]
-// CHECK: icmp eq %{{.*}} %[[a]], null
-// CHECK: br i1
+// CHECK-NOT: icmp eq %{{.*}} %[[a]], null
+// CHECK-NOT: br i1
 //
 // CHECK-NOT: call
 // CHECK-ITANIUM: call void @_ZN1AdlEPS_St19destroying_delete_t(%{{.*}}* %[[a]])
Index: lib/CodeGen/CGExprCXX.cpp
===
--- lib/CodeGen/CGExprCXX.cpp
+++ lib/CodeGen/CGExprCXX.cpp
@@ -1971,26 +1971,57 @@
   CGF.PopCleanupBlock();
 }
 
-void CodeGenFunction::EmitCXXDeleteExpr(const CXXDeleteExpr *E) {
-  const Expr *Arg = E->getArgument();
-  Address Ptr = EmitPointerWithAlignment(Arg);
+// Will we read through the deleted pointer?  If so,
+// we must first check it is not null.
+static bool DeleteMightAccessObject(CodeGenFunction ,
+const CXXDeleteExpr *E, QualType DeleteTy) {
 
-  // Null check the pointer.
-  llvm::BasicBlock *DeleteNotNull = createBasicBlock("delete.notnull");
-  llvm::BasicBlock *DeleteEnd = createBasicBlock("delete.end");
+  if (E->getOperatorDelete()->isDestroyingOperatorDelete()) {
+// It is safe to call destroying operator delete with nullptr arguments
+// ([expr.delete] tells us it is unspecified whether a deallocation
+// function is called) but a virtual destructor must be resolved
+// to find the right function, which we can't do on nullptr.
+auto *Dtor = DeleteTy->getAsCXXRecordDecl()->getDestructor();
+return Dtor && Dtor->isVirtual();
+  }
 
-  llvm::Value *IsNull = Builder.CreateIsNull(Ptr.getPointer(), "isnull");
+  if (E->isArrayForm()) {
+return CGF.CGM.getCXXABI().requiresArrayCookie(E, DeleteTy);
+  }
 
-  Builder.CreateCondBr(IsNull, DeleteEnd, DeleteNotNull);
-  EmitBlock(DeleteNotNull);
+  // Otherwise, we should avoid invoking any nontrivial destructor on
+  // a null object.
+  return DeleteTy.isDestructedType();
+}
 
+void CodeGenFunction::EmitCXXDeleteExpr(const CXXDeleteExpr *E) {
+  const Expr *Arg = E->getArgument();
+  Address Ptr = EmitPointerWithAlignment(Arg);
   QualType DeleteTy = E->getDestroyedType();
 
+  // Null check the pointer, unless the destructor is trivial.  In that case,
+  // all we'll be doing is passing Ptr to ::operator delete(), which is
+  // well formed for nullptr arguments (and allowed by [expr.delete.7]
+  // The overwhelming majority of deletes are of non-nullptr, so there's
+  // no efficiency gain to be had by skipping the very rare exceptions, and
+  // it bleeds 

[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions

2018-02-18 Thread Richard Smith - zygoloid via Phabricator via cfe-commits
rsmith added a comment.

LGTM, but I'd also like @rjmccall's opinion.


Repository:
  rC Clang

https://reviews.llvm.org/D43430



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions

2018-02-18 Thread Richard Smith - zygoloid via Phabricator via cfe-commits
rsmith added a comment.

In https://reviews.llvm.org/D43430#1011269, @kimgr wrote:

> I wonder if this could have negative effects for frequent deletion of 
> nullptrs (e.g. a sometimes-allocated member of a heavily used value type).


For that to be better, I think we'd need one of two things to happen:

1. The compiler can statically detect that the pointer is null, and remove the 
call to `operator delete` and potentially other code too. (This happens, eg, 
when inlining `vector::push_back` on an empty `vector`.)
2. The condition cannot be determined statically, but dynamically it turns out 
that the pointer is very frequently null, so that the cost of the extra checks 
in the non-null case are cheaper than the cost of the function call in the null 
case.

For case 1, the optimizer already knows that it can remove calls to usual 
`operator delete` functions on a null pointer, so that optimization should not 
be inhibited by this change.

For case 2, it seems to me that our default assumption should probably be that 
most deleted pointers are not null. But I don't have measurements to back that 
up. If the user knows that their pointers are usually null, they can express 
that knowledge with an `if`, but if we always generate the branch on null here, 
then there would be no easy way for the programmer to express their intent that 
the pointer is usually not null.




Comment at: lib/CodeGen/CGExprCXX.cpp:1977-1978
+static bool DeleteMightAccessObject(CodeGenFunction ,
+  const CXXDeleteExpr *E,
+  QualType DeleteTy) {
 

Reindent.


Repository:
  rC Clang

https://reviews.llvm.org/D43430



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions

2018-02-18 Thread Andrew Hunter via cfe-commits
On Sat, Feb 17, 2018 at 1:41 AM Kim Gräsman via Phabricator <
revi...@reviews.llvm.org> wrote:

> kimgr added a comment.
>
> Peanut gallery observation: there was a discussion on the Boost list years
> and years ago where someone made the case that `if (x != nullptr) delete
> x;` was measurably faster than just calling `delete x;` I can't find it
> now, but I think it might have been in the context of their
> `checked_delete` library. Anyway, the reasoning was that with an external
> nullptr check, you'd pay for one comparison, but without it you'd always
> pay for a jump + a comparison. I suppose that only holds true for null
> pointers, for non-null pointers the extra check is just waste.
>
>
This is all correct, to some extent; that said, the net difference will be
a call to a reasonably hot function and an extra branch.  For grins, I
benchmarked the cost of ::operator delete (this is slightly challenging
because clang currently (mis?) compiles ::operator delete(nullptr) away,
but this is solvable...)  With our tcmalloc, the cost of deleting null is
about 8 cycles (compared to an empty loop.) (I don't really know how to
benchmark the version with an if around it, but if we assume that's free, 8
cycles is still very cheap.)


> It looks to me like the compiler inserts an external null check, and
> you're now removing it in select cases, did I understand that right?


Yes, precisely.


> I wonder if this could have negative effects for frequent deletion of
> nullptrs (e.g. a sometimes-allocated member of a heavily used value type).
>
>
In principle yes.  I would be very surprised if any binary deletes null
pointers often enough for this to be a significant concern (other than
statically-provable ones, which are still free.) You'd have to be doing it
a lot to notice those 8 cycles.
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions

2018-02-17 Thread Kim Gräsman via Phabricator via cfe-commits
kimgr added a comment.

Peanut gallery observation: there was a discussion on the Boost list years and 
years ago where someone made the case that `if (x != nullptr) delete x;` was 
measurably faster than just calling `delete x;` I can't find it now, but I 
think it might have been in the context of their `checked_delete` library. 
Anyway, the reasoning was that with an external nullptr check, you'd pay for 
one comparison, but without it you'd always pay for a jump + a comparison. I 
suppose that only holds true for null pointers, for non-null pointers the extra 
check is just waste.

It looks to me like the compiler inserts an external null check, and you're now 
removing it in select cases, did I understand that right? I wonder if this 
could have negative effects for frequent deletion of nullptrs (e.g. a 
sometimes-allocated member of a heavily used value type).

That said, I'm not sure how valid the observation back then still is.


Repository:
  rC Clang

https://reviews.llvm.org/D43430



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions

2018-02-17 Thread Andrew Hunter via Phabricator via cfe-commits
ahh added a comment.

On my workstation's checkout of head, one test fails (Clang :: 
Driver/response-file.c) both with and without this change; everything else 
appears to pass.

I believe that between the tests I add to delete.cpp and the ones that are 
already there (and destroying-delete.cpp) we cover every case that has to get a 
nullptr check, and pretty much every one that should *not*.

Name of the helper function is, uh, good enough for me, but no objections to 
changing it.


Repository:
  rC Clang

https://reviews.llvm.org/D43430



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions

2018-02-17 Thread Andrew Hunter via Phabricator via cfe-commits
ahh created this revision.
ahh added a reviewer: rsmith.
Herald added a subscriber: cfe-commits.

[expr.delete] is pretty crystal clear that it's OK to invoke a
deallocation-function on a nullptr delete-expression:

"If the value of the operand of the delete-expression is a null
pointer value, it is unspecified whether a deallocation function will
be called as described above."

Even so, we currently check against nullptr unconditionally. This is
wasteful for anything with a trivial destructor; deleting nullptr is
very rare so it's not worth saving the call to ::operator delete, and
this is a useless branch (and waste of code size) in the common case.

Condition emission of the branch on us needing to actually look
through the pointer for a vtable, size cookie, or nontrivial
destructor.  (In principle a nontrivial destructor with no side
effects that didn't touch the object would also be safe to run
unconditionally, but I don't know how to test we have one of those and
who in the world would write one?)

On an important and very large (~500 MiB) Google search binary, this
saves approximately 32 KiB of text. Okay, it's not impressive in a
relative sense, but it's still the right thing to do.

A note on optimization: we still successfully elide delete-expressions of
(literal) nullptr.  Where before they were stuck behind a never-taken branch,
now they reduce (effectively) to calls to __builtin_operator_delete(nullptr),
which EarlyCSE is capable of optimizing out. So this has no cost in the
already well-optimized case.


Repository:
  rC Clang

https://reviews.llvm.org/D43430

Files:
  lib/CodeGen/CGCXXABI.h
  lib/CodeGen/CGExprCXX.cpp
  test/CodeGenCXX/cxx2a-destroying-delete.cpp
  test/CodeGenCXX/delete-two-arg.cpp
  test/CodeGenCXX/delete.cpp

Index: test/CodeGenCXX/delete.cpp
===
--- test/CodeGenCXX/delete.cpp
+++ test/CodeGenCXX/delete.cpp
@@ -9,7 +9,12 @@
 };
 
 // POD types.
+// CHECK-LABEL: define void @_Z2t3P1S
 void t3(S *s) {
+  // CHECK-NOT: icmp eq {{.*}}, null
+  // CHECK-NOT: br i1
+  // CHECK: call void @_ZdlPv
+
   delete s;
 }
 
@@ -99,6 +104,8 @@
 
 namespace test3 {
   void f(int a[10][20]) {
+// CHECK-NOT: icmp eq {{.*}}, null
+// CHECK-NOT: br i1
 // CHECK: call void @_ZdaPv(i8*
 delete a;
   }
@@ -113,6 +120,8 @@
 
   // CHECK-LABEL: define void @_ZN5test421global_delete_virtualEPNS_1XE
   void global_delete_virtual(X *xp) {
+// CHECK: icmp eq {{.*}}, null
+// CHECK: br i1
 //   Load the offset-to-top from the vtable and apply it.
 //   This has to be done first because the dtor can mess it up.
 // CHECK:  [[T0:%.*]] = bitcast [[X:%.*]]* [[XP:%.*]] to i64**
Index: test/CodeGenCXX/delete-two-arg.cpp
===
--- test/CodeGenCXX/delete-two-arg.cpp
+++ test/CodeGenCXX/delete-two-arg.cpp
@@ -9,8 +9,8 @@
   // CHECK-LABEL: define void @_ZN5test11aEPNS_1AE(
   void a(A *x) {
 // CHECK:  load
-// CHECK-NEXT: icmp eq {{.*}}, null
-// CHECK-NEXT: br i1
+// CHECK-NOT: icmp eq {{.*}}, null
+// CHECK-NOT: br i1
 // CHECK:  call void @_ZN5test11AdlEPvj(i8* %{{.*}}, i32 4)
 delete x;
   }
Index: test/CodeGenCXX/cxx2a-destroying-delete.cpp
===
--- test/CodeGenCXX/cxx2a-destroying-delete.cpp
+++ test/CodeGenCXX/cxx2a-destroying-delete.cpp
@@ -15,9 +15,8 @@
 void delete_A(A *a) { delete a; }
 // CHECK-LABEL: define {{.*}}delete_A
 // CHECK: %[[a:.*]] = load
-// CHECK: icmp eq %{{.*}} %[[a]], null
-// CHECK: br i1
-//
+// CHECK-NOT: icmp eq %{{.*}} %[[a]], null
+// CHECK-NOT: br i1
 // Ensure that we call the destroying delete and not the destructor.
 // CHECK-NOT: call
 // CHECK-ITANIUM: call void @_ZN1AdlEPS_St19destroying_delete_t(%{{.*}}* %[[a]])
@@ -60,8 +59,8 @@
 // CHECK: %[[castbase:.*]] = bitcast {{.*}} %[[base]]
 //
 // CHECK: %[[a:.*]] = phi {{.*}} %[[castbase]]
-// CHECK: icmp eq %{{.*}} %[[a]], null
-// CHECK: br i1
+// CHECK-NOT: icmp eq %{{.*}} %[[a]], null
+// CHECK-NOT: br i1
 //
 // CHECK-NOT: call
 // CHECK-ITANIUM: call void @_ZN1AdlEPS_St19destroying_delete_t(%{{.*}}* %[[a]])
Index: lib/CodeGen/CGExprCXX.cpp
===
--- lib/CodeGen/CGExprCXX.cpp
+++ lib/CodeGen/CGExprCXX.cpp
@@ -1971,26 +1971,58 @@
   CGF.PopCleanupBlock();
 }
 
-void CodeGenFunction::EmitCXXDeleteExpr(const CXXDeleteExpr *E) {
-  const Expr *Arg = E->getArgument();
-  Address Ptr = EmitPointerWithAlignment(Arg);
+// Will we read through the deleted pointer?  If so,
+// we must first check it is not null.
+static bool DeleteMightAccessObject(CodeGenFunction ,
+  const CXXDeleteExpr *E,
+  QualType DeleteTy) {
 
-  // Null check the pointer.
-  llvm::BasicBlock *DeleteNotNull = createBasicBlock("delete.notnull");
-  llvm::BasicBlock