[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2020-01-09 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

In D70172#1812664 , @yaxunl wrote:

> In D70172#1812631 , @rjmccall wrote:
>
> > Most uses of the destructor do not use the delete operator, though, and 
> > therefore should not trigger the diagnostics in `f` to be emitted.  And 
> > this really doesn't require a fully-realized use graph; you could very 
> > easily track the current use stack when making a later pass over the 
> > entities used.
>
>
> The call graph is not for this specific situation. A call graph is needed 
> because of the transitive nature of the deferred diagnostic message. That is, 
> if any direct or indirect caller is emitted, the diagnostic msg needs to be 
> emitted.


One of the points that Richard and I have been trying to make is that this 
really isn't specifically about *calls*, it's about *uses*.  You only want to 
emit diagnostics associated with an entity if you actually have to emit that 
entity, and whether you emit an entity has nothing to do with what places might 
*call* it, but rather what places *use* it and therefore force it to be 
emitted.  This is fortunate because call graphs are inherently imperfect 
because of indirect calls, but use graphs are totally reliable.  It's also 
fortunate because it means you can piggy-back on all of the existing logic that 
Sema has for tracking ODR uses.

Richard and I are also pointing out that Sema has to treat the v-table as its 
own separate thing when tracking ODR uses, and so you need to as well.  You 
want to emit diagnostics associated with a virtual function if you're emitting 
code that either (1) directly uses the function (e.g. by calling `x->A::foo()`) 
or (2) directly uses a v-table containing the function.  You can't rely on 
Sema's normal ODR-use tracking for *either* of these, because Sema might have 
observed a use in code that you don't actually have to emit, e.g. host code if 
you're compiling for the device.  That is, a v-table is only a "root" for 
virtual functions if you actually have to emit that v-table, and you can't know 
that without tracking v-tables in your use graph.

> The deferred diagnostic msg is recorded when parsing a function body. At that 
> time we do not know which function will directly or indirectly call it. How 
> do we keep a use stack?

The "use stack" idea would apply if you switched from eagerly creating the 
entire use graph to instead just making a late pass that walked function 
bodies.  If you walk function bodies depth-first, starting from a true root and 
gathering all the ODR-used entities to be  recursively walked, then you can 
maintain a stack of what entities you're currently walking, and that stack is a 
use-path that explains why you need to emit the current function.

It should be straightforward to build a function that walks over the entities 
used by a function body and calls a callback by just extracting it out of the 
code in `MarkDeclarationsUsedInExpr`.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2020-01-09 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In D70172#1812631 , @rjmccall wrote:

> Most uses of the destructor do not use the delete operator, though, and 
> therefore should not trigger the diagnostics in `f` to be emitted.  And this 
> really doesn't require a fully-realized use graph; you could very easily 
> track the current use stack when making a later pass over the entities used.


The call graph is not for this specific situation. A call graph is needed 
because of the transitive nature of the deferred diagnostic message. That is, 
if any direct or indirect caller is emitted, the diagnostic msg needs to be 
emitted.

The deferred diagnostic msg is recorded when parsing a function body. At that 
time we do not know which function will directly or indirectly call it. How do 
we keep a use stack?

When we parsing other function bodies, we only know the direct callee. Since we 
do not know if this function indirectly calls the function with deferred 
diagnostics, we have to keep a record of all the caller/callee edges.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2020-01-09 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In D70172#1810665 , @rsmith wrote:

> This doesn't look quite right to me. I don't think we should treat the 
> `delete this;` for a destructor as being emitted-for-device in any 
> translation unit in which the vtable is marked used. (For example, if in your 
> testcase `MSEmitDeletingDtor::CFileStream::CFileStream()` were a `__host__` 
> function, I think you'd still diagnose, but presumably shouldn't do so, 
> because the vtable -- and therefore `CFileStream::operator delete` -- is 
> never referenced / emitted for the device.) Instead, I think we should treat 
> the `delete this;` as being emitted in any translation unit in which the 
> vtable itself is emitted-for-device. Presumably, this means you will need to 
> model / track usage of the vtable itself in your "call graph".


A user declared ctor/dtor by default is `__host__`.

Let's consider this testcase:

  static __device__ __host__ void f(__m256i *p) {
__asm__ volatile("vmovaps  %0, %%ymm0" ::"m"(*(__m256i *)p)
   : "r0"); // MS-error{{unknown register name 'r0' in asm}}
  }
  struct CFileStream {
void operator delete(void *p) {
  f(0);  // MS-note{{called by 'operator delete'}}
}
CFileStream();
virtual ~CFileStream();  // MS-note{{called by '~CFileStream'}}
  };
  
  struct CMultiFileStream {
CFileStream m_fileStream;
~CMultiFileStream();
  };
  
  // This causes vtable emitted so that deleting dtor is emitted for MS.
  CFileStream::CFileStream() {}

In host compilation, vtbl is emitted, since it causes dtor emitted, whereas 
dtor calls f(), therefore the diagnostic msg is emitted.

In device compilation, vtbl is not emitted, therefore dtor is not emitted, and 
the diagnostic msg in f() is not emitted.

We only need an entity in call graph if that entity can be called by other 
entities. Here vtbl is always at the top level of the 'call graph'. Therefore 
it is not needed to be in the call graph.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2020-01-09 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

In D70172#1812533 , @yaxunl wrote:

> In D70172#1809571 , @rjmccall wrote:
>
> > I thought you were saying that the destructor decl hadn't been created yet, 
> > but I see now that you're saying something more subtle.
> >
> > `CurContext` is set to the destructor because the standard says in 
> > [class.dtor]p13:
> >
> >   At the point of definition of a virtual destructor (including an implicit 
> > definition), the non-array deallocation function is determined as if for 
> > the expression `delete this` appearing in a non-virtual destructor of the 
> > destructor’s class.
> >   
> >
> > Which is to say that, semantically, the context is as if it were within the 
> > destructor, to the extent that this affects access control and so on.
> >
> > I can see why this causes problems for your call graph (really a use 
> > graph), since it's a use in the apparent context of the destructor at a 
> > point where the destructor is not being defined.  A similar thing happens 
> > with default arguments, but because we don't consider uses from default 
> > arguments to be true ODR-uses until the default argument is used, that 
> > probably doesn't cause problems for you.
> >
> > I don't think the destructor -> deallocation function edge is actually 
> > interesting for your use graph.  It'd be more appropriate to treat the 
> > deallocation function as used by the v-table than by the destructor; I 
> > don't know whether you make any attempt to model v-tables as nodes in your 
> > use graph.  You might consider finding a simple way to suppress adding this 
> > edge, like just not adding edges from a destructor that's not currently 
> > being defined (`D->willHaveBody()`).
> >
> > With all that said, maintaining a use graph for all the functions you might 
> > emit in the entire translation unit seems very expensive and brittle.  Have 
> > you considered doing this walk in a final pass?   You could just build up a 
> > set of all the functions you know you're going to emit and then walk their 
> > bodies looking for uses of lazy-emitted entities.  If we don't already have 
> > a function that calls a callback for every declaration ODR-used by a 
> > function body, we should.
>
>
> The deferred diagnostic mechanism is shared between CUDA/HIP and OpenMP. The 
> diagnostic messages not only depend on the callee, but also depend on the 
> caller, the caller information needs to be kept. Also if a caller is to be 
> emitted, all the deferred diagnostics associated with the direct or indirect 
> callees need to be emitted. Therefore a call graph is needed for this 
> mechanism.
>
> If we ignore the dtor->deallocation edge in the call graph, we may miss 
> diagnostics, e.g.
>
>   static __device__ __host__ void f(__m256i *p) {
> __asm__ volatile("vmovaps  %0, %%ymm0" ::"m"(*(__m256i *)p)
>: "r0"); // MS-error{{unknown register name 'r0' in asm}}
>   }
>   struct CFileStream {
> void operator delete(void *p) {
>   f(0);  // MS-note{{called by 'operator delete'}}
> }
> CFileStream();
> virtual ~CFileStream();  // MS-note{{called by '~CFileStream'}}
>   };
>   
>   struct CMultiFileStream {
> CFileStream m_fileStream;
> ~CMultiFileStream();
>   };
>   
>   // This causes vtable emitted so that deleting dtor is emitted for MS.
>   CFileStream::CFileStream() {}
>   
>
> Assuming the host compilation is on windows.
>
> Here f() is a host device function which is unknown to be emitted, therefore 
> the inline assembly error results in a delayed diagnostic. When f() is 
> checked in the delete operator body, a 'delete operator -> f' edge is added 
> to the call graph since f() is unknown to be emitted.
>
> Since CFileStream::CFileStream is defined, clang sets vtbl to be emitted and 
> does an explicit dtor check even though dtor is not defined. clang knows that 
> this dtor check is for deleting dtor and will check delete operator as 
> referenced, which causes `dtor -> delete operator' to be added to the call 
> graph. Then clang checks dtor as referenced. Since deleting dtor will be 
> emitted together with vtbl, clang should assume dtor is to be emitted. Then 
> clang will found the callees 'delete operator' and f(), and emits the delayed 
> diagnostics associated with them.
>
> If we do not add 'dtor -> delete operator' edge to the call graph, the 
> diagnostic msg in f() will not be emitted.


Most uses of the destructor do not use the delete operator, though, and 
therefore should not trigger the diagnostics in `f` to be emitted.  And this 
really doesn't require a fully-realized use graph; you could very easily track 
the current use stack when making a later pass over the entities used.

Also I agree with Richard that you really need the v-table to be a node in your 
use graph/stack.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/


[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2020-01-09 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In D70172#1809571 , @rjmccall wrote:

> I thought you were saying that the destructor decl hadn't been created yet, 
> but I see now that you're saying something more subtle.
>
> `CurContext` is set to the destructor because the standard says in 
> [class.dtor]p13:
>
>   At the point of definition of a virtual destructor (including an implicit 
> definition), the non-array deallocation function is determined as if for the 
> expression `delete this` appearing in a non-virtual destructor of the 
> destructor’s class.
>   
>
> Which is to say that, semantically, the context is as if it were within the 
> destructor, to the extent that this affects access control and so on.
>
> I can see why this causes problems for your call graph (really a use graph), 
> since it's a use in the apparent context of the destructor at a point where 
> the destructor is not being defined.  A similar thing happens with default 
> arguments, but because we don't consider uses from default arguments to be 
> true ODR-uses until the default argument is used, that probably doesn't cause 
> problems for you.
>
> I don't think the destructor -> deallocation function edge is actually 
> interesting for your use graph.  It'd be more appropriate to treat the 
> deallocation function as used by the v-table than by the destructor; I don't 
> know whether you make any attempt to model v-tables as nodes in your use 
> graph.  You might consider finding a simple way to suppress adding this edge, 
> like just not adding edges from a destructor that's not currently being 
> defined (`D->willHaveBody()`).
>
> With all that said, maintaining a use graph for all the functions you might 
> emit in the entire translation unit seems very expensive and brittle.  Have 
> you considered doing this walk in a final pass?   You could just build up a 
> set of all the functions you know you're going to emit and then walk their 
> bodies looking for uses of lazy-emitted entities.  If we don't already have a 
> function that calls a callback for every declaration ODR-used by a function 
> body, we should.


The deferred diagnostic mechanism is shared between CUDA/HIP and OpenMP. The 
diagnostic messages not only depend on the callee, but also depend on the 
caller, the caller information needs to be kept. Also if a caller is to be 
emitted, all the deferred diagnostics associated with the direct or indirect 
callees need to be emitted. Therefore a call graph is needed for this mechanism.

If we ignore the dtor->deallocation edge in the call graph, we may miss 
diagnostics, e.g.

  static __device__ __host__ void f(__m256i *p) {
__asm__ volatile("vmovaps  %0, %%ymm0" ::"m"(*(__m256i *)p)
   : "r0"); // MS-error{{unknown register name 'r0' in asm}}
  }
  struct CFileStream {
void operator delete(void *p) {
  f(0);  // MS-note{{called by 'operator delete'}}
}
CFileStream();
virtual ~CFileStream();  // MS-note{{called by '~CFileStream'}}
  };
  
  struct CMultiFileStream {
CFileStream m_fileStream;
~CMultiFileStream();
  };
  
  // This causes vtable emitted so that deleting dtor is emitted for MS.
  CFileStream::CFileStream() {}

Assuming the host compilation is on windows.

Here f() is a host device function which is unknown to be emitted, therefore 
the inline assembly error results in a delayed diagnostic. When f() is checked 
in the delete operator body, a 'delete operator -> f' edge is added to the call 
graph since f() is unknown to be emitted.

Since CFileStream::CFileStream is defined, clang sets vtbl to be emitted and 
does an explicit dtor check even though dtor is not defined. clang knows that 
this dtor check is for deleting dtor and will check delete operator as 
referenced, which causes `dtor -> delete operator' to be added to the call 
graph. Then clang checks dtor as referenced. Since deleting dtor will be 
emitted together with vtbl, clang should assume dtor is to be emitted. Then 
clang will found the callees 'delete operator' and f(), and emits the delayed 
diagnostics associated with them.

If we do not add 'dtor -> delete operator' edge to the call graph, the 
diagnostic msg in f() will not be emitted.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2020-01-09 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 237122.
yaxunl added a comment.

Add tests for device compilation.

Add a test when both vtbl and deleting dtor are emitted with diagnostic due to 
delete operator.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172

Files:
  clang/lib/Sema/SemaDecl.cpp
  clang/test/SemaCUDA/deleting-dtor.cu

Index: clang/test/SemaCUDA/deleting-dtor.cu
===
--- /dev/null
+++ clang/test/SemaCUDA/deleting-dtor.cu
@@ -0,0 +1,119 @@
+// RUN: %clang_cc1 -triple x86_64-pc-windows-msvc -fsyntax-only -verify=MS -verify=GEN %s
+// RUN: %clang_cc1 -triple x86_64-pc-linux -fsyntax-only -verify=GEN %s
+// RUN: %clang_cc1 -triple nvptx -fcuda-is-device -fsyntax-only -verify=DEV %s
+
+// DEV-no-diagnostics
+#include "Inputs/cuda.h"
+typedef long long __m256i __attribute__((__vector_size__(32)));
+
+namespace NoDiag {
+  struct CFileStream {
+CFileStream();
+virtual ~CFileStream();
+  };
+
+  struct CMultiFileStream {
+CFileStream m_fileStream;
+~CMultiFileStream();
+  };
+
+  CFileStream::CFileStream() {}
+  CFileStream::~CFileStream() {}
+  CMultiFileStream::~CMultiFileStream() {}
+}
+
+// No diagnostic since deleting dtor is not emitted.
+namespace NoVtbl {
+// Intentionally generates delayed diagnostic about r0.
+// This diagnostic is not supposed to be emitted unless f is emitted.
+  static __device__ __host__ void f(__m256i *p) {
+__asm__ volatile("vmovaps  %0, %%ymm0" ::"m"(*(__m256i *)p)
+ : "r0");
+  }
+  struct CFileStream {
+void operator delete(void *p) {
+  f(0);
+}
+CFileStream();
+virtual ~CFileStream();
+  };
+
+  struct CMultiFileStream {
+CFileStream m_fileStream;
+~CMultiFileStream();
+  };
+}
+
+// Only MS has diagnostic since MS requires deleting dtor to be emitted when
+// vtable is emitted, even though dtor is not defined.
+namespace MSEmitDeletingDtor {
+  static __device__ __host__ void f(__m256i *p) {
+__asm__ volatile("vmovaps  %0, %%ymm0" ::"m"(*(__m256i *)p)
+   : "r0"); // MS-error{{unknown register name 'r0' in asm}}
+  }
+  struct CFileStream {
+void operator delete(void *p) {
+  f(0);  // MS-note{{called by 'operator delete'}}
+}
+CFileStream();
+virtual ~CFileStream();  // MS-note{{called by '~CFileStream'}}
+  };
+
+  struct CMultiFileStream {
+CFileStream m_fileStream;
+~CMultiFileStream();
+  };
+
+  // This causes vtable emitted so that deleting dtor is emitted for MS.
+  CFileStream::CFileStream() {}
+}
+
+// Both MS and Linux host compilation has diagnostic since deleting dtor is
+// emitted.
+namespace EmitDeletingDtor {
+  static __device__ __host__ void f(__m256i *p) {
+__asm__ volatile("vmovaps  %0, %%ymm0" ::"m"(*(__m256i *)p)
+   : "r0"); // GEN-error{{unknown register name 'r0' in asm}}
+  }
+  struct CFileStream {
+void operator delete(void *p) {
+  f(0);  // GEN-note{{called by 'operator delete'}}
+}
+CFileStream();
+virtual ~CFileStream();
+  };
+
+  struct CMultiFileStream {
+CFileStream m_fileStream;
+~CMultiFileStream();
+  };
+
+  CFileStream::~CFileStream() {} // GEN-note{{called by '~CFileStream'}}
+  CMultiFileStream::~CMultiFileStream() {}
+  // This causes vtable emitted so that deleting dtor is emitted for MS.
+  CFileStream::CFileStream() {}
+}
+
+namespace EmitDtor {
+  static __device__ __host__ void f(__m256i *p) {
+__asm__ volatile("vmovaps  %0, %%ymm0" ::"m"(*(__m256i *)p)
+ : "r0"); // GEN-error{{unknown register name 'r0' in asm}}
+  }
+  struct CFileStream {
+void operator delete(void *p) {
+  f(0);
+}
+CFileStream();
+virtual ~CFileStream();
+  };
+
+  struct CMultiFileStream {
+CFileStream m_fileStream;
+~CMultiFileStream();
+  };
+
+  CFileStream::~CFileStream() {
+f(0); // GEN-note{{called by '~CFileStream'}}
+  }
+}
+
Index: clang/lib/Sema/SemaDecl.cpp
===
--- clang/lib/Sema/SemaDecl.cpp
+++ clang/lib/Sema/SemaDecl.cpp
@@ -17772,6 +17772,17 @@
   if (FD->isDependentContext())
 return FunctionEmissionStatus::TemplateDiscarded;
 
+  // The Microsoft ABI requires that we perform the destructor body
+  // checks (i.e. operator delete() lookup) when the vtable is marked used, as
+  // the deleting destructor is emitted with the vtable. Such check may happen
+  // even though the destructor is not defined yet.
+  if (Context.getTargetInfo().getCXXABI().isMicrosoft()) {
+if (auto *DD = dyn_cast(FD)) {
+  if (DD == CurContext && !DD->getDefinition())
+return FunctionEmissionStatus::Emitted;
+}
+  }
+
   FunctionEmissionStatus OMPES = FunctionEmissionStatus::Unknown;
   if (LangOpts.OpenMPIsDevice) {
 Optional DevTy =
___
cfe-commits mailing list
cfe-commits@lists.llvm.org

[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2020-01-08 Thread Richard Smith - zygoloid via Phabricator via cfe-commits
rsmith added a comment.
Herald added a subscriber: herhut.

This doesn't look quite right to me. I don't think we should treat the `delete 
this;` for a destructor as being emitted-for-device in any translation unit in 
which the vtable is marked used. (For example, if in your testcase 
`MSEmitDeletingDtor::CFileStream::CFileStream()` were a `__host__` function, I 
think you'd still diagnose, but presumably shouldn't do so, because the vtable 
-- and therefore `CFileStream::operator delete` -- is never referenced / 
emitted for the device.) Instead, I think we should treat the `delete this;` as 
being emitted in any translation unit in which the vtable itself is 
emitted-for-device. Presumably, this means you will need to model / track usage 
of the vtable itself in your "call graph".


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2020-01-08 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

I thought you were saying that the destructor decl hadn't been created yet, but 
I see now that you're saying something more subtle.

`CurContext` is set to the destructor because the standard says in 
[class.dtor]p13:

  At the point of definition of a virtual destructor (including an implicit 
definition), the non-array deallocation function is determined as if for the 
expression `delete this` appearing in a non-virtual destructor of the 
destructor’s class.

Which is to say that, semantically, the context is as if it were within the 
destructor, to the extent that this affects access control and so on.

I can see why this causes problems for your call graph (really a use graph), 
since it's a use in the apparent context of the destructor at a point where the 
destructor is not being defined.  A similar thing happens with default 
arguments, but because we don't consider uses from default arguments to be true 
ODR-uses until the default argument is used, that probably doesn't cause 
problems for you.

I don't think the destructor -> deallocation function edge is actually 
interesting for your use graph.  It'd be more appropriate to treat the 
deallocation function as used by the v-table than by the destructor; I don't 
know whether you make any attempt to model v-tables as nodes in your use graph. 
 You might consider finding a simple way to suppress adding this edge, like 
just not adding edges from a destructor that's not currently being defined 
(`D->willHaveBody()`).

With all that said, maintaining a use graph for all the functions you might 
emit in the entire translation unit seems very expensive and brittle.  Have you 
considered doing this walk in a final pass?   You could just build up a set of 
all the functions you know you're going to emit and then walk their bodies 
looking for uses of lazy-emitted entities.  If we don't already have a function 
that calls a callback for every declaration ODR-used by a function body, we 
should.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2020-01-07 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In D70172#1772140 , @rjmccall wrote:

> Richard is definitely our main expert in the implicit synthesis of special 
> members.  It seems to me that if we need the destructor declaration at some 
> point, we should be forcing it to exist at that point.


In AST there are no separate decls for deleting dtors and complete object 
dtors. In AST there are only complete object dtors. In codegen when clang emits 
the definition of a deleting dtor, clang uses GlobalDecl with Dtor_Deleting. 
However AST does not have that.

Since a deleting dtor is supposed to call a complete object dtor, clang needs 
to check the complete object dtor in the context of the deleting dtor. Since 
deleting dtor is synthesized in codegen and does not have a body, clang 
manually pushed the decl of the complete object dtor as context and checks the 
same complete object dtor.

One may consider using GlobalDecl to differentiate complete object dtor and 
deleting dtor in AST. However that requires to use GlobalDecl to replace Decl 
in many places in Sema, which seems to be an overkill.

Fortunately, we could identify the deleting dtor by context without using 
GlobalDecl.

There are two cases :

1. There is no definition of complete object dtor,

When clang checks a dtor, if the caller is itself and the caller has no 
definition. This can only happen when clang checks the deleting dtor. Clang 
should just assumes the dtor is emitted. Since the dtor has no definition, 
there is no deferred diagnostics emitted. Clang just add a call graph branch 
dtor->dtor to the call graph. There is no deferred diagnostics happening with 
the dtor since the deleting dtor only calls complete object dtor and 
deallocating functions which are not supposed to cause diagnostics.

Later, if the dtor is called in other functions and checked, since the caller 
is not itself, it is treated as a normal function, i.e., whether it is emitted 
is determined by whether it has definition. Since the deleting dtor does not 
have extra deferred diagnostics compared with complete object dtor, there is no 
need to differentiate whether the callee is deleting dtor or complet object 
dtor.

If the complete object dtor is defined, its callees and deferred diagnostics 
happening in its body will be recorded as normal functions. If the complete 
object dtor or deleting dtor is called by other functions, the deferred 
diagnotics of the complete object dtor will be emitted.

2. There is definition of complete object dtor.

Clang will not check the deleting dtor. In this case the complete object dtor 
will be checked as a normal function. As discussed in case 1, deleting dtor 
should result in the same deferred diagnotics as complete object dtor, 
therefore there is no need to differentiate call of deleting dtor and complete 
object dtor.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2019-12-05 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

Richard is definitely our main expert in the implicit synthesis of special 
members.  It seems to me that if we need the destructor declaration at some 
point, we should be forcing it to exist at that point.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2019-12-04 Thread Reid Kleckner via Phabricator via cfe-commits
rnk added a comment.

In D70172#1746451 , @yaxunl wrote:

> We are not using Itanium ABI when we do host compilation of CUDA/HIP on 
> windows. During the host compilation on windows only MS C++ ABI is used.
>
> This issue is not due to mixing MS ABI with Itanium ABI.
>  ...


I think I might have understood all that.

Really, the problem is that, in C++, there are many kinds of special members 
created by the compiler that are not modeled in the AST. Deleting destructors 
are a good example. If we consistently used GlobalDecl throughout Sema, then we 
would be able to separate marking the deleting destructor referenced from 
marking the base destructor referenced, and this code would be easier to 
understand.

However, given the way things stand, your new approach seems like a reasonable 
way of detecting the case of referencing the deleting dtor here. So from my 
perspective, this is fine. @rjmccall, assuming that Richard doesn't have time 
to give any input, do you still think this needs his review?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2019-12-04 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/test/SemaCUDA/deleting-dtor.cu:45-46
+
+// Only MS has diagnostic since MS requires deleting dtor is emitted when
+// vtable is emitted, even though dtor is not defined.
+namespace MSEmitDeletingDtor {

tra wrote:
> Nit: I think it should be `requires deleting dtor to be emitted` or `requires 
> that deleting dtor is emitted`
fixed


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2019-12-04 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 232197.
yaxunl marked 2 inline comments as done.
yaxunl added a comment.

remove unnecessary states added to Sema.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172

Files:
  clang/lib/Sema/SemaDecl.cpp
  clang/test/SemaCUDA/deleting-dtor.cu

Index: clang/test/SemaCUDA/deleting-dtor.cu
===
--- /dev/null
+++ clang/test/SemaCUDA/deleting-dtor.cu
@@ -0,0 +1,91 @@
+// RUN: %clang_cc1 -triple x86_64-pc-windows-msvc -fsyntax-only -verify=MS -verify=GEN %s
+// RUN: %clang_cc1 -triple x86_64-pc-linux -fsyntax-only -verify=GEN %s
+
+#include "Inputs/cuda.h"
+typedef long long __m256i __attribute__((__vector_size__(32)));
+
+namespace NoDiag {
+  struct CFileStream {
+CFileStream();
+virtual ~CFileStream();
+  };
+
+  struct CMultiFileStream {
+CFileStream m_fileStream;
+~CMultiFileStream();
+  };
+
+  CFileStream::CFileStream() {}
+  CFileStream::~CFileStream() {}
+  CMultiFileStream::~CMultiFileStream() {}
+}
+
+// No diagnostic since deleting dtor is not emitted.
+namespace NoVtbl {
+// Intentionally generates delayed diagnostic about r0.
+// This diagnostic is not supposed to be emitted unless f is emitted.
+  static __device__ __host__ void f(__m256i *p) {
+__asm__ volatile("vmovaps  %0, %%ymm0" ::"m"(*(__m256i *)p)
+ : "r0");
+  }
+  struct CFileStream {
+void operator delete(void *p) {
+  f(0);
+}
+CFileStream();
+virtual ~CFileStream();
+  };
+
+  struct CMultiFileStream {
+CFileStream m_fileStream;
+~CMultiFileStream();
+  };
+}
+
+// Only MS has diagnostic since MS requires deleting dtor to be emitted when
+// vtable is emitted, even though dtor is not defined.
+namespace MSEmitDeletingDtor {
+  static __device__ __host__ void f(__m256i *p) {
+__asm__ volatile("vmovaps  %0, %%ymm0" ::"m"(*(__m256i *)p)
+   : "r0"); // MS-error{{unknown register name 'r0' in asm}}
+  }
+  struct CFileStream {
+void operator delete(void *p) {
+  f(0);  // MS-note{{called by 'operator delete'}}
+}
+CFileStream();
+virtual ~CFileStream();  // MS-note{{called by '~CFileStream'}}
+  };
+
+  struct CMultiFileStream {
+CFileStream m_fileStream;
+~CMultiFileStream();
+  };
+
+  // This causes vtable emitted so that deleting dtor is emitted for MS.
+  CFileStream::CFileStream() {}
+}
+
+namespace EmitDtor {
+  static __device__ __host__ void f(__m256i *p) {
+__asm__ volatile("vmovaps  %0, %%ymm0" ::"m"(*(__m256i *)p)
+ : "r0"); // GEN-error{{unknown register name 'r0' in asm}}
+  }
+  struct CFileStream {
+void operator delete(void *p) {
+  f(0);
+}
+CFileStream();
+virtual ~CFileStream();
+  };
+
+  struct CMultiFileStream {
+CFileStream m_fileStream;
+~CMultiFileStream();
+  };
+
+  CFileStream::~CFileStream() {
+f(0); // GEN-note{{called by '~CFileStream'}}
+  }
+}
+
Index: clang/lib/Sema/SemaDecl.cpp
===
--- clang/lib/Sema/SemaDecl.cpp
+++ clang/lib/Sema/SemaDecl.cpp
@@ -17746,6 +17746,17 @@
   if (FD->isDependentContext())
 return FunctionEmissionStatus::TemplateDiscarded;
 
+  // The Microsoft ABI requires that we perform the destructor body
+  // checks (i.e. operator delete() lookup) when the vtable is marked used, as
+  // the deleting destructor is emitted with the vtable. Such check may happen
+  // even though the destructor is not defined yet.
+  if (Context.getTargetInfo().getCXXABI().isMicrosoft()) {
+if (auto *DD = dyn_cast(FD)) {
+  if (DD == CurContext && !DD->getDefinition())
+return FunctionEmissionStatus::Emitted;
+}
+  }
+
   FunctionEmissionStatus OMPES = FunctionEmissionStatus::Unknown;
   if (LangOpts.OpenMPIsDevice) {
 Optional DevTy =
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2019-11-14 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In D70172#1745998 , @rnk wrote:

> Are we sure using both Itanium and MS C++ ABIs at the same time is really the 
> best way forward here? What are the constraints on CUDA that require the 
> Itanium ABI? I'm sure there are real reasons you can't just use the MS ABI as 
> is, but I'm curious what they are. Was there some RFC or design showing that 
> this is the right way forward?
>
> I wonder if it would be more productive to add new, more expansive 
> attributes, similar to `__attribute__((ms_struct))`, that tag class or 
> function decls as MS or Itanium C++ ABI. CUDA could then leverage this as 
> needed, and it would be much easier to construct test cases for MS/Itanium 
> interop. This is an expansion in scope, but it seems like it could be 
> generally useful, and if we're already going to enter the crazy world of 
> multiple C++ ABIs in a single TU, we might as well bite the bullet and do it 
> in a way that isn't specific to CUDA.


We are not using Itanium ABI when we do host compilation of CUDA/HIP on 
windows. During the host compilation on windows only MS C++ ABI is used.

This issue is not due to mixing MS ABI with Itanium ABI.

This issue arises from the delayed diagnostics for CUDA/HIP. Basically we do 
not want to emit certain diagnostics (e.g. error in inline assembly code) in 
`__host__` `__device__` functions to avoid clutter. We only want to emit such 
diagnostics once we are certain these functions will be emitted in IR.

To implement this, clang maintains a call graph. For each reference to a 
function, clang checks the current context. If it is evaluating context and it 
is a function, clang assumes the referenced function is callee and its context 
is the caller. Clang checks if the caller is known to be emitted (if it has 
body and external linkage). If not, clang adds this caller/callee pair to the 
call graph. If the caller is known to be emitted, clang will check if the 
callee is known to be emitted. If so, do nothing. If the callee is not known to 
be emitted, clang will eliminate it and all its callee from the call graph, and 
emits the delayed diagnostics associated with them.

You can see a caller is added to the call graph only if it is not known to be 
emitted. Therefore clang has an assert that if a callee is known to be emitted, 
it should not be in the call graph.

On windows, when vtable is known to be emitted for a class, clang does a body 
check for dtor of the class. It makes the dtor as the context, then checks the 
dtor. I think it is to emulate the situation that a deleting dtor is calling a 
normal dtor. This happens if the dtor is not defined since otherwise the dtor 
has already been checked. Since dtor is not defined yet, it is not known to be 
emitted and put into call graph. Later on, if the dtor is defined, it will be 
checked again. This time it is known to be emitted, then clang finds that it is 
in the call graph, then the assert fails.

So the issue is that clang incorrectly assume the dtor is not known to be 
emitted in the first check and put it in the call graph. To fix that, a map is 
added to Sema to tell clang that it is checking a deleting dtor which is 
supposed to be emitted even if it is not defined.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2019-11-14 Thread Reid Kleckner via Phabricator via cfe-commits
rnk added a comment.

Are we sure using both Itanium and MS C++ ABIs at the same time is really the 
best way forward here? What are the constraints on CUDA that require the 
Itanium ABI? I'm sure there are real reasons you can't just use the MS ABI as 
is, but I'm curious what they are. Was there some RFC or design showing that 
this is the right way forward?

I wonder if it would be more productive to add new, more expansive attributes, 
similar to `__attribute__((ms_struct))`, that tag class or function decls as MS 
or Itanium C++ ABI. CUDA could then leverage this as needed, and it would be 
much easier to construct test cases for MS/Itanium interop. This is an 
expansion in scope, but it seems like it could be generally useful, and if 
we're already going to enter the crazy world of multiple C++ ABIs in a single 
TU, we might as well bite the bullet and do it in a way that isn't specific to 
CUDA.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2019-11-13 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

sorry I think I misunderstood the meaning of "blocking" so I put it back.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2019-11-13 Thread John McCall via Phabricator via cfe-commits
rjmccall added a reviewer: rsmith.
rjmccall added a subscriber: rsmith.
rjmccall added a comment.

This seems like the wrong approach; @rsmith should take a look.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2019-11-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: rnk.
tra added a comment.

Calling @rnk for Windows know-how.




Comment at: clang/test/SemaCUDA/deleting-dtor.cu:45-46
+
+// Only MS has diagnostic since MS requires deleting dtor is emitted when
+// vtable is emitted, even though dtor is not defined.
+namespace MSEmitDeletingDtor {

Nit: I think it should be `requires deleting dtor to be emitted` or `requires 
that deleting dtor is emitted`


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

2019-11-13 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl created this revision.
yaxunl added reviewers: tra, rjmccall.

The Microsoft ABI requires that clang performs the destructor body
checks (i.e. operator delete() lookup) when the vtable is marked used, as
the deleting destructor is emitted with the vtable, not with the
destructor definition as in the Itanium ABI. This can cause a CXXDestrcuctorDecl
to be passed to CheckDestructor even though the CXXDestrcuctorDecl is
not defined yet. Later on, if the same dtor is defined, it will be passed to
CXXDestrcuctorDecl again. This causes assertion in Sema::markKnownEmitted.

This patch fixes that by let Sema::getEmissionStatus report correct emission
state for the dtor passed to each CheckDestructor.


https://reviews.llvm.org/D70172

Files:
  clang/include/clang/Sema/Sema.h
  clang/lib/Sema/SemaDecl.cpp
  clang/lib/Sema/SemaDeclCXX.cpp
  clang/test/SemaCUDA/deleting-dtor.cu

Index: clang/test/SemaCUDA/deleting-dtor.cu
===
--- /dev/null
+++ clang/test/SemaCUDA/deleting-dtor.cu
@@ -0,0 +1,91 @@
+// RUN: %clang_cc1 -triple x86_64-pc-windows-msvc -fsyntax-only -verify=MS -verify=GEN %s
+// RUN: %clang_cc1 -triple x86_64-pc-linux -fsyntax-only -verify=GEN %s
+
+#include "Inputs/cuda.h"
+typedef long long __m256i __attribute__((__vector_size__(32)));
+
+namespace NoDiag {
+  struct CFileStream {
+CFileStream();
+virtual ~CFileStream();
+  };
+
+  struct CMultiFileStream {
+CFileStream m_fileStream;
+~CMultiFileStream();
+  };
+
+  CFileStream::CFileStream() {}
+  CFileStream::~CFileStream() {}
+  CMultiFileStream::~CMultiFileStream() {}
+}
+
+// No diagnostic since deleting dtor is not emitted.
+namespace NoVtbl {
+// Intentionally generates delayed diagnostic about r0.
+// This diagnostic is not supposed to be emitted unless f is emitted.
+  static __device__ __host__ void f(__m256i *p) {
+__asm__ volatile("vmovaps  %0, %%ymm0" ::"m"(*(__m256i *)p)
+ : "r0");
+  }
+  struct CFileStream {
+void operator delete(void *p) {
+  f(0);
+}
+CFileStream();
+virtual ~CFileStream();
+  };
+
+  struct CMultiFileStream {
+CFileStream m_fileStream;
+~CMultiFileStream();
+  };
+}
+
+// Only MS has diagnostic since MS requires deleting dtor is emitted when
+// vtable is emitted, even though dtor is not defined.
+namespace MSEmitDeletingDtor {
+  static __device__ __host__ void f(__m256i *p) {
+__asm__ volatile("vmovaps  %0, %%ymm0" ::"m"(*(__m256i *)p)
+   : "r0"); // MS-error{{unknown register name 'r0' in asm}}
+  }
+  struct CFileStream {
+void operator delete(void *p) {
+  f(0);  // MS-note{{called by 'operator delete'}}
+}
+CFileStream();
+virtual ~CFileStream();  // MS-note{{called by '~CFileStream'}}
+  };
+
+  struct CMultiFileStream {
+CFileStream m_fileStream;
+~CMultiFileStream();
+  };
+
+  // This causes vtable emitted so that deleting dtor is emitted for MS.
+  CFileStream::CFileStream() {}
+}
+
+namespace EmitDtor {
+  static __device__ __host__ void f(__m256i *p) {
+__asm__ volatile("vmovaps  %0, %%ymm0" ::"m"(*(__m256i *)p)
+ : "r0"); // GEN-error{{unknown register name 'r0' in asm}}
+  }
+  struct CFileStream {
+void operator delete(void *p) {
+  f(0);
+}
+CFileStream();
+virtual ~CFileStream();
+  };
+
+  struct CMultiFileStream {
+CFileStream m_fileStream;
+~CMultiFileStream();
+  };
+
+  CFileStream::~CFileStream() {
+f(0); // GEN-note{{called by '~CFileStream'}}
+  }
+}
+
Index: clang/lib/Sema/SemaDeclCXX.cpp
===
--- clang/lib/Sema/SemaDeclCXX.cpp
+++ clang/lib/Sema/SemaDeclCXX.cpp
@@ -15629,8 +15629,13 @@
   // If this is an out-of-line declaration, marking it referenced will
   // not do anything. Manually call CheckDestructor to look up operator
   // delete().
-  ContextRAII SavedContext(*this, DD);
-  CheckDestructor(DD);
+  auto Loc = MSDeletingDtorsChecked.find(DD);
+  if (Loc == MSDeletingDtorsChecked.end()) {
+MSDeletingDtorsChecked[DD] = false;
+ContextRAII SavedContext(*this, DD);
+CheckDestructor(DD);
+MSDeletingDtorsChecked[DD] = true;
+  }
 } else {
   MarkFunctionReferenced(Loc, Class->getDestructor());
 }
Index: clang/lib/Sema/SemaDecl.cpp
===
--- clang/lib/Sema/SemaDecl.cpp
+++ clang/lib/Sema/SemaDecl.cpp
@@ -17684,6 +17684,20 @@
   if (FD->isDependentContext())
 return FunctionEmissionStatus::TemplateDiscarded;
 
+  // The Microsoft ABI requires that we perform the destructor body
+  // checks (i.e. operator delete() lookup) when the vtable is marked used, as
+  // the deleting destructor is emitted with the vtable. Such check may happen
+  // even though the destructor is not defined yet. We use
+