[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-05-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

In D77954#2021299 , @yaxunl wrote:

> I will put in a workaround: in device compilation, inside implicit `__device__ 
> __host__` callers, I will keep the old behavior, that is, an implicit 
> `__device__ __host__` candidate has equal preference with a wrong-sided 
> candidate. By doing this, we will in most cases resolve the overload the 
> same way as if the callers and callees were host functions, i.e. the same 
> way as in their expected environment. This makes sure that: 1. we will not 
> end up with no viable candidate; 2. we will not have ambiguity, since we 
> know the call is resolvable in host compilation.


LMK when you have something. I can give it a spin internally.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77954/new/

https://reviews.llvm.org/D77954



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-05-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

FYI, I've just reverted it in bf6a26b066382e0f41bf023c781d84061c542307 








[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-05-05 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In D77954#2021026 , @tra wrote:

> It appears that re-landed b46b1a916d44216f0c70de55ae2123eb9de69027 
>  has 
> created another compilation regression. I don't have a simple reproducer yet, 
> so here's the error message for now:
>
>   llvm_unstable/toolchain/bin/../include/c++/v1/tuple:232:15: error: call to 
> implicitly-deleted copy constructor of 
> 'std::__u::unique_ptr std::__u::default_delete>'
>   : __value_(_VSTD::forward<_Tp>(__t))
> ^
>   llvm_unstable/toolchain/bin/../include/c++/v1/tuple:388:13: note: in 
> instantiation of function template specialization 'std::__u::__tuple_leaf<0, 
> std::__u::unique_ptr std::__u::default_delete>, 
> false>::__tuple_leaf std::__u::default_delete>, void>' requested here
>   __tuple_leaf<_Uf, _Tf>(_VSTD::forward<_Up>(__u))...,
>   ^
>   llvm_unstable/toolchain/bin/../include/c++/v1/tuple:793:15: note: in 
> instantiation of function template specialization 
> 'std::__u::__tuple_impl, 
> std::__u::unique_ptr std::__u::default_delete>, std::__u::function ()>>::__tuple_impl<0, 1, std::__u::unique_ptr std::__u::default_delete>, std::__u::function ()>, std::__u::unique_ptr std::__u::default_delete>, std::__u::function ()>>' requested here
>   : __base_(typename __make_tuple_indices::type(),
> ^
>   llvm_unstable/toolchain/bin/../include/c++/v1/thread:297:17: note: in 
> instantiation of function template specialization 
> 'std::__u::tuple std::__u::default_delete>, std::__u::function ()>>::tuple std::__u::default_delete>, std::__u::function ()>, false, false>' requested here
>   new _Gp(std::move(__tsp),
>   ^
>   
> ./third_party/eigen3/unsupported/Eigen/CXX11/src/ThreadPool/ThreadEnvironment.h:24:42:
>  note: in instantiation of function template specialization 
> 'std::__u::thread::thread, void>' requested here
>   EnvThread(std::function f) : thr_(std::move(f)) {}
>^
>   llvm_unstable/toolchain/bin/../include/c++/v1/memory:2583:3: note: copy 
> constructor is implicitly deleted because 
> 'unique_ptr std::__u::default_delete>' has a user-declared 
> move constructor
> unique_ptr(unique_ptr&& __u) _NOEXCEPT
> ^
>   1 error generated when compiling for sm_60.
>


Implicit `__host__ __device__` functions may be promoted by pragma even though 
they are not themselves qualified as `__host__ __device__` functions.

Since they are promoted from host functions, they are good citizens in host 
compilation, but they may incur diagnostics in device compilation, because 
their callees may be missing on the device side. Since we cannot defer all 
such diagnostics, once that happens, we are doomed.

This explains the previous behavior: in a `__host__ __device__` function, a 
same-side candidate is always preferred over a wrong-sided candidate, but a 
`__device__ __host__` candidate is not automatically preferred over a 
wrong-sided one. Instead, their other properties take precedence; only if all 
other properties are equal is the `__device__ __host__` candidate preferred 
over the wrong-sided candidate.

I will put in a workaround: in device compilation, inside implicit `__device__ 
__host__` callers, I will keep the old behavior, that is, an implicit 
`__device__ __host__` candidate has equal preference with a wrong-sided 
candidate. By doing this, we will in most cases resolve the overload the same 
way as if the callers and callees were host functions, i.e. the same way as in 
their expected environment. This makes sure that: 1. we will not end up with 
no viable candidate; 2. we will not have ambiguity, since we know the call is 
resolvable in host compilation.

For explicit `__device__ __host__` functions we do not need the workaround, 
since they are intended for both host and device and are supposed to work in 
both compilations.







[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-05-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

It appears that re-landed b46b1a916d44216f0c70de55ae2123eb9de69027 
 has 
created another compilation regression. I don't have a simple reproducer yet, 
so here's the error message for now:

  llvm_unstable/toolchain/bin/../include/c++/v1/tuple:232:15: error: call to 
implicitly-deleted copy constructor of 
'std::__u::unique_ptr>'
  : __value_(_VSTD::forward<_Tp>(__t))
^
  llvm_unstable/toolchain/bin/../include/c++/v1/tuple:388:13: note: in 
instantiation of function template specialization 'std::__u::__tuple_leaf<0, 
std::__u::unique_ptr>, 
false>::__tuple_leaf>, void>' requested here
  __tuple_leaf<_Uf, _Tf>(_VSTD::forward<_Up>(__u))...,
  ^
  llvm_unstable/toolchain/bin/../include/c++/v1/tuple:793:15: note: in 
instantiation of function template specialization 
'std::__u::__tuple_impl, 
std::__u::unique_ptr>, std::__u::function>::__tuple_impl<0, 1, std::__u::unique_ptr>, std::__u::function, std::__u::unique_ptr>, std::__u::function>' requested here
  : __base_(typename __make_tuple_indices::type(),
^
  llvm_unstable/toolchain/bin/../include/c++/v1/thread:297:17: note: in 
instantiation of function template specialization 
'std::__u::tuple>, std::__u::function>::tuple>, std::__u::function, false, false>' requested here
  new _Gp(std::move(__tsp),
  ^
  
./third_party/eigen3/unsupported/Eigen/CXX11/src/ThreadPool/ThreadEnvironment.h:24:42:
 note: in instantiation of function template specialization 
'std::__u::thread::thread, void>' requested here
  EnvThread(std::function f) : thr_(std::move(f)) {}
   ^
  llvm_unstable/toolchain/bin/../include/c++/v1/memory:2583:3: note: copy 
constructor is implicitly deleted because 
'unique_ptr>' has a user-declared move 
constructor
unique_ptr(unique_ptr&& __u) _NOEXCEPT
^
  1 error generated when compiling for sm_60.







[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-27 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In D77954#2005313 , @gribozavr2 wrote:

> Sorry -- this change broke overload resolution for `operator new`, as it is 
> declared in system headers. I'm reverting the patch.
>
>   $ cat /tmp/in.cu.cc
>   #define __device__ __attribute__((device))
>   void* operator new(__SIZE_TYPE__ size);
>   __device__ void *operator new(__SIZE_TYPE__ size);
>   void *x = new int;
>   $ clang -fsyntax-only --cuda-device-only --target=x86_64-grtev4-linux-gnu 
> -x cuda -nocudalib -nocudainc -std=gnu++17 /tmp/in.cu.cc
>   /tmp/in.cu.cc:4:11: error: call to 'operator new' is ambiguous
>   void *x = new int;
> ^
>   /tmp/in.cu.cc:2:7: note: candidate function
>   void* operator new(__SIZE_TYPE__ size);
> ^
>   /tmp/in.cu.cc:3:18: note: candidate function
>   __device__ void *operator new(__SIZE_TYPE__ size);
>^
>   1 error generated when compiling for sm_20.
>


Thanks. Fixed in https://reviews.llvm.org/D78970







[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-27 Thread Dmitri Gribenko via Phabricator via cfe-commits
gribozavr2 added a comment.

Sorry -- this change broke overload resolution for `operator new`, as it is 
declared in system headers. I'm reverting the patch.

  $ cat /tmp/in.cu.cc
  #define __device__ __attribute__((device))
  void* operator new(__SIZE_TYPE__ size);
  __device__ void *operator new(__SIZE_TYPE__ size);
  void *x = new int;
  $ clang -fsyntax-only --cuda-device-only --target=x86_64-grtev4-linux-gnu -x 
cuda -nocudalib -nocudainc -std=gnu++17 /tmp/in.cu.cc
  /tmp/in.cu.cc:4:11: error: call to 'operator new' is ambiguous
  void *x = new int;
^
  /tmp/in.cu.cc:2:7: note: candidate function
  void* operator new(__SIZE_TYPE__ size);
^
  /tmp/in.cu.cc:3:18: note: candidate function
  __device__ void *operator new(__SIZE_TYPE__ size);
   ^
  1 error generated when compiling for sm_20.







[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-24 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In D77954#2002580 , @tra wrote:

> Go ahead. I'll revert it if it breaks anything on our side.


Thanks. Done by b46b1a916d44216f0c70de55ae2123eb9de69027 








[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

Go ahead. I'll revert it if it breaks anything on our side.







[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-24 Thread Yaxun Liu via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGc77a4078e010: [CUDA][HIP] Fix host/device based overload 
resolution (authored by yaxunl).
Herald added a project: clang.


Files:
  clang/lib/Sema/SemaOverload.cpp
  clang/test/SemaCUDA/function-overload.cu

Index: clang/test/SemaCUDA/function-overload.cu
===
--- clang/test/SemaCUDA/function-overload.cu
+++ clang/test/SemaCUDA/function-overload.cu
@@ -331,9 +331,6 @@
 // If we have a mix of HD and H-only or D-only candidates in the overload set,
 // normal C++ overload resolution rules apply first.
template <typename T> TemplateReturnTy template_vs_hd_function(T arg)
-#ifdef __CUDA_ARCH__
-//expected-note@-2 {{declared here}}
-#endif
 {
   return TemplateReturnTy();
 }
@@ -342,11 +339,13 @@
 }
 
 __host__ __device__ void test_host_device_calls_hd_template() {
-  HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);
-  TemplateReturnTy ret2 = template_vs_hd_function(1);
 #ifdef __CUDA_ARCH__
-  // expected-error@-2 {{reference to __host__ function 'template_vs_hd_function' in __host__ __device__ function}}
+  typedef HostDeviceReturnTy ExpectedReturnTy;
+#else
+  typedef TemplateReturnTy ExpectedReturnTy;
 #endif
+  HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);
+  ExpectedReturnTy ret2 = template_vs_hd_function(1);
 }
 
 __host__ void test_host_calls_hd_template() {
@@ -367,14 +366,14 @@
 __device__ DeviceReturnTy device_only_function(int arg) { return DeviceReturnTy(); }
 __device__ DeviceReturnTy2 device_only_function(float arg) { return DeviceReturnTy2(); }
 #ifndef __CUDA_ARCH__
-  // expected-note@-3 {{'device_only_function' declared here}}
-  // expected-note@-3 {{'device_only_function' declared here}}
+  // expected-note@-3 2{{'device_only_function' declared here}}
+  // expected-note@-3 2{{'device_only_function' declared here}}
 #endif
 __host__ HostReturnTy host_only_function(int arg) { return HostReturnTy(); }
 __host__ HostReturnTy2 host_only_function(float arg) { return HostReturnTy2(); }
 #ifdef __CUDA_ARCH__
-  // expected-note@-3 {{'host_only_function' declared here}}
-  // expected-note@-3 {{'host_only_function' declared here}}
+  // expected-note@-3 2{{'host_only_function' declared here}}
+  // expected-note@-3 2{{'host_only_function' declared here}}
 #endif
 
 __host__ __device__ void test_host_device_single_side_overloading() {
@@ -392,6 +391,37 @@
 #endif
 }
 
+// wrong-sided overloading should not cause diagnostic unless it is emitted.
+// This inline function is not emitted.
+inline __host__ __device__ void test_host_device_wrong_side_overloading_inline_no_diag() {
+  DeviceReturnTy ret1 = device_only_function(1);
+  DeviceReturnTy2 ret2 = device_only_function(1.0f);
+  HostReturnTy ret3 = host_only_function(1);
+  HostReturnTy2 ret4 = host_only_function(1.0f);
+}
+
+// wrong-sided overloading should cause diagnostic if it is emitted.
+// This inline function is emitted since it is called by an emitted function.
+inline __host__ __device__ void test_host_device_wrong_side_overloading_inline_diag() {
+  DeviceReturnTy ret1 = device_only_function(1);
+  DeviceReturnTy2 ret2 = device_only_function(1.0f);
+#ifndef __CUDA_ARCH__
+  // expected-error@-3 {{reference to __device__ function 'device_only_function' in __host__ __device__ function}}
+  // expected-error@-3 {{reference to __device__ function 'device_only_function' in __host__ __device__ function}}
+#endif
+  HostReturnTy ret3 = host_only_function(1);
+  HostReturnTy2 ret4 = host_only_function(1.0f);
+#ifdef __CUDA_ARCH__
+  // expected-error@-3 {{reference to __host__ function 'host_only_function' in __host__ __device__ function}}
+  // expected-error@-3 {{reference to __host__ function 'host_only_function' in __host__ __device__ function}}
+#endif
+}
+
+__host__ __device__ void test_host_device_wrong_side_overloading_inline_diag_caller() {
+  test_host_device_wrong_side_overloading_inline_diag();
+  // expected-note@-1 {{called by 'test_host_device_wrong_side_overloading_inline_diag_caller'}}
+}
+
 // Verify that we allow overloading function templates.
 template <typename T> __host__ T template_overload(const T &a) { return a; };
 template <typename T> __device__ T template_overload(const T &a) { return a; };
Index: clang/lib/Sema/SemaOverload.cpp
===
--- clang/lib/Sema/SemaOverload.cpp
+++ clang/lib/Sema/SemaOverload.cpp
@@ -9374,16 +9374,22 @@
   return Comparison::Equal;
 }
 
-static bool isBetterMultiversionCandidate(const OverloadCandidate &Cand1,
-                                          const OverloadCandidate &Cand2) {
+static Comparison
+isBetterMultiversionCandidate(const OverloadCandidate &Cand1,
+                              const OverloadCandidate &Cand2) {
   if (!Cand1.Function || 

[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-24 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

@tra Is it OK if I commit it now? Or is it better to wait until next Monday 
morning? Thanks.







[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-24 Thread John McCall via Phabricator via cfe-commits
rjmccall accepted this revision.
rjmccall added a comment.

Thanks, Yaxun.  LGTM.







[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments.



Comment at: clang/lib/Sema/SemaOverload.cpp:9749
+  if (isBetterMultiversionCandidate(Cand1, Cand2))
+    return true;
+

yaxunl wrote:
> rjmccall wrote:
> > tra wrote:
> > > rjmccall wrote:
> > > > erichkeane wrote:
> > > > > yaxunl wrote:
> > > > > > echristo wrote:
> > > > > > > rjmccall wrote:
> > > > > > > > yaxunl wrote:
> > > > > > > > > rjmccall wrote:
> > > > > > > > > > If we move anything below this check, it needs to figure 
> > > > > > > > > > out a tri-state so that it can return false if `Cand2` is a 
> > > > > > > > > > better candidate than `Cand1`.  Now, that only matters if 
> > > > > > > > > > multiversion functions are supported under CUDA, but if 
> > > > > > > > > > you're relying on them not being supported, that should at 
> > > > > > > > > > least be commented on.
> > > > > > > > > Multiversioned host functions are orthogonal to CUDA and 
> > > > > > > > > therefore should be supported. Multiversioning of device, 
> > > > > > > > > host device, and global functions is not supported. However, 
> > > > > > > > > this change does not make things worse, and should continue 
> > > > > > > > > to work if they are supported.
> > > > > > > > > 
> > > > > > > > > Host/device-based overload resolution is mostly for 
> > > > > > > > > determining the viability of a function. If two functions are 
> > > > > > > > > both viable, other factors should take precedence. This 
> > > > > > > > > general rule has been applied to cases other than 
> > > > > > > > > multiversion; I think it should also apply to 
> > > > > > > > > multiversion.
> > > > > > > > > 
> > > > > > > > > I will make isBetterMultiversionCandidate three states.
> > > > > > > > > This general rule has been applied to cases other than 
> > > > > > > > > multiversion; I think it should also apply to multiversion.
> > > > > > > > 
> > > > > > > > Well, but the multiversion people could say the same: that 
> > > > > > > > multiversioning is for picking an alternative among 
> > > > > > > > otherwise-identical functions, and HD and H functions are not 
> > > > > > > > otherwise-identical.
> > > > > > > > 
> > > > > > > > CC'ing @echristo for his thoughts on the right ordering here.
> > > > > > > Adding @erichkeane here as well.
> > > > > > > 
> > > > > > > I think this makes sense, but I can see a reason to multiversion 
> > > > > > > a function that will run on host and device. A version of some 
> > > > > > > matrix mult that takes advantage of 3 host architectures and one 
> > > > > > > cuda one? Am I missing something here?
> > > > > > My understanding is that a multiversion function is for a specific 
> > > > > > cpu(gpu). Let's say we want to have a function f for gfx900, 
> > > > > > gfx906, sandybridge, ivybridge, shouldn't they be more like
> > > > > > 
> > > > > > ```
> > > > > > __host__ __attribute__((cpu_specific(sandybridge))) f();
> > > > > > __host__ __attribute__((cpu_specific(ivybridge))) f();
> > > > > > __device__ __attribute__((cpu_specific(gfx900))) f();
> > > > > > __device__ __attribute__((cpu_specific(gfx906))) f();
> > > > > > ```
> > > > > > instead of all `__device__ __host__` functions?
> > > > > IMO, it doesn't make sense for functions to be BOTH host and 
> > > > > device; they'd have to be just one.  Otherwise I'm not sure how 
> > > > > the resolver behavior is supposed to work.  The whole idea is that 
> > > > > the definition is chosen at runtime.
> > > > > 
> > > > > Unless __host__ __device__ void foo(); is TWO declaration chains 
> > > > > (meaning two separate AST entries), it doesn't make sense to have 
> > > > > multiverison on it (and then, how it would be spelled is 
> > > > > awkward/confusing to me).
> > > > > 
> > > > > In the above case, if those 4 declarations are not 2 separate root 
> > > > > AST nodes, multiversioning won't work.
> > > > There are certainly functions that ought to be usable from either host 
> > > > or device context — any inline function that just does ordinary 
> > > > language things should be in that category.  Also IIUC many 
> > > > declarations are *inferred* to be `__host__ __device__`, or can be 
> > > > mass-annotated with pragmas, and those reasons are probably the main 
> > > > ones this might matter — we might include a header in CUDA mode that 
> > > > declares a multi-versioned function, and we should handle it right.
> > > > 
> > > > My read of how CUDA programmers expect this to work is that they see 
> > > > the `__host__` / `__device__` attributes as primarily a mechanism for 
> > > > catching problems where you're using the wrong functions for the 
> > > > current configuration.  That is, while we allow overloading by 
> > > > `__host__`/`__device__`-ness, users expect those attributes to mostly 
> > > > be used as a filter for what's "really there" rather than really 
> > > > strictly segregating the namespace.  So I would say that CUDA 
> > > > programmers would probably 

[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-24 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 259870.
yaxunl marked an inline comment as done.
yaxunl added a comment.

Change multiversion to take precedence over host/device-ness.



Files:
  clang/lib/Sema/SemaOverload.cpp
  clang/test/SemaCUDA/function-overload.cu

Index: clang/test/SemaCUDA/function-overload.cu
===
--- clang/test/SemaCUDA/function-overload.cu
+++ clang/test/SemaCUDA/function-overload.cu
@@ -331,9 +331,6 @@
 // If we have a mix of HD and H-only or D-only candidates in the overload set,
 // normal C++ overload resolution rules apply first.
template <typename T> TemplateReturnTy template_vs_hd_function(T arg)
-#ifdef __CUDA_ARCH__
-//expected-note@-2 {{declared here}}
-#endif
 {
   return TemplateReturnTy();
 }
@@ -342,11 +339,13 @@
 }
 
 __host__ __device__ void test_host_device_calls_hd_template() {
-  HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);
-  TemplateReturnTy ret2 = template_vs_hd_function(1);
 #ifdef __CUDA_ARCH__
-  // expected-error@-2 {{reference to __host__ function 'template_vs_hd_function' in __host__ __device__ function}}
+  typedef HostDeviceReturnTy ExpectedReturnTy;
+#else
+  typedef TemplateReturnTy ExpectedReturnTy;
 #endif
+  HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);
+  ExpectedReturnTy ret2 = template_vs_hd_function(1);
 }
 
 __host__ void test_host_calls_hd_template() {
@@ -367,14 +366,14 @@
 __device__ DeviceReturnTy device_only_function(int arg) { return DeviceReturnTy(); }
 __device__ DeviceReturnTy2 device_only_function(float arg) { return DeviceReturnTy2(); }
 #ifndef __CUDA_ARCH__
-  // expected-note@-3 {{'device_only_function' declared here}}
-  // expected-note@-3 {{'device_only_function' declared here}}
+  // expected-note@-3 2{{'device_only_function' declared here}}
+  // expected-note@-3 2{{'device_only_function' declared here}}
 #endif
 __host__ HostReturnTy host_only_function(int arg) { return HostReturnTy(); }
 __host__ HostReturnTy2 host_only_function(float arg) { return HostReturnTy2(); }
 #ifdef __CUDA_ARCH__
-  // expected-note@-3 {{'host_only_function' declared here}}
-  // expected-note@-3 {{'host_only_function' declared here}}
+  // expected-note@-3 2{{'host_only_function' declared here}}
+  // expected-note@-3 2{{'host_only_function' declared here}}
 #endif
 
 __host__ __device__ void test_host_device_single_side_overloading() {
@@ -392,6 +391,37 @@
 #endif
 }
 
+// wrong-sided overloading should not cause diagnostic unless it is emitted.
+// This inline function is not emitted.
+inline __host__ __device__ void test_host_device_wrong_side_overloading_inline_no_diag() {
+  DeviceReturnTy ret1 = device_only_function(1);
+  DeviceReturnTy2 ret2 = device_only_function(1.0f);
+  HostReturnTy ret3 = host_only_function(1);
+  HostReturnTy2 ret4 = host_only_function(1.0f);
+}
+
+// wrong-sided overloading should cause diagnostic if it is emitted.
+// This inline function is emitted since it is called by an emitted function.
+inline __host__ __device__ void test_host_device_wrong_side_overloading_inline_diag() {
+  DeviceReturnTy ret1 = device_only_function(1);
+  DeviceReturnTy2 ret2 = device_only_function(1.0f);
+#ifndef __CUDA_ARCH__
+  // expected-error@-3 {{reference to __device__ function 'device_only_function' in __host__ __device__ function}}
+  // expected-error@-3 {{reference to __device__ function 'device_only_function' in __host__ __device__ function}}
+#endif
+  HostReturnTy ret3 = host_only_function(1);
+  HostReturnTy2 ret4 = host_only_function(1.0f);
+#ifdef __CUDA_ARCH__
+  // expected-error@-3 {{reference to __host__ function 'host_only_function' in __host__ __device__ function}}
+  // expected-error@-3 {{reference to __host__ function 'host_only_function' in __host__ __device__ function}}
+#endif
+}
+
+__host__ __device__ void test_host_device_wrong_side_overloading_inline_diag_caller() {
+  test_host_device_wrong_side_overloading_inline_diag();
+  // expected-note@-1 {{called by 'test_host_device_wrong_side_overloading_inline_diag_caller'}}
+}
+
 // Verify that we allow overloading function templates.
 template <typename T> __host__ T template_overload(const T &a) { return a; };
 template <typename T> __device__ T template_overload(const T &a) { return a; };
Index: clang/lib/Sema/SemaOverload.cpp
===
--- clang/lib/Sema/SemaOverload.cpp
+++ clang/lib/Sema/SemaOverload.cpp
@@ -9374,16 +9374,22 @@
   return Comparison::Equal;
 }
 
-static bool isBetterMultiversionCandidate(const OverloadCandidate &Cand1,
-                                          const OverloadCandidate &Cand2) {
+static Comparison
+isBetterMultiversionCandidate(const OverloadCandidate &Cand1,
+                              const OverloadCandidate &Cand2) {
   if (!Cand1.Function || !Cand1.Function->isMultiVersion() || !Cand2.Function ||
   !Cand2.Function->isMultiVersion())
-

[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-24 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl marked 6 inline comments as done.
yaxunl added inline comments.



Comment at: clang/lib/Sema/SemaOverload.cpp:9749
+  if (isBetterMultiversionCandidate(Cand1, Cand2))
+    return true;
+

rjmccall wrote:
> tra wrote:
> > rjmccall wrote:
> > > erichkeane wrote:
> > > > yaxunl wrote:
> > > > > echristo wrote:
> > > > > > rjmccall wrote:
> > > > > > > yaxunl wrote:
> > > > > > > > rjmccall wrote:
> > > > > > > > > If we move anything below this check, it needs to figure out 
> > > > > > > > > a tri-state so that it can return false if `Cand2` is a 
> > > > > > > > > better candidate than `Cand1`.  Now, that only matters if 
> > > > > > > > > multiversion functions are supported under CUDA, but if 
> > > > > > > > > you're relying on them not being supported, that should at 
> > > > > > > > > least be commented on.
> > > > > > > > Multiversioned host functions are orthogonal to CUDA and 
> > > > > > > > therefore should be supported. Multiversioning of device, host 
> > > > > > > > device, and global functions is not supported. However, this 
> > > > > > > > change does not make things worse, and should continue to work 
> > > > > > > > if they are supported.
> > > > > > > > 
> > > > > > > > Host/device-based overload resolution is mostly for 
> > > > > > > > determining the viability of a function. If two functions are 
> > > > > > > > both viable, other factors should take precedence. 
> > > > > > > > This general rule has been applied to cases other than 
> > > > > > > > multiversion; I think it should also apply to multiversion.
> > > > > > > > 
> > > > > > > > I will make isBetterMultiversionCandidate three states.
> > > > > > > > This general rule has been applied to cases other than 
> > > > > > > > multiversion; I think it should also apply to multiversion.
> > > > > > > 
> > > > > > > Well, but the multiversion people could say the same: that 
> > > > > > > multiversioning is for picking an alternative among 
> > > > > > > otherwise-identical functions, and HD and H functions are not 
> > > > > > > otherwise-identical.
> > > > > > > 
> > > > > > > CC'ing @echristo for his thoughts on the right ordering here.
> > > > > > Adding @erichkeane here as well.
> > > > > > 
> > > > > > I think this makes sense, but I can see a reason to multiversion a 
> > > > > > function that will run on host and device. A version of some matrix 
> > > > > > mult that takes advantage of 3 host architectures and one cuda one? 
> > > > > > Am I missing something here?
> > > > > My understanding is that a multiversion function is for a specific 
> > > > > cpu(gpu). Let's say we want to have a function f for gfx900, gfx906, 
> > > > > sandybridge, ivybridge, shouldn't they be more like
> > > > > 
> > > > > ```
> > > > > __host__ __attribute__((cpu_specific(sandybridge))) f();
> > > > > __host__ __attribute__((cpu_specific(ivybridge))) f();
> > > > > __device__ __attribute__((cpu_specific(gfx900))) f();
> > > > > __device__ __attribute__((cpu_specific(gfx906))) f();
> > > > > ```
> > > > > instead of all `__device__ __host__` functions?
> > > > IMO, it doesn't make sense for functions to be BOTH host and 
> > > > device; they'd have to be just one.  Otherwise I'm not sure how the 
> > > > resolver behavior is supposed to work.  The whole idea is that the 
> > > > definition is chosen at runtime.
> > > > 
> > > > Unless __host__ __device__ void foo(); is TWO declaration chains (meaning 
> > > > two separate AST entries), it doesn't make sense to have multiverison 
> > > > on it (and then, how it would be spelled is awkward/confusing to me).
> > > > 
> > > > In the above case, if those 4 declarations are not 2 separate root AST 
> > > > nodes, multiversioning won't work.
> > > There are certainly functions that ought to be usable from either host or 
> > > device context — any inline function that just does ordinary language 
> > > things should be in that category.  Also IIUC many declarations are 
> > > *inferred* to be `__host__ __device__`, or can be mass-annotated with 
> > > pragmas, and those reasons are probably the main ones this might matter — 
> > > we might include a header in CUDA mode that declares a multi-versioned 
> > > function, and we should handle it right.
> > > 
> > > My read of how CUDA programmers expect this to work is that they see the 
> > > `__host__` / `__device__` attributes as primarily a mechanism for 
> > > catching problems where you're using the wrong functions for the current 
> > > configuration.  That is, while we allow overloading by 
> > > `__host__`/`__device__`-ness, users expect those attributes to mostly be 
> > > used as a filter for what's "really there" rather than really strictly 
> > > segregating the namespace.  So I would say that CUDA programmers would 
> > > probably expect the interaction with multiversioning to be:
> > > 
> > > - Programmers can put `__host__`, `__device__`, or both on a variant 
> > > > depending on where it was usable.

[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-23 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 259796.
yaxunl marked an inline comment as done.
yaxunl added a comment.

Revised per John's comments.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77954/new/

https://reviews.llvm.org/D77954

Files:
  clang/lib/Sema/SemaOverload.cpp
  clang/test/SemaCUDA/function-overload.cu

Index: clang/test/SemaCUDA/function-overload.cu
===================================================================
--- clang/test/SemaCUDA/function-overload.cu
+++ clang/test/SemaCUDA/function-overload.cu
@@ -331,9 +331,6 @@
 // If we have a mix of HD and H-only or D-only candidates in the overload set,
 // normal C++ overload resolution rules apply first.
template <typename T> TemplateReturnTy template_vs_hd_function(T arg)
-#ifdef __CUDA_ARCH__
-//expected-note@-2 {{declared here}}
-#endif
 {
   return TemplateReturnTy();
 }
@@ -342,11 +339,13 @@
 }
 
 __host__ __device__ void test_host_device_calls_hd_template() {
-  HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);
-  TemplateReturnTy ret2 = template_vs_hd_function(1);
 #ifdef __CUDA_ARCH__
-  // expected-error@-2 {{reference to __host__ function 'template_vs_hd_function' in __host__ __device__ function}}
+  typedef HostDeviceReturnTy ExpectedReturnTy;
+#else
+  typedef TemplateReturnTy ExpectedReturnTy;
 #endif
+  HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);
+  ExpectedReturnTy ret2 = template_vs_hd_function(1);
 }
 
 __host__ void test_host_calls_hd_template() {
@@ -367,14 +366,14 @@
 __device__ DeviceReturnTy device_only_function(int arg) { return DeviceReturnTy(); }
 __device__ DeviceReturnTy2 device_only_function(float arg) { return DeviceReturnTy2(); }
 #ifndef __CUDA_ARCH__
-  // expected-note@-3 {{'device_only_function' declared here}}
-  // expected-note@-3 {{'device_only_function' declared here}}
+  // expected-note@-3 2{{'device_only_function' declared here}}
+  // expected-note@-3 2{{'device_only_function' declared here}}
 #endif
 __host__ HostReturnTy host_only_function(int arg) { return HostReturnTy(); }
 __host__ HostReturnTy2 host_only_function(float arg) { return HostReturnTy2(); }
 #ifdef __CUDA_ARCH__
-  // expected-note@-3 {{'host_only_function' declared here}}
-  // expected-note@-3 {{'host_only_function' declared here}}
+  // expected-note@-3 2{{'host_only_function' declared here}}
+  // expected-note@-3 2{{'host_only_function' declared here}}
 #endif
 
 __host__ __device__ void test_host_device_single_side_overloading() {
@@ -392,6 +391,37 @@
 #endif
 }
 
+// wrong-sided overloading should not cause diagnostic unless it is emitted.
+// This inline function is not emitted.
+inline __host__ __device__ void test_host_device_wrong_side_overloading_inline_no_diag() {
+  DeviceReturnTy ret1 = device_only_function(1);
+  DeviceReturnTy2 ret2 = device_only_function(1.0f);
+  HostReturnTy ret3 = host_only_function(1);
+  HostReturnTy2 ret4 = host_only_function(1.0f);
+}
+
+// wrong-sided overloading should cause diagnostic if it is emitted.
+// This inline function is emitted since it is called by an emitted function.
+inline __host__ __device__ void test_host_device_wrong_side_overloading_inline_diag() {
+  DeviceReturnTy ret1 = device_only_function(1);
+  DeviceReturnTy2 ret2 = device_only_function(1.0f);
+#ifndef __CUDA_ARCH__
+  // expected-error@-3 {{reference to __device__ function 'device_only_function' in __host__ __device__ function}}
+  // expected-error@-3 {{reference to __device__ function 'device_only_function' in __host__ __device__ function}}
+#endif
+  HostReturnTy ret3 = host_only_function(1);
+  HostReturnTy2 ret4 = host_only_function(1.0f);
+#ifdef __CUDA_ARCH__
+  // expected-error@-3 {{reference to __host__ function 'host_only_function' in __host__ __device__ function}}
+  // expected-error@-3 {{reference to __host__ function 'host_only_function' in __host__ __device__ function}}
+#endif
+}
+
+__host__ __device__ void test_host_device_wrong_side_overloading_inline_diag_caller() {
+  test_host_device_wrong_side_overloading_inline_diag();
+  // expected-note@-1 {{called by 'test_host_device_wrong_side_overloading_inline_diag_caller'}}
+}
+
 // Verify that we allow overloading function templates.
 template <typename T> __host__ T template_overload(const T &a) { return a; };
 template <typename T> __device__ T template_overload(const T &a) { return a; };
Index: clang/lib/Sema/SemaOverload.cpp
===================================================================
--- clang/lib/Sema/SemaOverload.cpp
+++ clang/lib/Sema/SemaOverload.cpp
@@ -9374,16 +9374,22 @@
   return Comparison::Equal;
 }
 
-static bool isBetterMultiversionCandidate(const OverloadCandidate &Cand1,
-                                          const OverloadCandidate &Cand2) {
+static Comparison
+isBetterMultiversionCandidate(const OverloadCandidate &Cand1,
+                              const OverloadCandidate &Cand2) {
   if (!Cand1.Function || !Cand1.Function->isMultiVersion() || !Cand2.Function ||
   !Cand2.Function->isMultiVersion())
-    return false;
+    return Comparison::Equal;

[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-23 Thread John McCall via Phabricator via cfe-commits
rjmccall added inline comments.



Comment at: clang/lib/Sema/SemaOverload.cpp:9749
+  if (isBetterMultiversionCandidate(Cand1, Cand2))
+    return true;
+

tra wrote:
> rjmccall wrote:
> > erichkeane wrote:
> > > yaxunl wrote:
> > > > echristo wrote:
> > > > > rjmccall wrote:
> > > > > > yaxunl wrote:
> > > > > > > rjmccall wrote:
> > > > > > > > If we move anything below this check, it needs to figure out a 
> > > > > > > > tri-state so that it can return false if `Cand2` is a better 
> > > > > > > > candidate than `Cand1`.  Now, that only matters if multiversion 
> > > > > > > > functions are supported under CUDA, but if you're relying on 
> > > > > > > > them not being supported, that should at least be commented on.
> > > > > > > multiversion host functions are orthogonal to CUDA and therefore 
> > > > > > > should be supported. multiversion in device, host device, and 
> > > > > > > global functions are not supported. However this change does not 
> > > > > > > make things worse, and should continue to work if they are 
> > > > > > > supported.
> > > > > > > 
> > > > > > > host/device-based overload resolution is mostly for 
> > > > > > > determining viability of a function. If two functions are both 
> > > > > > > viable, other factors should take precedence in preference. This 
> > > > > > > general rule has been taken for cases other than multiversion, I 
> > > > > > > think it should also apply to multiversion.
> > > > > > > 
> > > > > > > I will make isBetterMultiversionCandidate three states.
> > > > > > > This general rule has been taken for cases other than 
> > > > > > > multiversion, I think it should also apply to multiversion.
> > > > > > 
> > > > > > Well, but the multiversion people could say the same: that 
> > > > > > multiversioning is for picking an alternative among 
> > > > > > otherwise-identical functions, and HD and H functions are not 
> > > > > > otherwise-identical.
> > > > > > 
> > > > > > CC'ing @echristo for his thoughts on the right ordering here.
> > > > > Adding @erichkeane here as well.
> > > > > 
> > > > > I think this makes sense, but I can see a reason to multiversion a 
> > > > > function that will run on host and device. A version of some matrix 
> > > > > mult that takes advantage of 3 host architectures and one CUDA one? 
> > > > > Am I missing something here?
> > > > My understanding is that a multiversion function is for a specific 
> > > > CPU (or GPU). Let's say we want to have a function f for gfx900, gfx906, 
> > > > sandybridge, ivybridge, shouldn't they be more like
> > > > 
> > > > ```
> > > > __host__ __attribute__((cpu_specific(sandybridge))) f();
> > > > __host__ __attribute__((cpu_specific(ivybridge))) f();
> > > > __device__ __attribute__((cpu_specific(gfx900))) f();
> > > > __device__ __attribute__((cpu_specific(gfx906))) f();
> > > > ```
> > > > instead of all `__device__ __host__` functions?
> > > IMO, it doesn't make sense for functions to be BOTH host and 
> > > device, they'd have to be just one.  Otherwise I'm not sure how the 
> > > resolver behavior is supposed to work.  The whole idea is that the 
> > > definition is chosen at runtime.
> > > 
> > > Unless __host__ __device__ void foo(); is TWO declaration chains (meaning 
> > > two separate AST entries), it doesn't make sense to have multiversion on 
> > > it (and then, how it would be spelled is awkward/confusing to me).
> > > 
> > > In the above case, if those 4 declarations are not 2 separate root AST 
> > > nodes, multiversioning won't work.
> > There are certainly functions that ought to be usable from either host or 
> > device context — any inline function that just does ordinary language 
> > things should be in that category.  Also IIUC many declarations are 
> > *inferred* to be `__host__ __device__`, or can be mass-annotated with 
> > pragmas, and those reasons are probably the main ones this might matter — 
> > we might include a header in CUDA mode that declares a multi-versioned 
> > function, and we should handle it right.
> > 
> > My read of how CUDA programmers expect this to work is that they see the 
> > `__host__` / `__device__` attributes as primarily a mechanism for catching 
> > problems where you're using the wrong functions for the current 
> > configuration.  That is, while we allow overloading by 
> > `__host__`/`__device__`-ness, users expect those attributes to mostly be 
> > used as a filter for what's "really there" rather than really strictly 
> > segregating the namespace.  So I would say that CUDA programmers would 
> > probably expect the interaction with multiversioning to be:
> > 
> > - Programmers can put `__host__`, `__device__`, or both on a variant 
> > depending on where it was usable.
> > - Dispatches should simply ignore any variants that aren't usable for the 
> > current configuration.
> > 
> > And specifically they would not expect e.g. a `__host__` dispatch function 
> > to only consider `__host__` 

[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-23 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision.
tra added inline comments.
This revision is now accepted and ready to land.



Comment at: clang/lib/Sema/SemaOverload.cpp:9749
+  if (isBetterMultiversionCandidate(Cand1, Cand2))
+    return true;
+

rjmccall wrote:
> erichkeane wrote:
> > yaxunl wrote:
> > > echristo wrote:
> > > > rjmccall wrote:
> > > > > yaxunl wrote:
> > > > > > rjmccall wrote:
> > > > > > > If we move anything below this check, it needs to figure out a 
> > > > > > > tri-state so that it can return false if `Cand2` is a better 
> > > > > > > candidate than `Cand1`.  Now, that only matters if multiversion 
> > > > > > > functions are supported under CUDA, but if you're relying on them 
> > > > > > > not being supported, that should at least be commented on.
> > > > > > multiversion host functions are orthogonal to CUDA and therefore should 
> > > > > > be supported. multiversion in device, host device, and global 
> > > > > > functions are not supported. However this change does not make 
> > > > > > things worse, and should continue to work if they are supported.
> > > > > > 
> > > > > > host/device-based overload resolution is mostly for determining 
> > > > > > viability of a function. If two functions are both viable, other 
> > > > > > factors should take precedence in preference. This general rule has 
> > > > > > been taken for cases other than multiversion, I think it should 
> > > > > > also apply to multiversion.
> > > > > > 
> > > > > > I will make isBetterMultiversionCandidate three states.
> > > > > > This general rule has been taken for cases other than multiversion, 
> > > > > > I think it should also apply to multiversion.
> > > > > 
> > > > > Well, but the multiversion people could say the same: that 
> > > > > multiversioning is for picking an alternative among 
> > > > > otherwise-identical functions, and HD and H functions are not 
> > > > > otherwise-identical.
> > > > > 
> > > > > CC'ing @echristo for his thoughts on the right ordering here.
> > > > Adding @erichkeane here as well.
> > > > 
> > > > I think this makes sense, but I can see a reason to multiversion a 
> > > > function that will run on host and device. A version of some matrix 
> > > > mult that takes advantage of 3 host architectures and one cuda one? Am 
> > > > I missing something here?
> > > My understanding is that a multiversion function is for a specific 
> > > CPU (or GPU). Let's say we want to have a function f for gfx900, gfx906, 
> > > sandybridge, ivybridge, shouldn't they be more like
> > > 
> > > ```
> > > __host__ __attribute__((cpu_specific(sandybridge))) f();
> > > __host__ __attribute__((cpu_specific(ivybridge))) f();
> > > __device__ __attribute__((cpu_specific(gfx900))) f();
> > > __device__ __attribute__((cpu_specific(gfx906))) f();
> > > ```
> > > instead of all `__device__ __host__` functions?
> > IMO, it doesn't make sense for functions to be BOTH host and 
> > device, they'd have to be just one.  Otherwise I'm not sure how the 
> > resolver behavior is supposed to work.  The whole idea is that the 
> > definition is chosen at runtime.
> > 
> > Unless __host__ __device__ void foo(); is TWO declaration chains (meaning two 
> > separate AST entries), it doesn't make sense to have multiversion on it (and 
> > then, how it would be spelled is awkward/confusing to me).
> > 
> > In the above case, if those 4 declarations are not 2 separate root AST 
> > nodes, multiversioning won't work.
> There are certainly functions that ought to be usable from either host or 
> device context — any inline function that just does ordinary language things 
> should be in that category.  Also IIUC many declarations are *inferred* to be 
> `__host__ __device__`, or can be mass-annotated with pragmas, and those 
> reasons are probably the main ones this might matter — we might include a 
> header in CUDA mode that declares a multi-versioned function, and we should 
> handle it right.
> 
> My read of how CUDA programmers expect this to work is that they see the 
> `__host__` / `__device__` attributes as primarily a mechanism for catching 
> problems where you're using the wrong functions for the current 
> configuration.  That is, while we allow overloading by 
> `__host__`/`__device__`-ness, users expect those attributes to mostly be used 
> as a filter for what's "really there" rather than really strictly segregating 
> the namespace.  So I would say that CUDA programmers would probably expect 
> the interaction with multiversioning to be:
> 
> - Programmers can put `__host__`, `__device__`, or both on a variant 
> depending on where it was usable.
> - Dispatches should simply ignore any variants that aren't usable for the 
> current configuration.
> 
> And specifically they would not expect e.g. a `__host__` dispatch function to 
> only consider `__host__` variants — it should be able to dispatch to anything 
> available, which is to say, it should also include `__host__ __device__` 
> variants.  Similarly (and probably more usefully), a `__host__ __device__` 
> dispatch function being compiled for the device should also consider pure 
> `__device__` functions, and so on.

[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-23 Thread John McCall via Phabricator via cfe-commits
rjmccall added inline comments.



Comment at: clang/lib/Sema/SemaOverload.cpp:9749
+  if (isBetterMultiversionCandidate(Cand1, Cand2))
+    return true;
+

erichkeane wrote:
> yaxunl wrote:
> > echristo wrote:
> > > rjmccall wrote:
> > > > yaxunl wrote:
> > > > > rjmccall wrote:
> > > > > > If we move anything below this check, it needs to figure out a 
> > > > > > tri-state so that it can return false if `Cand2` is a better 
> > > > > > candidate than `Cand1`.  Now, that only matters if multiversion 
> > > > > > functions are supported under CUDA, but if you're relying on them 
> > > > > > not being supported, that should at least be commented on.
> > > > > multiversion host functions are orthogonal to CUDA and therefore should be 
> > > > > supported. multiversion in device, host device, and global functions 
> > > > > are not supported. However this change does not make things worse, 
> > > > > and should continue to work if they are supported.
> > > > > 
> > > > > host/device-based overload resolution is mostly for determining 
> > > > > viability of a function. If two functions are both viable, other 
> > > > > factors should take precedence in preference. This general rule has 
> > > > > been taken for cases other than multiversion, I think it should also 
> > > > > apply to multiversion.
> > > > > 
> > > > > I will make isBetterMultiversionCandidate three states.
> > > > > This general rule has been taken for cases other than multiversion, I 
> > > > > think it should also apply to multiversion.
> > > > 
> > > > Well, but the multiversion people could say the same: that 
> > > > multiversioning is for picking an alternative among otherwise-identical 
> > > > functions, and HD and H functions are not otherwise-identical.
> > > > 
> > > > CC'ing @echristo for his thoughts on the right ordering here.
> > > Adding @erichkeane here as well.
> > > 
> > > I think this makes sense, but I can see a reason to multiversion a 
> > > function that will run on host and device. A version of some matrix mult 
> > > that takes advantage of 3 host architectures and one CUDA one? Am I 
> > > missing something here?
> > My understanding is that a multiversion function is for a specific 
> > CPU (or GPU). Let's say we want to have a function f for gfx900, gfx906, 
> > sandybridge, ivybridge, shouldn't they be more like
> > 
> > ```
> > __host__ __attribute__((cpu_specific(sandybridge))) f();
> > __host__ __attribute__((cpu_specific(ivybridge))) f();
> > __device__ __attribute__((cpu_specific(gfx900))) f();
> > __device__ __attribute__((cpu_specific(gfx906))) f();
> > ```
> > instead of all `__device__ __host__` functions?
> IMO, it doesn't make sense for functions to be BOTH host and 
> device, they'd have to be just one.  Otherwise I'm not sure how the resolver 
> behavior is supposed to work.  The whole idea is that the definition is 
> chosen at runtime.
> 
> Unless __host__ __device__ void foo(); is TWO declaration chains (meaning two 
> separate AST entries), it doesn't make sense to have multiversion on it (and 
> then, how it would be spelled is awkward/confusing to me).
> 
> In the above case, if those 4 declarations are not 2 separate root AST 
> nodes, multiversioning won't work.
There are certainly functions that ought to be usable from either host or 
device context — any inline function that just does ordinary language things 
should be in that category.  Also IIUC many declarations are *inferred* to be 
`__host__ __device__`, or can be mass-annotated with pragmas, and those reasons 
are probably the main ones this might matter — we might include a header in 
CUDA mode that declares a multi-versioned function, and we should handle it 
right.

My read of how CUDA programmers expect this to work is that they see the 
`__host__` / `__device__` attributes as primarily a mechanism for catching 
problems where you're using the wrong functions for the current configuration.  
That is, while we allow overloading by `__host__`/`__device__`-ness, users 
expect those attributes to mostly be used as a filter for what's "really there" 
rather than really strictly segregating the namespace.  So I would say that 
CUDA programmers would probably expect the interaction with multiversioning to 
be:

- Programmers can put `__host__`, `__device__`, or both on a variant depending 
on where it was usable.
- Dispatches should simply ignore any variants that aren't usable for the 
current configuration.

And specifically they would not expect e.g. a `__host__` dispatch function to 
only consider `__host__` variants — it should be able to dispatch to anything 
available, which is to say, it should also include `__host__ __device__` 
variants.  Similarly (and probably more usefully), a `__host__ __device__` 
dispatch function being compiled for the device should also consider pure 
`__device__` functions, and so on.

If we accept that, then I think it gives us a much better idea for how to 

[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-23 Thread Erich Keane via Phabricator via cfe-commits
erichkeane added inline comments.



Comment at: clang/lib/Sema/SemaOverload.cpp:9749
+  if (isBetterMultiversionCandidate(Cand1, Cand2))
+    return true;
+

yaxunl wrote:
> echristo wrote:
> > rjmccall wrote:
> > > yaxunl wrote:
> > > > rjmccall wrote:
> > > > > If we move anything below this check, it needs to figure out a 
> > > > > tri-state so that it can return false if `Cand2` is a better 
> > > > > candidate than `Cand1`.  Now, that only matters if multiversion 
> > > > > functions are supported under CUDA, but if you're relying on them not 
> > > > > being supported, that should at least be commented on.
> > > > multiversion host functions are orthogonal to CUDA and therefore should be 
> > > > supported. multiversion in device, host device, and global functions 
> > > > are not supported. However this change does not make things worse, and 
> > > > should continue to work if they are supported.
> > > > 
> > > > host/device-based overload resolution is mostly for determining 
> > > > viability of a function. If two functions are both viable, other 
> > > > factors should take precedence in preference. This general rule has 
> > > > been taken for cases other than multiversion, I think it should also 
> > > > apply to multiversion.
> > > > 
> > > > I will make isBetterMultiversionCandidate three states.
> > > > This general rule has been taken for cases other than multiversion, I 
> > > > think it should also apply to multiversion.
> > > 
> > > Well, but the multiversion people could say the same: that 
> > > multiversioning is for picking an alternative among otherwise-identical 
> > > functions, and HD and H functions are not otherwise-identical.
> > > 
> > > CC'ing @echristo for his thoughts on the right ordering here.
> > Adding @erichkeane here as well.
> > 
> > I think this makes sense, but I can see a reason to multiversion a function 
> > that will run on host and device. A version of some matrix mult that takes 
> > advantage of 3 host architectures and one CUDA one? Am I missing something 
> > here?
> My understanding is that a multiversion function is for a specific CPU (or GPU). 
> Let's say we want to have a function f for gfx900, gfx906, sandybridge, 
> ivybridge, shouldn't they be more like
> 
> ```
> __host__ __attribute__((cpu_specific(sandybridge))) f();
> __host__ __attribute__((cpu_specific(ivybridge))) f();
> __device__ __attribute__((cpu_specific(gfx900))) f();
> __device__ __attribute__((cpu_specific(gfx906))) f();
> ```
> instead of all `__device__ __host__` functions?
IMO, it doesn't make sense for functions to be BOTH host and device, 
they'd have to be just one.  Otherwise I'm not sure how the resolver behavior 
is supposed to work.  The whole idea is that the definition is chosen at 
runtime.

Unless __host__ __device__ void foo(); is TWO declaration chains (meaning two 
separate AST entries), it doesn't make sense to have multiversion on it (and 
then, how it would be spelled is awkward/confusing to me).

In the above case, if those 4 declarations are not 2 separate root AST nodes, 
multiversioning won't work.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77954/new/

https://reviews.llvm.org/D77954



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-23 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl marked an inline comment as done.
yaxunl added inline comments.



Comment at: clang/lib/Sema/SemaOverload.cpp:9749
+  if (isBetterMultiversionCandidate(Cand1, Cand2))
+    return true;
+

echristo wrote:
> rjmccall wrote:
> > yaxunl wrote:
> > > rjmccall wrote:
> > > > If we move anything below this check, it needs to figure out a 
> > > > tri-state so that it can return false if `Cand2` is a better candidate 
> > > > than `Cand1`.  Now, that only matters if multiversion functions are 
> > > > supported under CUDA, but if you're relying on them not being 
> > > > supported, that should at least be commented on.
> > > multiversion host functions are orthogonal to CUDA and therefore should be 
> > > supported. multiversion in device, host device, and global functions are 
> > > not supported. However this change does not make things worse, and should 
> > > continue to work if they are supported.
> > > 
> > > host/device-based overload resolution is mostly for determining 
> > > viability of a function. If two functions are both viable, other factors 
> > > should take precedence in preference. This general rule has been taken 
> > > for cases other than multiversion, I think it should also apply to 
> > > multiversion.
> > > 
> > > I will make isBetterMultiversionCandidate three states.
> > > This general rule has been taken for cases other than multiversion, I 
> > > think it should also apply to multiversion.
> > 
> > Well, but the multiversion people could say the same: that multiversioning 
> > is for picking an alternative among otherwise-identical functions, and HD 
> > and H functions are not otherwise-identical.
> > 
> > CC'ing @echristo for his thoughts on the right ordering here.
> Adding @erichkeane here as well.
> 
> I think this makes sense, but I can see a reason to multiversion a function 
> that will run on host and device. A version of some matrix mult that takes 
> advantage of 3 host architectures and one cuda one? Am I missing something 
> here?
My understanding is that a multiversion function is for a specific CPU (or GPU). 
Let's say we want to have a function f for gfx900, gfx906, sandybridge, 
ivybridge, shouldn't they be more like

```
__host__ __attribute__((cpu_specific(sandybridge))) f();
__host__ __attribute__((cpu_specific(ivybridge))) f();
__device__ __attribute__((cpu_specific(gfx900))) f();
__device__ __attribute__((cpu_specific(gfx906))) f();
```
instead of all `__device__ __host__` functions?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77954/new/

https://reviews.llvm.org/D77954





[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-23 Thread Eric Christopher via Phabricator via cfe-commits
echristo added a subscriber: erichkeane.
echristo added inline comments.



Comment at: clang/lib/Sema/SemaOverload.cpp:9749
+  if (isBetterMultiversionCandidate(Cand1, Cand2))
+    return true;
+

rjmccall wrote:
> yaxunl wrote:
> > rjmccall wrote:
> > > If we move anything below this check, it needs to figure out a tri-state 
> > > so that it can return false if `Cand2` is a better candidate than 
> > > `Cand1`.  Now, that only matters if multiversion functions are supported 
> > > under CUDA, but if you're relying on them not being supported, that 
> > > should at least be commented on.
> > multiversion host functions are orthogonal to CUDA and therefore should be 
> > supported. multiversion in device, host device, and global functions are 
> > not supported. However this change does not make things worse, and should 
> > continue to work if they are supported.
> > 
> > host/device-based overload resolution is mostly for determining 
> > viability of a function. If two functions are both viable, other factors 
> > should take precedence in preference. This general rule has been taken for 
> > cases other than multiversion, I think it should also apply to multiversion.
> > 
> > I will make isBetterMultiversionCandidate three states.
> > This general rule has been taken for cases other than multiversion, I think 
> > it should also apply to multiversion.
> 
> Well, but the multiversion people could say the same: that multiversioning is 
> for picking an alternative among otherwise-identical functions, and HD and H 
> functions are not otherwise-identical.
> 
> CC'ing @echristo for his thoughts on the right ordering here.
Adding @erichkeane here as well.

I think this makes sense, but I can see a reason to multiversion a function 
that will run on host and device. A version of some matrix mult that takes 
advantage of 3 host architectures and one CUDA one? Am I missing something here?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77954/new/

https://reviews.llvm.org/D77954





[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-23 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

Okay, one minor fix.




Comment at: clang/lib/Sema/SemaOverload.cpp:9389
+  if (Cand2.Function->isInvalidDecl())
+    return Comparison::Better;
 

This is neglecting the case where they're both invalid.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77954/new/

https://reviews.llvm.org/D77954





[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-22 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 259458.
yaxunl added a comment.

Revised to let host/device take precedence over multiversion, as John suggested.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77954/new/

https://reviews.llvm.org/D77954

Files:
  clang/lib/Sema/SemaOverload.cpp
  clang/test/SemaCUDA/function-overload.cu

Index: clang/test/SemaCUDA/function-overload.cu
===================================================================
--- clang/test/SemaCUDA/function-overload.cu
+++ clang/test/SemaCUDA/function-overload.cu
@@ -331,9 +331,6 @@
 // If we have a mix of HD and H-only or D-only candidates in the overload set,
 // normal C++ overload resolution rules apply first.
template <typename T> TemplateReturnTy template_vs_hd_function(T arg)
-#ifdef __CUDA_ARCH__
-//expected-note@-2 {{declared here}}
-#endif
 {
   return TemplateReturnTy();
 }
@@ -342,11 +339,13 @@
 }
 
 __host__ __device__ void test_host_device_calls_hd_template() {
-  HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);
-  TemplateReturnTy ret2 = template_vs_hd_function(1);
 #ifdef __CUDA_ARCH__
-  // expected-error@-2 {{reference to __host__ function 'template_vs_hd_function' in __host__ __device__ function}}
+  typedef HostDeviceReturnTy ExpectedReturnTy;
+#else
+  typedef TemplateReturnTy ExpectedReturnTy;
 #endif
+  HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);
+  ExpectedReturnTy ret2 = template_vs_hd_function(1);
 }
 
 __host__ void test_host_calls_hd_template() {
@@ -367,14 +366,14 @@
 __device__ DeviceReturnTy device_only_function(int arg) { return DeviceReturnTy(); }
 __device__ DeviceReturnTy2 device_only_function(float arg) { return DeviceReturnTy2(); }
 #ifndef __CUDA_ARCH__
-  // expected-note@-3 {{'device_only_function' declared here}}
-  // expected-note@-3 {{'device_only_function' declared here}}
+  // expected-note@-3 2{{'device_only_function' declared here}}
+  // expected-note@-3 2{{'device_only_function' declared here}}
 #endif
 __host__ HostReturnTy host_only_function(int arg) { return HostReturnTy(); }
 __host__ HostReturnTy2 host_only_function(float arg) { return HostReturnTy2(); }
 #ifdef __CUDA_ARCH__
-  // expected-note@-3 {{'host_only_function' declared here}}
-  // expected-note@-3 {{'host_only_function' declared here}}
+  // expected-note@-3 2{{'host_only_function' declared here}}
+  // expected-note@-3 2{{'host_only_function' declared here}}
 #endif
 
 __host__ __device__ void test_host_device_single_side_overloading() {
@@ -392,6 +391,37 @@
 #endif
 }
 
+// wrong-sided overloading should not cause diagnostic unless it is emitted.
+// This inline function is not emitted.
+inline __host__ __device__ void test_host_device_wrong_side_overloading_inline_no_diag() {
+  DeviceReturnTy ret1 = device_only_function(1);
+  DeviceReturnTy2 ret2 = device_only_function(1.0f);
+  HostReturnTy ret3 = host_only_function(1);
+  HostReturnTy2 ret4 = host_only_function(1.0f);
+}
+
+// wrong-sided overloading should cause diagnostic if it is emitted.
+// This inline function is emitted since it is called by an emitted function.
+inline __host__ __device__ void test_host_device_wrong_side_overloading_inline_diag() {
+  DeviceReturnTy ret1 = device_only_function(1);
+  DeviceReturnTy2 ret2 = device_only_function(1.0f);
+#ifndef __CUDA_ARCH__
+  // expected-error@-3 {{reference to __device__ function 'device_only_function' in __host__ __device__ function}}
+  // expected-error@-3 {{reference to __device__ function 'device_only_function' in __host__ __device__ function}}
+#endif
+  HostReturnTy ret3 = host_only_function(1);
+  HostReturnTy2 ret4 = host_only_function(1.0f);
+#ifdef __CUDA_ARCH__
+  // expected-error@-3 {{reference to __host__ function 'host_only_function' in __host__ __device__ function}}
+  // expected-error@-3 {{reference to __host__ function 'host_only_function' in __host__ __device__ function}}
+#endif
+}
+
+__host__ __device__ void test_host_device_wrong_side_overloading_inline_diag_caller() {
+  test_host_device_wrong_side_overloading_inline_diag();
+  // expected-note@-1 {{called by 'test_host_device_wrong_side_overloading_inline_diag_caller'}}
+}
+
 // Verify that we allow overloading function templates.
 template <typename T> __host__ T template_overload(const T &a) { return a; };
 template <typename T> __device__ T template_overload(const T &a) { return a; };
Index: clang/lib/Sema/SemaOverload.cpp
===
--- clang/lib/Sema/SemaOverload.cpp
+++ clang/lib/Sema/SemaOverload.cpp
@@ -9374,16 +9374,19 @@
   return Comparison::Equal;
 }
 
-static bool isBetterMultiversionCandidate(const OverloadCandidate &Cand1,
-                                          const OverloadCandidate &Cand2) {
+static Comparison
+isBetterMultiversionCandidate(const OverloadCandidate &Cand1,
+                              const OverloadCandidate &Cand2) {
   if (!Cand1.Function || !Cand1.Function->isMultiVersion() || !Cand2.Function ||
       !Cand2.Function->isMultiVersion())
-    return false;
+    return Comparison::Equal;

[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-20 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

ping





___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

LGTM in principle. That said, my gut feeling is that this patch has a good 
chance of breaking something in sufficiently convoluted CUDA code like Eigen. 
When you land this patch, I'd appreciate it if you could do it on a workday 
morning (Pacific time) so I'm around to test it on our code and revert if 
something unexpected pops up.

On a side note, this case is another point towards having to redo the handling 
of `__host__ __device__`. There are way too many corner cases all over the 
place, and things will only get worse as we move towards newer C++ standards, 
where a lot more code becomes constexpr and is therefore implicitly `HD`. 
Having calls from `HD` functions resolve in a different way during host and 
device compilation is observable and may result in host and device code 
diverging unexpectedly.







[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-13 Thread John McCall via Phabricator via cfe-commits
rjmccall added a reviewer: echristo.
rjmccall added a subscriber: echristo.
rjmccall added inline comments.



Comment at: clang/lib/Sema/SemaOverload.cpp:9481
+  // emitted, Cand1 is not better than Cand2. This rule should have precedence
+  // over other rules.
+  //

yaxunl wrote:
> rjmccall wrote:
> > Please add `[CUDA]` or something similar to the top of this comment so that 
> > readers can immediately know that it's dialect-specific.
> > 
> > At a high level, this part of the rule is essentially saying that CUDA 
> > non-emittability is a kind of non-viability.  Should we just make 
> > non-emittable functions get flagged as non-viable (which will avoid a lot 
> > of relatively expensive conversion checking), or is it important to be able 
> > to select non-emittable candidates over candidates that are non-viable for 
> > other reasons?
> There are two situations for "bad" callees:
> 
> 1. The callee should never be called. It is not just an invalid call in 
> codegen, but also an invalid call in the AST, e.g. a host function calling a 
> device function. In CUDA call preference, it is termed "never", and clang 
> already removes such callees from the overload candidates.
> 
> 2. The callee should not be called in codegen, but may be called in the AST. 
> This happens with `__host__ __device__` functions when calling a 
> "wrong-sided" function, e.g. in device compilation, a `__host__ __device__` 
> function calls a `__host__` function. This is valid in the AST since the 
> `__host__ __device__` function may be an inline function which is only 
> called by a `__host__` function. There is a deferred diagnostic for the 
> wrong-sided call, which is triggered only if the caller is emitted. However, 
> in overload resolution, if no better candidates are available, wrong-sided 
> candidates are still viable.
Oh, I see what you're saying; sorry, I misread the code.  So anything with a 
preference *worse* than wrong-sided is outright non-viable; there's a very 
strong preference against wrong-sided calls that takes priority over all of the 
normal overload-resolution rules; and then there's a very weak preference 
against non-exact matches that everything else takes priority over.  Okay.



Comment at: clang/lib/Sema/SemaOverload.cpp:9749
+  if (isBetterMultiversionCandidate(Cand1, Cand2))
+return true;
+

yaxunl wrote:
> rjmccall wrote:
> > If we move anything below this check, it needs to figure out a tri-state so 
> > that it can return false if `Cand2` is a better candidate than `Cand1`.  
> > Now, that only matters if multiversion functions are supported under CUDA, 
> > but if you're relying on them not being supported, that should at least be 
> > commented on.
> Multiversion host functions are orthogonal to CUDA and therefore should be 
> supported. Multiversioning of device, host device, and global functions is 
> not supported. However, this change does not make things worse, and should 
> continue to work if they are supported.
> 
> Host/device-based overload resolution is mostly for determining the 
> viability of a function. If two functions are both viable, other factors 
> should take precedence in preference. This general rule has been applied to 
> cases other than multiversion, and I think it should also apply to 
> multiversion.
> 
> I will make isBetterMultiversionCandidate return three states.
> This general rule has been applied to cases other than multiversion, and I 
> think it should also apply to multiversion.

Well, but the multiversion people could say the same: that multiversioning is 
for picking an alternative among otherwise-identical functions, and HD and H 
functions are not otherwise-identical.

CC'ing @echristo for his thoughts on the right ordering here.







[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-13 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 256973.
yaxunl marked 3 inline comments as done.
yaxunl added a comment.

Fix preference for multiversion. Add comments. Add more tests for wrong-sided 
functions.



Files:
  clang/lib/Sema/SemaOverload.cpp
  clang/test/SemaCUDA/function-overload.cu

Index: clang/test/SemaCUDA/function-overload.cu
===
--- clang/test/SemaCUDA/function-overload.cu
+++ clang/test/SemaCUDA/function-overload.cu
@@ -331,9 +331,6 @@
 // If we have a mix of HD and H-only or D-only candidates in the overload set,
 // normal C++ overload resolution rules apply first.
 template <typename T> TemplateReturnTy template_vs_hd_function(T arg)
-#ifdef __CUDA_ARCH__
-//expected-note@-2 {{declared here}}
-#endif
 {
   return TemplateReturnTy();
 }
@@ -342,11 +339,13 @@
 }
 
 __host__ __device__ void test_host_device_calls_hd_template() {
-  HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);
-  TemplateReturnTy ret2 = template_vs_hd_function(1);
 #ifdef __CUDA_ARCH__
-  // expected-error@-2 {{reference to __host__ function 'template_vs_hd_function' in __host__ __device__ function}}
+  typedef HostDeviceReturnTy ExpectedReturnTy;
+#else
+  typedef TemplateReturnTy ExpectedReturnTy;
 #endif
+  HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);
+  ExpectedReturnTy ret2 = template_vs_hd_function(1);
 }
 
 __host__ void test_host_calls_hd_template() {
@@ -367,14 +366,14 @@
 __device__ DeviceReturnTy device_only_function(int arg) { return DeviceReturnTy(); }
 __device__ DeviceReturnTy2 device_only_function(float arg) { return DeviceReturnTy2(); }
 #ifndef __CUDA_ARCH__
-  // expected-note@-3 {{'device_only_function' declared here}}
-  // expected-note@-3 {{'device_only_function' declared here}}
+  // expected-note@-3 2{{'device_only_function' declared here}}
+  // expected-note@-3 2{{'device_only_function' declared here}}
 #endif
 __host__ HostReturnTy host_only_function(int arg) { return HostReturnTy(); }
 __host__ HostReturnTy2 host_only_function(float arg) { return HostReturnTy2(); }
 #ifdef __CUDA_ARCH__
-  // expected-note@-3 {{'host_only_function' declared here}}
-  // expected-note@-3 {{'host_only_function' declared here}}
+  // expected-note@-3 2{{'host_only_function' declared here}}
+  // expected-note@-3 2{{'host_only_function' declared here}}
 #endif
 
 __host__ __device__ void test_host_device_single_side_overloading() {
@@ -392,6 +391,37 @@
 #endif
 }
 
+// wrong-sided overloading should not cause diagnostic unless it is emitted.
+// This inline function is not emitted.
+inline __host__ __device__ void test_host_device_wrong_side_overloading_inline_no_diag() {
+  DeviceReturnTy ret1 = device_only_function(1);
+  DeviceReturnTy2 ret2 = device_only_function(1.0f);
+  HostReturnTy ret3 = host_only_function(1);
+  HostReturnTy2 ret4 = host_only_function(1.0f);
+}
+
+// wrong-sided overloading should cause diagnostic if it is emitted.
+// This inline function is emitted since it is called by an emitted function.
+inline __host__ __device__ void test_host_device_wrong_side_overloading_inline_diag() {
+  DeviceReturnTy ret1 = device_only_function(1);
+  DeviceReturnTy2 ret2 = device_only_function(1.0f);
+#ifndef __CUDA_ARCH__
+  // expected-error@-3 {{reference to __device__ function 'device_only_function' in __host__ __device__ function}}
+  // expected-error@-3 {{reference to __device__ function 'device_only_function' in __host__ __device__ function}}
+#endif
+  HostReturnTy ret3 = host_only_function(1);
+  HostReturnTy2 ret4 = host_only_function(1.0f);
+#ifdef __CUDA_ARCH__
+  // expected-error@-3 {{reference to __host__ function 'host_only_function' in __host__ __device__ function}}
+  // expected-error@-3 {{reference to __host__ function 'host_only_function' in __host__ __device__ function}}
+#endif
+}
+
+__host__ __device__ void test_host_device_wrong_side_overloading_inline_diag_caller() {
+  test_host_device_wrong_side_overloading_inline_diag();
+  // expected-note@-1 {{called by 'test_host_device_wrong_side_overloading_inline_diag_caller'}}
+}
+
 // Verify that we allow overloading function templates.
 template <typename T> __host__ T template_overload(const T &a) { return a; };
 template <typename T> __device__ T template_overload(const T &a) { return a; };
Index: clang/lib/Sema/SemaOverload.cpp
===
--- clang/lib/Sema/SemaOverload.cpp
+++ clang/lib/Sema/SemaOverload.cpp
@@ -9374,16 +9374,19 @@
   return Comparison::Equal;
 }
 
-static bool isBetterMultiversionCandidate(const OverloadCandidate &Cand1,
-                                          const OverloadCandidate &Cand2) {
+static Comparison
+isBetterMultiversionCandidate(const OverloadCandidate &Cand1,
+                              const OverloadCandidate &Cand2) {
   if (!Cand1.Function || !Cand1.Function->isMultiVersion() || !Cand2.Function ||
       !Cand2.Function->isMultiVersion())
   

[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-13 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl marked 6 inline comments as done.
yaxunl added inline comments.



Comment at: clang/lib/Sema/SemaOverload.cpp:9481
+  // emitted, Cand1 is not better than Cand2. This rule should have precedence
+  // over other rules.
+  //

rjmccall wrote:
> Please add `[CUDA]` or something similar to the top of this comment so that 
> readers can immediately know that it's dialect-specific.
> 
> At a high level, this part of the rule is essentially saying that CUDA 
> non-emittability is a kind of non-viability.  Should we just make 
> non-emittable functions get flagged as non-viable (which will avoid a lot of 
> relatively expensive conversion checking), or is it important to be able to 
> select non-emittable candidates over candidates that are non-viable for other 
> reasons?
There are two situations for "bad" callees:

1. The callee should never be called. It is not just an invalid call in 
codegen, but also an invalid call in the AST, e.g. a host function calling a 
device function. In CUDA call preference, it is termed "never", and clang 
already removes such callees from the overload candidates.

2. The callee should not be called in codegen, but may be called in the AST. 
This happens with `__host__ __device__` functions when calling a "wrong-sided" 
function, e.g. in device compilation, a `__host__ __device__` function calls a 
`__host__` function. This is valid in the AST since the `__host__ __device__` 
function may be an inline function which is only called by a `__host__` 
function. There is a deferred diagnostic for the wrong-sided call, which is 
triggered only if the caller is emitted. However, in overload resolution, if no 
better candidates are available, wrong-sided candidates are still viable.



Comment at: clang/lib/Sema/SemaOverload.cpp:9749
+  if (isBetterMultiversionCandidate(Cand1, Cand2))
+return true;
+

rjmccall wrote:
> If we move anything below this check, it needs to figure out a tri-state so 
> that it can return false if `Cand2` is a better candidate than `Cand1`.  Now, 
> that only matters if multiversion functions are supported under CUDA, but if 
> you're relying on them not being supported, that should at least be commented 
> on.
Multiversion host functions are orthogonal to CUDA and therefore should be 
supported. Multiversioning of device, host device, and global functions is not 
supported. However, this change does not make things worse, and should continue 
to work if they are supported.

Host/device-based overload resolution is mostly for determining the viability 
of a function. If two functions are both viable, other factors should take 
precedence in preference. This general rule has been applied to cases other 
than multiversion, and I think it should also apply to multiversion.

I will make isBetterMultiversionCandidate return three states.



Comment at: clang/lib/Sema/SemaOverload.cpp:9752
+  // If other rules cannot determine which is better, CUDA preference is used
+  // to determine which is better.
+  if (S.getLangOpts().CUDA && Cand1.Function && Cand2.Function) {

rjmccall wrote:
> Okay, let's think about the right place to put this check in the ordering; we 
> don't want different extensions to get into a who-comes-last competition.
> 
> - Certainly this should have lower priority than the standard-defined 
> preferences like argument conversion ranks or `enable_if` partial-ordering.
> - The preference for pass-object-size parameters is probably most similar to 
> a type-based-overloading decision and so should take priority.
> - I would say that this should take priority over function multi-versioning.  
> Function multi-versioning is all about making specialized versions of the 
> "same function", whereas I think host/device overloading is meant to be 
> semantically broader than that.
> 
> What do you think?
> 
> Regardless, the rationale for the order should be explained in comments.
I will add comments explaining the rationale for the preference ordering.

I commented on the preference between multiversion and host/device in another 
comment.







[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-12 Thread John McCall via Phabricator via cfe-commits
rjmccall added inline comments.



Comment at: clang/lib/Sema/SemaOverload.cpp:9481
+  // emitted, Cand1 is not better than Cand2. This rule should have precedence
+  // over other rules.
+  //

Please add `[CUDA]` or something similar to the top of this comment so that 
readers can immediately know that it's dialect-specific.

At a high level, this part of the rule is essentially saying that CUDA 
non-emittability is a kind of non-viability.  Should we just make non-emittable 
functions get flagged as non-viable (which will avoid a lot of relatively 
expensive conversion checking), or is it important to be able to select 
non-emittable candidates over candidates that are non-viable for other reasons?



Comment at: clang/lib/Sema/SemaOverload.cpp:9749
+  if (isBetterMultiversionCandidate(Cand1, Cand2))
+return true;
+

If we move anything below this check, it needs to figure out a tri-state so 
that it can return false if `Cand2` is a better candidate than `Cand1`.  Now, 
that only matters if multiversion functions are supported under CUDA, but if 
you're relying on them not being supported, that should at least be commented 
on.



Comment at: clang/lib/Sema/SemaOverload.cpp:9752
+  // If other rules cannot determine which is better, CUDA preference is used
+  // to determine which is better.
+  if (S.getLangOpts().CUDA && Cand1.Function && Cand2.Function) {

Okay, let's think about the right place to put this check in the ordering; we 
don't want different extensions to get into a who-comes-last competition.

- Certainly this should have lower priority than the standard-defined 
preferences like argument conversion ranks or `enable_if` partial-ordering.
- The preference for pass-object-size parameters is probably most similar to a 
type-based-overloading decision and so should take priority.
- I would say that this should take priority over function multi-versioning.  
Function multi-versioning is all about making specialized versions of the "same 
function", whereas I think host/device overloading is meant to be semantically 
broader than that.

What do you think?

Regardless, the rationale for the order should be explained in comments.







[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

2020-04-12 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 256903.
yaxunl retitled this revision from "[CUDA][HIP] Fix overload resolution issue 
for device host functions" to "[CUDA][HIP] Fix host/device based overload 
resolution".
yaxunl added a comment.

Revised by John's comments.



Files:
  clang/lib/Sema/SemaOverload.cpp
  clang/test/SemaCUDA/function-overload.cu

Index: clang/test/SemaCUDA/function-overload.cu
===
--- clang/test/SemaCUDA/function-overload.cu
+++ clang/test/SemaCUDA/function-overload.cu
@@ -331,9 +331,6 @@
 // If we have a mix of HD and H-only or D-only candidates in the overload set,
 // normal C++ overload resolution rules apply first.
 template <typename T> TemplateReturnTy template_vs_hd_function(T arg)
-#ifdef __CUDA_ARCH__
-//expected-note@-2 {{declared here}}
-#endif
 {
   return TemplateReturnTy();
 }
@@ -342,11 +339,13 @@
 }
 
 __host__ __device__ void test_host_device_calls_hd_template() {
-  HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);
-  TemplateReturnTy ret2 = template_vs_hd_function(1);
 #ifdef __CUDA_ARCH__
-  // expected-error@-2 {{reference to __host__ function 'template_vs_hd_function' in __host__ __device__ function}}
+  typedef HostDeviceReturnTy ExpectedReturnTy;
+#else
+  typedef TemplateReturnTy ExpectedReturnTy;
 #endif
+  HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);
+  ExpectedReturnTy ret2 = template_vs_hd_function(1);
 }
 
 __host__ void test_host_calls_hd_template() {
Index: clang/lib/Sema/SemaOverload.cpp
===
--- clang/lib/Sema/SemaOverload.cpp
+++ clang/lib/Sema/SemaOverload.cpp
@@ -9475,6 +9475,35 @@
   else if (!Cand1.Viable)
 return false;
 
+  // If Cand1 can be emitted and Cand2 cannot be emitted in the current context,
+  // Cand1 is better than Cand2. If Cand1 can not be emitted and Cand2 can be
+  // emitted, Cand1 is not better than Cand2. This rule should have precedence
+  // over other rules.
+  //
+  // If both Cand1 and Cand2 can be emitted, or neither can be emitted, then
+  // other rules should be used to determine which is better.
+  //
+  // If other rules cannot determine which is better, CUDA preference will be
+  // used again to determine which is better.
+  //
+  // TODO: Currently IdentifyCUDAPreference does not return correct values
+  // for functions called in global variable initializers due to missing
+  // correct context about device/host. Therefore we can only enforce this
+  // rule when there is a caller. We should enforce this rule for functions
+  // in global variable initializers once proper context is added.
+  if (S.getLangOpts().CUDA && Cand1.Function && Cand2.Function) {
+    if (FunctionDecl *Caller = dyn_cast<FunctionDecl>(S.CurContext)) {
+      auto Cand1Emittable = S.IdentifyCUDAPreference(Caller, Cand1.Function) >
+                            Sema::CFP_WrongSide;
+      auto Cand2Emittable = S.IdentifyCUDAPreference(Caller, Cand2.Function) >
+                            Sema::CFP_WrongSide;
+      if (Cand1Emittable && !Cand2Emittable)
+        return true;
+      if (!Cand1Emittable && Cand2Emittable)
+        return false;
+    }
+  }
+
   // C++ [over.match.best]p1:
   //
   //   -- if F is a static member function, ICS1(F) is defined such
@@ -9709,12 +9738,6 @@
   return Cmp == Comparison::Better;
   }
 
-  if (S.getLangOpts().CUDA && Cand1.Function && Cand2.Function) {
-    FunctionDecl *Caller = dyn_cast<FunctionDecl>(S.CurContext);
-    return S.IdentifyCUDAPreference(Caller, Cand1.Function) >
-           S.IdentifyCUDAPreference(Caller, Cand2.Function);
-  }
-
   bool HasPS1 = Cand1.Function != nullptr &&
 functionHasPassObjectSizeParams(Cand1.Function);
   bool HasPS2 = Cand2.Function != nullptr &&
@@ -9722,7 +9745,19 @@
   if (HasPS1 != HasPS2 && HasPS1)
 return true;
 
-  return isBetterMultiversionCandidate(Cand1, Cand2);
+  if (isBetterMultiversionCandidate(Cand1, Cand2))
+return true;
+
+  // If other rules cannot determine which is better, CUDA preference is used
+  // to determine which is better.
+  if (S.getLangOpts().CUDA && Cand1.Function && Cand2.Function) {
+    if (FunctionDecl *Caller = dyn_cast<FunctionDecl>(S.CurContext)) {
+      return S.IdentifyCUDAPreference(Caller, Cand1.Function) >
+             S.IdentifyCUDAPreference(Caller, Cand2.Function);
+    }
+  }
+
+  return false;
 }
 
 /// Determine whether two declarations are "equivalent" for the purposes of
@@ -9808,33 +9843,6 @@
   std::transform(begin(), end(), std::back_inserter(Candidates),
                  [](OverloadCandidate &Cand) { return &Cand; });
 
-  // [CUDA] HD->H or HD->D calls are technically not allowed by CUDA but
-  // are accepted by both clang and NVCC. However, during a particular
-  // compilation mode only one call variant is viable. We need to
-  // exclude non-viable overload candidates from consideration