[PATCH] D132248: [CUDA][OpenMP] Fix the new driver crashing on multiple device-only outputs

2022-08-24 Thread Joseph Huber via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGba57828e11c5: [CUDA][OpenMP] Fix the new driver crashing on 
multiple device-only outputs (authored by jhuber6).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132248/new/

https://reviews.llvm.org/D132248

Files:
  clang/lib/Driver/Driver.cpp
  clang/test/Driver/cuda-bindings.cu
  clang/test/Driver/hip-binding.hip

Index: clang/test/Driver/hip-binding.hip
===
--- clang/test/Driver/hip-binding.hip
+++ clang/test/Driver/hip-binding.hip
@@ -51,3 +51,20 @@
 
 // NORDC-NOT: offload bundler
 // NORDC: # "x86_64-unknown-linux-gnu" - "GNU::Linker", inputs: ["{{.*o}}"], output: "a.out"
+
+//
+// Check to make sure we can generate multiple outputs for device-only
+// compilation and fail with '-o'.
+//
+// RUN: %clang -### -target x86_64-linux-gnu --offload-new-driver -ccc-print-bindings \
+// RUN:--offload-arch=gfx90a --offload-arch=gfx908 --offload-device-only -c %s 2>&1 \
+// RUN: | FileCheck -check-prefix=MULTI-D-ONLY %s
+//  MULTI-D-ONLY: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[GFX908:.+]]"
+// MULTI-D-ONLY-NEXT: # "amdgcn-amd-amdhsa" - "AMDGCN::Linker", inputs: ["[[GFX908]]"], output: "[[GFX908_OUT:.+]]"
+// MULTI-D-ONLY-NEXT: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]"], output: "[[GFX90a:.+]]"
+// MULTI-D-ONLY-NEXT: # "amdgcn-amd-amdhsa" - "AMDGCN::Linker", inputs: ["[[GFX90a]]"], output: "[[GFX90a_OUT:.+]]"
+//
+// RUN: %clang -### -target x86_64-linux-gnu --offload-new-driver -ccc-print-bindings \
+// RUN:--offload-arch=gfx90a --offload-arch=gfx908 --offload-device-only -c -o %t %s 2>&1 \
+// RUN: | FileCheck -check-prefix=MULTI-D-ONLY-O %s
+// MULTI-D-ONLY-O: error: cannot specify -o when generating multiple output files
Index: clang/test/Driver/cuda-bindings.cu
===
--- clang/test/Driver/cuda-bindings.cu
+++ clang/test/Driver/cuda-bindings.cu
@@ -146,3 +146,20 @@
 // RUN:--cuda-gpu-arch=sm_52 --cuda-device-only -c -o foo.o %s 2>&1 \
 // RUN: | FileCheck -check-prefix=D_ONLY %s
 // D_ONLY: "foo.o"
+
+//
+// Check to make sure we can generate multiple outputs for device-only
+// compilation and fail with '-o'.
+//
+// RUN: %clang -### -target powerpc64le-ibm-linux-gnu --offload-new-driver -ccc-print-bindings \
+// RUN:--offload-arch=sm_70 --offload-arch=sm_52 --offload-device-only -c %s 2>&1 \
+// RUN: | FileCheck -check-prefix=MULTI-D-ONLY %s
+//  MULTI-D-ONLY: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[PTX_70:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX_70]]"], output: "[[CUBIN_70:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]"], output: "[[PTX_52:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX_52]]"], output: "[[CUBIN_52:.+]]"
+//
+// RUN: %clang -### -target powerpc64le-ibm-linux-gnu --offload-new-driver -ccc-print-bindings \
+// RUN:--offload-arch=sm_70 --offload-arch=sm_52 --offload-device-only -c -o %t %s 2>&1 \
+// RUN: | FileCheck -check-prefix=MULTI-D-ONLY-O %s
+// MULTI-D-ONLY-O: error: cannot specify -o when generating multiple output files
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -4716,10 +4716,13 @@
   // we are also generating .o files. So we allow more than one output file in
   // this case as well.
   //
+  // OffloadClass of type TY_Nothing: device-only output will place many outputs
+  // into a single offloading action. We should count all inputs to the action
+  // as outputs.
   if (FinalOutput) {
 unsigned NumOutputs = 0;
 unsigned NumIfsOutputs = 0;
-for (const Action *A : C.getActions())
+for (const Action *A : C.getActions()) {
   if (A->getType() != types::TY_Nothing &&
   !(A->getKind() == Action::IfsMergeJobClass ||
 (A->getType() == clang::driver::types::TY_IFS_CPP &&
@@ -4728,6 +4731,10 @@
 (A->getKind() == Action::BindArchClass && A->getInputs().size() &&
  A->getInputs().front()->getKind() == Action::IfsMergeJobClass)))
 ++NumOutputs;
+  else if (A->getKind() == Action::OffloadClass &&
+   A->getType() == types::TY_Nothing)
+NumOutputs += A->size();
+}
 
 if (NumOutputs > 1) {
   Diag(clang::diag::err_drv_output_argument_with_multiple_files);
@@ -5265,20 +5272,21 @@
 //  \
 //Device Action 1  ---> OffloadAction -> Device Action 2
 //
-// For a) and b), we just return the job generated for the dependence. For
+// For a) and b), we just return the job generated for the dependences. For
 // c) 

[PATCH] D132248: [CUDA][OpenMP] Fix the new driver crashing on multiple device-only outputs

2022-08-23 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl accepted this revision.
yaxunl added a comment.
This revision is now accepted and ready to land.

LGTM. Thanks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132248/new/

https://reviews.llvm.org/D132248

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132248: [CUDA][OpenMP] Fix the new driver crashing on multiple device-only outputs

2022-08-23 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 454839.
jhuber6 added a comment.

Adding HIP test


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132248/new/

https://reviews.llvm.org/D132248

Files:
  clang/lib/Driver/Driver.cpp
  clang/test/Driver/cuda-bindings.cu
  clang/test/Driver/hip-binding.hip

Index: clang/test/Driver/hip-binding.hip
===
--- clang/test/Driver/hip-binding.hip
+++ clang/test/Driver/hip-binding.hip
@@ -51,3 +51,20 @@
 
 // NORDC-NOT: offload bundler
 // NORDC: # "x86_64-unknown-linux-gnu" - "GNU::Linker", inputs: ["{{.*o}}"], output: "a.out"
+
+//
+// Check to make sure we can generate multiple outputs for device-only
+// compilation and fail with '-o'.
+//
+// RUN: %clang -### -target x86_64-linux-gnu --offload-new-driver -ccc-print-bindings \
+// RUN:--offload-arch=gfx90a --offload-arch=gfx908 --offload-device-only -c %s 2>&1 \
+// RUN: | FileCheck -check-prefix=MULTI-D-ONLY %s
+//  MULTI-D-ONLY: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[GFX908:.+]]"
+// MULTI-D-ONLY-NEXT: # "amdgcn-amd-amdhsa" - "AMDGCN::Linker", inputs: ["[[GFX908]]"], output: "[[GFX908_OUT:.+]]"
+// MULTI-D-ONLY-NEXT: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]"], output: "[[GFX90a:.+]]"
+// MULTI-D-ONLY-NEXT: # "amdgcn-amd-amdhsa" - "AMDGCN::Linker", inputs: ["[[GFX90a]]"], output: "[[GFX90a_OUT:.+]]"
+//
+// RUN: %clang -### -target x86_64-linux-gnu --offload-new-driver -ccc-print-bindings \
+// RUN:--offload-arch=gfx90a --offload-arch=gfx908 --offload-device-only -c -o %t %s 2>&1 \
+// RUN: | FileCheck -check-prefix=MULTI-D-ONLY-O %s
+// MULTI-D-ONLY-O: error: cannot specify -o when generating multiple output files
Index: clang/test/Driver/cuda-bindings.cu
===
--- clang/test/Driver/cuda-bindings.cu
+++ clang/test/Driver/cuda-bindings.cu
@@ -146,3 +146,20 @@
 // RUN:--cuda-gpu-arch=sm_52 --cuda-device-only -c -o foo.o %s 2>&1 \
 // RUN: | FileCheck -check-prefix=D_ONLY %s
 // D_ONLY: "foo.o"
+
+//
+// Check to make sure we can generate multiple outputs for device-only
+// compilation and fail with '-o'.
+//
+// RUN: %clang -### -target powerpc64le-ibm-linux-gnu --offload-new-driver -ccc-print-bindings \
+// RUN:--offload-arch=sm_70 --offload-arch=sm_52 --offload-device-only -c %s 2>&1 \
+// RUN: | FileCheck -check-prefix=MULTI-D-ONLY %s
+//  MULTI-D-ONLY: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[PTX_70:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX_70]]"], output: "[[CUBIN_70:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]"], output: "[[PTX_52:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX_52]]"], output: "[[CUBIN_52:.+]]"
+//
+// RUN: %clang -### -target powerpc64le-ibm-linux-gnu --offload-new-driver -ccc-print-bindings \
+// RUN:--offload-arch=sm_70 --offload-arch=sm_52 --offload-device-only -c -o %t %s 2>&1 \
+// RUN: | FileCheck -check-prefix=MULTI-D-ONLY-O %s
+// MULTI-D-ONLY-O: error: cannot specify -o when generating multiple output files
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -4716,10 +4716,13 @@
   // we are also generating .o files. So we allow more than one output file in
   // this case as well.
   //
+  // OffloadClass of type TY_Nothing: device-only output will place many outputs
+  // into a single offloading action. We should count all inputs to the action
+  // as outputs.
   if (FinalOutput) {
 unsigned NumOutputs = 0;
 unsigned NumIfsOutputs = 0;
-for (const Action *A : C.getActions())
+for (const Action *A : C.getActions()) {
   if (A->getType() != types::TY_Nothing &&
   !(A->getKind() == Action::IfsMergeJobClass ||
 (A->getType() == clang::driver::types::TY_IFS_CPP &&
@@ -4728,6 +4731,10 @@
 (A->getKind() == Action::BindArchClass && A->getInputs().size() &&
  A->getInputs().front()->getKind() == Action::IfsMergeJobClass)))
 ++NumOutputs;
+  else if (A->getKind() == Action::OffloadClass &&
+   A->getType() == types::TY_Nothing)
+NumOutputs += A->size();
+}
 
 if (NumOutputs > 1) {
   Diag(clang::diag::err_drv_output_argument_with_multiple_files);
@@ -5265,20 +5272,21 @@
 //  \
 //Device Action 1  ---> OffloadAction -> Device Action 2
 //
-// For a) and b), we just return the job generated for the dependence. For
+// For a) and b), we just return the job generated for the dependences. For
 // c) and d) we override the current action with the host/device dependence
 // if the current toolchain is host/device 

[PATCH] D132248: [CUDA][OpenMP] Fix the new driver crashing on multiple device-only outputs

2022-08-19 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In D132248#3736295 , @tra wrote:

> I'm OK with that.
>
> @yaxunl -- what are your thoughts on whether this approach would work for 
> HIP? On one hand HIP already has a lot of features that the new driver is 
> intended to provide, so AMD may have no pressure to change to something else. 
> On the other hand, long term it would make sense to unify the driver pipeline 
> across the different offloading mechanisms we have now.

I am OK as long as it works for HIP and does not break the old driver.




Comment at: clang/test/Driver/cuda-bindings.cu:154
+//
+// RUN: %clang -### -target powerpc64le-ibm-linux-gnu --offload-new-driver 
-ccc-print-bindings \
+// RUN:--offload-arch=sm_70 --offload-arch=sm_52 --offload-device-only 
-c %s 2>&1 \

can you add similar tests for HIP? Thanks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132248/new/

https://reviews.llvm.org/D132248

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132248: [CUDA][OpenMP] Fix the new driver crashing on multiple device-only outputs

2022-08-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

I'm OK with that.

@yaxunl -- what are your thoughts on whether this approach would work for HIP? 
On one hand HIP already has a lot of features that the new driver is intended 
to provide, so AMD may have no pressure to change to something else. On the 
other hand, long term it would make sense to unify the driver pipeline across 
the different offloading mechanisms we have now.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132248/new/

https://reviews.llvm.org/D132248

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132248: [CUDA][OpenMP] Fix the new driver crashing on multiple device-only outputs

2022-08-19 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 454076.
jhuber6 added a comment.

Updating to error with `-o` and multiple files.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132248/new/

https://reviews.llvm.org/D132248

Files:
  clang/lib/Driver/Driver.cpp
  clang/test/Driver/cuda-bindings.cu


Index: clang/test/Driver/cuda-bindings.cu
===
--- clang/test/Driver/cuda-bindings.cu
+++ clang/test/Driver/cuda-bindings.cu
@@ -146,3 +146,20 @@
 // RUN:--cuda-gpu-arch=sm_52 --cuda-device-only -c -o foo.o %s 2>&1 \
 // RUN: | FileCheck -check-prefix=D_ONLY %s
 // D_ONLY: "foo.o"
+
+//
+// Check to make sure we can generate multiple outputs for device-only
+// compilation and fail with '-o'.
+//
+// RUN: %clang -### -target powerpc64le-ibm-linux-gnu --offload-new-driver 
-ccc-print-bindings \
+// RUN:--offload-arch=sm_70 --offload-arch=sm_52 --offload-device-only 
-c %s 2>&1 \
+// RUN: | FileCheck -check-prefix=MULTI-D-ONLY %s
+//  MULTI-D-ONLY: # "nvptx64-nvidia-cuda" - "clang", inputs: 
["[[INPUT:.+]]"], output: "[[PTX_70:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: 
["[[PTX_70]]"], output: "[[CUBIN_70:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "clang", inputs: 
["[[INPUT]]"], output: "[[PTX_52:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: 
["[[PTX_52]]"], output: "[[CUBIN_52:.+]]"
+//
+// RUN: %clang -### -target powerpc64le-ibm-linux-gnu --offload-new-driver 
-ccc-print-bindings \
+// RUN:--offload-arch=sm_70 --offload-arch=sm_52 --offload-device-only 
-c -o %t %s 2>&1 \
+// RUN: | FileCheck -check-prefix=MULTI-D-ONLY-O %s
+// MULTI-D-ONLY-O: error: cannot specify -o when generating multiple output 
files
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -4712,10 +4712,13 @@
   // we are also generating .o files. So we allow more than one output file in
   // this case as well.
   //
+  // OffloadClass of type TY_Nothing: device-only output will place many 
outputs
+  // into a single offloading action. We should count all inputs to the action
+  // as outputs.
   if (FinalOutput) {
 unsigned NumOutputs = 0;
 unsigned NumIfsOutputs = 0;
-for (const Action *A : C.getActions())
+for (const Action *A : C.getActions()) {
   if (A->getType() != types::TY_Nothing &&
   !(A->getKind() == Action::IfsMergeJobClass ||
 (A->getType() == clang::driver::types::TY_IFS_CPP &&
@@ -4724,6 +4727,10 @@
 (A->getKind() == Action::BindArchClass && A->getInputs().size() &&
  A->getInputs().front()->getKind() == Action::IfsMergeJobClass)))
 ++NumOutputs;
+  else if (A->getKind() == Action::OffloadClass &&
+   A->getType() == types::TY_Nothing)
+NumOutputs += A->size();
+}
 
 if (NumOutputs > 1) {
   Diag(clang::diag::err_drv_output_argument_with_multiple_files);
@@ -5261,20 +5268,21 @@
 //  \
 //Device Action 1  ---> OffloadAction -> Device Action 2
 //
-// For a) and b), we just return the job generated for the dependence. For
+// For a) and b), we just return the job generated for the dependences. For
 // c) and d) we override the current action with the host/device dependence
 // if the current toolchain is host/device and set the offload dependences
 // info with the jobs obtained from the device/host dependence(s).
 
-// If there is a single device option, just generate the job for it.
-if (OA->hasSingleDeviceDependence()) {
+// If there is a single device option or has no host action, just generate
+// the job for it.
+if (OA->hasSingleDeviceDependence() || !OA->hasHostDependence()) {
   InputInfoList DevA;
   OA->doOnEachDeviceDependence([&](Action *DepA, const ToolChain *DepTC,
const char *DepBoundArch) {
-DevA =
-BuildJobsForAction(C, DepA, DepTC, DepBoundArch, AtTopLevel,
-   /*MultipleArchs*/ !!DepBoundArch, LinkingOutput,
-   CachedResults, DepA->getOffloadingDeviceKind());
+DevA.append(BuildJobsForAction(C, DepA, DepTC, DepBoundArch, 
AtTopLevel,
+   /*MultipleArchs*/ !!DepBoundArch,
+   LinkingOutput, CachedResults,
+   DepA->getOffloadingDeviceKind()));
   });
   return DevA;
 }


Index: clang/test/Driver/cuda-bindings.cu
===
--- clang/test/Driver/cuda-bindings.cu
+++ clang/test/Driver/cuda-bindings.cu
@@ -146,3 +146,20 @@
 // RUN:--cuda-gpu-arch=sm_52 --cuda-device-only -c -o foo.o %s 

[PATCH] D132248: [CUDA][OpenMP] Fix the new driver crashing on multiple device-only outputs

2022-08-19 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D132248#3735943 , @tra wrote:

> In D132248#3735900 , @jhuber6 wrote:
>
>> Is this an architectural limitation? I'd imagine they'd just behave the same 
>> way here in my implementation.
>
> The constraint here is that we have to stick with a single output per 
> compiler invocation and that format of that output should be the same. E.g. 
> for C++ we'd expect to see an ELF file when we compile with `-c` and text 
> assembly, if we compile with `-S`.
>
> We could pack GPU objects into a fat binary, but for consistency it would 
> have to be done for single-target compilations, too. Packing a single object 
> into a fat binary would make little sense, but producing an object file or a 
> fat binary depending on the nubmer of targets would be inconsitent.
> Similarly, compilation with `-S` also gets tricky -- do you bundle the text 
> output? That would be not particularly useful as, presumably, one would want 
> to examine the assembly. We could concatenate together the ASM files, but 
> that would produce an assembly source we can't really assemble.
> On top of that, CUDA compilation has been around for a while and changing the 
> output format would be somewhat disruptive.
>
> In the end, CUDA stuck with insisting on erroring out when the `-o` has been 
> specified, but where it would need to produce multiple outputs.
> HIP grew a `--[no-]gpu-bundle-output` option to control whether to bundle 
> outputs of device-only compilation.

Thanks for the background. I'm assuming HIP did this because they use the old 
`clang-offload-bundler` which supported bundling multiple file types, while my 
new method relies on having some LLVM-IR to embed things in. I wasn't a huge 
fan of outputting bundles because it meant we couldn't do things like `clang -o 
- | opt` or similar. For my implementation I will probably make HIP do what 
CUDA does as I feel that is more reasonable unless someone has a major 
objection.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132248/new/

https://reviews.llvm.org/D132248

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132248: [CUDA][OpenMP] Fix the new driver crashing on multiple device-only outputs

2022-08-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

In D132248#3735900 , @jhuber6 wrote:

> Is this an architectural limitation? I'd imagine they'd just behave the same 
> way here in my implementation.

The constraint here is that we have to stick with a single output per compiler 
invocation and that format of that output should be the same. E.g. for C++ we'd 
expect to see an ELF file when we compile with `-c` and text assembly, if we 
compile with `-S`.

We could pack GPU objects into a fat binary, but for consistency it would have 
to be done for single-target compilations, too. Packing a single object into a 
fat binary would make little sense, but producing an object file or a fat 
binary depending on the nubmer of targets would be inconsitent.
Similarly, compilation with `-S` also gets tricky -- do you bundle the text 
output? That would be not particularly useful as, presumably, one would want to 
examine the assembly. We could concatenate together the ASM files, but that 
would produce an assembly source we can't really assemble.
On top of that, CUDA compilation has been around for a while and changing the 
output format would be somewhat disruptive.

In the end, CUDA stuck with insisting on erroring out when the `-o` has been 
specified, but where it would need to produce multiple outputs.
HIP grew a `--[no-]gpu-bundle-output` option to control whether to bundle 
outputs of device-only compilation.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132248/new/

https://reviews.llvm.org/D132248

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132248: [CUDA][OpenMP] Fix the new driver crashing on multiple device-only outputs

2022-08-19 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D132248#3735793 , @tra wrote:

>> The old driver would put all the outputs in the final action list akin to a 
>> linker job.
>
> IIRC that's where HIP and CUDA behaved differently. CUDA compilation does not 
> allow device-only compilation for multiple targets if we have explicitly 
> specified output. It does produce individual per-gpu .o files if compiled 
> without `-o`.
>
>   bin/clang++ --cuda-path=$HOME/local/cuda-11.7 --offload-arch=sm_80 
> --offload-arch=sm_86 -x cuda axpy.cu  --cuda-device-only -O3  -c -o axpy.o
>   clang-15: error: cannot specify -o when generating multiple output files

Is this an architectural limitation? I'd imagine they'd just behave the same 
way here in my implementation.




Comment at: clang/test/Driver/cuda-bindings.cu:160
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "clang", inputs: 
["[[INPUT]]"], output: "[[PTX_52:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: 
["[[PTX_52]]"], output: "[[CUBIN_52:.+]]"

tra wrote:
> If we've specified `-o foo.o`, where do those multiple outputs go to?
> 
> The old driver disallowed using `-o` when compiling for multiple GPUs.
Good catch, right now it'll just write both of them to the same file.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132248/new/

https://reviews.llvm.org/D132248

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132248: [CUDA][OpenMP] Fix the new driver crashing on multiple device-only outputs

2022-08-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

> The old driver would put all the outputs in the final action list akin to a 
> linker job.

IIRC that's where HIP and CUDA behaved differently. CUDA compilation does not 
allow device-only compilation for multiple targets if we have explicitly 
specified output. It does produce individual per-gpu .o files if compiled 
without `-o`.

  bin/clang++ --cuda-path=$HOME/local/cuda-11.7 --offload-arch=sm_80 
--offload-arch=sm_86 -x cuda axpy.cu  --cuda-device-only -O3  -c -o axpy.o
  clang-15: error: cannot specify -o when generating multiple output files






Comment at: clang/test/Driver/cuda-bindings.cu:160
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "clang", inputs: 
["[[INPUT]]"], output: "[[PTX_52:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: 
["[[PTX_52]]"], output: "[[CUBIN_52:.+]]"

If we've specified `-o foo.o`, where do those multiple outputs go to?

The old driver disallowed using `-o` when compiling for multiple GPUs.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132248/new/

https://reviews.llvm.org/D132248

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D132248: [CUDA][OpenMP] Fix the new driver crashing on multiple device-only outputs

2022-08-19 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 454036.
jhuber6 added a comment.

Forgot to use the new driver in the test.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132248/new/

https://reviews.llvm.org/D132248

Files:
  clang/lib/Driver/Driver.cpp
  clang/test/Driver/cuda-bindings.cu


Index: clang/test/Driver/cuda-bindings.cu
===
--- clang/test/Driver/cuda-bindings.cu
+++ clang/test/Driver/cuda-bindings.cu
@@ -146,3 +146,15 @@
 // RUN:--cuda-gpu-arch=sm_52 --cuda-device-only -c -o foo.o %s 2>&1 \
 // RUN: | FileCheck -check-prefix=D_ONLY %s
 // D_ONLY: "foo.o"
+
+//
+// Check to make sure we can generate multiple outputs for device-only
+// compilation.
+//
+// RUN: %clang -### -target powerpc64le-ibm-linux-gnu --offload-new-driver 
-ccc-print-bindings \
+// RUN:--offload-arch=sm_70 --offload-arch=sm_52 --offload-device-only 
-c -o foo.o %s 2>&1 \
+// RUN: | FileCheck -check-prefix=MULTI-D-ONLY %s
+//  MULTI-D-ONLY: # "nvptx64-nvidia-cuda" - "clang", inputs: 
["[[INPUT:.+]]"], output: "[[PTX_70:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: 
["[[PTX_70]]"], output: "[[CUBIN_70:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "clang", inputs: 
["[[INPUT]]"], output: "[[PTX_52:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: 
["[[PTX_52]]"], output: "[[CUBIN_52:.+]]"
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -5261,20 +5261,21 @@
 //  \
 //Device Action 1  ---> OffloadAction -> Device Action 2
 //
-// For a) and b), we just return the job generated for the dependence. For
+// For a) and b), we just return the job generated for the dependences. For
 // c) and d) we override the current action with the host/device dependence
 // if the current toolchain is host/device and set the offload dependences
 // info with the jobs obtained from the device/host dependence(s).
 
-// If there is a single device option, just generate the job for it.
-if (OA->hasSingleDeviceDependence()) {
+// If there is a single device option or has no host action, just generate
+// the job for it.
+if (OA->hasSingleDeviceDependence() || !OA->hasHostDependence()) {
   InputInfoList DevA;
   OA->doOnEachDeviceDependence([&](Action *DepA, const ToolChain *DepTC,
const char *DepBoundArch) {
-DevA =
-BuildJobsForAction(C, DepA, DepTC, DepBoundArch, AtTopLevel,
-   /*MultipleArchs*/ !!DepBoundArch, LinkingOutput,
-   CachedResults, DepA->getOffloadingDeviceKind());
+DevA.append(BuildJobsForAction(C, DepA, DepTC, DepBoundArch, 
AtTopLevel,
+   /*MultipleArchs*/ !!DepBoundArch,
+   LinkingOutput, CachedResults,
+   DepA->getOffloadingDeviceKind()));
   });
   return DevA;
 }


Index: clang/test/Driver/cuda-bindings.cu
===
--- clang/test/Driver/cuda-bindings.cu
+++ clang/test/Driver/cuda-bindings.cu
@@ -146,3 +146,15 @@
 // RUN:--cuda-gpu-arch=sm_52 --cuda-device-only -c -o foo.o %s 2>&1 \
 // RUN: | FileCheck -check-prefix=D_ONLY %s
 // D_ONLY: "foo.o"
+
+//
+// Check to make sure we can generate multiple outputs for device-only
+// compilation.
+//
+// RUN: %clang -### -target powerpc64le-ibm-linux-gnu --offload-new-driver -ccc-print-bindings \
+// RUN:--offload-arch=sm_70 --offload-arch=sm_52 --offload-device-only -c -o foo.o %s 2>&1 \
+// RUN: | FileCheck -check-prefix=MULTI-D-ONLY %s
+//  MULTI-D-ONLY: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[PTX_70:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX_70]]"], output: "[[CUBIN_70:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]"], output: "[[PTX_52:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX_52]]"], output: "[[CUBIN_52:.+]]"
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -5261,20 +5261,21 @@
 //  \
 //Device Action 1  ---> OffloadAction -> Device Action 2
 //
-// For a) and b), we just return the job generated for the dependence. For
+// For a) and b), we just return the job generated for the dependences. For
 // c) and d) we override the current action with the host/device dependence
 // if the current toolchain is host/device and set the 

[PATCH] D132248: [CUDA][OpenMP] Fix the new driver crashing on multiple device-only outputs

2022-08-19 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: jdoerfert, tra, yaxunl, JonChesterfield.
Herald added subscribers: mattd, guansong.
Herald added a project: All.
jhuber6 requested review of this revision.
Herald added subscribers: cfe-commits, sstefan1, MaskRay.
Herald added a project: clang.

The new driver supports device-only compilation for the offloading
device. The way this is handlded is a little different from the old
offloading driver. The old driver would put all the outputs in the final
action list akin to a linker job. The new driver however generated these
in the middle of the host's job so we instead put them all in a single
offloading action. However, we only handled these kinds of offloading
actions correctly when there was only a single input. When we had
multiple inputs we would instead attempt to get the host job, which
didn't exist, and crash.

This patch simply adds some extra logic to generate the jobs for all
dependencies if there is not host action.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D132248

Files:
  clang/lib/Driver/Driver.cpp
  clang/test/Driver/cuda-bindings.cu


Index: clang/test/Driver/cuda-bindings.cu
===
--- clang/test/Driver/cuda-bindings.cu
+++ clang/test/Driver/cuda-bindings.cu
@@ -146,3 +146,15 @@
 // RUN:--cuda-gpu-arch=sm_52 --cuda-device-only -c -o foo.o %s 2>&1 \
 // RUN: | FileCheck -check-prefix=D_ONLY %s
 // D_ONLY: "foo.o"
+
+//
+// Check to make sure we can generate multiple outputs for device-only
+// compilation.
+//
+// RUN: %clang -### -target powerpc64le-ibm-linux-gnu -ccc-print-bindings \
+// RUN:--offload-arch=sm_70 --offload-arch=sm_52 --offload-device-only 
-c -o foo.o %s 2>&1 \
+// RUN: | FileCheck -check-prefix=MULTI-D-ONLY %s
+//  MULTI-D-ONLY: # "nvptx64-nvidia-cuda" - "clang", inputs: 
["[[INPUT:.+]]"], output: "[[PTX_70:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: 
["[[PTX_70]]"], output: "[[CUBIN_70:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "clang", inputs: 
["[[INPUT]]"], output: "[[PTX_52:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: 
["[[PTX_52]]"], output: "[[CUBIN_52:.+]]"
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -5261,20 +5261,21 @@
 //  \
 //Device Action 1  ---> OffloadAction -> Device Action 2
 //
-// For a) and b), we just return the job generated for the dependence. For
+// For a) and b), we just return the job generated for the dependences. For
 // c) and d) we override the current action with the host/device dependence
 // if the current toolchain is host/device and set the offload dependences
 // info with the jobs obtained from the device/host dependence(s).
 
-// If there is a single device option, just generate the job for it.
-if (OA->hasSingleDeviceDependence()) {
+// If there is a single device option or has no host action, just generate
+// the job for it.
+if (OA->hasSingleDeviceDependence() || !OA->hasHostDependence()) {
   InputInfoList DevA;
   OA->doOnEachDeviceDependence([&](Action *DepA, const ToolChain *DepTC,
const char *DepBoundArch) {
-DevA =
-BuildJobsForAction(C, DepA, DepTC, DepBoundArch, AtTopLevel,
-   /*MultipleArchs*/ !!DepBoundArch, LinkingOutput,
-   CachedResults, DepA->getOffloadingDeviceKind());
+DevA.append(BuildJobsForAction(C, DepA, DepTC, DepBoundArch, 
AtTopLevel,
+   /*MultipleArchs*/ !!DepBoundArch,
+   LinkingOutput, CachedResults,
+   DepA->getOffloadingDeviceKind()));
   });
   return DevA;
 }


Index: clang/test/Driver/cuda-bindings.cu
===
--- clang/test/Driver/cuda-bindings.cu
+++ clang/test/Driver/cuda-bindings.cu
@@ -146,3 +146,15 @@
 // RUN:--cuda-gpu-arch=sm_52 --cuda-device-only -c -o foo.o %s 2>&1 \
 // RUN: | FileCheck -check-prefix=D_ONLY %s
 // D_ONLY: "foo.o"
+
+//
+// Check to make sure we can generate multiple outputs for device-only
+// compilation.
+//
+// RUN: %clang -### -target powerpc64le-ibm-linux-gnu -ccc-print-bindings \
+// RUN:--offload-arch=sm_70 --offload-arch=sm_52 --offload-device-only -c -o foo.o %s 2>&1 \
+// RUN: | FileCheck -check-prefix=MULTI-D-ONLY %s
+//  MULTI-D-ONLY: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[PTX_70:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX_70]]"], output: "[[CUBIN_70:.+]]"
+// MULTI-D-ONLY-NEXT: # "nvptx64-nvidia-cuda" -