[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/84367
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> LGTM overall, with docs/comment nits.

Done, thanks for the review.

https://github.com/llvm/llvm-project/pull/84367
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/84367

>From 47d36058da1604d33023d1ed69221b3ee5bfee62 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 7 Mar 2024 13:44:50 -0600
Subject: [PATCH] [CUDA] Include PTX in non-RDC mode using the new driver

Summary:
The old driver embed PTX in rdc-mode and so does the `nvcc` compiler.
The new drivers currently does not do this, so we should keep it
consistent in this case. This simply requires adding the assembler
output as an input to the offloading action that gets fed to fatbin.
---
 clang/docs/ReleaseNotes.rst  |  3 +++
 clang/lib/Driver/Driver.cpp  |  8 
 clang/lib/Driver/ToolChains/Cuda.cpp | 22 --
 clang/test/Driver/cuda-phases.cu | 25 +
 4 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 1b901a27fd19d1..42c4a7c4d4bd14 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -391,6 +391,9 @@ RISC-V Support
 CUDA/HIP Language Changes
 ^
 
+- PTX is no longer included by default when compiling for CUDA. Using 
+  ``--cuda-include-ptx=all`` will return the old behavior.
+
 CUDA Support
 
 
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index cecd34acbc92c0..96e6ad77f5e50d 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4625,7 +4625,15 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
   DDeps.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
   OffloadAction::DeviceDependences DDep;
   DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
+
+  // Compiling CUDA in non-RDC mode uses the PTX output if available.
+  for (Action *Input : A->getInputs())
+if (Kind == Action::OFK_Cuda && A->getType() == types::TY_Object &&
+!Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
+  false))
+  DDep.add(*Input, *TCAndArch->first, TCAndArch->second.data(), Kind);
   OffloadActions.push_back(C.MakeAction(DDep, 
A->getType()));
+
   ++TCAndArch;
 }
   }
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp 
b/clang/lib/Driver/ToolChains/Cuda.cpp
index 177fd6310e7ee2..c6007d3cfab864 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -503,18 +503,20 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const 
JobAction &JA,
   Exec, CmdArgs, Inputs, Output));
 }
 
-static bool shouldIncludePTX(const ArgList &Args, const char *gpu_arch) {
-  bool includePTX = true;
-  for (Arg *A : Args) {
-if (!(A->getOption().matches(options::OPT_cuda_include_ptx_EQ) ||
-  A->getOption().matches(options::OPT_no_cuda_include_ptx_EQ)))
-  continue;
+static bool shouldIncludePTX(const ArgList &Args, StringRef InputArch) {
+  // The new driver does not include PTX by default to avoid overhead.
+  bool includePTX = !Args.hasFlag(options::OPT_offload_new_driver,
+  options::OPT_no_offload_new_driver, false);
+  for (Arg *A : Args.filtered(options::OPT_cuda_include_ptx_EQ,
+  options::OPT_no_cuda_include_ptx_EQ)) {
 A->claim();
 const StringRef ArchStr = A->getValue();
-if (ArchStr == "all" || ArchStr == gpu_arch) {
-  includePTX = A->getOption().matches(options::OPT_cuda_include_ptx_EQ);
-  continue;
-}
+if (A->getOption().matches(options::OPT_cuda_include_ptx_EQ) &&
+(ArchStr == "all" || ArchStr == InputArch))
+  includePTX = true;
+else if (A->getOption().matches(options::OPT_no_cuda_include_ptx_EQ) &&
+ (ArchStr == "all" || ArchStr == InputArch))
+  includePTX = false;
   }
   return includePTX;
 }
diff --git a/clang/test/Driver/cuda-phases.cu b/clang/test/Driver/cuda-phases.cu
index 9a231091de2bdc..a1c3c9b51b1e41 100644
--- a/clang/test/Driver/cuda-phases.cu
+++ b/clang/test/Driver/cuda-phases.cu
@@ -244,31 +244,32 @@
 // NEW-DRIVER-RDC-NEXT: 18: assembler, {17}, object, (host-cuda)
 // NEW-DRIVER-RDC-NEXT: 19: clang-linker-wrapper, {18}, image, (host-cuda)
 
-// RUN: %clang -### -target powerpc64le-ibm-linux-gnu -ccc-print-phases 
--offload-new-driver -fgpu-rdc \
+// RUN: %clang -### -target powerpc64le-ibm-linux-gnu -ccc-print-phases 
--offload-new-driver \
 // RUN:   --offload-arch=sm_52 --offload-arch=sm_70 %s 2>&1 | FileCheck 
--check-prefix=NEW-DRIVER %s
-//  NEW-DRIVER: 0: input, "[[INPUT:.+]]", cuda
-// NEW-DRIVER-NEXT: 1: preprocessor, {0}, cuda-cpp-output
-// NEW-DRIVER-NEXT: 2: compiler, {1}, ir
-// NEW-DRIVER-NEXT: 3: input, "[[INPUT]]", cuda, (device-cuda, sm_52)
+//  NEW-DRIVER: 0: input, "[[CUDA:.+]]", cuda, (host-cuda)
+// NEW-DRIVER-NEXT: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
+// NEW-DRIVER-NEXT: 2: compiler, {1}, ir, (host-cuda)
+// NEW-DRIVER-NEXT: 3: input, "

[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Artem Belevich via cfe-commits

https://github.com/Artem-B approved this pull request.

LGTM overall, with docs/comment nits.

https://github.com/llvm/llvm-project/pull/84367
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Artem Belevich via cfe-commits


@@ -503,18 +503,20 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const 
JobAction &JA,
   Exec, CmdArgs, Inputs, Output));
 }
 
-static bool shouldIncludePTX(const ArgList &Args, const char *gpu_arch) {
-  bool includePTX = true;
-  for (Arg *A : Args) {
-if (!(A->getOption().matches(options::OPT_cuda_include_ptx_EQ) ||
-  A->getOption().matches(options::OPT_no_cuda_include_ptx_EQ)))
-  continue;
+static bool shouldIncludePTX(const ArgList &Args, StringRef InputArch) {
+  // The new driver does not include PTX by default.
+  bool includePTX = !Args.hasFlag(options::OPT_offload_new_driver,

Artem-B wrote:

I'd add a comment on why we're making this decision based on the new vs old 
driver.

https://github.com/llvm/llvm-project/pull/84367
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Artem Belevich via cfe-commits

Artem-B wrote:

> > > Should I make `shouldIncludePTX` default to `false` for the new driver?
> > 
> > 
> > Yes, I think that's a better default.
> 
> Done, now requires `--cuda-include-ptx=`.

This may be worth adding to the release notes.


https://github.com/llvm/llvm-project/pull/84367
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/84367

>From afac73145dede37a847064b0bf0b9681c431f7d3 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 7 Mar 2024 13:44:50 -0600
Subject: [PATCH] [CUDA] Include PTX in non-RDC mode using the new driver

Summary:
The old driver embed PTX in rdc-mode and so does the `nvcc` compiler.
The new drivers currently does not do this, so we should keep it
consistent in this case. This simply requires adding the assembler
output as an input to the offloading action that gets fed to fatbin.
---
 clang/lib/Driver/Driver.cpp  |  8 
 clang/lib/Driver/ToolChains/Cuda.cpp | 22 --
 clang/test/Driver/cuda-phases.cu | 25 +
 3 files changed, 33 insertions(+), 22 deletions(-)

diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index cecd34acbc92c0..96e6ad77f5e50d 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4625,7 +4625,15 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
   DDeps.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
   OffloadAction::DeviceDependences DDep;
   DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
+
+  // Compiling CUDA in non-RDC mode uses the PTX output if available.
+  for (Action *Input : A->getInputs())
+if (Kind == Action::OFK_Cuda && A->getType() == types::TY_Object &&
+!Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
+  false))
+  DDep.add(*Input, *TCAndArch->first, TCAndArch->second.data(), Kind);
   OffloadActions.push_back(C.MakeAction(DDep, 
A->getType()));
+
   ++TCAndArch;
 }
   }
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp 
b/clang/lib/Driver/ToolChains/Cuda.cpp
index 177fd6310e7ee2..196ec29ef1465f 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -503,18 +503,20 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const 
JobAction &JA,
   Exec, CmdArgs, Inputs, Output));
 }
 
-static bool shouldIncludePTX(const ArgList &Args, const char *gpu_arch) {
-  bool includePTX = true;
-  for (Arg *A : Args) {
-if (!(A->getOption().matches(options::OPT_cuda_include_ptx_EQ) ||
-  A->getOption().matches(options::OPT_no_cuda_include_ptx_EQ)))
-  continue;
+static bool shouldIncludePTX(const ArgList &Args, StringRef InputArch) {
+  // The new driver does not include PTX by default.
+  bool includePTX = !Args.hasFlag(options::OPT_offload_new_driver,
+  options::OPT_no_offload_new_driver, false);
+  for (Arg *A : Args.filtered(options::OPT_cuda_include_ptx_EQ,
+  options::OPT_no_cuda_include_ptx_EQ)) {
 A->claim();
 const StringRef ArchStr = A->getValue();
-if (ArchStr == "all" || ArchStr == gpu_arch) {
-  includePTX = A->getOption().matches(options::OPT_cuda_include_ptx_EQ);
-  continue;
-}
+if (A->getOption().matches(options::OPT_cuda_include_ptx_EQ) &&
+(ArchStr == "all" || ArchStr == InputArch))
+  includePTX = true;
+else if (A->getOption().matches(options::OPT_no_cuda_include_ptx_EQ) &&
+ (ArchStr == "all" || ArchStr == InputArch))
+  includePTX = false;
   }
   return includePTX;
 }
diff --git a/clang/test/Driver/cuda-phases.cu b/clang/test/Driver/cuda-phases.cu
index 9a231091de2bdc..a1c3c9b51b1e41 100644
--- a/clang/test/Driver/cuda-phases.cu
+++ b/clang/test/Driver/cuda-phases.cu
@@ -244,31 +244,32 @@
 // NEW-DRIVER-RDC-NEXT: 18: assembler, {17}, object, (host-cuda)
 // NEW-DRIVER-RDC-NEXT: 19: clang-linker-wrapper, {18}, image, (host-cuda)
 
-// RUN: %clang -### -target powerpc64le-ibm-linux-gnu -ccc-print-phases 
--offload-new-driver -fgpu-rdc \
+// RUN: %clang -### -target powerpc64le-ibm-linux-gnu -ccc-print-phases 
--offload-new-driver \
 // RUN:   --offload-arch=sm_52 --offload-arch=sm_70 %s 2>&1 | FileCheck 
--check-prefix=NEW-DRIVER %s
-//  NEW-DRIVER: 0: input, "[[INPUT:.+]]", cuda
-// NEW-DRIVER-NEXT: 1: preprocessor, {0}, cuda-cpp-output
-// NEW-DRIVER-NEXT: 2: compiler, {1}, ir
-// NEW-DRIVER-NEXT: 3: input, "[[INPUT]]", cuda, (device-cuda, sm_52)
+//  NEW-DRIVER: 0: input, "[[CUDA:.+]]", cuda, (host-cuda)
+// NEW-DRIVER-NEXT: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
+// NEW-DRIVER-NEXT: 2: compiler, {1}, ir, (host-cuda)
+// NEW-DRIVER-NEXT: 3: input, "[[CUDA]]", cuda, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 5: compiler, {4}, ir, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 6: backend, {5}, assembler, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 7: assembler, {6}, object, (device-cuda, sm_52)
-// NEW-DRIVER-NEXT: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_52)" {7}, 
object
-// NEW-DRIVER-NEXT: 9: input, "[[INPUT]]", cuda, (device-cuda, sm_70)
+// NEW-DRIVER-NEXT: 8

[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> > Should I make `shouldIncludePTX` default to `false` for the new driver?
> 
> Yes, I think that's a better default.

Done, now requires `--cuda-include-ptx=`.

https://github.com/llvm/llvm-project/pull/84367
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/84367

>From 6d1b32556e53a290903e67ac722c1ad9da876188 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 7 Mar 2024 13:44:50 -0600
Subject: [PATCH] [CUDA] Include PTX in non-RDC mode using the new driver

Summary:
The old driver embed PTX in rdc-mode and so does the `nvcc` compiler.
The new drivers currently does not do this, so we should keep it
consistent in this case. This simply requires adding the assembler
output as an input to the offloading action that gets fed to fatbin.
---
 clang/lib/Driver/Driver.cpp  |  8 
 clang/lib/Driver/ToolChains/Cuda.cpp |  4 +++-
 clang/test/Driver/cuda-phases.cu | 25 +
 3 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index cecd34acbc92c0..96e6ad77f5e50d 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4625,7 +4625,15 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
   DDeps.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
   OffloadAction::DeviceDependences DDep;
   DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
+
+  // Compiling CUDA in non-RDC mode uses the PTX output if available.
+  for (Action *Input : A->getInputs())
+if (Kind == Action::OFK_Cuda && A->getType() == types::TY_Object &&
+!Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
+  false))
+  DDep.add(*Input, *TCAndArch->first, TCAndArch->second.data(), Kind);
   OffloadActions.push_back(C.MakeAction(DDep, 
A->getType()));
+
   ++TCAndArch;
 }
   }
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp 
b/clang/lib/Driver/ToolChains/Cuda.cpp
index 177fd6310e7ee2..8053c4089b1a02 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -504,7 +504,9 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const 
JobAction &JA,
 }
 
 static bool shouldIncludePTX(const ArgList &Args, const char *gpu_arch) {
-  bool includePTX = true;
+  // The new driver does not include PTX by default.
+  bool includePTX = !Args.hasFlag(options::OPT_offload_new_driver,
+  options::OPT_no_offload_new_driver, false);
   for (Arg *A : Args) {
 if (!(A->getOption().matches(options::OPT_cuda_include_ptx_EQ) ||
   A->getOption().matches(options::OPT_no_cuda_include_ptx_EQ)))
diff --git a/clang/test/Driver/cuda-phases.cu b/clang/test/Driver/cuda-phases.cu
index 9a231091de2bdc..a1c3c9b51b1e41 100644
--- a/clang/test/Driver/cuda-phases.cu
+++ b/clang/test/Driver/cuda-phases.cu
@@ -244,31 +244,32 @@
 // NEW-DRIVER-RDC-NEXT: 18: assembler, {17}, object, (host-cuda)
 // NEW-DRIVER-RDC-NEXT: 19: clang-linker-wrapper, {18}, image, (host-cuda)
 
-// RUN: %clang -### -target powerpc64le-ibm-linux-gnu -ccc-print-phases 
--offload-new-driver -fgpu-rdc \
+// RUN: %clang -### -target powerpc64le-ibm-linux-gnu -ccc-print-phases 
--offload-new-driver \
 // RUN:   --offload-arch=sm_52 --offload-arch=sm_70 %s 2>&1 | FileCheck 
--check-prefix=NEW-DRIVER %s
-//  NEW-DRIVER: 0: input, "[[INPUT:.+]]", cuda
-// NEW-DRIVER-NEXT: 1: preprocessor, {0}, cuda-cpp-output
-// NEW-DRIVER-NEXT: 2: compiler, {1}, ir
-// NEW-DRIVER-NEXT: 3: input, "[[INPUT]]", cuda, (device-cuda, sm_52)
+//  NEW-DRIVER: 0: input, "[[CUDA:.+]]", cuda, (host-cuda)
+// NEW-DRIVER-NEXT: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
+// NEW-DRIVER-NEXT: 2: compiler, {1}, ir, (host-cuda)
+// NEW-DRIVER-NEXT: 3: input, "[[CUDA]]", cuda, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 5: compiler, {4}, ir, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 6: backend, {5}, assembler, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 7: assembler, {6}, object, (device-cuda, sm_52)
-// NEW-DRIVER-NEXT: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_52)" {7}, 
object
-// NEW-DRIVER-NEXT: 9: input, "[[INPUT]]", cuda, (device-cuda, sm_70)
+// NEW-DRIVER-NEXT: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_52)" {7}, 
"device-cuda (nvptx64-nvidia-cuda:sm_52)" {6}, object
+// NEW-DRIVER-NEXT: 9: input, "[[CUDA]]", cuda, (device-cuda, sm_70)
 // NEW-DRIVER-NEXT: 10: preprocessor, {9}, cuda-cpp-output, (device-cuda, 
sm_70)
 // NEW-DRIVER-NEXT: 11: compiler, {10}, ir, (device-cuda, sm_70)
 // NEW-DRIVER-NEXT: 12: backend, {11}, assembler, (device-cuda, sm_70)
 // NEW-DRIVER-NEXT: 13: assembler, {12}, object, (device-cuda, sm_70)
-// NEW-DRIVER-NEXT: 14: offload, "device-cuda (nvptx64-nvidia-cuda:sm_70)" 
{13}, object
-// NEW-DRIVER-NEXT: 15: clang-offload-packager, {8, 14}, image
-// NEW-DRIVER-NEXT: 16: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, 
"device-cuda (powerpc64le-ibm-linux-gnu)" {15}, ir
+// NEW-DRIVER-NEXT: 14: offload, "device-cuda (nvptx64-nvidia-cuda:sm_70)" 
{13}, "

[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Artem Belevich via cfe-commits

Artem-B wrote:

> Should I make `shouldIncludePTX` default to `false` for the new driver?

Yes, I think that's a better default.

https://github.com/llvm/llvm-project/pull/84367
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

Should I make `shouldIncludePTX` default to `false` for the new driver?

https://github.com/llvm/llvm-project/pull/84367
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Joseph Huber via cfe-commits


@@ -4625,7 +4625,15 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
   DDeps.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
   OffloadAction::DeviceDependences DDep;
   DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
+
+  // Compiling CUDA in non-RDC mode uses the PTX output if available.
+  for (Action *Input : A->getInputs())
+if (Kind == Action::OFK_Cuda && A->getType() == types::TY_Object &&
+!Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,

jhuber6 wrote:

Yeah, I don't have my finger on the pulse of the CUDA users here. I think we 
want this patch to match the current behavior with `--cuda-include-ptx` as it 
seems to make the decision whether or not to include it at job creation time. 
We could then potentially change the default of `--cuda-include-ptx` if that's 
the preferred solution.

https://github.com/llvm/llvm-project/pull/84367
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Joseph Huber via cfe-commits


@@ -4625,7 +4625,15 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
   DDeps.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
   OffloadAction::DeviceDependences DDep;
   DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
+
+  // Compiling CUDA in non-RDC mode uses the PTX output if available.

jhuber6 wrote:

So, the current behavior is that it will "always" set the PTX in the job and 
optionally include it in the `fatbinary` job depending on those settings.

https://github.com/llvm/llvm-project/pull/84367
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Artem Belevich via cfe-commits


@@ -4625,7 +4625,15 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
   DDeps.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
   OffloadAction::DeviceDependences DDep;
   DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
+
+  // Compiling CUDA in non-RDC mode uses the PTX output if available.

Artem-B wrote:

Do we still respect `--cuda-include-ptx=...` ?



https://github.com/llvm/llvm-project/pull/84367
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Artem Belevich via cfe-commits


@@ -4625,7 +4625,15 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
   DDeps.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
   OffloadAction::DeviceDependences DDep;
   DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
+
+  // Compiling CUDA in non-RDC mode uses the PTX output if available.
+  for (Action *Input : A->getInputs())
+if (Kind == Action::OFK_Cuda && A->getType() == types::TY_Object &&
+!Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,

Artem-B wrote:

I'm not quite sure why we would need to include PTX for RDC compilation.

In retrospect, including PTX by default with all compilations turned out to be 
a wrong default choice.
It's just a waste of space for most of the users, and it allows problems to go 
unnoticed for longer than they should (e.g. something was compiled for a wrong 
GPU).

Switching to the new driver is a good point to make a better choice. I would 
argue that we should not be including PTX by default or, if we do deem that it 
may be useful, only add it for the most recent chosen GPU variant, to provide 
some forward compatibility, not for all of them.

https://github.com/llvm/llvm-project/pull/84367
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-driver

Author: Joseph Huber (jhuber6)


Changes

Summary:
The old driver embed PTX in rdc-mode and so does the `nvcc` compiler.
The new drivers currently does not do this, so we should keep it
consistent in this case. This simply requires adding the assembler
output as an input to the offloading action that gets fed to fatbin.


---
Full diff: https://github.com/llvm/llvm-project/pull/84367.diff


2 Files Affected:

- (modified) clang/lib/Driver/Driver.cpp (+8) 
- (modified) clang/test/Driver/cuda-phases.cu (+13-12) 


``diff
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index cecd34acbc92c0..96e6ad77f5e50d 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4625,7 +4625,15 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
   DDeps.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
   OffloadAction::DeviceDependences DDep;
   DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
+
+  // Compiling CUDA in non-RDC mode uses the PTX output if available.
+  for (Action *Input : A->getInputs())
+if (Kind == Action::OFK_Cuda && A->getType() == types::TY_Object &&
+!Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
+  false))
+  DDep.add(*Input, *TCAndArch->first, TCAndArch->second.data(), Kind);
   OffloadActions.push_back(C.MakeAction(DDep, 
A->getType()));
+
   ++TCAndArch;
 }
   }
diff --git a/clang/test/Driver/cuda-phases.cu b/clang/test/Driver/cuda-phases.cu
index 9a231091de2bdc..a1c3c9b51b1e41 100644
--- a/clang/test/Driver/cuda-phases.cu
+++ b/clang/test/Driver/cuda-phases.cu
@@ -244,31 +244,32 @@
 // NEW-DRIVER-RDC-NEXT: 18: assembler, {17}, object, (host-cuda)
 // NEW-DRIVER-RDC-NEXT: 19: clang-linker-wrapper, {18}, image, (host-cuda)
 
-// RUN: %clang -### -target powerpc64le-ibm-linux-gnu -ccc-print-phases 
--offload-new-driver -fgpu-rdc \
+// RUN: %clang -### -target powerpc64le-ibm-linux-gnu -ccc-print-phases 
--offload-new-driver \
 // RUN:   --offload-arch=sm_52 --offload-arch=sm_70 %s 2>&1 | FileCheck 
--check-prefix=NEW-DRIVER %s
-//  NEW-DRIVER: 0: input, "[[INPUT:.+]]", cuda
-// NEW-DRIVER-NEXT: 1: preprocessor, {0}, cuda-cpp-output
-// NEW-DRIVER-NEXT: 2: compiler, {1}, ir
-// NEW-DRIVER-NEXT: 3: input, "[[INPUT]]", cuda, (device-cuda, sm_52)
+//  NEW-DRIVER: 0: input, "[[CUDA:.+]]", cuda, (host-cuda)
+// NEW-DRIVER-NEXT: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
+// NEW-DRIVER-NEXT: 2: compiler, {1}, ir, (host-cuda)
+// NEW-DRIVER-NEXT: 3: input, "[[CUDA]]", cuda, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 5: compiler, {4}, ir, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 6: backend, {5}, assembler, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 7: assembler, {6}, object, (device-cuda, sm_52)
-// NEW-DRIVER-NEXT: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_52)" {7}, 
object
-// NEW-DRIVER-NEXT: 9: input, "[[INPUT]]", cuda, (device-cuda, sm_70)
+// NEW-DRIVER-NEXT: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_52)" {7}, 
"device-cuda (nvptx64-nvidia-cuda:sm_52)" {6}, object
+// NEW-DRIVER-NEXT: 9: input, "[[CUDA]]", cuda, (device-cuda, sm_70)
 // NEW-DRIVER-NEXT: 10: preprocessor, {9}, cuda-cpp-output, (device-cuda, 
sm_70)
 // NEW-DRIVER-NEXT: 11: compiler, {10}, ir, (device-cuda, sm_70)
 // NEW-DRIVER-NEXT: 12: backend, {11}, assembler, (device-cuda, sm_70)
 // NEW-DRIVER-NEXT: 13: assembler, {12}, object, (device-cuda, sm_70)
-// NEW-DRIVER-NEXT: 14: offload, "device-cuda (nvptx64-nvidia-cuda:sm_70)" 
{13}, object
-// NEW-DRIVER-NEXT: 15: clang-offload-packager, {8, 14}, image
-// NEW-DRIVER-NEXT: 16: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, 
"device-cuda (powerpc64le-ibm-linux-gnu)" {15}, ir
+// NEW-DRIVER-NEXT: 14: offload, "device-cuda (nvptx64-nvidia-cuda:sm_70)" 
{13}, "device-cuda (nvptx64-nvidia-cuda:sm_70)" {12}, object
+// NEW-DRIVER-NEXT: 15: linker, {8, 14}, cuda-fatbin, (device-cuda)
+// NEW-DRIVER-NEXT: 16: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, 
"device-cuda (nvptx64-nvidia-cuda)" {15}, ir
 // NEW-DRIVER-NEXT: 17: backend, {16}, assembler, (host-cuda)
 // NEW-DRIVER-NEXT: 18: assembler, {17}, object, (host-cuda)
 // NEW-DRIVER-NEXT: 19: clang-linker-wrapper, {18}, image, (host-cuda)
 
 // RUN: %clang -### --target=powerpc64le-ibm-linux-gnu -ccc-print-phases 
--offload-new-driver \
 // RUN:   --offload-arch=sm_52 --offload-arch=sm_70 %s %S/Inputs/empty.cpp 
2>&1 | FileCheck --check-prefix=NON-CUDA-INPUT %s
+
 //  NON-CUDA-INPUT: 0: input, "[[CUDA:.+]]", cuda, (host-cuda)
 // NON-CUDA-INPUT-NEXT: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
 // NON-CUDA-INPUT-NEXT: 2: compiler, {1}, ir, (host-cuda)
@@ -277,13 +278,13 @@
 // NON-CUDA-INPUT-NEXT: 5: compiler, {4}, ir, (device-cuda, sm_52)
 // NON-CUDA-INPUT-NEXT: 6: bac

[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)

2024-03-07 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/84367

Summary:
The old driver embed PTX in rdc-mode and so does the `nvcc` compiler.
The new drivers currently does not do this, so we should keep it
consistent in this case. This simply requires adding the assembler
output as an input to the offloading action that gets fed to fatbin.


>From fdd40f727525b0d61382af51dd5d59d9b4d7999c Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 7 Mar 2024 13:44:50 -0600
Subject: [PATCH] [CUDA] Include PTX in non-RDC mode using the new driver

Summary:
The old driver embed PTX in rdc-mode and so does the `nvcc` compiler.
The new drivers currently does not do this, so we should keep it
consistent in this case. This simply requires adding the assembler
output as an input to the offloading action that gets fed to fatbin.
---
 clang/lib/Driver/Driver.cpp  |  8 
 clang/test/Driver/cuda-phases.cu | 25 +
 2 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index cecd34acbc92c0..96e6ad77f5e50d 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4625,7 +4625,15 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
   DDeps.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
   OffloadAction::DeviceDependences DDep;
   DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
+
+  // Compiling CUDA in non-RDC mode uses the PTX output if available.
+  for (Action *Input : A->getInputs())
+if (Kind == Action::OFK_Cuda && A->getType() == types::TY_Object &&
+!Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
+  false))
+  DDep.add(*Input, *TCAndArch->first, TCAndArch->second.data(), Kind);
   OffloadActions.push_back(C.MakeAction(DDep, 
A->getType()));
+
   ++TCAndArch;
 }
   }
diff --git a/clang/test/Driver/cuda-phases.cu b/clang/test/Driver/cuda-phases.cu
index 9a231091de2bdc..a1c3c9b51b1e41 100644
--- a/clang/test/Driver/cuda-phases.cu
+++ b/clang/test/Driver/cuda-phases.cu
@@ -244,31 +244,32 @@
 // NEW-DRIVER-RDC-NEXT: 18: assembler, {17}, object, (host-cuda)
 // NEW-DRIVER-RDC-NEXT: 19: clang-linker-wrapper, {18}, image, (host-cuda)
 
-// RUN: %clang -### -target powerpc64le-ibm-linux-gnu -ccc-print-phases 
--offload-new-driver -fgpu-rdc \
+// RUN: %clang -### -target powerpc64le-ibm-linux-gnu -ccc-print-phases 
--offload-new-driver \
 // RUN:   --offload-arch=sm_52 --offload-arch=sm_70 %s 2>&1 | FileCheck 
--check-prefix=NEW-DRIVER %s
-//  NEW-DRIVER: 0: input, "[[INPUT:.+]]", cuda
-// NEW-DRIVER-NEXT: 1: preprocessor, {0}, cuda-cpp-output
-// NEW-DRIVER-NEXT: 2: compiler, {1}, ir
-// NEW-DRIVER-NEXT: 3: input, "[[INPUT]]", cuda, (device-cuda, sm_52)
+//  NEW-DRIVER: 0: input, "[[CUDA:.+]]", cuda, (host-cuda)
+// NEW-DRIVER-NEXT: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
+// NEW-DRIVER-NEXT: 2: compiler, {1}, ir, (host-cuda)
+// NEW-DRIVER-NEXT: 3: input, "[[CUDA]]", cuda, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 5: compiler, {4}, ir, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 6: backend, {5}, assembler, (device-cuda, sm_52)
 // NEW-DRIVER-NEXT: 7: assembler, {6}, object, (device-cuda, sm_52)
-// NEW-DRIVER-NEXT: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_52)" {7}, 
object
-// NEW-DRIVER-NEXT: 9: input, "[[INPUT]]", cuda, (device-cuda, sm_70)
+// NEW-DRIVER-NEXT: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_52)" {7}, 
"device-cuda (nvptx64-nvidia-cuda:sm_52)" {6}, object
+// NEW-DRIVER-NEXT: 9: input, "[[CUDA]]", cuda, (device-cuda, sm_70)
 // NEW-DRIVER-NEXT: 10: preprocessor, {9}, cuda-cpp-output, (device-cuda, 
sm_70)
 // NEW-DRIVER-NEXT: 11: compiler, {10}, ir, (device-cuda, sm_70)
 // NEW-DRIVER-NEXT: 12: backend, {11}, assembler, (device-cuda, sm_70)
 // NEW-DRIVER-NEXT: 13: assembler, {12}, object, (device-cuda, sm_70)
-// NEW-DRIVER-NEXT: 14: offload, "device-cuda (nvptx64-nvidia-cuda:sm_70)" 
{13}, object
-// NEW-DRIVER-NEXT: 15: clang-offload-packager, {8, 14}, image
-// NEW-DRIVER-NEXT: 16: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, 
"device-cuda (powerpc64le-ibm-linux-gnu)" {15}, ir
+// NEW-DRIVER-NEXT: 14: offload, "device-cuda (nvptx64-nvidia-cuda:sm_70)" 
{13}, "device-cuda (nvptx64-nvidia-cuda:sm_70)" {12}, object
+// NEW-DRIVER-NEXT: 15: linker, {8, 14}, cuda-fatbin, (device-cuda)
+// NEW-DRIVER-NEXT: 16: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, 
"device-cuda (nvptx64-nvidia-cuda)" {15}, ir
 // NEW-DRIVER-NEXT: 17: backend, {16}, assembler, (host-cuda)
 // NEW-DRIVER-NEXT: 18: assembler, {17}, object, (host-cuda)
 // NEW-DRIVER-NEXT: 19: clang-linker-wrapper, {18}, image, (host-cuda)
 
 // RUN: %clang -### --target=powerpc64le-ibm-linux-gnu -ccc-print-phases 
--offload-new-driver \
 // RUN:   --offload