[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-24 Thread Joseph Huber via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGd50dacd7c3c2: [Clang] Only emit textual LLVM-IR in device 
only mode (authored by jhuber6).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141717/new/

https://reviews.llvm.org/D141717

Files:
  clang/lib/Driver/Driver.cpp
  clang/test/Driver/openmp-offload-gpu.c


Index: clang/test/Driver/openmp-offload-gpu.c
===
--- clang/test/Driver/openmp-offload-gpu.c
+++ clang/test/Driver/openmp-offload-gpu.c
@@ -304,9 +304,16 @@
 // CHECK-IR: "x86_64-unknown-linux-gnu" - "clang", inputs: 
["[[INPUT_IR:.+]]"], output: "[[OBJECT:.+]]"
 // CHECK-IR: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: 
["[[OBJECT]]"], output: "a.out"
 
-// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S 
-fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda 
-Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 | FileCheck 
%s --check-prefix=CHECK-EMIT-LLVM-IR
+// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S 
-fopenmp=libomp --offload-device-only \
+// RUN: -fopenmp-targets=nvptx64-nvidia-cuda 
-Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 \
+// RUN:   | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR
 // CHECK-EMIT-LLVM-IR: "-cc1"{{.*}}"-triple" 
"nvptx64-nvidia-cuda"{{.*}}"-emit-llvm"
 
+// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S 
-fopenmp=libomp \
+// RUN: -fopenmp-targets=nvptx64-nvidia-cuda 
-Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 \
+// RUN:   | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR-BC
+// CHECK-EMIT-LLVM-IR-BC: "-cc1"{{.*}}"-triple" 
"nvptx64-nvidia-cuda"{{.*}}"-emit-llvm-bc"
+
 // RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda 
-Xopenmp-target=nvptx64-nvida-cuda -march=sm_70 \
 // RUN:  
--libomptarget-nvptx-bc-path=%S/Inputs/libomptarget/libomptarget-new-nvptx-test.bc
 \
 // RUN:  -nogpulib %s -o openmp-offload-gpu 2>&1 \
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -4634,7 +4634,14 @@
false) ||
   TargetDeviceOffloadKind == Action::OFK_OpenMP))) {
   types::ID Output =
-  Args.hasArg(options::OPT_S) ? types::TY_LLVM_IR : types::TY_LLVM_BC;
+  Args.hasArg(options::OPT_S) &&
+  (TargetDeviceOffloadKind == Action::OFK_None ||
+   offloadDeviceOnly() ||
+   (TargetDeviceOffloadKind == Action::OFK_HIP &&
+!Args.hasFlag(options::OPT_offload_new_driver,
+  options::OPT_no_offload_new_driver, false)))
+  ? types::TY_LLVM_IR
+  : types::TY_LLVM_BC;
   return C.MakeAction(Input, Output);
 }
 return C.MakeAction(Input, types::TY_PP_Asm);


Index: clang/test/Driver/openmp-offload-gpu.c
===
--- clang/test/Driver/openmp-offload-gpu.c
+++ clang/test/Driver/openmp-offload-gpu.c
@@ -304,9 +304,16 @@
 // CHECK-IR: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT_IR:.+]]"], output: "[[OBJECT:.+]]"
 // CHECK-IR: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[OBJECT]]"], output: "a.out"
 
-// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR
+// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp=libomp --offload-device-only \
+// RUN: -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 \
+// RUN:   | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR
 // CHECK-EMIT-LLVM-IR: "-cc1"{{.*}}"-triple" "nvptx64-nvidia-cuda"{{.*}}"-emit-llvm"
 
+// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp=libomp \
+// RUN: -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 \
+// RUN:   | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR-BC
+// CHECK-EMIT-LLVM-IR-BC: "-cc1"{{.*}}"-triple" "nvptx64-nvidia-cuda"{{.*}}"-emit-llvm-bc"
+
 // RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvida-cuda -march=sm_70 \
 // RUN:  --libomptarget-nvptx-bc-path=%S/Inputs/libomptarget/libomptarget-new-nvptx-test.bc \
 // RUN:  -nogpulib %s -o openmp-offload-gpu 2>&1 \
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -4634,7 

[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision.
tra added a comment.
This revision is now accepted and ready to land.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141717/new/

https://reviews.llvm.org/D141717

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-24 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 491866.
jhuber6 added a comment.

Exempting HIP using the old driver.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141717/new/

https://reviews.llvm.org/D141717

Files:
  clang/lib/Driver/Driver.cpp
  clang/test/Driver/openmp-offload-gpu.c


Index: clang/test/Driver/openmp-offload-gpu.c
===
--- clang/test/Driver/openmp-offload-gpu.c
+++ clang/test/Driver/openmp-offload-gpu.c
@@ -304,9 +304,16 @@
 // CHECK-IR: "x86_64-unknown-linux-gnu" - "clang", inputs: 
["[[INPUT_IR:.+]]"], output: "[[OBJECT:.+]]"
 // CHECK-IR: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: 
["[[OBJECT]]"], output: "a.out"
 
-// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S 
-fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda 
-Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 | FileCheck 
%s --check-prefix=CHECK-EMIT-LLVM-IR
+// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S 
-fopenmp=libomp --offload-device-only \
+// RUN: -fopenmp-targets=nvptx64-nvidia-cuda 
-Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 \
+// RUN:   | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR
 // CHECK-EMIT-LLVM-IR: "-cc1"{{.*}}"-triple" 
"nvptx64-nvidia-cuda"{{.*}}"-emit-llvm"
 
+// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S 
-fopenmp=libomp \
+// RUN: -fopenmp-targets=nvptx64-nvidia-cuda 
-Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 \
+// RUN:   | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR-BC
+// CHECK-EMIT-LLVM-IR-BC: "-cc1"{{.*}}"-triple" 
"nvptx64-nvidia-cuda"{{.*}}"-emit-llvm-bc"
+
 // RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda 
-Xopenmp-target=nvptx64-nvida-cuda -march=sm_70 \
 // RUN:  
--libomptarget-nvptx-bc-path=%S/Inputs/libomptarget/libomptarget-new-nvptx-test.bc
 \
 // RUN:  -nogpulib %s -o openmp-offload-gpu 2>&1 \
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -4634,7 +4634,14 @@
false) ||
   TargetDeviceOffloadKind == Action::OFK_OpenMP))) {
   types::ID Output =
-  Args.hasArg(options::OPT_S) ? types::TY_LLVM_IR : types::TY_LLVM_BC;
+  Args.hasArg(options::OPT_S) &&
+  (TargetDeviceOffloadKind == Action::OFK_None ||
+   offloadDeviceOnly() ||
+   (TargetDeviceOffloadKind == Action::OFK_HIP &&
+!Args.hasFlag(options::OPT_offload_new_driver,
+  options::OPT_no_offload_new_driver, false)))
+  ? types::TY_LLVM_IR
+  : types::TY_LLVM_BC;
   return C.MakeAction(Input, Output);
 }
 return C.MakeAction(Input, types::TY_PP_Asm);


Index: clang/test/Driver/openmp-offload-gpu.c
===
--- clang/test/Driver/openmp-offload-gpu.c
+++ clang/test/Driver/openmp-offload-gpu.c
@@ -304,9 +304,16 @@
 // CHECK-IR: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT_IR:.+]]"], output: "[[OBJECT:.+]]"
 // CHECK-IR: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[OBJECT]]"], output: "a.out"
 
-// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR
+// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp=libomp --offload-device-only \
+// RUN: -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 \
+// RUN:   | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR
 // CHECK-EMIT-LLVM-IR: "-cc1"{{.*}}"-triple" "nvptx64-nvidia-cuda"{{.*}}"-emit-llvm"
 
+// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp=libomp \
+// RUN: -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 \
+// RUN:   | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR-BC
+// CHECK-EMIT-LLVM-IR-BC: "-cc1"{{.*}}"-triple" "nvptx64-nvidia-cuda"{{.*}}"-emit-llvm-bc"
+
 // RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvida-cuda -march=sm_70 \
 // RUN:  --libomptarget-nvptx-bc-path=%S/Inputs/libomptarget/libomptarget-new-nvptx-test.bc \
 // RUN:  -nogpulib %s -o openmp-offload-gpu 2>&1 \
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -4634,7 +4634,14 @@
false) ||
   TargetDeviceOffloadKind == Action::OFK_OpenMP))) {
   types::ID Output =
- 

[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-24 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D141717#4074842 , @yaxunl wrote:

> Can we keep the original behaviour for the old driver for HIP? Only enable 
> the change for the new driver.

That's probably fair because AFAIK that will still use the 
`clang-offload-bundler` which shows both of them at the same time side-by-side. 
I'll need to check the behavior there.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141717/new/

https://reviews.llvm.org/D141717

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-23 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

Can we keep the original behaviour for the older driver for HIP? Only enable 
the change for the new driver.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141717/new/

https://reviews.llvm.org/D141717

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-23 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert added a comment.

In D141717#4056971 , @yaxunl wrote:

> The intention of -emit-llvm -S is usually to get LLVM assembly for all 
> targets for inspection or modification. HIP emits a bundled LLVM assembly in 
> textual format in this case. Users can modify it directly, or extract 
> assembly for each device and bundle them together again. The modified file 
> can then be passed on to the next compilation stage.

If you already unboundle, manually or automatically, you can also run llvm-dis. 
Embedded text is really not that helpful, IMHO.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141717/new/

https://reviews.llvm.org/D141717

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-23 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

Ping, is it possible to resolve this before the fork tomorrow?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141717/new/

https://reviews.llvm.org/D141717

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-16 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D141717#4056971 , @yaxunl wrote:

> The intention of -emit-llvm -S is usually to get LLVM assembly for all 
> targets for inspection or modification. HIP emits a bundled LLVM assembly in 
> textual format in this case. Users can modify it directly, or extract 
> assembly for each device and bundle them together again. The modified file 
> can then be passed on to the next compilation stage.

Yeah, this is part of where I'm questioning how we want to treat the 
compilation pipeline when we perform offloading. When I set up the new driver 
support I decided to eschew the idea of perfectly bundled phases and instead 
decided that if users did not want the full pipeline they should use 
`--offload-device-only` and built it for themselves. In that vein I'd prefer it 
to be where we always generate a full (To Assembler) phase and avoid textual 
formats since it's supposed to be embedded as a binary blob.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141717/new/

https://reviews.llvm.org/D141717

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-16 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

The intention of -emit-llvm -S is usually to get LLVM assembly for all targets 
for inspection or modification. HIP emits a bundled LLVM assembly in textual 
format in this case. Users can modify it directly, or extract assembly for each 
device and bundle them together again. The modified file can then be passed on 
to the next compilation stage.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141717/new/

https://reviews.llvm.org/D141717

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-16 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

I made the phases always go to `Assemble` but it didn't make a difference. We 
still get the textual IR here without the exception I added.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141717/new/

https://reviews.llvm.org/D141717

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-13 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D141717#4053164 , @tra wrote:

>> So are you suggesting that we complete the whole pipeline? So -S -emit-llvm 
>> gives host IR, but the device will go all the way to object?
>
> That would match my expectations and would solve the " it can't be handled by 
> LTO or anything else." issue you've highlighted above. It will give us a 
> valid output in the form user specified. I.e. you will be able to pass the IR 
> further down the compilation pipeline. I.e. if we get it for the host 
> compilation it should be the same IR we'd have captured if we we'd get it 
> from `-save-temps`.

Might work, I'll see how difficult it is to override the phases.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141717/new/

https://reviews.llvm.org/D141717

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

> So are you suggesting that we complete the whole pipeline? So -S -emit-llvm 
> gives host IR, but the device will go all the way to object?

That would match my expectations and would solve the " it can't be handled by 
LTO or anything else." issue you've highlighted above. It will give us a valid 
output in the form user specified. I.e. you will be able to pass the IR further 
down the compilation pipeline. I.e. if we get it for the host compilation it 
should be the same IR we'd have captured if we we'd get it from `-save-temps`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141717/new/

https://reviews.llvm.org/D141717

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-13 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D141717#4052986 , @tra wrote:

> In D141717#4052824 , @jhuber6 wrote:
>
>> For `-E` we don't embed anything,
>
> That was just an exaggerated example of top-level options affecting 
> sub-compilation output where you can't magically tweak it to produce the kind 
> of output your sub-compilation needs.
>
> The fundamental problem I have is that compiler should not magically fix what 
> the user has specified.  "-S -emit-llvm" has very specific meaning and it's 
> not "produce binary IR".  However, when we're dealing with 'interesting' 
> compilation pipelines like CUDA, things get complicated, as in your example. 
> Presumably the user wants the compiler to produce the textual IR for the 
> "top-level" compilation. In the case of CUDA it would be host-side IR (with 
> embedded GPU binary, which should still be a *binary*). The fact that in your 
> case that binary happens to be binary LLVM IR is an implementation detail. 
> Without the new driver it should be the real GPU fatbin. Hence my argument 
> that what we want is to avoid passing "-S -emit-llvm" (and potentially other 
> output-altering options) to the GPU sub-compilation, which would work 
> regardless of whether the GPU binary is a fatbin or LLVM bitcode.
> For the compilation with --cuda-device-only, GPU compilation is the top-level 
> compilation and it will get all the right options to make it produce textual 
> IR and stop after that.
>
> Does it make sense?

So are you suggesting that we complete the whole pipeline? So `-S -emit-llvm` 
gives host IR, but the device will go all the way to object?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141717/new/

https://reviews.llvm.org/D141717

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

In D141717#4052824 , @jhuber6 wrote:

> For `-E` we don't embed anything,

That was just an exaggerated example of top-level options affecting 
sub-compilation output where you can't magically tweak it to produce the kind 
of output your sub-compilation needs.

The fundamental problem I have is that compiler should not magically fix what 
the user has specified.  "-S -emit-llvm" has very specific meaning and it's not 
"produce binary IR".  However, when we're dealing with 'interesting' 
compilation pipelines like CUDA, things get complicated, as in your example. 
Presumably the user wants the compiler to produce the textual IR for the 
"top-level" compilation. In the case of CUDA it would be host-side IR (with 
embedded GPU binary, which should still be a *binary*). The fact that in your 
case that binary happens to be binary LLVM IR is an implementation detail. 
Without the new driver it should be the real GPU fatbin. Hence my argument that 
what we want is to avoid passing "-S -emit-llvm" (and potentially other 
output-altering options) to the GPU sub-compilation, which would work 
regardless of whether the GPU binary is a fatbin or LLVM bitcode.
For the compilation with --cuda-device-only, GPU compilation is the top-level 
compilation and it will get all the right options to make it produce textual IR 
and stop after that.

Does it make sense?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141717/new/

https://reviews.llvm.org/D141717

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-13 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D141717#4052753 , @tra wrote:

> In D141717#4052587 , @jhuber6 wrote:
>
>> Well you'll get textual output for the host output, but the device code 
>> embedded in the host module will be bitcode instead. So the final output 
>> from the compiler is still textual IR. It just won't be some weird global 
>> like this
>>
>> This is bad because it can't be handled by LTO or anything else. It makes 
>> the resulting IR file difficult to use for its intended purpose.
>
> I understand your use case and I do agree that you do need the embedded IR to 
> be binary.
>
> It appears to me that the real problem here is that "-S"  should not have 
> been propagated to the GPU sub-compilation.
>
> The fact that producing binary IR works for you seems to be just a 
> coincidence. What if the user would specify "-E" instead? I think the right 
> fix would be to tell compiler to generate the right kind of output, not try 
> to tell it to produce textual assembly and undo the "textual" part somewhere 
> downstream.

For `-E` we don't embed anything, so I think we just ignore the device portion 
unless they used `--offload-device-only`. And I don't think we have access to 
any "per-toolchain" options at this point in the compilation phase.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141717/new/

https://reviews.llvm.org/D141717

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

In D141717#4052587 , @jhuber6 wrote:

> Well you'll get textual output for the host output, but the device code 
> embedded in the host module will be bitcode instead. So the final output from 
> the compiler is still textual IR. It just won't be some weird global like this
>
> This is bad because it can't be handled by LTO or anything else. It makes the 
> resulting IR file difficult to use for its intended purpose.

I understand your use case and I do agree that you do need the embedded IR to 
be binary.

It appears to me that the real problem here is that "-S"  should not have been 
propagated to the GPU sub-compilation.

The fact that producing binary IR works for you seems to be just a coincidence. 
What if the user would specify "-E" instead? I think the right fix would be to 
tell compiler to generate the right kind of output, not try to tell it to 
produce textual assembly and undo the "textual" part somewhere downstream.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141717/new/

https://reviews.llvm.org/D141717

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-13 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D141717#4052514 , @tra wrote:

> Textual output for "-S -emit-llvm" is the canonical behavior, so I would 
> prefer it working that way in as many cases as possible and only override it 
> when necessary.
>
> Would it be possible to enforce binary IR generation in cases you need it? Or 
> to prove that this is equivalent to what the patch does now?

Well you'll get textual output for the host section, but the device code 
embedded in the host module will be bitcode instead. So the final output from 
the compiler is still textual IR. It just won't be some weird global like this

  @llvm.embedded.object = private constant [138032 x i8] 
c"\10\FF\10\AD\01\00\00\000\1B\02\00\00\00\00\00 
\00\00\00\00\00\00\00(\00\00\00\00\00\00\00\02\00\01\00\00\00\00\00H\00\00\00\00\00\00\00\02\00\00\00\0
  
0\00\00\00\90\00\00\00\00\00\00\00\9D\1A\02\00\00\00\00\00n\00\00\00\00\00\00\00u\00\00\00\00\00\00\00i\00\00\00\00\00\00\00\87\00\00\00\00\00\00\00\00arch\00triple\00amdgcn-amd-amdhsa\00gfx90a\00\00\00;
 Mod  uleID = 'tl2.c'\0Asource_filename = \22tl2.c\22\0Atarget datalayout = 
\22e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-
  v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7\22\0Atarget triple = 
\22amdgcn-amd-amdhsa\22\0A\0A%struct.ident_t = type { i32, i32, i32, i32, ptr 
}\0A%struct.DeviceEnvironmentTy = type { i32, i32, i32, i32 }\0A  
%\22struct.ompx::state::TeamStateTy\22 = type { 
%\22struct.ompx::state::ICVStateTy\22, i32, i32, ptr 
}\0A%\22struct.ompx::state::ICVStateTy\22 = type { i32, i32, i32, i32, i32, i32 
}\0A%\22struct.(anonymous   namespace)::SharedMemorySmartStackTy\22 = type { 
[512 x i8], [1024 x i8] }\0A%\22struct.ompx::state::ThreadStateTy\22 = type { 
%\22struct.ompx::state::ICVStateTy\22, ptr 
}\0A\0A@__omp_rtl_assume_teams_oversu  bscription = weak_odr hidden 
addrspace(1) constant i32 0\0A@__omp_rtl_assume_threads_oversubscription = 
weak_odr hidden addrspace(1) constant i32 0\0A@0 = private unnamed_addr 
constant [23 x i8] c\22;unknown  ;unknown;0;0;;\\00\22, align 1\0A@1 = private 
unnamed_addr addrspace(1) constant %struct.ident_t { i32 0, i32 2, i32 0, i32 
22, ptr @0 }, align 8\0A@__omp_offloading_16_5d6227e_thread_limit_l2_exec_mode 
= we  ak protected addrspace(1) constant i8 
1\0A@__omp_offloading_16_5d6227e_thread_limit_l5_exec_mode = weak protected 
addrspace(1) constant i8 
1\0A@__omp_offloading_16_5d6227e_thread_limit_l8_exec_mode = weak pr  otected 
addrspace(1) constant i8 1\0...@llvm.compiler.used = appending addrspace(1) 
global [3 x ptr] [ptr addrspacecast (ptr addrspace(1) 
@__omp_offloading_16_5d6227e_thread_limit_l2_exec_mode to ptr), ptr add  
rspacecast (ptr addrspace(1) @__omp_offloading_16_5d6227e_

This is bad because it can't be handled by LTO or anything else. It makes the 
resulting IR file difficult to use for its intended purpose.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141717/new/

https://reviews.llvm.org/D141717

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

Textual output for "-S -emit-llvm" is the canonical behavior, so I would prefer 
it working that way in as many cases as possible and only override it when 
necessary.

Would it be possible to enforce binary IR generation in cases you need it? Or 
to prove that this is equivalent to what the patch does now?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141717/new/

https://reviews.llvm.org/D141717

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141717: [Clang] Only emit textual LLVM-IR in device only mode

2023-01-13 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: jdoerfert, tianshilei1992, tra, yaxunl, 
JonChesterfield.
Herald added a project: All.
jhuber6 requested review of this revision.
Herald added subscribers: cfe-commits, sstefan1, MaskRay.
Herald added a project: clang.

Currently, we embed device code into the host to perform
multi-architecture linking and handling of device code. If the user
specified `-S -emit-llvm` then the embedded output will be textual
LLVM-IR. This is a problem because it can't be used by the LTO backend
and it makes reading the file confusing.

This patch changes the behaviour to only emit textual device IR if we
are in device only mode, that is, if the device code is presented
directly to the user instead of being embedded. Otherwise we should
always embed device bitcode instead.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D141717

Files:
  clang/lib/Driver/Driver.cpp
  clang/test/Driver/openmp-offload-gpu.c


Index: clang/test/Driver/openmp-offload-gpu.c
===
--- clang/test/Driver/openmp-offload-gpu.c
+++ clang/test/Driver/openmp-offload-gpu.c
@@ -304,9 +304,16 @@
 // CHECK-IR: "x86_64-unknown-linux-gnu" - "clang", inputs: 
["[[INPUT_IR:.+]]"], output: "[[OBJECT:.+]]"
 // CHECK-IR: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: 
["[[OBJECT]]"], output: "a.out"
 
-// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S 
-fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda 
-Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 | FileCheck 
%s --check-prefix=CHECK-EMIT-LLVM-IR
+// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S 
-fopenmp=libomp --offload-device-only \
+// RUN: -fopenmp-targets=nvptx64-nvidia-cuda 
-Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 \
+// RUN:   | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR
 // CHECK-EMIT-LLVM-IR: "-cc1"{{.*}}"-triple" 
"nvptx64-nvidia-cuda"{{.*}}"-emit-llvm"
 
+// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S 
-fopenmp=libomp \
+// RUN: -fopenmp-targets=nvptx64-nvidia-cuda 
-Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 \
+// RUN:   | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR-BC
+// CHECK-EMIT-LLVM-IR-BC: "-cc1"{{.*}}"-triple" 
"nvptx64-nvidia-cuda"{{.*}}"-emit-llvm-bc"
+
 // RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda 
-Xopenmp-target=nvptx64-nvida-cuda -march=sm_70 \
 // RUN:  
--libomptarget-nvptx-bc-path=%S/Inputs/libomptarget/libomptarget-new-nvptx-test.bc
 \
 // RUN:  -nogpulib %s -o openmp-offload-gpu 2>&1 \
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -4633,8 +4633,11 @@
  (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
false) ||
   TargetDeviceOffloadKind == Action::OFK_OpenMP))) {
-  types::ID Output =
-  Args.hasArg(options::OPT_S) ? types::TY_LLVM_IR : types::TY_LLVM_BC;
+  types::ID Output = Args.hasArg(options::OPT_S) &&
+ (TargetDeviceOffloadKind == Action::OFK_None 
||
+  offloadDeviceOnly())
+ ? types::TY_LLVM_IR
+ : types::TY_LLVM_BC;
   return C.MakeAction(Input, Output);
 }
 return C.MakeAction(Input, types::TY_PP_Asm);


Index: clang/test/Driver/openmp-offload-gpu.c
===
--- clang/test/Driver/openmp-offload-gpu.c
+++ clang/test/Driver/openmp-offload-gpu.c
@@ -304,9 +304,16 @@
 // CHECK-IR: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT_IR:.+]]"], output: "[[OBJECT:.+]]"
 // CHECK-IR: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[OBJECT]]"], output: "a.out"
 
-// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR
+// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp=libomp --offload-device-only \
+// RUN: -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 \
+// RUN:   | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR
 // CHECK-EMIT-LLVM-IR: "-cc1"{{.*}}"-triple" "nvptx64-nvidia-cuda"{{.*}}"-emit-llvm"
 
+// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp=libomp \
+// RUN: -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 \
+// RUN:   | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR-BC
+// CHECK-EMIT-LLVM-IR-BC: "-cc1"{{.*}}"-triple" "nvptx64-nvidia-cuda"{{.*}}"-emit-llvm-bc"
+
 // RUN:   %clang -###