[PATCH] D60620: [HIP] Support -offloading-target-id

2019-04-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: arsenm.
tra added a comment.

@arsenm Matt, FYI, this patch seems to be a continuation of D59863 
 you've commented on.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60620: [HIP] Support -offloading-target-id

2019-04-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: echristo.
tra added a comment.

It looks like you are solving two problems here.
a) you want to create multiple device passes for the same GPU, but with 
different options.
b) you may want to pass different compiler options to different device 
compilations.
The patch effectively hard-codes {gpu, options} tuple into 
--offloading-target-id variant.
Is that correct?

This looks essentially the same as your previous patch D59863 
.

We have a limited way to deal with (b), but there's currently no way to deal 
with (a).

For (a), I think, the real problem is that until now we've assumed that there's 
only one device-side compilation per target GPU arch. If we need multiple 
device-side compilations, we need a way to name them.  Using 
`offloading-target-id` as  a super-set of `--cuda-gpu-arch` is OK with me. 
However, I'm on the fence about the option serving a double-duty of setting 
magic compiler flags. On one hand, that's what driver does, so it may be OK. On 
the other hand, it's unnecessarily strict. I.e. if we provide ability to create 
multiple compilation passes for the same GPU arch, why limit that to only 
changing those hard-coded options? A general approach would allow a way to 
create more than one device-side compilation and provide arbitrary compiler 
options only to *that* compilation. Thiw will also help solving number of 
issues we have right now when some host-side compilation options break 
device-side compilation and we have to work around that by filtering out some 
of them in the driver.

A while back @echristo and I have discussed how it could be handled in a more 
generic way.
IIRC we ended up with a strawman proposal that looked roughly like this:

Currently we have rudimentary -Xarch_smXX options implemented for various 
toolchains in the driver.
E.g. for HIP: 
https://github.com/llvm-mirror/clang/blob/master/lib/Driver/ToolChains/HIP.cpp#L341
We want to generalize it and make it less awkward to use. One way to do it 
would be to introduce a `-Xarch TARGET` flag where the option(s) following the 
flag would apply only to the compilation for that particular target. `TARGET` 
could have special values like `HOST` and `DEVICE` and `ALL` which would widen 
the option scope to host/device/all compilation. The `-Xarch` flag could be 
either sticky (all following options are affected by it, until the next -Xarch 
option) or only affect one option (the way -X options work now). Make option 
parser aware of the current compilation target, and it should be fairly 
straightforward to control compilation options for particular target.

We could add `--offloading-target-id=X` to create and name a device-side 
compilation and then use that name in `-Xarch X` or `-Xtarget X` to pass 
appropriate options.
`--cuda-gpu-arch=GPU` would be treated as `--offloading-target-id=GPU -mcpu 
GPU`. If we had something like that, then your goal could be achieved with 
something like this:

  ... --offloading-target-id=foo -Xtarget foo -mcpu gfx906 
  ... --offloading-target-id=bar -Xtarget bar -mcpu gfx906 -mxnack -msram-ecc

We could also provide target aliases for the 'standard' offloading targets, so 
users do not have to type *all* options specific to the target, but would still 
have a way to override them.

This would also give us a flexible way to avoid passing some host-only options 
to device-side compilation without having to hard code every special case.

That may be a somewhat larger chunk of work.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60620: [HIP] Support -offloading-target-id

2019-04-12 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl created this revision.
yaxunl added reviewers: tra, b-sumner, ashi1, scchan, t-tye.
Herald added a subscriber: mgorny.

This patch introduces a new option -offloading-target-id for HIP.

Offloading target id is a generalization of CUDA/HIP GPU arch.
It is a device name plus optional feature strings delimited by
plus sign, e.g. gfx906+xnack+sram-ecc. GPU arch is the degenerated
case of offloading target id where there is no feature string. For
each device name, there is a limited number of predefined feature
strings which are allowed to show up in offloading target it. When
feature strings show up in offloading target id, they must follow
predefined order. Therefore offloading target id is a unique id
to convey device name and enabled target feature.

For each offloading target id, a device compilation will be performed
by the driver. If the device compilation results in a device object,
the offloading target id is used in the fat binary to uniquely identify
the device object. This is to allow runtime to load the device
object suitable for the device configuration.

This patches changes HIP action builder to handle -offloading-target-id
option. It generalizes GPUArchList in CUDA/HIP action builder so that
it can handle both CUDA GPU arch and HIP offloading target id. It changes
HIP toolchain to handle offloading target id as bound arch.

This patch is NFC for CUDA toolchain.


https://reviews.llvm.org/D60620

Files:
  include/clang/Basic/DiagnosticDriverKinds.td
  include/clang/Basic/HIP.h
  include/clang/Driver/Options.td
  lib/Basic/CMakeLists.txt
  lib/Basic/HIP.cpp
  lib/Driver/Driver.cpp
  lib/Driver/ToolChains/HIP.cpp
  test/Driver/hip-invalid-offloading-target-id.hip
  test/Driver/hip-offloading-target-id.hip

Index: test/Driver/hip-offloading-target-id.hip
===
--- /dev/null
+++ test/Driver/hip-offloading-target-id.hip
@@ -0,0 +1,55 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN:   -x hip --offloading-target-id=gfx906 \
+// RUN:   --offloading-target-id=gfx906+xnack \
+// RUN:   --offloading-target-id=gfx906+xnack+sram-ecc \
+// RUN:   --hip-device-lib-path=%S/Inputs/hip_dev_lib \
+// RUN:   %s 2>&1 | FileCheck %s
+
+// CHECK: [[CLANG:".*clang.*"]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
+// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
+// CHECK-SAME: "-emit-llvm-bc"
+// CHECK-SAME: {{.*}} "-target-cpu" "gfx906" "-mno-xnack" "-mno-sram-ecc"
+
+// CHECK: [[OPT:".*opt"]] {{".*-gfx906-linked.*bc"}} "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: "-mcpu=gfx906"
+// CHECK-SAME: "-o" [[OPT_906_BC:".*-gfx906-optimized.*bc"]]
+
+// CHECK: [[LLC: ".*llc"]] [[OPT_906_BC]]
+// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa" "-filetype=obj"
+// CHECK-SAME: {{.*}} "-mcpu=gfx906"
+// CHECK-SAME: "-o" {{".*-gfx906-.*o"}}
+
+// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
+// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
+// CHECK-SAME: "-emit-llvm-bc"
+// CHECK-SAME: {{.*}} "-target-cpu" "gfx906" "-mxnack" "-mno-sram-ecc"
+
+// CHECK: [[OPT]] {{".*-gfx906\+xnack.*bc"}} "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: "-mcpu=gfx906"
+// CHECK-SAME: "-o" [[OPT_906XN_BC:".*-gfx906\+xnack.*bc"]]
+
+// CHECK: [[LLC]] [[OPT_906XN_BC]]
+// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa" "-filetype=obj"
+// CHECK-SAME: {{.*}} "-mcpu=gfx906"
+// CHECK-SAME: "-o" {{".*-gfx906\+xnack.*o"}}
+
+// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
+// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
+// CHECK-SAME: "-emit-llvm-bc"
+// CHECK-SAME: {{.*}} "-target-cpu" "gfx906" "-mxnack" "-msram-ecc"
+
+// CHECK: [[OPT]] {{".*-gfx906\+xnack\+sram-ecc.*bc"}} "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: "-mcpu=gfx906"
+// CHECK-SAME: "-o" [[OPT_906XE_BC:".*-gfx906\+xnack\+sram-ecc.*bc"]]
+
+// CHECK: [[LLC]] [[OPT_906XE_BC]]
+// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa" "-filetype=obj"
+// CHECK-SAME: {{.*}} "-mcpu=gfx906"
+// CHECK-SAME: "-o" {{".*-gfx906\+xnack\+sram-ecc.*o"}}
+
+// CHECK: {{".*clang-offload-bundler"}}
+// CHECK-SAME: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa-gfx906,hip-amdgcn-amd-amdhsa-gfx906+xnack,hip-amdgcn-amd-amdhsa-gfx906+xnack+sram-ecc"
Index: test/Driver/hip-invalid-offloading-target-id.hip
===
--- /dev/null
+++ test/Driver/hip-invalid-offloading-target-id.hip
@@ -0,0 +1,48 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN:   -x hip --offloading-target-id=gfx906 \
+// RUN:   --offloading-target-id=gfx906xnack \
+// RUN:   --hip-device-lib-path=%S/Inputs/hip_dev_lib \
+// RUN:   %s 2>&1 | FileCheck -check-prefix=NOPLUS %s
+
+// NOPLUS: error: Invalid HIP offloading target id: gfx906xnack
+
+// RUN: %clang -### -target