[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-08-18 Thread Yaxun Liu via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG7546b29e7616: [HIP] Support target id by --offload-arch 
(authored by yaxunl).
Herald added a project: clang.

Changed prior to commit:
  https://reviews.llvm.org/D60620?vs=285482=286469#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/TargetID.h
  clang/include/clang/Basic/TargetInfo.h
  clang/include/clang/Driver/Compilation.h
  clang/include/clang/Driver/Options.td
  clang/lib/Basic/CMakeLists.txt
  clang/lib/Basic/TargetID.cpp
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/Basic/Targets/AMDGPU.h
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/AMDGPU.h
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/CommonArgs.cpp
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_isa_version_908.bc
  clang/test/Driver/amdgpu-features.c
  clang/test/Driver/amdgpu-macros.cl
  clang/test/Driver/amdgpu-mcpu.cl
  clang/test/Driver/hip-invalid-target-id.hip
  clang/test/Driver/hip-target-id.hip
  clang/test/Driver/hip-toolchain-features.hip
  clang/test/Driver/invalid-target-id.cl
  clang/test/Driver/target-id-macros.cl
  clang/test/Driver/target-id-macros.hip
  clang/test/Driver/target-id.cl
  llvm/include/llvm/Support/TargetParser.h
  llvm/lib/Support/TargetParser.cpp

Index: llvm/lib/Support/TargetParser.cpp
===
--- llvm/lib/Support/TargetParser.cpp
+++ llvm/lib/Support/TargetParser.cpp
@@ -83,26 +83,26 @@
   {{"mullins"},   {"gfx703"},  GK_GFX703,  FEATURE_NONE},
   {{"gfx704"},{"gfx704"},  GK_GFX704,  FEATURE_NONE},
   {{"bonaire"},   {"gfx704"},  GK_GFX704,  FEATURE_NONE},
-  {{"gfx801"},{"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"carrizo"},   {"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx802"},{"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32},
-  {{"iceland"},   {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32},
-  {{"tonga"}, {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32},
-  {{"gfx803"},{"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"fiji"},  {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"polaris10"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"polaris11"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"gfx810"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32},
-  {{"stoney"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32},
-  {{"gfx900"},{"gfx900"},  GK_GFX900,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx902"},{"gfx902"},  GK_GFX902,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx904"},{"gfx904"},  GK_GFX904,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx906"},{"gfx906"},  GK_GFX906,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx908"},{"gfx908"},  GK_GFX908,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx909"},{"gfx909"},  GK_GFX909,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx1010"},   {"gfx1010"}, GK_GFX1010, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
-  {{"gfx1011"},   {"gfx1011"}, GK_GFX1011, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
-  {{"gfx1012"},   {"gfx1012"}, GK_GFX1012, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
+  {{"gfx801"},{"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"carrizo"},   {"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx802"},{"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"iceland"},   {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"tonga"}, {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx803"},{"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"fiji"},  {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"polaris10"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"polaris11"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx810"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"stoney"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx900"},{"gfx900"},  GK_GFX900,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx902"},{"gfx902"},  GK_GFX902,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx904"},{"gfx904"},  GK_GFX904,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx906"},

[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-08-13 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl marked 3 inline comments as done.
yaxunl added a comment.

In D60620#2216796 , @tra wrote:

> Couple of minor nits. LGTM otherwise.

Will revise as suggested when committing. Thanks.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-08-13 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision.
tra added a comment.
This revision is now accepted and ready to land.

Couple of minor nits. LGTM otherwise.




Comment at: clang/include/clang/Basic/TargetID.h:42
+/// is not a null pointer.
+/// If \p CanonicalizeProc is true, canonicalize returned processor name.
+llvm::Optional

yaxunl wrote:
> tra wrote:
> > Comment needs updating as parameters and return value have changed.
> done
The comment still mentions `\p IsValid`.



Comment at: clang/include/clang/Basic/TargetID.h:56-58
+bool isValidTargetIDCombination(
+const std::set ,
+std::pair *ConflictingTIDs = nullptr);

yaxunl wrote:
> tra wrote:
> > Looks like a good candidate for using a std::optional return 
> > value.
> > 
> done
`hasConflictingTargetIDCombination()` ? Optional is convertible to bool and 
`has` better reflects the purpose of the function -- you want to know whether 
there's a conflict. What exactly conflicts is sort of secondary info, only used 
to provide additional details for diags.



Comment at: clang/lib/Basic/TargetID.cpp:161
+  if (llvm::any_of(Features, [&](auto ) {
+return ExistingFeatures.find(F.first()) == ExistingFeatures.end();
+  }))

Nit: `find(...) == end()` -> `count == 0` ? Makes it shorter and arguably 
easier to read.



Comment at: clang/lib/Driver/Driver.cpp:2795-2799
+  auto CTID = getConflictTargetIDCombination(GpuArchs);
+  if (!CTID)
+return true;
+  ConflictingTIDs = CTID.getValue();
+  return false;

Could be simplified a bit:
```
if (auto CTID = getConflictTargetIDCombination(GpuArchs)) {
  ConflictingTIDs = CTID.getValue();
  return false
}
return true;
```

Also, it does not seem to add any new functionality to 
getConflictTargetIDCombination(). Perhaps it would make sense to change the 
function signatures to match and just use `return 
getConflictTargetIDCombination()`.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-08-13 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/include/clang/Basic/TargetID.h:42
+/// is not a null pointer.
+/// If \p CanonicalizeProc is true, canonicalize returned processor name.
+llvm::Optional

tra wrote:
> Comment needs updating as parameters and return value have changed.
done



Comment at: clang/include/clang/Basic/TargetID.h:56-58
+bool isValidTargetIDCombination(
+const std::set ,
+std::pair *ConflictingTIDs = nullptr);

tra wrote:
> Looks like a good candidate for using a std::optional return value.
> 
done



Comment at: clang/lib/Basic/TargetID.cpp:161-169
+  for (auto & : Features) {
+if (ExistingFeatures.find(F.first()) == ExistingFeatures.end()) {
+  if (ConflictingTIDs) {
+ConflictingTIDs->first = Loc->second.TargetID;
+ConflictingTIDs->second = ID;
+  }
+  return false;

tra wrote:
> This could probably be expressed better with any_of():
> ```
> if (llvm::any_of(Features, [](auto ){
>return ExistingFeatures.count(F.first) == 0;
>   })
>   return {Loc->second.TargetID, ID};
> ```
> 
> The outer loop could also be transformed into a form of llvm::for_each or 
> llvm::any_of() with an inner lambda returning an optional tuple on conflict.
> 
will only change the inner loop since changing the outer loop makes the code 
more difficult to understand.



Comment at: clang/lib/Basic/Targets/AMDGPU.h:404
+  // pre-defined macros.
+  bool handleTargetFeatures(std::vector ,
+DiagnosticsEngine ) override {

tra wrote:
> We never return anything but true. Change return to void?
This is a target hook which allows target specific handling. Some targets may 
return false.



Comment at: clang/lib/Basic/Targets/AMDGPU.h:412-417
+  auto Loc =
+  std::find(TargetIDFeatures.begin(), TargetIDFeatures.end(), Name);
+  if (Loc == TargetIDFeatures.end())
+continue;
+  assert(OffloadArchFeatures.find(Name) == OffloadArchFeatures.end());
+  OffloadArchFeatures[Name] = IsOn;

tra wrote:
> Nit: for small-ish loops over ranges, I generally find that standard 
> functional-stile functions to be more expressive. 
> IMO, it's easier to read something like this:
> 
> ```
> llvm::for_each(Features, [](auto F){
> ...
>Name = ...
>if (llvm::any_of(TargetIDFeatures, [](N){ return N == Name; })) { // or 
> use llvm::find()
>   // update OffloadArchFeatures.
>}
> })
> ```
> 
> Again, it's a personal style choice. The function is OK as is, I'm just 
> flagging places where I had to think what the code does, where the code could 
> convey the intent in a more direct way.
done



Comment at: clang/lib/Driver/Driver.cpp:98-99
+static llvm::Triple getHIPOffloadTargetTriple() {
+  static const llvm::Triple T("amdgcn-amd-amdhsa");
+  return T;
+}

tra wrote:
> Why not just return llvmTriple("amdgcn-amd-amdhsa") ?
to avoid construct this multiple times and have multiple copies



Comment at: clang/lib/Driver/Driver.cpp:2403
+  /// Target ID string which is persistent throughout the compilation.
+  const char *ID;
+  TargetID(CudaArch Arch) { ID = CudaArchToString(Arch); }

tra wrote:
> just make it std::string. There's no point tinkering with pointers here.
> 
> Also, I'm not sure why the whole TargetID can't be just a std::string.
This is used by both CUDA and HIP. For CUDA it is the GPU arch string, for HIP 
it is target ID. The const char* passed to the ctor is persistent through the 
whole compilation already. And their usage expect them to be persistent across 
the whole compilation. Changing this to std::string make it not persist across 
the whole compilation since it is a member of ActionBuilder.



Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:543-545
+auto Pos = FeatureMap.find(Feature);
+if (Pos == FeatureMap.end())
+  continue;

tra wrote:
> `FeatureMap.count() == 0` ? 
we need to use Pos below


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-08-13 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 285482.
yaxunl marked 13 inline comments as done.
yaxunl added a comment.

revised by Artem's comments, with minor bug fixes.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/TargetID.h
  clang/include/clang/Basic/TargetInfo.h
  clang/include/clang/Driver/Compilation.h
  clang/include/clang/Driver/Options.td
  clang/lib/Basic/CMakeLists.txt
  clang/lib/Basic/TargetID.cpp
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/Basic/Targets/AMDGPU.h
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/AMDGPU.h
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/CommonArgs.cpp
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_isa_version_908.bc
  clang/test/Driver/amdgpu-features.c
  clang/test/Driver/amdgpu-macros.cl
  clang/test/Driver/amdgpu-mcpu.cl
  clang/test/Driver/hip-invalid-target-id.hip
  clang/test/Driver/hip-target-id.hip
  clang/test/Driver/hip-toolchain-features.hip
  clang/test/Driver/hip-toolchain-no-rdc.hip
  clang/test/Driver/hip-toolchain-rdc-separate.hip
  clang/test/Driver/hip-toolchain-rdc-static-lib.hip
  clang/test/Driver/hip-toolchain-rdc.hip
  clang/test/Driver/invalid-target-id.cl
  clang/test/Driver/target-id-macros.cl
  clang/test/Driver/target-id-macros.hip
  clang/test/Driver/target-id.cl
  llvm/include/llvm/Support/TargetParser.h
  llvm/lib/Support/TargetParser.cpp

Index: llvm/lib/Support/TargetParser.cpp
===
--- llvm/lib/Support/TargetParser.cpp
+++ llvm/lib/Support/TargetParser.cpp
@@ -83,26 +83,26 @@
   {{"mullins"},   {"gfx703"},  GK_GFX703,  FEATURE_NONE},
   {{"gfx704"},{"gfx704"},  GK_GFX704,  FEATURE_NONE},
   {{"bonaire"},   {"gfx704"},  GK_GFX704,  FEATURE_NONE},
-  {{"gfx801"},{"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"carrizo"},   {"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx802"},{"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32},
-  {{"iceland"},   {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32},
-  {{"tonga"}, {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32},
-  {{"gfx803"},{"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"fiji"},  {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"polaris10"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"polaris11"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"gfx810"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32},
-  {{"stoney"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32},
-  {{"gfx900"},{"gfx900"},  GK_GFX900,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx902"},{"gfx902"},  GK_GFX902,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx904"},{"gfx904"},  GK_GFX904,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx906"},{"gfx906"},  GK_GFX906,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx908"},{"gfx908"},  GK_GFX908,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx909"},{"gfx909"},  GK_GFX909,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx1010"},   {"gfx1010"}, GK_GFX1010, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
-  {{"gfx1011"},   {"gfx1011"}, GK_GFX1011, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
-  {{"gfx1012"},   {"gfx1012"}, GK_GFX1012, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
+  {{"gfx801"},{"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"carrizo"},   {"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx802"},{"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"iceland"},   {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"tonga"}, {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx803"},{"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"fiji"},  {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"polaris10"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"polaris11"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx810"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"stoney"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx900"},{"gfx900"},  GK_GFX900,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx902"},{"gfx902"},  GK_GFX902,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx904"},{"gfx904"},  GK_GFX904,  

[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-08-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

Looks good in general. Mostly C++ style comments below.




Comment at: clang/include/clang/Basic/TargetID.h:30
+/// Returns canonical processor name or empty if the processor name is invalid.
+llvm::StringRef getProcessorFromTargetID(const llvm::Triple ,
+ llvm::StringRef OffloadArch);

Nit: In cases where performance is not absolutely critical, I'd prefer to use 
std::string. This way I don't need to worry what exactly is that reference 
referencing and I can just store the result. Keeps things simple. With 
StringRef one has to be more cautious -- how long will that reference keep 
pointing to the right value? In general, the answer requires knowing the 
details of the implementation. With std::string, you just use the value and let 
compiler eliminate intermediate values.
In this case you have used StringRef in other places and it's also used for 
similar purposes all over the place, so it's just my personal preference. 



Comment at: clang/include/clang/Basic/TargetID.h:42
+/// is not a null pointer.
+/// If \p CanonicalizeProc is true, canonicalize returned processor name.
+llvm::Optional

Comment needs updating as parameters and return value have changed.



Comment at: clang/include/clang/Basic/TargetID.h:56-58
+bool isValidTargetIDCombination(
+const std::set ,
+std::pair *ConflictingTIDs = nullptr);

Looks like a good candidate for using a std::optional return value.




Comment at: clang/lib/Basic/TargetID.cpp:161-169
+  for (auto & : Features) {
+if (ExistingFeatures.find(F.first()) == ExistingFeatures.end()) {
+  if (ConflictingTIDs) {
+ConflictingTIDs->first = Loc->second.TargetID;
+ConflictingTIDs->second = ID;
+  }
+  return false;

This could probably be expressed better with any_of():
```
if (llvm::any_of(Features, [](auto ){
   return ExistingFeatures.count(F.first) == 0;
  })
  return {Loc->second.TargetID, ID};
```

The outer loop could also be transformed into a form of llvm::for_each or 
llvm::any_of() with an inner lambda returning an optional tuple on conflict.




Comment at: clang/lib/Basic/Targets/AMDGPU.h:404
+  // pre-defined macros.
+  bool handleTargetFeatures(std::vector ,
+DiagnosticsEngine ) override {

We never return anything but true. Change return to void?



Comment at: clang/lib/Basic/Targets/AMDGPU.h:412-417
+  auto Loc =
+  std::find(TargetIDFeatures.begin(), TargetIDFeatures.end(), Name);
+  if (Loc == TargetIDFeatures.end())
+continue;
+  assert(OffloadArchFeatures.find(Name) == OffloadArchFeatures.end());
+  OffloadArchFeatures[Name] = IsOn;

Nit: for small-ish loops over ranges, I generally find that standard 
functional-stile functions to be more expressive. 
IMO, it's easier to read something like this:

```
llvm::for_each(Features, [](auto F){
...
   Name = ...
   if (llvm::any_of(TargetIDFeatures, [](N){ return N == Name; })) { // or use 
llvm::find()
  // update OffloadArchFeatures.
   }
})
```

Again, it's a personal style choice. The function is OK as is, I'm just 
flagging places where I had to think what the code does, where the code could 
convey the intent in a more direct way.



Comment at: clang/lib/Driver/Driver.cpp:98-99
+static llvm::Triple getHIPOffloadTargetTriple() {
+  static const llvm::Triple T("amdgcn-amd-amdhsa");
+  return T;
+}

Why not just return llvmTriple("amdgcn-amd-amdhsa") ?



Comment at: clang/lib/Driver/Driver.cpp:2403
+  /// Target ID string which is persistent throughout the compilation.
+  const char *ID;
+  TargetID(CudaArch Arch) { ID = CudaArchToString(Arch); }

just make it std::string. There's no point tinkering with pointers here.

Also, I'm not sure why the whole TargetID can't be just a std::string.



Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:523
+  StringRef GpuArch;
+  llvm::StringMap FeatureMap;
+  StringRef TargetID = DriverArgs.getLastArgValue(options::OPT_mcpu_EQ);

I'd move both vars  down to where they are used first.



Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:526
+  if (TargetID.empty())
+return GpuArch;
+

`StringRef()` would make it more explicit that it's a failure.



Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:531
+getDriver().Diag(clang::diag::err_drv_bad_target_id) << TargetID;
+return GpuArch;
+  }

ditto.



Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:543-545
+auto Pos = FeatureMap.find(Feature);
+if (Pos == FeatureMap.end())
+  continue;


[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-08-11 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 284882.
yaxunl added a comment.

rebase to ToT and minor bug fixes


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/TargetID.h
  clang/include/clang/Basic/TargetInfo.h
  clang/include/clang/Driver/Compilation.h
  clang/include/clang/Driver/Options.td
  clang/lib/Basic/CMakeLists.txt
  clang/lib/Basic/TargetID.cpp
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/Basic/Targets/AMDGPU.h
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/AMDGPU.h
  clang/lib/Driver/ToolChains/CommonArgs.cpp
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_isa_version_908.bc
  clang/test/Driver/amdgpu-features.c
  clang/test/Driver/amdgpu-macros.cl
  clang/test/Driver/amdgpu-mcpu.cl
  clang/test/Driver/hip-invalid-target-id.hip
  clang/test/Driver/hip-target-id.hip
  clang/test/Driver/hip-toolchain-features.hip
  clang/test/Driver/invalid-target-id.cl
  clang/test/Driver/target-id-macros.cl
  clang/test/Driver/target-id-macros.hip
  clang/test/Driver/target-id.cl
  llvm/include/llvm/Support/TargetParser.h
  llvm/lib/Support/TargetParser.cpp

Index: llvm/lib/Support/TargetParser.cpp
===
--- llvm/lib/Support/TargetParser.cpp
+++ llvm/lib/Support/TargetParser.cpp
@@ -94,15 +94,15 @@
   {{"polaris11"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
   {{"gfx810"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32},
   {{"stoney"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32},
-  {{"gfx900"},{"gfx900"},  GK_GFX900,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx902"},{"gfx902"},  GK_GFX902,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx904"},{"gfx904"},  GK_GFX904,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx906"},{"gfx906"},  GK_GFX906,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx908"},{"gfx908"},  GK_GFX908,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx909"},{"gfx909"},  GK_GFX909,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx1010"},   {"gfx1010"}, GK_GFX1010, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
-  {{"gfx1011"},   {"gfx1011"}, GK_GFX1011, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
-  {{"gfx1012"},   {"gfx1012"}, GK_GFX1012, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
+  {{"gfx900"},{"gfx900"},  GK_GFX900,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx902"},{"gfx902"},  GK_GFX902,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx904"},{"gfx904"},  GK_GFX904,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx906"},{"gfx906"},  GK_GFX906,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK|FEATURE_SRAM_ECC},
+  {{"gfx908"},{"gfx908"},  GK_GFX908,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK|FEATURE_SRAM_ECC},
+  {{"gfx909"},{"gfx909"},  GK_GFX909,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx1010"},   {"gfx1010"}, GK_GFX1010, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32|FEATURE_XNACK},
+  {{"gfx1011"},   {"gfx1011"}, GK_GFX1011, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32|FEATURE_XNACK},
+  {{"gfx1012"},   {"gfx1012"}, GK_GFX1012, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32|FEATURE_XNACK},
   {{"gfx1030"},   {"gfx1030"}, GK_GFX1030, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
   {{"gfx1031"},   {"gfx1031"}, GK_GFX1031, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
 };
@@ -212,6 +212,15 @@
   }
 }
 
+StringRef AMDGPU::getCanonicalArchName(const Triple , StringRef Arch) {
+  assert(T.isAMDGPU());
+  auto ProcKind = T.isAMDGCN() ? parseArchAMDGCN(Arch) : parseArchR600(Arch);
+  if (ProcKind == GK_NONE)
+return StringRef();
+
+  return T.isAMDGCN() ? getArchNameAMDGCN(ProcKind) : getArchNameR600(ProcKind);
+}
+
 namespace llvm {
 namespace RISCV {
 
Index: llvm/include/llvm/Support/TargetParser.h
===
--- llvm/include/llvm/Support/TargetParser.h
+++ llvm/include/llvm/Support/TargetParser.h
@@ -113,12 +113,18 @@
   FEATURE_FAST_DENORMAL_F32 = 1 << 5,
 
   // Wavefront 32 is available.
-  FEATURE_WAVE32 = 1 << 6
+  FEATURE_WAVE32 = 1 << 6,
+
+  // Xnack is available.
+  FEATURE_XNACK = 1 << 7,
+
+  // Sram-ecc is available.
+  FEATURE_SRAM_ECC = 1 << 8,
 };
 
 StringRef getArchNameAMDGCN(GPUKind AK);
 StringRef getArchNameR600(GPUKind AK);
-StringRef getCanonicalArchName(StringRef Arch);
+StringRef getCanonicalArchName(const Triple , StringRef Arch);
 GPUKind parseArchAMDGCN(StringRef CPU);
 GPUKind 

[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-07-28 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 281465.
yaxunl added a comment.

separate emitting target-id module flag to a different patch.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/TargetID.h
  clang/include/clang/Basic/TargetInfo.h
  clang/include/clang/Driver/Compilation.h
  clang/include/clang/Driver/Options.td
  clang/lib/Basic/CMakeLists.txt
  clang/lib/Basic/TargetID.cpp
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/Basic/Targets/AMDGPU.h
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/AMDGPU.h
  clang/lib/Driver/ToolChains/CommonArgs.cpp
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_isa_version_908.bc
  clang/test/Driver/amdgpu-features.c
  clang/test/Driver/amdgpu-macros.cl
  clang/test/Driver/amdgpu-mcpu.cl
  clang/test/Driver/hip-invalid-target-id.hip
  clang/test/Driver/hip-target-id.hip
  clang/test/Driver/hip-toolchain-features.hip
  clang/test/Driver/invalid-target-id.cl
  clang/test/Driver/target-id-macros.cl
  clang/test/Driver/target-id-macros.hip
  clang/test/Driver/target-id.cl
  llvm/include/llvm/Support/TargetParser.h
  llvm/lib/Support/TargetParser.cpp

Index: llvm/lib/Support/TargetParser.cpp
===
--- llvm/lib/Support/TargetParser.cpp
+++ llvm/lib/Support/TargetParser.cpp
@@ -83,26 +83,26 @@
   {{"mullins"},   {"gfx703"},  GK_GFX703,  FEATURE_NONE},
   {{"gfx704"},{"gfx704"},  GK_GFX704,  FEATURE_NONE},
   {{"bonaire"},   {"gfx704"},  GK_GFX704,  FEATURE_NONE},
-  {{"gfx801"},{"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"carrizo"},   {"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx802"},{"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32},
-  {{"iceland"},   {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32},
-  {{"tonga"}, {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32},
-  {{"gfx803"},{"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"fiji"},  {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"polaris10"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"polaris11"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"gfx810"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32},
-  {{"stoney"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32},
-  {{"gfx900"},{"gfx900"},  GK_GFX900,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx902"},{"gfx902"},  GK_GFX902,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx904"},{"gfx904"},  GK_GFX904,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx906"},{"gfx906"},  GK_GFX906,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx908"},{"gfx908"},  GK_GFX908,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx909"},{"gfx909"},  GK_GFX909,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx1010"},   {"gfx1010"}, GK_GFX1010, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
-  {{"gfx1011"},   {"gfx1011"}, GK_GFX1011, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
-  {{"gfx1012"},   {"gfx1012"}, GK_GFX1012, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
+  {{"gfx801"},{"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"carrizo"},   {"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx802"},{"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"iceland"},   {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"tonga"}, {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx803"},{"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"fiji"},  {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"polaris10"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"polaris11"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx810"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"stoney"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx900"},{"gfx900"},  GK_GFX900,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx902"},{"gfx902"},  GK_GFX902,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx904"},{"gfx904"},  GK_GFX904,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx906"},{"gfx906"},  GK_GFX906,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx908"},{"gfx908"},  GK_GFX908,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK|FEATURE_SRAM_ECC},
+  {{"gfx909"},

[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-07-28 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

@tra target ID documentation is added by https://reviews.llvm.org/D84822


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-07-24 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 280567.
yaxunl marked 9 inline comments as done.
yaxunl added a comment.

revised by Artem's comments


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/TargetID.h
  clang/include/clang/Basic/TargetInfo.h
  clang/include/clang/Driver/Compilation.h
  clang/include/clang/Driver/Options.td
  clang/lib/Basic/CMakeLists.txt
  clang/lib/Basic/TargetID.cpp
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/Basic/Targets/AMDGPU.h
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/AMDGPU.h
  clang/lib/Driver/ToolChains/CommonArgs.cpp
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/test/CodeGenCUDA/target-id.hip
  clang/test/CodeGenOpenCL/target-id.cl
  clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_isa_version_908.bc
  clang/test/Driver/amdgpu-features.c
  clang/test/Driver/amdgpu-macros.cl
  clang/test/Driver/amdgpu-mcpu.cl
  clang/test/Driver/hip-invalid-target-id.hip
  clang/test/Driver/hip-target-id.hip
  clang/test/Driver/hip-toolchain-features.hip
  clang/test/Driver/invalid-target-id.cl
  clang/test/Driver/target-id-macros.cl
  clang/test/Driver/target-id-macros.hip
  clang/test/Driver/target-id.cl
  llvm/include/llvm/Support/TargetParser.h
  llvm/lib/Support/TargetParser.cpp

Index: llvm/lib/Support/TargetParser.cpp
===
--- llvm/lib/Support/TargetParser.cpp
+++ llvm/lib/Support/TargetParser.cpp
@@ -83,26 +83,26 @@
   {{"mullins"},   {"gfx703"},  GK_GFX703,  FEATURE_NONE},
   {{"gfx704"},{"gfx704"},  GK_GFX704,  FEATURE_NONE},
   {{"bonaire"},   {"gfx704"},  GK_GFX704,  FEATURE_NONE},
-  {{"gfx801"},{"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"carrizo"},   {"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx802"},{"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32},
-  {{"iceland"},   {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32},
-  {{"tonga"}, {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32},
-  {{"gfx803"},{"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"fiji"},  {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"polaris10"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"polaris11"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"gfx810"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32},
-  {{"stoney"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32},
-  {{"gfx900"},{"gfx900"},  GK_GFX900,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx902"},{"gfx902"},  GK_GFX902,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx904"},{"gfx904"},  GK_GFX904,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx906"},{"gfx906"},  GK_GFX906,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx908"},{"gfx908"},  GK_GFX908,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx909"},{"gfx909"},  GK_GFX909,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx1010"},   {"gfx1010"}, GK_GFX1010, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
-  {{"gfx1011"},   {"gfx1011"}, GK_GFX1011, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
-  {{"gfx1012"},   {"gfx1012"}, GK_GFX1012, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
+  {{"gfx801"},{"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"carrizo"},   {"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx802"},{"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"iceland"},   {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"tonga"}, {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx803"},{"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"fiji"},  {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"polaris10"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"polaris11"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx810"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"stoney"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx900"},{"gfx900"},  GK_GFX900,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx902"},{"gfx902"},  GK_GFX902,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx904"},{"gfx904"},  GK_GFX904,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx906"},{"gfx906"},  GK_GFX906,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx908"},

[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-07-24 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl marked 20 inline comments as done.
yaxunl added inline comments.



Comment at: clang/include/clang/Driver/Driver.h:332
 
+  llvm::Triple getHIPOffloadTargetTriple() const;
+

tra wrote:
> This is used exclusively by the Driver.cpp and does not have to be a public 
> API call.
done



Comment at: clang/lib/Basic/TargetID.cpp:18
+
+static const llvm::SmallVector
+getAllPossibleAMDGPUTargetIDFeatures(const llvm::Triple ,

tra wrote:
> Nit. You could use llvm::SmallVectorImpl -- caller only 
> cares that it's an array of StringRef and does not need to know the size hint.
> Makes it easier to change the hint w/o having to replace the constant 
> evrywhere.
It seems I cannot return a SmallVector as SmallVectorImpl since copy ctor is 
deleted.



Comment at: clang/lib/Basic/TargetID.cpp:53
+
+llvm::StringRef parseTargetID(const llvm::Triple ,
+  llvm::StringRef OffloadArch,

tra wrote:
> tra wrote:
> > A comment describing expected format would be helpful.
> > 
> I'd restructure things a bit.
> First, I'd make return type std::optionaland fold IsValid into it.
> Then I would make FeatureMap argument a non-optional, so the parsing can 
> concentrate on parsing only.
> Then I'd add another overload without FeatureMap argument, which would be a 
> warpper over the real parser with a temp FeatureMap which will be discarded.
> 
> This should make things easier to read.
done



Comment at: clang/lib/Basic/TargetID.cpp:53-69
+llvm::StringRef parseTargetID(const llvm::Triple ,
+  llvm::StringRef OffloadArch,
+  llvm::StringMap *FeatureMap,
+  bool *IsValid) {
+  llvm::StringRef ArchStr;
+  auto SetValid = [&](bool Valid) {
+if (IsValid)

yaxunl wrote:
> tra wrote:
> > tra wrote:
> > > A comment describing expected format would be helpful.
> > > 
> > I'd restructure things a bit.
> > First, I'd make return type std::optionaland fold IsValid into 
> > it.
> > Then I would make FeatureMap argument a non-optional, so the parsing can 
> > concentrate on parsing only.
> > Then I'd add another overload without FeatureMap argument, which would be a 
> > warpper over the real parser with a temp FeatureMap which will be discarded.
> > 
> > This should make things easier to read.
> done
parseTargetID actually has two usage pattern: 1. parse the entire target ID 
including processor and features and returns the processor, features, and 
whether the target ID is valid 2. parse the processor part of the target ID 
only and returns the processor or an empty string if the processor is invalid

For usage 1 I will revise it by your suggestion.

For usage 2 I will separate it to a different function getProcessorFromTargetID



Comment at: clang/lib/Basic/TargetID.cpp:103
+
+std::string getCanonicalTargetID(llvm::StringRef Processor,
+ const llvm::StringMap ) {

tra wrote:
> What does 'canonical' mean? A comment would be helpful.
done



Comment at: clang/lib/Basic/TargetID.cpp:116
+static llvm::StringRef
+parseCanonicalTargetIDWithoutCheck(llvm::StringRef OffloadArch,
+   llvm::StringMap *FeatureMap) {

tra wrote:
> Perhaps we can further split parsing offloadID vs checking whether it's valid 
> and make parseTargetID above call this parse-only helper.
> 
> E.g. something like this:
> 
> ```
> something parseTargetIDhelper(something); // Parses targetID 
> something isTargetIdValid(something);  // Verivies validity of parsed 
> parts.
> std::optional parseTargetID(FeatureMap) {
>parseTargetIDhelper(...);
>if (!targetIDValid())
>   return None;
>return Good;
> }
> std::optional parseTargetID() {
>auto TempFeatureMap;
>   return parseTargetID();
> }
> ```
done



Comment at: clang/lib/Basic/Targets/AMDGPU.cpp:366
+  std::replace(NewF.begin(), NewF.end(), '-', '_');
+  Builder.defineMacro(Twine("__amdgcn_") + Twine(NewF) + Twine("__"),
+  Loc->second ? "1" : "0");

tra wrote:
> Nit: Should it be "__amdgcn_feature_" to make it more explicit where these 
> macros are derived from?
done



Comment at: clang/lib/CodeGen/CodeGenModule.cpp:601-605
+llvm::MDString::get(
+getModule().getContext(),
+TargetIDStr == ""
+? TargetIDStr
+: (Twine(getTriple().str()) + "-" + TargetIDStr).str()));

tra wrote:
> I think this may cause problems.
> 
> Twine.str() will return a temporary std::string.
> MDString::get takes a StringRef as the second parameter, so it will be a 
> reference to the temporary. It will then get added to the module's metadata 
> which will probably outlive the temporary 

[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-07-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments.



Comment at: clang/include/clang/Driver/Driver.h:332
 
+  llvm::Triple getHIPOffloadTargetTriple() const;
+

This is used exclusively by the Driver.cpp and does not have to be a public API 
call.



Comment at: clang/lib/Basic/TargetID.cpp:18
+
+static const llvm::SmallVector
+getAllPossibleAMDGPUTargetIDFeatures(const llvm::Triple ,

Nit. You could use llvm::SmallVectorImpl -- caller only cares 
that it's an array of StringRef and does not need to know the size hint.
Makes it easier to change the hint w/o having to replace the constant evrywhere.



Comment at: clang/lib/Basic/TargetID.cpp:53
+
+llvm::StringRef parseTargetID(const llvm::Triple ,
+  llvm::StringRef OffloadArch,

A comment describing expected format would be helpful.




Comment at: clang/lib/Basic/TargetID.cpp:53-69
+llvm::StringRef parseTargetID(const llvm::Triple ,
+  llvm::StringRef OffloadArch,
+  llvm::StringMap *FeatureMap,
+  bool *IsValid) {
+  llvm::StringRef ArchStr;
+  auto SetValid = [&](bool Valid) {
+if (IsValid)

tra wrote:
> A comment describing expected format would be helpful.
> 
I'd restructure things a bit.
First, I'd make return type std::optionaland fold IsValid into it.
Then I would make FeatureMap argument a non-optional, so the parsing can 
concentrate on parsing only.
Then I'd add another overload without FeatureMap argument, which would be a 
warpper over the real parser with a temp FeatureMap which will be discarded.

This should make things easier to read.



Comment at: clang/lib/Basic/TargetID.cpp:103
+
+std::string getCanonicalTargetID(llvm::StringRef Processor,
+ const llvm::StringMap ) {

What does 'canonical' mean? A comment would be helpful.



Comment at: clang/lib/Basic/TargetID.cpp:116
+static llvm::StringRef
+parseCanonicalTargetIDWithoutCheck(llvm::StringRef OffloadArch,
+   llvm::StringMap *FeatureMap) {

Perhaps we can further split parsing offloadID vs checking whether it's valid 
and make parseTargetID above call this parse-only helper.

E.g. something like this:

```
something parseTargetIDhelper(something); // Parses targetID 
something isTargetIdValid(something);  // Verivies validity of parsed parts.
std::optional parseTargetID(FeatureMap) {
   parseTargetIDhelper(...);
   if (!targetIDValid())
  return None;
   return Good;
}
std::optional parseTargetID() {
   auto TempFeatureMap;
  return parseTargetID();
}
```



Comment at: clang/lib/Basic/Targets/AMDGPU.cpp:366
+  std::replace(NewF.begin(), NewF.end(), '-', '_');
+  Builder.defineMacro(Twine("__amdgcn_") + Twine(NewF) + Twine("__"),
+  Loc->second ? "1" : "0");

Nit: Should it be "__amdgcn_feature_" to make it more explicit where these 
macros are derived from?



Comment at: clang/lib/CodeGen/CodeGenModule.cpp:601-605
+llvm::MDString::get(
+getModule().getContext(),
+TargetIDStr == ""
+? TargetIDStr
+: (Twine(getTriple().str()) + "-" + TargetIDStr).str()));

I think this may cause problems.

Twine.str() will return a temporary std::string.
MDString::get takes a StringRef as the second parameter, so it will be a 
reference to the temporary. It will then get added to the module's metadata 
which will probably outlive the temporary string. The tests for the MDString do 
appear to use string storage that outlives MDString.




Comment at: clang/lib/Driver/Driver.cpp:2602-2603
 
+  if (!isValidOffloadArchCombination(GpuArchs))
+return true;
+

This is something we may want to diagnose. 


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-07-18 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 279000.
yaxunl added a comment.
Herald added subscribers: llvm-commits, dang, hiraditya.
Herald added a project: LLVM.

rebase and added more checks.

The documentation work is still under development.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/TargetID.h
  clang/include/clang/Basic/TargetInfo.h
  clang/include/clang/Driver/Compilation.h
  clang/include/clang/Driver/Driver.h
  clang/include/clang/Driver/Options.td
  clang/lib/Basic/CMakeLists.txt
  clang/lib/Basic/TargetID.cpp
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/Basic/Targets/AMDGPU.h
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/AMDGPU.h
  clang/lib/Driver/ToolChains/CommonArgs.cpp
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/test/CodeGenCUDA/target-id.hip
  clang/test/CodeGenOpenCL/target-id.cl
  clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_isa_version_908.bc
  clang/test/Driver/amdgpu-features.c
  clang/test/Driver/amdgpu-macros.cl
  clang/test/Driver/amdgpu-mcpu.cl
  clang/test/Driver/hip-invalid-target-id.hip
  clang/test/Driver/hip-target-id.hip
  clang/test/Driver/hip-toolchain-features.hip
  clang/test/Driver/invalid-target-id.cl
  clang/test/Driver/target-id-macros.cl
  clang/test/Driver/target-id-macros.hip
  clang/test/Driver/target-id.cl
  llvm/include/llvm/Support/TargetParser.h
  llvm/lib/Support/TargetParser.cpp

Index: llvm/lib/Support/TargetParser.cpp
===
--- llvm/lib/Support/TargetParser.cpp
+++ llvm/lib/Support/TargetParser.cpp
@@ -83,26 +83,26 @@
   {{"mullins"},   {"gfx703"},  GK_GFX703,  FEATURE_NONE},
   {{"gfx704"},{"gfx704"},  GK_GFX704,  FEATURE_NONE},
   {{"bonaire"},   {"gfx704"},  GK_GFX704,  FEATURE_NONE},
-  {{"gfx801"},{"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"carrizo"},   {"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx802"},{"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32},
-  {{"iceland"},   {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32},
-  {{"tonga"}, {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32},
-  {{"gfx803"},{"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"fiji"},  {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"polaris10"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"polaris11"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32},
-  {{"gfx810"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32},
-  {{"stoney"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32},
-  {{"gfx900"},{"gfx900"},  GK_GFX900,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx902"},{"gfx902"},  GK_GFX902,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx904"},{"gfx904"},  GK_GFX904,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx906"},{"gfx906"},  GK_GFX906,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx908"},{"gfx908"},  GK_GFX908,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx909"},{"gfx909"},  GK_GFX909,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx1010"},   {"gfx1010"}, GK_GFX1010, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
-  {{"gfx1011"},   {"gfx1011"}, GK_GFX1011, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
-  {{"gfx1012"},   {"gfx1012"}, GK_GFX1012, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
+  {{"gfx801"},{"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"carrizo"},   {"gfx801"},  GK_GFX801,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx802"},{"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"iceland"},   {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"tonga"}, {"gfx802"},  GK_GFX802,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx803"},{"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"fiji"},  {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"polaris10"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"polaris11"}, {"gfx803"},  GK_GFX803,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx810"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"stoney"},{"gfx810"},  GK_GFX810,  FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx900"},{"gfx900"},  GK_GFX900,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx902"},{"gfx902"},  GK_GFX902,  FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx904"},{"gfx904"},  GK_GFX904,  

[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-06-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

In D60620#2089722 , @yaxunl wrote:

> In D60620#2067134 , @tra wrote:
>
> > Do you expect users to specify these IDs? How do you see it being used in 
> > practice? I think you do need to implement a user-friendly shortcut and 
> > expand it to the detailed offload-id internally. I'm fine with allowing 
> > explicit offload id as a hidden argument, but I don't think it's suitable 
> > for something that will be used by everyone who can't be expected to be 
> > aware of all the gory details of particular GPU features.
>
>
> The good thing about this target id is that it is backward compatible with 
> GPU arch. For common users who are not concerned with specific GPU 
> configurations, they can just use the old GPU arch and nothing changes. This 
> is because GPU arch without features implies default value for these 
> features, which work on all configurations. For advanced users who do need to 
> build for specific GPU configurations, they should already have the knowledge 
> about the name and meaning of these configurations by reading the AMDGPU user 
> guide (http://llvm.org/docs/AMDGPUUsage.html). Therefore a target id in the 
> form of gfx908:xnack+ is not something cryptic to them. On the other hand, an 
> encoded GPU arch like gfx908a is cryptic since it has no meaning at all.


I don't quite agree with the `gfx908:xnack+ is not something cryptic` 
assertion. I've looked at the AMDGPUUsage.html and I am pretty sure that I 
still have no clue which ID will be correct for my WX8200. It does not mention 
the card, nor does it specify the offload format. Having to type the IDs with 
the features ordered just so (i.e. without normalization) puts a fair amount of 
burden on the user. Not only they must remember which features must be on or 
off, but they also need to specify them in a very specific order (it's not even 
lexicographically ordered) . I think adding normalization to make it possible 
to specify features in arbitrary order would mitigate some of it.

As it's implemented now, my bet is that it will be *very* annoying to use in 
practice.

At the very least,  you should document the requirements for the offload ID 
format with the specific examples. It would also be useful to provide specific 
offload IDs for particular GPU cards as that's what regular users will have 
info about. Right now the AMDGPUUsage doc does not provide sufficient details 
to derive correct offload ID if all you have is a name of the GPU card. That's 
going to be the case for most of clang users who just want to build things for 
their GPU.

That said, the scheme in the current version of the patch is flexible enough to 
retrofit simplified names later, so I'm overall OK with proceeding with the 
patch once documentation has been updated.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-06-12 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In D60620#2067134 , @tra wrote:

> Do you expect users to specify these IDs? How do you see it being used in 
> practice? I think you do need to implement a user-friendly shortcut and 
> expand it to the detailed offload-id internally. I'm fine with allowing 
> explicit offload id as a hidden argument, but I don't think it's suitable for 
> something that will be used by everyone who can't be expected to be aware of 
> all the gory details of particular GPU features.


The good thing about this target id is that it is backward compatible with GPU 
arch. For common users who are not concerned with specific GPU configurations, 
they can just use the old GPU arch and nothing changes. This is because GPU 
arch without features implies default value for these features, which work on 
all configurations. For advanced users who do need to build for specific GPU 
configurations, they should already have the knowledge about the name and 
meaning of these configurations by reading the AMDGPU user guide 
(http://llvm.org/docs/AMDGPUUsage.html). Therefore a target id in the form of 
gfx908:xnack+ is not something cryptic to them. On the other hand, an encoded 
GPU arch like gfx908a is cryptic since it has no meaning at all.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-06-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

In D60620#2064839 , @yaxunl wrote:

> It means HIP will create two compilation passes: one for gfx908 and one for 
> gfx908:xnack+:sramecc+.


OK, so empty feature list may also be valid.

> 
> 
>> One thing you'll need is a way to normalize the arch+features tuple so we 
>> can compare them.
> 
> We require features in target id follow a pre-defined order. This may not be 
> alphabetical order since later on we may add more features.

Do you expect users to specify these IDs? How do you see it being used in 
practice? I think you do need to implement a user-friendly shortcut and expand 
it to the detailed offload-id internally. I'm fine with allowing explicit 
offload id as a hidden argument, but I don't think it's suitable for something 
that will be used by everyone who can't be expected to be aware of all the gory 
details of particular GPU features.

> 
> 
>> What I mean -- are users free to speficy any combination of {feature[+-]} 
>> and would it be expected for all/most of them to make sense to the user?
>> Or does it only make sense for a few specific arch:featureA+:featureB- 
>> combinations?
>> If we only have a limited set of valid combinations, it would make sense to 
>> give users easy-to-use names.
> 
> 
> 
>> I.e. if the only valid ids for gfx111 are gfx111:foo+:bar- and gfx111:buz+, 
>> we could call them gfx111a and gfx111b and expand it into the right set of 
>> features ourselves without relying on the users not to make a typo.
> 
> This was the scheme we used before but it did not work well.
> 
> For each GPU we have a predefined set of features. Currently some GPU's 
> support xnack, some GPU's support sramecc, some GPU's support both. In the 
> future we may introduce more features. If we let each GPU has its own 
> encoding for features, it will be confusing since each letter will have 
> different meanings depending on GPU. If we let all GPU share one encoding 
> scheme, we are facing combination explosion. Most importantly, target ids are 
> used by developers for whom the GPU+Features are meaningful terms to refer to 
> a GPU configuration they want to compile for. For example, in daily life, we 
> would say "we need to build for gfx908 with xnack on and sramecc off for this 
> machine", then just use -offload-arch=gfx908:xnack+:sramecc- to compile. If 
> we use an encoding for features, then developers have to look up the encoding 
> scheme for xnack on and sramecc off, then use it in -offload-arch, which is 
> inconvenient.

It sounds like we need both something easy to use for general users and full 
control for someone who needs it.
How about this -- keep `--gpu-arch=foo` as a user-friendly interface which only 
covers known released GPUs and allow using `--offload-id` as an alternative 
which allows precise control, if/when needed? `--gpu-arch=` will internally get 
treated as a predefined `--offload-id=...` for that GPU variant.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-06-01 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 267693.
yaxunl added a comment.

emit empty target id module flag if no -target-cpu is set


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/OffloadArch.h
  clang/include/clang/Basic/TargetInfo.h
  clang/include/clang/Driver/Options.td
  clang/lib/Basic/CMakeLists.txt
  clang/lib/Basic/OffloadArch.cpp
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/Basic/Targets/AMDGPU.h
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/AMDGPU.h
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/test/CodeGenCUDA/target-id.hip
  clang/test/CodeGenOpenCL/target-id.cl
  clang/test/Driver/amdgpu-macros.cl
  clang/test/Driver/invalid-target-id.cl
  clang/test/Driver/invalid-target-id.hip
  clang/test/Driver/target-id-macros.cl
  clang/test/Driver/target-id-macros.hip
  clang/test/Driver/target-id.cl
  clang/test/Driver/target-id.hip

Index: clang/test/Driver/target-id.hip
===
--- /dev/null
+++ clang/test/Driver/target-id.hip
@@ -0,0 +1,51 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN:   -x hip --offload-arch=gfx908 \
+// RUN:   --offload-arch=gfx908:xnack+:sramecc+ \
+// RUN:   --offload-arch=gfx908:xnack+:sramecc- \
+// RUN:   --hip-device-lib-path=%S/Inputs/hip_dev_lib \
+// RUN:   %s 2>&1 | FileCheck %s
+
+// CHECK: [[CLANG:".*clang.*"]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-target-cpu" "gfx908" "-fcuda-is-device"
+
+// CHECK: [[OPT:".*opt"]] {{.*}} "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: "-mcpu=gfx908"
+// CHECK-SAME: "-o" [[OPT_906_BC:".*-gfx908-optimized.*bc"]]
+
+// CHECK: [[LLC: ".*llc"]] [[OPT_906_BC]]
+// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-mcpu=gfx908"
+
+// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-target-cpu" "gfx908"
+// CHECK-SAME: {{.*}} "-target-feature" "+xnack"
+// CHECK-SAME: {{.*}} "-target-feature" "+sram-ecc"
+
+// CHECK: [[OPT]] {{.*"}} "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: "-mcpu=gfx908"
+// CHECK-SAME: "-o" [[OPT_906XE_BC:".*-gfx908:xnack\+:sramecc\+.*bc"]]
+
+// CHECK: [[LLC]] [[OPT_906XE_BC]]
+// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-mcpu=gfx908"
+
+// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-target-cpu" "gfx908"
+// CHECK-SAME: {{.*}} "-target-feature" "+xnack"
+// CHECK-SAME: {{.*}} "-target-feature" "-sram-ecc"
+
+// CHECK: [[OPT]] {{.*}} "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: "-mcpu=gfx908"
+// CHECK-SAME: "-o" [[OPT_906XN_BC:".*-gfx908:xnack\+:sramecc\-.*bc"]]
+
+// CHECK: [[LLC]] [[OPT_906XN_BC]]
+// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-mcpu=gfx908"
+
+
+// CHECK: {{".*clang-offload-bundler"}}
+// CHECK-SAME: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa-gfx908,hip-amdgcn-amd-amdhsa-gfx908:xnack+:sramecc+,hip-amdgcn-amd-amdhsa-gfx908:xnack+:sramecc-"
Index: clang/test/Driver/target-id.cl
===
--- /dev/null
+++ clang/test/Driver/target-id.cl
@@ -0,0 +1,25 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -### -target amdgcn-amd-amdhsa \
+// RUN:   -mcpu=gfx908:xnack+:sramecc- \
+// RUN:   -nostdlib %s 2>&1 | FileCheck %s
+
+// RUN: %clang -### -target amdgcn-amd-amdpal \
+// RUN:   -mcpu=gfx908:xnack+:sramecc- \
+// RUN:   -nostdlib %s 2>&1 | FileCheck %s
+
+// RUN: %clang -### -target amdgcn--mesa3d \
+// RUN:   -mcpu=gfx908:xnack+:sramecc- \
+// RUN:   -nostdlib %s 2>&1 | FileCheck %s
+
+// RUN: %clang -### -target amdgcn-amd-amdhsa \
+// RUN:   -nostdlib %s 2>&1 | FileCheck -check-prefix=NONE %s
+
+// CHECK: "-target-cpu" "gfx908"
+// CHECK-SAME: "-target-feature" "+xnack"
+// CHECK-SAME: "-target-feature" "-sram-ecc"
+
+// NONE-NOT: "-target-cpu"
+// NONE-NOT: "-target-feature"
Index: clang/test/Driver/target-id-macros.hip
===
--- /dev/null
+++ clang/test/Driver/target-id-macros.hip
@@ -0,0 +1,12 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -E -dM -target x86_64-linux-gnu --cuda-device-only \
+// RUN:   --offload-arch=gfx908:xnack+:sramecc- -nogpulib -o - %s 2>&1 \
+// RUN:   | FileCheck %s
+
+// CHECK-DAG: #define __amdgcn_processor__ "gfx908"
+// CHECK-DAG: #define __amdgcn_xnack__ 1
+// CHECK-DAG: #define __amdgcn_sramecc__ 0
+// CHECK-DAG: #define __amdgcn_target_id__ "gfx908:xnack+:sramecc-"
Index: clang/test/Driver/target-id-macros.cl

[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-05-30 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

> // ...
> // RUN:   -x hip --offload-arch=gfx908 \
> // RUN:   --offload-arch=gfx908:sramecc+:xnack+
> Does this mean that HIP will create two compilation passes -- one for gfx908 
> and one for gfx908:sramecc+:xnack+ ?
> Or does it mean that the first line is ignored if you get a more detailed 
> offload arch?

It means HIP will create two compilation passes: one for gfx908 and one for 
gfx908:xnack+:sramecc+.

> One thing you'll need is a way to normalize the arch+features tuple so we can 
> compare them.

We require features in target id follow a pre-defined order. This may not be 
alphabetical order since later on we may add more features.

> What I mean -- are users free to speficy any combination of {feature[+-]} and 
> would it be expected for all/most of them to make sense to the user?
> Or does it only make sense for a few specific arch:featureA+:featureB- 
> combinations?
> If we only have a limited set of valid combinations, it would make sense to 
> give users easy-to-use names.



> I.e. if the only valid ids for gfx111 are gfx111:foo+:bar- and gfx111:buz+, 
> we could call them gfx111a and gfx111b and expand it into the right set of 
> features ourselves without relying on the users not to make a typo.

This was the scheme we used before but it did not work well.

For each GPU we have a predefined set of features. Currently some GPU's support 
xnack, some GPU's support sramecc, some GPU's support both. In the future we 
may introduce more features. If we let each GPU has its own encoding for 
features, it will be confusing since each letter will have different meanings 
depending on GPU. If we let all GPU share one encoding scheme, we are facing 
combination explosion. Most importantly, target ids are used by developers for 
whom the GPU+Features are meaningful terms to refer to a GPU configuration they 
want to compile for. For example, in daily life, we would say "we need to build 
for gfx908 with xnack on and sramecc off for this machine", then just use 
-offload-arch=gfx908:xnack+:sramecc- to compile. If we use an encoding for 
features, then developers have to look up the encoding scheme for xnack on and 
sramecc off, then use it in -offload-arch, which is inconvenient.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-05-28 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 266976.
yaxunl added a comment.

Emit target id module flag metadata.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/OffloadArch.h
  clang/include/clang/Basic/TargetInfo.h
  clang/include/clang/Driver/Options.td
  clang/lib/Basic/CMakeLists.txt
  clang/lib/Basic/OffloadArch.cpp
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/Basic/Targets/AMDGPU.h
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/AMDGPU.h
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/test/CodeGenCUDA/target-id.hip
  clang/test/CodeGenOpenCL/target-id.cl
  clang/test/Driver/amdgpu-macros.cl
  clang/test/Driver/invalid-target-id.cl
  clang/test/Driver/invalid-target-id.hip
  clang/test/Driver/target-id-macros.cl
  clang/test/Driver/target-id-macros.hip
  clang/test/Driver/target-id.cl
  clang/test/Driver/target-id.hip

Index: clang/test/Driver/target-id.hip
===
--- /dev/null
+++ clang/test/Driver/target-id.hip
@@ -0,0 +1,51 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN:   -x hip --offload-arch=gfx908 \
+// RUN:   --offload-arch=gfx908:xnack+:sramecc+ \
+// RUN:   --offload-arch=gfx908:xnack+:sramecc- \
+// RUN:   --hip-device-lib-path=%S/Inputs/hip_dev_lib \
+// RUN:   %s 2>&1 | FileCheck %s
+
+// CHECK: [[CLANG:".*clang.*"]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-target-cpu" "gfx908" "-fcuda-is-device"
+
+// CHECK: [[OPT:".*opt"]] {{.*}} "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: "-mcpu=gfx908"
+// CHECK-SAME: "-o" [[OPT_906_BC:".*-gfx908-optimized.*bc"]]
+
+// CHECK: [[LLC: ".*llc"]] [[OPT_906_BC]]
+// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-mcpu=gfx908"
+
+// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-target-cpu" "gfx908"
+// CHECK-SAME: {{.*}} "-target-feature" "+xnack"
+// CHECK-SAME: {{.*}} "-target-feature" "+sram-ecc"
+
+// CHECK: [[OPT]] {{.*"}} "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: "-mcpu=gfx908"
+// CHECK-SAME: "-o" [[OPT_906XE_BC:".*-gfx908:xnack\+:sramecc\+.*bc"]]
+
+// CHECK: [[LLC]] [[OPT_906XE_BC]]
+// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-mcpu=gfx908"
+
+// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-target-cpu" "gfx908"
+// CHECK-SAME: {{.*}} "-target-feature" "+xnack"
+// CHECK-SAME: {{.*}} "-target-feature" "-sram-ecc"
+
+// CHECK: [[OPT]] {{.*}} "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: "-mcpu=gfx908"
+// CHECK-SAME: "-o" [[OPT_906XN_BC:".*-gfx908:xnack\+:sramecc\-.*bc"]]
+
+// CHECK: [[LLC]] [[OPT_906XN_BC]]
+// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-mcpu=gfx908"
+
+
+// CHECK: {{".*clang-offload-bundler"}}
+// CHECK-SAME: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa-gfx908,hip-amdgcn-amd-amdhsa-gfx908:xnack+:sramecc+,hip-amdgcn-amd-amdhsa-gfx908:xnack+:sramecc-"
Index: clang/test/Driver/target-id.cl
===
--- /dev/null
+++ clang/test/Driver/target-id.cl
@@ -0,0 +1,21 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -### -target amdgcn-amd-amdhsa \
+// RUN:   -mcpu=gfx908:xnack+:sramecc- \
+// RUN:   -nostdlib %s 2>&1 | FileCheck %s
+
+
+// RUN: %clang -### -target amdgcn-amd-amdpal \
+// RUN:   -mcpu=gfx908:xnack+:sramecc- \
+// RUN:   -nostdlib %s 2>&1 | FileCheck %s
+
+
+// RUN: %clang -### -target amdgcn--mesa3d \
+// RUN:   -mcpu=gfx908:xnack+:sramecc- \
+// RUN:   -nostdlib %s 2>&1 | FileCheck %s
+
+// CHECK: "-target-cpu" "gfx908"
+// CHECK-SAME: "-target-feature" "+xnack"
+// CHECK-SAME: "-target-feature" "-sram-ecc"
Index: clang/test/Driver/target-id-macros.hip
===
--- /dev/null
+++ clang/test/Driver/target-id-macros.hip
@@ -0,0 +1,12 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -E -dM -target x86_64-linux-gnu --cuda-device-only \
+// RUN:   --offload-arch=gfx908:xnack+:sramecc- -nogpulib -o - %s 2>&1 \
+// RUN:   | FileCheck %s
+
+// CHECK-DAG: #define __amdgcn_processor__ "gfx908"
+// CHECK-DAG: #define __amdgcn_xnack__ 1
+// CHECK-DAG: #define __amdgcn_sramecc__ 0
+// CHECK-DAG: #define __amdgcn_target_id__ "gfx908:xnack+:sramecc-"
Index: clang/test/Driver/target-id-macros.cl
===
--- /dev/null
+++ clang/test/Driver/target-id-macros.cl
@@ -0,0 +1,20 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: 

[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-05-27 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments.



Comment at: clang/lib/Driver/ToolChains/HIP.cpp:121-123
+  auto Pos = SubArchName.find_first_of("+-");
+  if (Pos != SubArchName.npos)
+SubArchName = SubArchName.substr(0, Pos);

yaxunl wrote:
> tra wrote:
> > yaxunl wrote:
> > > tra wrote:
> > > > Parsing should probably be extracted into a separate function to avoid 
> > > > replicating it all over the place.
> > > > 
> > > > I'd also propose use a different syntax for the properties.
> > > > * use explicit character to separate individual elements. This way 
> > > > splitting the properties becomes independent of what those properties 
> > > > are. If you decide to make properties with values or change their 
> > > > meaning some other way, it would not affect how you compose them.
> > > > * use `name=value` or `name[+-]` for individual properties. This makes 
> > > > it easy to parse individual properties and normalize their names. This 
> > > > makes property map creation independent of the property values.
> > > > 
> > > > Right now `[+-]` serves as both a separator and as the value, which 
> > > > would present problems if you ever need more flexible parametrization 
> > > > of properties. What if a property must be a number or a string. 
> > > > Granted, you can always encode them as a set of bools, but that's 
> > > > rather unpractical. 
> > > > 
> > > > E.g. something like this would work a bit better: 
> > > > `gfx111:foo+:bar=33:buz=string`.
> > > > 
> > > I discussed this with our team.
> > > 
> > > The target id features are not raw GPU properties. They are distilled to 
> > > become a few target features to decide what the compiler should do.
> > > 
> > > Each target feature has 3 values: on, off, and default, which are encoded 
> > > as +feature, -feature, and not present.
> > > 
> > > For runtime, it is simple and clear how to choose device binaries based 
> > > on the target features: it will try exact match, otherwise choose the 
> > > default.
> > > 
> > > For compiler, it is also simple and clear what to do for each target 
> > > feature value, since they corresponding to backend target features.
> > > 
> > > Basically we expect the target id feature to be like flags, not key/value 
> > > pairs. In case we do need key/value pairs, they can still use + as 
> > > delimiter.
> > > 
> > > Another reason we use +/- format is that it is more in line with the 
> > > format of existing clang-offload-bundler id and target triple, which uses 
> > > - as delimiter.
> > > 
> > > Since the target id is an extension of offload arch and users will put it 
> > > into command line, we want to make it short, concise and aesthetically 
> > > appealing, we would avoid using non-alpha-numeric characters in the 
> > > target id features. Target triple components have similar requirements. 
> > > Using : as delimiter seems unnecessary, longer, and more difficult to 
> > > read.
> > > 
> > > Consider the following example
> > > 
> > > 
> > > ```
> > > clang -offload-id gfx908+xnack-sramecc a.hip
> > > 
> > > clang -offload-id gfx908:xnack+:sramecc- a.hip
> > > ```
> > > 
> > > We are more inclined to keep the original format. 
> > You're thinking in terms what's needed by AMDGPU *now*. The scheme you're 
> > proposing is sufficient for your use case and I'm fine with that. I'm 
> > suggesting that you should consider what happens once this change lands.
> > 
> > The functionality you're implementing is exposed to end-users via top-level 
> > clang driver argument. This is visible to users and will be relied on.
> > This will make it hard to change in the future without breaking someone. 
> > It's worth making sure we're not painting ourselves in the corner here.
> > 
> > Also, the functionality may be useful/applicable beyond the scope of amdgpu 
> > and the binary flags will not be sufficient for everyone. The scheme you're 
> > proposing would be somewhat restrictive if I need to pass an integer value 
> > or string. We could use something like `gfx123+foo=456-bar=789` but it 
> > would look rather odd, IMO. 
> > 
> > Granted, none of the above is a showstopper. I guess we could support 
> > multiple formats if it comes to that, but I'd rather not multiply things 
> > later because we didn't think of them earlier.
> > 
> > > Another reason we use +/- format is that it is more in line with the 
> > > format of existing clang-offload-bundler id and target triple, which uses 
> > > - as delimiter.
> > 
> > The point was that commingling field separator and the field value is not 
> > the cleanest approach, IMO. I'd be fine fine with some other character.
> > 
> > > Since the target id is an extension of offload arch and users will put it 
> > > into command line, we want to make it short, concise and aesthetically 
> > > appealing, we would avoid using non-alpha-numeric characters in the 
> > > target id features. Target triple components have similar requirements. 
> > > Using : as delimiter 

[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-05-26 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 266410.
yaxunl added a comment.

Changed target id format to be like `gfx908:xnack+:sramecc-`.

I tried to introduce --offload-target-id but found that is not good because: 1. 
it will cause redundant code since I have to handle these options separately in 
CUDA and HIP action builder; 2. it causes unnecessary complexity since I have 
to handle interaction between `--offload-arch` and `--offload-target-id`, 
especially the special case of `all`; 3. `--offload-target-id` is really the 
same thing as `--offload-arch`. Therefore I kept using `--offload-arch`. For 
CUDA this is NFC, since it is not checked as target id.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/OffloadArch.h
  clang/include/clang/Driver/Options.td
  clang/lib/Basic/CMakeLists.txt
  clang/lib/Basic/OffloadArch.cpp
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/Basic/Targets/AMDGPU.h
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/AMDGPU.h
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/test/Driver/amdgpu-macros.cl
  clang/test/Driver/invalid-target-id.cl
  clang/test/Driver/invalid-target-id.hip
  clang/test/Driver/target-id-macros.cl
  clang/test/Driver/target-id-macros.hip
  clang/test/Driver/target-id.cl
  clang/test/Driver/target-id.hip

Index: clang/test/Driver/target-id.hip
===
--- /dev/null
+++ clang/test/Driver/target-id.hip
@@ -0,0 +1,51 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN:   -x hip --offload-arch=gfx908 \
+// RUN:   --offload-arch=gfx908:xnack+:sramecc+ \
+// RUN:   --offload-arch=gfx908:xnack+:sramecc- \
+// RUN:   --hip-device-lib-path=%S/Inputs/hip_dev_lib \
+// RUN:   %s 2>&1 | FileCheck %s
+
+// CHECK: [[CLANG:".*clang.*"]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-target-cpu" "gfx908" "-fcuda-is-device"
+
+// CHECK: [[OPT:".*opt"]] {{.*}} "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: "-mcpu=gfx908"
+// CHECK-SAME: "-o" [[OPT_906_BC:".*-gfx908-optimized.*bc"]]
+
+// CHECK: [[LLC: ".*llc"]] [[OPT_906_BC]]
+// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-mcpu=gfx908"
+
+// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-target-cpu" "gfx908"
+// CHECK-SAME: {{.*}} "-target-feature" "+xnack"
+// CHECK-SAME: {{.*}} "-target-feature" "+sram-ecc"
+
+// CHECK: [[OPT]] {{.*"}} "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: "-mcpu=gfx908"
+// CHECK-SAME: "-o" [[OPT_906XE_BC:".*-gfx908:xnack\+:sramecc\+.*bc"]]
+
+// CHECK: [[LLC]] [[OPT_906XE_BC]]
+// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-mcpu=gfx908"
+
+// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-target-cpu" "gfx908"
+// CHECK-SAME: {{.*}} "-target-feature" "+xnack"
+// CHECK-SAME: {{.*}} "-target-feature" "-sram-ecc"
+
+// CHECK: [[OPT]] {{.*}} "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: "-mcpu=gfx908"
+// CHECK-SAME: "-o" [[OPT_906XN_BC:".*-gfx908:xnack\+:sramecc\-.*bc"]]
+
+// CHECK: [[LLC]] [[OPT_906XN_BC]]
+// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-mcpu=gfx908"
+
+
+// CHECK: {{".*clang-offload-bundler"}}
+// CHECK-SAME: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa-gfx908,hip-amdgcn-amd-amdhsa-gfx908:xnack+:sramecc+,hip-amdgcn-amd-amdhsa-gfx908:xnack+:sramecc-"
Index: clang/test/Driver/target-id.cl
===
--- /dev/null
+++ clang/test/Driver/target-id.cl
@@ -0,0 +1,21 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -### -target amdgcn-amd-amdhsa \
+// RUN:   -mcpu=gfx908:xnack+:sramecc- \
+// RUN:   -nostdlib %s 2>&1 | FileCheck %s
+
+
+// RUN: %clang -### -target amdgcn-amd-amdpal \
+// RUN:   -mcpu=gfx908:xnack+:sramecc- \
+// RUN:   -nostdlib %s 2>&1 | FileCheck %s
+
+
+// RUN: %clang -### -target amdgcn--mesa3d \
+// RUN:   -mcpu=gfx908:xnack+:sramecc- \
+// RUN:   -nostdlib %s 2>&1 | FileCheck %s
+
+// CHECK: "-target-cpu" "gfx908"
+// CHECK-SAME: "-target-feature" "+xnack"
+// CHECK-SAME: "-target-feature" "-sram-ecc"
Index: clang/test/Driver/target-id-macros.hip
===
--- /dev/null
+++ clang/test/Driver/target-id-macros.hip
@@ -0,0 +1,12 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -E -dM -target x86_64-linux-gnu --cuda-device-only \
+// RUN:   --offload-arch=gfx908:xnack+:sramecc- -nogpulib -o - %s 2>&1 \
+// RUN:   | FileCheck %s
+
+// CHECK-DAG: #define __amdgcn_processor__ "gfx908"
+// CHECK-DAG: #define 

[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-05-26 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl marked an inline comment as done.
yaxunl added inline comments.



Comment at: clang/lib/Driver/ToolChains/HIP.cpp:121-123
+  auto Pos = SubArchName.find_first_of("+-");
+  if (Pos != SubArchName.npos)
+SubArchName = SubArchName.substr(0, Pos);

tra wrote:
> yaxunl wrote:
> > tra wrote:
> > > Parsing should probably be extracted into a separate function to avoid 
> > > replicating it all over the place.
> > > 
> > > I'd also propose use a different syntax for the properties.
> > > * use explicit character to separate individual elements. This way 
> > > splitting the properties becomes independent of what those properties 
> > > are. If you decide to make properties with values or change their meaning 
> > > some other way, it would not affect how you compose them.
> > > * use `name=value` or `name[+-]` for individual properties. This makes it 
> > > easy to parse individual properties and normalize their names. This makes 
> > > property map creation independent of the property values.
> > > 
> > > Right now `[+-]` serves as both a separator and as the value, which would 
> > > present problems if you ever need more flexible parametrization of 
> > > properties. What if a property must be a number or a string. Granted, you 
> > > can always encode them as a set of bools, but that's rather unpractical. 
> > > 
> > > E.g. something like this would work a bit better: 
> > > `gfx111:foo+:bar=33:buz=string`.
> > > 
> > I discussed this with our team.
> > 
> > The target id features are not raw GPU properties. They are distilled to 
> > become a few target features to decide what the compiler should do.
> > 
> > Each target feature has 3 values: on, off, and default, which are encoded 
> > as +feature, -feature, and not present.
> > 
> > For runtime, it is simple and clear how to choose device binaries based on 
> > the target features: it will try exact match, otherwise choose the default.
> > 
> > For compiler, it is also simple and clear what to do for each target 
> > feature value, since they corresponding to backend target features.
> > 
> > Basically we expect the target id feature to be like flags, not key/value 
> > pairs. In case we do need key/value pairs, they can still use + as 
> > delimiter.
> > 
> > Another reason we use +/- format is that it is more in line with the format 
> > of existing clang-offload-bundler id and target triple, which uses - as 
> > delimiter.
> > 
> > Since the target id is an extension of offload arch and users will put it 
> > into command line, we want to make it short, concise and aesthetically 
> > appealing, we would avoid using non-alpha-numeric characters in the target 
> > id features. Target triple components have similar requirements. Using : as 
> > delimiter seems unnecessary, longer, and more difficult to read.
> > 
> > Consider the following example
> > 
> > 
> > ```
> > clang -offload-id gfx908+xnack-sramecc a.hip
> > 
> > clang -offload-id gfx908:xnack+:sramecc- a.hip
> > ```
> > 
> > We are more inclined to keep the original format. 
> You're thinking in terms what's needed by AMDGPU *now*. The scheme you're 
> proposing is sufficient for your use case and I'm fine with that. I'm 
> suggesting that you should consider what happens once this change lands.
> 
> The functionality you're implementing is exposed to end-users via top-level 
> clang driver argument. This is visible to users and will be relied on.
> This will make it hard to change in the future without breaking someone. It's 
> worth making sure we're not painting ourselves in the corner here.
> 
> Also, the functionality may be useful/applicable beyond the scope of amdgpu 
> and the binary flags will not be sufficient for everyone. The scheme you're 
> proposing would be somewhat restrictive if I need to pass an integer value or 
> string. We could use something like `gfx123+foo=456-bar=789` but it would 
> look rather odd, IMO. 
> 
> Granted, none of the above is a showstopper. I guess we could support 
> multiple formats if it comes to that, but I'd rather not multiply things 
> later because we didn't think of them earlier.
> 
> > Another reason we use +/- format is that it is more in line with the format 
> > of existing clang-offload-bundler id and target triple, which uses - as 
> > delimiter.
> 
> The point was that commingling field separator and the field value is not the 
> cleanest approach, IMO. I'd be fine fine with some other character.
> 
> > Since the target id is an extension of offload arch and users will put it 
> > into command line, we want to make it short, concise and aesthetically 
> > appealing, we would avoid using non-alpha-numeric characters in the target 
> > id features. Target triple components have similar requirements. Using : as 
> > delimiter seems unnecessary, longer, and more difficult to read.
> 
> The current use of `gfxXXX` seems to fit the 'short, concise & aesthetically 
> pleasing' part of your argument much 

[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-05-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments.



Comment at: clang/lib/Driver/ToolChains/HIP.cpp:121-123
+  auto Pos = SubArchName.find_first_of("+-");
+  if (Pos != SubArchName.npos)
+SubArchName = SubArchName.substr(0, Pos);

yaxunl wrote:
> tra wrote:
> > Parsing should probably be extracted into a separate function to avoid 
> > replicating it all over the place.
> > 
> > I'd also propose use a different syntax for the properties.
> > * use explicit character to separate individual elements. This way 
> > splitting the properties becomes independent of what those properties are. 
> > If you decide to make properties with values or change their meaning some 
> > other way, it would not affect how you compose them.
> > * use `name=value` or `name[+-]` for individual properties. This makes it 
> > easy to parse individual properties and normalize their names. This makes 
> > property map creation independent of the property values.
> > 
> > Right now `[+-]` serves as both a separator and as the value, which would 
> > present problems if you ever need more flexible parametrization of 
> > properties. What if a property must be a number or a string. Granted, you 
> > can always encode them as a set of bools, but that's rather unpractical. 
> > 
> > E.g. something like this would work a bit better: 
> > `gfx111:foo+:bar=33:buz=string`.
> > 
> I discussed this with our team.
> 
> The target id features are not raw GPU properties. They are distilled to 
> become a few target features to decide what the compiler should do.
> 
> Each target feature has 3 values: on, off, and default, which are encoded as 
> +feature, -feature, and not present.
> 
> For runtime, it is simple and clear how to choose device binaries based on 
> the target features: it will try exact match, otherwise choose the default.
> 
> For compiler, it is also simple and clear what to do for each target feature 
> value, since they corresponding to backend target features.
> 
> Basically we expect the target id feature to be like flags, not key/value 
> pairs. In case we do need key/value pairs, they can still use + as delimiter.
> 
> Another reason we use +/- format is that it is more in line with the format 
> of existing clang-offload-bundler id and target triple, which uses - as 
> delimiter.
> 
> Since the target id is an extension of offload arch and users will put it 
> into command line, we want to make it short, concise and aesthetically 
> appealing, we would avoid using non-alpha-numeric characters in the target id 
> features. Target triple components have similar requirements. Using : as 
> delimiter seems unnecessary, longer, and more difficult to read.
> 
> Consider the following example
> 
> 
> ```
> clang -offload-id gfx908+xnack-sramecc a.hip
> 
> clang -offload-id gfx908:xnack+:sramecc- a.hip
> ```
> 
> We are more inclined to keep the original format. 
You're thinking in terms what's needed by AMDGPU *now*. The scheme you're 
proposing is sufficient for your use case and I'm fine with that. I'm 
suggesting that you should consider what happens once this change lands.

The functionality you're implementing is exposed to end-users via top-level 
clang driver argument. This is visible to users and will be relied on.
This will make it hard to change in the future without breaking someone. It's 
worth making sure we're not painting ourselves in the corner here.

Also, the functionality may be useful/applicable beyond the scope of amdgpu and 
the binary flags will not be sufficient for everyone. The scheme you're 
proposing would be somewhat restrictive if I need to pass an integer value or 
string. We could use something like `gfx123+foo=456-bar=789` but it would look 
rather odd, IMO. 

Granted, none of the above is a showstopper. I guess we could support multiple 
formats if it comes to that, but I'd rather not multiply things later because 
we didn't think of them earlier.

> Another reason we use +/- format is that it is more in line with the format 
> of existing clang-offload-bundler id and target triple, which uses - as 
> delimiter.

The point was that commingling field separator and the field value is not the 
cleanest approach, IMO. I'd be fine fine with some other character.

> Since the target id is an extension of offload arch and users will put it 
> into command line, we want to make it short, concise and aesthetically 
> appealing, we would avoid using non-alpha-numeric characters in the target id 
> features. Target triple components have similar requirements. Using : as 
> delimiter seems unnecessary, longer, and more difficult to read.

The current use of `gfxXXX` seems to fit the 'short, concise & aesthetically 
pleasing' part of your argument much better than the proposed scheme.

Is the end user allowed to specify an arbitrary set of the features? Or is the 
offload-id set restricted to a smaller number of combinations (i.e. tied to 
particular hardware variants). I vaguely recall that in 

[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-05-26 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 266315.
yaxunl marked an inline comment as done.
yaxunl added a comment.
Herald added subscribers: kerbowa, nhaehnle, jvesely.

Fixed passing target id to clang -cc1. Added predefined macros for target id.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/OffloadArch.h
  clang/lib/Basic/CMakeLists.txt
  clang/lib/Basic/OffloadArch.cpp
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/Basic/Targets/AMDGPU.h
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/AMDGPU.h
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/test/Driver/amdgpu-macros.cl
  clang/test/Driver/invalid-target-id.cl
  clang/test/Driver/invalid-target-id.hip
  clang/test/Driver/target-id-macros.cl
  clang/test/Driver/target-id-macros.hip
  clang/test/Driver/target-id.cl
  clang/test/Driver/target-id.hip

Index: clang/test/Driver/target-id.hip
===
--- /dev/null
+++ clang/test/Driver/target-id.hip
@@ -0,0 +1,54 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN:   -x hip --offload-arch=gfx908 \
+// RUN:   --offload-arch=gfx908+xnack+sramecc \
+// RUN:   --offload-arch=gfx908+xnack-sramecc \
+// RUN:   --hip-device-lib-path=%S/Inputs/hip_dev_lib \
+// RUN:   %s 2>&1 | FileCheck %s
+
+// CHECK: [[CLANG:".*clang.*"]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-target-cpu" "gfx908" "-fcuda-is-device"
+
+// CHECK: [[OPT:".*opt"]] {{".*-gfx908-linked.*bc"}} "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: "-mcpu=gfx908"
+// CHECK-SAME: "-o" [[OPT_906_BC:".*-gfx908-optimized.*bc"]]
+
+// CHECK: [[LLC: ".*llc"]] [[OPT_906_BC]]
+// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-mcpu=gfx908"
+// CHECK-SAME: "-o" {{".*-gfx908-.*o"}}
+
+// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-target-cpu" "gfx908"
+// CHECK-SAME: {{.*}} "-target-feature" "+xnack"
+// CHECK-SAME: {{.*}} "-target-feature" "+sram-ecc"
+
+// CHECK: [[OPT]] {{".*-gfx908\+xnack\+sramecc.*bc"}} "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: "-mcpu=gfx908"
+// CHECK-SAME: "-o" [[OPT_906XE_BC:".*-gfx908\+xnack\+sramecc.*bc"]]
+
+// CHECK: [[LLC]] [[OPT_906XE_BC]]
+// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-mcpu=gfx908"
+// CHECK-SAME: "-o" {{".*-gfx908\+xnack\+sramecc.*o"}}
+
+// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-target-cpu" "gfx908"
+// CHECK-SAME: {{.*}} "-target-feature" "+xnack"
+// CHECK-SAME: {{.*}} "-target-feature" "-sram-ecc"
+
+// CHECK: [[OPT]] {{".*-gfx908\+xnack-sramecc.*bc"}} "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: "-mcpu=gfx908"
+// CHECK-SAME: "-o" [[OPT_906XN_BC:".*-gfx908\+xnack-sramecc.*bc"]]
+
+// CHECK: [[LLC]] [[OPT_906XN_BC]]
+// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-mcpu=gfx908"
+// CHECK-SAME: "-o" {{".*-gfx908\+xnack-sramecc.*o"}}
+
+
+// CHECK: {{".*clang-offload-bundler"}}
+// CHECK-SAME: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa-gfx908,hip-amdgcn-amd-amdhsa-gfx908+xnack+sramecc,hip-amdgcn-amd-amdhsa-gfx908+xnack-sramecc"
Index: clang/test/Driver/target-id.cl
===
--- /dev/null
+++ clang/test/Driver/target-id.cl
@@ -0,0 +1,21 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -### -target amdgcn-amd-amdhsa \
+// RUN:   -mcpu=gfx908+xnack-sramecc \
+// RUN:   -nostdlib %s 2>&1 | FileCheck %s
+
+
+// RUN: %clang -### -target amdgcn-amd-amdpal \
+// RUN:   -mcpu=gfx908+xnack-sramecc \
+// RUN:   -nostdlib %s 2>&1 | FileCheck %s
+
+
+// RUN: %clang -### -target amdgcn--mesa3d \
+// RUN:   -mcpu=gfx908+xnack-sramecc \
+// RUN:   -nostdlib %s 2>&1 | FileCheck %s
+
+// CHECK: "-target-cpu" "gfx908"
+// CHECK-SAME: "-target-feature" "+xnack"
+// CHECK-SAME: "-target-feature" "-sram-ecc"
Index: clang/test/Driver/target-id-macros.hip
===
--- /dev/null
+++ clang/test/Driver/target-id-macros.hip
@@ -0,0 +1,12 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -E -dM -target x86_64-linux-gnu --cuda-device-only \
+// RUN:   --offload-arch=gfx908+xnack-sramecc -nogpulib -o - %s 2>&1 \
+// RUN:   | FileCheck %s
+
+// CHECK-DAG: #define __amdgcn_processor__ "gfx908"
+// CHECK-DAG: #define __amdgcn_xnack__ 1
+// CHECK-DAG: #define __amdgcn_sramecc__ 0
+// CHECK-DAG: #define __amdgcn_target_id__ "gfx908+xnack-sramecc"
Index: clang/test/Driver/target-id-macros.cl
===
--- /dev/null

[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-05-25 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl marked 3 inline comments as done.
yaxunl added inline comments.



Comment at: clang/lib/Basic/HIP.cpp:16
+const llvm::SmallVector
+getAllPossibleTargetIdFeatures(llvm::StringRef Device) {
+  llvm::SmallVector Ret;

tra wrote:
> Nit: there's an unfortunate clash with already [[ 
> https://github.com/llvm/llvm-project/blob/6a3469f58d0c230e86043f6975f048968334dfa4/clang/include/clang/Driver/CC1Options.td#L23
>  | target-feature ]] in clang & LLVM.
> 
> Would something like `GPUProperties` be a reasonable name?
We call it target id feature to differentiate it from target feature. A target 
id feature usually corresponds to a target feature although it may not 
necessarily true.

Since target id feature sounds too close to target feature, it is reasonable to 
give it a different name.

How about OffloadArchFeatures ? Since they are used as features of the extended 
-offload-arch option.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-05-25 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl marked 2 inline comments as done.
yaxunl added inline comments.



Comment at: clang/lib/Driver/ToolChains/HIP.cpp:121-123
+  auto Pos = SubArchName.find_first_of("+-");
+  if (Pos != SubArchName.npos)
+SubArchName = SubArchName.substr(0, Pos);

tra wrote:
> Parsing should probably be extracted into a separate function to avoid 
> replicating it all over the place.
> 
> I'd also propose use a different syntax for the properties.
> * use explicit character to separate individual elements. This way splitting 
> the properties becomes independent of what those properties are. If you 
> decide to make properties with values or change their meaning some other way, 
> it would not affect how you compose them.
> * use `name=value` or `name[+-]` for individual properties. This makes it 
> easy to parse individual properties and normalize their names. This makes 
> property map creation independent of the property values.
> 
> Right now `[+-]` serves as both a separator and as the value, which would 
> present problems if you ever need more flexible parametrization of 
> properties. What if a property must be a number or a string. Granted, you can 
> always encode them as a set of bools, but that's rather unpractical. 
> 
> E.g. something like this would work a bit better: 
> `gfx111:foo+:bar=33:buz=string`.
> 
I discussed this with our team.

The target id features are not raw GPU properties. They are distilled to become 
a few target features to decide what the compiler should do.

Each target feature has 3 values: on, off, and default, which are encoded as 
+feature, -feature, and not present.

For runtime, it is simple and clear how to choose device binaries based on the 
target features: it will try exact match, otherwise choose the default.

For compiler, it is also simple and clear what to do for each target feature 
value, since they corresponding to backend target features.

Basically we expect the target id feature to be like flags, not key/value 
pairs. In case we do need key/value pairs, they can still use + as delimiter.

Another reason we use +/- format is that it is more in line with the format of 
existing clang-offload-bundler id and target triple, which uses - as delimiter.

Since the target id is an extension of offload arch and users will put it into 
command line, we want to make it short, concise and aesthetically appealing, we 
would avoid using non-alpha-numeric characters in the target id features. 
Target triple components have similar requirements. Using : as delimiter seems 
unnecessary, longer, and more difficult to read.

Consider the following example


```
clang -offload-id gfx908+xnack-sramecc a.hip

clang -offload-id gfx908:xnack+:sramecc- a.hip
```

We are more inclined to keep the original format. 


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-05-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments.



Comment at: clang/lib/Basic/HIP.cpp:16
+const llvm::SmallVector
+getAllPossibleTargetIdFeatures(llvm::StringRef Device) {
+  llvm::SmallVector Ret;

Nit: there's an unfortunate clash with already [[ 
https://github.com/llvm/llvm-project/blob/6a3469f58d0c230e86043f6975f048968334dfa4/clang/include/clang/Driver/CC1Options.td#L23
 | target-feature ]] in clang & LLVM.

Would something like `GPUProperties` be a reasonable name?



Comment at: clang/lib/Driver/ToolChains/HIP.cpp:121-123
+  auto Pos = SubArchName.find_first_of("+-");
+  if (Pos != SubArchName.npos)
+SubArchName = SubArchName.substr(0, Pos);

Parsing should probably be extracted into a separate function to avoid 
replicating it all over the place.

I'd also propose use a different syntax for the properties.
* use explicit character to separate individual elements. This way splitting 
the properties becomes independent of what those properties are. If you decide 
to make properties with values or change their meaning some other way, it would 
not affect how you compose them.
* use `name=value` or `name[+-]` for individual properties. This makes it easy 
to parse individual properties and normalize their names. This makes property 
map creation independent of the property values.

Right now `[+-]` serves as both a separator and as the value, which would 
present problems if you ever need more flexible parametrization of properties. 
What if a property must be a number or a string. Granted, you can always encode 
them as a set of bools, but that's rather unpractical. 

E.g. something like this would work a bit better: 
`gfx111:foo+:bar=33:buz=string`.



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-05-19 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In D60620#1464633 , @tra wrote:

> It looks like you are solving two problems here.
>  a) you want to create multiple device passes for the same GPU, but with 
> different options.
>  b) you may want to pass different compiler options to different device 
> compilations.
>  The patch effectively hard-codes {gpu, options} tuple into 
> --offloading-target-id variant.
>  Is that correct?
>
> This looks essentially the same as your previous patch D59863 
> .
>
> We have a limited way to deal with (b), but there's currently no way to deal 
> with (a).
>
> For (a), I think, the real problem is that until now we've assumed that 
> there's only one device-side compilation per target GPU arch. If we need 
> multiple device-side compilations, we need a way to name them.  Using 
> `offloading-target-id` as  a super-set of `--cuda-gpu-arch` is OK with me. 
> However, I'm on the fence about the option serving a double-duty of setting 
> magic compiler flags. On one hand, that's what driver does, so it may be OK. 
> On the other hand, it's unnecessarily strict. I.e. if we provide ability to 
> create multiple compilation passes for the same GPU arch, why limit that to 
> only changing those hard-coded options? A general approach would allow a way 
> to create more than one device-side compilation and provide arbitrary 
> compiler options only to *that* compilation. Thiw will also help solving 
> number of issues we have right now when some host-side compilation options 
> break device-side compilation and we have to work around that by filtering 
> out some of them in the driver.


This patch is trying to solve the issue about GPU arch explosion due to 
combination of GPU configurations. A GPU may have several configurations which 
require different ISA's. From the compiler point of view, the GPU plus 
configuration behaves like different GPU archs. Previously we have been using 
different gfx names for the same GPU with different configurations. However, 
that does not scale. Therefore in this patch we extend GPU arch to `target id`, 
which is something like gpu+feature1-feature2.

The features allowed in target id are not arbitrary target features. They 
corresponding a limited number of GPU configurations that HIP runtime 
understands. Basically HIP runtime looks at the target id of the device objects 
in a fat binary and knows which one is best for the current GPU configuration. 
On the other hand, this is not some feature that can be easily implemented by 
users, since it needs knowledge about GPU configurations and corresponding 
compiler options for such configurations. Therefore, this is some feature 
better implemented within HIP compiler/runtime.

For embedding multiple device binaries for the same GPU but compiled with 
different options in one fat binary, since HIP runtime does not know which one 
to load, I don't think it is useful. On the other hand, users can always 
implement their own mechanisms for using device binaries compiled with 
different options with their own logic about how to choose them, therefore this 
is better left to the users.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60620: [HIP] Support target id by --offload-arch

2020-05-19 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 265025.
yaxunl retitled this revision from "[HIP] Support -offloading-target-id" to 
"[HIP] Support target id by --offload-arch".
yaxunl edited the summary of this revision.
yaxunl added a comment.

rebased the patch and revised by passing target id by `--offload-arch`.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/HIP.h
  clang/lib/Basic/CMakeLists.txt
  clang/lib/Basic/HIP.cpp
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/test/Driver/hip-invalid-target-id.hip
  clang/test/Driver/hip-target-id.hip

Index: clang/test/Driver/hip-target-id.hip
===
--- /dev/null
+++ clang/test/Driver/hip-target-id.hip
@@ -0,0 +1,56 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN:   -x hip --offload-arch=gfx908 \
+// RUN:   --offload-arch=gfx908+xnack+sramecc \
+// RUN:   --offload-arch=gfx908+xnack-sramecc \
+// RUN:   --hip-device-lib-path=%S/Inputs/hip_dev_lib \
+// RUN:   %s 2>&1 | FileCheck %s
+
+// CHECK: [[CLANG:".*clang.*"]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
+// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
+// CHECK-SAME: "-emit-llvm-bc"
+// CHECK-SAME: {{.*}} "-target-cpu" "gfx908"
+
+// CHECK: [[OPT:".*opt"]] {{".*-gfx908-linked.*bc"}} "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: "-mcpu=gfx908"
+// CHECK-SAME: "-o" [[OPT_906_BC:".*-gfx908-optimized.*bc"]]
+
+// CHECK: [[LLC: ".*llc"]] [[OPT_906_BC]]
+// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-mcpu=gfx908"
+// CHECK-SAME: "-o" {{".*-gfx908-.*o"}}
+
+// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
+// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
+// CHECK-SAME: "-emit-llvm-bc"
+// CHECK-SAME: {{.*}} "-target-cpu" "gfx908" "-mxnack" "-msram-ecc"
+
+// CHECK: [[OPT]] {{".*-gfx908\+xnack\+sramecc.*bc"}} "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: "-mcpu=gfx908"
+// CHECK-SAME: "-o" [[OPT_906XE_BC:".*-gfx908\+xnack\+sramecc.*bc"]]
+
+// CHECK: [[LLC]] [[OPT_906XE_BC]]
+// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-mcpu=gfx908"
+// CHECK-SAME: "-o" {{".*-gfx908\+xnack\+sramecc.*o"}}
+
+// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
+// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
+// CHECK-SAME: "-emit-llvm-bc"
+// CHECK-SAME: {{.*}} "-target-cpu" "gfx908" "-mxnack" "-mno-sram-ecc"
+
+// CHECK: [[OPT]] {{".*-gfx908\+xnack.*bc"}} "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: "-mcpu=gfx908"
+// CHECK-SAME: "-o" [[OPT_906XN_BC:".*-gfx908\+xnack.*bc"]]
+
+// CHECK: [[LLC]] [[OPT_906XN_BC]]
+// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa"
+// CHECK-SAME: {{.*}} "-mcpu=gfx908"
+// CHECK-SAME: "-o" {{".*-gfx908\+xnack.*o"}}
+
+
+// CHECK: {{".*clang-offload-bundler"}}
+// CHECK-SAME: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa-gfx908,hip-amdgcn-amd-amdhsa-gfx908+xnack+sramecc,hip-amdgcn-amd-amdhsa-gfx908+xnack-sramecc"
Index: clang/test/Driver/hip-invalid-target-id.hip
===
--- /dev/null
+++ clang/test/Driver/hip-invalid-target-id.hip
@@ -0,0 +1,48 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN:   -x hip --offload-arch=gfx908 \
+// RUN:   --offload-arch=gfx908xnack \
+// RUN:   --hip-device-lib-path=%S/Inputs/hip_dev_lib \
+// RUN:   %s 2>&1 | FileCheck -check-prefix=NOPLUS %s
+
+// NOPLUS: error: Unsupported CUDA gpu architecture
+
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN:   -x hip --offload-arch=gfx908 \
+// RUN:   --offload-arch=gfx908+sramecc+xnack \
+// RUN:   --hip-device-lib-path=%S/Inputs/hip_dev_lib \
+// RUN:   %s 2>&1 | FileCheck -check-prefix=ORDER %s
+
+// ORDER: error: Invalid HIP offloading target id: gfx908+sramecc+xnack
+
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN:   -x hip --offload-arch=gfx908 \
+// RUN:   --offload-arch=gfx908+unknown \
+// RUN:   --offload-arch=gfx908+sramecc+unknown \
+// RUN:   --offload-arch=gfx900+xnack \
+// RUN:   --hip-device-lib-path=%S/Inputs/hip_dev_lib \
+// RUN:   %s 2>&1 | FileCheck -check-prefix=UNK %s
+
+// UNK: error: Invalid HIP offloading target id: gfx908+unknown
+
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN:   -x hip --offload-arch=gfx908 \
+// RUN:   --offload-arch=gfx908+sramecc+unknown \
+// RUN:   --offload-arch=gfx900+xnack \
+// RUN:   --hip-device-lib-path=%S/Inputs/hip_dev_lib \
+// RUN:   %s 2>&1 | FileCheck -check-prefix=MIXED %s
+
+// MIXED: error: Invalid HIP offloading target id: gfx908+sramecc+unknown
+
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN:   -x hip --offload-arch=gfx908 \
+// RUN: