[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-09-08 Thread Francis Ricci via Phabricator via cfe-commits
fjricci added inline comments.



Comment at: lib/Driver/ToolChain.cpp:851
+  XOpenMPTargetArg->setBaseArg(A);
+  A = XOpenMPTargetArg.release();
+  DAL->append(A);

Hahnfeld wrote:
> This is a memory leak that is currently triggered in 
> `tests/Driver/openmp-offload-gpu.c` and found by ASan. How to fix this? I'm 
> not really familiar with OptTable...
Even with the follow-up patch to fix the memory leak, I'm still seeing this 
pointer leaked (on Darwin with ASan and detect_leaks=1). I've tried playing 
around with a few fixes myself, but haven't been able to get anything working.


https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-08-14 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added inline comments.



Comment at: lib/Driver/ToolChain.cpp:851
+  XOpenMPTargetArg->setBaseArg(A);
+  A = XOpenMPTargetArg.release();
+  DAL->append(A);

This is a memory leak that is currently triggered in 
`tests/Driver/openmp-offload-gpu.c` and found by ASan. How to fix this? I'm not 
really familiar with OptTable...


https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-08-07 Thread Gheorghe-Teod Bercea via cfe-commits
I am aware of the failure, I am attempting
to push a fix as we speak!Thanks for the patience.--DoruFrom:      
 Aleksey Shlyapnikov
via Phabricator To:      
 gheorghe-teod.ber...@ibm.com,
hfin...@anl.gov, hahnf...@itc.rwth-aachen.de, cber...@us.ibm.com, caom...@us.ibm.com,
a.bat...@hotmail.comCc:      
 aleks...@google.com,
guansong.zh...@amd.com, cfe-commits@lists.llvm.org, lil...@gmail.com, acja...@us.ibm.comDate:      
 08/07/2017 01:58 PMSubject:    
   [PATCH] D34784:
[OpenMP] Add flag for specifying the target device architecture for OpenMP
device offloadingalekseyshl added a comment.http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/7010is unhappy about this change, please fix.https://reviews.llvm.org/D34784

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-08-07 Thread Aleksey Shlyapnikov via Phabricator via cfe-commits
alekseyshl added a comment.

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/7010 is 
unhappy about this change, please fix.


https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-08-07 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 110007.
gtbercea added a comment.

Fix test comments.


https://reviews.llvm.org/D34784

Files:
  include/clang/Basic/DiagnosticDriverKinds.td
  include/clang/Driver/Options.td
  include/clang/Driver/ToolChain.h
  lib/Driver/Compilation.cpp
  lib/Driver/ToolChain.cpp
  lib/Driver/ToolChains/Cuda.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -597,3 +597,35 @@
 // RUN:   | FileCheck -check-prefix=CHK-FOPENMP-IS-DEVICE %s
 
 // CHK-FOPENMP-IS-DEVICE: clang{{.*}} "-aux-triple" "powerpc64le--linux" {{.*}}.c" "-fopenmp-is-device" "-fopenmp-host-ir-file-path"
+
+/// ###
+
+/// Check -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr7 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target=powerpc64le-ibm-linux-gnu -mcpu=pwr7 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-EQ-TARGET %s
+
+// CHK-FOPENMP-EQ-TARGET: clang{{.*}} "-target-cpu" "pwr7"
+
+/// ###
+
+/// Check -Xopenmp-target -march=pwr7 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -mcpu=pwr7 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET %s
+
+// CHK-FOPENMP-TARGET: clang{{.*}} "-target-cpu" "pwr7"
+
+/// ###
+
+/// Check -Xopenmp-target triggers error when multiple triples are used.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu,powerpc64le-unknown-linux-gnu -Xopenmp-target -mcpu=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-AMBIGUOUS-ERROR %s
+
+// CHK-FOPENMP-TARGET-AMBIGUOUS-ERROR: clang{{.*}} error: cannot deduce implicit triple value for -Xopenmp-target, specify triple using -Xopenmp-target=
+
+/// ###
+
+/// Check -Xopenmp-target triggers error when an option requiring arguments is passed to it.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -Xopenmp-target -mcpu=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-NESTED-ERROR %s
+
+// CHK-FOPENMP-TARGET-NESTED-ERROR: clang{{.*}} error: invalid -Xopenmp-target argument: '-Xopenmp-target -Xopenmp-target', options requiring arguments are unsupported
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -212,8 +212,18 @@
   static_cast(getToolChain());
   assert(TC.getTriple().isNVPTX() && "Wrong platform");
 
+  StringRef GPUArchName;
+  // If this is an OpenMP action we need to extract the device architecture
+  // from the -march=arch option. This option may come from -Xopenmp-target
+  // flag or the default value.
+  if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {
+GPUArchName = Args.getLastArgValue(options::OPT_march_EQ);
+assert(!GPUArchName.empty() && "Must have an architecture passed in.");
+  } else
+GPUArchName = JA.getOffloadingArch();
+
   // Obtain architecture from the action.
-  CudaArch gpu_arch = StringToCudaArch(JA.getOffloadingArch());
+  CudaArch gpu_arch = StringToCudaArch(GPUArchName);
   assert(gpu_arch != CudaArch::UNKNOWN &&
  "Device action expected to have an architecture.");
 
@@ -405,7 +415,7 @@
 
   // For OpenMP device offloading, append derived arguments. Make sure
   // flags are not duplicated.
-  // TODO: Append the compute capability.
+  // Also append the compute capability.
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
 for (Arg *A : Args){
   bool IsDuplicate = false;
@@ -418,6 +428,13 @@
   if (!IsDuplicate)
 DAL->append(A);
 }
+
+StringRef Arch = DAL->getLastArgValue(options::OPT_march_EQ);
+if (Arch.empty())
+  // Default compute capability for CUDA toolchain is sm_20.
+  DAL->AddJoinedArg(nullptr,
+  Opts.getOption(options::OPT_march_EQ), "sm_20");
+
 return DAL;
   }
 
Index: lib/Driver/ToolChain.cpp
===
--- lib/Driver/ToolChain.cpp
+++ lib/Driver/ToolChain.cpp
@@ -795,3 +795,65 @@
 
   return VersionTuple();
 }
+
+llvm::opt::DerivedArgList *
+ToolChain::TranslateOpenMPTargetArgs(const llvm::opt::DerivedArgList ,
+Action::OffloadKind DeviceOffloadKind) const {
+  if (DeviceOffloadKind == Action::OFK_OpenMP) {
+DerivedArgList *DAL = new DerivedArgList(Args.getBaseArgs());
+const OptTable  = getDriver().getOpts();
+
+// Handle -Xopenmp-target flags
+for (Arg 

[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-08-06 Thread Hal Finkel via Phabricator via cfe-commits
hfinkel accepted this revision.
hfinkel added a comment.
This revision is now accepted and ready to land.

LGTM. Thanks for all of your work on this!




Comment at: test/Driver/openmp-offload.c:603
+
+/// Check -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 is passed when 
compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp 
-fopenmp-targets=powerpc64le-ibm-linux-gnu 
-Xopenmp-target=powerpc64le-ibm-linux-gnu -mcpu=pwr7 %s 2>&1 \

Comment should say pwr7, not pwr8, to match the test.



Comment at: test/Driver/openmp-offload.c:611
+
+/// Check -Xopenmp-target -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp 
-fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -mcpu=pwr7 %s 2>&1 \

Comment should say pwr7, not pwr8, to match the test.


https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-08-06 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 109939.
gtbercea added a comment.

Fix -march special casing.


https://reviews.llvm.org/D34784

Files:
  include/clang/Basic/DiagnosticDriverKinds.td
  include/clang/Driver/Options.td
  include/clang/Driver/ToolChain.h
  lib/Driver/Compilation.cpp
  lib/Driver/ToolChain.cpp
  lib/Driver/ToolChains/Cuda.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -597,3 +597,35 @@
 // RUN:   | FileCheck -check-prefix=CHK-FOPENMP-IS-DEVICE %s
 
 // CHK-FOPENMP-IS-DEVICE: clang{{.*}} "-aux-triple" "powerpc64le--linux" {{.*}}.c" "-fopenmp-is-device" "-fopenmp-host-ir-file-path"
+
+/// ###
+
+/// Check -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target=powerpc64le-ibm-linux-gnu -mcpu=pwr7 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-EQ-TARGET %s
+
+// CHK-FOPENMP-EQ-TARGET: clang{{.*}} "-target-cpu" "pwr7"
+
+/// ###
+
+/// Check -Xopenmp-target -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -mcpu=pwr7 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET %s
+
+// CHK-FOPENMP-TARGET: clang{{.*}} "-target-cpu" "pwr7"
+
+/// ###
+
+/// Check -Xopenmp-target triggers error when multiple triples are used.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu,powerpc64le-unknown-linux-gnu -Xopenmp-target -mcpu=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-AMBIGUOUS-ERROR %s
+
+// CHK-FOPENMP-TARGET-AMBIGUOUS-ERROR: clang{{.*}} error: cannot deduce implicit triple value for -Xopenmp-target, specify triple using -Xopenmp-target=
+
+/// ###
+
+/// Check -Xopenmp-target triggers error when an option requiring arguments is passed to it.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -Xopenmp-target -mcpu=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-NESTED-ERROR %s
+
+// CHK-FOPENMP-TARGET-NESTED-ERROR: clang{{.*}} error: invalid -Xopenmp-target argument: '-Xopenmp-target -Xopenmp-target', options requiring arguments are unsupported
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -212,8 +212,18 @@
   static_cast(getToolChain());
   assert(TC.getTriple().isNVPTX() && "Wrong platform");
 
+  StringRef GPUArchName;
+  // If this is an OpenMP action we need to extract the device architecture
+  // from the -march=arch option. This option may come from -Xopenmp-target
+  // flag or the default value.
+  if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {
+GPUArchName = Args.getLastArgValue(options::OPT_march_EQ);
+assert(!GPUArchName.empty() && "Must have an architecture passed in.");
+  } else
+GPUArchName = JA.getOffloadingArch();
+
   // Obtain architecture from the action.
-  CudaArch gpu_arch = StringToCudaArch(JA.getOffloadingArch());
+  CudaArch gpu_arch = StringToCudaArch(GPUArchName);
   assert(gpu_arch != CudaArch::UNKNOWN &&
  "Device action expected to have an architecture.");
 
@@ -405,7 +415,7 @@
 
   // For OpenMP device offloading, append derived arguments. Make sure
   // flags are not duplicated.
-  // TODO: Append the compute capability.
+  // Also append the compute capability.
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
 for (Arg *A : Args){
   bool IsDuplicate = false;
@@ -418,6 +428,13 @@
   if (!IsDuplicate)
 DAL->append(A);
 }
+
+StringRef Arch = DAL->getLastArgValue(options::OPT_march_EQ);
+if (Arch.empty())
+  // Default compute capability for CUDA toolchain is sm_20.
+  DAL->AddJoinedArg(nullptr,
+  Opts.getOption(options::OPT_march_EQ), "sm_20");
+
 return DAL;
   }
 
Index: lib/Driver/ToolChain.cpp
===
--- lib/Driver/ToolChain.cpp
+++ lib/Driver/ToolChain.cpp
@@ -775,3 +775,65 @@
 
   return VersionTuple();
 }
+
+llvm::opt::DerivedArgList *
+ToolChain::TranslateOpenMPTargetArgs(const llvm::opt::DerivedArgList ,
+Action::OffloadKind DeviceOffloadKind) const {
+  if (DeviceOffloadKind == Action::OFK_OpenMP) {
+DerivedArgList *DAL = new DerivedArgList(Args.getBaseArgs());
+const OptTable  = getDriver().getOpts();
+
+// Handle -Xopenmp-target flags
+

[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-08-06 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added inline comments.



Comment at: lib/Driver/ToolChain.cpp:808
+  continue;
+  } else if (XOpenMPTargetNoTriple)
+// Passing device args: -Xopenmp-target -opt=val.

hfinkel wrote:
> Please include {} around this else-if  code, even though it is not necessary, 
> because the other blocks require it.
Done



Comment at: lib/Driver/ToolChain.cpp:820
+  if (!XOpenMPTargetArg || Index > Prev + 1) {
+getDriver().Diag(diag::err_drv_invalid_Xopenmp_target_with_args)
+<< A->getAsString(Args);

hfinkel wrote:
> Is this covered by a test case?
Done



Comment at: lib/Driver/ToolChain.cpp:827
+  options::OPT_fopenmp_targets_EQ).size() != 1) {
+getDriver().Diag(diag::err_drv_Xopenmp_target_missing_triple);
+continue;

hfinkel wrote:
> Is this covered by a test case?
Done



Comment at: test/Driver/openmp-offload.c:615
+
+// CHK-FOPENMP-TARGET: clang{{.*}} argument unused during compilation: 
'-Xopenmp-target -march=pwr8'

hfinkel wrote:
> Now that this is in common code, why are these arguments still unused?
Fixed.


https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-08-06 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 109938.
gtbercea added a comment.

Address comments.


https://reviews.llvm.org/D34784

Files:
  include/clang/Basic/DiagnosticDriverKinds.td
  include/clang/Driver/Options.td
  include/clang/Driver/ToolChain.h
  lib/Driver/Compilation.cpp
  lib/Driver/ToolChain.cpp
  lib/Driver/ToolChains/Cuda.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -597,3 +597,35 @@
 // RUN:   | FileCheck -check-prefix=CHK-FOPENMP-IS-DEVICE %s
 
 // CHK-FOPENMP-IS-DEVICE: clang{{.*}} "-aux-triple" "powerpc64le--linux" {{.*}}.c" "-fopenmp-is-device" "-fopenmp-host-ir-file-path"
+
+/// ###
+
+/// Check -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target=powerpc64le-ibm-linux-gnu -mcpu=pwr7 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-EQ-TARGET %s
+
+// CHK-FOPENMP-EQ-TARGET: clang{{.*}} "-target-cpu" "pwr7"
+
+/// ###
+
+/// Check -Xopenmp-target -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -mcpu=pwr7 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET %s
+
+// CHK-FOPENMP-TARGET: clang{{.*}} "-target-cpu" "pwr7"
+
+/// ###
+
+/// Check -Xopenmp-target triggers error when multiple triples are used.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu,powerpc64le-unknown-linux-gnu -Xopenmp-target -mcpu=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-AMBIGUOUS-ERROR %s
+
+// CHK-FOPENMP-TARGET-AMBIGUOUS-ERROR: clang{{.*}} error: cannot deduce implicit triple value for -Xopenmp-target, specify triple using -Xopenmp-target=
+
+/// ###
+
+/// Check -Xopenmp-target triggers error when an option requiring arguments is passed to it.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -Xopenmp-target -mcpu=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-NESTED-ERROR %s
+
+// CHK-FOPENMP-TARGET-NESTED-ERROR: clang{{.*}} error: invalid -Xopenmp-target argument: '-Xopenmp-target -Xopenmp-target', options requiring arguments are unsupported
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -212,8 +212,20 @@
   static_cast(getToolChain());
   assert(TC.getTriple().isNVPTX() && "Wrong platform");
 
+  StringRef GPUArchName;
+  std::vector GPUArchNames;
+  // If this is an OpenMP action we need to extract the device architecture from
+  // the -fopenmp-target-arch option.
+  if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {
+GPUArchNames = Args.getAllArgValues(options::OPT_march_EQ);
+assert(GPUArchNames.size() == 1 &&
+   "Exactly one GPU Arch required for ptxas.");
+GPUArchName = GPUArchNames[0];
+  } else
+GPUArchName = JA.getOffloadingArch();
+
   // Obtain architecture from the action.
-  CudaArch gpu_arch = StringToCudaArch(JA.getOffloadingArch());
+  CudaArch gpu_arch = StringToCudaArch(GPUArchName);
   assert(gpu_arch != CudaArch::UNKNOWN &&
  "Device action expected to have an architecture.");
 
@@ -405,7 +417,7 @@
 
   // For OpenMP device offloading, append derived arguments. Make sure
   // flags are not duplicated.
-  // TODO: Append the compute capability.
+  // Also append the compute capability.
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
 for (Arg *A : Args){
   bool IsDuplicate = false;
@@ -418,6 +430,15 @@
   if (!IsDuplicate)
 DAL->append(A);
 }
+
+auto MArchList = DAL->getAllArgValues(options::OPT_march_EQ);
+assert(MArchList.size() < 2 &&
+   "Too many archs under -Xopenmp-targets");
+if (MArchList.empty())
+  // Default compute capability for CUDA toolchain is sm_20.
+  DAL->AddJoinedArg(nullptr,
+  Opts.getOption(options::OPT_march_EQ), "sm_20");
+
 return DAL;
   }
 
Index: lib/Driver/ToolChain.cpp
===
--- lib/Driver/ToolChain.cpp
+++ lib/Driver/ToolChain.cpp
@@ -775,3 +775,69 @@
 
   return VersionTuple();
 }
+
+llvm::opt::DerivedArgList *
+ToolChain::TranslateOpenMPTargetArgs(const llvm::opt::DerivedArgList ,
+Action::OffloadKind DeviceOffloadKind) const {
+  if (DeviceOffloadKind == Action::OFK_OpenMP) {
+DerivedArgList *DAL = new 

[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-08-05 Thread Hal Finkel via Phabricator via cfe-commits
hfinkel added inline comments.



Comment at: lib/Driver/ToolChain.cpp:808
+  continue;
+  } else if (XOpenMPTargetNoTriple)
+// Passing device args: -Xopenmp-target -opt=val.

Please include {} around this else-if  code, even though it is not necessary, 
because the other blocks require it.



Comment at: lib/Driver/ToolChain.cpp:820
+  if (!XOpenMPTargetArg || Index > Prev + 1) {
+getDriver().Diag(diag::err_drv_invalid_Xopenmp_target_with_args)
+<< A->getAsString(Args);

Is this covered by a test case?



Comment at: lib/Driver/ToolChain.cpp:827
+  options::OPT_fopenmp_targets_EQ).size() != 1) {
+getDriver().Diag(diag::err_drv_Xopenmp_target_missing_triple);
+continue;

Is this covered by a test case?



Comment at: lib/Driver/ToolChain.cpp:834
+  // Ignore all but last -march=arch flag.
+  if (A->getOption().matches(options::OPT_march_EQ))
+DAL->eraseArg(options::OPT_march_EQ);

Why is `-march` special in this regard? Shouldn't the consumers just take the 
last one specified (e.g., use getLastArgValue in the ToolChain code)?



Comment at: test/Driver/openmp-offload.c:615
+
+// CHK-FOPENMP-TARGET: clang{{.*}} argument unused during compilation: 
'-Xopenmp-target -march=pwr8'

Now that this is in common code, why are these arguments still unused?


https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-08-05 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 109904.
gtbercea added a comment.

  Don't exclude flags when host matches offload toolchain.


https://reviews.llvm.org/D34784

Files:
  include/clang/Basic/DiagnosticDriverKinds.td
  include/clang/Driver/Options.td
  include/clang/Driver/ToolChain.h
  lib/Driver/Compilation.cpp
  lib/Driver/ToolChain.cpp
  lib/Driver/ToolChains/Cuda.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -597,3 +597,19 @@
 // RUN:   | FileCheck -check-prefix=CHK-FOPENMP-IS-DEVICE %s
 
 // CHK-FOPENMP-IS-DEVICE: clang{{.*}} "-aux-triple" "powerpc64le--linux" {{.*}}.c" "-fopenmp-is-device" "-fopenmp-host-ir-file-path"
+
+/// ###
+
+/// Check -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-EQ-TARGET %s
+
+// CHK-FOPENMP-EQ-TARGET: clang{{.*}} argument unused during compilation: '-Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8'
+
+/// ###
+
+/// Check -Xopenmp-target -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -march=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET %s
+
+// CHK-FOPENMP-TARGET: clang{{.*}} argument unused during compilation: '-Xopenmp-target -march=pwr8'
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -212,8 +212,20 @@
   static_cast(getToolChain());
   assert(TC.getTriple().isNVPTX() && "Wrong platform");
 
+  StringRef GPUArchName;
+  std::vector GPUArchNames;
+  // If this is an OpenMP action we need to extract the device architecture from
+  // the -fopenmp-target-arch option.
+  if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {
+GPUArchNames = Args.getAllArgValues(options::OPT_march_EQ);
+assert(GPUArchNames.size() == 1 &&
+   "Exactly one GPU Arch required for ptxas.");
+GPUArchName = GPUArchNames[0];
+  } else
+GPUArchName = JA.getOffloadingArch();
+
   // Obtain architecture from the action.
-  CudaArch gpu_arch = StringToCudaArch(JA.getOffloadingArch());
+  CudaArch gpu_arch = StringToCudaArch(GPUArchName);
   assert(gpu_arch != CudaArch::UNKNOWN &&
  "Device action expected to have an architecture.");
 
@@ -405,7 +417,7 @@
 
   // For OpenMP device offloading, append derived arguments. Make sure
   // flags are not duplicated.
-  // TODO: Append the compute capability.
+  // Also append the compute capability.
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
 for (Arg *A : Args){
   bool IsDuplicate = false;
@@ -418,6 +430,15 @@
   if (!IsDuplicate)
 DAL->append(A);
 }
+
+auto MArchList = DAL->getAllArgValues(options::OPT_march_EQ);
+assert(MArchList.size() < 2 &&
+   "Too many archs under -Xopenmp-targets");
+if (MArchList.empty())
+  // Default compute capability for CUDA toolchain is sm_20.
+  DAL->AddJoinedArg(nullptr,
+  Opts.getOption(options::OPT_march_EQ), "sm_20");
+
 return DAL;
   }
 
Index: lib/Driver/ToolChain.cpp
===
--- lib/Driver/ToolChain.cpp
+++ lib/Driver/ToolChain.cpp
@@ -775,3 +775,69 @@
 
   return VersionTuple();
 }
+
+llvm::opt::DerivedArgList *
+ToolChain::TranslateOpenMPTargetArgs(const llvm::opt::DerivedArgList ,
+Action::OffloadKind DeviceOffloadKind) const {
+  if (DeviceOffloadKind == Action::OFK_OpenMP) {
+DerivedArgList *DAL = new DerivedArgList(Args.getBaseArgs());
+const OptTable  = getDriver().getOpts();
+
+// Handle -Xopenmp-target flags
+for (Arg *A : Args) {
+  // Exclude flags which may only apply to the host toolchain.
+  // Do not exclude flags when the host triple (AuxTriple),
+  // matches the current toolchain triple.
+  if (A->getOption().matches(options::OPT_m_Group)) {
+if (getAuxTriple() && getAuxTriple()->str() == getTriple().str())
+  DAL->append(A);
+continue;
+  }
+
+  unsigned Index;
+  unsigned Prev;
+  bool XOpenMPTargetNoTriple = A->getOption().matches(
+  options::OPT_Xopenmp_target);
+
+  if (A->getOption().matches(options::OPT_Xopenmp_target_EQ)) {
+// Passing device args: -Xopenmp-target= -opt=val.
+if (A->getValue(0) == getTripleString())
+  Index = Args.getBaseArgs().MakeIndex(A->getValue(1));
+else
+  continue;
+  } 

[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-08-05 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 109903.
gtbercea added a comment.

  New way to handle OpenMP target flags.


https://reviews.llvm.org/D34784

Files:
  include/clang/Basic/DiagnosticDriverKinds.td
  include/clang/Driver/Options.td
  include/clang/Driver/ToolChain.h
  lib/Driver/Compilation.cpp
  lib/Driver/ToolChain.cpp
  lib/Driver/ToolChains/Cuda.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -597,3 +597,19 @@
 // RUN:   | FileCheck -check-prefix=CHK-FOPENMP-IS-DEVICE %s
 
 // CHK-FOPENMP-IS-DEVICE: clang{{.*}} "-aux-triple" "powerpc64le--linux" {{.*}}.c" "-fopenmp-is-device" "-fopenmp-host-ir-file-path"
+
+/// ###
+
+/// Check -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-EQ-TARGET %s
+
+// CHK-FOPENMP-EQ-TARGET: clang{{.*}} argument unused during compilation: '-Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8'
+
+/// ###
+
+/// Check -Xopenmp-target -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -march=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET %s
+
+// CHK-FOPENMP-TARGET: clang{{.*}} argument unused during compilation: '-Xopenmp-target -march=pwr8'
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -212,8 +212,20 @@
   static_cast(getToolChain());
   assert(TC.getTriple().isNVPTX() && "Wrong platform");
 
+  StringRef GPUArchName;
+  std::vector GPUArchNames;
+  // If this is an OpenMP action we need to extract the device architecture from
+  // the -fopenmp-target-arch option.
+  if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {
+GPUArchNames = Args.getAllArgValues(options::OPT_march_EQ);
+assert(GPUArchNames.size() == 1 &&
+   "Exactly one GPU Arch required for ptxas.");
+GPUArchName = GPUArchNames[0];
+  } else
+GPUArchName = JA.getOffloadingArch();
+
   // Obtain architecture from the action.
-  CudaArch gpu_arch = StringToCudaArch(JA.getOffloadingArch());
+  CudaArch gpu_arch = StringToCudaArch(GPUArchName);
   assert(gpu_arch != CudaArch::UNKNOWN &&
  "Device action expected to have an architecture.");
 
@@ -405,7 +417,7 @@
 
   // For OpenMP device offloading, append derived arguments. Make sure
   // flags are not duplicated.
-  // TODO: Append the compute capability.
+  // Also append the compute capability.
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
 for (Arg *A : Args){
   bool IsDuplicate = false;
@@ -418,6 +430,15 @@
   if (!IsDuplicate)
 DAL->append(A);
 }
+
+auto MArchList = DAL->getAllArgValues(options::OPT_march_EQ);
+assert(MArchList.size() < 2 &&
+   "Too many archs under -Xopenmp-targets");
+if (MArchList.empty())
+  // Default compute capability for CUDA toolchain is sm_20.
+  DAL->AddJoinedArg(nullptr,
+  Opts.getOption(options::OPT_march_EQ), "sm_20");
+
 return DAL;
   }
 
Index: lib/Driver/ToolChain.cpp
===
--- lib/Driver/ToolChain.cpp
+++ lib/Driver/ToolChain.cpp
@@ -775,3 +775,66 @@
 
   return VersionTuple();
 }
+
+llvm::opt::DerivedArgList *
+ToolChain::TranslateOpenMPTargetArgs(const llvm::opt::DerivedArgList ,
+Action::OffloadKind DeviceOffloadKind) const {
+  if (DeviceOffloadKind == Action::OFK_OpenMP) {
+DerivedArgList *DAL = new DerivedArgList(Args.getBaseArgs());
+const OptTable  = getDriver().getOpts();
+
+// Handle -Xopenmp-target flags
+for (Arg *A : Args) {
+  // Exclude flags which may only apply to the host toolchain.
+  // TODO: find a way to keep flags from this group if host
+  //   toolchain and this offload toolchain have the same triple.
+  if (A->getOption().matches(options::OPT_m_Group))
+continue;
+
+  unsigned Index;
+  unsigned Prev;
+  bool XOpenMPTargetNoTriple = A->getOption().matches(
+  options::OPT_Xopenmp_target);
+
+  if (A->getOption().matches(options::OPT_Xopenmp_target_EQ)) {
+// Passing device args: -Xopenmp-target= -opt=val.
+if (A->getValue(0) == getTripleString())
+  Index = Args.getBaseArgs().MakeIndex(A->getValue(1));
+else
+  continue;
+  } else if (XOpenMPTargetNoTriple)
+// Passing device args: -Xopenmp-target -opt=val.
+Index = 

[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-07-11 Thread Hal Finkel via Phabricator via cfe-commits
hfinkel added a comment.

This is much closer to what I had in mind, thanks! Now I think we're in a 
position to make this work for more than just the CUDA target. It looks like 
the added code is now:

1. Remove -march from the translated arguments (because any existing -march 
would apply only for the host compilation).
2. Process the -Xopenmp-target flags and add those arguments to the list.
3. If we don't have an -march in the translated arguments, then add 
-march=sm_20 so that there's a suitable default (noting that this default must 
be higher than the regular CUDA default).

I propose the following:

(1) is good, but should be more general. It is not just the host's -march that 
should not apply to any arbitrary toolchain, but any of the -m options. 
You should remove all options that are in the m_Group options group (which, as 
noted in Options.td, are "Target-dependent compilation options"). I believe 
that you can iterate over them all using something like:

  for (const Arg *A : Args.filtered(options::OPT_m_Group)) {

and that might help. This should be in toolchain-independent code, and I'd 
prefer that we always remove these options whenever the host and target 
toolchain differ, but leave them when they're the same.

(2) is good, but, along with (1), should be in toolchain-independent code. I 
recommend that we add a new member function to ToolChain, called, to make a 
specific suggestion, TranslateOpenMPTargetArgs, and put the logic from (1) and 
(2) in  this function. Then, we can augment Compilation::getArgsForToolChain to 
do something like the following:

  const DerivedArgList &
  Compilation::getArgsForToolChain(const ToolChain *TC, StringRef BoundArch,
   Action::OffloadKind DeviceOffloadKind) {
if (!TC)
  TC = 
  
DerivedArgList * = TCArgs[{TC, BoundArch, DeviceOffloadKind}];
if (!Entry) {
  // First, translate OpenMP toolchain arguments provided via the 
-Xopenmp-toolchain flags.
  Entry = TranslateOpenMPTargetArgs(*TranslatedArgs, BoundArch, 
DeviceOffloadKind);
  if (!Entry)
Entry = TranslatedArgs;
  
  Entry = TC->TranslateArgs(*Entry, BoundArch, DeviceOffloadKind);
  if (!Entry)
Entry = TranslatedArgs;
}
  
return *Entry;
  }

And then (3) we leave as it is (where it is).




Comment at: lib/Driver/ToolChains/Cuda.cpp:480
+if (MArchList.empty())
+  // Default compute capability for CUDA toolchain is sm_20.
+  DAL->AddJoinedArg(nullptr,

This default is only for OpenMP, right? Please explain in the comment why this 
is the default for OpenMP.


https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-07-10 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added a comment.

@hfinkel

I think I have something that works which is similar to what you were 
requesting. Please let me know your thoughts!

Thanks,
--Doru


https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-07-10 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 105932.
gtbercea added a comment.

Address comments.


https://reviews.llvm.org/D34784

Files:
  include/clang/Basic/DiagnosticDriverKinds.td
  include/clang/Driver/Options.td
  lib/Driver/ToolChains/Cuda.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -597,3 +597,19 @@
 // RUN:   | FileCheck -check-prefix=CHK-FOPENMP-IS-DEVICE %s
 
 // CHK-FOPENMP-IS-DEVICE: clang{{.*}} "-aux-triple" "powerpc64le--linux" {{.*}}.c" "-fopenmp-is-device" "-fopenmp-host-ir-file-path"
+
+/// ###
+
+/// Check -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-EQ-TARGET %s
+
+// CHK-FOPENMP-EQ-TARGET: clang{{.*}} argument unused during compilation: '-Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8'
+
+/// ###
+
+/// Check -Xopenmp-target -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -march=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET %s
+
+// CHK-FOPENMP-TARGET: clang{{.*}} argument unused during compilation: '-Xopenmp-target -march=pwr8'
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -212,8 +212,20 @@
   static_cast(getToolChain());
   assert(TC.getTriple().isNVPTX() && "Wrong platform");
 
+  StringRef GPUArchName;
+  std::vector GPUArchNames;
+  // If this is an OpenMP action we need to extract the device architecture from
+  // the -fopenmp-target-arch option.
+  if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {
+GPUArchNames = Args.getAllArgValues(options::OPT_march_EQ);
+assert(GPUArchNames.size() == 1 &&
+   "Exactly one GPU Arch required for ptxas.");
+GPUArchName = GPUArchNames[0];
+  } else
+GPUArchName = JA.getOffloadingArch();
+
   // Obtain architecture from the action.
-  CudaArch gpu_arch = StringToCudaArch(JA.getOffloadingArch());
+  CudaArch gpu_arch = StringToCudaArch(GPUArchName);
   assert(gpu_arch != CudaArch::UNKNOWN &&
  "Device action expected to have an architecture.");
 
@@ -405,7 +417,7 @@
 
   // For OpenMP device offloading, append derived arguments. Make sure
   // flags are not duplicated.
-  // TODO: Append the compute capability.
+  // Also append the compute capability.
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
 for (Arg *A : Args){
   bool IsDuplicate = false;
@@ -418,6 +430,57 @@
   if (!IsDuplicate)
 DAL->append(A);
 }
+
+// Do not propagate host arch to device.
+// Passing arch to device is done via -Xopenmp-target flag.
+DAL->eraseArg(options::OPT_march_EQ);
+
+for (Arg *A : Args) {
+  unsigned Index = -1;
+  unsigned Prev;
+  bool XOpenMPTargetNoTriple = A->getOption().matches(
+  options::OPT_Xopenmp_target);
+
+  if (A->getOption().matches(options::OPT_Xopenmp_target_EQ)) {
+if (A->getValue(0) == getTripleString())
+  Index = Args.getBaseArgs().MakeIndex(A->getValue(1));
+  } else if (XOpenMPTargetNoTriple)
+// Passing device args: -fopenmp-target -opt=val.
+Index = Args.getBaseArgs().MakeIndex(A->getValue(0));
+  else
+continue;
+
+  // Parse the argument to -Xopenmp-target.
+  Prev = Index;
+  std::unique_ptr XOpenMPTargetArg(Opts.ParseOneArg(Args, Index));
+  if (!XOpenMPTargetArg || Index > Prev + 1) {
+getDriver().Diag(diag::err_drv_invalid_Xopenmp_target_with_args)
+<< A->getAsString(Args);
+continue;
+  }
+  if (XOpenMPTargetNoTriple && XOpenMPTargetArg &&
+  Args.getAllArgValues(
+  options::OPT_fopenmp_targets_EQ).size() != 1) {
+getDriver().Diag(diag::err_drv_Xopenmp_target_missing_triple);
+continue;
+  }
+  XOpenMPTargetArg->setBaseArg(A);
+  A = XOpenMPTargetArg.release();
+
+  // Ignore all but last -march=arch flag.
+  if (A->getOption().matches(options::OPT_march_EQ))
+DAL->eraseArg(options::OPT_march_EQ);
+  DAL->append(A);
+}
+
+auto MArchList = DAL->getAllArgValues(options::OPT_march_EQ);
+assert(MArchList.size() < 2 &&
+   "Too many archs under -Xopenmp-targets");
+if (MArchList.empty())
+  // Default compute capability for CUDA toolchain is sm_20.
+  DAL->AddJoinedArg(nullptr,
+  Opts.getOption(options::OPT_march_EQ), 

[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-07-10 Thread Hal Finkel via Phabricator via cfe-commits
hfinkel added inline comments.



Comment at: lib/Driver/ToolChains/Cuda.cpp:474
+for (StringRef Opt : OptList) {
+  AddMArchOption(DAL, Opts, Opt);
+}

gtbercea wrote:
> hfinkel wrote:
> > gtbercea wrote:
> > > hfinkel wrote:
> > > > Shouldn't you be adding all of the options, not just the -march= ones?
> > > I thought that that would be the case but there are a few issues:
> > > 
> > > 1. PTXAS and NVLINK each use a different flag for specifying the arch, 
> > > and, in turn, each flag is different from -march.
> > > 
> > > 2. -Xopenmp-target passes a flag to the entire toolchain not to 
> > > individual components of the toolchain so a translation of flags is 
> > > required in some cases to adapt the flag to the actual tool. -march is 
> > > one example, I'm not sure if there are others.
> > > 
> > > 3. At this point in the code, in order to add a flag and its value to the 
> > > DAL list, I need to be able to specify the option type (i.e. 
> > > options::OPT_march_EQ). I therefore need to manually recognize the flag 
> > > in the string representing the value of -Xopenmp-target or 
> > > -Xopenmp-target=triple.
> > > 
> > > 4. This patch handles the passing of the arch and can be extended to pass 
> > > other flags (as is stands, no other flags are passed through to the CUDA 
> > > toolchain). This can be extended on a flag by flag basis for flags that 
> > > need translating to a particular tool's flag. If the flag doesn't need 
> > > translating then the flag and it's value can be appended to the command 
> > > line as they are.
> > > 1. PTXAS and NVLINK each use a different flag for specifying the arch, 
> > > and, in turn, each flag is different from -march.
> > 
> > I don't understand why this is relevant. Don't 
> > NVPTX::Assembler::ConstructJob and NVPTX::Linker::ConstructJob handle that 
> > in either case?
> > 
> > This seems to be the same comment to point 2 as well.
> > 
> > > 3. At this point in the code, in order to add a flag and its value to the 
> > > DAL list, I need to be able to specify the option type (i.e. 
> > > options::OPT_march_EQ). I therefore need to manually recognize the flag 
> > > in the string representing the value of -Xopenmp-target or 
> > > -Xopenmp-target=triple.
> > 
> > I don't understand why this is true. Doesn't the code just below this, 
> > which handles -Xarch, do the general thing (it calls Opts.ParseOneArg and 
> > then adds it to the list of derived arguments)? Can't we handle this like 
> > -Xarch?
> > 
> > > This patch handles the passing of the arch and can be extended to pass 
> > > other flags (as is stands, no other flags are passed through to the CUDA 
> > > toolchain). This can be extended on a flag by flag basis for flags that 
> > > need translating to a particular tool's flag. If the flag doesn't need 
> > > translating then the flag and it's value can be appended to the command 
> > > line as they are.
> > 
> > I don't understand this either. If we need to extend this on a flag-by-flag 
> > basis, then it seems fundamentally broken. How could we append a flag to 
> > the command line without it also affecting the host compile?
> @hfinkel 
> 
> The problem is that when using -Xopenmp-target= -opt=val the value of 
> this flag is a list of two strings:
> 
> ['', '-opt=val']
> 
> What needs to happen next is to parse the string containing "-opt=val". The 
> reason I have to do this is because if I use -march, I can't pass -march as 
> is to PTXAS and NVLINK which have different flags for expressing the arch. I 
> need to translate the -march=sm_60 flag. I will have to do this for all flags 
> which require translation. There is no way I can just append this string to 
> the PTXAS and NVLINK commands because the flags for the 2 tools are 
> different. A flag which works for one of them, will not work for the other. 
> 
> So I need to actually parse that value to check whether it is a "-march" and 
> create an Arg object with the OPT_march_EQ identifier and the sm_60 value. 
> When invoking the commands for PTXAS and NVLINK, the dervied arg list will be 
> travered and every -march=sm_60 option will be transformed into "--gpu-name" 
> "sm_60" for PTXAS and into "-arch" "sm_60" for NVLINK. 
> 
> In the case of -Xarch, you will see that after we have traversed the entire 
> arg list we still have to special case -march and add it is manually added to 
> the DAL.
> 
> Let me know your thoughts on this.
> 
> Thanks,
> 
> --Doru
> What needs to happen next is to parse the string containing "-opt=val". 

Yes, that's what ParseOneArg will do.

> The reason I have to do this is because if I use -march, I can't pass -march 
> as is to PTXAS and NVLINK which have different flags for expressing the arch. 
> I need to translate the -march=sm_60 flag. I will have to do this for all 
> flags which require translation. There is no way I can just append this 
> string to the PTXAS and NVLINK commands because the flags 

[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-07-10 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added inline comments.



Comment at: lib/Driver/ToolChains/Cuda.cpp:474
+for (StringRef Opt : OptList) {
+  AddMArchOption(DAL, Opts, Opt);
+}

hfinkel wrote:
> gtbercea wrote:
> > hfinkel wrote:
> > > Shouldn't you be adding all of the options, not just the -march= ones?
> > I thought that that would be the case but there are a few issues:
> > 
> > 1. PTXAS and NVLINK each use a different flag for specifying the arch, and, 
> > in turn, each flag is different from -march.
> > 
> > 2. -Xopenmp-target passes a flag to the entire toolchain not to individual 
> > components of the toolchain so a translation of flags is required in some 
> > cases to adapt the flag to the actual tool. -march is one example, I'm not 
> > sure if there are others.
> > 
> > 3. At this point in the code, in order to add a flag and its value to the 
> > DAL list, I need to be able to specify the option type (i.e. 
> > options::OPT_march_EQ). I therefore need to manually recognize the flag in 
> > the string representing the value of -Xopenmp-target or 
> > -Xopenmp-target=triple.
> > 
> > 4. This patch handles the passing of the arch and can be extended to pass 
> > other flags (as is stands, no other flags are passed through to the CUDA 
> > toolchain). This can be extended on a flag by flag basis for flags that 
> > need translating to a particular tool's flag. If the flag doesn't need 
> > translating then the flag and it's value can be appended to the command 
> > line as they are.
> > 1. PTXAS and NVLINK each use a different flag for specifying the arch, and, 
> > in turn, each flag is different from -march.
> 
> I don't understand why this is relevant. Don't NVPTX::Assembler::ConstructJob 
> and NVPTX::Linker::ConstructJob handle that in either case?
> 
> This seems to be the same comment to point 2 as well.
> 
> > 3. At this point in the code, in order to add a flag and its value to the 
> > DAL list, I need to be able to specify the option type (i.e. 
> > options::OPT_march_EQ). I therefore need to manually recognize the flag in 
> > the string representing the value of -Xopenmp-target or 
> > -Xopenmp-target=triple.
> 
> I don't understand why this is true. Doesn't the code just below this, which 
> handles -Xarch, do the general thing (it calls Opts.ParseOneArg and then adds 
> it to the list of derived arguments)? Can't we handle this like -Xarch?
> 
> > This patch handles the passing of the arch and can be extended to pass 
> > other flags (as is stands, no other flags are passed through to the CUDA 
> > toolchain). This can be extended on a flag by flag basis for flags that 
> > need translating to a particular tool's flag. If the flag doesn't need 
> > translating then the flag and it's value can be appended to the command 
> > line as they are.
> 
> I don't understand this either. If we need to extend this on a flag-by-flag 
> basis, then it seems fundamentally broken. How could we append a flag to the 
> command line without it also affecting the host compile?
@hfinkel 

The problem is that when using -Xopenmp-target= -opt=val the value of 
this flag is a list of two strings:

['', '-opt=val']

What needs to happen next is to parse the string containing "-opt=val". The 
reason I have to do this is because if I use -march, I can't pass -march as is 
to PTXAS and NVLINK which have different flags for expressing the arch. I need 
to translate the -march=sm_60 flag. I will have to do this for all flags which 
require translation. There is no way I can just append this string to the PTXAS 
and NVLINK commands because the flags for the 2 tools are different. A flag 
which works for one of them, will not work for the other. 

So I need to actually parse that value to check whether it is a "-march" and 
create an Arg object with the OPT_march_EQ identifier and the sm_60 value. When 
invoking the commands for PTXAS and NVLINK, the dervied arg list will be 
travered and every -march=sm_60 option will be transformed into "--gpu-name" 
"sm_60" for PTXAS and into "-arch" "sm_60" for NVLINK. 

In the case of -Xarch, you will see that after we have traversed the entire arg 
list we still have to special case -march and add it is manually added to the 
DAL.

Let me know your thoughts on this.

Thanks,

--Doru


https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-07-06 Thread Hal Finkel via Phabricator via cfe-commits
hfinkel added inline comments.



Comment at: lib/Driver/ToolChains/Cuda.cpp:474
+for (StringRef Opt : OptList) {
+  AddMArchOption(DAL, Opts, Opt);
+}

gtbercea wrote:
> hfinkel wrote:
> > Shouldn't you be adding all of the options, not just the -march= ones?
> I thought that that would be the case but there are a few issues:
> 
> 1. PTXAS and NVLINK each use a different flag for specifying the arch, and, 
> in turn, each flag is different from -march.
> 
> 2. -Xopenmp-target passes a flag to the entire toolchain not to individual 
> components of the toolchain so a translation of flags is required in some 
> cases to adapt the flag to the actual tool. -march is one example, I'm not 
> sure if there are others.
> 
> 3. At this point in the code, in order to add a flag and its value to the DAL 
> list, I need to be able to specify the option type (i.e. 
> options::OPT_march_EQ). I therefore need to manually recognize the flag in 
> the string representing the value of -Xopenmp-target or 
> -Xopenmp-target=triple.
> 
> 4. This patch handles the passing of the arch and can be extended to pass 
> other flags (as is stands, no other flags are passed through to the CUDA 
> toolchain). This can be extended on a flag by flag basis for flags that need 
> translating to a particular tool's flag. If the flag doesn't need translating 
> then the flag and it's value can be appended to the command line as they are.
> 1. PTXAS and NVLINK each use a different flag for specifying the arch, and, 
> in turn, each flag is different from -march.

I don't understand why this is relevant. Don't NVPTX::Assembler::ConstructJob 
and NVPTX::Linker::ConstructJob handle that in either case?

This seems to be the same comment to point 2 as well.

> 3. At this point in the code, in order to add a flag and its value to the DAL 
> list, I need to be able to specify the option type (i.e. 
> options::OPT_march_EQ). I therefore need to manually recognize the flag in 
> the string representing the value of -Xopenmp-target or 
> -Xopenmp-target=triple.

I don't understand why this is true. Doesn't the code just below this, which 
handles -Xarch, do the general thing (it calls Opts.ParseOneArg and then adds 
it to the list of derived arguments)? Can't we handle this like -Xarch?

> This patch handles the passing of the arch and can be extended to pass other 
> flags (as is stands, no other flags are passed through to the CUDA 
> toolchain). This can be extended on a flag by flag basis for flags that need 
> translating to a particular tool's flag. If the flag doesn't need translating 
> then the flag and it's value can be appended to the command line as they are.

I don't understand this either. If we need to extend this on a flag-by-flag 
basis, then it seems fundamentally broken. How could we append a flag to the 
command line without it also affecting the host compile?



Comment at: test/Driver/openmp-offload.c:607
+
+// CHK-FOPENMP-EQ-TARGET: clang{{.*}} argument unused during compilation: 
'-Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8'
+

gtbercea wrote:
> hfinkel wrote:
> > I don't see why you'd check that the arguments are unused. They should be 
> > used. One exception might be that you might want to force 
> > -Xopenmp-target=foo to be unused if foo is not a currently-targeted 
> > offloading triple. There could be a separate test case for that.
> > 
> > Otherwise, I think you should be able to check the relevant backend 
> > commands, no? (something like where CHK-COMMANDS is used above in this 
> > file).
> Only the CUDA toolchain currently contains code which considers the value of 
> the -Xopenmp-target flag. The CUDA toolchain is not capable of offloading 
> until the next patch lands so any test for how the flag propagates to the 
> CUDA toolchain will have to wait.
> 
> Passing a flag to some other toolchain again doesn't work because the other 
> toolchains have not been instructed to look at this flag so they won't 
> contain the passed flag in their respective command lines.
> 
> For a lack of a better test, what I wanted to show is that the usage of this 
> flag doesn't throw an error such as unknown flag and is correctly recognized: 
> "-Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8". 
> 
> 
> 
> Passing a flag to some other toolchain again doesn't work because the other 
> toolchains have not been instructed to look at this flag so they won't 
> contain the passed flag in their respective command lines.

I think, however, that we need to refactor this so that it works for all 
toolchains. If you convince me otherwise, then this will be fine as well :-)



https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-07-05 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added inline comments.



Comment at: lib/Driver/ToolChains/Cuda.cpp:478
+auto MArchList = DAL->getAllArgValues(options::OPT_march_EQ);
+assert(MArchList.size() < 2 && "At most one GPU arch allowed.");
+if (MArchList.empty())

hfinkel wrote:
> Can a user hit this? If so, it must be an actual diagnostic.
A user cannot hit this now, -Xopenmp-target does not lead to duplicate -march 
flags in DAL anymore.


https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-07-05 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 105355.
gtbercea added a comment.

Address Comments.


https://reviews.llvm.org/D34784

Files:
  include/clang/Basic/DiagnosticDriverKinds.td
  include/clang/Driver/Options.td
  lib/Driver/ToolChains/Cuda.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -597,3 +597,19 @@
 // RUN:   | FileCheck -check-prefix=CHK-FOPENMP-IS-DEVICE %s
 
 // CHK-FOPENMP-IS-DEVICE: clang{{.*}} "-aux-triple" "powerpc64le--linux" {{.*}}.c" "-fopenmp-is-device" "-fopenmp-host-ir-file-path"
+
+/// ###
+
+/// Check -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-EQ-TARGET %s
+
+// CHK-FOPENMP-EQ-TARGET: clang{{.*}} argument unused during compilation: '-Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8'
+
+/// ###
+
+/// Check -Xopenmp-target -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -march=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET %s
+
+// CHK-FOPENMP-TARGET: clang{{.*}} argument unused during compilation: '-Xopenmp-target -march=pwr8'
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -212,8 +212,20 @@
   static_cast(getToolChain());
   assert(TC.getTriple().isNVPTX() && "Wrong platform");
 
+  StringRef GPUArchName;
+  std::vector GPUArchNames;
+  // If this is an OpenMP action we need to extract the device architecture from
+  // the -fopenmp-target-arch option.
+  if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {
+GPUArchNames = Args.getAllArgValues(options::OPT_march_EQ);
+assert(GPUArchNames.size() == 1 &&
+   "Exactly one GPU Arch required for ptxas.");
+GPUArchName = GPUArchNames[0];
+  } else
+GPUArchName = JA.getOffloadingArch();
+
   // Obtain architecture from the action.
-  CudaArch gpu_arch = StringToCudaArch(JA.getOffloadingArch());
+  CudaArch gpu_arch = StringToCudaArch(GPUArchName);
   assert(gpu_arch != CudaArch::UNKNOWN &&
  "Device action expected to have an architecture.");
 
@@ -392,6 +404,22 @@
   CudaInstallation.AddCudaIncludeArgs(DriverArgs, CC1Args);
 }
 
+void AddMArchOption(DerivedArgList *DAL,
+const OptTable ,
+StringRef Opt) {
+  if (Opt.startswith("-march=")) {
+StringRef Arch = Opt.split("=").second;
+// Check if the arch provided is valid for this toolchain.
+// If not valid, ignore it.
+if (StringToCudaArch(Arch) != CudaArch::UNKNOWN) {
+  DAL->eraseArg(options::OPT_march_EQ);
+  DAL->AddJoinedArg(nullptr,
+  Opts.getOption(options::OPT_march_EQ),
+  Arch.str());
+}
+  }
+}
+
 llvm::opt::DerivedArgList *
 CudaToolChain::TranslateArgs(const llvm::opt::DerivedArgList ,
  StringRef BoundArch,
@@ -405,7 +433,7 @@
 
   // For OpenMP device offloading, append derived arguments. Make sure
   // flags are not duplicated.
-  // TODO: Append the compute capability.
+  // Also append the compute capability.
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
 for (Arg *A : Args){
   bool IsDuplicate = false;
@@ -418,6 +446,49 @@
   if (!IsDuplicate)
 DAL->append(A);
 }
+
+// Get the compute capability from the -Xopenmp-target flag.
+auto OptList = Args.getAllArgValues(options::OPT_Xopenmp_target_EQ);
+
+// For each OPT_Xopenmp_target_EQ option, the function returns
+// two strings, the triple and the option.
+// The following format is assumed:
+//
+// -Xopenmp-target=nvptx64-nvidia-cuda -opt=val
+for (unsigned i = 0; i < OptList.size(); i+=2) {
+  StringRef Opt = OptList[i+1];
+  if (OptList[i] == getTripleString())
+AddMArchOption(DAL, Opts, Opt);
+}
+
+OptList = Args.getAllArgValues(options::OPT_Xopenmp_target);
+// When there is only one option in the list, the following format
+// is assumed:
+//
+// -Xopenmp-target -opt=val
+
+// By default, if no triple is explicitely specified, we
+// associate -opt=val with the toolchain specified under the
+// -fopenmp-targets flag (provided that there is only one such
+// toolchain specified).
+if (!OptList.empty() &&
+Args.getAllArgValues(
+options::OPT_fopenmp_targets_EQ).size() != 1)
+  

[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-07-05 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 105354.
gtbercea added a comment.

Address comments.


https://reviews.llvm.org/D34784

Files:
  include/clang/Basic/DiagnosticDriverKinds.td
  include/clang/Driver/Options.td
  lib/Driver/Driver.cpp
  lib/Driver/ToolChains/Clang.cpp
  lib/Driver/ToolChains/CommonArgs.cpp
  lib/Driver/ToolChains/CommonArgs.h
  lib/Driver/ToolChains/Cuda.cpp
  lib/Driver/ToolChains/Cuda.h
  lib/Driver/ToolChains/Gnu.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -597,3 +597,76 @@
 // RUN:   | FileCheck -check-prefix=CHK-FOPENMP-IS-DEVICE %s
 
 // CHK-FOPENMP-IS-DEVICE: clang{{.*}} "-aux-triple" "powerpc64le--linux" {{.*}}.c" "-fopenmp-is-device" "-fopenmp-host-ir-file-path"
+
+/// ###
+
+/// Check -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-EQ-TARGET %s
+
+// CHK-FOPENMP-EQ-TARGET: clang{{.*}} argument unused during compilation: '-Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8'
+
+/// ###
+
+/// Check -Xopenmp-target -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -march=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET %s
+
+// CHK-FOPENMP-TARGET: clang{{.*}} argument unused during compilation: '-Xopenmp-target -march=pwr8'
+
+/// ###
+
+/// Check -Xopenmp-target triggers error when multiple triples are used.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu,nvptx64-nvidia-cuda -Xopenmp-target -march=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-ERROR %s
+
+// CHK-FOPENMP-TARGET-ERROR: clang{{.*}} error: cannot deduce implicit triple value for -Xopenmp-target, specify triple using -Xopenmp-target=
+
+/// ###
+
+/// Check -Xopenmp-target uses one of the archs provided when several archs are used.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_35 -Xopenmp-target -march=sm_60 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-ARCHS %s
+
+// CHK-FOPENMP-TARGET-ARCHS: ptxas{{.*}}" "--gpu-name" "sm_60"
+// CHK-FOPENMP-TARGET-ARCHS: nvlink{{.*}}" "-arch" "sm_60"
+
+/// ###
+
+/// Check -Xopenmp-target -march=pwr8 is ignored in favor of sm_20 (default).
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu,nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-WARNING %s
+
+// CHK-FOPENMP-TARGET-WARNING: ptxas{{.*}}" "--gpu-name" "sm_20"
+// CHK-FOPENMP-TARGET-WARNING: nvlink{{.*}}" "-arch" "sm_20"
+
+/// ###
+
+/// Check -Xopenmp-target -march=sm_35 works as expected when two triples are present.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu,nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_35 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-COMPILATION %s
+
+// CHK-FOPENMP-TARGET-COMPILATION: ptxas{{.*}}" "--gpu-name" "sm_35"
+// CHK-FOPENMP-TARGET-COMPILATION: nvlink{{.*}}" "-arch" "sm_35"
+
+/// ###
+
+/// Check cubin file generation and usage by nvlink
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-CUBIN %s
+
+// CHK-CUBIN: clang{{.*}}" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda.s"
+// CHK-CUBIN-NEXT: ptxas{{.*}}" "--output-file" "{{.*}}-openmp-nvptx64-nvidia-cuda.cubin" "{{.*}}-openmp-nvptx64-nvidia-cuda.s"
+// CHK-CUBIN-NEXT: nvlink" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda" {{.*}} "openmp-offload-openmp-nvptx64-nvidia-cuda.cubin"
+
+/// ###
+
+/// Check cubin file generation and usage by nvlink
+// RUN:   touch %t1.o
+// RUN:   touch %t2.o
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %t1.o %t2.o 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-TWOCUBIN %s
+
+// CHK-TWOCUBIN: clang-offload-bundler" "-type=o" 

[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-07-05 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added inline comments.



Comment at: test/Driver/openmp-offload.c:607
+
+// CHK-FOPENMP-EQ-TARGET: clang{{.*}} argument unused during compilation: 
'-Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8'
+

hfinkel wrote:
> I don't see why you'd check that the arguments are unused. They should be 
> used. One exception might be that you might want to force -Xopenmp-target=foo 
> to be unused if foo is not a currently-targeted offloading triple. There 
> could be a separate test case for that.
> 
> Otherwise, I think you should be able to check the relevant backend commands, 
> no? (something like where CHK-COMMANDS is used above in this file).
Only the CUDA toolchain currently contains code which considers the value of 
the -Xopenmp-target flag. The CUDA toolchain is not capable of offloading until 
the next patch lands so any test for how the flag propagates to the CUDA 
toolchain will have to wait.

Passing a flag to some other toolchain again doesn't work because the other 
toolchains have not been instructed to look at this flag so they won't contain 
the passed flag in their respective command lines.

For a lack of a better test, what I wanted to show is that the usage of this 
flag doesn't throw an error such as unknown flag and is correctly recognized: 
"-Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8". 





https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-07-05 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 105280.
gtbercea marked 3 inline comments as done.
gtbercea added a comment.

Address comments.


https://reviews.llvm.org/D34784

Files:
  include/clang/Basic/DiagnosticDriverKinds.td
  include/clang/Driver/Options.td
  lib/Driver/ToolChains/Cuda.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -597,3 +597,19 @@
 // RUN:   | FileCheck -check-prefix=CHK-FOPENMP-IS-DEVICE %s
 
 // CHK-FOPENMP-IS-DEVICE: clang{{.*}} "-aux-triple" "powerpc64le-unknown-linux-gnu" {{.*}}.c" "-fopenmp-is-device" "-fopenmp-host-ir-file-path"
+
+/// ###
+
+/// Check -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-EQ-TARGET %s
+
+// CHK-FOPENMP-EQ-TARGET: clang{{.*}} argument unused during compilation: '-Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8'
+
+/// ###
+
+/// Check -Xopenmp-target -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -march=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET %s
+
+// CHK-FOPENMP-TARGET: clang{{.*}} argument unused during compilation: '-Xopenmp-target -march=pwr8'
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -212,8 +212,20 @@
   static_cast(getToolChain());
   assert(TC.getTriple().isNVPTX() && "Wrong platform");
 
+  StringRef GPUArchName;
+  std::vector GPUArchNames;
+  // If this is an OpenMP action we need to extract the device architecture from
+  // the -fopenmp-target-arch option.
+  if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {
+GPUArchNames = Args.getAllArgValues(options::OPT_march_EQ);
+assert(GPUArchNames.size() == 1 &&
+   "Exactly one GPU Arch required for ptxas.");
+GPUArchName = GPUArchNames[0];
+  } else
+GPUArchName = JA.getOffloadingArch();
+
   // Obtain architecture from the action.
-  CudaArch gpu_arch = StringToCudaArch(JA.getOffloadingArch());
+  CudaArch gpu_arch = StringToCudaArch(GPUArchName);
   assert(gpu_arch != CudaArch::UNKNOWN &&
  "Device action expected to have an architecture.");
 
@@ -392,6 +404,15 @@
   CudaInstallation.AddCudaIncludeArgs(DriverArgs, CC1Args);
 }
 
+void AddMArchOption(DerivedArgList *DAL,
+const OptTable ,
+StringRef Opt) {
+  if (Opt.startswith("-march="))
+DAL->AddJoinedArg(nullptr,
+Opts.getOption(options::OPT_march_EQ),
+Opt.split("=").second);
+}
+
 llvm::opt::DerivedArgList *
 CudaToolChain::TranslateArgs(const llvm::opt::DerivedArgList ,
  StringRef BoundArch,
@@ -405,7 +426,7 @@
 
   // For OpenMP device offloading, append derived arguments. Make sure
   // flags are not duplicated.
-  // TODO: Append the compute capability.
+  // Also append the compute capability.
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
 for (Arg *A : Args){
   bool IsDuplicate = false;
@@ -418,6 +439,48 @@
   if (!IsDuplicate)
 DAL->append(A);
 }
+
+// Get the compute capability from the -Xopenmp-target flag.
+auto OptList = Args.getAllArgValues(options::OPT_Xopenmp_target_EQ);
+
+// For each OPT_Xopenmp_target_EQ option, the function returns
+// two strings, the triple and the option.
+// The following format is assumed:
+//
+// -Xopenmp-target=nvptx64-nvidia-cuda -opt=val
+for (unsigned i = 0; i < OptList.size(); i+=2) {
+  StringRef Opt = OptList[i+1];
+  if (OptList[i] == getTripleString())
+AddMArchOption(DAL, Opts, Opt);
+}
+
+OptList = Args.getAllArgValues(options::OPT_Xopenmp_target);
+// When there is only one option in the list, the following format
+// is assumed:
+//
+// -Xopenmp-target -opt=val
+
+// By default, if no triple is explicitely specified, we
+// associate -opt=val with the toolchain specified under the
+// -fopenmp-targets flag (provided that there is only one such
+// toolchain specified).
+if (!OptList.empty() &&
+Args.getAllArgValues(options::OPT_fopenmp_targets_EQ).size() != 1)
+  getDriver().Diag(diag::err_drv_Xopenmp_target_missing_triple);
+
+// Add arch
+for (StringRef Opt : OptList) {
+  AddMArchOption(DAL, Opts, Opt);
+}
+
+auto MArchList = DAL->getAllArgValues(options::OPT_march_EQ);
+if (MArchList.size() > 1)
+   

[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-07-05 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added inline comments.



Comment at: lib/Driver/ToolChains/Cuda.cpp:443
+
+// Get the compute capability from the -fopenmp-targets flag.
+// The default compute capability is sm_20 since this is a CUDA

hfinkel wrote:
> Is this first sentence accurate?
Fixed. It should be -Xopenmp-target



Comment at: lib/Driver/ToolChains/Cuda.cpp:446
+// tool chain.
+auto OptList = Args.getAllArgValues(options::OPT_Xopenmp_target_EQ);
+

hfinkel wrote:
> Why is this logic in this function? Don't you need the same logic in 
> Generic_GCC::TranslateArgs to handle non-CUDA offloading toolchains?
> 
I would imagine that each toolchain needs to parse the list of flags since, 
given a toolchain, the flag may need to be passed to more than one tool and 
different tools may require different flags for passing the same information.



Comment at: lib/Driver/ToolChains/Cuda.cpp:474
+for (StringRef Opt : OptList) {
+  AddMArchOption(DAL, Opts, Opt);
+}

hfinkel wrote:
> Shouldn't you be adding all of the options, not just the -march= ones?
I thought that that would be the case but there are a few issues:

1. PTXAS and NVLINK each use a different flag for specifying the arch, and, in 
turn, each flag is different from -march.

2. -Xopenmp-target passes a flag to the entire toolchain not to individual 
components of the toolchain so a translation of flags is required in some cases 
to adapt the flag to the actual tool. -march is one example, I'm not sure if 
there are others.

3. At this point in the code, in order to add a flag and its value to the DAL 
list, I need to be able to specify the option type (i.e. 
options::OPT_march_EQ). I therefore need to manually recognize the flag in the 
string representing the value of -Xopenmp-target or -Xopenmp-target=triple.

4. This patch handles the passing of the arch and can be extended to pass other 
flags (as is stands, no other flags are passed through to the CUDA toolchain). 
This can be extended on a flag by flag basis for flags that need translating to 
a particular tool's flag. If the flag doesn't need translating then the flag 
and it's value can be appended to the command line as they are.


https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-06-30 Thread Hal Finkel via Phabricator via cfe-commits
hfinkel added inline comments.



Comment at: include/clang/Driver/Options.td:453
+def Xopenmp_target : Separate<["-"], "Xopenmp-target">,
+  HelpText<"Pass arguments to target offloading toolchain.">;
+def Xopenmp_target_EQ : JoinedAndSeparate<["-"], "Xopenmp-target=">,

Can this be?

HelpText<"Pass  to the target offloading toolchain.">,
MetaVarName<"">;



Comment at: include/clang/Driver/Options.td:455
+def Xopenmp_target_EQ : JoinedAndSeparate<["-"], "Xopenmp-target=">,
+  HelpText<"Pass arguments to target offloading toolchain. First entry is a 
triple that identifies the toolchain.">;
 def z : Separate<["-"], "z">, Flags<[LinkerInput, RenderAsInput]>,

  HelpText<"Pass  to the specified target offloading toolchain. The triple 
that identifies the toolchain must be provided after the equals sign.">, 
MetaVarName<"">;



Comment at: lib/Driver/ToolChains/Cuda.cpp:443
+
+// Get the compute capability from the -fopenmp-targets flag.
+// The default compute capability is sm_20 since this is a CUDA

Is this first sentence accurate?



Comment at: lib/Driver/ToolChains/Cuda.cpp:444
+// Get the compute capability from the -fopenmp-targets flag.
+// The default compute capability is sm_20 since this is a CUDA
+// tool chain.

This comment should be moved down to where the sm_20 default is added.



Comment at: lib/Driver/ToolChains/Cuda.cpp:446
+// tool chain.
+auto OptList = Args.getAllArgValues(options::OPT_Xopenmp_target_EQ);
+

Why is this logic in this function? Don't you need the same logic in 
Generic_GCC::TranslateArgs to handle non-CUDA offloading toolchains?




Comment at: lib/Driver/ToolChains/Cuda.cpp:469
+// toolchain specified).
+assert(Args.getAllArgValues(options::OPT_fopenmp_targets_EQ).size() == 1 &&
+"Target toolchain not specified on -Xopenmp-target and cannot be 
deduced.");

A user can trigger this assert, right? Please make this a diagnostic error 
instead.



Comment at: lib/Driver/ToolChains/Cuda.cpp:474
+for (StringRef Opt : OptList) {
+  AddMArchOption(DAL, Opts, Opt);
+}

Shouldn't you be adding all of the options, not just the -march= ones?



Comment at: lib/Driver/ToolChains/Cuda.cpp:478
+auto MArchList = DAL->getAllArgValues(options::OPT_march_EQ);
+assert(MArchList.size() < 2 && "At most one GPU arch allowed.");
+if (MArchList.empty())

Can a user hit this? If so, it must be an actual diagnostic.



Comment at: test/Driver/openmp-offload.c:607
+
+// CHK-FOPENMP-EQ-TARGET: clang{{.*}} argument unused during compilation: 
'-Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8'
+

I don't see why you'd check that the arguments are unused. They should be used. 
One exception might be that you might want to force -Xopenmp-target=foo to be 
unused if foo is not a currently-targeted offloading triple. There could be a 
separate test case for that.

Otherwise, I think you should be able to check the relevant backend commands, 
no? (something like where CHK-COMMANDS is used above in this file).


https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added a comment.

@hfinkel I've add the flag as suggested. There is one minor change, I used "=" 
instead of ":" when specifying the toolchain/triple. I also support the triple 
being omitted when there is only one offloading toolchain specified with 
-fopenmp-targets.


https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 104962.
gtbercea added a comment.

Check -fopenmp-targets has one entry when using default toolchain in 
-Xopenmp-target.


https://reviews.llvm.org/D34784

Files:
  include/clang/Driver/Options.td
  lib/Driver/ToolChains/Cuda.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -597,3 +597,19 @@
 // RUN:   | FileCheck -check-prefix=CHK-FOPENMP-IS-DEVICE %s
 
 // CHK-FOPENMP-IS-DEVICE: clang{{.*}} "-aux-triple" "powerpc64le-unknown-linux-gnu" {{.*}}.c" "-fopenmp-is-device" "-fopenmp-host-ir-file-path"
+
+/// ###
+
+/// Check -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-EQ-TARGET %s
+
+// CHK-FOPENMP-EQ-TARGET: clang{{.*}} argument unused during compilation: '-Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8'
+
+/// ###
+
+/// Check -Xopenmp-target -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -march=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET %s
+
+// CHK-FOPENMP-TARGET: clang{{.*}} argument unused during compilation: '-Xopenmp-target -march=pwr8'
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -212,8 +212,20 @@
   static_cast(getToolChain());
   assert(TC.getTriple().isNVPTX() && "Wrong platform");
 
+  StringRef GPUArchName;
+  std::vector GPUArchNames;
+  // If this is an OpenMP action we need to extract the device architecture from
+  // the -fopenmp-target-arch option.
+  if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {
+GPUArchNames = Args.getAllArgValues(options::OPT_march_EQ);
+assert(GPUArchNames.size() == 1 &&
+   "Exactly one GPU Arch required for ptxas.");
+GPUArchName = GPUArchNames[0];
+  } else
+GPUArchName = JA.getOffloadingArch();
+
   // Obtain architecture from the action.
-  CudaArch gpu_arch = StringToCudaArch(JA.getOffloadingArch());
+  CudaArch gpu_arch = StringToCudaArch(GPUArchName);
   assert(gpu_arch != CudaArch::UNKNOWN &&
  "Device action expected to have an architecture.");
 
@@ -392,6 +404,15 @@
   CudaInstallation.AddCudaIncludeArgs(DriverArgs, CC1Args);
 }
 
+void AddMArchOption(DerivedArgList *DAL,
+const OptTable ,
+StringRef Opt) {
+  if (Opt.startswith("-march="))
+DAL->AddJoinedArg(nullptr,
+Opts.getOption(options::OPT_march_EQ),
+Opt.split("=").second);
+}
+
 llvm::opt::DerivedArgList *
 CudaToolChain::TranslateArgs(const llvm::opt::DerivedArgList ,
  StringRef BoundArch,
@@ -405,7 +426,7 @@
 
   // For OpenMP device offloading, append derived arguments. Make sure
   // flags are not duplicated.
-  // TODO: Append the compute capability.
+  // Also append the compute capability.
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
 for (Arg *A : Args){
   bool IsDuplicate = false;
@@ -418,6 +439,47 @@
   if (!IsDuplicate)
 DAL->append(A);
 }
+
+// Get the compute capability from the -fopenmp-targets flag.
+// The default compute capability is sm_20 since this is a CUDA
+// tool chain.
+auto OptList = Args.getAllArgValues(options::OPT_Xopenmp_target_EQ);
+
+// For each OPT_Xopenmp_target_EQ option, the function returns
+// two strings, the triple and the option.
+// The following format is assumed:
+//
+// -Xopenmp-target=nvptx64-nvidia-cuda -opt=val
+for (unsigned i = 0; i < OptList.size(); i+=2) {
+  StringRef Opt = OptList[i+1];
+  if (OptList[i] == getTripleString())
+AddMArchOption(DAL, Opts, Opt);
+}
+
+OptList = Args.getAllArgValues(options::OPT_Xopenmp_target);
+// When there is only one option in the list, the following format
+// is assumed:
+//
+// -Xopenmp-target -opt=val
+
+// By default, if no triple is explicitely specified, we
+// associate -opt=val with the toolchain specified under the
+// -fopenmp-targets flag (provided that there is only one such
+// toolchain specified).
+assert(Args.getAllArgValues(options::OPT_fopenmp_targets_EQ).size() == 1 &&
+"Target toolchain not specified on -Xopenmp-target and cannot be deduced.");
+
+// Add arch
+for (StringRef Opt : OptList) {
+  AddMArchOption(DAL, Opts, Opt);
+}
+
+auto MArchList = 

[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 104960.
gtbercea retitled this revision from "[OpenMP] Add flag for specifying the 
target device architecture for OpenMP device offloading " to "[OpenMP] Add flag 
for specifying the target device architecture for OpenMP device offloading".
gtbercea added a comment.

Pass OpenMP target options.


https://reviews.llvm.org/D34784

Files:
  include/clang/Driver/Options.td
  lib/Driver/ToolChains/Cuda.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -597,3 +597,19 @@
 // RUN:   | FileCheck -check-prefix=CHK-FOPENMP-IS-DEVICE %s
 
 // CHK-FOPENMP-IS-DEVICE: clang{{.*}} "-aux-triple" "powerpc64le-unknown-linux-gnu" {{.*}}.c" "-fopenmp-is-device" "-fopenmp-host-ir-file-path"
+
+/// ###
+
+/// Check -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-EQ-TARGET %s
+
+// CHK-FOPENMP-EQ-TARGET: clang{{.*}} argument unused during compilation: '-Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8'
+
+/// ###
+
+/// Check -Xopenmp-target -march=pwr8 is passed when compiling for the device.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -march=pwr8 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET %s
+
+// CHK-FOPENMP-TARGET: clang{{.*}} argument unused during compilation: '-Xopenmp-target -march=pwr8'
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -212,8 +212,20 @@
   static_cast(getToolChain());
   assert(TC.getTriple().isNVPTX() && "Wrong platform");
 
+  StringRef GPUArchName;
+  std::vector GPUArchNames;
+  // If this is an OpenMP action we need to extract the device architecture from
+  // the -fopenmp-target-arch option.
+  if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {
+GPUArchNames = Args.getAllArgValues(options::OPT_march_EQ);
+assert(GPUArchNames.size() == 1 &&
+   "Exactly one GPU Arch required for ptxas.");
+GPUArchName = GPUArchNames[0];
+  } else
+GPUArchName = JA.getOffloadingArch();
+
   // Obtain architecture from the action.
-  CudaArch gpu_arch = StringToCudaArch(JA.getOffloadingArch());
+  CudaArch gpu_arch = StringToCudaArch(GPUArchName);
   assert(gpu_arch != CudaArch::UNKNOWN &&
  "Device action expected to have an architecture.");
 
@@ -392,6 +404,15 @@
   CudaInstallation.AddCudaIncludeArgs(DriverArgs, CC1Args);
 }
 
+void AddMArchOption(DerivedArgList *DAL,
+const OptTable ,
+StringRef Opt) {
+  if (Opt.startswith("-march="))
+DAL->AddJoinedArg(nullptr,
+Opts.getOption(options::OPT_march_EQ),
+Opt.split("=").second);
+}
+
 llvm::opt::DerivedArgList *
 CudaToolChain::TranslateArgs(const llvm::opt::DerivedArgList ,
  StringRef BoundArch,
@@ -405,7 +426,7 @@
 
   // For OpenMP device offloading, append derived arguments. Make sure
   // flags are not duplicated.
-  // TODO: Append the compute capability.
+  // Also append the compute capability.
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
 for (Arg *A : Args){
   bool IsDuplicate = false;
@@ -418,6 +439,39 @@
   if (!IsDuplicate)
 DAL->append(A);
 }
+
+// Get the compute capability from the -fopenmp-targets flag.
+// The default compute capability is sm_20 since this is a CUDA
+// tool chain.
+auto OptList = Args.getAllArgValues(options::OPT_Xopenmp_target_EQ);
+// For each OPT_Xopenmp_target_EQ option, the function returns
+// two strings, the triple and the option.
+// The following format is assumed:
+//
+// -Xopenmp-target=nvptx64-nvidia-cuda -opt=val
+//
+for (unsigned i = 0; i < OptList.size(); i+=2) {
+  StringRef Opt = OptList[i+1];
+  if (OptList[i] == getTripleString())
+AddMArchOption(DAL, Opts, Opt);
+}
+
+OptList = Args.getAllArgValues(options::OPT_Xopenmp_target);
+// When there is only one option in the list, the following format
+// is assumed:
+//
+// -Xopenmp-target -opt=val
+//
+for (StringRef Opt : OptList) {
+  AddMArchOption(DAL, Opts, Opt);
+}
+
+auto MArchList = DAL->getAllArgValues(options::OPT_march_EQ);
+assert(MArchList.size() < 2 && "At most one GPU arch allowed.");
+if (MArchList.empty())
+  DAL->AddJoinedArg(nullptr,
+  Opts.getOption(options::OPT_march_EQ), "sm_20");
+
 

[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-06-30 Thread Hal Finkel via Phabricator via cfe-commits
hfinkel added a comment.

...

>   
 
 Okay, good, this is exactly where I was going when I said I was worried 
 about generalization. -march seems like one of many flags I might want to 
 pass to the target compilation. Moreover, it doesn't seem special in what 
 regard.
 
 We have -Xclang and -mllvm, etc. to pass flags through to other stages of 
 compilation. Could we do something similar here? Maybe something like: 
 ``-Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr7``. That's 
 unfortunately long, but if there's only one target, we could omit the 
 triple?
>>> 
>>> The triple could be omitted, absolutely.
>>> 
>>> If you have the following:
>>> 
>>> -fopenmp-targets=openmp-powerpc64le-ibm-linux-gnu 
>>> ``-Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr7`` 
>>> ``-Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr8``
>>> 
>>> This would end up having a toolchain called for each one of the 
>>> -Xopenmp-target sets of flags even though a single triple was specified 
>>> under the -fopenmp-targets. Would this be ok?
>> 
>> Why? That does not sound desirable. And could you even use these multiple 
>> outputs? I think you'd want to pass all of the arguments for each target 
>> triple to the one toolchain invocation for that target triple. Is that 
>> possible?
> 
> I agree, I don't think that is something we want (i.e. having one triple lead 
> to two toolchains), with the current flags you can't do that today. I wanted 
> to check with you though that's why i mentioned it.
> 
> I think appending all options for a particular triple together is more 
> desirable.

Good, let's do that.


Repository:
  rL LLVM

https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added a comment.

In https://reviews.llvm.org/D34784#795988, @hfinkel wrote:

> In https://reviews.llvm.org/D34784#795980, @gtbercea wrote:
>
> > In https://reviews.llvm.org/D34784#795934, @hfinkel wrote:
> >
> > > In https://reviews.llvm.org/D34784#795871, @gtbercea wrote:
> > >
> > > > In https://reviews.llvm.org/D34784#795367, @hfinkel wrote:
> > > >
> > > > > In https://reviews.llvm.org/D34784#795353, @gtbercea wrote:
> > > > >
> > > > > > In https://reviews.llvm.org/D34784#795287, @hfinkel wrote:
> > > > > >
> > > > > > > What happens if you have multiple targets? Maybe this should be 
> > > > > > > -fopenmp-targets-arch=foo,bar,whatever?
> > > > > > >
> > > > > > > Once this all lands, please make sure that you add additional 
> > > > > > > test cases here. Make sure that the arch is passed through to the 
> > > > > > > ptx and cuda tools as it should be. Make sure that the defaults 
> > > > > > > work. Make sure that something reasonable happens if the user 
> > > > > > > specifies the option more than once (if they're all the same).
> > > > > >
> > > > > >
> > > > > > Hi Hal,
> > > > > >
> > > > > > At the moment only one arch is supported and it would apply to all 
> > > > > > the target triples under -fopenmp-targets.
> > > > > >
> > > > > > I was planning to address the multiple archs problem in a future 
> > > > > > patch.
> > > > > >
> > > > > > I am assuming that in the case of multiple archs, each arch in 
> > > > > > -fopenmp-targets-arch=A1,A2,A3 will bind to a corresponding triple 
> > > > > > in -fopenmp-targets=T1,T2,T3 like so: T1 with A1, T2 with A2 etc. 
> > > > > > Is this a practical interpretation of what should happen?
> > > > >
> > > > >
> > > > > Yea, that's what I was thinking. I'm a bit concerned that none of 
> > > > > this generalizes well. To take a step back, under what circumstances 
> > > > > do we support multiple targets right now?
> > > >
> > > >
> > > > We allow -fopenmp-targets to get a list of triples. I am not aware of 
> > > > any limitations in terms of how many of these triples you can have. 
> > > > Even in the test file of this patch we have the following: 
> > > > "-targets=openmp-powerpc64le-ibm-linux-gnu,openmp-x86_64-pc-linux-gnu,host-powerpc64le--linux"
> > > >
> > > > > 
> > > > > 
> > > > >> Regarding tests: more tests can be added as a separate patch once 
> > > > >> offloading is enabled by the patch following this one (i.e. 
> > > > >> https://reviews.llvm.org/D29654). There actually is a test in 
> > > > >> https://reviews.llvm.org/D29654 where I check that the arch is 
> > > > >> passed to ptxas and nvlink correctly using this flag. I will add 
> > > > >> some more test cases to cover the other situations you mentioned.
> > > > > 
> > > > > Sounds good.
> > > > > 
> > > > >> Thanks,
> > > > >> 
> > > > >> --Doru
> > > >
> > > > In our previous solution there might be a problem.  The same triple 
> > > > might be used multiple times just so that you can have several archs in 
> > > > the other flag (T1 and T2 being the same). There are some alternatives 
> > > > which I have discussed with @ABataev.
> > > >
> > > > One solution could be to associate an arch with each triple to avoid 
> > > > positional matching of triples in one flag with archs in another flag:
> > > >
> > > >   -fopenmp-targets=T1:A1,T2,T3:A2
> > > >
> > > >
> > > > ":A1" is optional, also, in the future, we can pass other things to the 
> > > > toolchain such as "-L/a/b/c/d":
> > > >
> > > >   -fopenmp-targets=T1:A1: -L/a/b/c/d,T2,T3:A2
> > > >
> > >
> > >
> > > Okay, good, this is exactly where I was going when I said I was worried 
> > > about generalization. -march seems like one of many flags I might want to 
> > > pass to the target compilation. Moreover, it doesn't seem special in what 
> > > regard.
> > >
> > > We have -Xclang and -mllvm, etc. to pass flags through to other stages of 
> > > compilation. Could we do something similar here? Maybe something like: 
> > > ``-Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr7``. That's 
> > > unfortunately long, but if there's only one target, we could omit the 
> > > triple?
> >
> >
> > The triple could be omitted, absolutely.
> >
> > If you have the following:
> >
> > -fopenmp-targets=openmp-powerpc64le-ibm-linux-gnu 
> > ``-Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr7`` 
> > ``-Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr8``
> >
> > This would end up having a toolchain called for each one of the 
> > -Xopenmp-target sets of flags even though a single triple was specified 
> > under the -fopenmp-targets. Would this be ok?
>
>
> Why? That does not sound desirable. And could you even use these multiple 
> outputs? I think you'd want to pass all of the arguments for each target 
> triple to the one toolchain invocation for that target triple. Is that 
> possible?


I agree, I don't think that is something we want (i.e. having one triple lead 
to two toolchains), with the current 

[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-06-29 Thread Hal Finkel via Phabricator via cfe-commits
hfinkel added a comment.

In https://reviews.llvm.org/D34784#795980, @gtbercea wrote:

> In https://reviews.llvm.org/D34784#795934, @hfinkel wrote:
>
> > In https://reviews.llvm.org/D34784#795871, @gtbercea wrote:
> >
> > > In https://reviews.llvm.org/D34784#795367, @hfinkel wrote:
> > >
> > > > In https://reviews.llvm.org/D34784#795353, @gtbercea wrote:
> > > >
> > > > > In https://reviews.llvm.org/D34784#795287, @hfinkel wrote:
> > > > >
> > > > > > What happens if you have multiple targets? Maybe this should be 
> > > > > > -fopenmp-targets-arch=foo,bar,whatever?
> > > > > >
> > > > > > Once this all lands, please make sure that you add additional test 
> > > > > > cases here. Make sure that the arch is passed through to the ptx 
> > > > > > and cuda tools as it should be. Make sure that the defaults work. 
> > > > > > Make sure that something reasonable happens if the user specifies 
> > > > > > the option more than once (if they're all the same).
> > > > >
> > > > >
> > > > > Hi Hal,
> > > > >
> > > > > At the moment only one arch is supported and it would apply to all 
> > > > > the target triples under -fopenmp-targets.
> > > > >
> > > > > I was planning to address the multiple archs problem in a future 
> > > > > patch.
> > > > >
> > > > > I am assuming that in the case of multiple archs, each arch in 
> > > > > -fopenmp-targets-arch=A1,A2,A3 will bind to a corresponding triple in 
> > > > > -fopenmp-targets=T1,T2,T3 like so: T1 with A1, T2 with A2 etc. Is 
> > > > > this a practical interpretation of what should happen?
> > > >
> > > >
> > > > Yea, that's what I was thinking. I'm a bit concerned that none of this 
> > > > generalizes well. To take a step back, under what circumstances do we 
> > > > support multiple targets right now?
> > >
> > >
> > > We allow -fopenmp-targets to get a list of triples. I am not aware of any 
> > > limitations in terms of how many of these triples you can have. Even in 
> > > the test file of this patch we have the following: 
> > > "-targets=openmp-powerpc64le-ibm-linux-gnu,openmp-x86_64-pc-linux-gnu,host-powerpc64le--linux"
> > >
> > > > 
> > > > 
> > > >> Regarding tests: more tests can be added as a separate patch once 
> > > >> offloading is enabled by the patch following this one (i.e. 
> > > >> https://reviews.llvm.org/D29654). There actually is a test in 
> > > >> https://reviews.llvm.org/D29654 where I check that the arch is passed 
> > > >> to ptxas and nvlink correctly using this flag. I will add some more 
> > > >> test cases to cover the other situations you mentioned.
> > > > 
> > > > Sounds good.
> > > > 
> > > >> Thanks,
> > > >> 
> > > >> --Doru
> > >
> > > In our previous solution there might be a problem.  The same triple might 
> > > be used multiple times just so that you can have several archs in the 
> > > other flag (T1 and T2 being the same). There are some alternatives which 
> > > I have discussed with @ABataev.
> > >
> > > One solution could be to associate an arch with each triple to avoid 
> > > positional matching of triples in one flag with archs in another flag:
> > >
> > >   -fopenmp-targets=T1:A1,T2,T3:A2
> > >
> > >
> > > ":A1" is optional, also, in the future, we can pass other things to the 
> > > toolchain such as "-L/a/b/c/d":
> > >
> > >   -fopenmp-targets=T1:A1: -L/a/b/c/d,T2,T3:A2
> > >
> >
> >
> > Okay, good, this is exactly where I was going when I said I was worried 
> > about generalization. -march seems like one of many flags I might want to 
> > pass to the target compilation. Moreover, it doesn't seem special in what 
> > regard.
> >
> > We have -Xclang and -mllvm, etc. to pass flags through to other stages of 
> > compilation. Could we do something similar here? Maybe something like: 
> > ``-Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr7``. That's 
> > unfortunately long, but if there's only one target, we could omit the 
> > triple?
>
>
> The triple could be omitted, absolutely.
>
> If you have the following:
>
> -fopenmp-targets=openmp-powerpc64le-ibm-linux-gnu 
> ``-Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr7`` 
> ``-Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr8``
>
> This would end up having a toolchain called for each one of the 
> -Xopenmp-target sets of flags even though a single triple was specified under 
> the -fopenmp-targets. Would this be ok?


Why? That does not sound desirable. And could you even use these multiple 
outputs? I think you'd want to pass all of the arguments for each target triple 
to the one toolchain invocation for that target triple. Is that possible?

> 
> 
>> 
>> 
>>> An actual example:
>>> 
>>>   
>>> -fopenmp-targets=nvptx64-nvidia-cuda:sm_35,openmp-powerpc64le-ibm-linux-gnu




Repository:
  rL LLVM

https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-06-29 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added a comment.

In https://reviews.llvm.org/D34784#795934, @hfinkel wrote:

> In https://reviews.llvm.org/D34784#795871, @gtbercea wrote:
>
> > In https://reviews.llvm.org/D34784#795367, @hfinkel wrote:
> >
> > > In https://reviews.llvm.org/D34784#795353, @gtbercea wrote:
> > >
> > > > In https://reviews.llvm.org/D34784#795287, @hfinkel wrote:
> > > >
> > > > > What happens if you have multiple targets? Maybe this should be 
> > > > > -fopenmp-targets-arch=foo,bar,whatever?
> > > > >
> > > > > Once this all lands, please make sure that you add additional test 
> > > > > cases here. Make sure that the arch is passed through to the ptx and 
> > > > > cuda tools as it should be. Make sure that the defaults work. Make 
> > > > > sure that something reasonable happens if the user specifies the 
> > > > > option more than once (if they're all the same).
> > > >
> > > >
> > > > Hi Hal,
> > > >
> > > > At the moment only one arch is supported and it would apply to all the 
> > > > target triples under -fopenmp-targets.
> > > >
> > > > I was planning to address the multiple archs problem in a future patch.
> > > >
> > > > I am assuming that in the case of multiple archs, each arch in 
> > > > -fopenmp-targets-arch=A1,A2,A3 will bind to a corresponding triple in 
> > > > -fopenmp-targets=T1,T2,T3 like so: T1 with A1, T2 with A2 etc. Is this 
> > > > a practical interpretation of what should happen?
> > >
> > >
> > > Yea, that's what I was thinking. I'm a bit concerned that none of this 
> > > generalizes well. To take a step back, under what circumstances do we 
> > > support multiple targets right now?
> >
> >
> > We allow -fopenmp-targets to get a list of triples. I am not aware of any 
> > limitations in terms of how many of these triples you can have. Even in the 
> > test file of this patch we have the following: 
> > "-targets=openmp-powerpc64le-ibm-linux-gnu,openmp-x86_64-pc-linux-gnu,host-powerpc64le--linux"
> >
> > > 
> > > 
> > >> Regarding tests: more tests can be added as a separate patch once 
> > >> offloading is enabled by the patch following this one (i.e. 
> > >> https://reviews.llvm.org/D29654). There actually is a test in 
> > >> https://reviews.llvm.org/D29654 where I check that the arch is passed to 
> > >> ptxas and nvlink correctly using this flag. I will add some more test 
> > >> cases to cover the other situations you mentioned.
> > > 
> > > Sounds good.
> > > 
> > >> Thanks,
> > >> 
> > >> --Doru
> >
> > In our previous solution there might be a problem.  The same triple might 
> > be used multiple times just so that you can have several archs in the other 
> > flag (T1 and T2 being the same). There are some alternatives which I have 
> > discussed with @ABataev.
> >
> > One solution could be to associate an arch with each triple to avoid 
> > positional matching of triples in one flag with archs in another flag:
> >
> >   -fopenmp-targets=T1:A1,T2,T3:A2
> >
> >
> > ":A1" is optional, also, in the future, we can pass other things to the 
> > toolchain such as "-L/a/b/c/d":
> >
> >   -fopenmp-targets=T1:A1: -L/a/b/c/d,T2,T3:A2
> >
>
>
> Okay, good, this is exactly where I was going when I said I was worried about 
> generalization. -march seems like one of many flags I might want to pass to 
> the target compilation. Moreover, it doesn't seem special in what regard.
>
> We have -Xclang and -mllvm, etc. to pass flags through to other stages of 
> compilation. Could we do something similar here? Maybe something like: 
> ``-Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr7``. That's 
> unfortunately long, but if there's only one target, we could omit the triple?


The triple could be omitted, absolutely.

If you have the following:

-fopenmp-targets=openmp-powerpc64le-ibm-linux-gnu 
``-Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr7`` 
``-Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr8``

This would end up having a toolchain called for each one of the -Xopenmp-target 
sets of flags even though a single triple was specified under the 
-fopenmp-targets. Would this be ok?

> 
> 
>> An actual example:
>> 
>>   -fopenmp-targets=nvptx64-nvidia-cuda:sm_35,openmp-powerpc64le-ibm-linux-gnu


Repository:
  rL LLVM

https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-06-29 Thread Hal Finkel via Phabricator via cfe-commits
hfinkel added a comment.

In https://reviews.llvm.org/D34784#795871, @gtbercea wrote:

> In https://reviews.llvm.org/D34784#795367, @hfinkel wrote:
>
> > In https://reviews.llvm.org/D34784#795353, @gtbercea wrote:
> >
> > > In https://reviews.llvm.org/D34784#795287, @hfinkel wrote:
> > >
> > > > What happens if you have multiple targets? Maybe this should be 
> > > > -fopenmp-targets-arch=foo,bar,whatever?
> > > >
> > > > Once this all lands, please make sure that you add additional test 
> > > > cases here. Make sure that the arch is passed through to the ptx and 
> > > > cuda tools as it should be. Make sure that the defaults work. Make sure 
> > > > that something reasonable happens if the user specifies the option more 
> > > > than once (if they're all the same).
> > >
> > >
> > > Hi Hal,
> > >
> > > At the moment only one arch is supported and it would apply to all the 
> > > target triples under -fopenmp-targets.
> > >
> > > I was planning to address the multiple archs problem in a future patch.
> > >
> > > I am assuming that in the case of multiple archs, each arch in 
> > > -fopenmp-targets-arch=A1,A2,A3 will bind to a corresponding triple in 
> > > -fopenmp-targets=T1,T2,T3 like so: T1 with A1, T2 with A2 etc. Is this a 
> > > practical interpretation of what should happen?
> >
> >
> > Yea, that's what I was thinking. I'm a bit concerned that none of this 
> > generalizes well. To take a step back, under what circumstances do we 
> > support multiple targets right now?
>
>
> We allow -fopenmp-targets to get a list of triples. I am not aware of any 
> limitations in terms of how many of these triples you can have. Even in the 
> test file of this patch we have the following: 
> "-targets=openmp-powerpc64le-ibm-linux-gnu,openmp-x86_64-pc-linux-gnu,host-powerpc64le--linux"
>
> > 
> > 
> >> Regarding tests: more tests can be added as a separate patch once 
> >> offloading is enabled by the patch following this one (i.e. 
> >> https://reviews.llvm.org/D29654). There actually is a test in 
> >> https://reviews.llvm.org/D29654 where I check that the arch is passed to 
> >> ptxas and nvlink correctly using this flag. I will add some more test 
> >> cases to cover the other situations you mentioned.
> > 
> > Sounds good.
> > 
> >> Thanks,
> >> 
> >> --Doru
>
> In our previous solution there might be a problem.  The same triple might be 
> used multiple times just so that you can have several archs in the other flag 
> (T1 and T2 being the same). There are some alternatives which I have 
> discussed with @ABataev.
>
> One solution could be to associate an arch with each triple to avoid 
> positional matching of triples in one flag with archs in another flag:
>
>   -fopenmp-targets=T1:A1,T2,T3:A2
>
>
> ":A1" is optional, also, in the future, we can pass other things to the 
> toolchain such as "-L/a/b/c/d":
>
>   -fopenmp-targets=T1:A1: -L/a/b/c/d,T2,T3:A2
>


Okay, good, this is exactly where I was going when I said I was worried about 
generalization. -march seems like one of many flags I might want to pass to the 
target compilation. Moreover, it doesn't seem special in what regard.

We have -Xclang and -mllvm, etc. to pass flags through to other stages of 
compilation. Could we do something similar here? Maybe something like: 
``-Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr7``. That's 
unfortunately long, but if there's only one target, we could omit the triple?

> An actual example:
> 
>   -fopenmp-targets=nvptx64-nvidia-cuda:sm_35,openmp-powerpc64le-ibm-linux-gnu




Repository:
  rL LLVM

https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-06-29 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added a comment.

In https://reviews.llvm.org/D34784#795367, @hfinkel wrote:

> In https://reviews.llvm.org/D34784#795353, @gtbercea wrote:
>
> > In https://reviews.llvm.org/D34784#795287, @hfinkel wrote:
> >
> > > What happens if you have multiple targets? Maybe this should be 
> > > -fopenmp-targets-arch=foo,bar,whatever?
> > >
> > > Once this all lands, please make sure that you add additional test cases 
> > > here. Make sure that the arch is passed through to the ptx and cuda tools 
> > > as it should be. Make sure that the defaults work. Make sure that 
> > > something reasonable happens if the user specifies the option more than 
> > > once (if they're all the same).
> >
> >
> > Hi Hal,
> >
> > At the moment only one arch is supported and it would apply to all the 
> > target triples under -fopenmp-targets.
> >
> > I was planning to address the multiple archs problem in a future patch.
> >
> > I am assuming that in the case of multiple archs, each arch in 
> > -fopenmp-targets-arch=A1,A2,A3 will bind to a corresponding triple in 
> > -fopenmp-targets=T1,T2,T3 like so: T1 with A1, T2 with A2 etc. Is this a 
> > practical interpretation of what should happen?
>
>
> Yea, that's what I was thinking. I'm a bit concerned that none of this 
> generalizes well. To take a step back, under what circumstances do we support 
> multiple targets right now?


We allow -fopenmp-targets to get a list of triples. I am not aware of any 
limitations in terms of how many of these triples you can have. Even in the 
test file of this patch we have the following: 
"-targets=openmp-powerpc64le-ibm-linux-gnu,openmp-x86_64-pc-linux-gnu,host-powerpc64le--linux"

> 
> 
>> Regarding tests: more tests can be added as a separate patch once offloading 
>> is enabled by the patch following this one (i.e. 
>> https://reviews.llvm.org/D29654). There actually is a test in 
>> https://reviews.llvm.org/D29654 where I check that the arch is passed to 
>> ptxas and nvlink correctly using this flag. I will add some more test cases 
>> to cover the other situations you mentioned.
> 
> Sounds good.
> 
>> Thanks,
>> 
>> --Doru

In our previous solution there might be a problem.  The same triple might be 
used multiple times just so that you can have several archs in the other flag 
(T1 and T2 being the same). There are some alternatives which I have discussed 
with @ABataev.

One solution could be to associate an arch with each triple to avoid positional 
matching of triples in one flag with archs in another flag:

  -fopenmp-targets=T1:A1,T2,T3:A2

":A1" is optional, also, in the future, we can pass other things to the 
toolchain such as "-L/a/b/c/d":

  -fopenmp-targets=T1:A1: -L/a/b/c/d,T2,T3:A2

An actual example:

  -fopenmp-targets=nvptx64-nvidia-cuda:sm_35,openmp-powerpc64le-ibm-linux-gnu


Repository:
  rL LLVM

https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-06-29 Thread Hal Finkel via Phabricator via cfe-commits
hfinkel added a comment.

In https://reviews.llvm.org/D34784#795353, @gtbercea wrote:

> In https://reviews.llvm.org/D34784#795287, @hfinkel wrote:
>
> > What happens if you have multiple targets? Maybe this should be 
> > -fopenmp-targets-arch=foo,bar,whatever?
> >
> > Once this all lands, please make sure that you add additional test cases 
> > here. Make sure that the arch is passed through to the ptx and cuda tools 
> > as it should be. Make sure that the defaults work. Make sure that something 
> > reasonable happens if the user specifies the option more than once (if 
> > they're all the same).
>
>
> Hi Hal,
>
> At the moment only one arch is supported and it would apply to all the target 
> triples under -fopenmp-targets.
>
> I was planning to address the multiple archs problem in a future patch.
>
> I am assuming that in the case of multiple archs, each arch in 
> -fopenmp-targets-arch=A1,A2,A3 will bind to a corresponding triple in 
> -fopenmp-targets=T1,T2,T3 like so: T1 with A1, T2 with A2 etc. Is this a 
> practical interpretation of what should happen?


Yea, that's what I was thinking. I'm a bit concerned that none of this 
generalizes well. To take a step back, under what circumstances do we support 
multiple targets right now?

> Regarding tests: more tests can be added as a separate patch once offloading 
> is enabled by the patch following this one (i.e. 
> https://reviews.llvm.org/D29654). There actually is a test in 
> https://reviews.llvm.org/D29654 where I check that the arch is passed to 
> ptxas and nvlink correctly using this flag. I will add some more test cases 
> to cover the other situations you mentioned.

Sounds good.

> Thanks,
> 
> --Doru




Repository:
  rL LLVM

https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-06-29 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added a comment.

In https://reviews.llvm.org/D34784#795287, @hfinkel wrote:

> What happens if you have multiple targets? Maybe this should be 
> -fopenmp-targets-arch=foo,bar,whatever?
>
> Once this all lands, please make sure that you add additional test cases 
> here. Make sure that the arch is passed through to the ptx and cuda tools as 
> it should be. Make sure that the defaults work. Make sure that something 
> reasonable happens if the user specifies the option more than once (if 
> they're all the same).


Hi Hal,

At the moment only one arch is supported and it would apply to all the target 
triples under -fopenmp-targets.

I was planning to address the multiple archs problem in a future patch.

I am assuming that in the case of multiple archs, each arch in 
-fopenmp-targets-arch=A1,A2,A3 will bind to a corresponding triple in 
-fopenmp-targets=T1,T2,T3 like so: T1 with A1, T2 with A2 etc. Is this a 
practical interpretation of what should happen?

Regarding tests: more tests can be added as a separate patch once offloading is 
enabled by the patch following this one (i.e. https://reviews.llvm.org/D29654). 
There actually is a test in https://reviews.llvm.org/D29654 where I check that 
the arch is passed to ptxas and nvlink correctly using this flag. I will add 
some more test cases to cover the other situations you mentioned.

Thanks,

--Doru


Repository:
  rL LLVM

https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-06-29 Thread Hal Finkel via Phabricator via cfe-commits
hfinkel added a comment.

What happens if you have multiple targets? Maybe this should be 
-fopenmp-targets-arch=foo,bar,whatever?

Once this all lands, please make sure that you add additional test cases here. 
Make sure that the arch is passed through to the ptx and cuda tools as it 
should be. Make sure that the defaults work. Make sure that something 
reasonable happens if the user specifies the option more than once (if they're 
all the same).


Repository:
  rL LLVM

https://reviews.llvm.org/D34784



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-06-28 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 104532.

Repository:
  rL LLVM

https://reviews.llvm.org/D34784

Files:
  include/clang/Driver/Options.td
  lib/Driver/ToolChains/Cuda.cpp
  test/Driver/openmp-offload.c


Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -599,3 +599,11 @@
 // CHK-FOPENMP-IS-DEVICE: clang{{.*}}.i" {{.*}}" "-fopenmp-is-device"
 // CHK-FOPENMP-IS-DEVICE-NEXT: clang{{.*}}.bc" {{.*}}.i" "-fopenmp-is-device" 
"-fopenmp-host-ir-file-path"
 // CHK-FOPENMP-IS-DEVICE-NEXT: clang{{.*}}.s" {{.*}}.bc" "-fopenmp-is-device"
+
+/// ###
+
+/// Check -march propagates compute capability to device offloading toolchain.
+// RUN:   %clang -### -fopenmp=libomp 
-fopenmp-targets=powerpc64le-ibm-linux-gnu -save-temps -no-canonical-prefixes 
-fopenmp-target-arch=sm_35 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-COMPUTE-CAPABILITY %s
+
+// CHK-COMPUTE-CAPABILITY: clang: warning: argument unused during compilation: 
'-fopenmp-target-arch=sm_35'
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -212,8 +212,20 @@
   static_cast(getToolChain());
   assert(TC.getTriple().isNVPTX() && "Wrong platform");
 
+  StringRef GPUArchName;
+  std::vector GPUArchNames;
+  // If this is an OpenMP action we need to extract the device architecture 
from
+  // the -fopenmp-target-arch option.
+  if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {
+GPUArchNames = Args.getAllArgValues(options::OPT_fopenmp_target_arch_EQ);
+assert(GPUArchNames.size() == 1 &&
+   "Exactly one GPU Arch required for ptxas.");
+GPUArchName = GPUArchNames[0];
+  } else
+GPUArchName = JA.getOffloadingArch();
+
   // Obtain architecture from the action.
-  CudaArch gpu_arch = StringToCudaArch(JA.getOffloadingArch());
+  CudaArch gpu_arch = StringToCudaArch(GPUArchName);
   assert(gpu_arch != CudaArch::UNKNOWN &&
  "Device action expected to have an architecture.");
 
@@ -342,7 +354,9 @@
 Action::OffloadKind DeviceOffloadingKind) const {
   HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);
 
-  StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ);
+  StringRef GpuArch = DriverArgs.getLastArgValue(
+  DeviceOffloadingKind == Action::OFK_OpenMP ?
+  options::OPT_fopenmp_target_arch_EQ : options::OPT_march_EQ);
   assert(!GpuArch.empty() && "Must have an explicit GPU arch.");
   assert((DeviceOffloadingKind == Action::OFK_OpenMP ||
   DeviceOffloadingKind == Action::OFK_Cuda) &&
@@ -405,7 +419,7 @@
 
   // For OpenMP device offloading, append derived arguments. Make sure
   // flags are not duplicated.
-  // TODO: Append the compute capability.
+  // Also append the compute capability.
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
 for (Arg *A : Args){
   bool IsDuplicate = false;
@@ -418,6 +432,14 @@
   if (!IsDuplicate)
 DAL->append(A);
 }
+
+// Get the compute capability from the -fopenmp-target-arch flag.
+// The default compute capability is sm_20 since this is a CUDA
+// tool chain.
+if (Args.getAllArgValues(options::OPT_fopenmp_target_arch_EQ).empty())
+  DAL->AddJoinedArg(nullptr,
+  Opts.getOption(options::OPT_fopenmp_target_arch_EQ), "sm_20");
+
 return DAL;
   }
 
Index: include/clang/Driver/Options.td
===
--- include/clang/Driver/Options.td
+++ include/clang/Driver/Options.td
@@ -1294,6 +1294,8 @@
   HelpText<"Specify comma-separated list of triples OpenMP offloading targets 
to be supported">;
 def fopenmp_dump_offload_linker_script : Flag<["-"], 
"fopenmp-dump-offload-linker-script">, Group, 
   Flags<[NoArgumentUnused]>;
+def fopenmp_target_arch_EQ : Joined<["-"], "fopenmp-target-arch=">, 
Flags<[DriverOption]>,
+  HelpText<"Pass a single target architecture (default for NVIDIA is sm_20) to 
be used by OpenMP device offloading.">;
 def fno_optimize_sibling_calls : Flag<["-"], "fno-optimize-sibling-calls">, 
Group;
 def foptimize_sibling_calls : Flag<["-"], "foptimize-sibling-calls">, 
Group;
 def force__cpusubtype__ALL : Flag<["-"], "force_cpusubtype_ALL">;


Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -599,3 +599,11 @@
 // CHK-FOPENMP-IS-DEVICE: clang{{.*}}.i" {{.*}}" "-fopenmp-is-device"
 // CHK-FOPENMP-IS-DEVICE-NEXT: clang{{.*}}.bc" {{.*}}.i" "-fopenmp-is-device" "-fopenmp-host-ir-file-path"
 // CHK-FOPENMP-IS-DEVICE-NEXT: clang{{.*}}.s" {{.*}}.bc" "-fopenmp-is-device"
+
+/// ###
+
+/// Check -march 

[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

2017-06-28 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea created this revision.

OpenMP has the ability to offload target regions to devices which may have 
different architectures.

A new -fopenmp-target-arch flag is introduced to specify the device 
architecture.

In this patch I use the new flag to specify the compute capability of the 
underlying NVIDIA architecture for the OpenMP offloading CUDA tool chain.

Only a host-offloading test is provided since full device offloading capability 
will only be available when D29654  lands.


Repository:
  rL LLVM

https://reviews.llvm.org/D34784

Files:
  include/clang/Driver/Options.td
  lib/Driver/ToolChains/Cuda.cpp
  test/Driver/openmp-offload.c


Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -599,3 +599,11 @@
 // CHK-FOPENMP-IS-DEVICE: clang{{.*}}.i" {{.*}}" "-fopenmp-is-device"
 // CHK-FOPENMP-IS-DEVICE-NEXT: clang{{.*}}.bc" {{.*}}.i" "-fopenmp-is-device" 
"-fopenmp-host-ir-file-path"
 // CHK-FOPENMP-IS-DEVICE-NEXT: clang{{.*}}.s" {{.*}}.bc" "-fopenmp-is-device"
+
+/// ###
+
+/// Check -march propagates compute capability to device offloading toolchain.
+// RUN:   %clang -### -fopenmp=libomp 
-fopenmp-targets=powerpc64le-ibm-linux-gnu -save-temps -no-canonical-prefixes 
-fopenmp-target-arch=sm_35 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-COMPUTE-CAPABILITY %s
+
+// CHK-COMPUTE-CAPABILITY: clang: warning: argument unused during compilation: 
'-fopenmp-target-arch=sm_35'
Index: lib/Driver/ToolChains/Cuda.cpp
===
--- lib/Driver/ToolChains/Cuda.cpp
+++ lib/Driver/ToolChains/Cuda.cpp
@@ -212,8 +212,20 @@
   static_cast(getToolChain());
   assert(TC.getTriple().isNVPTX() && "Wrong platform");
 
+  StringRef GPUArchName;
+  std::vector GPUArchNames;
+  // If this is an OpenMP action we need to extract the device architecture 
from
+  // the -fopenmp-target-arch option.
+  if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {
+GPUArchNames = Args.getAllArgValues(options::OPT_fopenmp_target_arch_EQ);
+assert(GPUArchNames.size() == 1 &&
+   "Exactly one GPU Arch required for ptxas.");
+GPUArchName = GPUArchNames[0];
+  } else
+GPUArchName = JA.getOffloadingArch();
+
   // Obtain architecture from the action.
-  CudaArch gpu_arch = StringToCudaArch(JA.getOffloadingArch());
+  CudaArch gpu_arch = StringToCudaArch(GPUArchName);
   assert(gpu_arch != CudaArch::UNKNOWN &&
  "Device action expected to have an architecture.");
 
@@ -342,7 +354,9 @@
 Action::OffloadKind DeviceOffloadingKind) const {
   HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);
 
-  StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ);
+  StringRef GpuArch = DriverArgs.getLastArgValue(
+  DeviceOffloadingKind == Action::OFK_OpenMP ?
+  options::OPT_fopenmp_target_arch_EQ : options::OPT_march_EQ);
   assert(!GpuArch.empty() && "Must have an explicit GPU arch.");
   assert((DeviceOffloadingKind == Action::OFK_OpenMP ||
   DeviceOffloadingKind == Action::OFK_Cuda) &&
@@ -364,7 +378,6 @@
   }
 
   std::string LibDeviceFile = CudaInstallation.getLibDeviceFile(GpuArch);
-
   if (LibDeviceFile.empty()) {
 getDriver().Diag(diag::err_drv_no_cuda_libdevice) << GpuArch;
 return;
@@ -405,7 +418,7 @@
 
   // For OpenMP device offloading, append derived arguments. Make sure
   // flags are not duplicated.
-  // TODO: Append the compute capability.
+  // Also append the compute capability.
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
 for (Arg *A : Args){
   bool IsDuplicate = false;
@@ -418,6 +431,14 @@
   if (!IsDuplicate)
 DAL->append(A);
 }
+
+// Get the compute capability from the -fopenmp-target-arch flag.
+// The default compute capability is sm_20 since this is a CUDA
+// tool chain.
+if (Args.getAllArgValues(options::OPT_fopenmp_target_arch_EQ).empty())
+  DAL->AddJoinedArg(nullptr,
+  Opts.getOption(options::OPT_fopenmp_target_arch_EQ), "sm_20");
+
 return DAL;
   }
 
Index: include/clang/Driver/Options.td
===
--- include/clang/Driver/Options.td
+++ include/clang/Driver/Options.td
@@ -1294,6 +1294,8 @@
   HelpText<"Specify comma-separated list of triples OpenMP offloading targets 
to be supported">;
 def fopenmp_dump_offload_linker_script : Flag<["-"], 
"fopenmp-dump-offload-linker-script">, Group, 
   Flags<[NoArgumentUnused]>;
+def fopenmp_target_arch_EQ : Joined<["-"], "fopenmp-target-arch=">, 
Flags<[DriverOption]>,
+  HelpText<"Pass a single target architecture (default for NVIDIA is sm_20) to 
be used by OpenMP device offloading.">;
 def fno_optimize_sibling_calls : Flag<["-"], "fno-optimize-sibling-calls">, 
Group;
 def