[PATCH] D115283: [AMDGPU] Set "amdgpu_hostcall" module flag if an AMDGPU function has calls to device lib functions that use hostcalls.

2021-12-08 Thread Konstantin Pyzhov via Phabricator via cfe-commits
kpyzhov added a comment.

In D115283#3180836 , @yaxunl wrote:

> If we only need to check whether `__ockl_hostcall_internal` exists in the 
> final module in LLVM codegen to determine whether we need the hostcall 
> metadata, probably we don't even need a function attribute or even module 
> flag.

Right, we used to do exactly that, but then it turned out that it does not work 
with -fgpu-rdc since IPO may rename the __ockl_hostcall_internal().


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115283/new/

https://reviews.llvm.org/D115283

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D115283: [AMDGPU] Set "amdgpu_hostcall" module flag if an AMDGPU function has calls to device lib functions that use hostcalls.

2021-12-08 Thread Konstantin Pyzhov via Phabricator via cfe-commits
kpyzhov added inline comments.



Comment at: clang/test/CodeGenHIP/amdgpu_hostcall.cpp:2-6
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -x hip -emit-llvm 
-fcuda-is-device -DFN_HOSTCALL \
+// RUN:   -o - %s | FileCheck --enable-var-scope %s
+
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -x hip -emit-llvm 
-fcuda-is-device -DFN_PRINTF \
+// RUN:   -o - %s | FileCheck --enable-var-scope %s

dfukalov wrote:
> Am I right we don't actually need two runs here, the test may be executed 
> with one run, removed `#ifdefs` and, possible, multiplied `CHECK:` lines?
> I would suggest to use the llvm/utils/update_cc_test_checks.py script in such 
> tests.
Well, it may be executed with one run, but in that case we won't be able to 
catch an error if one of the functions is broken, because the 2nd one will set 
the module flag.
Why do you think I should use the script for this test?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115283/new/

https://reviews.llvm.org/D115283

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D115283: [AMDGPU] Set "amdgpu_hostcall" module flag if an AMDGPU function has calls to device lib functions that use hostcalls.

2021-12-08 Thread Konstantin Pyzhov via Phabricator via cfe-commits
kpyzhov added a comment.

In D115283#3179651 , @yaxunl wrote:

> One drawback of this approach is that it does not work for LLVM modules 
> generated from assembly or programmatically e.g. Tensorflow XLA.
>
> Another drawback is that if `__ockl_call_host_function` or 
> `__ockl_fprintf_stderr_begin` are eliminated by optimizer, the module flag is 
> still kept. This could happen if users use printf in assert.
>
> Is there a way to detect use of hostcall later in LLVM IR not by calling of 
> these functions?

Two other possible solutions that come to my mind:

1. Make a separate pass that would check if there is an instance of the 
ockl_hostcall_internal() function in the module and set the module flag if so.

2. Add a new "amdgpu_hostcall" function attribute. The 
"ockl_hostcall_internal()" function should be declared with 
__attribute__(("amdgpu_hostcall")) in the device libs. Then, in the Code Gen 
pass, we just set the "hidden_hostcall" kernel arg attribute if any function 
with "amdgpu_hostcall" is present. I think this is the best solution since we 
don't rely on the particular function name, and it would work ideally with any 
optimizations.

Please let me know what you think.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115283/new/

https://reviews.llvm.org/D115283

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D115283: [AMDGPU] Set "amdgpu_hostcall" module flag if an AMDGPU function has calls to device lib functions that use hostcalls.

2021-12-07 Thread Konstantin Pyzhov via Phabricator via cfe-commits
kpyzhov updated this revision to Diff 392577.
Herald added subscribers: kerbowa, nhaehnle, jvesely.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115283/new/

https://reviews.llvm.org/D115283

Files:
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGenHIP/amdgpu_hostcall.cpp


Index: clang/test/CodeGenHIP/amdgpu_hostcall.cpp
===
--- /dev/null
+++ clang/test/CodeGenHIP/amdgpu_hostcall.cpp
@@ -0,0 +1,48 @@
+
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -x hip -emit-llvm 
-fcuda-is-device -DFN_HOSTCALL \
+// RUN:   -o - %s | FileCheck --enable-var-scope %s
+
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -x hip -emit-llvm 
-fcuda-is-device -DFN_PRINTF \
+// RUN:   -o - %s | FileCheck --enable-var-scope %s
+
+// CHECK: !llvm.module.flags
+// CHECK: "amdgpu_hostcall"
+
+
+typedef unsigned long int uint64_t;
+
+#define __device__ __attribute__((device))
+
+template struct HIP_vector_base;
+
+template
+struct HIP_vector_base { using Native_vec_ = T 
__attribute__((ext_vector_type(2))); };
+
+
+extern "C" __device__ uint64_t __ockl_fprintf_stderr_begin();
+
+extern "C" __device__ HIP_vector_base::Native_vec_ 
__ockl_call_host_function(
+uint64_t fptr, uint64_t arg0, uint64_t arg1, uint64_t arg2, uint64_t arg3, 
uint64_t arg4, uint64_t arg5, uint64_t arg6);
+
+
+#ifdef FN_HOSTCALL
+__device__ void fn_hostcall(uint64_t fptr, uint64_t* retval0, uint64_t* 
retval1) {
+  uint64_t arg0 = (uint64_t)fptr;
+  uint64_t arg1 = 0;
+  uint64_t arg2 = 0;
+  uint64_t arg3 = 0;
+  uint64_t arg4 = 0;
+  uint64_t arg5 = 0;
+  uint64_t arg6 = 0;
+  uint64_t arg7 = 0;
+
+  __ockl_call_host_function(arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7);
+}
+#endif
+
+#ifdef FN_PRINTF
+__device__ void fn_printf() {
+  auto msg = __ockl_fprintf_stderr_begin();
+}
+#endif
+
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -9194,6 +9194,11 @@
 llvm::Value *BlockLiteral) const override;
   bool shouldEmitStaticExternCAliases() const override;
   void setCUDAKernelCallingConvention(const FunctionType *) const override;
+
+  virtual void checkFunctionCallABI(CodeGenModule , SourceLocation CallLoc,
+const FunctionDecl *Caller,
+const FunctionDecl *Callee,
+const CallArgList ) const override;
 };
 }
 
@@ -9417,6 +9422,24 @@
   FT, FT->getExtInfo().withCallingConv(CC_OpenCLKernel));
 }
 
+void AMDGPUTargetCodeGenInfo::checkFunctionCallABI(CodeGenModule ,
+   SourceLocation CallLoc,
+   const FunctionDecl *Caller,
+   const FunctionDecl *Callee,
+   const CallArgList ) 
const
+{
+  // Set the "amdgpu_hostcall" module flag if "Callee" is a library function
+  // that uses AMDGPU hostcall mechanism.
+  if (Callee &&
+  (Callee->getName() == "__ockl_call_host_function" ||
+   Callee->getName() == "__ockl_fprintf_stderr_begin")) {
+llvm::Module  = CGM.getModule();
+if (!M.getModuleFlag("amdgpu_hostcall")) {
+  M.addModuleFlag(llvm::Module::Override, "amdgpu_hostcall", 1);
+}
+  }
+}
+
 
//===--===//
 // SPARC v8 ABI Implementation.
 // Based on the SPARC Compliance Definition version 2.4.1.


Index: clang/test/CodeGenHIP/amdgpu_hostcall.cpp
===
--- /dev/null
+++ clang/test/CodeGenHIP/amdgpu_hostcall.cpp
@@ -0,0 +1,48 @@
+
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -x hip -emit-llvm -fcuda-is-device -DFN_HOSTCALL \
+// RUN:   -o - %s | FileCheck --enable-var-scope %s
+
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -x hip -emit-llvm -fcuda-is-device -DFN_PRINTF \
+// RUN:   -o - %s | FileCheck --enable-var-scope %s
+
+// CHECK: !llvm.module.flags
+// CHECK: "amdgpu_hostcall"
+
+
+typedef unsigned long int uint64_t;
+
+#define __device__ __attribute__((device))
+
+template struct HIP_vector_base;
+
+template
+struct HIP_vector_base { using Native_vec_ = T __attribute__((ext_vector_type(2))); };
+
+
+extern "C" __device__ uint64_t __ockl_fprintf_stderr_begin();
+
+extern "C" __device__ HIP_vector_base::Native_vec_ __ockl_call_host_function(
+uint64_t fptr, uint64_t arg0, uint64_t arg1, uint64_t arg2, uint64_t arg3, uint64_t arg4, uint64_t arg5, uint64_t arg6);
+
+
+#ifdef FN_HOSTCALL
+__device__ void fn_hostcall(uint64_t fptr, uint64_t* retval0, uint64_t* retval1) {
+  uint64_t arg0 = (uint64_t)fptr;
+  uint64_t arg1 = 0;
+  uint64_t arg2 = 0;
+  uint64_t arg3 = 0;
+  uint64_t arg4 = 0;
+  uint64_t arg5 = 0;
+  uint64_t 

[PATCH] D115283: [AMDGPU] Set "amdgpu_hostcall" module flag if an AMDGPU function has calls to device lib functions that use hostcalls.

2021-12-07 Thread Konstantin Pyzhov via Phabricator via cfe-commits
kpyzhov added a comment.

In D115283#3177615 , @dfukalov wrote:

> Needs a test.

Yes, good point, thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115283/new/

https://reviews.llvm.org/D115283

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D115283: [AMDGPU] Set "amdgpu_hostcall" module flag if an AMDGPU function has calls to device lib functions that use hostcalls.

2021-12-07 Thread Konstantin Pyzhov via Phabricator via cfe-commits
kpyzhov created this revision.
kpyzhov added a reviewer: yaxunl.
kpyzhov added a project: AMDGPU.
Herald added subscribers: t-tye, tpr, dstuttard, kzhuravl.
kpyzhov requested review of this revision.
Herald added subscribers: cfe-commits, wdng.
Herald added a project: clang.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D115283

Files:
  clang/lib/CodeGen/TargetInfo.cpp


Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -9194,6 +9194,11 @@
 llvm::Value *BlockLiteral) const override;
   bool shouldEmitStaticExternCAliases() const override;
   void setCUDAKernelCallingConvention(const FunctionType *) const override;
+
+  virtual void checkFunctionCallABI(CodeGenModule , SourceLocation CallLoc,
+const FunctionDecl *Caller,
+const FunctionDecl *Callee,
+const CallArgList ) const override;
 };
 }
 
@@ -9417,6 +9422,24 @@
   FT, FT->getExtInfo().withCallingConv(CC_OpenCLKernel));
 }
 
+void AMDGPUTargetCodeGenInfo::checkFunctionCallABI(CodeGenModule ,
+   SourceLocation CallLoc,
+   const FunctionDecl *Caller,
+   const FunctionDecl *Callee,
+   const CallArgList ) 
const
+{
+  // Set the "amdgpu_hostcall" module flag if "Callee" is a library function
+  // that uses AMDGPU hostcall mechanism.
+  if (Callee &&
+  (Callee->getName() == "__ockl_call_host_function" ||
+   Callee->getName() == "__ockl_fprintf_stderr_begin")) {
+llvm::Module  = CGM.getModule();
+if (!M.getModuleFlag("amdgpu_hostcall")) {
+  M.addModuleFlag(llvm::Module::Override, "amdgpu_hostcall", 1);
+}
+  }
+}
+
 
//===--===//
 // SPARC v8 ABI Implementation.
 // Based on the SPARC Compliance Definition version 2.4.1.


Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -9194,6 +9194,11 @@
 llvm::Value *BlockLiteral) const override;
   bool shouldEmitStaticExternCAliases() const override;
   void setCUDAKernelCallingConvention(const FunctionType *) const override;
+
+  virtual void checkFunctionCallABI(CodeGenModule , SourceLocation CallLoc,
+const FunctionDecl *Caller,
+const FunctionDecl *Callee,
+const CallArgList ) const override;
 };
 }
 
@@ -9417,6 +9422,24 @@
   FT, FT->getExtInfo().withCallingConv(CC_OpenCLKernel));
 }
 
+void AMDGPUTargetCodeGenInfo::checkFunctionCallABI(CodeGenModule ,
+   SourceLocation CallLoc,
+   const FunctionDecl *Caller,
+   const FunctionDecl *Callee,
+   const CallArgList ) const
+{
+  // Set the "amdgpu_hostcall" module flag if "Callee" is a library function
+  // that uses AMDGPU hostcall mechanism.
+  if (Callee &&
+  (Callee->getName() == "__ockl_call_host_function" ||
+   Callee->getName() == "__ockl_fprintf_stderr_begin")) {
+llvm::Module  = CGM.getModule();
+if (!M.getModuleFlag("amdgpu_hostcall")) {
+  M.addModuleFlag(llvm::Module::Override, "amdgpu_hostcall", 1);
+}
+  }
+}
+
 //===--===//
 // SPARC v8 ABI Implementation.
 // Based on the SPARC Compliance Definition version 2.4.1.
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72723: Built-in functions for AMDGPU MFMA instructions.

2020-01-28 Thread Konstantin Pyzhov via Phabricator via cfe-commits
kpyzhov added a comment.

I've pushed the fix already. Is something still broken?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72723/new/

https://reviews.llvm.org/D72723



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72723: Built-in functions for AMDGPU MFMA instructions.

2020-01-28 Thread Konstantin Pyzhov via Phabricator via cfe-commits
kpyzhov added a comment.

Thanks for reporting this. Somehow I managed to push empty test files. I'll 
push the fix soon.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72723/new/

https://reviews.llvm.org/D72723



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72723: Built-in functions for AMDGPU MFMA instructions.

2020-01-28 Thread Konstantin Pyzhov via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG6d614a82a423: Summary: This CL adds clang declarations of 
built-in functions for AMDGPU MFMA… (authored by kpyzhov).
Herald added a subscriber: kerbowa.

Changed prior to commit:
  https://reviews.llvm.org/D72723?vs=238263=240898#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72723/new/

https://reviews.llvm.org/D72723

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
  clang/test/SemaOpenCL/builtins-amdgcn-error-gfx908-param.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td

Index: llvm/include/llvm/IR/IntrinsicsAMDGPU.td
===
--- llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -1725,105 +1725,125 @@
 def int_amdgcn_global_atomic_fadd: AMDGPUGlobalAtomicNoRtn;
 
 // llvm.amdgcn.mfma.f32.* vdst, srcA, srcB, srcC, cbsz, abid, blgp
-def int_amdgcn_mfma_f32_32x32x1f32 : Intrinsic<[llvm_v32f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v32f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x1f32 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_4x4x1f32 : Intrinsic<[llvm_v4f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v4f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_32x32x2f32 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x4f32 : Intrinsic<[llvm_v4f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v4f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_32x32x4f16 : Intrinsic<[llvm_v32f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v32f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x4f16 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_4x4x4f16 : Intrinsic<[llvm_v4f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v4f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_32x32x8f16 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x16f16 : Intrinsic<[llvm_v4f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v4f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_i32_32x32x4i8 : Intrinsic<[llvm_v32i32_ty],
-  [llvm_i32_ty, llvm_i32_ty, llvm_v32i32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_i32_16x16x4i8 : Intrinsic<[llvm_v16i32_ty],
-  [llvm_i32_ty, llvm_i32_ty, llvm_v16i32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_i32_4x4x4i8 : Intrinsic<[llvm_v4i32_ty],
-  [llvm_i32_ty, llvm_i32_ty, llvm_v4i32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_i32_32x32x8i8 : Intrinsic<[llvm_v16i32_ty],
-  [llvm_i32_ty, llvm_i32_ty, llvm_v16i32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_i32_16x16x16i8 : Intrinsic<[llvm_v4i32_ty],
-  [llvm_i32_ty, llvm_i32_ty, llvm_v4i32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_32x32x2bf16 : Intrinsic<[llvm_v32f32_ty],
-  [llvm_v2i16_ty, llvm_v2i16_ty, llvm_v32f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x2bf16 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_v2i16_ty, llvm_v2i16_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_4x4x2bf16 : 

[PATCH] D72723: Built-in functions for AMDGPU MFMA instructions.

2020-01-15 Thread Konstantin Pyzhov via Phabricator via cfe-commits
kpyzhov added a comment.

In D72723#1821917 , @arsenm wrote:

> Having two subtarget features for the same feature is an issue


?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72723/new/

https://reviews.llvm.org/D72723



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72723: Built-in functions for AMDGPU MFMA instructions.

2020-01-15 Thread Konstantin Pyzhov via Phabricator via cfe-commits
kpyzhov updated this revision to Diff 238263.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72723/new/

https://reviews.llvm.org/D72723

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
  clang/test/SemaOpenCL/builtins-amdgcn-error-gfx908-param.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td

Index: llvm/include/llvm/IR/IntrinsicsAMDGPU.td
===
--- llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -1725,105 +1725,125 @@
 def int_amdgcn_global_atomic_fadd: AMDGPUGlobalAtomicNoRtn;
 
 // llvm.amdgcn.mfma.f32.* vdst, srcA, srcB, srcC, cbsz, abid, blgp
-def int_amdgcn_mfma_f32_32x32x1f32 : Intrinsic<[llvm_v32f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v32f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x1f32 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_4x4x1f32 : Intrinsic<[llvm_v4f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v4f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_32x32x2f32 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x4f32 : Intrinsic<[llvm_v4f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v4f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_32x32x4f16 : Intrinsic<[llvm_v32f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v32f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x4f16 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_4x4x4f16 : Intrinsic<[llvm_v4f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v4f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_32x32x8f16 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x16f16 : Intrinsic<[llvm_v4f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v4f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_i32_32x32x4i8 : Intrinsic<[llvm_v32i32_ty],
-  [llvm_i32_ty, llvm_i32_ty, llvm_v32i32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_i32_16x16x4i8 : Intrinsic<[llvm_v16i32_ty],
-  [llvm_i32_ty, llvm_i32_ty, llvm_v16i32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_i32_4x4x4i8 : Intrinsic<[llvm_v4i32_ty],
-  [llvm_i32_ty, llvm_i32_ty, llvm_v4i32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_i32_32x32x8i8 : Intrinsic<[llvm_v16i32_ty],
-  [llvm_i32_ty, llvm_i32_ty, llvm_v16i32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_i32_16x16x16i8 : Intrinsic<[llvm_v4i32_ty],
-  [llvm_i32_ty, llvm_i32_ty, llvm_v4i32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_32x32x2bf16 : Intrinsic<[llvm_v32f32_ty],
-  [llvm_v2i16_ty, llvm_v2i16_ty, llvm_v32f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x2bf16 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_v2i16_ty, llvm_v2i16_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_4x4x2bf16 : Intrinsic<[llvm_v4f32_ty],
-  [llvm_v2i16_ty, llvm_v2i16_ty, llvm_v4f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_32x32x4bf16 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_v2i16_ty, llvm_v2i16_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, 

[PATCH] D72723: Built-in functions for AMDGPU MFMA instructions.

2020-01-15 Thread Konstantin Pyzhov via Phabricator via cfe-commits
kpyzhov updated this revision to Diff 238259.
kpyzhov added a reviewer: arsenm.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72723/new/

https://reviews.llvm.org/D72723

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
  clang/test/SemaOpenCL/builtins-amdgcn-error-gfx908-param.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td

Index: llvm/include/llvm/IR/IntrinsicsAMDGPU.td
===
--- llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -1725,105 +1725,125 @@
 def int_amdgcn_global_atomic_fadd: AMDGPUGlobalAtomicNoRtn;
 
 // llvm.amdgcn.mfma.f32.* vdst, srcA, srcB, srcC, cbsz, abid, blgp
-def int_amdgcn_mfma_f32_32x32x1f32 : Intrinsic<[llvm_v32f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v32f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x1f32 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_4x4x1f32 : Intrinsic<[llvm_v4f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v4f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_32x32x2f32 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x4f32 : Intrinsic<[llvm_v4f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v4f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_32x32x4f16 : Intrinsic<[llvm_v32f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v32f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x4f16 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_4x4x4f16 : Intrinsic<[llvm_v4f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v4f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_32x32x8f16 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x16f16 : Intrinsic<[llvm_v4f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v4f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_i32_32x32x4i8 : Intrinsic<[llvm_v32i32_ty],
-  [llvm_i32_ty, llvm_i32_ty, llvm_v32i32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_i32_16x16x4i8 : Intrinsic<[llvm_v16i32_ty],
-  [llvm_i32_ty, llvm_i32_ty, llvm_v16i32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_i32_4x4x4i8 : Intrinsic<[llvm_v4i32_ty],
-  [llvm_i32_ty, llvm_i32_ty, llvm_v4i32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_i32_32x32x8i8 : Intrinsic<[llvm_v16i32_ty],
-  [llvm_i32_ty, llvm_i32_ty, llvm_v16i32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_i32_16x16x16i8 : Intrinsic<[llvm_v4i32_ty],
-  [llvm_i32_ty, llvm_i32_ty, llvm_v4i32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_32x32x2bf16 : Intrinsic<[llvm_v32f32_ty],
-  [llvm_v2i16_ty, llvm_v2i16_ty, llvm_v32f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x2bf16 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_v2i16_ty, llvm_v2i16_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_4x4x2bf16 : Intrinsic<[llvm_v4f32_ty],
-  [llvm_v2i16_ty, llvm_v2i16_ty, llvm_v4f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_32x32x4bf16 : Intrinsic<[llvm_v16f32_ty],
-  

[PATCH] D72723: Built-in functions for AMDGPU MFMA instructions.

2020-01-14 Thread Konstantin Pyzhov via Phabricator via cfe-commits
kpyzhov created this revision.
kpyzhov added a reviewer: yaxunl.
kpyzhov added a project: AMDGPU.
Herald added subscribers: llvm-commits, cfe-commits, hiraditya, t-tye, tpr, 
dstuttard, nhaehnle, wdng, jvesely, kzhuravl, arsenm.
Herald added projects: clang, LLVM.

Added declarations of MFMA built-ins to clang. Modified declarations of 
corresponding LLVM intrinsics. Added tests for new built-ins.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D72723

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
  clang/test/SemaOpenCL/builtins-amdgcn-error-gfx908-param.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h

Index: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
===
--- llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
+++ llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
@@ -353,6 +353,7 @@
   bool HasMAIInsts;
   bool HasPkFmacF16Inst;
   bool HasAtomicFaddInsts;
+  bool HasMfma1Insts;
   bool EnableSRAMECC;
   bool DoesNotSupportSRAMECC;
   bool HasNoSdstCMPX;
Index: llvm/lib/Target/AMDGPU/AMDGPU.td
===
--- llvm/lib/Target/AMDGPU/AMDGPU.td
+++ llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -484,6 +484,12 @@
   "Does not need SW waitstates"
 >;
 
+def FeatureMfma1Insts : SubtargetFeature<"mfma1-insts",
+  "HasMfma1Insts",
+  "true",
+  "Has MFMA1 instructions"
+>;
+
 //======//
 // Subtarget Features (options and debugging)
 //======//
@@ -818,6 +824,7 @@
FeatureAtomicFaddInsts,
FeatureSRAMECC,
FeatureMFMAInlineLiteralBug,
+   FeatureMfma1Insts,
FeatureCodeObjectV3]>;
 
 def FeatureISAVersion9_0_9 : FeatureSet<
@@ -1161,6 +1168,9 @@
 def EnableLateCFGStructurize : Predicate<
   "EnableLateStructurizeCFG">;
 
+def HasMfma1Insts : Predicate<"Subtarget->hasMfma1Insts()">,
+  AssemblerPredicate<"FeatureMfma1Insts">;
+
 // Include AMDGPU TD files
 include "SISchedule.td"
 include "GCNProcessors.td"
Index: llvm/include/llvm/IR/IntrinsicsAMDGPU.td
===
--- llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -1725,105 +1725,125 @@
 def int_amdgcn_global_atomic_fadd: AMDGPUGlobalAtomicNoRtn;
 
 // llvm.amdgcn.mfma.f32.* vdst, srcA, srcB, srcC, cbsz, abid, blgp
-def int_amdgcn_mfma_f32_32x32x1f32 : Intrinsic<[llvm_v32f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v32f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x1f32 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_4x4x1f32 : Intrinsic<[llvm_v4f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v4f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_32x32x2f32 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x4f32 : Intrinsic<[llvm_v4f32_ty],
-  [llvm_float_ty, llvm_float_ty, llvm_v4f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_32x32x4f16 : Intrinsic<[llvm_v32f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v32f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x4f16 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_4x4x4f16 : Intrinsic<[llvm_v4f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v4f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_32x32x8f16 : Intrinsic<[llvm_v16f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v16f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_f32_16x16x16f16 : Intrinsic<[llvm_v4f32_ty],
-  [llvm_v4f16_ty, llvm_v4f16_ty, llvm_v4f32_ty,
-   llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
-   [IntrConvergent, IntrNoMem, ImmArg<3>, ImmArg<4>, ImmArg<5>]>;
-
-def int_amdgcn_mfma_i32_32x32x4i8 : Intrinsic<[llvm_v32i32_ty],
-  [llvm_i32_ty, llvm_i32_ty, llvm_v32i32_ty,
- 

[PATCH] D63277: [CUDA][HIP] Don't set "comdat" attribute for CUDA device stub functions.

2019-06-19 Thread Konstantin Pyzhov via Phabricator via cfe-commits
kpyzhov added a comment.

In D63277#1550870 , @rjmccall wrote:

> This optimization is disabled for functions not in COMDAT sections?  Is that 
> documented somewhere?


It is documented here: 
https://docs.microsoft.com/en-us/cpp/build/reference/opt-optimizations?view=vs-2019


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63277/new/

https://reviews.llvm.org/D63277



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63277: [CUDA][HIP] Don't set "comdat" attribute for CUDA device stub functions.

2019-06-19 Thread Konstantin Pyzhov via Phabricator via cfe-commits
kpyzhov updated this revision to Diff 205655.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63277/new/

https://reviews.llvm.org/D63277

Files:
  clang/lib/CodeGen/CodeGenModule.cpp


Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3706,6 +3706,11 @@
   if (!CGM.supportsCOMDAT())
 return false;
 
+  // Do not set COMDAT attribute for CUDA/HIP stub functions to prevent
+  // them being "merged" by the COMDAT Folding linker optimization.
+  if (D.hasAttr())
+return false;
+
   if (D.hasAttr())
 return true;
 


Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3706,6 +3706,11 @@
   if (!CGM.supportsCOMDAT())
 return false;
 
+  // Do not set COMDAT attribute for CUDA/HIP stub functions to prevent
+  // them being "merged" by the COMDAT Folding linker optimization.
+  if (D.hasAttr())
+return false;
+
   if (D.hasAttr())
 return true;
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63277: [CUDA][HIP] Don't set "comdat" attribute for CUDA device stub functions.

2019-06-19 Thread Konstantin Pyzhov via Phabricator via cfe-commits
kpyzhov added inline comments.



Comment at: clang/lib/CodeGen/CodeGenModule.cpp:4294
   setNonAliasAttributes(GD, Fn);
   SetLLVMFunctionAttributesForDefinition(D, Fn);
   if (const ConstructorAttr *CA = D->getAttr())

tra wrote:
> Perhaps this should be pushed further down into `shouldBeInCOMDAT()` which 
> seems to be the right place to decide whether something is not suitable for 
> comdat.
Good idea. Let me do that.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63277/new/

https://reviews.llvm.org/D63277



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63277: Don't set "comdat" attribute for CUDA device stub functions.

2019-06-13 Thread Konstantin Pyzhov via Phabricator via cfe-commits
kpyzhov created this revision.
kpyzhov added a reviewer: rjmccall.
kpyzhov added projects: clang, AMDGPU.
Herald added a subscriber: cfe-commits.

When compiling the HOST part of CUDA programs, clang replaces device kernels 
with so-called "stub" functions that contains a few calls to the Runtime API 
functions (which set the kernel argument values and launch the kernel itself). 
The stub functions are very small, so they may have identical generated code 
for different kernels with same arguments.
The Microsoft Linker has an optimization called "COMDAT Folding". It's able to 
detect functions with identical binary code and "merge" them, i.e. replace 
calls and pointers to those different functions with call/pointer to one of 
them and eliminate other copies.

Here is the description of this optimization: 
https://docs.microsoft.com/en-us/cpp/build/reference/opt-optimizations?view=vs-2019
This page contains a warning about "COMDAT Folding":
//"Because /OPT:ICF can cause the same address to be assigned to different 
functions or read-only data members (that is, const variables when compiled by 
using /Gy), it can break a program that depends on unique addresses for 
functions or read-only data members."//
That's exactly what happens to the CUDA stub functions.

This change disables setting "COMDAT" attribute for CUDA stub functions in the 
HOST code.


Repository:
  rC Clang

https://reviews.llvm.org/D63277

Files:
  clang/lib/CodeGen/CodeGenModule.cpp


Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -4291,14 +4291,15 @@

   MaybeHandleStaticInExternC(D, Fn);

-
-  maybeSetTrivialComdat(*D, *Fn);
+  if (!D->hasAttr()) {
+  maybeSetTrivialComdat(*D, *Fn);
+  }

   CodeGenFunction(*this).GenerateCode(D, Fn, FI);

   setNonAliasAttributes(GD, Fn);
   SetLLVMFunctionAttributesForDefinition(D, Fn);

   if (const ConstructorAttr *CA = D->getAttr())
 AddGlobalCtor(Fn, CA->getPriority());
   if (const DestructorAttr *DA = D->getAttr())


Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -4291,14 +4291,15 @@

   MaybeHandleStaticInExternC(D, Fn);

-
-  maybeSetTrivialComdat(*D, *Fn);
+  if (!D->hasAttr()) {
+  maybeSetTrivialComdat(*D, *Fn);
+  }

   CodeGenFunction(*this).GenerateCode(D, Fn, FI);

   setNonAliasAttributes(GD, Fn);
   SetLLVMFunctionAttributesForDefinition(D, Fn);

   if (const ConstructorAttr *CA = D->getAttr())
 AddGlobalCtor(Fn, CA->getPriority());
   if (const DestructorAttr *DA = D->getAttr())
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits