[PATCH] D146023: [AMDGPU] Remove Code Object V2

2023-09-07 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh abandoned this revision.
Pierre-vh added a comment.

https://github.com/llvm/llvm-project/pull/65715


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146023/new/

https://reviews.llvm.org/D146023

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D146023: [AMDGPU] Remove Code Object V2

2023-08-22 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added a comment.

ping


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146023/new/

https://reviews.llvm.org/D146023

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D146023: [AMDGPU] Remove Code Object V2

2023-08-04 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added inline comments.



Comment at: llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp:5598
   return ParseDirectiveHSAMetadata();
   } else {
-if (IDVal == ".hsa_code_object_version")

cfang wrote:
> Are you sure Non-HSA does not have the four directives you deleted?  
I'm really not sure, I saw `hsa` in the name and I thought it only applied to 
HSA, but some tests are failing.
I'll leave them in until someone can answer for sure.



Comment at: llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h:46
 enum {
-  AMDHSA_COV2 = 2,
   AMDHSA_COV3 = 3,

cfang wrote:
> Should we keep this field, and just mention "unsupported"?
I'm not sure about that, I assume that we're removing as many traces of V2 as 
possible from the backend. No point in keeping an unused enum entry IMO, but 
I'm okay with keeping it if there's a good reason to



Comment at: llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h:59
 /// false otherwise.
 bool isHsaAbiVersion3(const MCSubtargetInfo *STI);
 /// \returns True if HSA OS ABI Version identification is 4,

cfang wrote:
> Are all these "isHsaAbiVersionX" no longer needed? 
Yes they're needed because they implicitly check for HSA as well


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146023/new/

https://reviews.llvm.org/D146023

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D146023: [AMDGPU] Remove Code Object V2

2023-08-01 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added inline comments.



Comment at: llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp:5583
 
+  // FIXME: Shouldn't be needed anymore? Should we remove this directive
+  // entirely? See `amdpal-elf.ll` - the output ASM contains both amdgcn_target

This is an issue with `amdpal-elf.ll`, the run line with `llvm-mc` fails 
because it can't parse `amd_amdgpu_isa`.
Not sure how to fix this. Should we be able to read that directive, or should 
we just never emit it?
Some tests use it, and stg also emits it in the same test, so it's not new. 
It's just no longer parse-able after `isHsaAbiVersion3AndAbove` was removed - I 
suspect that function returned false for non-HSA OSes and was mistakenly used 
here to check for `!= AMDHSA`?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146023/new/

https://reviews.llvm.org/D146023

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D152251: [clang][CodeGen] Fix GPU-specific attributes being dropped by bitcode linking

2023-06-07 Thread Pierre van Houtryve via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Pierre-vh marked an inline comment as done.
Closed by commit rG23431b524603: [clang][CodeGen] Fix GPU-specific attributes 
being dropped by bitcode linking (authored by Pierre-vh).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152251/new/

https://reviews.llvm.org/D152251

Files:
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/CodeGen/CodeGenModule.h
  clang/test/CodeGenCUDA/Inputs/ocml-sample-target-attrs.cl
  clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu
  clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu

Index: clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu
@@ -0,0 +1,48 @@
+// Verify the behavior of the +gfxN-insts in the way that
+// rocm-device-libs should be built with. e.g. If the device libraries has a function
+// with "+gfx11-insts", that attribute should still be present after linking and not
+// overwritten with the current target's settings.
+
+// This is important because at this time, many device-libs functions that are only
+// available on some GPUs put an attribute such as "+gfx11-insts" so that
+// AMDGPURemoveIncompatibleFunctions can detect & remove them if needed.
+
+// Build the fake device library in the way rocm-device-libs should be built.
+//
+// RUN: %clang_cc1 -x cl -triple amdgcn-amd-amdhsa\
+// RUN:   -mcode-object-version=none -emit-llvm-bc \
+// RUN:   %S/Inputs/ocml-sample-target-attrs.cl -o %t.bc
+
+// Check the default behavior
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx803 -fcuda-is-device \
+// RUN:   -mlink-builtin-bitcode %t.bc \
+// RUN:   -emit-llvm %s -o - | FileCheck %s --check-prefixes=CHECK,INTERNALIZE
+
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1101 -fcuda-is-device \
+// RUN:   -mlink-builtin-bitcode %t.bc -emit-llvm %s -o - | FileCheck %s --check-prefixes=CHECK,INTERNALIZE
+
+// Check the case where no internalization is performed
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx803 \
+// RUN:   -fcuda-is-device -mlink-bitcode-file %t.bc -emit-llvm %s -o -  | FileCheck %s --check-prefixes=CHECK,NOINTERNALIZE
+
+// Check the case where no internalization is performed
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1101 \
+// RUN:   -fcuda-is-device -mlink-bitcode-file %t.bc -emit-llvm %s -o - | FileCheck %s --check-prefixes=CHECK,NOINTERNALIZE
+
+
+// CHECK: define {{.*}} i64 @do_intrin_stuff() #[[ATTR:[0-9]+]]
+// INTERNALIZE: attributes #[[ATTR]] = {{.*}} "target-cpu"="gfx{{.*}}" "target-features"="+gfx11-insts"
+// NOINTERNALIZE: attributes #[[ATTR]] = {{.*}} "target-features"="+gfx11-insts"
+
+#define __device__ __attribute__((device))
+#define __global__ __attribute__((global))
+
+typedef unsigned long ulong;
+
+extern "C" {
+__device__ ulong do_intrin_stuff(void);
+
+__global__ void kernel_f16(ulong* out) {
+*out = do_intrin_stuff();
+  }
+}
Index: clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu
===
--- clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu
+++ clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu
@@ -132,24 +132,32 @@
 
 // Default mode relies on the implicit check-not for the denormal-fp-math.
 
-// PSZ: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
-// PSZ: #[[$FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
-// PSZ: #[[$WEAK_FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
+// PSZ: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign"
+// PSZ-SAME: "target-cpu"="gfx803"
+// PSZ: #[[$FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign"
+// PSZ-SAME: "target-cpu"="gfx803"
+// PSZ: #[[$WEAK_FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign"
+// PSZ-SAME: "target-cpu"="gfx803"
 
 // FIXME: Should check-not "denormal-fp-math" within the line
-// IEEEF64-PSZF32: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
-// IEEEF64-PSZF32: #[[$FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
-// IEEEF64-PSZF32: #[[$WEAK_FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
+// IEEEF64-PSZF32: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign"
+// IEEEF64-PSZF32-SAME

[PATCH] D152251: [clang][CodeGen] Fix GPU-specific attributes being dropped by bitcode linking

2023-06-07 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh marked 2 inline comments as done.
Pierre-vh added inline comments.



Comment at: 
clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu:34
+// CHECK: define {{.*}} i32 @do_intrin_stuff() #[[ATTR:[0-9]+]]
+// CHECK: attributes #[[ATTR]] = {{.*}} "target-features"="+gfx11-insts"
+

arsenm wrote:
> arsenm wrote:
> > Also should make sure target-cpu was set
> Did this previously receive the target-features spam implied by the target?
> Did this previously receive the target-features spam implied by the target?

I think it did, the attributes were filled with things like "+gfx9-insts", etc.

> Do we know why internalize keeps the target-cpu attribute but non-internalize 
> does not?

PropagateAttrs is only set for -mlink-builtin-bitcode, see 
CompilerInvocation.cpp@1888 (where OPT_mlink_builtin_bitcode is processed)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152251/new/

https://reviews.llvm.org/D152251

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D146023: [AMDGPU] Remove Code Object V2

2023-06-06 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added inline comments.



Comment at: llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp:520
 
-  assert(Func.getCallingConv() == CallingConv::AMDGPU_KERNEL ||
- Func.getCallingConv() == CallingConv::SPIR_KERNEL);
+  if (Func.getCallingConv() != CallingConv::AMDGPU_KERNEL &&
+  Func.getCallingConv() != CallingConv::SPIR_KERNEL)

scott.linder wrote:
> I don't follow this change; was the assert just incorrect previously?
It's for `CodeGen/AMDGPU/no-hsa-graphics-shaders.ll`, it crashes otherwise.

An alternative can be to change this:
```
  if (STM.isAmdHsaOS())
HSAMetadataStream->emitKernel(*MF, CurrentProgramInfo);
```
So it checks the CC and doesn't call the function if the CC is incorrect. I 
don't mind either solution



Comment at: llvm/test/MC/AMDGPU/hsa-gfx10.s:3
-// RUN: llvm-mc -filetype=obj -triple amdgcn--amdhsa -mcpu=gfx1010 
--amdhsa-code-object-version=2 -mattr=-wavefrontsize32,+wavefrontsize64 
-show-encoding %s | llvm-readobj -S --sd --syms - | FileCheck %s 
--check-prefix=ELF
-
-// ELF: Section {

scott.linder wrote:
> Pierre-vh wrote:
> > arsenm wrote:
> > > I thought we were still going to be able to read old objects 
> > I think llvm-readobj uses all of the MC/Target infrastructure so if we 
> > remove emission, we also remove reading, no?
> > 
> > I'm actually not sure if we plan to let readobj/readelf read COV2 object 
> > files, it's an interesting question
> I think this is my biggest concern. Do we incur a huge maintenance burden 
> that warrants dropping read support? How much code do we really need to 
> maintain to keep the readobj/objdump like tools universal?
> 
> @t-tye do you have any thoughts on whether we should maintain backwards 
> compatibility in the LLVM tooling, even if we drop generation support?
It's been a while since I wrote this but IIRC there was a discussion about it 
and it was fine to remove read support. An alternative may be to still identify 
code object V2, but not read the metadata and instead print a warning about the 
file format being deprecated?

Or I think it's YAML, maybe we can just raw dump the MD and print a warning?

Most of the maintenance cost would be in the MD mapper which is almost 500 
lines of code that'd just be there for the sake of "maybe some needs to read MD"

If we go with one of the above suggestions I can just add a test using yml2obj 
that emits a V2 file


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146023/new/

https://reviews.llvm.org/D146023

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D152251: [clang][CodeGen] Fix GPU-specific attributes being dropped by bitcode linking

2023-06-06 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh updated this revision to Diff 528794.
Pierre-vh added a comment.

Fix check lines, I think it's just FileCheck weirdness


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152251/new/

https://reviews.llvm.org/D152251

Files:
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/CodeGen/CodeGenModule.h
  clang/test/CodeGenCUDA/Inputs/ocml-sample-target-attrs.cl
  clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu
  clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu

Index: clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu
@@ -0,0 +1,48 @@
+// Verify the behavior of the +gfxN-insts in the way that
+// rocm-device-libs should be built with. e.g. If the device libraries has a function
+// with "+gfx11-insts", that attribute should still be present after linking and not
+// overwritten with the current target's settings.
+
+// This is important because at this time, many device-libs functions that are only
+// available on some GPUs put an attribute such as "+gfx11-insts" so that
+// AMDGPURemoveIncompatibleFunctions can detect & remove them if needed.
+
+// Build the fake device library in the way rocm-device-libs should be built.
+//
+// RUN: %clang_cc1 -x cl -triple amdgcn-amd-amdhsa\
+// RUN:   -mcode-object-version=none -emit-llvm-bc \
+// RUN:   %S/Inputs/ocml-sample-target-attrs.cl -o %t.bc
+
+// Check the default behavior
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx803 -fcuda-is-device \
+// RUN:   -mlink-builtin-bitcode %t.bc \
+// RUN:   -emit-llvm %s -o - | FileCheck %s --check-prefixes=CHECK,INTERNALIZE
+
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1101 -fcuda-is-device \
+// RUN:   -mlink-builtin-bitcode %t.bc -emit-llvm %s -o - | FileCheck %s --check-prefixes=CHECK,INTERNALIZE
+
+// Check the case where no internalization is performed
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx803 \
+// RUN:   -fcuda-is-device -mlink-bitcode-file %t.bc -emit-llvm %s -o -  | FileCheck %s --check-prefixes=CHECK,NOINTERNALIZE
+
+// Check the case where no internalization is performed
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1101 \
+// RUN:   -fcuda-is-device -mlink-bitcode-file %t.bc -emit-llvm %s -o - | FileCheck %s --check-prefixes=CHECK,NOINTERNALIZE
+
+
+// CHECK: define {{.*}} i64 @do_intrin_stuff() #[[ATTR:[0-9]+]]
+// INTERNALIZE: attributes #[[ATTR]] = {{.*}} "target-cpu"="gfx{{.*}}" "target-features"="+gfx11-insts"
+// NOINTERNALIZE: attributes #[[ATTR]] = {{.*}} "target-features"="+gfx11-insts"
+
+#define __device__ __attribute__((device))
+#define __global__ __attribute__((global))
+
+typedef unsigned long ulong;
+
+extern "C" {
+__device__ ulong do_intrin_stuff(void);
+
+__global__ void kernel_f16(ulong* out) {
+*out = do_intrin_stuff();
+  }
+}
Index: clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu
===
--- clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu
+++ clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu
@@ -132,24 +132,32 @@
 
 // Default mode relies on the implicit check-not for the denormal-fp-math.
 
-// PSZ: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
-// PSZ: #[[$FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
-// PSZ: #[[$WEAK_FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
+// PSZ: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign"
+// PSZ-SAME: "target-cpu"="gfx803"
+// PSZ: #[[$FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign"
+// PSZ-SAME: "target-cpu"="gfx803"
+// PSZ: #[[$WEAK_FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign"
+// PSZ-SAME: "target-cpu"="gfx803"
 
 // FIXME: Should check-not "denormal-fp-math" within the line
-// IEEEF64-PSZF32: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
-// IEEEF64-PSZF32: #[[$FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
-// IEEEF64-PSZF32: #[[$WEAK_FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
+// IEEEF64-PSZF32: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign"
+// IEEEF64-PSZF32-SAME: "target-cpu"="gfx803"
+// IEEEF64-PSZF32: #[[$FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign"
+// IEEEF64-PSZF32-SAME: "target-cpu"="gfx803"
+// IEEEF

[PATCH] D152251: [clang][CodeGen] Fix GPU-specific attributes being dropped by bitcode linking

2023-06-06 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh updated this revision to Diff 528793.
Pierre-vh marked 3 inline comments as done.
Pierre-vh added a comment.

target-cpu wasn't set so I tried something a bit different, but now I still 
need to remove the target-cpu check in the old test.
For some reason filecheck doesn't match it but it's there?

  
/home/pierre/work/trunk/llvm-project/clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu:150:28:
 error: IEEEF32-PSZF64-DYNFULL: expected string not found in input
  // IEEEF32-PSZF64-DYNFULL: #[[$FUNCATTR]] = { {{.*}} 
"denormal-fp-math"="preserve-sign,preserve-sign" 
"denormal-fp-math-f32"="ieee,ieee" {{.*}} "target-cpu"="gfx803" {{.*}} }
 ^
  :260:427: note: scanning from here
  attributes #0 = { convergent mustprogress noinline norecurse nounwind optnone 
"amdgpu-flat-work-group-size"="1,1024" 
"denormal-fp-math"="preserve-sign,preserve-sign" 
"denormal-fp-math-f32"="ieee,ieee" "no-trapping-math"="true" 
"stack-protector-buffer-size"="8" "target-cpu"="gfx803" 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
 "uniform-work-group-size"="true" }





^
  :260:427: note: with "$FUNCATTR" equal to "1"
  attributes #0 = { convergent mustprogress noinline norecurse nounwind optnone 
"amdgpu-flat-work-group-size"="1,1024" 
"denormal-fp-math"="preserve-sign,preserve-sign" 
"denormal-fp-math-f32"="ieee,ieee" "no-trapping-math"="true" 
"stack-protector-buffer-size"="8" "target-cpu"="gfx803" 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
 "uniform-work-group-size"="true" }





^
  :261:85: note: possible intended match here
  attributes #1 = { convergent mustprogress nofree norecurse nosync nounwind 
willreturn memory(none) "denormal-fp-math"="preserve-sign,preserve-sign" 
"denormal-fp-math-f32"="ieee,ieee" "no-trapping-math"="true" 
"stack-protector-buffer-size"="8" "target-cpu"="gfx803" }
 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152251/new/

https://reviews.llvm.org/D152251

Files:
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/CodeGen/CodeGenModule.h
  clang/test/CodeGenCUDA/Inputs/ocml-sample-target-attrs.cl
  clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu
  clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu

Index: clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu
@@ -0,0 +1,48 @@
+// Verify the behavior of the +gfxN-insts in the way that
+// rocm-device-libs should be built with. e.g. If the device libraries has a function
+// with "+gfx11-insts", that attribute should still be present after linking and not
+// overwritten with the current target's settings.
+
+// This is important because at this time, many device-libs functions that are only
+// available on some GPUs put an attribute such as "+gfx11-insts" so that
+// AMDGPURemoveIncompatibleFunctions can detect & remove them if needed.
+
+// Build the fake device library in the way rocm-device-libs should be built.
+//
+// RUN: %clang_cc1 -x cl -triple amdgcn-amd-amdhsa\
+// RUN:   -mcode-object-version=none -emit-llvm-bc \
+// RUN:   %S/Inputs/ocml-sample-target-attrs.cl -o %t.bc
+
+// Check the default behavior
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx803 -fcuda-is-device \
+// RUN:   -mlink-builtin-bitcode %t.bc \
+// RUN:   -emit-llvm %s -o - | FileCheck %s --check-prefixes=CHECK,INTERNALIZE
+
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1101 -fcuda-is-device \
+// RUN:   -mlink-builtin-bitcode %t.bc -emit-llvm %s -o - | FileCheck %s --check-prefixes=CHECK,INTERNALIZE
+
+// Check the case where no internalization is performed
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx803 \
+// RUN:   -fcuda-is-device -mlink-bitcode-file %t.bc -emit-llvm %s -o -  | FileCheck

[PATCH] D152251: [clang][CodeGen] Fix GPU-specific attributes being dropped by bitcode linking

2023-06-06 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh created this revision.
Pierre-vh added reviewers: arsenm, tra, foad.
Herald added subscribers: StephenFan, wenlei, tpr.
Herald added a project: All.
Pierre-vh requested review of this revision.
Herald added subscribers: cfe-commits, wdng.
Herald added a project: clang.

Device libs make use of patterns like this:

  __attribute__((target("gfx11-insts")))
  static unsigned do_intrin_stuff(void)
  {
return __builtin_amdgcn_s_sendmsg_rtnl(0x0);
  }

For functions that are assumed to be eliminated if the currennt GPU target 
doesn't support them.
At O0 such functions aren't eliminated by common optimizations but often by 
AMDGPURemoveIncompatibleFunctions instead, which sees the "+gfx11-insts" 
attribute on, say, GFX9 and knows it's not valid, so it removes the function.

D142907  accidentally made it so such 
attributes were dropped during bitcode linking, making it impossible for 
RemoveIncompatibleFunctions to catch the functions and causing ISel to catch 
fire eventually.

This fixes the issue and adds a new test to ensure we don't accidentally fall 
into this trap again.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D152251

Files:
  clang/lib/CodeGen/CGCall.cpp
  clang/test/CodeGenCUDA/Inputs/ocml-sample-target-attrs.cl
  clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu
  clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu

Index: clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu
@@ -0,0 +1,47 @@
+// Verify the behavior of the +gfxN-insts in the way that
+// rocm-device-libs should be built with. e.g. If the device libraries has a function
+// with "+gfx11-insts", that attribute should still be present after linking and not
+// overwritten with the current target's settings.
+
+// This is important because at this time, many device-libs functions that are only
+// available on some GPUs put an attribute such as "+gfx11-insts" so that
+// AMDGPURemoveIncompatibleFunctions can detect & remove them if needed.
+
+// Build the fake device library in the way rocm-device-libs should be built.
+//
+// RUN: %clang_cc1 -x cl -triple amdgcn-amd-amdhsa\
+// RUN:   -mcode-object-version=none -emit-llvm-bc \
+// RUN:   %S/Inputs/ocml-sample-target-attrs.cl -o %t.bc
+
+// Check the default behavior
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx803 -fcuda-is-device \
+// RUN:   -mlink-builtin-bitcode %t.bc \
+// RUN:   -emit-llvm %s -o - | FileCheck %s
+
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1101 -fcuda-is-device \
+// RUN:   -mlink-builtin-bitcode %t.bc -emit-llvm %s -o - | FileCheck %s
+
+// Check the case where no internalization is performed
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx803 \
+// RUN:   -fcuda-is-device -mlink-bitcode-file %t.bc -emit-llvm %s -o -  | FileCheck %s
+
+// Check the case where no internalization is performed
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1101 \
+// RUN:   -fcuda-is-device -mlink-bitcode-file %t.bc -emit-llvm %s -o - | FileCheck %s
+
+
+// CHECK: define {{.*}} i32 @do_intrin_stuff() #[[ATTR:[0-9]+]]
+// CHECK: attributes #[[ATTR]] = {{.*}} "target-features"="+gfx11-insts"
+
+#define __device__ __attribute__((device))
+#define __global__ __attribute__((global))
+
+typedef _Float16 half;
+
+extern "C" {
+__device__ unsigned do_intrin_stuff(void);
+
+__global__ void kernel_f16(unsigned* out) {
+*out = do_intrin_stuff();
+  }
+}
Index: clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu
===
--- clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu
+++ clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu
@@ -132,26 +132,26 @@
 
 // Default mode relies on the implicit check-not for the denormal-fp-math.
 
-// PSZ: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
-// PSZ: #[[$FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
-// PSZ: #[[$WEAK_FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
+// PSZ: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign" {{.*}} }
+// PSZ: #[[$FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} }
+// PSZ: #[[$WEAK_FUNCATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} }
 
 // FIXME: Should check-not "denormal-fp-math" within the line
-// IEEEF64-PSZF32: #[[$KERNELATTR]] = { {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign" {{.*}} "target-cpu"="gfx803" {{.*}} }
-// IEEEF64-PSZF32: #[[$FUNCATTR]] = { {{.*}} "denormal

[PATCH] D146023: [AMDGPU] Remove Code Object V2

2023-06-01 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added a comment.

ping


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146023/new/

https://reviews.llvm.org/D146023

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D146023: [AMDGPU] Remove Code Object V2

2023-05-11 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh removed a reviewer: jdoerfert.
Pierre-vh added a comment.
Herald added a reviewer: jdoerfert.

ping


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146023/new/

https://reviews.llvm.org/D146023

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D149986: AMDGPU: Force sc0 and sc1 on stores for gfx940 and gfx941

2023-05-11 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added a comment.

I think that if this is a new property of the GFX940/941 targets, and turning 
it off shouldn't be possible, we shouldn't even bother with a feature and just 
set a bool in the ST for those targets




Comment at: llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp:524
+ SIAtomicAddrSpace::NONE)
+  return enableSC0Bit(MI) | enableSC1Bit(MI);
+return false;

kzhuravl wrote:
> jmmartinez wrote:
> > NIT: Is the use of the bitwise or " | " intended? I'd use the logical or " 
> > || " instead.
> It is intentional, we need both SC0 and SC1 bits set. If I switch this to || 
> it will short circuit and not invoke enableSC1Bit.
IMHO then it needs a comment to explain that it's intentional, otherwise some 
innocent maintainer in the future could think it's a typo and change it


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149986/new/

https://reviews.llvm.org/D149986

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D146945: Add Release Note for -mcode-object-v3 removal

2023-03-27 Thread Pierre van Houtryve via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG4559e8e8cd99: Add Release Note for -mcode-object-v3 removal 
(authored by Pierre-vh).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146945/new/

https://reviews.llvm.org/D146945

Files:
  clang/docs/ReleaseNotes.rst


Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -279,6 +279,8 @@
   undefined symbols in the created module to be a linker error. To prevent 
this,
   pass ``-Wl,--undefined`` if compiling directly, or ``-Xoffload-linker
   --undefined`` if using an offloading language.
+- The deprecated ``-mcode-object-v3`` and ``-mno-code-object-v3`` command-line 
+  options have been removed.
 
 X86 Support
 ^^^


Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -279,6 +279,8 @@
   undefined symbols in the created module to be a linker error. To prevent this,
   pass ``-Wl,--undefined`` if compiling directly, or ``-Xoffload-linker
   --undefined`` if using an offloading language.
+- The deprecated ``-mcode-object-v3`` and ``-mno-code-object-v3`` command-line 
+  options have been removed.
 
 X86 Support
 ^^^
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D146945: Add Release Note for -mcode-object-v3 removal

2023-03-27 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added a comment.

In D146945#4223677 , @aaron.ballman 
wrote:

> LGTM! Adding the clang-vendors group for awareness since this technically 
> could break some downstream (I don't expect it to given that this was 
> deprecated, but you never know).

Can it land or should I wait for someone from that group to comment first?
Thanks


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146945/new/

https://reviews.llvm.org/D146945

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D145671: [clang] Remove legacy -m(no)-code-object-v3 options

2023-03-27 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added a comment.

In D145671#4223655 , @c wrote:

> These changes need a release note. Given that this has been deprecated, is 
> there documentation that should have been removed as well?

I can't find anything in the docs about `code-object-v3` so I don't think 
there's any documentation left. For the release note, I opened a diff: D146945 



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145671/new/

https://reviews.llvm.org/D145671

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D146945: Add Release Note for -mcode-object-v3 removal

2023-03-27 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh created this revision.
Pierre-vh added a reviewer: aaron.ballman.
Herald added a project: All.
Pierre-vh requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D146945

Files:
  clang/docs/ReleaseNotes.rst


Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -279,6 +279,8 @@
   undefined symbols in the created module to be a linker error. To prevent 
this,
   pass ``-Wl,--undefined`` if compiling directly, or ``-Xoffload-linker
   --undefined`` if using an offloading language.
+- The deprecated ``-mcode-object-v3`` and ``-mno-code-object-v3`` command-line 
+  options have been removed.
 
 X86 Support
 ^^^


Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -279,6 +279,8 @@
   undefined symbols in the created module to be a linker error. To prevent this,
   pass ``-Wl,--undefined`` if compiling directly, or ``-Xoffload-linker
   --undefined`` if using an offloading language.
+- The deprecated ``-mcode-object-v3`` and ``-mno-code-object-v3`` command-line 
+  options have been removed.
 
 X86 Support
 ^^^
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D145671: [clang] Remove legacy -m(no)-code-object-v3 options

2023-03-27 Thread Pierre van Houtryve via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG30e7cd48778b: [clang] Remove legacy -m(no)-code-object-v3 
options (authored by Pierre-vh).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145671/new/

https://reviews.llvm.org/D145671

Files:
  clang/include/clang/Driver/Options.td
  clang/lib/Driver/ToolChains/CommonArgs.cpp
  clang/test/Driver/amdgpu-features-as.s
  clang/test/Driver/amdgpu-features.c
  clang/test/Driver/hip-code-object-version.hip

Index: clang/test/Driver/hip-code-object-version.hip
===
--- clang/test/Driver/hip-code-object-version.hip
+++ clang/test/Driver/hip-code-object-version.hip
@@ -2,27 +2,16 @@
 
 // Check bundle ID for code object v2.
 
-// RUN: %clang -### --target=x86_64-linux-gnu \
-// RUN:   -mno-code-object-v3 \
-// RUN:   --offload-arch=gfx906 -nogpulib \
-// RUN:   %s 2>&1 | FileCheck -check-prefixes=V2,V2-WARN %s
-
 // RUN: %clang -### --target=x86_64-linux-gnu \
 // RUN:   -mcode-object-version=2 \
 // RUN:   --offload-arch=gfx906 -nogpulib \
 // RUN:   %s 2>&1 | FileCheck -check-prefix=V2 %s
 
-// V2-WARN: warning: argument '-mno-code-object-v3' is deprecated, use '-mcode-object-version=2' instead [-Wdeprecated]
 // V2: "-mllvm" "--amdhsa-code-object-version=2"
 // V2: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx906"
 
 // Check bundle ID for code object v3.
 
-// RUN: %clang -### --target=x86_64-linux-gnu \
-// RUN:   -mcode-object-v3 \
-// RUN:   --offload-arch=gfx906 -nogpulib \
-// RUN:   %s 2>&1 | FileCheck -check-prefixes=V3,V3-WARN %s
-
 // RUN: %clang -### --target=x86_64-linux-gnu \
 // RUN:   -mcode-object-version=3 \
 // RUN:   --offload-arch=gfx906 -nogpulib \
@@ -33,7 +22,6 @@
 // RUN:   --offload-arch=gfx906 -nogpulib \
 // RUN:   %s 2>&1 | FileCheck -check-prefix=V3 %s
 
-// V3-WARN: warning: argument '-mcode-object-v3' is deprecated, use '-mcode-object-version=3' instead [-Wdeprecated]
 // V3: "-mcode-object-version=3"
 // V3: "-mllvm" "--amdhsa-code-object-version=3"
 // V3: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx906"
@@ -95,14 +83,3 @@
 // RUN:   %s 2>&1 | FileCheck -check-prefix=CC1NEG %s
 
 // CC1NEG-NOT: "-cc1as" {{.*}}"-mcode-object-version=5"
-
-// Check warnings are emitted for legacy options before -mcode-object-version options.
-// Check warnings are emitted only once.
-
-// RUN: %clang -### --target=x86_64-linux-gnu \
-// RUN:   -mno-code-object-v3 -mcode-object-v3 -mcode-object-version=4 \
-// RUN:   --offload-arch=gfx906 -nogpulib \
-// RUN:   %s 2>&1 | FileCheck -check-prefixes=WARN %s
-// WARN: warning: argument '-mno-code-object-v3' is deprecated, use '-mcode-object-version=2' instead [-Wdeprecated]
-// WARN: warning: argument '-mcode-object-v3' is deprecated, use '-mcode-object-version=3' instead [-Wdeprecated]
-// WARN-NOT: warning: argument {{.*}} is deprecated
Index: clang/test/Driver/amdgpu-features.c
===
--- clang/test/Driver/amdgpu-features.c
+++ clang/test/Driver/amdgpu-features.c
@@ -1,15 +1,3 @@
-// RUN: %clang -### -target amdgcn-amd-amdhsa -mcpu=gfx700 -mcode-object-v3 %s 2>&1 | FileCheck --check-prefix=CODE-OBJECT-V3 %s
-// CODE-OBJECT-V3: warning: argument '-mcode-object-v3' is deprecated, use '-mcode-object-version=3' instead [-Wdeprecated]
-// CODE-OBJECT-V3: "-mllvm" "--amdhsa-code-object-version=3"
-
-// RUN: %clang -### -target amdgcn-amd-amdhsa amdgcn -mcpu=gfx700 -mno-code-object-v3 %s 2>&1 | FileCheck --check-prefix=NO-CODE-OBJECT-V3 %s
-// NO-CODE-OBJECT-V3: warning: argument '-mno-code-object-v3' is deprecated, use '-mcode-object-version=2' instead [-Wdeprecated]
-// NO-CODE-OBJECT-V3: "-mllvm" "--amdhsa-code-object-version=2"
-
-// RUN: %clang -### -target amdgcn-amd-amdhsa -mcpu=gfx700 -mcode-object-v3 -mno-code-object-v3 -mcode-object-v3 %s 2>&1 | FileCheck --check-prefix=MUL-CODE-OBJECT-V3 %s
-// MUL-CODE-OBJECT-V3: warning: argument '-mcode-object-v3' is deprecated, use '-mcode-object-version=3' instead [-Wdeprecated]
-// MUL-CODE-OBJECT-V3: "-mllvm" "--amdhsa-code-object-version=3"
-
 // RUN: %clang -### -target amdgcn-amdhsa -mcpu=gfx900:xnack+ %s 2>&1 | FileCheck --check-prefix=XNACK %s
 // XNACK: "-target-feature" "+xnack"
 
Index: clang/test/Driver/amdgpu-features-as.s
===
--- clang/test/Driver/amdgpu-features-as.s
+++ /dev/null
@@ -1,11 +0,0 @@
-// RUN: %clang -### -target amdgcn-amd-amdhsa -mcpu=gfx900 -mcode-object-v3 %s 2>&1 | FileCheck --check-prefix=CODE-OBJECT-V3 %s
-// CODE-OBJECT-V3: warning: argument '-mcode-object-v3' is deprecated, use '-mcode-object-version=3' instead [-Wdeprecated]
-// CODE-OBJECT-V3: "-mllvm" "--amdhsa-code-object-version=3"
-
-// RUN: %clang -### -target amdgcn-amd-amdhsa amdgcn -mcpu=gfx900 -mno-code-object-v3 %s 2>&1 | FileCheck --check-prefix=NO-CODE

[PATCH] D146023: [AMDGPU] Remove Code Object V2

2023-03-15 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added inline comments.



Comment at: llvm/test/MC/AMDGPU/hsa-gfx10.s:3
-// RUN: llvm-mc -filetype=obj -triple amdgcn--amdhsa -mcpu=gfx1010 
--amdhsa-code-object-version=2 -mattr=-wavefrontsize32,+wavefrontsize64 
-show-encoding %s | llvm-readobj -S --sd --syms - | FileCheck %s 
--check-prefix=ELF
-
-// ELF: Section {

arsenm wrote:
> I thought we were still going to be able to read old objects 
I think llvm-readobj uses all of the MC/Target infrastructure so if we remove 
emission, we also remove reading, no?

I'm actually not sure if we plan to let readobj/readelf read COV2 object files, 
it's an interesting question



Comment at: llvm/tools/llvm-readobj/ELFDumper.cpp:5437-5440
+// FIXME: Metadata Verifier doesn't work with AMDPAL MD.
+//  This is a ugly workaround to avoid the verifier.
+if (MsgPackString.find("amdpal.") == StringRef::npos) {
+  AMDGPU::HSAMD::V3::MetadataVerifier Verifier(true);

arsenm wrote:
> This looks like a separate change?
Moved to D146119


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146023/new/

https://reviews.llvm.org/D146023

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D145671: [clang] Remove legacy -m(no)-code-object-v3 options

2023-03-13 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added a comment.

I'll wait a couple of more days and then land, if no one has any objections.
As said before this has been deprecated for a long time, and clang was already 
warning on use of those options. Removal has been pending for a while :)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145671/new/

https://reviews.llvm.org/D145671

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D145671: [clang] Remove legacy -m(no)-code-object-v3 options

2023-03-09 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh created this revision.
Pierre-vh added reviewers: arsenm, foad, kzhuravl, rampitec.
Herald added subscribers: kosarev, StephenFan, kerbowa, jvesely.
Herald added a project: All.
Pierre-vh requested review of this revision.
Herald added subscribers: cfe-commits, MaskRay, wdng.
Herald added a project: clang.

Code object V2 and V3 have been deprecated for a long time.
They're now scheduled to be removed completely from LLVM in the coming 
weeks/months.

There is no reason to support those legacy options anymore as they've
also been deprecated for a long time.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D145671

Files:
  clang/include/clang/Driver/Options.td
  clang/lib/Driver/ToolChains/CommonArgs.cpp
  clang/test/Driver/amdgpu-features-as.s
  clang/test/Driver/amdgpu-features.c
  clang/test/Driver/hip-code-object-version.hip

Index: clang/test/Driver/hip-code-object-version.hip
===
--- clang/test/Driver/hip-code-object-version.hip
+++ clang/test/Driver/hip-code-object-version.hip
@@ -2,27 +2,16 @@
 
 // Check bundle ID for code object v2.
 
-// RUN: %clang -### --target=x86_64-linux-gnu \
-// RUN:   -mno-code-object-v3 \
-// RUN:   --offload-arch=gfx906 -nogpulib \
-// RUN:   %s 2>&1 | FileCheck -check-prefixes=V2,V2-WARN %s
-
 // RUN: %clang -### --target=x86_64-linux-gnu \
 // RUN:   -mcode-object-version=2 \
 // RUN:   --offload-arch=gfx906 -nogpulib \
 // RUN:   %s 2>&1 | FileCheck -check-prefix=V2 %s
 
-// V2-WARN: warning: argument '-mno-code-object-v3' is deprecated, use '-mcode-object-version=2' instead [-Wdeprecated]
 // V2: "-mllvm" "--amdhsa-code-object-version=2"
 // V2: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx906"
 
 // Check bundle ID for code object v3.
 
-// RUN: %clang -### --target=x86_64-linux-gnu \
-// RUN:   -mcode-object-v3 \
-// RUN:   --offload-arch=gfx906 -nogpulib \
-// RUN:   %s 2>&1 | FileCheck -check-prefixes=V3,V3-WARN %s
-
 // RUN: %clang -### --target=x86_64-linux-gnu \
 // RUN:   -mcode-object-version=3 \
 // RUN:   --offload-arch=gfx906 -nogpulib \
@@ -33,7 +22,6 @@
 // RUN:   --offload-arch=gfx906 -nogpulib \
 // RUN:   %s 2>&1 | FileCheck -check-prefix=V3 %s
 
-// V3-WARN: warning: argument '-mcode-object-v3' is deprecated, use '-mcode-object-version=3' instead [-Wdeprecated]
 // V3: "-mcode-object-version=3"
 // V3: "-mllvm" "--amdhsa-code-object-version=3"
 // V3: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx906"
@@ -95,14 +83,3 @@
 // RUN:   %s 2>&1 | FileCheck -check-prefix=CC1NEG %s
 
 // CC1NEG-NOT: "-cc1as" {{.*}}"-mcode-object-version=5"
-
-// Check warnings are emitted for legacy options before -mcode-object-version options.
-// Check warnings are emitted only once.
-
-// RUN: %clang -### --target=x86_64-linux-gnu \
-// RUN:   -mno-code-object-v3 -mcode-object-v3 -mcode-object-version=4 \
-// RUN:   --offload-arch=gfx906 -nogpulib \
-// RUN:   %s 2>&1 | FileCheck -check-prefixes=WARN %s
-// WARN: warning: argument '-mno-code-object-v3' is deprecated, use '-mcode-object-version=2' instead [-Wdeprecated]
-// WARN: warning: argument '-mcode-object-v3' is deprecated, use '-mcode-object-version=3' instead [-Wdeprecated]
-// WARN-NOT: warning: argument {{.*}} is deprecated
Index: clang/test/Driver/amdgpu-features.c
===
--- clang/test/Driver/amdgpu-features.c
+++ clang/test/Driver/amdgpu-features.c
@@ -1,15 +1,3 @@
-// RUN: %clang -### -target amdgcn-amd-amdhsa -mcpu=gfx700 -mcode-object-v3 %s 2>&1 | FileCheck --check-prefix=CODE-OBJECT-V3 %s
-// CODE-OBJECT-V3: warning: argument '-mcode-object-v3' is deprecated, use '-mcode-object-version=3' instead [-Wdeprecated]
-// CODE-OBJECT-V3: "-mllvm" "--amdhsa-code-object-version=3"
-
-// RUN: %clang -### -target amdgcn-amd-amdhsa amdgcn -mcpu=gfx700 -mno-code-object-v3 %s 2>&1 | FileCheck --check-prefix=NO-CODE-OBJECT-V3 %s
-// NO-CODE-OBJECT-V3: warning: argument '-mno-code-object-v3' is deprecated, use '-mcode-object-version=2' instead [-Wdeprecated]
-// NO-CODE-OBJECT-V3: "-mllvm" "--amdhsa-code-object-version=2"
-
-// RUN: %clang -### -target amdgcn-amd-amdhsa -mcpu=gfx700 -mcode-object-v3 -mno-code-object-v3 -mcode-object-v3 %s 2>&1 | FileCheck --check-prefix=MUL-CODE-OBJECT-V3 %s
-// MUL-CODE-OBJECT-V3: warning: argument '-mcode-object-v3' is deprecated, use '-mcode-object-version=3' instead [-Wdeprecated]
-// MUL-CODE-OBJECT-V3: "-mllvm" "--amdhsa-code-object-version=3"
-
 // RUN: %clang -### -target amdgcn-amdhsa -mcpu=gfx900:xnack+ %s 2>&1 | FileCheck --check-prefix=XNACK %s
 // XNACK: "-target-feature" "+xnack"
 
Index: clang/test/Driver/amdgpu-features-as.s
===
--- clang/test/Driver/amdgpu-features-as.s
+++ /dev/null
@@ -1,11 +0,0 @@
-// RUN: %clang -### -target amdgcn-amd-amdhsa -mcpu=gfx900 -mcode-object-v3 %s 2>&1 | FileCheck --check-prefix=CODE-OBJECT-V

[PATCH] D139608: [Clang][NFC] Add default `getBFloat16Mangling` impl

2023-01-10 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh abandoned this revision.
Pierre-vh added a comment.

Makes sense to abandon this given that D136919 
 exists


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139608/new/

https://reviews.llvm.org/D139608

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139608: [Clang][NFC] Add default `getBFloat16Mangling` impl

2023-01-09 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added a comment.

Ping? it's a small NFC and if it's not desired I don't mind abandoning it; I'd 
just like to remove this diff from the queue.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139608/new/

https://reviews.llvm.org/D139608

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139713: [Sema] Fix crash when evaluating nested call with value-dependent arg

2023-01-05 Thread Pierre van Houtryve via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGd6acd0196b33: [Sema] Fix crash when evaluating nested call 
with value-dependent arg (authored by Pierre-vh).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139713/new/

https://reviews.llvm.org/D139713

Files:
  clang/lib/AST/ExprConstant.cpp
  clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp


Index: clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
===
--- /dev/null
+++ clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
@@ -0,0 +1,44 @@
+// RUN: %clang_cc1 -fsyntax-only %s -std=c++14
+
+// Checks that Clang doesn't crash/assert on the nested call to "kaboom"
+// in "bar()".
+//
+// This is an interesting test case for `ExprConstant.cpp`'s `CallStackFrame`
+// because it triggers the following chain of events:
+// 0. `CheckEnableIf` calls `EvaluateWithSubstitution`.
+//  1. The outer call to "kaboom" gets evaluated.
+//   2. The expr for "a" gets evaluated, it has a version X;
+//  a temporary with the key (a, X) is created.
+// 3. The inner call to "kaboom" gets evaluated.
+//   4. The expr for "a" gets evaluated, it has a version Y;
+//  a temporary with the key (a, Y) is created.
+//   5. The expr for "b" gets evaluated, it has a version Y;
+//  a temporary with the key (b, Y) is created.
+//   6. `EvaluateWithSubstitution` looks at "b" but cannot evaluate it
+//  because it's value-dependent (due to the call to "f.foo()").
+//
+// When `EvaluateWithSubstitution` bails out while evaluating the outer
+// call, it attempts to fetch "b"'s param slot to clean it up.
+//
+// This used to cause an assertion failure in `getTemporary` because
+// a temporary with the key "(b, Y)" (created at step 4) existed but
+// not one for "(b, X)", which is what it was trying to fetch.
+
+template
+__attribute__((enable_if(true, "")))
+T kaboom(T a, T b) {
+  return b;
+}
+
+struct A {
+  double foo();
+};
+
+template 
+struct B {
+  A &f;
+
+  void bar() {
+kaboom(kaboom(0.0, 1.0), f.foo());
+  }
+};
Index: clang/lib/AST/ExprConstant.cpp
===
--- clang/lib/AST/ExprConstant.cpp
+++ clang/lib/AST/ExprConstant.cpp
@@ -594,11 +594,6 @@
   auto LB = Temporaries.lower_bound(KV);
   if (LB != Temporaries.end() && LB->first == KV)
 return &LB->second;
-  // Pair (Key,Version) wasn't found in the map. Check that no elements
-  // in the map have 'Key' as their key.
-  assert((LB == Temporaries.end() || LB->first.first != Key) &&
- (LB == Temporaries.begin() || std::prev(LB)->first.first != Key) 
&&
- "Element with key 'Key' found in map");
   return nullptr;
 }
 


Index: clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
===
--- /dev/null
+++ clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
@@ -0,0 +1,44 @@
+// RUN: %clang_cc1 -fsyntax-only %s -std=c++14
+
+// Checks that Clang doesn't crash/assert on the nested call to "kaboom"
+// in "bar()".
+//
+// This is an interesting test case for `ExprConstant.cpp`'s `CallStackFrame`
+// because it triggers the following chain of events:
+// 0. `CheckEnableIf` calls `EvaluateWithSubstitution`.
+//  1. The outer call to "kaboom" gets evaluated.
+//   2. The expr for "a" gets evaluated, it has a version X;
+//  a temporary with the key (a, X) is created.
+// 3. The inner call to "kaboom" gets evaluated.
+//   4. The expr for "a" gets evaluated, it has a version Y;
+//  a temporary with the key (a, Y) is created.
+//   5. The expr for "b" gets evaluated, it has a version Y;
+//  a temporary with the key (b, Y) is created.
+//   6. `EvaluateWithSubstitution` looks at "b" but cannot evaluate it
+//  because it's value-dependent (due to the call to "f.foo()").
+//
+// When `EvaluateWithSubstitution` bails out while evaluating the outer
+// call, it attempts to fetch "b"'s param slot to clean it up.
+//
+// This used to cause an assertion failure in `getTemporary` because
+// a temporary with the key "(b, Y)" (created at step 4) existed but
+// not one for "(b, X)", which is what it was trying to fetch.
+
+template
+__attribute__((enable_if(true, "")))
+T kaboom(T a, T b) {
+  return b;
+}
+
+struct A {
+  double foo();
+};
+
+template 
+struct B {
+  A &f;
+
+  void bar() {
+kaboom(kaboom(0.0, 1.0), f.foo());
+  }
+};
Index: clang/lib/AST/ExprConstant.cpp
===
--- clang/lib/AST/ExprConstant.cpp
+++ clang/lib/AST/ExprConstant.cpp
@@ -594,11 +594,6 @@
   auto LB = Temporaries.lower_bound(KV);
   if (LB !

[PATCH] D139713: [Sema] Fix crash when evaluating nested call with value-dependent arg

2023-01-04 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh updated this revision to Diff 486183.
Pierre-vh added a comment.

Remove assert
Please review @ahatanak


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139713/new/

https://reviews.llvm.org/D139713

Files:
  clang/lib/AST/ExprConstant.cpp
  clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp


Index: clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
===
--- /dev/null
+++ clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
@@ -0,0 +1,44 @@
+// RUN: %clang_cc1 -fsyntax-only %s -std=c++14
+
+// Checks that Clang doesn't crash/assert on the nested call to "kaboom"
+// in "bar()".
+//
+// This is an interesting test case for `ExprConstant.cpp`'s `CallStackFrame`
+// because it triggers the following chain of events:
+// 0. `CheckEnableIf` calls `EvaluateWithSubstitution`.
+//  1. The outer call to "kaboom" gets evaluated.
+//   2. The expr for "a" gets evaluated, it has a version X;
+//  a temporary with the key (a, X) is created.
+// 3. The inner call to "kaboom" gets evaluated.
+//   4. The expr for "a" gets evaluated, it has a version Y;
+//  a temporary with the key (a, Y) is created.
+//   5. The expr for "b" gets evaluated, it has a version Y;
+//  a temporary with the key (b, Y) is created.
+//   6. `EvaluateWithSubstitution` looks at "b" but cannot evaluate it
+//  because it's value-dependent (due to the call to "f.foo()").
+//
+// When `EvaluateWithSubstitution` bails out while evaluating the outer
+// call, it attempts to fetch "b"'s param slot to clean it up.
+//
+// This used to cause an assertion failure in `getTemporary` because
+// a temporary with the key "(b, Y)" (created at step 4) existed but
+// not one for "(b, X)", which is what it was trying to fetch.
+
+template
+__attribute__((enable_if(true, "")))
+T kaboom(T a, T b) {
+  return b;
+}
+
+struct A {
+  double foo();
+};
+
+template 
+struct B {
+  A &f;
+
+  void bar() {
+kaboom(kaboom(0.0, 1.0), f.foo());
+  }
+};
Index: clang/lib/AST/ExprConstant.cpp
===
--- clang/lib/AST/ExprConstant.cpp
+++ clang/lib/AST/ExprConstant.cpp
@@ -594,11 +594,6 @@
   auto LB = Temporaries.lower_bound(KV);
   if (LB != Temporaries.end() && LB->first == KV)
 return &LB->second;
-  // Pair (Key,Version) wasn't found in the map. Check that no elements
-  // in the map have 'Key' as their key.
-  assert((LB == Temporaries.end() || LB->first.first != Key) &&
- (LB == Temporaries.begin() || std::prev(LB)->first.first != Key) 
&&
- "Element with key 'Key' found in map");
   return nullptr;
 }
 


Index: clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
===
--- /dev/null
+++ clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
@@ -0,0 +1,44 @@
+// RUN: %clang_cc1 -fsyntax-only %s -std=c++14
+
+// Checks that Clang doesn't crash/assert on the nested call to "kaboom"
+// in "bar()".
+//
+// This is an interesting test case for `ExprConstant.cpp`'s `CallStackFrame`
+// because it triggers the following chain of events:
+// 0. `CheckEnableIf` calls `EvaluateWithSubstitution`.
+//  1. The outer call to "kaboom" gets evaluated.
+//   2. The expr for "a" gets evaluated, it has a version X;
+//  a temporary with the key (a, X) is created.
+// 3. The inner call to "kaboom" gets evaluated.
+//   4. The expr for "a" gets evaluated, it has a version Y;
+//  a temporary with the key (a, Y) is created.
+//   5. The expr for "b" gets evaluated, it has a version Y;
+//  a temporary with the key (b, Y) is created.
+//   6. `EvaluateWithSubstitution` looks at "b" but cannot evaluate it
+//  because it's value-dependent (due to the call to "f.foo()").
+//
+// When `EvaluateWithSubstitution` bails out while evaluating the outer
+// call, it attempts to fetch "b"'s param slot to clean it up.
+//
+// This used to cause an assertion failure in `getTemporary` because
+// a temporary with the key "(b, Y)" (created at step 4) existed but
+// not one for "(b, X)", which is what it was trying to fetch.
+
+template
+__attribute__((enable_if(true, "")))
+T kaboom(T a, T b) {
+  return b;
+}
+
+struct A {
+  double foo();
+};
+
+template 
+struct B {
+  A &f;
+
+  void bar() {
+kaboom(kaboom(0.0, 1.0), f.foo());
+  }
+};
Index: clang/lib/AST/ExprConstant.cpp
===
--- clang/lib/AST/ExprConstant.cpp
+++ clang/lib/AST/ExprConstant.cpp
@@ -594,11 +594,6 @@
   auto LB = Temporaries.lower_bound(KV);
   if (LB != Temporaries.end() && LB->first == KV)
 return &LB->second;
-  // Pair (Key,Version) wasn't found in the map. Check that no element

[PATCH] D139713: [Sema] Fix crash when evaluating nested call with value-dependent arg

2022-12-13 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh updated this revision to Diff 482384.
Pierre-vh added a comment.

Put the assert back in, use alternative fix


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139713/new/

https://reviews.llvm.org/D139713

Files:
  clang/lib/AST/ExprConstant.cpp
  clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp


Index: clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
===
--- /dev/null
+++ clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
@@ -0,0 +1,44 @@
+// RUN: %clang_cc1 -fsyntax-only %s -std=c++14
+
+// Checks that Clang doesn't crash/assert on the nested call to "kaboom"
+// in "bar()".
+//
+// This is an interesting test case for `ExprConstant.cpp`'s `CallStackFrame`
+// because it triggers the following chain of events:
+// 0. `CheckEnableIf` calls `EvaluateWithSubstitution`.
+//  1. The outer call to "kaboom" gets evaluated.
+//   2. The expr for "a" gets evaluated, it has a version X;
+//  a temporary with the key (a, X) is created.
+// 3. The inner call to "kaboom" gets evaluated.
+//   4. The expr for "a" gets evaluated, it has a version Y;
+//  a temporary with the key (a, Y) is created.
+//   5. The expr for "b" gets evaluated, it has a version Y;
+//  a temporary with the key (b, Y) is created.
+//   6. `EvaluateWithSubstitution` looks at "b" but cannot evaluate it
+//  because it's value-dependent (due to the call to "f.foo()").
+//
+// When `EvaluateWithSubstitution` bails out while evaluating the outer
+// call, it attempts to fetch "b"'s param slot to clean it up.
+//
+// This used to cause an assertion failure in `getTemporary` because
+// a temporary with the key "(b, Y)" (created at step 4) existed but
+// not one for "(b, X)", which is what it was trying to fetch.
+
+template
+__attribute__((enable_if(true, "")))
+T kaboom(T a, T b) {
+  return b;
+}
+
+struct A {
+  double foo();
+};
+
+template 
+struct B {
+  A &f;
+
+  void bar() {
+kaboom(kaboom(0.0, 1.0), f.foo());
+  }
+};
Index: clang/lib/AST/ExprConstant.cpp
===
--- clang/lib/AST/ExprConstant.cpp
+++ clang/lib/AST/ExprConstant.cpp
@@ -16059,9 +16059,13 @@
 if ((*I)->isValueDependent() ||
 !EvaluateCallArg(PVD, *I, Call, Info) ||
 Info.EvalStatus.HasSideEffects) {
-  // If evaluation fails, throw away the argument entirely.
-  if (APValue *Slot = Info.getParamSlot(Call, PVD))
-*Slot = APValue();
+  // If evaluation fails, throw away the argument entirely unless I is
+  // value-dependent. In those cases, the condition above will 
short-circuit
+  // before calling `EvaluateCallArg` and no param slot is created.
+  if (!(*I)->isValueDependent()) {
+if (APValue *Slot = Info.getParamSlot(Call, PVD))
+  *Slot = APValue();
+  }
 }
 
 // Ignore any side-effects from a failed evaluation. This is safe because


Index: clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
===
--- /dev/null
+++ clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
@@ -0,0 +1,44 @@
+// RUN: %clang_cc1 -fsyntax-only %s -std=c++14
+
+// Checks that Clang doesn't crash/assert on the nested call to "kaboom"
+// in "bar()".
+//
+// This is an interesting test case for `ExprConstant.cpp`'s `CallStackFrame`
+// because it triggers the following chain of events:
+// 0. `CheckEnableIf` calls `EvaluateWithSubstitution`.
+//  1. The outer call to "kaboom" gets evaluated.
+//   2. The expr for "a" gets evaluated, it has a version X;
+//  a temporary with the key (a, X) is created.
+// 3. The inner call to "kaboom" gets evaluated.
+//   4. The expr for "a" gets evaluated, it has a version Y;
+//  a temporary with the key (a, Y) is created.
+//   5. The expr for "b" gets evaluated, it has a version Y;
+//  a temporary with the key (b, Y) is created.
+//   6. `EvaluateWithSubstitution` looks at "b" but cannot evaluate it
+//  because it's value-dependent (due to the call to "f.foo()").
+//
+// When `EvaluateWithSubstitution` bails out while evaluating the outer
+// call, it attempts to fetch "b"'s param slot to clean it up.
+//
+// This used to cause an assertion failure in `getTemporary` because
+// a temporary with the key "(b, Y)" (created at step 4) existed but
+// not one for "(b, X)", which is what it was trying to fetch.
+
+template
+__attribute__((enable_if(true, "")))
+T kaboom(T a, T b) {
+  return b;
+}
+
+struct A {
+  double foo();
+};
+
+template 
+struct B {
+  A &f;
+
+  void bar() {
+kaboom(kaboom(0.0, 1.0), f.foo());
+  }
+};
Index: clang/lib/AST/ExprConstant.cpp
===
--- clang/lib/AST/ExprConstant.cpp
+++ 

[PATCH] D139398: [AMDGPU] Add bf16 storage support

2022-12-13 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4819-4831
+// When we don't have 16 bit instructions, bf16 is illegal and gets
+// softened to i16 for storage, with float being used for arithmetic.
+//
+// After softening, some i16 -> fp32 bf16_to_fp operations can be left 
over.
+// Lower those to (f32 (fp_extend (f16 (bitconvert x
+if (!Op->getValueType(0).isFloatingPoint() ||
+Op->getOperand(0).getValueType() != MVT::i16)

arsenm wrote:
> Pierre-vh wrote:
> > arsenm wrote:
> > > Pierre-vh wrote:
> > > > arsenm wrote:
> > > > > Pierre-vh wrote:
> > > > > > arsenm wrote:
> > > > > > > Pierre-vh wrote:
> > > > > > > > arsenm wrote:
> > > > > > > > > Pierre-vh wrote:
> > > > > > > > > > arsenm wrote:
> > > > > > > > > > > The generic legalizer should have handled this?
> > > > > > > > > > It looks like those operations are not implemented in the 
> > > > > > > > > > generic legalizer, e.g. I get 
> > > > > > > > > > ``` 
> > > > > > > > > > Do not know how to promote this operator's operand!
> > > > > > > > > > ```
> > > > > > > > > Right, this is the code that would go there
> > > > > > > > Do I just copy/paste this code in that PromoteInt function, and 
> > > > > > > > keep a copy here too in LowerOperation? (not really a fan of 
> > > > > > > > copy-pasting code in different files, I'd rather keep it all 
> > > > > > > > here)
> > > > > > > > We need to have the lowering too AFAIK, it didn't go well when 
> > > > > > > > I tried to remove it
> > > > > > > I'm not following why you need to handle it here
> > > > > > IIRC:
> > > > > >  - I need to handle FP_TO_BF16 in ReplaceNodeResult because that's 
> > > > > > what the Integer Legalizer calls (through CustomLowerNode)
> > > > > >  - I need to handle both opcodes in LowerOperation because 
> > > > > > otherwise they'll fail selection. They can be left over from 
> > > > > > expanding/legalizing other operations.
> > > > > But why are they custom? We don't have to handle FP16_TO_FP or 
> > > > > FP_TO_FP16 there, and they aren't custom lowered. They have the same 
> > > > > basic properties. We have this:
> > > > > 
> > > > > 
> > > > > ```
> > > > > setOperationAction(ISD::FP16_TO_FP, MVT::i16, Promote);
> > > > > AddPromotedToType(ISD::FP16_TO_FP, MVT::i16, MVT::i32);
> > > > > setOperationAction(ISD::FP_TO_FP16, MVT::i16, Promote);
> > > > > AddPromotedToType(ISD::FP_TO_FP16, MVT::i16, MVT::i32);
> > > > > ```
> > > > > 
> > > > > I'd expect the same basic pattern
> > > > PromoteIntegerOperand, PromoteFloatOperand and PromoteIntegerResult 
> > > > don't handle FP_TO_BF16 and BF16_TO_FP, and unless we put a Custom 
> > > > lowering mode it'll assert/unreachable.
> > > > I tried to make it work (for a while) using the default expand but I 
> > > > can't quite get it to work. It feels like there is some legalizer work 
> > > > missing for handling BF16 like we want to.
> > > > Even though it's not ideal I think the custom lowering is easiest
> > > What about Expand? that's where the implemented part is
> > Last I tried, Expand will emit a libcall in many cases that we don't handle
> Library call is supposed to be a distinct action now, the DAG only did about 
> 5% of the work to migrate to using it. This code can go to the default expand 
> action
Does it need to happen in this commit? It'll delay the review quite a bit I 
think if other people have to review it
If it needs to happen, when what do I need to do? Use the Expand action & fix 
the legalizer in places where it needs to be fixed?

I feel like it might be better suited for a follow-up patch; I can create a 
task and pick it up when I come back from vacation if you want


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139398/new/

https://reviews.llvm.org/D139398

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139713: [Sema] Fix crash when evaluating nested call with value-dependent arg

2022-12-12 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added a comment.

In D139713#3989071 , @shafik wrote:

> If I am reading the code correctly it looks like if the call to 
> `(*I)->isValueDependent()` is true then the temporary will not be created and 
> therefore we should not be attempting to access the slot.
>
> If this is the case then maybe the checking in 
> `EvaluateWithSubstitution(...)` needs to be more carefully done?
>
> I am not familiar with this code but I don't know if you analysis provides a 
> convincing case the assert should be removed.

Indeed, this only happens when isValueDependent returns true.

I am also not familiar with the code, so I just decided to propose a quick fix 
to get the discussion started; I certainly don't mind changing the nature of 
the fix if we agree it should be fixed differently.
For instance, we could also make the "getParam" call faillible by adding some 
"tryGetParam" variant that doesn't have the assert, or by passing some optional 
boolean to indicate it's acceptable to have the key present in the map with a 
different version.

My initial reasoning was that if the assert can be broken by legitimate code, 
then it shouldn't be there


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139713/new/

https://reviews.llvm.org/D139713

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139608: [Clang][NFC] Add default `getBFloat16Mangling` impl

2022-12-12 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added a comment.

In D139608#3982494 , @jcranmer-intel 
wrote:

> I don't normally handle name mangling, so I can't comment too much here, but 
> I will note that Itanium ABI is planning on using DF16b for 
> `std::bfloat16_t`: https://github.com/itanium-cxx-abi/cxx-abi/pull/147

This is just a NFC so if we're sure targets will soon start to use different 
mangled names then I can just abandon the commit
I just proposed this change because currently there's no reason not to have a 
default value for this function


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139608/new/

https://reviews.llvm.org/D139608

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139398: [AMDGPU] Add bf16 storage support

2022-12-12 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5573-5576
+  SDLoc SL(Op);
+  return DAG.getNode(
+  ISD::FP_EXTEND, SL, MVT::f32,
+  DAG.getNode(ISD::BITCAST, SL, MVT::f16, Op->getOperand(0)));

arsenm wrote:
> arsenm wrote:
> > Pierre-vh wrote:
> > > arsenm wrote:
> > > > ExpandNode covers lowering BF16_TO_FP. It also has a shift by 16-bits 
> > > > into the high bits. Is this correct?
> > > Ah I didn't know that, though as long as we use custom lowering, and our 
> > > FP_TO_BF16/BF16_TO_FP methods are consistent, it should be fine, no?
> > bfloat16 has the same number of exponent bits in the same high bits as f32; 
> > I kind of think the idea is you can just do a bitshift and then operate on 
> > f32?  I think the fp_extend here is wrong
> The default legalization also looks wrong to me. I don't understand why it 
> isn't shifting down the mantissa bit
Indeed it was terribly wrong. I rewrote both legalizations following what I 
found online: https://en.wikipedia.org/wiki/Bfloat16_floating-point_format

bf16 is designed to be very easily convertible from/to f32, save for some edge 
cases with denormalized numbers I think, thus:
- bf16 -> f32 is just left-shift by 16, filling the least-significant bits with 
zeroes.
- f32 -> bf16 is just cutting off the 16 least-significant bits.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4819-4831
+// When we don't have 16 bit instructions, bf16 is illegal and gets
+// softened to i16 for storage, with float being used for arithmetic.
+//
+// After softening, some i16 -> fp32 bf16_to_fp operations can be left 
over.
+// Lower those to (f32 (fp_extend (f16 (bitconvert x
+if (!Op->getValueType(0).isFloatingPoint() ||
+Op->getOperand(0).getValueType() != MVT::i16)

arsenm wrote:
> Pierre-vh wrote:
> > arsenm wrote:
> > > Pierre-vh wrote:
> > > > arsenm wrote:
> > > > > Pierre-vh wrote:
> > > > > > arsenm wrote:
> > > > > > > Pierre-vh wrote:
> > > > > > > > arsenm wrote:
> > > > > > > > > The generic legalizer should have handled this?
> > > > > > > > It looks like those operations are not implemented in the 
> > > > > > > > generic legalizer, e.g. I get 
> > > > > > > > ``` 
> > > > > > > > Do not know how to promote this operator's operand!
> > > > > > > > ```
> > > > > > > Right, this is the code that would go there
> > > > > > Do I just copy/paste this code in that PromoteInt function, and 
> > > > > > keep a copy here too in LowerOperation? (not really a fan of 
> > > > > > copy-pasting code in different files, I'd rather keep it all here)
> > > > > > We need to have the lowering too AFAIK, it didn't go well when I 
> > > > > > tried to remove it
> > > > > I'm not following why you need to handle it here
> > > > IIRC:
> > > >  - I need to handle FP_TO_BF16 in ReplaceNodeResult because that's what 
> > > > the Integer Legalizer calls (through CustomLowerNode)
> > > >  - I need to handle both opcodes in LowerOperation because otherwise 
> > > > they'll fail selection. They can be left over from expanding/legalizing 
> > > > other operations.
> > > But why are they custom? We don't have to handle FP16_TO_FP or FP_TO_FP16 
> > > there, and they aren't custom lowered. They have the same basic 
> > > properties. We have this:
> > > 
> > > 
> > > ```
> > > setOperationAction(ISD::FP16_TO_FP, MVT::i16, Promote);
> > > AddPromotedToType(ISD::FP16_TO_FP, MVT::i16, MVT::i32);
> > > setOperationAction(ISD::FP_TO_FP16, MVT::i16, Promote);
> > > AddPromotedToType(ISD::FP_TO_FP16, MVT::i16, MVT::i32);
> > > ```
> > > 
> > > I'd expect the same basic pattern
> > PromoteIntegerOperand, PromoteFloatOperand and PromoteIntegerResult don't 
> > handle FP_TO_BF16 and BF16_TO_FP, and unless we put a Custom lowering mode 
> > it'll assert/unreachable.
> > I tried to make it work (for a while) using the default expand but I can't 
> > quite get it to work. It feels like there is some legalizer work missing 
> > for handling BF16 like we want to.
> > Even though it's not ideal I think the custom lowering is easiest
> What about Expand? that's where the implemented part is
Last I tried, Expand will emit a libcall in many cases that we don't handle


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139398/new/

https://reviews.llvm.org/D139398

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139713: [Sema] Fix crash when evaluating nested call with value-dependent arg

2022-12-09 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added a comment.

I think the right choice is to remove the assert as it's an invariant that can 
be broken in some cases, so I don't feel like the assert is worth it anymore.
Alternatively I could add something to bypass the assert in that specific call 
to getParamSlot for Value-Dependent expressions, it's a more targeted fix in 
case you think the assertion must be left in.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139713/new/

https://reviews.llvm.org/D139713

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139713: [Sema] Fix crash when evaluating nested call with value-dependent arg

2022-12-09 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh updated this revision to Diff 481626.
Pierre-vh added a comment.

Missing newline at EOF


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139713/new/

https://reviews.llvm.org/D139713

Files:
  clang/lib/AST/ExprConstant.cpp
  clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp


Index: clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
===
--- /dev/null
+++ clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
@@ -0,0 +1,44 @@
+// RUN: %clang_cc1 -fsyntax-only %s -std=c++14
+
+// Checks that Clang doesn't crash/assert on the nested call to "kaboom"
+// in "bar()".
+//
+// This is an interesting test case for `ExprConstant.cpp`'s `CallStackFrame`
+// because it triggers the following chain of events:
+// 0. `CheckEnableIf` calls `EvaluateWithSubstitution`.
+//  1. The outer call to "kaboom" gets evaluated.
+//   2. The expr for "a" gets evaluated, it has a version X;
+//  a temporary with the key (a, X) is created.
+// 3. The inner call to "kaboom" gets evaluated.
+//   4. The expr for "a" gets evaluated, it has a version Y;
+//  a temporary with the key (a, Y) is created.
+//   5. The expr for "b" gets evaluated, it has a version Y;
+//  a temporary with the key (b, Y) is created.
+//   6. `EvaluateWithSubstitution` looks at "b" but cannot evaluate it
+//  because it's value-dependent (due to the call to "f.foo()").
+//
+// When `EvaluateWithSubstitution` bails out while evaluating the outer
+// call, it attempts to fetch "b"'s param slot to clean it up.
+//
+// This used to cause an assertion failure in `getTemporary` because
+// a temporary with the key "(b, Y)" (created at step 4) existed but
+// not one for "(b, X)", which is what it was trying to fetch.
+
+template
+__attribute__((enable_if(true, "")))
+T kaboom(T a, T b) {
+  return b;
+}
+
+struct A {
+  double foo();
+};
+
+template 
+struct B {
+  A &f;
+
+  void bar() {
+kaboom(kaboom(0.0, 1.0), f.foo());
+  }
+};
Index: clang/lib/AST/ExprConstant.cpp
===
--- clang/lib/AST/ExprConstant.cpp
+++ clang/lib/AST/ExprConstant.cpp
@@ -593,11 +593,6 @@
   auto LB = Temporaries.lower_bound(KV);
   if (LB != Temporaries.end() && LB->first == KV)
 return &LB->second;
-  // Pair (Key,Version) wasn't found in the map. Check that no elements
-  // in the map have 'Key' as their key.
-  assert((LB == Temporaries.end() || LB->first.first != Key) &&
- (LB == Temporaries.begin() || std::prev(LB)->first.first != Key) 
&&
- "Element with key 'Key' found in map");
   return nullptr;
 }
 


Index: clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
===
--- /dev/null
+++ clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
@@ -0,0 +1,44 @@
+// RUN: %clang_cc1 -fsyntax-only %s -std=c++14
+
+// Checks that Clang doesn't crash/assert on the nested call to "kaboom"
+// in "bar()".
+//
+// This is an interesting test case for `ExprConstant.cpp`'s `CallStackFrame`
+// because it triggers the following chain of events:
+// 0. `CheckEnableIf` calls `EvaluateWithSubstitution`.
+//  1. The outer call to "kaboom" gets evaluated.
+//   2. The expr for "a" gets evaluated, it has a version X;
+//  a temporary with the key (a, X) is created.
+// 3. The inner call to "kaboom" gets evaluated.
+//   4. The expr for "a" gets evaluated, it has a version Y;
+//  a temporary with the key (a, Y) is created.
+//   5. The expr for "b" gets evaluated, it has a version Y;
+//  a temporary with the key (b, Y) is created.
+//   6. `EvaluateWithSubstitution` looks at "b" but cannot evaluate it
+//  because it's value-dependent (due to the call to "f.foo()").
+//
+// When `EvaluateWithSubstitution` bails out while evaluating the outer
+// call, it attempts to fetch "b"'s param slot to clean it up.
+//
+// This used to cause an assertion failure in `getTemporary` because
+// a temporary with the key "(b, Y)" (created at step 4) existed but
+// not one for "(b, X)", which is what it was trying to fetch.
+
+template
+__attribute__((enable_if(true, "")))
+T kaboom(T a, T b) {
+  return b;
+}
+
+struct A {
+  double foo();
+};
+
+template 
+struct B {
+  A &f;
+
+  void bar() {
+kaboom(kaboom(0.0, 1.0), f.foo());
+  }
+};
Index: clang/lib/AST/ExprConstant.cpp
===
--- clang/lib/AST/ExprConstant.cpp
+++ clang/lib/AST/ExprConstant.cpp
@@ -593,11 +593,6 @@
   auto LB = Temporaries.lower_bound(KV);
   if (LB != Temporaries.end() && LB->first == KV)
 return &LB->second;
-  // Pair (Key,Version) wasn't found in the map. Check that no elements
-  // in 

[PATCH] D139713: [Sema] Fix crash when evaluating nested call with value-dependent arg

2022-12-09 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh created this revision.
Pierre-vh added reviewers: ahatanak, rsmith.
Herald added a project: All.
Pierre-vh requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Fix an edge case `ExprConstant.cpp`'s `EvaluateWithSubstitution` when called by 
`CheckEnableIf`

The assertion in `CallStackFrame::getTemporary`
could fail during evaluation of nested calls to a function
using `enable_if` when the second argument was a
value-dependent expression.

This caused a temporary to be created for the second
argument with a given version during the
evaluation of the inner call, but we bailed out
when evaluating the second argument of the
outer call due to the expression being value-dependent.
After bailing out, we tried to clean up the argument's value slot but it
caused an assertion to trigger in `getTemporary` as
a temporary for the second argument existed, but only for the inner call and 
not the outer call.

See the test case for a more complete description of the issue.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D139713

Files:
  clang/lib/AST/ExprConstant.cpp
  clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp


Index: clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
===
--- /dev/null
+++ clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
@@ -0,0 +1,44 @@
+// RUN: %clang_cc1 -fsyntax-only %s -std=c++14
+
+// Checks that Clang doesn't crash/assert on the nested call to "kaboom"
+// in "bar()".
+//
+// This is an interesting test case for `ExprConstant.cpp`'s `CallStackFrame`
+// because it triggers the following chain of events:
+// 0. `CheckEnableIf` calls `EvaluateWithSubstitution`.
+//  1. The outer call to "kaboom" gets evaluated.
+//   2. The expr for "a" gets evaluated, it has a version X;
+//  a temporary with the key (a, X) is created.
+// 3. The inner call to "kaboom" gets evaluated.
+//   4. The expr for "a" gets evaluated, it has a version Y;
+//  a temporary with the key (a, Y) is created.
+//   5. The expr for "b" gets evaluated, it has a version Y;
+//  a temporary with the key (b, Y) is created.
+//   6. `EvaluateWithSubstitution` looks at "b" but cannot evaluate it
+//  because it's value-dependent (due to the call to "f.foo()").
+//
+// When `EvaluateWithSubstitution` bails out while evaluating the outer
+// call, it attempts to fetch "b"'s param slot to clean it up.
+//
+// This used to cause an assertion failure in `getTemporary` because
+// a temporary with the key "(b, Y)" (created at step 4) existed but
+// not one for "(b, X)", which is what it was trying to fetch.
+
+template
+__attribute__((enable_if(true, "")))
+T kaboom(T a, T b) {
+  return b;
+}
+
+struct A {
+  double foo();
+};
+
+template 
+struct B {
+  A &f;
+
+  void bar() {
+kaboom(kaboom(0.0, 1.0), f.foo());
+  }
+};
\ No newline at end of file
Index: clang/lib/AST/ExprConstant.cpp
===
--- clang/lib/AST/ExprConstant.cpp
+++ clang/lib/AST/ExprConstant.cpp
@@ -593,11 +593,6 @@
   auto LB = Temporaries.lower_bound(KV);
   if (LB != Temporaries.end() && LB->first == KV)
 return &LB->second;
-  // Pair (Key,Version) wasn't found in the map. Check that no elements
-  // in the map have 'Key' as their key.
-  assert((LB == Temporaries.end() || LB->first.first != Key) &&
- (LB == Temporaries.begin() || std::prev(LB)->first.first != Key) 
&&
- "Element with key 'Key' found in map");
   return nullptr;
 }
 


Index: clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
===
--- /dev/null
+++ clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
@@ -0,0 +1,44 @@
+// RUN: %clang_cc1 -fsyntax-only %s -std=c++14
+
+// Checks that Clang doesn't crash/assert on the nested call to "kaboom"
+// in "bar()".
+//
+// This is an interesting test case for `ExprConstant.cpp`'s `CallStackFrame`
+// because it triggers the following chain of events:
+// 0. `CheckEnableIf` calls `EvaluateWithSubstitution`.
+//  1. The outer call to "kaboom" gets evaluated.
+//   2. The expr for "a" gets evaluated, it has a version X;
+//  a temporary with the key (a, X) is created.
+// 3. The inner call to "kaboom" gets evaluated.
+//   4. The expr for "a" gets evaluated, it has a version Y;
+//  a temporary with the key (a, Y) is created.
+//   5. The expr for "b" gets evaluated, it has a version Y;
+//  a temporary with the key (b, Y) is created.
+//   6. `EvaluateWithSubstitution` looks at "b" but cannot evaluate it
+//  because it's value-dependent (due to the call to "f.foo()").
+//
+// When `EvaluateWithSubstitution` bails out while evaluating the outer
+// call, it at

[PATCH] D139398: [AMDGPU] Add bf16 storage support

2022-12-09 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:913
+  else
+RegisterVT = (ScalarVT == MVT::bf16 ? MVT::v2bf16 : MVT::v2f16);
   IntermediateVT = RegisterVT;

arsenm wrote:
> If you wanted the promote to i32, you could have done it here instead of in 
> the tablegen cc handling
Do you mean somewhere else in that function? Changing v2bf16 to i32 here 
doesn't fix it 
I also tried changing the function above but I kept running into asserts so I 
just left the TableGen CC for now



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5563
+  return DAG.getNode(ISD::BITCAST, SL, MVT::i16,
+ DAG.getFPExtendOrRound(Op->getOperand(0), SL, MVT::f16));
+}

arsenm wrote:
> Should be specific cast, not FPExtOrRound. I don't think the FP_ROUND case 
> would be correct
But we need to do f32 -> f16, isn't FP_ROUND used for that? I thought it's what 
we needed



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5573-5576
+  SDLoc SL(Op);
+  return DAG.getNode(
+  ISD::FP_EXTEND, SL, MVT::f32,
+  DAG.getNode(ISD::BITCAST, SL, MVT::f16, Op->getOperand(0)));

arsenm wrote:
> ExpandNode covers lowering BF16_TO_FP. It also has a shift by 16-bits into 
> the high bits. Is this correct?
Ah I didn't know that, though as long as we use custom lowering, and our 
FP_TO_BF16/BF16_TO_FP methods are consistent, it should be fine, no?



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4819-4831
+// When we don't have 16 bit instructions, bf16 is illegal and gets
+// softened to i16 for storage, with float being used for arithmetic.
+//
+// After softening, some i16 -> fp32 bf16_to_fp operations can be left 
over.
+// Lower those to (f32 (fp_extend (f16 (bitconvert x
+if (!Op->getValueType(0).isFloatingPoint() ||
+Op->getOperand(0).getValueType() != MVT::i16)

arsenm wrote:
> Pierre-vh wrote:
> > arsenm wrote:
> > > Pierre-vh wrote:
> > > > arsenm wrote:
> > > > > Pierre-vh wrote:
> > > > > > arsenm wrote:
> > > > > > > The generic legalizer should have handled this?
> > > > > > It looks like those operations are not implemented in the generic 
> > > > > > legalizer, e.g. I get 
> > > > > > ``` 
> > > > > > Do not know how to promote this operator's operand!
> > > > > > ```
> > > > > Right, this is the code that would go there
> > > > Do I just copy/paste this code in that PromoteInt function, and keep a 
> > > > copy here too in LowerOperation? (not really a fan of copy-pasting code 
> > > > in different files, I'd rather keep it all here)
> > > > We need to have the lowering too AFAIK, it didn't go well when I tried 
> > > > to remove it
> > > I'm not following why you need to handle it here
> > IIRC:
> >  - I need to handle FP_TO_BF16 in ReplaceNodeResult because that's what the 
> > Integer Legalizer calls (through CustomLowerNode)
> >  - I need to handle both opcodes in LowerOperation because otherwise 
> > they'll fail selection. They can be left over from expanding/legalizing 
> > other operations.
> But why are they custom? We don't have to handle FP16_TO_FP or FP_TO_FP16 
> there, and they aren't custom lowered. They have the same basic properties. 
> We have this:
> 
> 
> ```
> setOperationAction(ISD::FP16_TO_FP, MVT::i16, Promote);
> AddPromotedToType(ISD::FP16_TO_FP, MVT::i16, MVT::i32);
> setOperationAction(ISD::FP_TO_FP16, MVT::i16, Promote);
> AddPromotedToType(ISD::FP_TO_FP16, MVT::i16, MVT::i32);
> ```
> 
> I'd expect the same basic pattern
PromoteIntegerOperand, PromoteFloatOperand and PromoteIntegerResult don't 
handle FP_TO_BF16 and BF16_TO_FP, and unless we put a Custom lowering mode 
it'll assert/unreachable.
I tried to make it work (for a while) using the default expand but I can't 
quite get it to work. It feels like there is some legalizer work missing for 
handling BF16 like we want to.
Even though it's not ideal I think the custom lowering is easiest


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139398/new/

https://reviews.llvm.org/D139398

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139398: [AMDGPU] Add bf16 storage support

2022-12-08 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added inline comments.



Comment at: clang/lib/Basic/Targets/AMDGPU.h:119
+  bool hasBFloat16Type() const override { return isAMDGCN(getTriple()); }
+  const char *getBFloat16Mangling() const override { return "u6__bf16"; };
+

arsenm wrote:
> Pierre-vh wrote:
> > Pierre-vh wrote:
> > > arsenm wrote:
> > > > Don't understand this mangling. What is u6?
> > > Not sure; for that one I just copy-pasted the implementation of other 
> > > targets. All other targets use that mangling scheme
> > Ah I remember now, it's just C++ mangling. I don't quite understand the 
> > lowercase "u" but a quick search in Clang tells me it's vendor-extended 
> > types.
> > So it's just u6 -> vendor extended type, 6 characters following + __bf16 
> > (name of the type).
> Do we really need an override for this? I'd expect a reasonable default. Plus 
> I think a virtual function for something that's only a parameterless, static 
> string is a bit ridiculous
Default impl asserts if not implemented. I think it's to make sure targets are 
all aware of what it takes to support bfloat and they don't end up partially 
implementing it?
```
  /// Return the mangled code of bfloat.
  virtual const char *getBFloat16Mangling() const {
llvm_unreachable("bfloat not implemented on this target");
  }
```

 I'd say let's stick to the current pattern in this diff; I created D139608 to 
change it



Comment at: llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td:49
+  CCIfType<[bf16], CCBitConvertToType>,
+  CCIfType<[v2bf16], CCBitConvertToType>,
   CCIfNotInReg Without being added to a register class, all the tablegen changes should not 
> do anything
bf16 ones seem to not be needed but if I don't have the v2bf16 ones I get 
"cannot allocate arguments" in "test_arg_store_v2bf16"


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139398/new/

https://reviews.llvm.org/D139398

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139608: [Clang][NFC] Add default `getBFloat16Mangling` impl

2022-12-08 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh created this revision.
Pierre-vh added reviewers: MaskRay, stuij.
Herald added subscribers: kosarev, mattd, gchakrabarti, asavonic, StephenFan, 
kerbowa, jvesely.
Herald added a project: All.
Pierre-vh requested review of this revision.
Herald added subscribers: cfe-commits, jholewinski.
Herald added a project: clang.

All targets that currently implement `__bf16` use the exact same mangled name.
Reduce code duplication by adding that name to the default implementation, like 
it's done in e.g. `getLongDoubleMangling` and `getFloat128Mangling`

Depends on D139398 


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D139608

Files:
  clang/include/clang/Basic/TargetInfo.h
  clang/lib/Basic/Targets/AArch64.h
  clang/lib/Basic/Targets/AMDGPU.h
  clang/lib/Basic/Targets/ARM.h
  clang/lib/Basic/Targets/NVPTX.h
  clang/lib/Basic/Targets/X86.h


Index: clang/lib/Basic/Targets/X86.h
===
--- clang/lib/Basic/Targets/X86.h
+++ clang/lib/Basic/Targets/X86.h
@@ -415,8 +415,6 @@
   uint64_t getPointerAlignV(LangAS AddrSpace) const override {
 return getPointerWidthV(AddrSpace);
   }
-
-  const char *getBFloat16Mangling() const override { return "u6__bf16"; };
 };
 
 // X86-32 generic target
Index: clang/lib/Basic/Targets/NVPTX.h
===
--- clang/lib/Basic/Targets/NVPTX.h
+++ clang/lib/Basic/Targets/NVPTX.h
@@ -179,7 +179,6 @@
 
   bool hasBitIntType() const override { return true; }
   bool hasBFloat16Type() const override { return true; }
-  const char *getBFloat16Mangling() const override { return "u6__bf16"; };
 };
 } // namespace targets
 } // namespace clang
Index: clang/lib/Basic/Targets/ARM.h
===
--- clang/lib/Basic/Targets/ARM.h
+++ clang/lib/Basic/Targets/ARM.h
@@ -197,8 +197,6 @@
   bool hasSjLjLowering() const override;
 
   bool hasBitIntType() const override { return true; }
-
-  const char *getBFloat16Mangling() const override { return "u6__bf16"; };
 };
 
 class LLVM_LIBRARY_VISIBILITY ARMleTargetInfo : public ARMTargetInfo {
Index: clang/lib/Basic/Targets/AMDGPU.h
===
--- clang/lib/Basic/Targets/AMDGPU.h
+++ clang/lib/Basic/Targets/AMDGPU.h
@@ -116,7 +116,6 @@
   }
 
   bool hasBFloat16Type() const override { return isAMDGCN(getTriple()); }
-  const char *getBFloat16Mangling() const override { return "u6__bf16"; };
 
   const char *getClobbers() const override { return ""; }
 
Index: clang/lib/Basic/Targets/AArch64.h
===
--- clang/lib/Basic/Targets/AArch64.h
+++ clang/lib/Basic/Targets/AArch64.h
@@ -167,7 +167,6 @@
 
   int getEHDataRegisterNumber(unsigned RegNo) const override;
 
-  const char *getBFloat16Mangling() const override { return "u6__bf16"; };
   bool hasInt128Type() const override;
 
   bool hasBitIntType() const override { return true; }
Index: clang/include/clang/Basic/TargetInfo.h
===
--- clang/include/clang/Basic/TargetInfo.h
+++ clang/include/clang/Basic/TargetInfo.h
@@ -758,9 +758,7 @@
   }
 
   /// Return the mangled code of bfloat.
-  virtual const char *getBFloat16Mangling() const {
-llvm_unreachable("bfloat not implemented on this target");
-  }
+  virtual const char *getBFloat16Mangling() const { return "u6__bf16"; }
 
   /// Return the value for the C99 FLT_EVAL_METHOD macro.
   virtual LangOptions::FPEvalMethodKind getFPEvalMethod() const {


Index: clang/lib/Basic/Targets/X86.h
===
--- clang/lib/Basic/Targets/X86.h
+++ clang/lib/Basic/Targets/X86.h
@@ -415,8 +415,6 @@
   uint64_t getPointerAlignV(LangAS AddrSpace) const override {
 return getPointerWidthV(AddrSpace);
   }
-
-  const char *getBFloat16Mangling() const override { return "u6__bf16"; };
 };
 
 // X86-32 generic target
Index: clang/lib/Basic/Targets/NVPTX.h
===
--- clang/lib/Basic/Targets/NVPTX.h
+++ clang/lib/Basic/Targets/NVPTX.h
@@ -179,7 +179,6 @@
 
   bool hasBitIntType() const override { return true; }
   bool hasBFloat16Type() const override { return true; }
-  const char *getBFloat16Mangling() const override { return "u6__bf16"; };
 };
 } // namespace targets
 } // namespace clang
Index: clang/lib/Basic/Targets/ARM.h
===
--- clang/lib/Basic/Targets/ARM.h
+++ clang/lib/Basic/Targets/ARM.h
@@ -197,8 +197,6 @@
   bool hasSjLjLowering() const override;
 
   bool hasBitIntType() const override { return true; }
-
-  const char *getBFloat16Mangling() const override { return "u6__bf16"; };
 };
 
 class LLVM_LIBRARY_VISIBILITY ARMleTargetInfo : public ARMTargetInfo {
Index: clang/lib/Basi

[PATCH] D139398: [AMDGPU] Add bf16 storage support

2022-12-07 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added inline comments.



Comment at: clang/lib/Basic/Targets/AMDGPU.h:119
+  bool hasBFloat16Type() const override { return isAMDGCN(getTriple()); }
+  const char *getBFloat16Mangling() const override { return "u6__bf16"; };
+

Pierre-vh wrote:
> arsenm wrote:
> > Don't understand this mangling. What is u6?
> Not sure; for that one I just copy-pasted the implementation of other 
> targets. All other targets use that mangling scheme
Ah I remember now, it's just C++ mangling. I don't quite understand the 
lowercase "u" but a quick search in Clang tells me it's vendor-extended types.
So it's just u6 -> vendor extended type, 6 characters following + __bf16 (name 
of the type).



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4819-4831
+// When we don't have 16 bit instructions, bf16 is illegal and gets
+// softened to i16 for storage, with float being used for arithmetic.
+//
+// After softening, some i16 -> fp32 bf16_to_fp operations can be left 
over.
+// Lower those to (f32 (fp_extend (f16 (bitconvert x
+if (!Op->getValueType(0).isFloatingPoint() ||
+Op->getOperand(0).getValueType() != MVT::i16)

arsenm wrote:
> Pierre-vh wrote:
> > arsenm wrote:
> > > Pierre-vh wrote:
> > > > arsenm wrote:
> > > > > The generic legalizer should have handled this?
> > > > It looks like those operations are not implemented in the generic 
> > > > legalizer, e.g. I get 
> > > > ``` 
> > > > Do not know how to promote this operator's operand!
> > > > ```
> > > Right, this is the code that would go there
> > Do I just copy/paste this code in that PromoteInt function, and keep a copy 
> > here too in LowerOperation? (not really a fan of copy-pasting code in 
> > different files, I'd rather keep it all here)
> > We need to have the lowering too AFAIK, it didn't go well when I tried to 
> > remove it
> I'm not following why you need to handle it here
IIRC:
 - I need to handle FP_TO_BF16 in ReplaceNodeResult because that's what the 
Integer Legalizer calls (through CustomLowerNode)
 - I need to handle both opcodes in LowerOperation because otherwise they'll 
fail selection. They can be left over from expanding/legalizing other 
operations.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139398/new/

https://reviews.llvm.org/D139398

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139398: [AMDGPU] Add bf16 storage support

2022-12-06 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4819-4831
+// When we don't have 16 bit instructions, bf16 is illegal and gets
+// softened to i16 for storage, with float being used for arithmetic.
+//
+// After softening, some i16 -> fp32 bf16_to_fp operations can be left 
over.
+// Lower those to (f32 (fp_extend (f16 (bitconvert x
+if (!Op->getValueType(0).isFloatingPoint() ||
+Op->getOperand(0).getValueType() != MVT::i16)

arsenm wrote:
> Pierre-vh wrote:
> > arsenm wrote:
> > > The generic legalizer should have handled this?
> > It looks like those operations are not implemented in the generic 
> > legalizer, e.g. I get 
> > ``` 
> > Do not know how to promote this operator's operand!
> > ```
> Right, this is the code that would go there
Do I just copy/paste this code in that PromoteInt function, and keep a copy 
here too in LowerOperation? (not really a fan of copy-pasting code in different 
files, I'd rather keep it all here)
We need to have the lowering too AFAIK, it didn't go well when I tried to 
remove it


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139398/new/

https://reviews.llvm.org/D139398

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139398: [AMDGPU] Add bf16 storage support

2022-12-06 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added inline comments.



Comment at: clang/lib/Basic/Targets/AMDGPU.h:119
+  bool hasBFloat16Type() const override { return isAMDGCN(getTriple()); }
+  const char *getBFloat16Mangling() const override { return "u6__bf16"; };
+

arsenm wrote:
> Don't understand this mangling. What is u6?
Not sure; for that one I just copy-pasted the implementation of other targets. 
All other targets use that mangling scheme



Comment at: clang/test/SemaCUDA/amdgpu-bf16.cu:43
+  *out = bf16;
+}
+

arsenm wrote:
> check casts to different int and float types? Is construction of bf16 vectors 
> allowed?
Added cast + vec sema test and vec assign codegen test too

No conversions are allowed apparently but I don't think it matters for the 
initial patch; if needed we can always add it later I think



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4819-4831
+// When we don't have 16 bit instructions, bf16 is illegal and gets
+// softened to i16 for storage, with float being used for arithmetic.
+//
+// After softening, some i16 -> fp32 bf16_to_fp operations can be left 
over.
+// Lower those to (f32 (fp_extend (f16 (bitconvert x
+if (!Op->getValueType(0).isFloatingPoint() ||
+Op->getOperand(0).getValueType() != MVT::i16)

arsenm wrote:
> The generic legalizer should have handled this?
It looks like those operations are not implemented in the generic legalizer, 
e.g. I get 
``` 
Do not know how to promote this operator's operand!
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139398/new/

https://reviews.llvm.org/D139398

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139398: [AMDGPU] Add bf16 storage support

2022-12-06 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh updated this revision to Diff 480431.
Pierre-vh added a comment.

- Only accept bf16 on AMDGCN; r600 doesn't support it (we could but it's not 
worth the effort I think; I'll look at it if we find out it's needed)
- Remove bf16 types from a few register classes


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139398/new/

https://reviews.llvm.org/D139398

Files:
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/Basic/Targets/AMDGPU.h
  clang/test/CodeGenCUDA/amdgpu-bf16.cu
  clang/test/SemaCUDA/amdgpu-bf16.cu
  llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/AMDGPU/SIInstructions.td
  llvm/lib/Target/AMDGPU/SIRegisterInfo.td
  llvm/lib/Target/AMDGPU/VOP3PInstructions.td
  llvm/test/CodeGen/AMDGPU/bf16-ops.ll
  llvm/test/CodeGen/AMDGPU/bf16.ll

Index: llvm/test/CodeGen/AMDGPU/bf16.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AMDGPU/bf16.ll
@@ -0,0 +1,956 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -march=amdgcn -verify-machineinstrs | FileCheck %s -check-prefixes=GCN
+; RUN: llc < %s -march=amdgcn -mcpu=hawaii  -verify-machineinstrs | FileCheck %s -check-prefixes=GFX7
+; RUN: llc < %s -march=amdgcn -mcpu=tonga  -verify-machineinstrs | FileCheck %s -check-prefixes=GFX8
+; RUN: llc < %s -march=amdgcn -mcpu=gfx900 -verify-machineinstrs | FileCheck %s -check-prefixes=GFX9
+; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs | FileCheck %s -check-prefixes=GFX10
+
+; We only have storage-only BF16 support. We can load/store those values as we treat them as u16, but
+; we don't support operations on them. As such, codegen is expected to fail for any operation other
+; than simple load/stores.
+
+define void @test_load_store(bfloat addrspace(1)* %in, bfloat addrspace(1)* %out) {
+; GCN-LABEL: test_load_store:
+; GCN:   ; %bb.0:
+; GCN-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:s_mov_b32 s6, 0
+; GCN-NEXT:s_mov_b32 s7, 0xf000
+; GCN-NEXT:s_mov_b32 s4, s6
+; GCN-NEXT:s_mov_b32 s5, s6
+; GCN-NEXT:buffer_load_ushort v0, v[0:1], s[4:7], 0 addr64
+; GCN-NEXT:s_waitcnt vmcnt(0)
+; GCN-NEXT:buffer_store_short v0, v[2:3], s[4:7], 0 addr64
+; GCN-NEXT:s_waitcnt vmcnt(0) expcnt(0)
+; GCN-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX7-LABEL: test_load_store:
+; GFX7:   ; %bb.0:
+; GFX7-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT:s_mov_b32 s6, 0
+; GFX7-NEXT:s_mov_b32 s7, 0xf000
+; GFX7-NEXT:s_mov_b32 s4, s6
+; GFX7-NEXT:s_mov_b32 s5, s6
+; GFX7-NEXT:buffer_load_ushort v0, v[0:1], s[4:7], 0 addr64
+; GFX7-NEXT:s_waitcnt vmcnt(0)
+; GFX7-NEXT:buffer_store_short v0, v[2:3], s[4:7], 0 addr64
+; GFX7-NEXT:s_waitcnt vmcnt(0)
+; GFX7-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX8-LABEL: test_load_store:
+; GFX8:   ; %bb.0:
+; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT:flat_load_ushort v0, v[0:1]
+; GFX8-NEXT:s_waitcnt vmcnt(0)
+; GFX8-NEXT:flat_store_short v[2:3], v0
+; GFX8-NEXT:s_waitcnt vmcnt(0)
+; GFX8-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: test_load_store:
+; GFX9:   ; %bb.0:
+; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:global_load_ushort v0, v[0:1], off
+; GFX9-NEXT:s_waitcnt vmcnt(0)
+; GFX9-NEXT:global_store_short v[2:3], v0, off
+; GFX9-NEXT:s_waitcnt vmcnt(0)
+; GFX9-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX10-LABEL: test_load_store:
+; GFX10:   ; %bb.0:
+; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-NEXT:s_waitcnt_vscnt null, 0x0
+; GFX10-NEXT:global_load_ushort v0, v[0:1], off
+; GFX10-NEXT:s_waitcnt vmcnt(0)
+; GFX10-NEXT:global_store_short v[2:3], v0, off
+; GFX10-NEXT:s_waitcnt_vscnt null, 0x0
+; GFX10-NEXT:s_setpc_b64 s[30:31]
+  %val = load bfloat, bfloat addrspace(1)* %in
+  store bfloat %val, bfloat addrspace(1) * %out
+  ret void
+}
+
+define void @test_load_store_v2bf16(<2 x bfloat> addrspace(1)* %in, <2 x bfloat> addrspace(1)* %out) {
+; GCN-LABEL: test_load_store_v2bf16:
+; GCN:   ; %bb.0:
+; GCN-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:s_mov_b32 s6, 0
+; GCN-NEXT:s_mov_b32 s7, 0xf000
+; GCN-NEXT:s_mov_b32 s4, s6
+; GCN-NEXT:s_mov_b32 s5, s6
+; GCN-NEXT:buffer_load_dword v0, v[0:1], s[4:7], 0 addr64
+; GCN-NEXT:s_waitcnt vmcnt(0)
+; GCN-NEXT:buffer_store_dword v0, v[2:3], s[4:7], 0 addr64
+; GCN-NEXT:s_waitcnt vmcnt(0) expcnt(0)
+; GCN-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX7-LABEL: test_load_store_v2bf16:
+; GFX7:   ; %bb.0:
+; GFX7-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT:s_mov_b32 s6, 0
+; GFX7-NEXT:s_mov_b32 s7, 0xf000
+; GFX7-NEXT:s_mov_b32 s4, s6
+; GFX7-NEXT:s_mov_b32 s5, s6
+; GFX7-NEXT

[PATCH] D138651: [CUDA][HIP] Don't diagnose use for __bf16

2022-12-06 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh abandoned this revision.
Pierre-vh added a comment.

Added bf16 storage support instead: D139398 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138651/new/

https://reviews.llvm.org/D138651

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139398: [AMDGPU] Add bf16 storage support

2022-12-06 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh created this revision.
Pierre-vh added reviewers: arsenm, foad, yaxunl.
Herald added subscribers: kosarev, kerbowa, hiraditya, tpr, dstuttard, jvesely, 
kzhuravl.
Herald added a project: All.
Pierre-vh requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, wdng.
Herald added projects: clang, LLVM.

- [Clang] Declare AMDGPU target as supporting BF16 for storage-only purposes.
  - Add Sema & CodeGen tests cases.
  - Also add cases that D138651  would have 
covered as this patch replaces it.
- [AMDGPU] Add BF16 storage-only support
  - CC: Add bf16/v2bf16 arguments support by converting them to i16/i32.
  - Add BF16 to various register classes & fix issues it causes with type 
inference.
  - DAG: Add BF16 legalization/codegen support for GCN targets.
  - GISel: Not supported as the framework doesn't support bfloat16 properly yet.
  - Added test cases for supported BF16 ops + unsupported ones.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D139398

Files:
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/Basic/Targets/AMDGPU.h
  clang/test/CodeGenCUDA/amdgpu-bf16.cu
  clang/test/SemaCUDA/amdgpu-bf16.cu
  llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/AMDGPU/SIInstructions.td
  llvm/lib/Target/AMDGPU/SIRegisterInfo.td
  llvm/lib/Target/AMDGPU/VOP3PInstructions.td
  llvm/test/CodeGen/AMDGPU/bf16-ops.ll
  llvm/test/CodeGen/AMDGPU/bf16.ll

Index: llvm/test/CodeGen/AMDGPU/bf16.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AMDGPU/bf16.ll
@@ -0,0 +1,956 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -march=amdgcn -verify-machineinstrs | FileCheck %s -check-prefixes=GCN
+; RUN: llc < %s -march=amdgcn -mcpu=hawaii  -verify-machineinstrs | FileCheck %s -check-prefixes=GFX7
+; RUN: llc < %s -march=amdgcn -mcpu=tonga  -verify-machineinstrs | FileCheck %s -check-prefixes=GFX8
+; RUN: llc < %s -march=amdgcn -mcpu=gfx900 -verify-machineinstrs | FileCheck %s -check-prefixes=GFX9
+; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs | FileCheck %s -check-prefixes=GFX10
+
+; We only have storage-only BF16 support. We can load/store those values as we treat them as u16, but
+; we don't support operations on them. As such, codegen is expected to fail for any operation other
+; than simple load/stores.
+
+define void @test_load_store(bfloat addrspace(1)* %in, bfloat addrspace(1)* %out) {
+; GCN-LABEL: test_load_store:
+; GCN:   ; %bb.0:
+; GCN-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:s_mov_b32 s6, 0
+; GCN-NEXT:s_mov_b32 s7, 0xf000
+; GCN-NEXT:s_mov_b32 s4, s6
+; GCN-NEXT:s_mov_b32 s5, s6
+; GCN-NEXT:buffer_load_ushort v0, v[0:1], s[4:7], 0 addr64
+; GCN-NEXT:s_waitcnt vmcnt(0)
+; GCN-NEXT:buffer_store_short v0, v[2:3], s[4:7], 0 addr64
+; GCN-NEXT:s_waitcnt vmcnt(0) expcnt(0)
+; GCN-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX7-LABEL: test_load_store:
+; GFX7:   ; %bb.0:
+; GFX7-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT:s_mov_b32 s6, 0
+; GFX7-NEXT:s_mov_b32 s7, 0xf000
+; GFX7-NEXT:s_mov_b32 s4, s6
+; GFX7-NEXT:s_mov_b32 s5, s6
+; GFX7-NEXT:buffer_load_ushort v0, v[0:1], s[4:7], 0 addr64
+; GFX7-NEXT:s_waitcnt vmcnt(0)
+; GFX7-NEXT:buffer_store_short v0, v[2:3], s[4:7], 0 addr64
+; GFX7-NEXT:s_waitcnt vmcnt(0)
+; GFX7-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX8-LABEL: test_load_store:
+; GFX8:   ; %bb.0:
+; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT:flat_load_ushort v0, v[0:1]
+; GFX8-NEXT:s_waitcnt vmcnt(0)
+; GFX8-NEXT:flat_store_short v[2:3], v0
+; GFX8-NEXT:s_waitcnt vmcnt(0)
+; GFX8-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX9-LABEL: test_load_store:
+; GFX9:   ; %bb.0:
+; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:global_load_ushort v0, v[0:1], off
+; GFX9-NEXT:s_waitcnt vmcnt(0)
+; GFX9-NEXT:global_store_short v[2:3], v0, off
+; GFX9-NEXT:s_waitcnt vmcnt(0)
+; GFX9-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX10-LABEL: test_load_store:
+; GFX10:   ; %bb.0:
+; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-NEXT:s_waitcnt_vscnt null, 0x0
+; GFX10-NEXT:global_load_ushort v0, v[0:1], off
+; GFX10-NEXT:s_waitcnt vmcnt(0)
+; GFX10-NEXT:global_store_short v[2:3], v0, off
+; GFX10-NEXT:s_waitcnt_vscnt null, 0x0
+; GFX10-NEXT:s_setpc_b64 s[30:31]
+  %val = load bfloat, bfloat addrspace(1)* %in
+  store bfloat %val, bfloat addrspace(1) * %out
+  ret void
+}
+
+define void @test_load_store_v2bf16(<2 x bfloat> addrspace(1)* %in, <2 x bfloat> addrspace(1)* %out) {
+; GCN-LABEL: test_load_store_v2bf16:
+; GCN:   ; %bb.0:
+; GCN-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:

[PATCH] D138651: [CUDA][HIP] Don't diagnose use for __bf16

2022-12-02 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh planned changes to this revision.
Pierre-vh marked an inline comment as done.
Pierre-vh added a comment.

I'll take a look at handling bf16 storage-only for AMDGPU. Looks like our 
Backend already handles it and converts it to i16 so maybe it'll be really easy.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138651/new/

https://reviews.llvm.org/D138651

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D138651: [CUDA][HIP] Don't diagnose use for __bf16

2022-12-01 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh marked an inline comment as done.
Pierre-vh added inline comments.



Comment at: clang/test/SemaCUDA/amdgpu-bf16.cu:9
+
+__device__ void devicefn() {
+}

tra wrote:
> We should probably also have a case verifying that actual attempt to use 
> `__bf16` in device code is still diagnosed. 
Good catch, it's currently no longer diagnosed.
What can I use in `ConvertDeclSpecToType` to make it diagnose only if the 
current function is a device function? If I understand correctly, LangOpts are 
for the whole TU so I can't use that (e.g. CUDAIsDevice), right?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138651/new/

https://reviews.llvm.org/D138651

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D138651: [CUDA][HIP] Don't diagnose use for __bf16

2022-11-29 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh updated this revision to Diff 478823.
Pierre-vh marked 5 inline comments as done.
Pierre-vh added a comment.

- Recentering the patch around HIP only.
  - I was using too much from D57369  and was 
involving OpenMP when there's no reason to. Just checking if HIP is being used 
should be enough.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138651/new/

https://reviews.llvm.org/D138651

Files:
  clang/lib/AST/ASTContext.cpp
  clang/lib/Sema/SemaType.cpp
  clang/test/SemaCUDA/amdgpu-bf16.cu


Index: clang/test/SemaCUDA/amdgpu-bf16.cu
===
--- /dev/null
+++ clang/test/SemaCUDA/amdgpu-bf16.cu
@@ -0,0 +1,16 @@
+// RUN: %clang_cc1 -x hip -fsyntax-only -triple amdgcn-amd-amdhsa -aux-triple 
x86_64-unknown-linux-gnu -verify %s
+// expected-no-diagnostics
+
+// If AMDGPU is the main target and X86 the aux target, ensure we
+// don't complain about unsupported BF16 types in x86 code.
+
+#include "Inputs/cuda.h"
+
+__device__ void devicefn() {
+}
+
+__bf16 hostfn(__bf16 a) {
+  return a;
+}
+
+typedef __bf16 foo __attribute__((__vector_size__(16), __aligned__(16)));
Index: clang/lib/Sema/SemaType.cpp
===
--- clang/lib/Sema/SemaType.cpp
+++ clang/lib/Sema/SemaType.cpp
@@ -1518,7 +1518,11 @@
 break;
   case DeclSpec::TST_half:Result = Context.HalfTy; break;
   case DeclSpec::TST_BFloat16:
-if (!S.Context.getTargetInfo().hasBFloat16Type())
+// HIP does not currently support bf16. Avoid diagnosing uses of bf16
+// if the auxiliary target supports it.
+if (!S.Context.getTargetInfo().hasBFloat16Type() &&
+!(S.getLangOpts().HIP &&
+  S.Context.getAuxTargetInfo()->hasBFloat16Type()))
   S.Diag(DS.getTypeSpecTypeLoc(), diag::err_type_unsupported)
 << "__bf16";
 Result = Context.BFloat16Ty;
Index: clang/lib/AST/ASTContext.cpp
===
--- clang/lib/AST/ASTContext.cpp
+++ clang/lib/AST/ASTContext.cpp
@@ -2171,9 +2171,14 @@
   Align = Target->getLongFractAlign();
   break;
 case BuiltinType::BFloat16:
+  // HIP does not currently support bf16, so in that case allow querying 
the
+  // auxiliary target.
   if (Target->hasBFloat16Type()) {
 Width = Target->getBFloat16Width();
 Align = Target->getBFloat16Align();
+  } else if (getLangOpts().HIP && AuxTarget->hasBFloat16Type()) {
+Width = AuxTarget->getBFloat16Width();
+Align = AuxTarget->getBFloat16Align();
   }
   break;
 case BuiltinType::Float16:


Index: clang/test/SemaCUDA/amdgpu-bf16.cu
===
--- /dev/null
+++ clang/test/SemaCUDA/amdgpu-bf16.cu
@@ -0,0 +1,16 @@
+// RUN: %clang_cc1 -x hip -fsyntax-only -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu -verify %s
+// expected-no-diagnostics
+
+// If AMDGPU is the main target and X86 the aux target, ensure we
+// don't complain about unsupported BF16 types in x86 code.
+
+#include "Inputs/cuda.h"
+
+__device__ void devicefn() {
+}
+
+__bf16 hostfn(__bf16 a) {
+  return a;
+}
+
+typedef __bf16 foo __attribute__((__vector_size__(16), __aligned__(16)));
Index: clang/lib/Sema/SemaType.cpp
===
--- clang/lib/Sema/SemaType.cpp
+++ clang/lib/Sema/SemaType.cpp
@@ -1518,7 +1518,11 @@
 break;
   case DeclSpec::TST_half:Result = Context.HalfTy; break;
   case DeclSpec::TST_BFloat16:
-if (!S.Context.getTargetInfo().hasBFloat16Type())
+// HIP does not currently support bf16. Avoid diagnosing uses of bf16
+// if the auxiliary target supports it.
+if (!S.Context.getTargetInfo().hasBFloat16Type() &&
+!(S.getLangOpts().HIP &&
+  S.Context.getAuxTargetInfo()->hasBFloat16Type()))
   S.Diag(DS.getTypeSpecTypeLoc(), diag::err_type_unsupported)
 << "__bf16";
 Result = Context.BFloat16Ty;
Index: clang/lib/AST/ASTContext.cpp
===
--- clang/lib/AST/ASTContext.cpp
+++ clang/lib/AST/ASTContext.cpp
@@ -2171,9 +2171,14 @@
   Align = Target->getLongFractAlign();
   break;
 case BuiltinType::BFloat16:
+  // HIP does not currently support bf16, so in that case allow querying the
+  // auxiliary target.
   if (Target->hasBFloat16Type()) {
 Width = Target->getBFloat16Width();
 Align = Target->getBFloat16Align();
+  } else if (getLangOpts().HIP && AuxTarget->hasBFloat16Type()) {
+Width = AuxTarget->getBFloat16Width();
+Align = AuxTarget->getBFloat16Align();
   }
   break;
 case BuiltinType::Float16:
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listin

[PATCH] D138651: [CUDA][HIP] Don't diagnose use for __bf16

2022-11-25 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh updated this revision to Diff 477871.
Pierre-vh added a comment.

Fixing condition, adding new test case


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138651/new/

https://reviews.llvm.org/D138651

Files:
  clang/lib/AST/ASTContext.cpp
  clang/lib/Sema/SemaType.cpp
  clang/test/SemaCUDA/amdgpu-bf16.cu


Index: clang/test/SemaCUDA/amdgpu-bf16.cu
===
--- /dev/null
+++ clang/test/SemaCUDA/amdgpu-bf16.cu
@@ -0,0 +1,16 @@
+// RUN: %clang_cc1 -fsyntax-only -triple amdgcn-amd-amdhsa -aux-triple 
x86_64-unknown-linux-gnu -verify %s
+// expected-no-diagnostics
+
+// If AMDGPU is the main target and X86 the aux target, ensure we
+// don't complain about unsupported BF16 types in x86 code.
+
+#include "Inputs/cuda.h"
+
+__device__ void devicefn() {
+}
+
+__bf16 hostfn(__bf16 a) {
+  return a;
+}
+
+typedef __bf16 foo __attribute__((__vector_size__(16), __aligned__(16)));
\ No newline at end of file
Index: clang/lib/Sema/SemaType.cpp
===
--- clang/lib/Sema/SemaType.cpp
+++ clang/lib/Sema/SemaType.cpp
@@ -1518,7 +1518,9 @@
 break;
   case DeclSpec::TST_half:Result = Context.HalfTy; break;
   case DeclSpec::TST_BFloat16:
-if (!S.Context.getTargetInfo().hasBFloat16Type())
+// Likewise, CUDA host and device may have different __bf16 support.
+if (!S.Context.getTargetInfo().hasBFloat16Type() && !S.getLangOpts().CUDA 
&&
+!(S.getLangOpts().OpenMP && S.getLangOpts().OpenMPIsDevice))
   S.Diag(DS.getTypeSpecTypeLoc(), diag::err_type_unsupported)
 << "__bf16";
 Result = Context.BFloat16Ty;
Index: clang/lib/AST/ASTContext.cpp
===
--- clang/lib/AST/ASTContext.cpp
+++ clang/lib/AST/ASTContext.cpp
@@ -2171,9 +2171,13 @@
   Align = Target->getLongFractAlign();
   break;
 case BuiltinType::BFloat16:
-  if (Target->hasBFloat16Type()) {
+  if (Target->hasBFloat16Type() &&
+  (!getLangOpts().OpenMP || !getLangOpts().OpenMPIsDevice)) {
 Width = Target->getBFloat16Width();
 Align = Target->getBFloat16Align();
+  } else if (AuxTarget && AuxTarget->hasBFloat16Type()) {
+Width = AuxTarget->getBFloat16Width();
+Align = AuxTarget->getBFloat16Align();
   }
   break;
 case BuiltinType::Float16:


Index: clang/test/SemaCUDA/amdgpu-bf16.cu
===
--- /dev/null
+++ clang/test/SemaCUDA/amdgpu-bf16.cu
@@ -0,0 +1,16 @@
+// RUN: %clang_cc1 -fsyntax-only -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu -verify %s
+// expected-no-diagnostics
+
+// If AMDGPU is the main target and X86 the aux target, ensure we
+// don't complain about unsupported BF16 types in x86 code.
+
+#include "Inputs/cuda.h"
+
+__device__ void devicefn() {
+}
+
+__bf16 hostfn(__bf16 a) {
+  return a;
+}
+
+typedef __bf16 foo __attribute__((__vector_size__(16), __aligned__(16)));
\ No newline at end of file
Index: clang/lib/Sema/SemaType.cpp
===
--- clang/lib/Sema/SemaType.cpp
+++ clang/lib/Sema/SemaType.cpp
@@ -1518,7 +1518,9 @@
 break;
   case DeclSpec::TST_half:Result = Context.HalfTy; break;
   case DeclSpec::TST_BFloat16:
-if (!S.Context.getTargetInfo().hasBFloat16Type())
+// Likewise, CUDA host and device may have different __bf16 support.
+if (!S.Context.getTargetInfo().hasBFloat16Type() && !S.getLangOpts().CUDA &&
+!(S.getLangOpts().OpenMP && S.getLangOpts().OpenMPIsDevice))
   S.Diag(DS.getTypeSpecTypeLoc(), diag::err_type_unsupported)
 << "__bf16";
 Result = Context.BFloat16Ty;
Index: clang/lib/AST/ASTContext.cpp
===
--- clang/lib/AST/ASTContext.cpp
+++ clang/lib/AST/ASTContext.cpp
@@ -2171,9 +2171,13 @@
   Align = Target->getLongFractAlign();
   break;
 case BuiltinType::BFloat16:
-  if (Target->hasBFloat16Type()) {
+  if (Target->hasBFloat16Type() &&
+  (!getLangOpts().OpenMP || !getLangOpts().OpenMPIsDevice)) {
 Width = Target->getBFloat16Width();
 Align = Target->getBFloat16Align();
+  } else if (AuxTarget && AuxTarget->hasBFloat16Type()) {
+Width = AuxTarget->getBFloat16Width();
+Align = AuxTarget->getBFloat16Align();
   }
   break;
 case BuiltinType::Float16:
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D138651: [CUDA][HIP] Don't diagnose use for __bf16

2022-11-24 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh updated this revision to Diff 477760.
Pierre-vh added a comment.

Not all targets have bf16 and AuxTarget may not be available all the time so I 
changed the condition slightly


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138651/new/

https://reviews.llvm.org/D138651

Files:
  clang/lib/AST/ASTContext.cpp
  clang/lib/Sema/SemaType.cpp
  clang/test/SemaCUDA/amdgpu-bf16.cu


Index: clang/test/SemaCUDA/amdgpu-bf16.cu
===
--- /dev/null
+++ clang/test/SemaCUDA/amdgpu-bf16.cu
@@ -0,0 +1,14 @@
+// RUN: %clang_cc1 -fsyntax-only -triple amdgcn-amd-amdhsa -aux-triple 
x86_64-unknown-linux-gnu -verify %s
+// expected-no-diagnostics
+
+// If AMDGPU is the main target and X86 the aux target, ensure we
+// don't complain about unsupported BF16 types in x86 code.
+
+#include "Inputs/cuda.h"
+
+__device__ void devicefn() {
+}
+
+__bf16 hostfn(__bf16 a) {
+  return a;
+}
Index: clang/lib/Sema/SemaType.cpp
===
--- clang/lib/Sema/SemaType.cpp
+++ clang/lib/Sema/SemaType.cpp
@@ -1518,7 +1518,9 @@
 break;
   case DeclSpec::TST_half:Result = Context.HalfTy; break;
   case DeclSpec::TST_BFloat16:
-if (!S.Context.getTargetInfo().hasBFloat16Type())
+// Likewise, CUDA host and device may have different __bf16 support.
+if (!S.Context.getTargetInfo().hasBFloat16Type() && !S.getLangOpts().CUDA 
&&
+!(S.getLangOpts().OpenMP && S.getLangOpts().OpenMPIsDevice))
   S.Diag(DS.getTypeSpecTypeLoc(), diag::err_type_unsupported)
 << "__bf16";
 Result = Context.BFloat16Ty;
Index: clang/lib/AST/ASTContext.cpp
===
--- clang/lib/AST/ASTContext.cpp
+++ clang/lib/AST/ASTContext.cpp
@@ -2174,6 +2174,10 @@
   if (Target->hasBFloat16Type()) {
 Width = Target->getBFloat16Width();
 Align = Target->getBFloat16Align();
+  } else if (AuxTarget && AuxTarget->hasBFloat16Type() &&
+ (getLangOpts().OpenMP || getLangOpts().OpenMPIsDevice)) {
+Width = AuxTarget->getBFloat16Width();
+Align = AuxTarget->getBFloat16Align();
   }
   break;
 case BuiltinType::Float16:


Index: clang/test/SemaCUDA/amdgpu-bf16.cu
===
--- /dev/null
+++ clang/test/SemaCUDA/amdgpu-bf16.cu
@@ -0,0 +1,14 @@
+// RUN: %clang_cc1 -fsyntax-only -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu -verify %s
+// expected-no-diagnostics
+
+// If AMDGPU is the main target and X86 the aux target, ensure we
+// don't complain about unsupported BF16 types in x86 code.
+
+#include "Inputs/cuda.h"
+
+__device__ void devicefn() {
+}
+
+__bf16 hostfn(__bf16 a) {
+  return a;
+}
Index: clang/lib/Sema/SemaType.cpp
===
--- clang/lib/Sema/SemaType.cpp
+++ clang/lib/Sema/SemaType.cpp
@@ -1518,7 +1518,9 @@
 break;
   case DeclSpec::TST_half:Result = Context.HalfTy; break;
   case DeclSpec::TST_BFloat16:
-if (!S.Context.getTargetInfo().hasBFloat16Type())
+// Likewise, CUDA host and device may have different __bf16 support.
+if (!S.Context.getTargetInfo().hasBFloat16Type() && !S.getLangOpts().CUDA &&
+!(S.getLangOpts().OpenMP && S.getLangOpts().OpenMPIsDevice))
   S.Diag(DS.getTypeSpecTypeLoc(), diag::err_type_unsupported)
 << "__bf16";
 Result = Context.BFloat16Ty;
Index: clang/lib/AST/ASTContext.cpp
===
--- clang/lib/AST/ASTContext.cpp
+++ clang/lib/AST/ASTContext.cpp
@@ -2174,6 +2174,10 @@
   if (Target->hasBFloat16Type()) {
 Width = Target->getBFloat16Width();
 Align = Target->getBFloat16Align();
+  } else if (AuxTarget && AuxTarget->hasBFloat16Type() &&
+ (getLangOpts().OpenMP || getLangOpts().OpenMPIsDevice)) {
+Width = AuxTarget->getBFloat16Width();
+Align = AuxTarget->getBFloat16Align();
   }
   break;
 case BuiltinType::Float16:
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D138651: [CUDA][HIP] Don't diagnose use for __bf16

2022-11-24 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh planned changes to this revision.
Pierre-vh added a comment.

Need to fix a test crash


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138651/new/

https://reviews.llvm.org/D138651

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D138651: [CUDA][HIP] Don't diagnose use for __bf16

2022-11-24 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh updated this revision to Diff 477734.
Pierre-vh added a comment.

Add newline at end of file


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138651/new/

https://reviews.llvm.org/D138651

Files:
  clang/lib/AST/ASTContext.cpp
  clang/lib/Sema/SemaType.cpp
  clang/test/SemaCUDA/amdgpu-bf16.cu


Index: clang/test/SemaCUDA/amdgpu-bf16.cu
===
--- /dev/null
+++ clang/test/SemaCUDA/amdgpu-bf16.cu
@@ -0,0 +1,14 @@
+// RUN: %clang_cc1 -fsyntax-only -triple amdgcn-amd-amdhsa -aux-triple 
x86_64-unknown-linux-gnu -verify %s
+// expected-no-diagnostics
+
+// If AMDGPU is the main target and X86 the aux target, ensure we
+// don't complain about unsupported BF16 types in x86 code.
+
+#include "Inputs/cuda.h"
+
+__device__ void devicefn() {
+}
+
+__bf16 hostfn(__bf16 a) {
+  return a;
+}
Index: clang/lib/Sema/SemaType.cpp
===
--- clang/lib/Sema/SemaType.cpp
+++ clang/lib/Sema/SemaType.cpp
@@ -1518,7 +1518,9 @@
 break;
   case DeclSpec::TST_half:Result = Context.HalfTy; break;
   case DeclSpec::TST_BFloat16:
-if (!S.Context.getTargetInfo().hasBFloat16Type())
+// Likewise, CUDA host and device may have different __bf16 support.
+if (!S.Context.getTargetInfo().hasBFloat16Type() && !S.getLangOpts().CUDA 
&&
+!(S.getLangOpts().OpenMP && S.getLangOpts().OpenMPIsDevice))
   S.Diag(DS.getTypeSpecTypeLoc(), diag::err_type_unsupported)
 << "__bf16";
 Result = Context.BFloat16Ty;
Index: clang/lib/AST/ASTContext.cpp
===
--- clang/lib/AST/ASTContext.cpp
+++ clang/lib/AST/ASTContext.cpp
@@ -2171,9 +2171,15 @@
   Align = Target->getLongFractAlign();
   break;
 case BuiltinType::BFloat16:
-  if (Target->hasBFloat16Type()) {
+  if (Target->hasBFloat16Type() || !getLangOpts().OpenMP ||
+  !getLangOpts().OpenMPIsDevice) {
 Width = Target->getBFloat16Width();
 Align = Target->getBFloat16Align();
+  } else {
+assert(getLangOpts().OpenMP && getLangOpts().OpenMPIsDevice &&
+   "Expected OpenMP device compilation.");
+Width = AuxTarget->getBFloat16Width();
+Align = AuxTarget->getBFloat16Align();
   }
   break;
 case BuiltinType::Float16:


Index: clang/test/SemaCUDA/amdgpu-bf16.cu
===
--- /dev/null
+++ clang/test/SemaCUDA/amdgpu-bf16.cu
@@ -0,0 +1,14 @@
+// RUN: %clang_cc1 -fsyntax-only -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu -verify %s
+// expected-no-diagnostics
+
+// If AMDGPU is the main target and X86 the aux target, ensure we
+// don't complain about unsupported BF16 types in x86 code.
+
+#include "Inputs/cuda.h"
+
+__device__ void devicefn() {
+}
+
+__bf16 hostfn(__bf16 a) {
+  return a;
+}
Index: clang/lib/Sema/SemaType.cpp
===
--- clang/lib/Sema/SemaType.cpp
+++ clang/lib/Sema/SemaType.cpp
@@ -1518,7 +1518,9 @@
 break;
   case DeclSpec::TST_half:Result = Context.HalfTy; break;
   case DeclSpec::TST_BFloat16:
-if (!S.Context.getTargetInfo().hasBFloat16Type())
+// Likewise, CUDA host and device may have different __bf16 support.
+if (!S.Context.getTargetInfo().hasBFloat16Type() && !S.getLangOpts().CUDA &&
+!(S.getLangOpts().OpenMP && S.getLangOpts().OpenMPIsDevice))
   S.Diag(DS.getTypeSpecTypeLoc(), diag::err_type_unsupported)
 << "__bf16";
 Result = Context.BFloat16Ty;
Index: clang/lib/AST/ASTContext.cpp
===
--- clang/lib/AST/ASTContext.cpp
+++ clang/lib/AST/ASTContext.cpp
@@ -2171,9 +2171,15 @@
   Align = Target->getLongFractAlign();
   break;
 case BuiltinType::BFloat16:
-  if (Target->hasBFloat16Type()) {
+  if (Target->hasBFloat16Type() || !getLangOpts().OpenMP ||
+  !getLangOpts().OpenMPIsDevice) {
 Width = Target->getBFloat16Width();
 Align = Target->getBFloat16Align();
+  } else {
+assert(getLangOpts().OpenMP && getLangOpts().OpenMPIsDevice &&
+   "Expected OpenMP device compilation.");
+Width = AuxTarget->getBFloat16Width();
+Align = AuxTarget->getBFloat16Align();
   }
   break;
 case BuiltinType::Float16:
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D138651: [CUDA][HIP] Don't diagnose use for __bf16

2022-11-24 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh created this revision.
Pierre-vh added reviewers: arsenm, rjmccall, tra.
Herald added subscribers: kosarev, mattd, kerbowa, pengfei, tpr, yaxunl, 
jvesely.
Herald added a project: All.
Pierre-vh requested review of this revision.
Herald added subscribers: cfe-commits, sstefan1, wdng.
Herald added a reviewer: jdoerfert.
Herald added a project: clang.

e0fb01e97b6b7d2fe66b17b36eeb98aa78c6e3bb 
 caused 
issues in some of our HIP projects. Builds were failing because "__bf16" wasn't 
allowed on the target. This is because in those cases, the main target is 
AMDGPU (which doesn't have bf16), and the aux target is X86 (which has bf16).

This implements a fix similar to D57369  but 
for bf16 which prevents Clang from diagnosing uses of bf16 when compiling 
heterogenous applications.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D138651

Files:
  clang/lib/AST/ASTContext.cpp
  clang/lib/Sema/SemaType.cpp
  clang/test/SemaCUDA/amdgpu-bf16.cu


Index: clang/test/SemaCUDA/amdgpu-bf16.cu
===
--- /dev/null
+++ clang/test/SemaCUDA/amdgpu-bf16.cu
@@ -0,0 +1,14 @@
+// RUN: %clang_cc1 -fsyntax-only -triple amdgcn-amd-amdhsa -aux-triple 
x86_64-unknown-linux-gnu -verify %s
+// expected-no-diagnostics
+
+// If AMDGPU is the main target and X86 the aux target, ensure we
+// don't complain about unsupported BF16 types in x86 code.
+
+#include "Inputs/cuda.h"
+
+__device__ void devicefn() {
+}
+
+__bf16 hostfn(__bf16 a) {
+  return a;
+}
\ No newline at end of file
Index: clang/lib/Sema/SemaType.cpp
===
--- clang/lib/Sema/SemaType.cpp
+++ clang/lib/Sema/SemaType.cpp
@@ -1518,7 +1518,9 @@
 break;
   case DeclSpec::TST_half:Result = Context.HalfTy; break;
   case DeclSpec::TST_BFloat16:
-if (!S.Context.getTargetInfo().hasBFloat16Type())
+// Likewise, CUDA host and device may have different __bf16 support.
+if (!S.Context.getTargetInfo().hasBFloat16Type() && !S.getLangOpts().CUDA 
&&
+!(S.getLangOpts().OpenMP && S.getLangOpts().OpenMPIsDevice))
   S.Diag(DS.getTypeSpecTypeLoc(), diag::err_type_unsupported)
 << "__bf16";
 Result = Context.BFloat16Ty;
Index: clang/lib/AST/ASTContext.cpp
===
--- clang/lib/AST/ASTContext.cpp
+++ clang/lib/AST/ASTContext.cpp
@@ -2171,9 +2171,15 @@
   Align = Target->getLongFractAlign();
   break;
 case BuiltinType::BFloat16:
-  if (Target->hasBFloat16Type()) {
+  if (Target->hasBFloat16Type() || !getLangOpts().OpenMP ||
+  !getLangOpts().OpenMPIsDevice) {
 Width = Target->getBFloat16Width();
 Align = Target->getBFloat16Align();
+  } else {
+assert(getLangOpts().OpenMP && getLangOpts().OpenMPIsDevice &&
+   "Expected OpenMP device compilation.");
+Width = AuxTarget->getBFloat16Width();
+Align = AuxTarget->getBFloat16Align();
   }
   break;
 case BuiltinType::Float16:


Index: clang/test/SemaCUDA/amdgpu-bf16.cu
===
--- /dev/null
+++ clang/test/SemaCUDA/amdgpu-bf16.cu
@@ -0,0 +1,14 @@
+// RUN: %clang_cc1 -fsyntax-only -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu -verify %s
+// expected-no-diagnostics
+
+// If AMDGPU is the main target and X86 the aux target, ensure we
+// don't complain about unsupported BF16 types in x86 code.
+
+#include "Inputs/cuda.h"
+
+__device__ void devicefn() {
+}
+
+__bf16 hostfn(__bf16 a) {
+  return a;
+}
\ No newline at end of file
Index: clang/lib/Sema/SemaType.cpp
===
--- clang/lib/Sema/SemaType.cpp
+++ clang/lib/Sema/SemaType.cpp
@@ -1518,7 +1518,9 @@
 break;
   case DeclSpec::TST_half:Result = Context.HalfTy; break;
   case DeclSpec::TST_BFloat16:
-if (!S.Context.getTargetInfo().hasBFloat16Type())
+// Likewise, CUDA host and device may have different __bf16 support.
+if (!S.Context.getTargetInfo().hasBFloat16Type() && !S.getLangOpts().CUDA &&
+!(S.getLangOpts().OpenMP && S.getLangOpts().OpenMPIsDevice))
   S.Diag(DS.getTypeSpecTypeLoc(), diag::err_type_unsupported)
 << "__bf16";
 Result = Context.BFloat16Ty;
Index: clang/lib/AST/ASTContext.cpp
===
--- clang/lib/AST/ASTContext.cpp
+++ clang/lib/AST/ASTContext.cpp
@@ -2171,9 +2171,15 @@
   Align = Target->getLongFractAlign();
   break;
 case BuiltinType::BFloat16:
-  if (Target->hasBFloat16Type()) {
+  if (Target->hasBFloat16Type() || !getLangOpts().OpenMP ||
+  !getLangOpts().OpenMPIsDevice) {
 Width = Target->getBFloat16Width();
 Align = Target->getBFloat16Align();
+  } 

[PATCH] D137251: [clang][cuda/hip] Allow `__noinline__` lambdas

2022-11-04 Thread Pierre van Houtryve via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Pierre-vh marked 3 inline comments as done.
Closed by commit rGc05f1639f7f4: [clang][cuda/hip] Allow `__noinline__`  
lambdas (authored by Pierre-vh).

Changed prior to commit:
  https://reviews.llvm.org/D137251?vs=472875&id=473145#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137251/new/

https://reviews.llvm.org/D137251

Files:
  clang/docs/ReleaseNotes.rst
  clang/lib/Parse/ParseExprCXX.cpp
  clang/test/CodeGenCUDA/lambda-noinline.cu
  clang/test/Parser/lambda-attr.cu


Index: clang/test/Parser/lambda-attr.cu
===
--- clang/test/Parser/lambda-attr.cu
+++ clang/test/Parser/lambda-attr.cu
@@ -18,6 +18,10 @@
   ([&](int) __attribute__((device)){ device_fn(); })(0);
   // expected-warning@-1 {{nvcc does not allow '__device__' to appear after 
the parameter list in lambdas}}
   ([&] __attribute__((device)) (int) { device_fn(); })(0);
+
+  // test that noinline can appear anywhere.
+  ([&] __attribute__((device)) __noinline__ () { device_fn(); })();
+  ([&] __noinline__ __attribute__((device)) () { device_fn(); })();
 }
 
 __attribute__((host)) __attribute__((device)) void host_device_attrs() {
@@ -37,6 +41,11 @@
   // expected-warning@-1 {{nvcc does not allow '__host__' to appear after the 
parameter list in lambdas}}
   // expected-warning@-2 {{nvcc does not allow '__device__' to appear after 
the parameter list in lambdas}}
   ([&] __attribute__((host)) __attribute__((device)) (int) { hd_fn(); })(0);
+
+  // test that noinline can also appear anywhere.
+  ([] __attribute__((host)) __attribute__((device)) () { hd_fn(); })();
+  ([] __attribute__((host)) __noinline__ __attribute__((device)) () { hd_fn(); 
})();
+  ([] __attribute__((host)) __attribute__((device)) __noinline__ () { hd_fn(); 
})();
 }
 
 // TODO: Add tests for __attribute__((global)) once we support global lambdas.
Index: clang/test/CodeGenCUDA/lambda-noinline.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDA/lambda-noinline.cu
@@ -0,0 +1,23 @@
+// RUN: %clang_cc1 -no-opaque-pointers -x hip -emit-llvm -std=c++11 %s -o - \
+// RUN:   -triple x86_64-linux-gnu \
+// RUN:   | FileCheck -check-prefix=HOST %s
+// RUN: %clang_cc1 -no-opaque-pointers -x hip -emit-llvm -std=c++11 %s -o - \
+// RUN:   -triple amdgcn-amd-amdhsa -fcuda-is-device \
+// RUN:   | FileCheck -check-prefix=DEV %s
+
+#include "Inputs/cuda.h"
+
+// Checks noinline is correctly added to the lambda function.
+
+// HOST: define{{.*}}@_ZZ4HostvENKUlvE_clEv({{.*}}) #[[ATTR:[0-9]+]]
+// HOST: attributes #[[ATTR]]{{.*}}noinline
+
+// DEV: define{{.*}}@_ZZ6DevicevENKUlvE_clEv({{.*}}) #[[ATTR:[0-9]+]]
+// DEV: attributes #[[ATTR]]{{.*}}noinline
+
+__device__ int a;
+int b;
+
+__device__ int Device() { return ([&] __device__ __noinline__ (){ return a; 
})(); }
+
+__host__ int Host() { return ([&] __host__ __noinline__ (){ return b; })(); }
Index: clang/lib/Parse/ParseExprCXX.cpp
===
--- clang/lib/Parse/ParseExprCXX.cpp
+++ clang/lib/Parse/ParseExprCXX.cpp
@@ -1291,7 +1291,22 @@
   if (getLangOpts().CUDA) {
 // In CUDA code, GNU attributes are allowed to appear immediately after the
 // "[...]", even if there is no "(...)" before the lambda body.
-MaybeParseGNUAttributes(D);
+//
+// Note that we support __noinline__ as a keyword in this mode and thus
+// it has to be separately handled.
+while (true) {
+  if (Tok.is(tok::kw___noinline__)) {
+IdentifierInfo *AttrName = Tok.getIdentifierInfo();
+SourceLocation AttrNameLoc = ConsumeToken();
+Attr.addNew(AttrName, AttrNameLoc, nullptr, AttrNameLoc, nullptr, 0,
+ParsedAttr::AS_Keyword);
+  } else if (Tok.is(tok::kw___attribute))
+ParseGNUAttributes(Attr, nullptr, &D);
+  else
+break;
+}
+
+D.takeAttributes(Attr);
   }
 
   // Helper to emit a warning if we see a CUDA host/device/global attribute
Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -638,6 +638,9 @@
 CUDA/HIP Language Changes in Clang
 --
 
+ - Allow the use of ``__noinline__`` as a keyword (instead of 
``__attribute__((noinline))``)
+   in lambda declarations.
+
 Objective-C Language Changes in Clang
 -
 


Index: clang/test/Parser/lambda-attr.cu
===
--- clang/test/Parser/lambda-attr.cu
+++ clang/test/Parser/lambda-attr.cu
@@ -18,6 +18,10 @@
   ([&](int) __attribute__((device)){ device_fn(); })(0);
   // expected-warning@-1 {{nvcc does not allow '__device__' to appear after the parameter list in lambdas}}
   ([&] __attri

[PATCH] D137251: [clang][cuda/hip] Allow `__noinline__` lambdas

2022-11-03 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added inline comments.



Comment at: clang/lib/Parse/ParseExprCXX.cpp:1300
+ParseGNUAttributes(Attr, nullptr, &D);
+  } else if (Tok.is(tok::kw___noinline__)) {
+IdentifierInfo *AttrName = Tok.getIdentifierInfo();

aaron.ballman wrote:
> Any other keyword attributes that are missing?
> 
> `alignas`/`_Alignas`
> `__forceinline`
> `__cdecl`/`__stdcall`/etc
> 
I'm not too familiar with how those attributes work yet, so maybe there's more 
to handle but I don't have any concrete example of it being an issue and would 
rather not touch those unless needed


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137251/new/

https://reviews.llvm.org/D137251

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D137251: [clang][cuda/hip] Allow `__noinline__` lambdas

2022-11-03 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh updated this revision to Diff 472875.
Pierre-vh marked 2 inline comments as done.
Pierre-vh added a comment.

Comments

Not sure if the release note is in the right place though.
As for the test, I did something quite targeted/minimal, hope it's fine?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137251/new/

https://reviews.llvm.org/D137251

Files:
  clang/docs/ReleaseNotes.rst
  clang/lib/Parse/ParseExprCXX.cpp
  clang/test/CodeGenCUDA/lambda-noinline.cu
  clang/test/Parser/lambda-attr.cu

Index: clang/test/Parser/lambda-attr.cu
===
--- clang/test/Parser/lambda-attr.cu
+++ clang/test/Parser/lambda-attr.cu
@@ -18,6 +18,10 @@
   ([&](int) __attribute__((device)){ device_fn(); })(0);
   // expected-warning@-1 {{nvcc does not allow '__device__' to appear after the parameter list in lambdas}}
   ([&] __attribute__((device)) (int) { device_fn(); })(0);
+
+  // test that noinline can appear anywhere.
+  ([&] __attribute__((device)) __noinline__ () { device_fn(); })();
+  ([&] __noinline__ __attribute__((device)) () { device_fn(); })();
 }
 
 __attribute__((host)) __attribute__((device)) void host_device_attrs() {
@@ -37,6 +41,11 @@
   // expected-warning@-1 {{nvcc does not allow '__host__' to appear after the parameter list in lambdas}}
   // expected-warning@-2 {{nvcc does not allow '__device__' to appear after the parameter list in lambdas}}
   ([&] __attribute__((host)) __attribute__((device)) (int) { hd_fn(); })(0);
+
+  // test that noinline can also appear anywhere.
+  ([] __attribute__((host)) __attribute__((device)) () { hd_fn(); })();
+  ([] __attribute__((host)) __noinline__ __attribute__((device)) () { hd_fn(); })();
+  ([] __attribute__((host)) __attribute__((device)) __noinline__ () { hd_fn(); })();
 }
 
 // TODO: Add tests for __attribute__((global)) once we support global lambdas.
Index: clang/test/CodeGenCUDA/lambda-noinline.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDA/lambda-noinline.cu
@@ -0,0 +1,23 @@
+// RUN: %clang_cc1 -no-opaque-pointers -x hip -emit-llvm -std=c++11 %s -o - \
+// RUN:   -triple x86_64-linux-gnu \
+// RUN:   | FileCheck -check-prefix=HOST %s
+// RUN: %clang_cc1 -no-opaque-pointers -x hip -emit-llvm -std=c++11 %s -o - \
+// RUN:   -triple amdgcn-amd-amdhsa -fcuda-is-device \
+// RUN:   | FileCheck -check-prefix=DEV %s
+
+#include "Inputs/cuda.h"
+
+// Checks noinline is correctly added to the lambda function.
+
+// HOST: define{{.*}}@_ZZ4HostvENKUlvE_clEv({{.*}}) #[[ATTR:[0-9]+]]
+// HOST: attributes #[[ATTR]]{{.*}}noinline
+
+// DEV: define{{.*}}@_ZZ6DevicevENKUlvE_clEv({{.*}}) #[[ATTR:[0-9]+]]
+// DEV: attributes #[[ATTR]]{{.*}}noinline
+
+__device__ int a;
+int b;
+
+__device__ int Device() { return ([&] __device__ __noinline__ (){ return a; })(); }
+
+__host__ int Host() { return ([&] __host__ __noinline__ (){ return b; })(); }
Index: clang/lib/Parse/ParseExprCXX.cpp
===
--- clang/lib/Parse/ParseExprCXX.cpp
+++ clang/lib/Parse/ParseExprCXX.cpp
@@ -1291,7 +1291,22 @@
   if (getLangOpts().CUDA) {
 // In CUDA code, GNU attributes are allowed to appear immediately after the
 // "[...]", even if there is no "(...)" before the lambda body.
-MaybeParseGNUAttributes(D);
+//
+// Note that we support __noinline__ as a keyword in this mode and thus
+// it has to be separately handled.
+while (true) {
+  if (Tok.is(tok::kw___noinline__)) {
+IdentifierInfo *AttrName = Tok.getIdentifierInfo();
+SourceLocation AttrNameLoc = ConsumeToken();
+Attr.addNew(AttrName, AttrNameLoc, nullptr, AttrNameLoc, nullptr, 0,
+ParsedAttr::AS_Keyword);
+  } else if (Tok.is(tok::kw___attribute))
+ParseGNUAttributes(Attr, nullptr, &D);
+  else
+break;
+}
+
+D.takeAttributes(Attr);
   }
 
   // Helper to emit a warning if we see a CUDA host/device/global attribute
Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -615,6 +615,9 @@
 CUDA/HIP Language Changes in Clang
 --
 
+ - Allow the use of `__noinline__` as a keyword (instead of `__attribute__((noinline))`)
+   in lambda declarations.
+
 Objective-C Language Changes in Clang
 -
 
@@ -751,8 +754,8 @@
 - Introduced the new function ``clang_CXXMethod_isCopyAssignmentOperator``,
   which identifies whether a method cursor is a copy-assignment
   operator.
-- ``clang_Cursor_getNumTemplateArguments``, ``clang_Cursor_getTemplateArgumentKind``, 
-  ``clang_Cursor_getTemplateArgumentType``, ``clang_Cursor_getTemplateArgumentValue`` and 
+- ``clang_Cursor_getNumTemplateArguments``, ``clang_Cursor_getTemplateArgumentKind``,
+ 

[PATCH] D137251: [clang][cuda/hip] Allow `__noinline__` lambdas

2022-11-02 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh created this revision.
Pierre-vh added reviewers: yaxunl, tra, aaron.ballman, rsmith.
Herald added a subscriber: mattd.
Herald added a project: All.
Pierre-vh requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

D124866  seem to have had an unintended side 
effect: __noinline__ on lambdas was no longer accepted.

This fixes the regression and adds a test case for it.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D137251

Files:
  clang/lib/Parse/ParseExprCXX.cpp
  clang/test/Parser/lambda-attr.cu


Index: clang/test/Parser/lambda-attr.cu
===
--- clang/test/Parser/lambda-attr.cu
+++ clang/test/Parser/lambda-attr.cu
@@ -18,6 +18,10 @@
   ([&](int) __attribute__((device)){ device_fn(); })(0);
   // expected-warning@-1 {{nvcc does not allow '__device__' to appear after 
the parameter list in lambdas}}
   ([&] __attribute__((device)) (int) { device_fn(); })(0);
+
+  // test that noinline can appear anywhere.
+  ([&] __attribute__((device)) __noinline__ () { device_fn(); })();
+  ([&] __noinline__ __attribute__((device)) () { device_fn(); })();
 }
 
 __attribute__((host)) __attribute__((device)) void host_device_attrs() {
@@ -37,6 +41,11 @@
   // expected-warning@-1 {{nvcc does not allow '__host__' to appear after the 
parameter list in lambdas}}
   // expected-warning@-2 {{nvcc does not allow '__device__' to appear after 
the parameter list in lambdas}}
   ([&] __attribute__((host)) __attribute__((device)) (int) { hd_fn(); })(0);
+
+  // test that noinline can also appear anywhere.
+  ([] __attribute__((host)) __attribute__((device)) () { hd_fn(); })();
+  ([] __attribute__((host)) __noinline__ __attribute__((device)) () { hd_fn(); 
})();
+  ([] __attribute__((host)) __attribute__((device)) __noinline__ () { hd_fn(); 
})();
 }
 
 // TODO: Add tests for __attribute__((global)) once we support global lambdas.
Index: clang/lib/Parse/ParseExprCXX.cpp
===
--- clang/lib/Parse/ParseExprCXX.cpp
+++ clang/lib/Parse/ParseExprCXX.cpp
@@ -1291,7 +1291,23 @@
   if (getLangOpts().CUDA) {
 // In CUDA code, GNU attributes are allowed to appear immediately after the
 // "[...]", even if there is no "(...)" before the lambda body.
-MaybeParseGNUAttributes(D);
+//
+// Note that we support __noinline__ as a keyword in this mode and thus
+// it has to be separately handled.
+while (true) {
+  if (Tok.is(tok::kw___attribute)) {
+ParseGNUAttributes(Attr, nullptr, &D);
+  } else if (Tok.is(tok::kw___noinline__)) {
+IdentifierInfo *AttrName = Tok.getIdentifierInfo();
+SourceLocation AttrNameLoc = ConsumeToken();
+Attr.addNew(AttrName, AttrNameLoc, nullptr, AttrNameLoc, nullptr, 0,
+ParsedAttr::AS_Keyword);
+  } else {
+break;
+  }
+}
+
+D.takeAttributes(Attr);
   }
 
   // Helper to emit a warning if we see a CUDA host/device/global attribute


Index: clang/test/Parser/lambda-attr.cu
===
--- clang/test/Parser/lambda-attr.cu
+++ clang/test/Parser/lambda-attr.cu
@@ -18,6 +18,10 @@
   ([&](int) __attribute__((device)){ device_fn(); })(0);
   // expected-warning@-1 {{nvcc does not allow '__device__' to appear after the parameter list in lambdas}}
   ([&] __attribute__((device)) (int) { device_fn(); })(0);
+
+  // test that noinline can appear anywhere.
+  ([&] __attribute__((device)) __noinline__ () { device_fn(); })();
+  ([&] __noinline__ __attribute__((device)) () { device_fn(); })();
 }
 
 __attribute__((host)) __attribute__((device)) void host_device_attrs() {
@@ -37,6 +41,11 @@
   // expected-warning@-1 {{nvcc does not allow '__host__' to appear after the parameter list in lambdas}}
   // expected-warning@-2 {{nvcc does not allow '__device__' to appear after the parameter list in lambdas}}
   ([&] __attribute__((host)) __attribute__((device)) (int) { hd_fn(); })(0);
+
+  // test that noinline can also appear anywhere.
+  ([] __attribute__((host)) __attribute__((device)) () { hd_fn(); })();
+  ([] __attribute__((host)) __noinline__ __attribute__((device)) () { hd_fn(); })();
+  ([] __attribute__((host)) __attribute__((device)) __noinline__ () { hd_fn(); })();
 }
 
 // TODO: Add tests for __attribute__((global)) once we support global lambdas.
Index: clang/lib/Parse/ParseExprCXX.cpp
===
--- clang/lib/Parse/ParseExprCXX.cpp
+++ clang/lib/Parse/ParseExprCXX.cpp
@@ -1291,7 +1291,23 @@
   if (getLangOpts().CUDA) {
 // In CUDA code, GNU attributes are allowed to appear immediately after the
 // "[...]", even if there is no "(...)" before the lambda body.
-MaybeParseGNUAttributes(D);
+//
+// Note that we support __noinline__ as a keyword in this

[PATCH] D55226: [Fix][StaticAnalyzer] Bug 39792 - False positive on strcpy targeting struct member

2019-01-14 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added a comment.

Hello!

I'm trying one last ping since it's been a month and it hasn't been commited (I 
think).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55226/new/

https://reviews.llvm.org/D55226



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D55226: [Fix][StaticAnalyzer] Bug 39792 - False positive on strcpy targeting struct member

2018-12-11 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added a comment.

Hello!

I'm pinging since it's been a week. If someone can commit this patch on my 
behalf, that would be great.

Thank you :)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55226/new/

https://reviews.llvm.org/D55226



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D55226: [Fix][StaticAnalyzer] Bug 39792 - False positive on strcpy targeting struct member

2018-12-05 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh marked an inline comment as done.
Pierre-vh added a comment.

Hi again!

As I'm quite new to this, I don't know what the next step is. Do we need to 
wait for more people to review this diff?
What happens when it's considered "ready"? How is it committed? (I don't have 
commit access)

Thank you for your help!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55226/new/

https://reviews.llvm.org/D55226



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D55226: [Fix][StaticAnalyzer] Bug 39792 - False positive on strcpy targeting struct member

2018-12-04 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh updated this revision to Diff 176676.
Pierre-vh added a comment.

Here's the diff without the extra newline :)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55226/new/

https://reviews.llvm.org/D55226

Files:
  lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
  test/Analysis/security-syntax-checks.m


Index: test/Analysis/security-syntax-checks.m
===
--- test/Analysis/security-syntax-checks.m
+++ test/Analysis/security-syntax-checks.m
@@ -177,6 +177,11 @@
   strcpy(x, "abcd");
 }
 
+void test_strcpy_safe_2() {
+  struct {char s1[100];} s;
+  strcpy(s.s1, "hello");
+}
+
 //===--===
 // strcat()
 //===--===
Index: lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
===
--- lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
+++ lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
@@ -651,13 +651,12 @@
 
   const auto *Target = CE->getArg(0)->IgnoreImpCasts(),
  *Source = CE->getArg(1)->IgnoreImpCasts();
-  if (const auto *DeclRef = dyn_cast(Target))
-if (const auto *Array = dyn_cast(DeclRef->getType())) {
-  uint64_t ArraySize = BR.getContext().getTypeSize(Array) / 8;
-  if (const auto *String = dyn_cast(Source)) {
-if (ArraySize >= String->getLength() + 1)
-  return;
-  }
+
+  if (const auto *Array = dyn_cast(Target->getType())) {
+uint64_t ArraySize = BR.getContext().getTypeSize(Array) / 8;
+if (const auto *String = dyn_cast(Source)) {
+  if (ArraySize >= String->getLength() + 1)
+return;
 }
 
   // Issue a warning.


Index: test/Analysis/security-syntax-checks.m
===
--- test/Analysis/security-syntax-checks.m
+++ test/Analysis/security-syntax-checks.m
@@ -177,6 +177,11 @@
   strcpy(x, "abcd");
 }
 
+void test_strcpy_safe_2() {
+  struct {char s1[100];} s;
+  strcpy(s.s1, "hello");
+}
+
 //===--===
 // strcat()
 //===--===
Index: lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
===
--- lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
+++ lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
@@ -651,13 +651,12 @@
 
   const auto *Target = CE->getArg(0)->IgnoreImpCasts(),
  *Source = CE->getArg(1)->IgnoreImpCasts();
-  if (const auto *DeclRef = dyn_cast(Target))
-if (const auto *Array = dyn_cast(DeclRef->getType())) {
-  uint64_t ArraySize = BR.getContext().getTypeSize(Array) / 8;
-  if (const auto *String = dyn_cast(Source)) {
-if (ArraySize >= String->getLength() + 1)
-  return;
-  }
+
+  if (const auto *Array = dyn_cast(Target->getType())) {
+uint64_t ArraySize = BR.getContext().getTypeSize(Array) / 8;
+if (const auto *String = dyn_cast(Source)) {
+  if (ArraySize >= String->getLength() + 1)
+return;
 }
 
   // Issue a warning.
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D55226: [Fix][StaticAnalyzer] Bug 39792 - False positive on strcpy targeting struct member

2018-12-04 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh updated this revision to Diff 176661.
Pierre-vh added a comment.

Hello again! I updated the diff and completely removed the outer if.  Please 
let me know what you think!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55226/new/

https://reviews.llvm.org/D55226

Files:
  lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
  test/Analysis/security-syntax-checks.m


Index: test/Analysis/security-syntax-checks.m
===
--- test/Analysis/security-syntax-checks.m
+++ test/Analysis/security-syntax-checks.m
@@ -177,6 +177,11 @@
   strcpy(x, "abcd");
 }
 
+void test_strcpy_safe_2() {
+  struct {char s1[100];} s;
+  strcpy(s.s1, "hello");
+}
+
 //===--===
 // strcat()
 //===--===
Index: lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
===
--- lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
+++ lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
@@ -651,15 +651,15 @@
 
   const auto *Target = CE->getArg(0)->IgnoreImpCasts(),
  *Source = CE->getArg(1)->IgnoreImpCasts();
-  if (const auto *DeclRef = dyn_cast(Target))
-if (const auto *Array = dyn_cast(DeclRef->getType())) {
-  uint64_t ArraySize = BR.getContext().getTypeSize(Array) / 8;
-  if (const auto *String = dyn_cast(Source)) {
-if (ArraySize >= String->getLength() + 1)
-  return;
-  }
+
+  if (const auto *Array = dyn_cast(Target->getType())) {
+uint64_t ArraySize = BR.getContext().getTypeSize(Array) / 8;
+if (const auto *String = dyn_cast(Source)) {
+  if (ArraySize >= String->getLength() + 1)
+return;
 }
 
+
   // Issue a warning.
   PathDiagnosticLocation CELoc =
 PathDiagnosticLocation::createBegin(CE, BR.getSourceManager(), AC);


Index: test/Analysis/security-syntax-checks.m
===
--- test/Analysis/security-syntax-checks.m
+++ test/Analysis/security-syntax-checks.m
@@ -177,6 +177,11 @@
   strcpy(x, "abcd");
 }
 
+void test_strcpy_safe_2() {
+  struct {char s1[100];} s;
+  strcpy(s.s1, "hello");
+}
+
 //===--===
 // strcat()
 //===--===
Index: lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
===
--- lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
+++ lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
@@ -651,15 +651,15 @@
 
   const auto *Target = CE->getArg(0)->IgnoreImpCasts(),
  *Source = CE->getArg(1)->IgnoreImpCasts();
-  if (const auto *DeclRef = dyn_cast(Target))
-if (const auto *Array = dyn_cast(DeclRef->getType())) {
-  uint64_t ArraySize = BR.getContext().getTypeSize(Array) / 8;
-  if (const auto *String = dyn_cast(Source)) {
-if (ArraySize >= String->getLength() + 1)
-  return;
-  }
+
+  if (const auto *Array = dyn_cast(Target->getType())) {
+uint64_t ArraySize = BR.getContext().getTypeSize(Array) / 8;
+if (const auto *String = dyn_cast(Source)) {
+  if (ArraySize >= String->getLength() + 1)
+return;
 }
 
+
   // Issue a warning.
   PathDiagnosticLocation CELoc =
 PathDiagnosticLocation::createBegin(CE, BR.getSourceManager(), AC);
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D55226: [Fix][StaticAnalyzer] Bug 39792 - False positive on strcpy targeting struct member

2018-12-03 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh added a comment.

In D55226#1317083 , @george.karpenkov 
wrote:

> Thank you for the fix, but how far can the pattern matching go? Seems easy 
> enough to think of cases not covered by the above.
>  In any case, the fix looks good.


Hey,

Sadly I'm not experienced enough to think of every case that should pass this 
check, so I limited myself to just fixing the bug.
Can't we totally remove the outer if so we allow every `Target` expression that 
has a `ConstantArrayType` to pass this check?

Thank you for your time!


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55226/new/

https://reviews.llvm.org/D55226



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D55226: [Fix][StaticAnalyzer] Bug 39792 - False positive on strcpy targeting struct member

2018-12-03 Thread Pierre van Houtryve via Phabricator via cfe-commits
Pierre-vh created this revision.
Pierre-vh added reviewers: dcoughlin, MaskRay.
Herald added subscribers: cfe-commits, dkrupp, donat.nagy, Szelethus, 
a.sidorin, szepet, baloghadamsoftware.
Herald added a reviewer: george.karpenkov.

Fix for the bug n°39792: False positive on strcpy targeting struct member
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=39792

I fixed it by replacing the use of `dyn_cast` by two `isa`s to check if 
`Target` is a `DeclRefExpr` or a `MemberExpr`.
The removal of the `DeclRef` variable seems to be meaningless because the only 
place where the `DeclRef` variable was used is one line below, and it was used 
to call a method which is inherited from Expr. 
Thus, replacing the only use of `DeclRef` by `Target` should have no effect.

I also added a small test for this bugfix in 
`test/Analysis/security-syntax-checks.m`

**Note:** I think we can completely remove the outer `if 
(isa(Target) || isa(Target))`, no? Why should we only 
allow `DeclRefExpr`s to pass this check?

**PS:** This is my first contribution ever to CLang (or any other open source 
project), so I'm totally open to feedback, even if it's harsh.

Thank you for your attention!


Repository:
  rC Clang

https://reviews.llvm.org/D55226

Files:
  lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
  test/Analysis/security-syntax-checks.m


Index: test/Analysis/security-syntax-checks.m
===
--- test/Analysis/security-syntax-checks.m
+++ test/Analysis/security-syntax-checks.m
@@ -177,6 +177,11 @@
   strcpy(x, "abcd");
 }
 
+void test_strcpy_safe_2() {
+  struct {char s1[100];} s;
+  strcpy(s.s1, "hello");
+}
+
 //===--===
 // strcat()
 //===--===
Index: lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
===
--- lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
+++ lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
@@ -651,8 +651,8 @@
 
   const auto *Target = CE->getArg(0)->IgnoreImpCasts(),
  *Source = CE->getArg(1)->IgnoreImpCasts();
-  if (const auto *DeclRef = dyn_cast(Target))
-if (const auto *Array = dyn_cast(DeclRef->getType())) {
+   if (isa(Target) || isa(Target))
+if (const auto *Array = dyn_cast(Target->getType())) {
   uint64_t ArraySize = BR.getContext().getTypeSize(Array) / 8;
   if (const auto *String = dyn_cast(Source)) {
 if (ArraySize >= String->getLength() + 1)


Index: test/Analysis/security-syntax-checks.m
===
--- test/Analysis/security-syntax-checks.m
+++ test/Analysis/security-syntax-checks.m
@@ -177,6 +177,11 @@
   strcpy(x, "abcd");
 }
 
+void test_strcpy_safe_2() {
+  struct {char s1[100];} s;
+  strcpy(s.s1, "hello");
+}
+
 //===--===
 // strcat()
 //===--===
Index: lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
===
--- lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
+++ lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
@@ -651,8 +651,8 @@
 
   const auto *Target = CE->getArg(0)->IgnoreImpCasts(),
  *Source = CE->getArg(1)->IgnoreImpCasts();
-  if (const auto *DeclRef = dyn_cast(Target))
-if (const auto *Array = dyn_cast(DeclRef->getType())) {
+	if (isa(Target) || isa(Target))
+if (const auto *Array = dyn_cast(Target->getType())) {
   uint64_t ArraySize = BR.getContext().getTypeSize(Array) / 8;
   if (const auto *String = dyn_cast(Source)) {
 if (ArraySize >= String->getLength() + 1)
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits