[clang] [clang] Improve diagnostics for constraints of inline asm (NFC) (PR #96363)

2024-06-24 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm commented:

It's really unfortunate to have to add all this asm handling to clang. Can't it 
rely on backend diagnostic remarks for this? 

https://github.com/llvm/llvm-project/pull/96363
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Improve diagnostics for constraints of inline asm (NFC) (PR #96363)

2024-06-24 Thread Matt Arsenault via cfe-commits


@@ -2626,14 +2629,20 @@ void CodeGenFunction::EmitAsmStmt(const AsmStmt ) {
   SmallVector OutputConstraintInfos;
   SmallVector InputConstraintInfos;
 
+  const FunctionDecl *FD = dyn_cast_or_null(CurCodeDecl);

arsenm wrote:

Where do you get dyn_cast_or_null is deprecated? I only remember a rejected RFC 
to start using dyn_cast_if_present

https://github.com/llvm/llvm-project/pull/96363
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)

2024-06-24 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> Kindly review only the top commit here

If you're going to repost with a pre-commit, it would be better to have all the 
pieces squashed into one. Also you could look into using graphite or SPR for 
managing dependent pull requests 

https://github.com/llvm/llvm-project/pull/96473
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] AMDGPU: Remove ds atomic fadd intrinsics (PR #95396)

2024-06-23 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/95396
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] AMDGPU: Start selecting buffer fat pointer atomicrmw fmin/fmax (PR #95593)

2024-06-23 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/95593
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (PR #95592)

2024-06-23 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/95592
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (PR #95592)

2024-06-23 Thread Matt Arsenault via cfe-commits

arsenm wrote:

### Merge activity

* **Jun 23, 4:06 AM EDT**: @arsenm started a stack merge that includes this 
pull request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/95592).


https://github.com/llvm/llvm-project/pull/95592
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-22 Thread Matt Arsenault via cfe-commits

arsenm wrote:

Incrementing by align is just a bug, of course the size is the real value. 
Whether we want to continue wasting space is another not-correctness discussion 

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-22 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> Here, because the minimum alignment is 4, we will only increment the
buffer by 4, 

It should be incrementing by the size? 4 byte aligned access of 8 byte type 
should work fine

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-06-22 Thread Matt Arsenault via cfe-commits


@@ -1671,6 +1671,7 @@ int main(int Argc, char **Argv) {
 NewArgv.push_back(Arg->getValue());
   for (const opt::Arg *Arg : Args.filtered(OPT_offload_opt_eq_minus))
 NewArgv.push_back(Args.MakeArgString(StringRef("-") + Arg->getValue()));
+  llvm::errs() << "asdfasdf\n";

arsenm wrote:

Leftover debug print 

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Replace `emitXXXBuiltin` with a unified interface (PR #96313)

2024-06-21 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/96313
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits


@@ -149,6 +149,12 @@ BUILTIN(__builtin_amdgcn_mqsad_pk_u16_u8, "WUiWUiUiWUi", 
"nc")
 BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc")
 
 BUILTIN(__builtin_amdgcn_make_buffer_rsrc, "Qbv*sii", "nc")
+BUILTIN(__builtin_amdgcn_raw_buffer_store_b8, "vcQbiiIi", "n")
+BUILTIN(__builtin_amdgcn_raw_buffer_store_b16, "vsQbiiIi", "n")
+BUILTIN(__builtin_amdgcn_raw_buffer_store_b32, "viQbiiIi", "n")
+BUILTIN(__builtin_amdgcn_raw_buffer_store_b64, "vWiQbiiIi", "n")
+BUILTIN(__builtin_amdgcn_raw_buffer_store_b96, "vV3iQbiiIi", "n")

arsenm wrote:

It's no more of a pain than any other bitcast 

https://github.com/llvm/llvm-project/pull/94576
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits


@@ -149,6 +149,12 @@ BUILTIN(__builtin_amdgcn_mqsad_pk_u16_u8, "WUiWUiUiWUi", 
"nc")
 BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc")
 
 BUILTIN(__builtin_amdgcn_make_buffer_rsrc, "Qbv*sii", "nc")
+BUILTIN(__builtin_amdgcn_raw_buffer_store_b8, "vcQbiiIi", "n")
+BUILTIN(__builtin_amdgcn_raw_buffer_store_b16, "vsQbiiIi", "n")
+BUILTIN(__builtin_amdgcn_raw_buffer_store_b32, "viQbiiIi", "n")
+BUILTIN(__builtin_amdgcn_raw_buffer_store_b64, "vWiQbiiIi", "n")
+BUILTIN(__builtin_amdgcn_raw_buffer_store_b96, "vV3iQbiiIi", "n")

arsenm wrote:

The b64/b96/b128 cases should canonically use v2i32/v3i32/v4i32 

https://github.com/llvm/llvm-project/pull/94576
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Replace `emitXXXBuiltin` with a unified interface (PR #96313)

2024-06-21 Thread Matt Arsenault via cfe-commits


@@ -581,49 +581,19 @@ static Value 
*emitCallMaybeConstrainedFPBuiltin(CodeGenFunction ,
 return CGF.Builder.CreateCall(F, Args);
 }
 
-// Emit a simple mangled intrinsic that has 1 argument and a return type
-// matching the argument type.
-static Value *emitUnaryBuiltin(CodeGenFunction , const CallExpr *E,
-   unsigned IntrinsicID,
-   llvm::StringRef Name = "") {
-  llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
-
-  Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Src0->getType());
-  return CGF.Builder.CreateCall(F, Src0, Name);
-}
-
-// Emit an intrinsic that has 2 operands of the same type as its result.
-static Value *emitBinaryBuiltin(CodeGenFunction ,
-const CallExpr *E,
-unsigned IntrinsicID) {
-  llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
-  llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
-
-  Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Src0->getType());
-  return CGF.Builder.CreateCall(F, { Src0, Src1 });
-}
-
-// Emit an intrinsic that has 3 operands of the same type as its result.
-static Value *emitTernaryBuiltin(CodeGenFunction ,
- const CallExpr *E,
- unsigned IntrinsicID) {
-  llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
-  llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
-  llvm::Value *Src2 = CGF.EmitScalarExpr(E->getArg(2));
-
-  Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Src0->getType());
-  return CGF.Builder.CreateCall(F, { Src0, Src1, Src2 });
-}
-
-static Value *emitQuaternaryBuiltin(CodeGenFunction , const CallExpr *E,
-unsigned IntrinsicID) {
-  llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
-  llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
-  llvm::Value *Src2 = CGF.EmitScalarExpr(E->getArg(2));
-  llvm::Value *Src3 = CGF.EmitScalarExpr(E->getArg(3));
-
-  Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Src0->getType());
-  return CGF.Builder.CreateCall(F, {Src0, Src1, Src2, Src3});
+// Emit a simple intrinsic that has N arguments and a return type matching the
+// argument type. It is assumed that only the first argument is mangled and all
+// arguments are scalar expressions.
+template 
+Value *emitBuiltinWithSingleMangling(CodeGenFunction , const CallExpr *E,

arsenm wrote:

WithOneOverloadedType? 

https://github.com/llvm/llvm-project/pull/96313
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits


@@ -149,6 +149,19 @@ BUILTIN(__builtin_amdgcn_mqsad_pk_u16_u8, "WUiWUiUiWUi", 
"nc")
 BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc")
 
 BUILTIN(__builtin_amdgcn_make_buffer_rsrc, "Qbv*sii", "nc")
+BUILTIN(__builtin_amdgcn_raw_ptr_buffer_store_i8, "vcQbiiIi", "n")

arsenm wrote:

Plus we can always write the instcombine to fold the bitcast into the intrinsic 
signature 

https://github.com/llvm/llvm-project/pull/94576
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits


@@ -149,6 +149,19 @@ BUILTIN(__builtin_amdgcn_mqsad_pk_u16_u8, "WUiWUiUiWUi", 
"nc")
 BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc")
 
 BUILTIN(__builtin_amdgcn_make_buffer_rsrc, "Qbv*sii", "nc")
+BUILTIN(__builtin_amdgcn_raw_ptr_buffer_store_i8, "vcQbiiIi", "n")

arsenm wrote:

Yes. Pre-gfx12 it was _byte, _short, _dword, _dwordxN. But that also means you 
need casts and we drop the FP and 16-element versions so I don't know which way 
is best. Ideally we would just use the elementwise intrinsic approach but that 
won't work for loads 

https://github.com/llvm/llvm-project/pull/94576
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits


@@ -626,6 +626,18 @@ static Value *emitQuaternaryBuiltin(CodeGenFunction , 
const CallExpr *E,
   return CGF.Builder.CreateCall(F, {Src0, Src1, Src2, Src3});
 }
 
+static Value *emitQuinaryBuiltin(CodeGenFunction , const CallExpr *E,

arsenm wrote:

The name is a bit misleading. It's more like a unary builtin with a single 
mangling type 

https://github.com/llvm/llvm-project/pull/94576
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/94576
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits


@@ -149,6 +149,19 @@ BUILTIN(__builtin_amdgcn_mqsad_pk_u16_u8, "WUiWUiUiWUi", 
"nc")
 BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc")
 
 BUILTIN(__builtin_amdgcn_make_buffer_rsrc, "Qbv*sii", "nc")
+BUILTIN(__builtin_amdgcn_raw_ptr_buffer_store_i8, "vcQbiiIi", "n")

arsenm wrote:

Probably should drop the _ptr part of the name. This was more of a legacy issue 
in the intrinsic case, since the resource-as-vector case already took the name.

Also not sure if we should follow the naming convention of the instruction 
instead (probably the gfx12 one?).

Also should document these somewhere 

https://github.com/llvm/llvm-project/pull/94576
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm commented:

I'm wondering if we should really have all the different typed variants, and if 
this should be the name. I guess

https://github.com/llvm/llvm-project/pull/94576
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Enable ASAN in amdgpu toolchain for OpenCL (PR #96262)

2024-06-21 Thread Matt Arsenault via cfe-commits


@@ -169,6 +180,11 @@
 // COMMON-UNSAFE-MATH-SAME: "-mlink-builtin-bitcode" 
"{{.*}}/amdgcn/bitcode/oclc_finite_only_off.bc"
 // COMMON-UNSAFE-MATH-SAME: "-mlink-builtin-bitcode" 
"{{.*}}/amdgcn/bitcode/oclc_correctly_rounded_sqrt_off.bc"
 
+// ASAN-SAME: "-fsanitize=address"
+
+// NOASAN-NOT: "-fsanitize=address"
+// NOASAN-NOT: amdgcn/bitcode/asanrtl.bc

arsenm wrote:

Usual gripe about fragility of negative tests 

https://github.com/llvm/llvm-project/pull/96262
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Enable ASAN in amdgpu toolchain for OpenCL (PR #96262)

2024-06-21 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/96262
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Enable ASAN in amdgpu toolchain for OpenCL (PR #96262)

2024-06-21 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/96262
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] AMDGPU: Remove ds atomic fadd intrinsics (PR #95396)

2024-06-18 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/95396

>From f0f8e09caff2df5632d4252ca354b24c0c6f0e87 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 10 Jun 2024 19:48:13 +0200
Subject: [PATCH] AMDGPU: Remove ds atomic fadd intrinsics

These have been replaced with atomicrmw fadd
---
 clang/lib/CodeGen/CGBuiltin.cpp   |   2 +-
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   5 -
 llvm/lib/IR/AutoUpgrade.cpp   |  92 --
 llvm/lib/Target/AMDGPU/AMDGPUInstructions.td  |   1 -
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |   3 -
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   3 +-
 .../Target/AMDGPU/AMDGPUSearchableTables.td   |   2 -
 .../AMDGPU/AMDGPUTargetTransformInfo.cpp  |   3 -
 llvm/lib/Target/AMDGPU/DSInstructions.td  |  10 -
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |  15 -
 llvm/test/Bitcode/amdgcn-atomic.ll| 136 +
 .../AMDGPU/GlobalISel/fp-atomics-gfx940.ll|  55 
 .../AMDGPU/GlobalISel/fp64-atomics-gfx90a.ll  | 125 +---
 .../AMDGPU/GlobalISel/llvm.amdgcn.ds.fadd.ll  | 279 --
 .../test/CodeGen/AMDGPU/fp-atomics-gfx1200.ll | 102 ---
 llvm/test/CodeGen/AMDGPU/fp-atomics-gfx940.ll |  13 +-
 .../CodeGen/AMDGPU/fp64-atomics-gfx90a.ll |  38 ++-
 llvm/test/CodeGen/AMDGPU/lds-atomic-fadd.ll   |  25 --
 18 files changed, 255 insertions(+), 654 deletions(-)
 delete mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ds.fadd.ll
 delete mode 100644 llvm/test/CodeGen/AMDGPU/lds-atomic-fadd.ll

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 08a89bd123d03..45625611773da 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -19084,7 +19084,7 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   ProcessOrderScopeAMDGCN(EmitScalarExpr(E->getArg(2)),
   EmitScalarExpr(E->getArg(3)), AO, SSID);
 } else {
-  // The ds_fadd_* builtins do not have syncscope/order arguments.
+  // The ds_atomic_fadd_* builtins do not have syncscope/order arguments.
   SSID = llvm::SyncScope::System;
   AO = AtomicOrdering::SequentiallyConsistent;
 
diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td 
b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 8a5566ae12020..7a5e919fe26e3 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -571,7 +571,6 @@ def int_amdgcn_ds_ordered_swap : AMDGPUDSOrderedIntrinsic;
 def int_amdgcn_ds_append : AMDGPUDSAppendConsumedIntrinsic;
 def int_amdgcn_ds_consume : AMDGPUDSAppendConsumedIntrinsic;
 
-def int_amdgcn_ds_fadd : AMDGPULDSIntrin;
 def int_amdgcn_ds_fmin : AMDGPULDSIntrin;
 def int_amdgcn_ds_fmax : AMDGPULDSIntrin;
 
@@ -2930,10 +2929,6 @@ multiclass AMDGPUMFp8SmfmacIntrinsic {
 // bf16 atomics use v2i16 argument since there is no bf16 data type in the 
llvm.
 def int_amdgcn_global_atomic_fadd_v2bf16 : AMDGPUAtomicRtn;
 def int_amdgcn_flat_atomic_fadd_v2bf16   : AMDGPUAtomicRtn;
-def int_amdgcn_ds_fadd_v2bf16 : DefaultAttrsIntrinsic<
-[llvm_v2i16_ty],
-[LLVMQualPointerType<3>, llvm_v2i16_ty],
-[IntrArgMemOnly, NoCapture>]>;
 
 defset list AMDGPUMFMAIntrinsics940 = {
 def int_amdgcn_mfma_i32_16x16x32_i8 : AMDGPUMfmaIntrinsic;
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 2f4b8351e747a..29310ad79ef70 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -1033,6 +1033,12 @@ static bool upgradeIntrinsicFunction1(Function *F, 
Function *,
 break; // No other 'amdgcn.atomic.*'
   }
 
+  if (Name.starts_with("ds.fadd")) {
+// Replaced with atomicrmw fadd, so there's no new declaration.
+NewFn = nullptr;
+return true;
+  }
+
   if (Name.starts_with("ldexp.")) {
 // Target specific intrinsic became redundant
 NewFn = Intrinsic::getDeclaration(
@@ -2331,40 +2337,74 @@ static Value *upgradeARMIntrinsicCall(StringRef Name, 
CallBase *CI, Function *F,
   llvm_unreachable("Unknown function for ARM CallBase upgrade.");
 }
 
+// These are expected to have the arguments:
+// atomic.intrin (ptr, rmw_value, ordering, scope, isVolatile)
+//
+// Except for int_amdgcn_ds_fadd_v2bf16 which only has (ptr, rmw_value).
+//
 static Value *upgradeAMDGCNIntrinsicCall(StringRef Name, CallBase *CI,
  Function *F, IRBuilder<> ) {
-  const bool IsInc = Name.starts_with("atomic.inc.");
-  if (IsInc || Name.starts_with("atomic.dec.")) {
-if (CI->getNumOperands() != 6) // Malformed bitcode.
-  return nullptr;
+  AtomicRMWInst::BinOp RMWOp =
+  StringSwitch(Name)
+  .StartsWith("ds.fadd", AtomicRMWInst::FAdd)
+  .StartsWith("atomic.inc.", AtomicRMWInst::UIncWrap)
+  .StartsWith("atomic.dec.", AtomicRMWInst::UDecWrap);
+
+  unsigned NumOperands = CI->getNumOperands();
+  if (NumOperands < 

[clang] [llvm] AMDGPU: Remove ds atomic fadd intrinsics (PR #95396)

2024-06-18 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/95396
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-18 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/95395
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-18 Thread Matt Arsenault via cfe-commits


@@ -33,6 +33,7 @@
 //  q -> Scalable vector, followed by the number of elements and the base type.
 //  Q -> target builtin type, followed by a character to distinguish the 
builtin type
 //Qa -> AArch64 svcount_t builtin type.
+//Qb -> AMDGPU __amdgpu_buffer_rsrc_t builtin type.

arsenm wrote:

True it's not testable, but it still logically belongs with the definition of 
the type 

https://github.com/llvm/llvm-project/pull/95276
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-18 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/95395

>From 35c741fe2563094bc20c179ee9f244620025405c Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 10 Jun 2024 19:40:59 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins

We should have done this for the f32/f64 case a long time ago. Now that
codegen handles atomicrmw selection for the v2f16/v2bf16 case, start emitting
it instead.

This also does upgrade the behavior to respect a volatile qualified pointer,
which was previously ignored (for the cases that don't have an explicit
volatile argument).
---
 clang/lib/CodeGen/CGBuiltin.cpp   | 113 +++---
 clang/test/CodeGenCUDA/builtins-amdgcn.cu |   2 +-
 .../test/CodeGenCUDA/builtins-spirv-amdgcn.cu |   2 +-
 .../builtins-unsafe-atomics-gfx90a.cu |   5 +-
 ...tins-unsafe-atomics-spirv-amdgcn-gfx90a.cu |   2 +-
 .../test/CodeGenOpenCL/builtins-amdgcn-vi.cl  |  37 +-
 .../builtins-fp-atomics-gfx12.cl  |  14 ++-
 .../CodeGenOpenCL/builtins-fp-atomics-gfx8.cl |   9 +-
 .../builtins-fp-atomics-gfx90a.cl |   4 +-
 .../builtins-fp-atomics-gfx940.cl |  10 +-
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   3 +-
 11 files changed, 139 insertions(+), 62 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 511e1fd4016d7..d81cf40c912de 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18140,9 +18140,35 @@ void CodeGenFunction::ProcessOrderScopeAMDGCN(Value 
*Order, Value *Scope,
 break;
   }
 
+  // Some of the atomic builtins take the scope as a string name.
   StringRef scp;
-  llvm::getConstantStringInfo(Scope, scp);
-  SSID = getLLVMContext().getOrInsertSyncScopeID(scp);
+  if (llvm::getConstantStringInfo(Scope, scp)) {
+SSID = getLLVMContext().getOrInsertSyncScopeID(scp);
+return;
+  }
+
+  // Older builtins had an enum argument for the memory scope.
+  int scope = cast(Scope)->getZExtValue();
+  switch (scope) {
+  case 0: // __MEMORY_SCOPE_SYSTEM
+SSID = llvm::SyncScope::System;
+break;
+  case 1: // __MEMORY_SCOPE_DEVICE
+SSID = getLLVMContext().getOrInsertSyncScopeID("agent");
+break;
+  case 2: // __MEMORY_SCOPE_WRKGRP
+SSID = getLLVMContext().getOrInsertSyncScopeID("workgroup");
+break;
+  case 3: // __MEMORY_SCOPE_WVFRNT
+SSID = getLLVMContext().getOrInsertSyncScopeID("wavefront");
+break;
+  case 4: // __MEMORY_SCOPE_SINGLE
+SSID = llvm::SyncScope::SingleThread;
+break;
+  default:
+SSID = llvm::SyncScope::System;
+break;
+  }
 }
 
 llvm::Value *CodeGenFunction::EmitScalarOrConstFoldImmArg(unsigned 
ICEArguments,
@@ -18558,14 +18584,10 @@ Value 
*CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
 return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_ds_faddf:
   case AMDGPU::BI__builtin_amdgcn_ds_fminf:
   case AMDGPU::BI__builtin_amdgcn_ds_fmaxf: {
 Intrinsic::ID Intrin;
 switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_ds_faddf:
-  Intrin = Intrinsic::amdgcn_ds_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
   Intrin = Intrinsic::amdgcn_ds_fmin;
   break;
@@ -18656,35 +18678,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 llvm::Function *F = CGM.getIntrinsic(IID, {Addr->getType()});
 return Builder.CreateCall(F, {Addr, Val});
   }
-  case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f64:
-  case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2f16: {
-Intrinsic::ID IID;
-llvm::Type *ArgTy;
-switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f32:
-  ArgTy = llvm::Type::getFloatTy(getLLVMContext());
-  IID = Intrinsic::amdgcn_ds_fadd;
-  break;
-case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f64:
-  ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
-  IID = Intrinsic::amdgcn_ds_fadd;
-  break;
-case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2f16:
-  ArgTy = llvm::FixedVectorType::get(
-  llvm::Type::getHalfTy(getLLVMContext()), 2);
-  IID = Intrinsic::amdgcn_ds_fadd;
-  break;
-}
-llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
-llvm::Value *Val = EmitScalarExpr(E->getArg(1));
-llvm::Constant *ZeroI32 = llvm::ConstantInt::getIntegerValue(
-llvm::Type::getInt32Ty(getLLVMContext()), APInt(32, 0, true));
-llvm::Constant *ZeroI1 = llvm::ConstantInt::getIntegerValue(
-llvm::Type::getInt1Ty(getLLVMContext()), APInt(1, 0));
-llvm::Function *F = CGM.getIntrinsic(IID, {ArgTy});
-return Builder.CreateCall(F, {Addr, Val, ZeroI32, ZeroI32, ZeroI1});
-  }
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_i32:
   case 

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-18 Thread Matt Arsenault via cfe-commits


@@ -33,6 +33,7 @@
 //  q -> Scalable vector, followed by the number of elements and the base type.
 //  Q -> target builtin type, followed by a character to distinguish the 
builtin type
 //Qa -> AArch64 svcount_t builtin type.
+//Qb -> AMDGPU __amdgpu_buffer_rsrc_t builtin type.

arsenm wrote:

This belongs in the parent patch 

https://github.com/llvm/llvm-project/pull/95276
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-18 Thread Matt Arsenault via cfe-commits


@@ -19082,6 +19082,15 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 CGM.getIntrinsic(Intrinsic::amdgcn_s_sendmsg_rtn, {ResultType});
 return Builder.CreateCall(F, {Arg});
   }
+  case AMDGPU::BI__builtin_amdgcn_make_buffer_rsrc: {
+llvm::Value *Base = EmitScalarExpr(E->getArg(0));

arsenm wrote:

I guess you need this just for the pointer type mangling. Can you abuse the 
CreateUnary* functions for this case? 

https://github.com/llvm/llvm-project/pull/95276
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.

LGTM but I'm not a frontend expert 

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,11 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -verify -triple amdgcn-amd-amdhsa -Wno-unused-value %s

arsenm wrote:

set explicit -cl-std, and check 1.2 and 2.0? 

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits


@@ -305,43 +304,43 @@ bool SIAnnotateControlFlow::handleLoop(BranchInst *Term) {
 }
 
 /// Close the last opened control flow
-bool SIAnnotateControlFlow::closeControlFlow(BasicBlock *BB) {
-  llvm::Loop *L = LI->getLoopFor(BB);
+bool SIAnnotateControlFlow::tryWaveReconverge(BasicBlock *BB) {
 
-  assert(Stack.back().first == BB);
+  if (succ_empty(BB))
+return false;
 
-  if (L && L->getHeader() == BB) {
-// We can't insert an EndCF call into a loop header, because it will
-// get executed on every iteration of the loop, when it should be
-// executed only once before the loop.
-SmallVector  Latches;
-L->getLoopLatches(Latches);
+  BranchInst *Term = dyn_cast(BB->getTerminator());
+  if (Term->getNumSuccessors() == 1) {
+// The current BBs single successor is a top of the stack. We need to
+// reconverge over thaqt path.
+BasicBlock *SingleSucc = *succ_begin(BB);
+BasicBlock::iterator InsPt = Term ? BasicBlock::iterator(Term) : BB->end();
 
-SmallVector Preds;
-for (BasicBlock *Pred : predecessors(BB)) {
-  if (!is_contained(Latches, Pred))
-Preds.push_back(Pred);
+if (isTopOfStack(SingleSucc)) {
+  Value *Exec = Stack.back().second;
+  IRBuilder<>(BB, InsPt).CreateCall(WaveReconverge, {Exec});
 }
-
-BB = SplitBlockPredecessors(BB, Preds, "endcf.split", DT, LI, nullptr,
-false);
-  }
-
-  Value *Exec = popSaved();
-  BasicBlock::iterator FirstInsertionPt = BB->getFirstInsertionPt();
-  if (!isa(Exec) && !isa(FirstInsertionPt)) {
-Instruction *ExecDef = cast(Exec);
-BasicBlock *DefBB = ExecDef->getParent();
-if (!DT->dominates(DefBB, BB)) {
-  // Split edge to make Def dominate Use
-  FirstInsertionPt = SplitEdge(DefBB, BB, DT, LI)->getFirstInsertionPt();
+  } else {
+// We have a uniform conditional branch terminating the block.
+// THis block may be the last in the Then path of the enclosing divergent

arsenm wrote:

Typo 'THis'

https://github.com/llvm/llvm-project/pull/92809
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1 @@
+remark: :0:0: removing function 'needs_extimg': +extended-image-insts 
is not supported on the current target

arsenm wrote:

accidentally added file? 

https://github.com/llvm/llvm-project/pull/92809
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits


@@ -15740,6 +15740,32 @@ void 
SITargetLowering::finalizeLowering(MachineFunction ) const {
 }
   }
 
+  // ISel inserts copy to regs for the successor PHIs
+  // at the BB end. We need to move the SI_WAVE_RECONVERGE right before the

arsenm wrote:

Can you avoid this by gluing the pseudo to the root node? Also, I think you can 
avoid a second walk over the function by doing this in 
EmitInstrWithCustomInserter 

https://github.com/llvm/llvm-project/pull/92809
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits


@@ -2103,12 +2103,36 @@ bool SIInstrInfo::expandPostRAPseudo(MachineInstr ) 
const {
 MI.setDesc(get(AMDGPU::S_MOV_B64));
 break;
 
+  case AMDGPU::S_CMOV_B64_term:
+// This is only a terminator to get the correct spill code placement during
+// register allocation.
+MI.setDesc(get(AMDGPU::S_CMOV_B64));
+break;
+
   case AMDGPU::S_MOV_B32_term:
 // This is only a terminator to get the correct spill code placement during
 // register allocation.
 MI.setDesc(get(AMDGPU::S_MOV_B32));
 break;
 
+  case AMDGPU::S_CMOV_B32_term:
+// This is only a terminator to get the correct spill code placement during
+// register allocation.
+MI.setDesc(get(AMDGPU::S_CMOV_B32));
+break;
+
+  case AMDGPU::S_CSELECT_B32_term:
+// This is only a terminator to get the correct spill code placement during
+// register allocation.
+MI.setDesc(get(AMDGPU::S_CSELECT_B32));
+break;
+
+  case AMDGPU::S_CSELECT_B64_term:
+// This is only a terminator to get the correct spill code placement during
+// register allocation.
+MI.setDesc(get(AMDGPU::S_CSELECT_B64));
+break;

arsenm wrote:

Can you split out the low level operation handling like this into a separate 
PR? 

https://github.com/llvm/llvm-project/pull/92809
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits


@@ -1,3 +1,4 @@
+; XFAIL: *

arsenm wrote:

can't just xfail tests 

https://github.com/llvm/llvm-project/pull/92809
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits


@@ -3172,8 +3172,8 @@ def int_amdgcn_loop : Intrinsic<[llvm_i1_ty],
   [llvm_anyint_ty], [IntrWillReturn, IntrNoCallback, IntrNoFree]
 >;
 
-def int_amdgcn_end_cf : Intrinsic<[], [llvm_anyint_ty],
-  [IntrWillReturn, IntrNoCallback, IntrNoFree]>;
+def int_amdgcn_wave_reconverge : Intrinsic<[], [llvm_anyint_ty],

arsenm wrote:

Should document what this means 

https://github.com/llvm/llvm-project/pull/92809
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits


@@ -15740,6 +15740,32 @@ void 
SITargetLowering::finalizeLowering(MachineFunction ) const {
 }
   }
 
+  // ISel inserts copy to regs for the successor PHIs
+  // at the BB end. We need to move the SI_WAVE_RECONVERGE right before the
+  // branch.
+  for (auto  : MF) {
+for (auto  : MBB) {
+  if (MI.getOpcode() == AMDGPU::SI_WAVE_RECONVERGE) {
+MachineBasicBlock::iterator I(MI);
+MachineBasicBlock::iterator Next = std::next(I);
+bool NeedToMove = false;
+while (Next != MBB.end() && !Next->isBranch()) {
+  NeedToMove = true;
+  Next++;
+}
+
+assert((Next == MBB.end() || !Next->readsRegister(AMDGPU::SCC, TRI)) &&
+   "Malformed CFG detected!\n");

arsenm wrote:

No newline in the string, this isn't real printing 

https://github.com/llvm/llvm-project/pull/92809
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/92809
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm requested changes to this pull request.

There are quite a few code quality regressions, and XFAILed tests. The 
description needs more elaboration on what the strategy is here 

https://github.com/llvm/llvm-project/pull/92809
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-17 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][CodeGen] Add query for a target's flat address space (PR #95728)

2024-06-17 Thread Matt Arsenault via cfe-commits


@@ -1764,6 +1764,13 @@ class TargetInfo : public TransferrableTargetInfo,
 return 0;
   }
 
+  /// \returns Target specific flat ptr address space; a flat ptr is a ptr that
+  /// can be casted to / from all other target address spaces. If the target
+  /// exposes no such address space / does not care, we return 0.

arsenm wrote:

> here's no such thing as LangAS::Generic, there's LangAS::Default.

Right, so query the target address space map for that. 

The IR does not have the concept of a generic, flat, or no address space. 0 is 
the "default address space", which has its own specific properties. 

> this merely maintains the status quo, wherein everybody grabs an AS 0 pointer 
> and hopes for the best.

For the most part this isn't true, at least in llvm proper. 

>From the comment here, I'm not even sure what this is returning (either an IR 
>address space or a clang address space) 



https://github.com/llvm/llvm-project/pull/95728
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] [WIP] Added builtin_alloca support for OpenCL1.2 and below (PR #95750)

2024-06-17 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,86 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 -emit-llvm 
-o - | FileCheck --check-prefix=OPENCL12 %s
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL2.0 -emit-llvm 
-o - | FileCheck --check-prefix=OPENCL20 %s
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL3.0 -emit-llvm 
-o - | FileCheck --check-prefix=OPENCL30 %s
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL3.0 
-cl-ext=+__opencl_c_generic_address_space -emit-llvm -o - | FileCheck 
--check-prefix=OPENCL30-EXT %s
+
+// OPENCL12-LABEL: define dso_local ptr addrspace(5) @test1(
+// OPENCL12-SAME: ) #[[ATTR0:[0-9]+]] {
+// OPENCL12-NEXT:  [[ENTRY:.*:]]
+// OPENCL12-NEXT:[[ALLOC_PTR:%.*]] = alloca ptr addrspace(5), align 4, 
addrspace(5)
+// OPENCL12-NEXT:[[TMP0:%.*]] = alloca i8, i64 128, align 8, addrspace(5)
+// OPENCL12-NEXT:store ptr addrspace(5) [[TMP0]], ptr addrspace(5) 
[[ALLOC_PTR]], align 4
+// OPENCL12-NEXT:[[TMP1:%.*]] = load ptr addrspace(5), ptr addrspace(5) 
[[ALLOC_PTR]], align 4
+// OPENCL12-NEXT:ret ptr addrspace(5) [[TMP1]]
+//
+// OPENCL20-LABEL: define dso_local ptr @test1(
+// OPENCL20-SAME: ) #[[ATTR0:[0-9]+]] {
+// OPENCL20-NEXT:  [[ENTRY:.*:]]
+// OPENCL20-NEXT:[[ALLOC_PTR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OPENCL20-NEXT:[[TMP0:%.*]] = alloca i8, i64 128, align 8, addrspace(5)
+// OPENCL20-NEXT:[[TMP1:%.*]] = addrspacecast ptr addrspace(5) [[TMP0]] to 
ptr
+// OPENCL20-NEXT:store ptr [[TMP1]], ptr addrspace(5) [[ALLOC_PTR]], align 
8
+// OPENCL20-NEXT:[[TMP2:%.*]] = load ptr, ptr addrspace(5) [[ALLOC_PTR]], 
align 8
+// OPENCL20-NEXT:ret ptr [[TMP2]]
+//
+// OPENCL30-LABEL: define dso_local ptr addrspace(5) @test1(
+// OPENCL30-SAME: ) #[[ATTR0:[0-9]+]] {
+// OPENCL30-NEXT:  [[ENTRY:.*:]]
+// OPENCL30-NEXT:[[ALLOC_PTR:%.*]] = alloca ptr addrspace(5), align 4, 
addrspace(5)
+// OPENCL30-NEXT:[[TMP0:%.*]] = alloca i8, i64 128, align 8, addrspace(5)
+// OPENCL30-NEXT:store ptr addrspace(5) [[TMP0]], ptr addrspace(5) 
[[ALLOC_PTR]], align 4
+// OPENCL30-NEXT:[[TMP1:%.*]] = load ptr addrspace(5), ptr addrspace(5) 
[[ALLOC_PTR]], align 4
+// OPENCL30-NEXT:ret ptr addrspace(5) [[TMP1]]
+//
+// OPENCL30-EXT-LABEL: define dso_local ptr @test1(
+// OPENCL30-EXT-SAME: ) #[[ATTR0:[0-9]+]] {
+// OPENCL30-EXT-NEXT:  [[ENTRY:.*:]]
+// OPENCL30-EXT-NEXT:[[ALLOC_PTR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OPENCL30-EXT-NEXT:[[TMP0:%.*]] = alloca i8, i64 128, align 8, 
addrspace(5)
+// OPENCL30-EXT-NEXT:[[TMP1:%.*]] = addrspacecast ptr addrspace(5) 
[[TMP0]] to ptr
+// OPENCL30-EXT-NEXT:store ptr [[TMP1]], ptr addrspace(5) [[ALLOC_PTR]], 
align 8
+// OPENCL30-EXT-NEXT:[[TMP2:%.*]] = load ptr, ptr addrspace(5) 
[[ALLOC_PTR]], align 8
+// OPENCL30-EXT-NEXT:ret ptr [[TMP2]]
+//
+float* test1() {
+float* alloc_ptr = (float*)__builtin_alloca(32 * sizeof(int));

arsenm wrote:

Yes, this is fundamentally a stack / private pointer 

https://github.com/llvm/llvm-project/pull/95750
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,84 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --function-signature
+ // REQUIRES: amdgpu-registered-target
+ // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde 
-emit-llvm -o - %s | FileCheck %s
+ // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu tonga 
-emit-llvm -o - %s | FileCheck %s
+ // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1100 
-emit-llvm -o - %s | FileCheck %s

arsenm wrote:

Not much point in testing all the subtargets here

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,84 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --function-signature
+ // REQUIRES: amdgpu-registered-target
+ // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde 
-emit-llvm -o - %s | FileCheck %s
+ // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu tonga 
-emit-llvm -o - %s | FileCheck %s
+ // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1100 
-emit-llvm -o - %s | FileCheck %s
+
+typedef struct AA_ty {
+  int x;
+  __amdgcn_buffer_rsrc_t r;
+} AA;
+
+AA getAA(void *p);
+__amdgcn_buffer_rsrc_t getBufferImpl(void *p);
+void consumeBuffer(__amdgcn_buffer_rsrc_t);
+
+// CHECK-LABEL: define {{[^@]+}}@getBuffer

arsenm wrote:

I guess that covers it, but we don't get to see the return attributes now 

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,9 @@
+

arsenm wrote:

Extra blank line 

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,17 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -fsyntax-only -verify -std=gnu++11 -triple amdgcn 
-Wno-unused-value %s
+

arsenm wrote:

We probably want another similar sema test for OpenCL/HIP/OpenMP 

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][CodeGen] Add query for a target's flat address space (PR #95728)

2024-06-17 Thread Matt Arsenault via cfe-commits


@@ -1764,6 +1764,13 @@ class TargetInfo : public TransferrableTargetInfo,
 return 0;
   }
 
+  /// \returns Target specific flat ptr address space; a flat ptr is a ptr that
+  /// can be casted to / from all other target address spaces. If the target
+  /// exposes no such address space / does not care, we return 0.

arsenm wrote:

This further spreads the notion that 0 is a lack of address space, which it is 
not. You shouldn't need a new thing for this; it should be equivalent to 
querying the target address space map for the LangAS Generic thing 

https://github.com/llvm/llvm-project/pull/95728
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] [WIP] Added builtin_alloca support for OpenCL1.2 and below (PR #95750)

2024-06-17 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,86 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 -emit-llvm 
-o - | FileCheck --check-prefix=OPENCL12 %s
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL2.0 -emit-llvm 
-o - | FileCheck --check-prefix=OPENCL20 %s
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL3.0 -emit-llvm 
-o - | FileCheck --check-prefix=OPENCL30 %s
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL3.0 
-cl-ext=+__opencl_c_generic_address_space -emit-llvm -o - | FileCheck 
--check-prefix=OPENCL30-EXT %s
+
+// OPENCL12-LABEL: define dso_local ptr addrspace(5) @test1(
+// OPENCL12-SAME: ) #[[ATTR0:[0-9]+]] {
+// OPENCL12-NEXT:  [[ENTRY:.*:]]
+// OPENCL12-NEXT:[[ALLOC_PTR:%.*]] = alloca ptr addrspace(5), align 4, 
addrspace(5)
+// OPENCL12-NEXT:[[TMP0:%.*]] = alloca i8, i64 128, align 8, addrspace(5)
+// OPENCL12-NEXT:store ptr addrspace(5) [[TMP0]], ptr addrspace(5) 
[[ALLOC_PTR]], align 4
+// OPENCL12-NEXT:[[TMP1:%.*]] = load ptr addrspace(5), ptr addrspace(5) 
[[ALLOC_PTR]], align 4
+// OPENCL12-NEXT:ret ptr addrspace(5) [[TMP1]]
+//
+// OPENCL20-LABEL: define dso_local ptr @test1(
+// OPENCL20-SAME: ) #[[ATTR0:[0-9]+]] {
+// OPENCL20-NEXT:  [[ENTRY:.*:]]
+// OPENCL20-NEXT:[[ALLOC_PTR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OPENCL20-NEXT:[[TMP0:%.*]] = alloca i8, i64 128, align 8, addrspace(5)
+// OPENCL20-NEXT:[[TMP1:%.*]] = addrspacecast ptr addrspace(5) [[TMP0]] to 
ptr
+// OPENCL20-NEXT:store ptr [[TMP1]], ptr addrspace(5) [[ALLOC_PTR]], align 
8
+// OPENCL20-NEXT:[[TMP2:%.*]] = load ptr, ptr addrspace(5) [[ALLOC_PTR]], 
align 8
+// OPENCL20-NEXT:ret ptr [[TMP2]]
+//
+// OPENCL30-LABEL: define dso_local ptr addrspace(5) @test1(
+// OPENCL30-SAME: ) #[[ATTR0:[0-9]+]] {
+// OPENCL30-NEXT:  [[ENTRY:.*:]]
+// OPENCL30-NEXT:[[ALLOC_PTR:%.*]] = alloca ptr addrspace(5), align 4, 
addrspace(5)
+// OPENCL30-NEXT:[[TMP0:%.*]] = alloca i8, i64 128, align 8, addrspace(5)
+// OPENCL30-NEXT:store ptr addrspace(5) [[TMP0]], ptr addrspace(5) 
[[ALLOC_PTR]], align 4
+// OPENCL30-NEXT:[[TMP1:%.*]] = load ptr addrspace(5), ptr addrspace(5) 
[[ALLOC_PTR]], align 4
+// OPENCL30-NEXT:ret ptr addrspace(5) [[TMP1]]
+//
+// OPENCL30-EXT-LABEL: define dso_local ptr @test1(
+// OPENCL30-EXT-SAME: ) #[[ATTR0:[0-9]+]] {
+// OPENCL30-EXT-NEXT:  [[ENTRY:.*:]]
+// OPENCL30-EXT-NEXT:[[ALLOC_PTR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OPENCL30-EXT-NEXT:[[TMP0:%.*]] = alloca i8, i64 128, align 8, 
addrspace(5)
+// OPENCL30-EXT-NEXT:[[TMP1:%.*]] = addrspacecast ptr addrspace(5) 
[[TMP0]] to ptr
+// OPENCL30-EXT-NEXT:store ptr [[TMP1]], ptr addrspace(5) [[ALLOC_PTR]], 
align 8
+// OPENCL30-EXT-NEXT:[[TMP2:%.*]] = load ptr, ptr addrspace(5) 
[[ALLOC_PTR]], align 8
+// OPENCL30-EXT-NEXT:ret ptr [[TMP2]]
+//
+float* test1() {
+float* alloc_ptr = (float*)__builtin_alloca(32 * sizeof(int));
+return alloc_ptr;
+}
+
+// OPENCL12-LABEL: define dso_local void @test2(
+// OPENCL12-SAME: ) #[[ATTR0]] {
+// OPENCL12-NEXT:  [[ENTRY:.*:]]
+// OPENCL12-NEXT:[[ALLOC_PTR:%.*]] = alloca ptr addrspace(5), align 4, 
addrspace(5)
+// OPENCL12-NEXT:[[TMP0:%.*]] = alloca i8, i64 28, align 8, addrspace(5)
+// OPENCL12-NEXT:store ptr addrspace(5) [[TMP0]], ptr addrspace(5) 
[[ALLOC_PTR]], align 4
+// OPENCL12-NEXT:ret void
+//
+// OPENCL20-LABEL: define dso_local void @test2(
+// OPENCL20-SAME: ) #[[ATTR0]] {
+// OPENCL20-NEXT:  [[ENTRY:.*:]]
+// OPENCL20-NEXT:[[ALLOC_PTR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OPENCL20-NEXT:[[TMP0:%.*]] = alloca i8, i64 28, align 8, addrspace(5)
+// OPENCL20-NEXT:[[TMP1:%.*]] = addrspacecast ptr addrspace(5) [[TMP0]] to 
ptr
+// OPENCL20-NEXT:store ptr [[TMP1]], ptr addrspace(5) [[ALLOC_PTR]], align 
8
+// OPENCL20-NEXT:ret void
+//
+// OPENCL30-LABEL: define dso_local void @test2(
+// OPENCL30-SAME: ) #[[ATTR0]] {
+// OPENCL30-NEXT:  [[ENTRY:.*:]]
+// OPENCL30-NEXT:[[ALLOC_PTR:%.*]] = alloca ptr addrspace(5), align 4, 
addrspace(5)
+// OPENCL30-NEXT:[[TMP0:%.*]] = alloca i8, i64 28, align 8, addrspace(5)
+// OPENCL30-NEXT:store ptr addrspace(5) [[TMP0]], ptr addrspace(5) 
[[ALLOC_PTR]], align 4
+// OPENCL30-NEXT:ret void
+//
+// OPENCL30-EXT-LABEL: define dso_local void @test2(
+// OPENCL30-EXT-SAME: ) #[[ATTR0]] {
+// OPENCL30-EXT-NEXT:  [[ENTRY:.*:]]
+// OPENCL30-EXT-NEXT:[[ALLOC_PTR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OPENCL30-EXT-NEXT:[[TMP0:%.*]] = alloca i8, i64 28, align 8, 
addrspace(5)
+// OPENCL30-EXT-NEXT:[[TMP1:%.*]] = addrspacecast ptr addrspace(5) 
[[TMP0]] to ptr
+// OPENCL30-EXT-NEXT:store ptr [[TMP1]], ptr addrspace(5) [[ALLOC_PTR]], 
align 8
+// OPENCL30-EXT-NEXT:ret void
+//
+void test2() {
+void *alloc_ptr = __builtin_alloca(28);


[clang] [Clang] [WIP] Added builtin_alloca support for OpenCL1.2 and below (PR #95750)

2024-06-17 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,86 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 -emit-llvm 
-o - | FileCheck --check-prefix=OPENCL12 %s
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL2.0 -emit-llvm 
-o - | FileCheck --check-prefix=OPENCL20 %s
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL3.0 -emit-llvm 
-o - | FileCheck --check-prefix=OPENCL30 %s
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL3.0 
-cl-ext=+__opencl_c_generic_address_space -emit-llvm -o - | FileCheck 
--check-prefix=OPENCL30-EXT %s
+
+// OPENCL12-LABEL: define dso_local ptr addrspace(5) @test1(
+// OPENCL12-SAME: ) #[[ATTR0:[0-9]+]] {
+// OPENCL12-NEXT:  [[ENTRY:.*:]]
+// OPENCL12-NEXT:[[ALLOC_PTR:%.*]] = alloca ptr addrspace(5), align 4, 
addrspace(5)
+// OPENCL12-NEXT:[[TMP0:%.*]] = alloca i8, i64 128, align 8, addrspace(5)
+// OPENCL12-NEXT:store ptr addrspace(5) [[TMP0]], ptr addrspace(5) 
[[ALLOC_PTR]], align 4
+// OPENCL12-NEXT:[[TMP1:%.*]] = load ptr addrspace(5), ptr addrspace(5) 
[[ALLOC_PTR]], align 4
+// OPENCL12-NEXT:ret ptr addrspace(5) [[TMP1]]
+//
+// OPENCL20-LABEL: define dso_local ptr @test1(
+// OPENCL20-SAME: ) #[[ATTR0:[0-9]+]] {
+// OPENCL20-NEXT:  [[ENTRY:.*:]]
+// OPENCL20-NEXT:[[ALLOC_PTR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OPENCL20-NEXT:[[TMP0:%.*]] = alloca i8, i64 128, align 8, addrspace(5)
+// OPENCL20-NEXT:[[TMP1:%.*]] = addrspacecast ptr addrspace(5) [[TMP0]] to 
ptr
+// OPENCL20-NEXT:store ptr [[TMP1]], ptr addrspace(5) [[ALLOC_PTR]], align 
8
+// OPENCL20-NEXT:[[TMP2:%.*]] = load ptr, ptr addrspace(5) [[ALLOC_PTR]], 
align 8
+// OPENCL20-NEXT:ret ptr [[TMP2]]
+//
+// OPENCL30-LABEL: define dso_local ptr addrspace(5) @test1(
+// OPENCL30-SAME: ) #[[ATTR0:[0-9]+]] {
+// OPENCL30-NEXT:  [[ENTRY:.*:]]
+// OPENCL30-NEXT:[[ALLOC_PTR:%.*]] = alloca ptr addrspace(5), align 4, 
addrspace(5)
+// OPENCL30-NEXT:[[TMP0:%.*]] = alloca i8, i64 128, align 8, addrspace(5)
+// OPENCL30-NEXT:store ptr addrspace(5) [[TMP0]], ptr addrspace(5) 
[[ALLOC_PTR]], align 4
+// OPENCL30-NEXT:[[TMP1:%.*]] = load ptr addrspace(5), ptr addrspace(5) 
[[ALLOC_PTR]], align 4
+// OPENCL30-NEXT:ret ptr addrspace(5) [[TMP1]]
+//
+// OPENCL30-EXT-LABEL: define dso_local ptr @test1(
+// OPENCL30-EXT-SAME: ) #[[ATTR0:[0-9]+]] {
+// OPENCL30-EXT-NEXT:  [[ENTRY:.*:]]
+// OPENCL30-EXT-NEXT:[[ALLOC_PTR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OPENCL30-EXT-NEXT:[[TMP0:%.*]] = alloca i8, i64 128, align 8, 
addrspace(5)
+// OPENCL30-EXT-NEXT:[[TMP1:%.*]] = addrspacecast ptr addrspace(5) 
[[TMP0]] to ptr
+// OPENCL30-EXT-NEXT:store ptr [[TMP1]], ptr addrspace(5) [[ALLOC_PTR]], 
align 8
+// OPENCL30-EXT-NEXT:[[TMP2:%.*]] = load ptr, ptr addrspace(5) 
[[ALLOC_PTR]], align 8
+// OPENCL30-EXT-NEXT:ret ptr [[TMP2]]
+//
+float* test1() {
+float* alloc_ptr = (float*)__builtin_alloca(32 * sizeof(int));

arsenm wrote:

Should test with __private qualified pointer, we want to see that there is no 
cast 

https://github.com/llvm/llvm-project/pull/95750
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-15 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/95395
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,65 @@
+; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 
-verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s
+
+; CHECK-LABEL: name:basic_readfirstlane_i64
+;   CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR

arsenm wrote:

My worry is this test won't get updated whenever that's fixed. How about you 
leave the existing checks, but add an additional "not --crash" run line that 
checks the verifier error? 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-14 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,21 @@
+//===-- AMDGPUTypes.def - Metadata about AMDGPU types ---*- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+//  This file defines various AMDGPU builtin types.
+//
+//===--===//
+
+#ifndef AMDGPU_OPAQUE_PTR_TYPE
+#define AMDGPU_OPAQUE_PTR_TYPE(Name, MangledName, AS, Width, Align, Id, 
SingletonId) \
+  AMDGPU_TYPE(Name, Id, SingletonId)
+#endif
+
+AMDGPU_OPAQUE_PTR_TYPE("__amdgcn_buffer_rsrc_t", "__amdgcn_buffer_rsrc_t", 8, 
128, 128, AMDGPUBufferRsrc, AMDGPUBufferRsrcTy)

arsenm wrote:

Things are gradually moving to use amdgpu 

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-14 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,84 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --function-signature
+ // REQUIRES: amdgpu-registered-target
+ // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde 
-emit-llvm -o - %s | FileCheck %s
+ // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu tonga 
-emit-llvm -o - %s | FileCheck %s
+ // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1100 
-emit-llvm -o - %s | FileCheck %s
+
+typedef struct AA_ty {
+  int x;
+  __amdgcn_buffer_rsrc_t r;
+} AA;
+
+AA getAA(void *p);
+__amdgcn_buffer_rsrc_t getBufferImpl(void *p);
+void consumeBuffer(__amdgcn_buffer_rsrc_t);
+
+// CHECK-LABEL: define {{[^@]+}}@getBuffer

arsenm wrote:

This is not showing the return type 

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-14 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,21 @@
+//===-- AMDGPUTypes.def - Metadata about AMDGPU types ---*- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+//  This file defines various AMDGPU builtin types.
+//
+//===--===//
+
+#ifndef AMDGPU_OPAQUE_PTR_TYPE
+#define AMDGPU_OPAQUE_PTR_TYPE(Name, MangledName, AS, Width, Align, Id, 
SingletonId) \
+  AMDGPU_TYPE(Name, Id, SingletonId)
+#endif
+
+AMDGPU_OPAQUE_PTR_TYPE("__amdgcn_buffer_rsrc_t", "__amdgcn_buffer_rsrc_t", 8, 
128, 128, AMDGPUBufferRsrc, AMDGPUBufferRsrcTy)

arsenm wrote:

Should use amdgpu not amdgcn 

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,65 @@
+; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 
-verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s
+
+; CHECK-LABEL: name:basic_readfirstlane_i64
+;   CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR

arsenm wrote:

I thought you switched to using the intrinsics such that this would avoid the 
verifier error. This test is going to fail if you are still hitting the error 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits


@@ -6129,13 +6150,55 @@ static SDValue lowerLaneOp(const SITargetLowering , 
SDNode *N,
   if (ValSize % 32 != 0)
 return SDValue();
 
+  auto unrollLaneOp = [, ](SDNode *N) -> SDValue {
+EVT VT = N->getValueType(0);
+unsigned NE = VT.getVectorNumElements();
+EVT EltVT = VT.getVectorElementType();
+SmallVector Scalars;
+unsigned NumOperands = N->getNumOperands();
+SmallVector Operands(NumOperands);
+SDNode *GL = N->getGluedNode();
+
+if (GL) {
+  // only handle convegrencectrl_glue
+  assert(GL->getOpcode() == ISD::CONVERGENCECTRL_GLUE);
+}

arsenm wrote:

Typo convegrencectrl_glue. Also assert(!GL || ...)

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,65 @@
+; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 
-verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s
+
+; CHECK-LABEL: name:basic_readfirstlane_i64
+;   CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR

arsenm wrote:

We don't need these isel checks for the tokens if the IR verifier is going to 
catch the breakage. This test can jus be merged with the existing intrinsic 
tests 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-13 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,69 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+ // REQUIRES: amdgpu-registered-target
+ // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde 
-emit-llvm -o - %s | FileCheck %s
+ // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu tonga 
-emit-llvm -o - %s | FileCheck %s
+ // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1100 
-emit-llvm -o - %s | FileCheck %s
+
+typedef struct AA_ty {
+  int x;
+  __amdgcn_buffer_rsrc_t r;
+} AA;
+
+AA getAA(void *p);
+__amdgcn_buffer_rsrc_t getBuffer(void *p);
+void consumeBuffer(__amdgcn_buffer_rsrc_t);
+
+// CHECK-LABEL: @consumeBufferPtr(

arsenm wrote:

The interesting check here is the IR type in the function signature which are 
missing. Also should test it as a return value (and return value in a struct)

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-tools-extra] [compiler-rt] [flang] [libc] [lld] [lldb] [llvm] [mlir] [openmp] [llvm-project] Fix typo "seperate" (PR #95373)

2024-06-13 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/95373
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Offload] Introduce the concept of "default streams" (PR #95371)

2024-06-13 Thread Matt Arsenault via cfe-commits


@@ -1125,6 +1125,22 @@ void Clang::AddPreprocessingOptions(Compilation , 
const JobAction ,
 CmdArgs.push_back("__clang_openmp_device_functions.h");
   }
 
+  if (Args.hasArg(options::OPT_foffload_via_llvm)) {
+// Add llvm_wrappers/* to our system include path.  This lets us wrap
+// standard library headers and other headers.
+SmallString<128> P(D.ResourceDir);
+llvm::sys::path::append(P, "include");
+llvm::sys::path::append(P, "llvm_offload_wrappers");
+CmdArgs.push_back("-internal-isystem");
+CmdArgs.push_back(Args.MakeArgString(P));
+
+CmdArgs.push_back("-include");
+if (JA.isDeviceOffloading(Action::OFK_OpenMP))
+  CmdArgs.push_back("__llvm_offload_device.h");
+else
+  CmdArgs.push_back("__llvm_offload_host.h");

arsenm wrote:

Push pack select of string name? 

https://github.com/llvm/llvm-project/pull/95371
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Offload] Introduce the concept of "default streams" (PR #95371)

2024-06-13 Thread Matt Arsenault via cfe-commits


@@ -1125,6 +1125,22 @@ void Clang::AddPreprocessingOptions(Compilation , 
const JobAction ,
 CmdArgs.push_back("__clang_openmp_device_functions.h");
   }
 
+  if (Args.hasArg(options::OPT_foffload_via_llvm)) {
+// Add llvm_wrappers/* to our system include path.  This lets us wrap
+// standard library headers and other headers.
+SmallString<128> P(D.ResourceDir);
+llvm::sys::path::append(P, "include");
+llvm::sys::path::append(P, "llvm_offload_wrappers");

arsenm wrote:

I think append is variadic and you can do both pieces in one call 

https://github.com/llvm/llvm-project/pull/95371
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-13 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> Just a note - and maybe this was already discussed above - is there good 
> reason not to explicitly make this type a 128-bit scalar? The LLVM data 
> layout already does this

I thought this was the 160 bit version? 

Can we have an opaque-but-sized type? The concern is exposing this as a raw 
pointer right away 

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-13 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,95 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu 
verde -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu 
tonga -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu 
gfx1100 -emit-llvm -o - %s | FileCheck %s
+
+// CHECK-LABEL: @test_amdgcn_make_buffer_rsrc_p0(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[TMP0:%.*]] = tail call ptr addrspace(8) 
@llvm.amdgcn.make.buffer.rsrc.p0(ptr [[P:%.*]], i16 [[STRIDE:%.*]], i32 
[[NUM:%.*]], i32 [[FLAGS:%.*]])
+// CHECK-NEXT:ret ptr addrspace(8) [[TMP0]]
+//
+__buffer_rsrc_t test_amdgcn_make_buffer_rsrc_p0(void *p, short stride, int 
num, int flags) {
+  return __builtin_amdgcn_make_buffer_rsrc(p, stride, num, flags);

arsenm wrote:

The context there is slightly different. They are using the raw descriptor, 
which is addrspace(7), not 8. But the same stands, it probably should be 
passable around as a value 

https://github.com/llvm/llvm-project/pull/95276
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-13 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> I understand the chance of conflict is low. It may be like the chance of 
> hitting by a meteor. However, if we prefix with `__amdgcn_`, there is no such 
> risk. And we have the benefit to clearly indicate it is a amdgcn 
> target-specific type.

Should use amdgpu 

https://github.com/llvm/llvm-project/pull/95276
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-12 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,95 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu 
verde -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu 
tonga -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu 
gfx1100 -emit-llvm -o - %s | FileCheck %s
+
+// CHECK-LABEL: @test_amdgcn_make_buffer_rsrc_p0(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[TMP0:%.*]] = tail call ptr addrspace(8) 
@llvm.amdgcn.make.buffer.rsrc.p0(ptr [[P:%.*]], i16 [[STRIDE:%.*]], i32 
[[NUM:%.*]], i32 [[FLAGS:%.*]])
+// CHECK-NEXT:ret ptr addrspace(8) [[TMP0]]
+//
+__buffer_rsrc_t test_amdgcn_make_buffer_rsrc_p0(void *p, short stride, int 
num, int flags) {
+  return __builtin_amdgcn_make_buffer_rsrc(p, stride, num, flags);

arsenm wrote:

The alternative would be to have a builtin address space fat pointer, but I 
assume that will be substantially more implementation work. 

https://github.com/llvm/llvm-project/pull/95276
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-12 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,14 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm 
-o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu tonga -emit-llvm 
-o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1100 
-emit-llvm -o - %s | FileCheck %s
+
+// CHECK-LABEL: @test_amdgcn_make_buffer_rsrc(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[TMP0:%.*]] = tail call ptr addrspace(8) 
@llvm.amdgcn.make.buffer.rsrc.p5(ptr addrspace(5) [[P:%.*]], i16 4, i32 
[[NUM:%.*]], i32 [[FLAGS:%.*]])
+// CHECK-NEXT:ret ptr addrspace(8) [[TMP0]]
+//
+ __buffer_rsrc_t test_amdgcn_make_buffer_rsrc(void *p, int num, int flags) {
+   return __builtin_amdgcn_make_buffer_rsrc(p, 4, num, flags);

arsenm wrote:

Test more combinations of arguments. All arguments can be variable, but you 
have a constant here 

Also test with different address space pointers. We also probably should have a 
HIP test. Ideally we would error for private/local inputs.

Also test null and literal pointer inputs? 

https://github.com/llvm/llvm-project/pull/95276
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-12 Thread Matt Arsenault via cfe-commits


@@ -128,12 +128,13 @@ enum class CudaArch {
   GFX12_GENERIC,
   GFX1200,
   GFX1201,
+  AMDGCNSPIRV,
   Generic, // A processor model named 'generic' if the target backend defines a
// public one.
   LAST,
 
   CudaDefault = CudaArch::SM_52,
-  HIPDefault = CudaArch::GFX906,
+  HIPDefault = CudaArch::AMDGCNSPIRV,

arsenm wrote:

Behavior change should definitely be done separately 

https://github.com/llvm/llvm-project/pull/95061
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> Or drop the new nodes altogether and legelaize to intrinsics directly ?

That's another option. The only real plus to the intermediate is it's slightly 
less annoying to write combines for. But there are limited combining 
opportunities for these 



https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,46 @@
+# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -verify-machineinstrs -o 
/dev/null %s 2>&1 | FileCheck %s

arsenm wrote:

I'd still test all 3, but yes an IR test 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,46 @@
+# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -verify-machineinstrs -o 
/dev/null %s 2>&1 | FileCheck %s

arsenm wrote:

You should not need to introduce any new machine verifier tests, they are not 
useful. The useful test would be the IR that produces the verifier error, which 
ideally would be fixed 

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-10 Thread Matt Arsenault via cfe-commits


@@ -2201,6 +2207,9 @@ TypeInfo ASTContext::getTypeInfoImpl(const Type *T) const 
{
 Align = 8; 
\
 break;
 #include "clang/Basic/WebAssemblyReferenceTypes.def"
+case BuiltinType::AMDGPUBufferRsrc:
+  Width = 0;
+  Align = 256;

arsenm wrote:

If we expose this as a pointer there are a lot of sharp edges we need to fix. 
If we expose it as an opaque type for builtins, the broken cases are hidden 
from end users. 

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-10 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,11 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -fsyntax-only -verify -triple amdgcn -Wno-unused-value %s
+
+void foo() {
+  int n = 100;
+  __buffer_rsrc_t v = 0; // expected-error {{cannot initialize a variable of 
type '__buffer_rsrc_t' with an rvalue of type 'int'}}
+  static_cast<__buffer_rsrc_t>(n); // expected-error {{static_cast from 'int' 
to '__buffer_rsrc_t' is not allowed}}
+  dynamic_cast<__buffer_rsrc_t>(n); // expected-error {{invalid target type 
'__buffer_rsrc_t' for dynamic_cast; target type must be a reference or pointer 
type to a defined class}}
+  reinterpret_cast<__buffer_rsrc_t>(n); // expected-error {{reinterpret_cast 
from 'int' to '__buffer_rsrc_t' is not allowed}}
+  int c(v); // expected-error {{cannot initialize a variable of type 'int' 
with an lvalue of type '__buffer_rsrc_t'}}
+}

arsenm wrote:

Also test some c style casts to other pointer types 

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-10 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,11 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -fsyntax-only -verify -triple amdgcn -Wno-unused-value %s
+
+void foo() {
+  int n = 100;
+  __buffer_rsrc_t v = 0; // expected-error {{cannot initialize a variable of 
type '__buffer_rsrc_t' with an rvalue of type 'int'}}
+  static_cast<__buffer_rsrc_t>(n); // expected-error {{static_cast from 'int' 
to '__buffer_rsrc_t' is not allowed}}
+  dynamic_cast<__buffer_rsrc_t>(n); // expected-error {{invalid target type 
'__buffer_rsrc_t' for dynamic_cast; target type must be a reference or pointer 
type to a defined class}}
+  reinterpret_cast<__buffer_rsrc_t>(n); // expected-error {{reinterpret_cast 
from 'int' to '__buffer_rsrc_t' is not allowed}}
+  int c(v); // expected-error {{cannot initialize a variable of type 'int' 
with an lvalue of type '__buffer_rsrc_t'}}
+}

arsenm wrote:

What about sizeof / alignof? 

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-10 Thread Matt Arsenault via cfe-commits


@@ -2200,6 +2206,9 @@ TypeInfo ASTContext::getTypeInfoImpl(const Type *T) const 
{
 Align = 8; 
\
 break;
 #include "clang/Basic/WebAssemblyReferenceTypes.def"
+case BuiltinType::AMDGPUBufferRsrc:
+  Width = 128;
+  Align = 128;

arsenm wrote:

Sizeless object is probably the safest option for now. It shouldn't be a 
vector. I foresee more difficulty introducing a wide pointer, and if we change 
our minds on how fat pointers should work it's harder to change the size later 

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Offload][CUDA] Add initial cuda_runtime.h overlay (PR #94821)

2024-06-07 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,30 @@
+// RUN: %clang++ -foffload-via-llvm --offload-arch=native %s -o %t
+// RUN: %t | %fcheck-generic
+
+// UNSUPPORTED: aarch64-unknown-linux-gnu
+// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
+// UNSUPPORTED: x86_64-pc-linux-gnu
+// UNSUPPORTED: x86_64-pc-linux-gnu-LTO

arsenm wrote:

Better to enable supported cases? 

https://github.com/llvm/llvm-project/pull/94821
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,9 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -fclang-abi-compat=latest -triple amdgcn %s -emit-llvm -o - 
| FileCheck %s

arsenm wrote:

Why do you need -fclang-abi-compat=latest

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,21 @@
+//===-- AMDGPUTypes.def - Metadata about AMDGPU types ---*- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+//  This file defines various AMDGPU builtin types.
+//
+//===--===//
+
+#ifndef AMDGPU_OPAQUE_TYPE
+#define AMDGPU_OPAQUE_TYPE(Name, MangledName, Id, SingletonId) 
\
+  AMDGPU_TYPE(Name, Id, SingletonId)
+#endif
+
+AMDGPU_OPAQUE_TYPE("__buffer_rsrc_t", "__buffer_rsrc_t", AMDGPUBufferRsrc, 
AMDGPUBufferRsrcTy)

arsenm wrote:

Should it include an amdgpu prefix? 

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits


@@ -2200,6 +2206,9 @@ TypeInfo ASTContext::getTypeInfoImpl(const Type *T) const 
{
 Align = 8; 
\
 break;
 #include "clang/Basic/WebAssemblyReferenceTypes.def"
+case BuiltinType::AMDGPUBufferRsrc:
+  Width = 128;
+  Align = 128;

arsenm wrote:

If we were exposing the pointer, it would be 160/192 

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm commented:

Need stacked PR that adds the make_buffer_rsrc builtin that shows its use 

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits


@@ -1091,6 +1091,9 @@ enum PredefinedTypeIDs {
 // \brief WebAssembly reference types with auto numeration
 #define WASM_TYPE(Name, Id, SingletonId) PREDEF_TYPE_##Id##_ID,
 #include "clang/Basic/WebAssemblyReferenceTypes.def"
+// \breif AMDGPU types with auto numeration

arsenm wrote:

Typo breif

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-06-07 Thread Matt Arsenault via cfe-commits


@@ -16055,6 +16145,90 @@ of the two arguments. -0.0 is considered to be less 
than +0.0 for this
 intrinsic. Note that these are the semantics specified in the draft of
 IEEE 754-2019.
 
+.. _i_minimumnum:
+
+'``llvm.minimumnum.*``' Intrinsic
+^
+
+Syntax:
+"""
+
+This is an overloaded intrinsic. You can use ``llvm.minimumnum`` on any
+floating-point or vector of floating-point type. Not all targets support
+all types however.
+
+::
+
+  declare float @llvm.minimumnum.f32(float %Val0, float %Val1)
+  declare double@llvm.minimumnum.f64(double %Val0, double %Val1)
+  declare x86_fp80  @llvm.minimumnum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
+  declare fp128 @llvm.minimumnum.f128(fp128 %Val0, fp128 %Val1)
+  declare ppc_fp128 @llvm.minimumnum.ppcf128(ppc_fp128 %Val0, ppc_fp128 
%Val1)
+
+Overview:
+"
+
+The '``llvm.minimumnum.*``' intrinsics return the minimum of the two
+arguments, not propagating NaNs and treating -0.0 as less than +0.0.
+
+
+Arguments:
+""
+
+The arguments and return value are floating-point numbers of the same
+type.
+
+Semantics:
+""
+If both operands are NaNs (including sNaN), returns qNaN. If one operand
+is NaN (including sNaN) and another operand is a number, return the number.
+Otherwise returns the lesser of the two arguments. -0.0 is considered to
+be less than +0.0 for this intrinsic.
+
+Note that these are the semantics of minimumNumber specified in IEEE 754-2019.
+
+.. _i_maximumnum:
+
+'``llvm.maximumnum.*``' Intrinsic
+^
+
+Syntax:
+"""
+
+This is an overloaded intrinsic. You can use ``llvm.maximumnum`` on any
+floating-point or vector of floating-point type. Not all targets support
+all types however.
+
+::
+
+  declare float @llvm.maximumnum.f32(float %Val0, float %Val1)
+  declare double@llvm.maximumnum.f64(double %Val0, double %Val1)
+  declare x86_fp80  @llvm.maximumnum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
+  declare fp128 @llvm.maximumnum.f128(fp128 %Val0, fp128 %Val1)
+  declare ppc_fp128 @llvm.maximumnum.ppcf128(ppc_fp128 %Val0, ppc_fp128 
%Val1)
+
+Overview:
+"
+
+The '``llvm.maximumnum.*``' intrinsics return the maximum of the two
+arguments, not propagating NaNs and treating -0.0 as less than +0.0.
+
+
+Arguments:
+""
+
+The arguments and return value are floating-point numbers of the same
+type.
+
+Semantics:
+""
+If both operands are NaNs (including sNaN), returns qNaN. If one operand
+is NaN (including sNaN) and another operand is a number, return the number.
+Otherwise returns the greater of the two arguments. -0.0 is considered to
+be less than +0.0 for this intrinsic.
+
+Note that these are the semantics of minimumNumber specified in IEEE 754-2019.

arsenm wrote:

Copy paste error minimumNumber. Also like above, this is not the signaling nan 
behavior (where the behavior inverts from quiet nan) 

https://github.com/llvm/llvm-project/pull/93841
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-06-07 Thread Matt Arsenault via cfe-commits


@@ -16055,6 +16145,90 @@ of the two arguments. -0.0 is considered to be less 
than +0.0 for this
 intrinsic. Note that these are the semantics specified in the draft of
 IEEE 754-2019.
 
+.. _i_minimumnum:
+
+'``llvm.minimumnum.*``' Intrinsic
+^
+
+Syntax:
+"""
+
+This is an overloaded intrinsic. You can use ``llvm.minimumnum`` on any
+floating-point or vector of floating-point type. Not all targets support
+all types however.
+
+::
+
+  declare float @llvm.minimumnum.f32(float %Val0, float %Val1)
+  declare double@llvm.minimumnum.f64(double %Val0, double %Val1)
+  declare x86_fp80  @llvm.minimumnum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
+  declare fp128 @llvm.minimumnum.f128(fp128 %Val0, fp128 %Val1)
+  declare ppc_fp128 @llvm.minimumnum.ppcf128(ppc_fp128 %Val0, ppc_fp128 
%Val1)
+
+Overview:
+"
+
+The '``llvm.minimumnum.*``' intrinsics return the minimum of the two
+arguments, not propagating NaNs and treating -0.0 as less than +0.0.
+
+
+Arguments:
+""
+
+The arguments and return value are floating-point numbers of the same
+type.
+
+Semantics:
+""
+If both operands are NaNs (including sNaN), returns qNaN. If one operand
+is NaN (including sNaN) and another operand is a number, return the number.
+Otherwise returns the lesser of the two arguments. -0.0 is considered to
+be less than +0.0 for this intrinsic.
+
+Note that these are the semantics of minimumNumber specified in IEEE 754-2019.
+
+.. _i_maximumnum:
+
+'``llvm.maximumnum.*``' Intrinsic
+^
+
+Syntax:
+"""
+
+This is an overloaded intrinsic. You can use ``llvm.maximumnum`` on any
+floating-point or vector of floating-point type. Not all targets support
+all types however.
+
+::
+
+  declare float @llvm.maximumnum.f32(float %Val0, float %Val1)
+  declare double@llvm.maximumnum.f64(double %Val0, double %Val1)
+  declare x86_fp80  @llvm.maximumnum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
+  declare fp128 @llvm.maximumnum.f128(fp128 %Val0, fp128 %Val1)
+  declare ppc_fp128 @llvm.maximumnum.ppcf128(ppc_fp128 %Val0, ppc_fp128 
%Val1)
+
+Overview:
+"
+
+The '``llvm.maximumnum.*``' intrinsics return the maximum of the two
+arguments, not propagating NaNs and treating -0.0 as less than +0.0.
+
+
+Arguments:
+""
+
+The arguments and return value are floating-point numbers of the same
+type.
+
+Semantics:
+""
+If both operands are NaNs (including sNaN), returns qNaN. If one operand
+is NaN (including sNaN) and another operand is a number, return the number.
+Otherwise returns the greater of the two arguments. -0.0 is considered to
+be less than +0.0 for this intrinsic.
+
+Note that these are the semantics of minimumNumber specified in IEEE 754-2019.

arsenm wrote:

Should also state the explicit difference here, on the intrinsic, the 
comparison to minnum 

https://github.com/llvm/llvm-project/pull/93841
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-06-07 Thread Matt Arsenault via cfe-commits


@@ -15874,6 +15874,96 @@ The returned value is completely identical to the 
input except for the sign bit;
 in particular, if the input is a NaN, then the quiet/signaling bit and payload
 are perfectly preserved.
 
+.. _i_fminmax_family:
+
+'``llvm.min.*``' Intrinsics Comparation
+^^^
+
+Standard:
+"
+
+IEEE754 and ISO C define some min/max operations, and they have some 
differences
+on working with qNaN/sNaN and +0.0/-0.0. Here is the list:
+
+.. list-table::
+   :header-rows: 2
+
+   * - ``ISO C``
+ - fmin/fmax
+ - none
+ - fmininum/fmaximum
+ - fminimum_num/fmaximum_num
+
+   * - ``IEEE754``
+ - none
+ - nimNUM/maxNUM (2008)
+ - minimum/maximum (2019)
+ - minimumNumber/maximumNumber (2019)
+
+   * - ``+0.0 vs -0.0``
+ - either one
+ - +0.0 > -0.0
+ - +0.0 > -0.0
+ - +0.0 > -0.0
+
+   * - ``NUM vs sNaN``
+ - qNaN, invalid excpetion
+ - qNaN, invalid excpetion
+ - qNaN, invalid excpetion
+ - NUM, invalid excpetion

arsenm wrote:

Typo exception throughout 

https://github.com/llvm/llvm-project/pull/93841
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-06-07 Thread Matt Arsenault via cfe-commits


@@ -15874,6 +15874,96 @@ The returned value is completely identical to the 
input except for the sign bit;
 in particular, if the input is a NaN, then the quiet/signaling bit and payload
 are perfectly preserved.
 
+.. _i_fminmax_family:
+
+'``llvm.min.*``' Intrinsics Comparation
+^^^
+
+Standard:
+"
+
+IEEE754 and ISO C define some min/max operations, and they have some 
differences
+on working with qNaN/sNaN and +0.0/-0.0. Here is the list:
+
+.. list-table::
+   :header-rows: 2
+
+   * - ``ISO C``
+ - fmin/fmax
+ - none
+ - fmininum/fmaximum
+ - fminimum_num/fmaximum_num
+
+   * - ``IEEE754``
+ - none
+ - nimNUM/maxNUM (2008)
+ - minimum/maximum (2019)
+ - minimumNumber/maximumNumber (2019)
+
+   * - ``+0.0 vs -0.0``
+ - either one
+ - +0.0 > -0.0
+ - +0.0 > -0.0
+ - +0.0 > -0.0
+
+   * - ``NUM vs sNaN``
+ - qNaN, invalid excpetion
+ - qNaN, invalid excpetion
+ - qNaN, invalid excpetion
+ - NUM, invalid excpetion
+
+   * - ``NUM vs qNaN``
+ - NUM, no excpetion
+ - NUM, no excpetion
+ - qNaN, no excpetion
+ - NUM, no excpetion
+

arsenm wrote:

cover nan vs. nan case 

https://github.com/llvm/llvm-project/pull/93841
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-06-07 Thread Matt Arsenault via cfe-commits


@@ -16055,6 +16145,90 @@ of the two arguments. -0.0 is considered to be less 
than +0.0 for this
 intrinsic. Note that these are the semantics specified in the draft of
 IEEE 754-2019.
 
+.. _i_minimumnum:
+
+'``llvm.minimumnum.*``' Intrinsic
+^
+
+Syntax:
+"""
+
+This is an overloaded intrinsic. You can use ``llvm.minimumnum`` on any
+floating-point or vector of floating-point type. Not all targets support
+all types however.
+
+::
+
+  declare float @llvm.minimumnum.f32(float %Val0, float %Val1)
+  declare double@llvm.minimumnum.f64(double %Val0, double %Val1)
+  declare x86_fp80  @llvm.minimumnum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
+  declare fp128 @llvm.minimumnum.f128(fp128 %Val0, fp128 %Val1)
+  declare ppc_fp128 @llvm.minimumnum.ppcf128(ppc_fp128 %Val0, ppc_fp128 
%Val1)
+
+Overview:
+"
+
+The '``llvm.minimumnum.*``' intrinsics return the minimum of the two
+arguments, not propagating NaNs and treating -0.0 as less than +0.0.
+
+
+Arguments:
+""
+
+The arguments and return value are floating-point numbers of the same
+type.
+
+Semantics:
+""
+If both operands are NaNs (including sNaN), returns qNaN. If one operand
+is NaN (including sNaN) and another operand is a number, return the number.
+Otherwise returns the lesser of the two arguments. -0.0 is considered to
+be less than +0.0 for this intrinsic.
+
+Note that these are the semantics of minimumNumber specified in IEEE 754-2019.

arsenm wrote:

This is not the IEEE semantics for signaling nan. For a signaling nan, returns 
a quiet nan, not the non-non operand 

https://github.com/llvm/llvm-project/pull/93841
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> "aggregates" here might even be unusual cases like `<4 x i8>`

Vectors aren't aggregates and are more reasonable 

https://github.com/llvm/llvm-project/pull/94576
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> `voffset` and `soffset` are "offset that goes in VGPRs" and "offset that goes 
> in SGPRs", with the latter having some different bounds-checking semantics on 
> ... at least some of the gfx9's, IIRC.
> 

Right, that's the problem. We need to know the parameters of the SRD in order 
to make use of the scalar offset. Ideally we would have one pointer operand and 
be able to addressing mode match into soffset/voffset/imm 

https://github.com/llvm/llvm-project/pull/94576
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> 2. What I mean is that "types that work" isn't the right framing: any type 
> can be legalized to one or more types that work. That is, down in the isel 
> legalizer, if I call for, for example
>```llvm
>%0 = call {i64, i64, i8} @llvm.amdgcn.raw.buffer.ptr.load(ptr addrspace(8) 
> %rsrc, i32 %off, ...)
>```

Handling arbitrary aggregates here isn't really reasonable or necessary. We can 
restrict this to a reasonable set of legal-ish types 


https://github.com/llvm/llvm-project/pull/94576
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> 1. For the swizzled case, that's `struct.ptr.buffer.*`, and yeah, those will 
> always need builtins because LLVM can't deal in 2D addressing schemes

But the raw buffer intrinsics have both the soffset and voffset parameters 
though? Not just the struct 



https://github.com/llvm/llvm-project/pull/94576
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> Actually, even ignoring address space 7, it feels like these builtins if you 
> could `raw.ptr.buffer.store` any type you liked, and then they could be 
> type-varying in Clang?

We could either have a builtin for all the types that would work, or if we want 
to treat them more like a normal pointer, clang could verify you only use them 
with types that will work 

https://github.com/llvm/llvm-project/pull/94576
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [APFloat] Add APFloat support for FP6 data types (PR #94735)

2024-06-07 Thread Matt Arsenault via cfe-commits


@@ -68,6 +68,10 @@ enum class fltNonfiniteBehavior {
   // `fltNanEncoding` enum. We treat all NaNs as quiet, as the available
   // encodings do not distinguish between signalling and quiet NaN.
   NanOnly,
+
+  // This behavior is present in Float6E3M2FN and Float6E2M3FN types.
+  // There is no representation for Inf or NaN.
+  NoNanInf,

arsenm wrote:

Invert and call SupportsNonFinite? 

https://github.com/llvm/llvm-project/pull/94735
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [APFloat] Add APFloat support for FP6 data types (PR #94735)

2024-06-07 Thread Matt Arsenault via cfe-commits


@@ -878,6 +896,10 @@ void IEEEFloat::copySignificand(const IEEEFloat ) {
for the significand.  If double or longer, this is a signalling NaN,
which may not be ideal.  If float, this is QNaN(0).  */
 void IEEEFloat::makeNaN(bool SNaN, bool Negative, const APInt *fill) {
+  if (semantics->nonFiniteBehavior == fltNonfiniteBehavior::NoNanInf) {
+assert(false && "This floating point format does not support NaN\n");
+return;

arsenm wrote:

llvm_unreachable, also no dead return 

https://github.com/llvm/llvm-project/pull/94735
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Add timeout for GPU detection utilities (PR #94751)

2024-06-07 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/94751
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Add timeout for GPU detection utilities (PR #94751)

2024-06-07 Thread Matt Arsenault via cfe-commits


@@ -205,7 +205,7 @@ class ToolChain {
 
   /// Executes the given \p Executable and returns the stdout.
   llvm::Expected>
-  executeToolChainProgram(StringRef Executable) const;
+  executeToolChainProgram(StringRef Executable, unsigned Timeout = 0) const;

arsenm wrote:

Name this SecondsToWait to match ExecuteAndWait? 

https://github.com/llvm/llvm-project/pull/94751
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [APFloat] Add APFloat support for FP6 data types (PR #94735)

2024-06-07 Thread Matt Arsenault via cfe-commits


@@ -1881,6 +1890,20 @@ TEST(APFloatTest, getSmallest) {
   EXPECT_TRUE(test.isFiniteNonZero());
   EXPECT_TRUE(test.isDenormal());
   EXPECT_TRUE(test.bitwiseIsEqual(expected));
+
+  test = APFloat::getSmallest(APFloat::Float6E3M2FN(), false);
+  expected = APFloat(APFloat::Float6E3M2FN(), "0x0.1p0");
+  EXPECT_FALSE(test.isNegative());
+  EXPECT_TRUE(test.isFiniteNonZero());
+  EXPECT_TRUE(test.isDenormal());
+  EXPECT_TRUE(test.bitwiseIsEqual(expected));
+
+  test = APFloat::getSmallest(APFloat::Float6E2M3FN(), false);
+  expected = APFloat(APFloat::Float6E2M3FN(), "0x0.2p0");
+  EXPECT_FALSE(test.isNegative());
+  EXPECT_TRUE(test.isFiniteNonZero());
+  EXPECT_TRUE(test.isDenormal());
+  EXPECT_TRUE(test.bitwiseIsEqual(expected));
 }
 

arsenm wrote:

Should also test getZero and the other special case constructors 

https://github.com/llvm/llvm-project/pull/94735
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [APFloat] Add APFloat support for FP6 data types (PR #94735)

2024-06-07 Thread Matt Arsenault via cfe-commits


@@ -47,6 +47,10 @@ static std::string convertToString(double d, unsigned Prec, 
unsigned Pad,
   return std::string(Buffer.data(), Buffer.size());
 }
 
+static bool hasNanOrInf(APFloat::Semantics S) {
+  return (S != APFloat::S_Float6E3M2FN) && (S != APFloat::S_Float6E2M3FN);
+}

arsenm wrote:

Probably would be useful as a helper somewhere in APFloat or Semantics 

https://github.com/llvm/llvm-project/pull/94735
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   3   4   5   6   7   8   9   10   >