[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-24 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang] Improve diagnostics for constraints of inline asm (NFC) (PR #96363)

2024-06-24 Thread Matt Arsenault via cfe-commits
@@ -2626,14 +2629,20 @@ void CodeGenFunction::EmitAsmStmt(const AsmStmt ) { SmallVector OutputConstraintInfos; SmallVector InputConstraintInfos; + const FunctionDecl *FD = dyn_cast_or_null(CurCodeDecl); arsenm wrote: I think we should just get rid of

[clang] [clang] Improve diagnostics for constraints of inline asm (NFC) (PR #96363)

2024-06-24 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: It's really unfortunate to have to add all this asm handling to clang. Can't it rely on backend diagnostic remarks for this? https://github.com/llvm/llvm-project/pull/96363 ___ cfe-commits mailing list

[clang] [clang] Improve diagnostics for constraints of inline asm (NFC) (PR #96363)

2024-06-24 Thread Matt Arsenault via cfe-commits
@@ -2626,14 +2629,20 @@ void CodeGenFunction::EmitAsmStmt(const AsmStmt ) { SmallVector OutputConstraintInfos; SmallVector InputConstraintInfos; + const FunctionDecl *FD = dyn_cast_or_null(CurCodeDecl); arsenm wrote: Where do you get dyn_cast_or_null is

[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)

2024-06-24 Thread Matt Arsenault via cfe-commits
arsenm wrote: > Kindly review only the top commit here If you're going to repost with a pre-commit, it would be better to have all the pieces squashed into one. Also you could look into using graphite or SPR for managing dependent pull requests

[clang] [llvm] AMDGPU: Remove ds atomic fadd intrinsics (PR #95396)

2024-06-23 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/95396 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] AMDGPU: Start selecting buffer fat pointer atomicrmw fmin/fmax (PR #95593)

2024-06-23 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/95593 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (PR #95592)

2024-06-23 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/95592 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (PR #95592)

2024-06-23 Thread Matt Arsenault via cfe-commits
arsenm wrote: ### Merge activity * **Jun 23, 4:06 AM EDT**: @arsenm started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/95592). https://github.com/llvm/llvm-project/pull/95592

[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-22 Thread Matt Arsenault via cfe-commits
arsenm wrote: Incrementing by align is just a bug, of course the size is the real value. Whether we want to continue wasting space is another not-correctness discussion https://github.com/llvm/llvm-project/pull/96370 ___ cfe-commits mailing list

[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-22 Thread Matt Arsenault via cfe-commits
arsenm wrote: > Here, because the minimum alignment is 4, we will only increment the buffer by 4, It should be incrementing by the size? 4 byte aligned access of 8 byte type should work fine https://github.com/llvm/llvm-project/pull/96370 ___

[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-06-22 Thread Matt Arsenault via cfe-commits
@@ -1671,6 +1671,7 @@ int main(int Argc, char **Argv) { NewArgv.push_back(Arg->getValue()); for (const opt::Arg *Arg : Args.filtered(OPT_offload_opt_eq_minus)) NewArgv.push_back(Args.MakeArgString(StringRef("-") + Arg->getValue())); + llvm::errs() << "asdfasdf\n";

[clang] [Clang] Replace `emitXXXBuiltin` with a unified interface (PR #96313)

2024-06-21 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/96313 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits
@@ -149,6 +149,12 @@ BUILTIN(__builtin_amdgcn_mqsad_pk_u16_u8, "WUiWUiUiWUi", "nc") BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc") BUILTIN(__builtin_amdgcn_make_buffer_rsrc, "Qbv*sii", "nc") +BUILTIN(__builtin_amdgcn_raw_buffer_store_b8, "vcQbiiIi", "n")

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits
@@ -149,6 +149,12 @@ BUILTIN(__builtin_amdgcn_mqsad_pk_u16_u8, "WUiWUiUiWUi", "nc") BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc") BUILTIN(__builtin_amdgcn_make_buffer_rsrc, "Qbv*sii", "nc") +BUILTIN(__builtin_amdgcn_raw_buffer_store_b8, "vcQbiiIi", "n")

[clang] [Clang] Replace `emitXXXBuiltin` with a unified interface (PR #96313)

2024-06-21 Thread Matt Arsenault via cfe-commits
@@ -581,49 +581,19 @@ static Value *emitCallMaybeConstrainedFPBuiltin(CodeGenFunction , return CGF.Builder.CreateCall(F, Args); } -// Emit a simple mangled intrinsic that has 1 argument and a return type -// matching the argument type. -static Value

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits
@@ -149,6 +149,19 @@ BUILTIN(__builtin_amdgcn_mqsad_pk_u16_u8, "WUiWUiUiWUi", "nc") BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc") BUILTIN(__builtin_amdgcn_make_buffer_rsrc, "Qbv*sii", "nc") +BUILTIN(__builtin_amdgcn_raw_ptr_buffer_store_i8, "vcQbiiIi", "n")

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits
@@ -149,6 +149,19 @@ BUILTIN(__builtin_amdgcn_mqsad_pk_u16_u8, "WUiWUiUiWUi", "nc") BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc") BUILTIN(__builtin_amdgcn_make_buffer_rsrc, "Qbv*sii", "nc") +BUILTIN(__builtin_amdgcn_raw_ptr_buffer_store_i8, "vcQbiiIi", "n")

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits
@@ -626,6 +626,18 @@ static Value *emitQuaternaryBuiltin(CodeGenFunction , const CallExpr *E, return CGF.Builder.CreateCall(F, {Src0, Src1, Src2, Src3}); } +static Value *emitQuinaryBuiltin(CodeGenFunction , const CallExpr *E, arsenm wrote: The name is a

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/94576 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits
@@ -149,6 +149,19 @@ BUILTIN(__builtin_amdgcn_mqsad_pk_u16_u8, "WUiWUiUiWUi", "nc") BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc") BUILTIN(__builtin_amdgcn_make_buffer_rsrc, "Qbv*sii", "nc") +BUILTIN(__builtin_amdgcn_raw_ptr_buffer_store_i8, "vcQbiiIi", "n")

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: I'm wondering if we should really have all the different typed variants, and if this should be the name. I guess https://github.com/llvm/llvm-project/pull/94576 ___ cfe-commits mailing list

[clang] Enable ASAN in amdgpu toolchain for OpenCL (PR #96262)

2024-06-21 Thread Matt Arsenault via cfe-commits
@@ -169,6 +180,11 @@ // COMMON-UNSAFE-MATH-SAME: "-mlink-builtin-bitcode" "{{.*}}/amdgcn/bitcode/oclc_finite_only_off.bc" // COMMON-UNSAFE-MATH-SAME: "-mlink-builtin-bitcode" "{{.*}}/amdgcn/bitcode/oclc_correctly_rounded_sqrt_off.bc" +// ASAN-SAME: "-fsanitize=address" +

[clang] Enable ASAN in amdgpu toolchain for OpenCL (PR #96262)

2024-06-21 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/96262 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] Enable ASAN in amdgpu toolchain for OpenCL (PR #96262)

2024-06-21 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/96262 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] AMDGPU: Remove ds atomic fadd intrinsics (PR #95396)

2024-06-18 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/95396 >From f0f8e09caff2df5632d4252ca354b24c0c6f0e87 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 10 Jun 2024 19:48:13 +0200 Subject: [PATCH] AMDGPU: Remove ds atomic fadd intrinsics These have been

[clang] [llvm] AMDGPU: Remove ds atomic fadd intrinsics (PR #95396)

2024-06-18 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/95396 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-18 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/95395 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-18 Thread Matt Arsenault via cfe-commits
@@ -33,6 +33,7 @@ // q -> Scalable vector, followed by the number of elements and the base type. // Q -> target builtin type, followed by a character to distinguish the builtin type //Qa -> AArch64 svcount_t builtin type. +//Qb -> AMDGPU __amdgpu_buffer_rsrc_t

[clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-18 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/95395 >From 35c741fe2563094bc20c179ee9f244620025405c Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 10 Jun 2024 19:40:59 +0200 Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins We should

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-18 Thread Matt Arsenault via cfe-commits
@@ -33,6 +33,7 @@ // q -> Scalable vector, followed by the number of elements and the base type. // Q -> target builtin type, followed by a character to distinguish the builtin type //Qa -> AArch64 svcount_t builtin type. +//Qb -> AMDGPU __amdgpu_buffer_rsrc_t

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-18 Thread Matt Arsenault via cfe-commits
@@ -19082,6 +19082,15 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, CGM.getIntrinsic(Intrinsic::amdgcn_s_sendmsg_rtn, {ResultType}); return Builder.CreateCall(F, {Arg}); } + case AMDGPU::BI__builtin_amdgcn_make_buffer_rsrc: { +

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. LGTM but I'm not a frontend expert https://github.com/llvm/llvm-project/pull/94830 ___ cfe-commits mailing list cfe-commits@lists.llvm.org

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,11 @@ +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -verify -triple amdgcn-amd-amdhsa -Wno-unused-value %s arsenm wrote: set explicit -cl-std, and check 1.2 and 2.0? https://github.com/llvm/llvm-project/pull/94830

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -305,43 +304,43 @@ bool SIAnnotateControlFlow::handleLoop(BranchInst *Term) { } /// Close the last opened control flow -bool SIAnnotateControlFlow::closeControlFlow(BasicBlock *BB) { - llvm::Loop *L = LI->getLoopFor(BB); +bool

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1 @@ +remark: :0:0: removing function 'needs_extimg': +extended-image-insts is not supported on the current target arsenm wrote: accidentally added file? https://github.com/llvm/llvm-project/pull/92809 ___

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -15740,6 +15740,32 @@ void SITargetLowering::finalizeLowering(MachineFunction ) const { } } + // ISel inserts copy to regs for the successor PHIs + // at the BB end. We need to move the SI_WAVE_RECONVERGE right before the arsenm wrote: Can you

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -2103,12 +2103,36 @@ bool SIInstrInfo::expandPostRAPseudo(MachineInstr ) const { MI.setDesc(get(AMDGPU::S_MOV_B64)); break; + case AMDGPU::S_CMOV_B64_term: +// This is only a terminator to get the correct spill code placement during +// register

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -1,3 +1,4 @@ +; XFAIL: * arsenm wrote: can't just xfail tests https://github.com/llvm/llvm-project/pull/92809 ___ cfe-commits mailing list cfe-commits@lists.llvm.org

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -3172,8 +3172,8 @@ def int_amdgcn_loop : Intrinsic<[llvm_i1_ty], [llvm_anyint_ty], [IntrWillReturn, IntrNoCallback, IntrNoFree] >; -def int_amdgcn_end_cf : Intrinsic<[], [llvm_anyint_ty], - [IntrWillReturn, IntrNoCallback, IntrNoFree]>; +def int_amdgcn_wave_reconverge :

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -15740,6 +15740,32 @@ void SITargetLowering::finalizeLowering(MachineFunction ) const { } } + // ISel inserts copy to regs for the successor PHIs + // at the BB end. We need to move the SI_WAVE_RECONVERGE right before the + // branch. + for (auto : MF) { +

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/92809 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm requested changes to this pull request. There are quite a few code quality regressions, and XFAILed tests. The description needs more elaboration on what the strategy is here https://github.com/llvm/llvm-project/pull/92809

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang][CodeGen] Add query for a target's flat address space (PR #95728)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -1764,6 +1764,13 @@ class TargetInfo : public TransferrableTargetInfo, return 0; } + /// \returns Target specific flat ptr address space; a flat ptr is a ptr that + /// can be casted to / from all other target address spaces. If the target + /// exposes no such

[clang] [Clang] [WIP] Added builtin_alloca support for OpenCL1.2 and below (PR #95750)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,86 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5 +// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 -emit-llvm -o - | FileCheck --check-prefix=OPENCL12 %s +// RUN: %clang_cc1 %s -O0 -triple

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,84 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature + // REQUIRES: amdgpu-registered-target + // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s + //

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,84 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature + // REQUIRES: amdgpu-registered-target + // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s + //

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,9 @@ + arsenm wrote: Extra blank line https://github.com/llvm/llvm-project/pull/94830 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,17 @@ +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -fsyntax-only -verify -std=gnu++11 -triple amdgcn -Wno-unused-value %s + arsenm wrote: We probably want another similar sema test for OpenCL/HIP/OpenMP

[clang] [clang][CodeGen] Add query for a target's flat address space (PR #95728)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -1764,6 +1764,13 @@ class TargetInfo : public TransferrableTargetInfo, return 0; } + /// \returns Target specific flat ptr address space; a flat ptr is a ptr that + /// can be casted to / from all other target address spaces. If the target + /// exposes no such

[clang] [Clang] [WIP] Added builtin_alloca support for OpenCL1.2 and below (PR #95750)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,86 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5 +// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 -emit-llvm -o - | FileCheck --check-prefix=OPENCL12 %s +// RUN: %clang_cc1 %s -O0 -triple

[clang] [Clang] [WIP] Added builtin_alloca support for OpenCL1.2 and below (PR #95750)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,86 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5 +// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 -emit-llvm -o - | FileCheck --check-prefix=OPENCL12 %s +// RUN: %clang_cc1 %s -O0 -triple

[clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-15 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/95395 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,65 @@ +; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s + +; CHECK-LABEL: name:basic_readfirstlane_i64 +; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} =

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-14 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,21 @@ +//===-- AMDGPUTypes.def - Metadata about AMDGPU types ---*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier:

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-14 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,84 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature + // REQUIRES: amdgpu-registered-target + // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s + //

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-14 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,21 @@ +//===-- AMDGPUTypes.def - Metadata about AMDGPU types ---*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier:

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,65 @@ +; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s + +; CHECK-LABEL: name:basic_readfirstlane_i64 +; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} =

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits
@@ -6129,13 +6150,55 @@ static SDValue lowerLaneOp(const SITargetLowering , SDNode *N, if (ValSize % 32 != 0) return SDValue(); + auto unrollLaneOp = [, ](SDNode *N) -> SDValue { +EVT VT = N->getValueType(0); +unsigned NE = VT.getVectorNumElements(); +EVT

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,65 @@ +; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s + +; CHECK-LABEL: name:basic_readfirstlane_i64 +; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} =

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-13 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,69 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py + // REQUIRES: amdgpu-registered-target + // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s + // RUN: %clang_cc1 -triple

[clang] [clang-tools-extra] [compiler-rt] [flang] [libc] [lld] [lldb] [llvm] [mlir] [openmp] [llvm-project] Fix typo "seperate" (PR #95373)

2024-06-13 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/95373 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [Offload] Introduce the concept of "default streams" (PR #95371)

2024-06-13 Thread Matt Arsenault via cfe-commits
@@ -1125,6 +1125,22 @@ void Clang::AddPreprocessingOptions(Compilation , const JobAction , CmdArgs.push_back("__clang_openmp_device_functions.h"); } + if (Args.hasArg(options::OPT_foffload_via_llvm)) { +// Add llvm_wrappers/* to our system include path. This lets

[clang] [llvm] [Offload] Introduce the concept of "default streams" (PR #95371)

2024-06-13 Thread Matt Arsenault via cfe-commits
@@ -1125,6 +1125,22 @@ void Clang::AddPreprocessingOptions(Compilation , const JobAction , CmdArgs.push_back("__clang_openmp_device_functions.h"); } + if (Args.hasArg(options::OPT_foffload_via_llvm)) { +// Add llvm_wrappers/* to our system include path. This lets

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-13 Thread Matt Arsenault via cfe-commits
arsenm wrote: > Just a note - and maybe this was already discussed above - is there good > reason not to explicitly make this type a 128-bit scalar? The LLVM data > layout already does this I thought this was the 160 bit version? Can we have an opaque-but-sized type? The concern is exposing

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-13 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,95 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu verde -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-13 Thread Matt Arsenault via cfe-commits
arsenm wrote: > I understand the chance of conflict is low. It may be like the chance of > hitting by a meteor. However, if we prefix with `__amdgcn_`, there is no such > risk. And we have the benefit to clearly indicate it is a amdgcn > target-specific type. Should use amdgpu

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-12 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,95 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu verde -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-12 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,14 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple

[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-12 Thread Matt Arsenault via cfe-commits
@@ -128,12 +128,13 @@ enum class CudaArch { GFX12_GENERIC, GFX1200, GFX1201, + AMDGCNSPIRV, Generic, // A processor model named 'generic' if the target backend defines a // public one. LAST, CudaDefault = CudaArch::SM_52, - HIPDefault =

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Matt Arsenault via cfe-commits
arsenm wrote: > Or drop the new nodes altogether and legelaize to intrinsics directly ? That's another option. The only real plus to the intermediate is it's slightly less annoying to write combines for. But there are limited combining opportunities for these

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,46 @@ +# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -verify-machineinstrs -o /dev/null %s 2>&1 | FileCheck %s arsenm wrote: I'd still test all 3, but yes an IR test https://github.com/llvm/llvm-project/pull/89217

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,46 @@ +# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -verify-machineinstrs -o /dev/null %s 2>&1 | FileCheck %s arsenm wrote: You should not need to introduce any new machine verifier tests, they are not useful. The useful test would be the

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-10 Thread Matt Arsenault via cfe-commits
@@ -2201,6 +2207,9 @@ TypeInfo ASTContext::getTypeInfoImpl(const Type *T) const { Align = 8; \ break; #include "clang/Basic/WebAssemblyReferenceTypes.def" +case BuiltinType::AMDGPUBufferRsrc: +

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-10 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,11 @@ +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -fsyntax-only -verify -triple amdgcn -Wno-unused-value %s + +void foo() { + int n = 100; + __buffer_rsrc_t v = 0; // expected-error {{cannot initialize a variable of type '__buffer_rsrc_t' with an

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-10 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,11 @@ +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -fsyntax-only -verify -triple amdgcn -Wno-unused-value %s + +void foo() { + int n = 100; + __buffer_rsrc_t v = 0; // expected-error {{cannot initialize a variable of type '__buffer_rsrc_t' with an

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-10 Thread Matt Arsenault via cfe-commits
@@ -2200,6 +2206,9 @@ TypeInfo ASTContext::getTypeInfoImpl(const Type *T) const { Align = 8; \ break; #include "clang/Basic/WebAssemblyReferenceTypes.def" +case BuiltinType::AMDGPUBufferRsrc: +

[clang] [llvm] [Offload][CUDA] Add initial cuda_runtime.h overlay (PR #94821)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,30 @@ +// RUN: %clang++ -foffload-via-llvm --offload-arch=native %s -o %t +// RUN: %t | %fcheck-generic + +// UNSUPPORTED: aarch64-unknown-linux-gnu +// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// UNSUPPORTED: x86_64-pc-linux-gnu +// UNSUPPORTED:

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,9 @@ +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -fclang-abi-compat=latest -triple amdgcn %s -emit-llvm -o - | FileCheck %s arsenm wrote: Why do you need -fclang-abi-compat=latest https://github.com/llvm/llvm-project/pull/94830

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,21 @@ +//===-- AMDGPUTypes.def - Metadata about AMDGPU types ---*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier:

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/94830 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -2200,6 +2206,9 @@ TypeInfo ASTContext::getTypeInfoImpl(const Type *T) const { Align = 8; \ break; #include "clang/Basic/WebAssemblyReferenceTypes.def" +case BuiltinType::AMDGPUBufferRsrc: +

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: Need stacked PR that adds the make_buffer_rsrc builtin that shows its use https://github.com/llvm/llvm-project/pull/94830 ___ cfe-commits mailing list cfe-commits@lists.llvm.org

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -1091,6 +1091,9 @@ enum PredefinedTypeIDs { // \brief WebAssembly reference types with auto numeration #define WASM_TYPE(Name, Id, SingletonId) PREDEF_TYPE_##Id##_ID, #include "clang/Basic/WebAssemblyReferenceTypes.def" +// \breif AMDGPU types with auto numeration

[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -16055,6 +16145,90 @@ of the two arguments. -0.0 is considered to be less than +0.0 for this intrinsic. Note that these are the semantics specified in the draft of IEEE 754-2019. +.. _i_minimumnum: + +'``llvm.minimumnum.*``' Intrinsic +^ +

[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -16055,6 +16145,90 @@ of the two arguments. -0.0 is considered to be less than +0.0 for this intrinsic. Note that these are the semantics specified in the draft of IEEE 754-2019. +.. _i_minimumnum: + +'``llvm.minimumnum.*``' Intrinsic +^ +

[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -15874,6 +15874,96 @@ The returned value is completely identical to the input except for the sign bit; in particular, if the input is a NaN, then the quiet/signaling bit and payload are perfectly preserved. +.. _i_fminmax_family: + +'``llvm.min.*``' Intrinsics Comparation

[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -15874,6 +15874,96 @@ The returned value is completely identical to the input except for the sign bit; in particular, if the input is a NaN, then the quiet/signaling bit and payload are perfectly preserved. +.. _i_fminmax_family: + +'``llvm.min.*``' Intrinsics Comparation

[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -16055,6 +16145,90 @@ of the two arguments. -0.0 is considered to be less than +0.0 for this intrinsic. Note that these are the semantics specified in the draft of IEEE 754-2019. +.. _i_minimumnum: + +'``llvm.minimumnum.*``' Intrinsic +^ +

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Matt Arsenault via cfe-commits
arsenm wrote: > "aggregates" here might even be unusual cases like `<4 x i8>` Vectors aren't aggregates and are more reasonable https://github.com/llvm/llvm-project/pull/94576 ___ cfe-commits mailing list cfe-commits@lists.llvm.org

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Matt Arsenault via cfe-commits
arsenm wrote: > `voffset` and `soffset` are "offset that goes in VGPRs" and "offset that goes > in SGPRs", with the latter having some different bounds-checking semantics on > ... at least some of the gfx9's, IIRC. > Right, that's the problem. We need to know the parameters of the SRD in

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Matt Arsenault via cfe-commits
arsenm wrote: > 2. What I mean is that "types that work" isn't the right framing: any type > can be legalized to one or more types that work. That is, down in the isel > legalizer, if I call for, for example >```llvm >%0 = call {i64, i64, i8} @llvm.amdgcn.raw.buffer.ptr.load(ptr

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Matt Arsenault via cfe-commits
arsenm wrote: > 1. For the swizzled case, that's `struct.ptr.buffer.*`, and yeah, those will > always need builtins because LLVM can't deal in 2D addressing schemes But the raw buffer intrinsics have both the soffset and voffset parameters though? Not just the struct

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Matt Arsenault via cfe-commits
arsenm wrote: > Actually, even ignoring address space 7, it feels like these builtins if you > could `raw.ptr.buffer.store` any type you liked, and then they could be > type-varying in Clang? We could either have a builtin for all the types that would work, or if we want to treat them more

[clang] [llvm] [APFloat] Add APFloat support for FP6 data types (PR #94735)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -68,6 +68,10 @@ enum class fltNonfiniteBehavior { // `fltNanEncoding` enum. We treat all NaNs as quiet, as the available // encodings do not distinguish between signalling and quiet NaN. NanOnly, + + // This behavior is present in Float6E3M2FN and Float6E2M3FN types.

[clang] [llvm] [APFloat] Add APFloat support for FP6 data types (PR #94735)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -878,6 +896,10 @@ void IEEEFloat::copySignificand(const IEEEFloat ) { for the significand. If double or longer, this is a signalling NaN, which may not be ideal. If float, this is QNaN(0). */ void IEEEFloat::makeNaN(bool SNaN, bool Negative, const APInt *fill) { +

[clang] [Clang] Add timeout for GPU detection utilities (PR #94751)

2024-06-07 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/94751 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang] Add timeout for GPU detection utilities (PR #94751)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -205,7 +205,7 @@ class ToolChain { /// Executes the given \p Executable and returns the stdout. llvm::Expected> - executeToolChainProgram(StringRef Executable) const; + executeToolChainProgram(StringRef Executable, unsigned Timeout = 0) const; arsenm

  1   2   3   4   5   6   7   8   9   10   >