[PATCH] D148796: [AMDGPU][GFX908] Add builtin support for global add atomic f16/f32

2023-04-20 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec requested changes to this revision. rampitec added a comment. This revision now requires changes to proceed. We used to support it that way and decided just not doing it. It is very hard to explain why a supported atomic results in error. Someone who really needs it can use intrinsic.

[PATCH] D147732: [AMDGPU] Add f32 permlane{16, x16} builtin variants

2023-04-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D147732#4267553 , @foad wrote: > Changing the existing intrinsics to use type mangling could break clients > like LLPC and Mesa. I've put up a patch for LLPC to protect it against this > change:

[PATCH] D147732: [AMDGPU] Add f32 permlane{16, x16} builtin variants

2023-04-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D147732#4249584 , @jrbyrnes wrote: > In D147732#4249567 , @rampitec > wrote: > >> Isn't it simpler to lower it to an existing int intrinsic and casts in clang? > > Thanks for your

[PATCH] D147732: [AMDGPU] Add f32 permlane{16, x16} builtin variants

2023-04-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. Isn't it simpler to lower it to an existing int intrinsic and casts in clang? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D147732/new/ https://reviews.llvm.org/D147732 ___

[PATCH] D146840: [AMDGPU] Replace target feature for global fadd32

2023-03-28 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. LGTM. Please wait for @b-sumner. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D146840/new/ https://reviews.llvm.org/D146840

[PATCH] D146840: [AMDGPU] Replace target feature for global fadd32

2023-03-28 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. Can you please also add gfx90a and gfx940 tests? Otherwise LGTM *if* @b-sumner has no objections. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D146840/new/ https://reviews.llvm.org/D146840

[PATCH] D146840: [AMDGPU] Replace target feature for global fadd32

2023-03-27 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec requested changes to this revision. rampitec added a comment. This revision now requires changes to proceed. You cannot just enable it on gfx908 which does not have return version of it. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-15 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D142507#4127505 , @b-sumner wrote: >> My current understanding is the c-p will go into already forked clang-16, >> but not to rocm 5.4. So rocm device-libs will be accompanied by the older >> clang-16 w/o this and stay

[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D142507#4127421 , @b-sumner wrote: > I have no objection to backporting this, but it may need to be accompanied > with a device-libs patch, and I don't know where that patch would be checked > in. The ROCm-Device-Libs in

[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D142507#4127374 , @aaronmondal wrote: > I think unless conflicts arise creating an issue similar to this > https://github.com/llvm/llvm-project/issues/60600 with the `cherry-pick` line > set to this commit should be

[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D142507#4127275 , @aaronmondal wrote: >> I cannot say there was much choice. The only real choice was to postpone the >> split and magnify the problem in the future. As for the ifdefs, this might >> be possible in the

[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D142507#4127167 , @aaronmondal wrote: > Well, I can already feel the pain that distro maintainers having to build the > next ROCm releases  > > I wonder what the better course of action is here: > > 1. Port this patch to

[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D142507#4126864 , @aaronmondal wrote: >> It shall be complimented by the device-lib change in the corresponding >> release, so it is not that simple. > > @rampitec I'm not sure I understand. Does this mean that this is

[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D142507#4125940 , @aaronmondal wrote: > Would it be possible to backport this to Clang 16? > > If > https://github.com/RadeonOpenCompute/ROCm-Device-Libs/commit/8dc779e19cbf2ccfd3307b60f7db57cf4203a5be > makes it into ROCm

[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-01-26 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. rampitec marked an inline comment as done. Closed by commit rGdf0488369d32: [AMDGPU] Split dot7 feature (authored by rampitec). Herald added a project: clang. Herald

[PATCH] D142407: [AMDGPU] Split dot8 feature

2023-01-24 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG870b92977e89: [AMDGPU] Split dot8 feature (authored by rampitec). Herald added a project: clang. Herald added a subscriber: cfe-commits.

[PATCH] D142493: [AMDGPU] Remove dot1 and dot6 features from clang for gfx11

2023-01-24 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG4ab2246d486b: [AMDGPU] Remove dot1 and dot6 features from clang for gfx11 (authored by rampitec). Herald added a project: clang. Herald added a

[PATCH] D133966: [AMDGPU] Added __builtin_amdgcn_ds_bvh_stack_rtn

2022-09-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGe540965915a4: [AMDGPU] Added __builtin_amdgcn_ds_bvh_stack_rtn (authored by rampitec). Herald added a project: clang. Herald added a subscriber:

[PATCH] D129908: [AMDGPU] Support for gfx940 fp8 smfmac

2022-07-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG523a99c0eb03: [AMDGPU] Support for gfx940 fp8 smfmac (authored by rampitec). Herald added a project: clang. Herald added a subscriber: cfe-commits.

[PATCH] D129906: [AMDGPU] Support for gfx940 fp8 mfma

2022-07-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG2695f0a688e9: [AMDGPU] Support for gfx940 fp8 mfma (authored by rampitec). Herald added a project: clang. Herald added a subscriber: cfe-commits.

[PATCH] D129902: [AMDGPU] Support for gfx940 fp8 conversions

2022-07-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG9fa5a6b7e8a2: [AMDGPU] Support for gfx940 fp8 conversions (authored by rampitec). Herald added a project: clang. Herald added a subscriber:

[PATCH] D128952: [AMDGPU] Add WMMA clang builtins

2022-06-30 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D128952/new/ https://reviews.llvm.org/D128952

[PATCH] D127904: [AMDGPU] gfx11 new dot instruction codegen support

2022-06-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D127904/new/ https://reviews.llvm.org/D127904

[PATCH] D127904: [AMDGPU] gfx11 new dot instruction codegen support

2022-06-15 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl:1 // REQUIRES: amdgpu-registered-target Also need positive tests like in builtins-amdgcn-dl-insts.cl. Comment at:

[PATCH] D124700: [AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic

2022-05-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision. rampitec added a comment. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D124700/new/ https://reviews.llvm.org/D124700 ___ cfe-commits mailing list

[PATCH] D124700: [AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic

2022-04-29 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. In D124700#3483715 , @kerbowa wrote: > In D124700#3483633 , @rampitec > wrote: > >> In D124700#3483609

[PATCH] D124700: [AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic

2022-04-29 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D124700#3483609 , @kerbowa wrote: > In D124700#3483556 , @rampitec > wrote: > >> You do not handle masks other than 0 yet? > > We handle 0 and 1 only. Do you mean 1 is supported

[PATCH] D124700: [AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic

2022-04-29 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. You do not handle masks other than 0 yet? Comment at: llvm/include/llvm/IR/IntrinsicsAMDGPU.td:219 +// MASK = 0: No instructions may be scheduled across SCHED_BARRIER. +// MASK = 1: Non-memory, non-side-effect producing instructions may be +//

[PATCH] D123825: clang/AMDGPU: Define macro for -munsafe-fp-atomics

2022-04-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. Thanks! CHANGES SINCE LAST ACTION https://reviews.llvm.org/D123825/new/ https://reviews.llvm.org/D123825 ___ cfe-commits mailing list

[PATCH] D122191: [AMDGPU] Support gfx940 smfmac instructions

2022-03-24 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG6e3e14f600af: [AMDGPU] Support gfx940 smfmac instructions (authored by rampitec). Herald added subscribers: cfe-commits, hsmhsm. Herald added a

[PATCH] D122044: [AMDGPU] New gfx940 mfma instructions

2022-03-24 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG27439a764230: [AMDGPU] New gfx940 mfma instructions (authored by rampitec). Herald added a project: clang. Herald added a subscriber: cfe-commits.

[PATCH] D121172: [AMDGPU] Set noclobber metadata on loads instead of cast to constant

2022-03-07 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG9eabea396814: [AMDGPU] Set noclobber metadata on loads instead of cast to constant (authored by rampitec). Herald added a project: clang. Herald

[PATCH] D121028: [AMDGPU] new gfx940 fp atomics

2022-03-07 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG932f628121d8: [AMDGPU] new gfx940 fp atomics (authored by rampitec). Herald added a project: clang. Herald added a subscriber: cfe-commits.

[PATCH] D120846: [AMDGPU] Add gfx1036 target

2022-03-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D120846/new/ https://reviews.llvm.org/D120846

[PATCH] D120846: [AMDGPU] Add gfx1036 target

2022-03-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. Looks like cuda-bad-arch.cu does not have any gfx10. Let's fix this in a separate followup patch. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D120846/new/ https://reviews.llvm.org/D120846

[PATCH] D120846: [AMDGPU] Add gfx1036 target

2022-03-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. You also need to rebase it, I have just landed gfx940 target. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D120846/new/ https://reviews.llvm.org/D120846 ___ cfe-commits mailing

[PATCH] D120846: [AMDGPU] Add gfx1036 target

2022-03-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. Please also update these 2 files: clang/test/Driver/cuda-bad-arch.cu openmp/libomptarget/DeviceRTL/CMakeLists.txt In fact the last one was not updated before too, so the last target gfx1031 there. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D120688: [AMDGPU] Add gfx940 target

2022-03-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG2e2e64df4a4f: [AMDGPU] Add gfx940 target (authored by rampitec). Herald added projects: clang, OpenMP. Herald added subscribers: openmp-commits,

[PATCH] D119886: [AMDGPU] Promote recursive loads from kernel argument to constant

2022-02-17 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGb0aa1946dfe1: [AMDGPU] Promote recursive loads from kernel argument to constant (authored by rampitec). Herald added a project: clang. Herald added

[PATCH] D115032: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args

2021-12-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D115032/new/ https://reviews.llvm.org/D115032

[PATCH] D112041: [InferAddressSpaces] Support assumed addrspaces from addrspace predicates.

2021-10-20 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D112041#3074418 , @hliao wrote: > In D112041#3073676 , @rampitec > wrote: > >> In D112041#3073637 , @hliao wrote: >> >>> In D112041#3073560

[PATCH] D112041: [InferAddressSpaces] Support assumed addrspaces from addrspace predicates.

2021-10-19 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D112041#3073637 , @hliao wrote: > In D112041#3073560 , @rampitec > wrote: > >> Is there anything to remove assume() call after address space is inferred? >> We do not need it

[PATCH] D112041: [InferAddressSpaces] Support assumed addrspaces from addrspace predicates.

2021-10-19 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. Is there anything to remove assume() call after address space is inferred? We do not need it anymore. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D112041/new/ https://reviews.llvm.org/D112041

[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12194 - return (fpModeMatchesGlobalFPAtomicMode(RMW) || - RMW->getFunction() - ->getFnAttribute("amdgpu-unsafe-fp-atomics") -

[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D108150#2950479 , @gandhi21299 wrote: > My understanding is that since we are reporting unsafe expansion into hw > instructions, `fpModeMatchesGlobalFPAtomicMode(RMW)` must be false to match > the logic. Please run

[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D108150#2950458 , @gandhi21299 wrote: > @rampitec Which part of the logic is wrong? Still the same around LDS. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D108150/new/

[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec requested changes to this revision. rampitec added a comment. This revision now requires changes to proceed. Logic is still wrong. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D108150/new/ https://reviews.llvm.org/D108150

[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12195 + if (!fpModeMatchesGlobalFPAtomicMode(RMW)) +return reportUnsafeHWInst(RMW, AtomicExpansionKind::None); gandhi21299 wrote: > rampitec wrote: > > rampitec

[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-17 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl:9 +// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple=amdgcn-amd-amdhsa -target-cpu gfx90a \ +// RUN: -Rpass=si-lower -munsafe-fp-atomics %s -S -o - 2>&1 | \

[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-17 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl:9 +// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple=amdgcn-amd-amdhsa -target-cpu gfx90a \ +// RUN: -Rpass=si-lower -munsafe-fp-atomics %s -S -o - 2>&1 | \ You are

[PATCH] D108150: [Remarks] Emit optimization remarks for atomics generating hardware instructions

2021-08-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. - Add [AMDGPU] to the title. - Rebase on top of D106891 . - Add tests to atomics-remarks-gfx90a.ll as well, including LDS with matching and non-matching rounding mode. Comment at:

[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision. rampitec added a comment. LGTM, but please wait for others too. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106891/new/ https://reviews.llvm.org/D106891 ___

[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:175 + ORE = std::make_unique(); auto = TPC->getTM(); gandhi21299 wrote: > rampitec wrote: > > Is there a reason to construct it upfront and not just use a local variable >

[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:175 + ORE = std::make_unique(); auto = TPC->getTM(); Is there a reason to construct it upfront and not just use a local variable only when needed? Like in

[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-15 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106891/new/ https://reviews.llvm.org/D106891

[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-15 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl:33 +float atomic_cas(__global atomic_float *d, float a) { + return __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_work_group); +} Just combine

[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-15 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl:32 +// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("workgroup-one-as") monotonic +float atomic_cas_system(__global atomic_float *d, float a) { + return

[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. Please restore opencl test. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:622 +return OptimizationRemark(DEBUG_TYPE, "Passed", AI->getFunction()) + << "A compare and swap loop was generated for an " + <<

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. Please retitle it without AMDGPU and remove the changes to pass ORE to targets. It is not a part of this change, it is a part of the folloup target specific change. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: clang/test/CodeGenCUDA/fp-atomics-optremarks.cu:10 + +// GFX90A-CAS: A compare and swap loop was generated for an atomic operation at system memory scope +// GFX90A-CAS-LABEL: _Z14atomic_add_casPf gandhi21299 wrote: >

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: clang/test/CodeGenCUDA/fp-atomics-optremarks.cu:10 + +// GFX90A-CAS: A compare and swap loop was generated for an atomic operation at system memory scope +// GFX90A-CAS-LABEL: _Z14atomic_add_casPf gandhi21299 wrote: >

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: clang/test/CodeGenCUDA/fp-atomics-optremarks.cu:10 + +// GFX90A-CAS: A compare and swap loop was generated for an atomic operation at system memory scope +// GFX90A-CAS-LABEL: _Z14atomic_add_casPf gandhi21299 wrote: >

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:618 expandAtomicRMWToCmpXchg(AI, createCmpXchgInstFun); + Ctx.getSyncScopeNames(SSNs); + auto MemScope = SSNs[AI->getSyncScopeID()].empty() Only if SSNs.empty().

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:631 +"at " + << (AI->getSyncScopeID() ? "system" : "single thread") + << " memory scope"); gandhi21299 wrote: > rampitec wrote:

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:631 +"at " + << (AI->getSyncScopeID() ? "system" : "single thread") + << " memory scope"); gandhi21299 wrote: > rampitec wrote:

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:631 +"at " + << (AI->getSyncScopeID() ? "system" : "single thread") + << " memory scope"); gandhi21299 wrote: > rampitec wrote:

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:629 + Remark << "A compare and swap loop was generated for an atomic " +"operation " +"at " Need to name the operation.

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. You also need to retitle it now, it is not about AMDGPU and not about FP. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106891/new/ https://reviews.llvm.org/D106891 ___

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D106891#2940411 , @gandhi21299 wrote: > - eliminated unsafe hardware remarks in SIISelLowering.cpp Most of this patch is not needed now. You do not need to pass ORE to targets, it is a part of the next patch.

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-11 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D106891#2938128 , @gandhi21299 wrote: > @rampitec besides the remarks, am I missing anything else in the patch? You should not use AMD specific code in the common code. Repository: rG LLVM Github Monorepo CHANGES SINCE

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-10 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:628 + AI, Kind, + Remark << "A hardware CAS loop generated: if the memory is " +"known to be coarse-grain allocated then a hardware " Still

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-09 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12120 + OptimizationRemarkEmitter *ORE, + OptimizationRemark OptRemark) { + ORE->emit([&]() { return OptRemark; }); gandhi21299 wrote: >

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12120 + OptimizationRemarkEmitter *ORE, + OptimizationRemark OptRemark) { + ORE->emit([&]() { return OptRemark; }); gandhi21299 wrote: >

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec requested changes to this revision. rampitec added inline comments. This revision now requires changes to proceed. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:588 + Remark + << "A hardware CAS loop generated: if the memory is " + "known

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: clang/test/CodeGenOpenCL/fp-atomics-optremarks-gfx90a.cl:23 + +// GFX90A-HW: A floating-point atomic instruction will generate an unsafe hardware instruction which may fail to update memory [-Rpass=si-lower] +// GFX90A-HW-LABEL:

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12118 -TargetLowering::AtomicExpansionKind -SITargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *RMW) const { +TargetLowering::AtomicExpansionKind SITargetLowering::reportAtomicExpand(

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12155 + OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction()); + Remark << "A floating-point atomic instruction with no following use" +"

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12155 + OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction()); + Remark << "A floating-point atomic instruction with no following use" +"

[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106909/new/ https://reviews.llvm.org/D106909

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12146 +OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction()); +Remark << "A floating-point atomic instruction will generate an unsafe" + "

[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: clang/test/CodeGenOpenCL/builtins-fp-atomics-unsupported-gfx7.cl:8 +} \ No newline at end of file Add new line. Comment at: clang/test/CodeGenOpenCL/unsupported-fadd2f16-gfx908.cl:1 +// REQUIRES:

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics in GFX90A

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. The title should not mention gfx90a, it is not true. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106891/new/ https://reviews.llvm.org/D106891 ___ cfe-commits mailing list

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics in GFX90A

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12139 +OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction()); +Remark << "A hardware instruction was generated"; +return Remark;

[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: clang/include/clang/Basic/BuiltinsAMDGPU.def:201 +TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_f32, "ff*1f", "t", "gfx90a-insts") +TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_v2f16, "V2hV2h*1V2h", "t", "gfx90a-insts")

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. JBTW, patch title is way too long. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106891/new/ https://reviews.llvm.org/D106891 ___ cfe-commits mailing list

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D106891#2925692 , @gandhi21299 wrote: > - eliminated the scope argument as per discussion > - added more tests You have updated wrong patch. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:595 + OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction()); + Remark << "A hardware instruction was generated"; + return Remark; Nothing was generated

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. It still does not do anything useful and still produces useless, wrong and misleading remarks for all targets. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106891/new/ https://reviews.llvm.org/D106891

[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16270 +llvm::Function *F = CGM.getIntrinsic(IID, {ArgTy}); +return Builder.CreateCall(F, {Addr, Val, ZeroI32, ZeroI32, ZeroI1}); + } gandhi21299 wrote: > rampitec wrote: > >

[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D106909#2923724 , @gandhi21299 wrote: > @rampitec what should I be testing exactly in the IR test? Produced call to the intrinsic. All of these tests there doing that. Repository: rG LLVM Github Monorepo CHANGES SINCE

[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D106909#2922567 , @gandhi21299 wrote: > @rampitec how do I handle the following? > > builtins-fp-atomics.cl:38:10: error: > '__builtin_amdgcn_global_atomic_fadd_f64' needs target feature > atomic-fadd-insts > *rtn =

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D106891#2921108 , @gandhi21299 wrote: > How can I construct an ORE to start off with? I don't think its appropriate > to construct it in `shouldExpandAtomicRMWInsts(RMW)` You have already constructed it. You can just pass

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D106891#2921096 , @gandhi21299 wrote: > @rampitec Since remarks cannot be emitted in SIISelLowering because it isn't > a pass, in what form can I emit the diagnostics in SIISelLowering? You could pass ORE to the TLI.

[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16270 +llvm::Function *F = CGM.getIntrinsic(IID, {ArgTy}); +return Builder.CreateCall(F, {Addr, Val, ZeroI32, ZeroI32, ZeroI1}); + } Should we map flags since we already have

[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16212 + case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: { +Intrinsic::ID IID; +llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext()); arsenm wrote: > rampitec

[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16212 + case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: { +Intrinsic::ID IID; +llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext()); arsenm wrote: > rampitec

[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments. Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16212 + case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: { +Intrinsic::ID IID; +llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext()); gandhi21299 wrote: > rampitec

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment. In D106891#2921048 , @gandhi21299 wrote: > @rampitec should the unsafe check go in some pass later in the pipeline then? No. The only place which has all the knowledge is `SITargetLowering::shouldExpandAtomicRMWInIR()`. That

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec requested changes to this revision. rampitec added a comment. This revision now requires changes to proceed. You cannot do it in a generic llvm code, it simply has no knowledge of what was the reason for BE's choice. Comment at:

[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec requested changes to this revision. rampitec added a comment. This revision now requires changes to proceed. Needs an IR test, a test for different supported targets, and a negative test for unsupported features. Comment at:

  1   2   >