[PATCH] D153953: Revert "[AMDGPU] Mark mbcnt as convergent"

2023-06-29 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment.

In D153953#4458386 , @foad wrote:

> In D153953#4458134 , @sameerds 
> wrote:
>
>> @pravinjagtap @arsenm ... reverting the mbcnt intrinsic affects tests that 
>> were added for atomic optimizations. In particular, the mbcnt is now being 
>> moved across/into/out of control flow because it is no longer convergent. I 
>> eyeballed one example and it seemed okay to me, but a more thorough check 
>> will be useful.
>
> They are just being moved from before the loop to after the loop. This is 
> fine. It is even a bit weird that the atomic optimizer pass emits them before 
> the loop in the first place.

@foad just to respect the process, are you okay with approving the review 
again? I had set it back to "needs review".


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153953/new/

https://reviews.llvm.org/D153953

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153953: Revert "[AMDGPU] Mark mbcnt as convergent"

2023-06-29 Thread Jay Foad via Phabricator via cfe-commits
foad added a comment.

In D153953#4458134 , @sameerds wrote:

> @pravinjagtap @arsenm ... reverting the mbcnt intrinsic affects tests that 
> were added for atomic optimizations. In particular, the mbcnt is now being 
> moved across/into/out of control flow because it is no longer convergent. I 
> eyeballed one example and it seemed okay to me, but a more thorough check 
> will be useful.

They are just being moved from before the loop to after the loop. This is fine. 
It is even a bit weird that the atomic optimizer pass emits them before the 
loop in the first place.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153953/new/

https://reviews.llvm.org/D153953

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153953: Revert "[AMDGPU] Mark mbcnt as convergent"

2023-06-29 Thread Pravin Jagtap via Phabricator via cfe-commits
pravinjagtap added a comment.

In D153953#4458134 , @sameerds wrote:

> @pravinjagtap @arsenm ... reverting the mbcnt intrinsic affects tests that 
> were added for atomic optimizations. In particular, the mbcnt is now being 
> moved across/into/out of control flow because it is no longer convergent. I 
> eyeballed one example and it seemed okay to me, but a more thorough check 
> will be useful.

Hello @sameerds,

The existing logic (before D147408 ) in 
atomic optimizer uses `mbcnt(exec)` to setup the control flow to allow only one 
lane to update the reduced value [Please refer: 
https://github.com/llvm/llvm-project/blob/80155cbf0be1744953edf68b9729c24bd0de79c4/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp#L671].
 The instructions related to that seems to be moved to other basic block 
(ComputeEnd). In my opinion it should not affect the logic. @ruiling and @foad 
can you please confirm this ?

While working on D147408 , I introduced 
cttz(exec) [Please refer: 
https://github.com/llvm/llvm-project/blob/80155cbf0be1744953edf68b9729c24bd0de79c4/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp#L538C1-L539C72]
 which is not being affected here. The `EXEC` is being moved to sgpr before the 
`ComputeLoop` (which we want) and then used it inside the `ComputeLoop` and 
modified inside it to set up the logic for `ComputeLoop`. This is not being 
affected because of this change.

I could be completely wrong here. Lets see what others have to say.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153953/new/

https://reviews.llvm.org/D153953

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153953: Revert "[AMDGPU] Mark mbcnt as convergent"

2023-06-28 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds requested review of this revision.
sameerds added a comment.

@pravinjagtap @arsenm ... reverting the mbcnt intrinsic affects tests that were 
added for atomic optimizations. In particular, the mbcnt is now being moved 
across/into/out of control flow because it is no longer convergent. I eyeballed 
one example and it seemed okay to me, but a more thorough check will be useful.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153953/new/

https://reviews.llvm.org/D153953

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153953: Revert "[AMDGPU] Mark mbcnt as convergent"

2023-06-28 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl accepted this revision.
yaxunl added a comment.
This revision is now accepted and ready to land.

I checked the ISA manual about mbcnt and I agree that its value should only 
depend on thread position in warp and not on CFG.

It is possible that the side effect of preventing it from merging fixed 
something by accident, but marking mbcnt as convergent seems not right. 
Therefore I agree to revert that patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153953/new/

https://reviews.llvm.org/D153953

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153953: Revert "[AMDGPU] Mark mbcnt as convergent"

2023-06-28 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment.

In D153953#4455794 , @yaxunl wrote:

> Marking mbcnt as convergent, together with https://reviews.llvm.org/D144756 
> prevent mbcnt to be merged, which fixed the reported issue.
>
> Do you have an alternative fix for the issue?

I completely disagree with this line of thought. The change to mbcnt is 
fundamentally incorrect and not related to the issue. There is no ground to ask 
for changes to this revision. Why was the change to mbcnt committed if was not 
an actual fix for anything?

Also, please note that the issue itself is invalid. I have put a comment in 
github explaining the same. The original program itself is invalid.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153953/new/

https://reviews.llvm.org/D153953

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153953: Revert "[AMDGPU] Mark mbcnt as convergent"

2023-06-28 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl requested changes to this revision.
yaxunl added a comment.
This revision now requires changes to proceed.

Marking mbcnt as convergent, together with https://reviews.llvm.org/D144756 
prevent mbcnt to be merged, which fixed the reported issue.

Do you have an alternative fix for the issue?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153953/new/

https://reviews.llvm.org/D153953

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153953: Revert "[AMDGPU] Mark mbcnt as convergent"

2023-06-28 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds created this revision.
Herald added subscribers: kerbowa, tpr, dstuttard, yaxunl, jvesely, kzhuravl.
Herald added a project: All.
sameerds requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, wdng.
Herald added projects: clang, LLVM.

This reverts commit 37114036aa57e53217a57afacd7f47b36114edfb 
.

The output of mbcnt does not depend on other active lanes, and hence it is not
convergent. The original change was made as a possible fix for

https://github.com/ROCm-Developer-Tools/HIP/issues/3172

But changing mbcnt does not fix that issue.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D153953

Files:
  clang/test/CodeGenOpenCL/builtins-amdgcn.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td


Index: llvm/include/llvm/IR/IntrinsicsAMDGPU.td
===
--- llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -1833,12 +1833,12 @@
 def int_amdgcn_mbcnt_lo :
   ClangBuiltin<"__builtin_amdgcn_mbcnt_lo">,
   DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],
-   [IntrNoMem, IntrConvergent]>;
+   [IntrNoMem]>;
 
 def int_amdgcn_mbcnt_hi :
   ClangBuiltin<"__builtin_amdgcn_mbcnt_hi">,
   DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],
-[IntrNoMem, IntrConvergent]>;
+[IntrNoMem]>;
 
 // llvm.amdgcn.ds.swizzle src offset
 def int_amdgcn_ds_swizzle :
Index: clang/test/CodeGenOpenCL/builtins-amdgcn.cl
===
--- clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+++ clang/test/CodeGenOpenCL/builtins-amdgcn.cl
@@ -690,14 +690,12 @@
 
 // CHECK-LABEL: @test_mbcnt_lo(
 // CHECK: call i32 @llvm.amdgcn.mbcnt.lo(i32 %src0, i32 %src1)
-// CHECK: declare i32 @llvm.amdgcn.mbcnt.lo(i32, i32) #[[$MBCNT_ATTRS:[0-9]+]]
 kernel void test_mbcnt_lo(global uint* out, uint src0, uint src1) {
   *out = __builtin_amdgcn_mbcnt_lo(src0, src1);
 }
 
 // CHECK-LABEL: @test_mbcnt_hi(
 // CHECK: call i32 @llvm.amdgcn.mbcnt.hi(i32 %src0, i32 %src1)
-// CHECK: declare i32 @llvm.amdgcn.mbcnt.hi(i32, i32) #[[$MBCNT_ATTRS]]
 kernel void test_mbcnt_hi(global uint* out, uint src0, uint src1) {
   *out = __builtin_amdgcn_mbcnt_hi(src0, src1);
 }
@@ -834,7 +832,6 @@
 // CHECK-DAG: [[$WS_RANGE]] = !{i16 1, i16 1025}
 // CHECK-DAG: attributes #[[$NOUNWIND_READONLY]] = { mustprogress nocallback 
nofree nosync nounwind willreturn memory(read) }
 // CHECK-DAG: attributes #[[$READ_EXEC_ATTRS]] = { convergent }
-// CHECK-DAG: attributes #[[$MBCNT_ATTRS]] = {{.* convergent .*}}
 // CHECK-DAG: ![[$EXEC]] = !{!"exec"}
 // CHECK-DAG: ![[$EXEC_LO]] = !{!"exec_lo"}
 // CHECK-DAG: ![[$EXEC_HI]] = !{!"exec_hi"}


Index: llvm/include/llvm/IR/IntrinsicsAMDGPU.td
===
--- llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -1833,12 +1833,12 @@
 def int_amdgcn_mbcnt_lo :
   ClangBuiltin<"__builtin_amdgcn_mbcnt_lo">,
   DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],
-   [IntrNoMem, IntrConvergent]>;
+   [IntrNoMem]>;
 
 def int_amdgcn_mbcnt_hi :
   ClangBuiltin<"__builtin_amdgcn_mbcnt_hi">,
   DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],
-[IntrNoMem, IntrConvergent]>;
+[IntrNoMem]>;
 
 // llvm.amdgcn.ds.swizzle src offset
 def int_amdgcn_ds_swizzle :
Index: clang/test/CodeGenOpenCL/builtins-amdgcn.cl
===
--- clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+++ clang/test/CodeGenOpenCL/builtins-amdgcn.cl
@@ -690,14 +690,12 @@
 
 // CHECK-LABEL: @test_mbcnt_lo(
 // CHECK: call i32 @llvm.amdgcn.mbcnt.lo(i32 %src0, i32 %src1)
-// CHECK: declare i32 @llvm.amdgcn.mbcnt.lo(i32, i32) #[[$MBCNT_ATTRS:[0-9]+]]
 kernel void test_mbcnt_lo(global uint* out, uint src0, uint src1) {
   *out = __builtin_amdgcn_mbcnt_lo(src0, src1);
 }
 
 // CHECK-LABEL: @test_mbcnt_hi(
 // CHECK: call i32 @llvm.amdgcn.mbcnt.hi(i32 %src0, i32 %src1)
-// CHECK: declare i32 @llvm.amdgcn.mbcnt.hi(i32, i32) #[[$MBCNT_ATTRS]]
 kernel void test_mbcnt_hi(global uint* out, uint src0, uint src1) {
   *out = __builtin_amdgcn_mbcnt_hi(src0, src1);
 }
@@ -834,7 +832,6 @@
 // CHECK-DAG: [[$WS_RANGE]] = !{i16 1, i16 1025}
 // CHECK-DAG: attributes #[[$NOUNWIND_READONLY]] = { mustprogress nocallback nofree nosync nounwind willreturn memory(read) }
 // CHECK-DAG: attributes #[[$READ_EXEC_ATTRS]] = { convergent }
-// CHECK-DAG: attributes #[[$MBCNT_ATTRS]] = {{.* convergent .*}}
 // CHECK-DAG: ![[$EXEC]] = !{!"exec"}
 // CHECK-DAG: ![[$EXEC_LO]] = !{!"exec_lo"}
 // CHECK-DAG: ![[$EXEC_HI]] = !{!"exec_hi"}
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits