[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)

2024-05-31 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

Too late to backport - no more 18.x releases are planned.

https://github.com/llvm/llvm-project/pull/90582
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)

2024-05-31 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/90582


[llvm-branch-commits] [llvm] release/18.x: [AMDGPU] Fix GFX12 encoding of s_wait_event export_ready (#89622) (PR #91034)

2024-05-10 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> Fixed encoding of AMDGPU instructions

I don't think the release notes should say that. It makes it sound like all 
encodings were wrong.

https://github.com/llvm/llvm-project/pull/91034


[llvm-branch-commits] [llvm] release/18.x: [AMDGPU] Fix GFX12 encoding of s_wait_event export_ready (#89622) (PR #91034)

2024-05-06 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request.


https://github.com/llvm/llvm-project/pull/91034


[llvm-branch-commits] [llvm] [AMDGPU] Fix setting nontemporal in memory legalizer (#83815) (PR #90204)

2024-05-02 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> Hi @jayfoad (or anyone else). If you would like to add a note about this fix 
> in the release notes (completely optional). Please reply to this comment with 
> a one or two sentence description of the fix. When you are done, please add 
> the release:note label to this PR.

I don't think this fix is particularly noteworthy. Would there already be a 
list of bugs fixed in the release notes?

https://github.com/llvm/llvm-project/pull/90204


[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)

2024-05-01 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> Let's not backport this yet since @pendingchaos has pointed out a problem 
> with #90201.

Fixed by #90710 which I have added to this PR.

https://github.com/llvm/llvm-project/pull/90582


[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)

2024-05-01 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad ready_for_review 
https://github.com/llvm/llvm-project/pull/90582


[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)

2024-05-01 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad updated 
https://github.com/llvm/llvm-project/pull/90582

>From 17b75a9517891d662e677a357713c920bb79c43c Mon Sep 17 00:00:00 2001
From: David Stuttard 
Date: Tue, 30 Apr 2024 10:41:51 +0100
Subject: [PATCH 1/2] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load
 (#90201)

image_msaa_load is actually encoded as a VSAMPLE instruction and
requires the appropriate waitcnt variant.
---
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp   |  8 --
 .../AMDGPU/llvm.amdgcn.image.msaa.load.ll | 26 +--
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 6ecb1c8bf6e1db..97c55e4d9e41c2 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -187,8 +187,12 @@ VmemType getVmemType(const MachineInstr &Inst) {
   const AMDGPU::MIMGInfo *Info = AMDGPU::getMIMGInfo(Inst.getOpcode());
   const AMDGPU::MIMGBaseOpcodeInfo *BaseInfo =
       AMDGPU::getMIMGBaseOpcodeInfo(Info->BaseOpcode);
-  return BaseInfo->BVH ? VMEM_BVH
-                       : BaseInfo->Sampler ? VMEM_SAMPLER : VMEM_NOSAMPLER;
+  // The test for MSAA here is because gfx12+ image_msaa_load is actually
+  // encoded as VSAMPLE and requires the appropriate s_waitcnt variant for that.
+  // Pre-gfx12 doesn't care since all vmem types result in the same s_waitcnt.
+  return BaseInfo->BVH ? VMEM_BVH
+         : BaseInfo->Sampler || BaseInfo->MSAA ? VMEM_SAMPLER
+                                               : VMEM_NOSAMPLER;
 }
 
 unsigned (AMDGPU::Waitcnt , InstCounterType T) {
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll
index 1348315e72e7bc..8da48551855570 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll
@@ -12,7 +12,7 @@ define amdgpu_ps <4 x float> @load_2dmsaa(<8 x i32> inreg %rsrc, i32 %s, i32 %t,
 ; GFX12-LABEL: load_2dmsaa:
 ; GFX12:   ; %bb.0: ; %main_body
 ; GFX12-NEXT:image_msaa_load v[0:3], [v0, v1, v2], s[0:7] dmask:0x1 dim:SQ_RSRC_IMG_2D_MSAA unorm ; encoding: [0x06,0x20,0x46,0xe4,0x00,0x00,0x00,0x00,0x00,0x01,0x02,0x00]
-; GFX12-NEXT:s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:; return to shader part epilog
 main_body:
   %v = call <4 x float> @llvm.amdgcn.image.msaa.load.2dmsaa.v4f32.i32(i32 1, i32 %s, i32 %t, i32 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
@@ -32,7 +32,7 @@ define amdgpu_ps <4 x float> @load_2dmsaa_both(<8 x i32> inreg %rsrc, ptr addrsp
 ; GFX12:   ; %bb.0: ; %main_body
 ; GFX12-NEXT:image_msaa_load v[0:4], [v0, v1, v2], s[0:7] dmask:0x2 dim:SQ_RSRC_IMG_2D_MSAA unorm tfe lwe ; encoding: [0x0e,0x20,0x86,0xe4,0x00,0x01,0x00,0x00,0x00,0x01,0x02,0x00]
 ; GFX12-NEXT:v_mov_b32_e32 v5, 0 ; encoding: [0x80,0x02,0x0a,0x7e]
-; GFX12-NEXT:s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:global_store_b32 v5, v4, s[8:9] ; encoding: [0x08,0x80,0x06,0xee,0x00,0x00,0x00,0x02,0x05,0x00,0x00,0x00]
 ; GFX12-NEXT:; return to shader part epilog
 main_body:
@@ -53,7 +53,7 @@ define amdgpu_ps <4 x float> @load_2darraymsaa(<8 x i32> inreg %rsrc, i32 %s, i3
 ; GFX12-LABEL: load_2darraymsaa:
 ; GFX12:   ; %bb.0: ; %main_body
 ; GFX12-NEXT:image_msaa_load v[0:3], [v0, v1, v2, v3], s[0:7] dmask:0x4 dim:SQ_RSRC_IMG_2D_MSAA_ARRAY unorm ; encoding: [0x07,0x20,0x06,0xe5,0x00,0x00,0x00,0x00,0x00,0x01,0x02,0x03]
-; GFX12-NEXT:s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:; return to shader part epilog
 main_body:
   %v = call <4 x float> @llvm.amdgcn.image.msaa.load.2darraymsaa.v4f32.i32(i32 4, i32 %s, i32 %t, i32 %slice, i32 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
@@ -73,7 +73,7 @@ define amdgpu_ps <4 x float> @load_2darraymsaa_tfe(<8 x i32> inreg %rsrc, ptr ad
 ; GFX12:   ; %bb.0: ; %main_body
 ; GFX12-NEXT:image_msaa_load v[0:4], [v0, v1, v2, v3], s[0:7] dmask:0x8 dim:SQ_RSRC_IMG_2D_MSAA_ARRAY unorm tfe ; encoding: [0x0f,0x20,0x06,0xe6,0x00,0x00,0x00,0x00,0x00,0x01,0x02,0x03]
 ; GFX12-NEXT:v_mov_b32_e32 v5, 0 ; encoding: [0x80,0x02,0x0a,0x7e]
-; GFX12-NEXT:s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:global_store_b32 v5, v4, s[8:9] ; encoding: [0x08,0x80,0x06,0xee,0x00,0x00,0x00,0x02,0x05,0x00,0x00,0x00]
 ; GFX12-NEXT:; return to shader part epilog
 main_body:
@@ -94,7 +94,7 @@ define amdgpu_ps <4 x float> @load_2dmsaa_glc(<8 x i32> inreg %rsrc, i32 %s, i32
 ; GFX12-LABEL: load_2dmsaa_glc:
 ; GFX12:   ; 

[llvm-branch-commits] [llvm] [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595) (PR #90719)

2024-05-01 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad milestoned 
https://github.com/llvm/llvm-project/pull/90719


[llvm-branch-commits] [llvm] [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595) (PR #90719)

2024-05-01 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/90719

Code to determine if a waitcnt is required before a barrier instruction only
considered S_BARRIER. gfx12 adds barrier_signal/wait, so the existing code
needs to be enhanced to look for a barrier start (which is just an S_BARRIER
for earlier architectures).
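Here is a hedged, self-contained sketch of the new check. `Opc`, `SubtargetFeatures` and `needsFullWaitBefore` are hypothetical stand-ins for the AMDGPU opcode enum, the subtarget queries (`hasAutoWaitcntBeforeBarrier()`, `supportsBackOffBarrier()`) and the condition in `generateWaitcntInstBefore()`. Only the list of opcodes comes from the patch.

```cpp
// Illustrative opcodes only; the real values come from LLVM's generated
// AMDGPU opcode tables.
enum class Opc {
  S_BARRIER,
  S_BARRIER_SIGNAL_M0,
  S_BARRIER_SIGNAL_ISFIRST_M0,
  S_BARRIER_SIGNAL_IMM,
  S_BARRIER_SIGNAL_ISFIRST_IMM,
  S_BARRIER_WAIT,
  S_NOP
};

// Same shape as SIInstrInfo::isBarrierStart() in the patch: the signal half
// of a split gfx12 barrier counts as the start; S_BARRIER_WAIT does not.
constexpr bool isBarrierStart(Opc O) {
  return O == Opc::S_BARRIER || O == Opc::S_BARRIER_SIGNAL_M0 ||
         O == Opc::S_BARRIER_SIGNAL_ISFIRST_M0 ||
         O == Opc::S_BARRIER_SIGNAL_IMM ||
         O == Opc::S_BARRIER_SIGNAL_ISFIRST_IMM;
}

// Hypothetical subtarget feature bits.
struct SubtargetFeatures {
  bool AutoWaitcntBeforeBarrier;
  bool BackOffBarrier;
};

// True when an all-zero wait must be forced before the instruction.
// This mirrors the condition guarding Waitcnt::allZero() in the patch.
constexpr bool needsFullWaitBefore(Opc O, SubtargetFeatures ST) {
  return isBarrierStart(O) && !ST.AutoWaitcntBeforeBarrier &&
         !ST.BackOffBarrier;
}
```

The old test was `MI.getOpcode() == AMDGPU::S_BARRIER`, so a gfx12 `s_barrier_signal` never matched and no wait was forced before it. With `isBarrierStart()`, the signal half of the split barrier does match. That is where the new `s_wait_storecnt 0x0` lines appear in the tests.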

>From e31113098e4669850f3ff924bead9e0fb9618f20 Mon Sep 17 00:00:00 2001
From: David Stuttard 
Date: Wed, 1 May 2024 11:37:13 +0100
Subject: [PATCH] [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12
 (#90595)

Code to determine if a waitcnt is required before a barrier instruction only
considered S_BARRIER. gfx12 adds barrier_signal/wait, so the existing code
needs to be enhanced to look for a barrier start (which is just an S_BARRIER
for earlier architectures).
---
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp   |  2 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h  | 11 ++
 .../CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll   |  2 ++
 .../AMDGPU/llvm.amdgcn.s.barrier.wait.ll  | 22 +++
 4 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 6ecb1c8bf6e1db..7a3198612f86fc 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -1832,7 +1832,7 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
   // not, we need to ensure the subtarget is capable of backing off barrier
   // instructions in case there are any outstanding memory operations that may
   // cause an exception. Otherwise, insert an explicit S_WAITCNT 0 here.
-  if (MI.getOpcode() == AMDGPU::S_BARRIER &&
+  if (TII->isBarrierStart(MI.getOpcode()) &&
       !ST->hasAutoWaitcntBeforeBarrier() && !ST->supportsBackOffBarrier()) {
     Wait = Wait.combined(
         AMDGPU::Waitcnt::allZero(ST->hasExtendedWaitCounts(), ST->hasVscnt()));
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index 1c9dacc09f8154..626d903c0c6958 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -908,6 +908,17 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
     return MI.getDesc().TSFlags & SIInstrFlags::IsNeverUniform;
   }
 
+  // Check to see if opcode is for a barrier start. Pre gfx12 this is just the
+  // S_BARRIER, but after support for S_BARRIER_SIGNAL* / S_BARRIER_WAIT we want
+  // to check for the barrier start (S_BARRIER_SIGNAL*)
+  bool isBarrierStart(unsigned Opcode) const {
+    return Opcode == AMDGPU::S_BARRIER ||
+           Opcode == AMDGPU::S_BARRIER_SIGNAL_M0 ||
+           Opcode == AMDGPU::S_BARRIER_SIGNAL_ISFIRST_M0 ||
+           Opcode == AMDGPU::S_BARRIER_SIGNAL_IMM ||
+           Opcode == AMDGPU::S_BARRIER_SIGNAL_ISFIRST_IMM;
+  }
+
   static bool doesNotReadTiedSource(const MachineInstr &MI) {
     return MI.getDesc().TSFlags & SIInstrFlags::TiedSourceNotRead;
   }
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
index a7d3115af29bff..47c021769aa56f 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
@@ -96,6 +96,7 @@ define amdgpu_kernel void @test_barrier(ptr addrspace(1) %out, i32 %size) #0 {
 ; VARIANT4-NEXT:s_wait_kmcnt 0x0
 ; VARIANT4-NEXT:v_xad_u32 v1, v0, -1, s2
 ; VARIANT4-NEXT:global_store_b32 v3, v0, s[0:1]
+; VARIANT4-NEXT:s_wait_storecnt 0x0
 ; VARIANT4-NEXT:s_barrier_signal -1
 ; VARIANT4-NEXT:s_barrier_wait -1
 ; VARIANT4-NEXT:v_ashrrev_i32_e32 v2, 31, v1
@@ -142,6 +143,7 @@ define amdgpu_kernel void @test_barrier(ptr addrspace(1) %out, i32 %size) #0 {
 ; VARIANT6-NEXT:v_dual_mov_b32 v4, s1 :: v_dual_mov_b32 v3, s0
 ; VARIANT6-NEXT:v_sub_nc_u32_e32 v1, s2, v0
 ; VARIANT6-NEXT:global_store_b32 v5, v0, s[0:1]
+; VARIANT6-NEXT:s_wait_storecnt 0x0
 ; VARIANT6-NEXT:s_barrier_signal -1
 ; VARIANT6-NEXT:s_barrier_wait -1
 ; VARIANT6-NEXT:v_ashrrev_i32_e32 v2, 31, v1
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll
index 4ab5e97964a857..38a34ec6daf73c 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll
@@ -12,6 +12,7 @@ define amdgpu_kernel void @test1_s_barrier_signal(ptr addrspace(1) %out) #0 {
 ; GCN-NEXT:v_sub_nc_u32_e32 v0, v1, v0
 ; GCN-NEXT:s_wait_kmcnt 0x0
 ; GCN-NEXT:global_store_b32 v3, v2, s[0:1]
+; GCN-NEXT:s_wait_storecnt 0x0
 ; GCN-NEXT:s_barrier_signal -1
 ; GCN-NEXT:s_barrier_wait -1
 ; GCN-NEXT:global_store_b32 v3, v0, s[0:1]
@@ -28,6 +29,7 @@ define amdgpu_kernel void @test1_s_barrier_signal(ptr addrspace(1) %out) #0 {
 ; GLOBAL-ISEL-NEXT:v_sub_nc_u32_e32 v0, v1, v0
 ; GLOBAL-ISEL-NEXT:s_wait_kmcnt 0x0
 ; GLOBAL-ISEL-NEXT:  

[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)

2024-04-30 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad converted_to_draft 
https://github.com/llvm/llvm-project/pull/90582


[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)

2024-04-30 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

Let's not backport this yet since @pendingchaos has pointed out a problem with 
#90201.

https://github.com/llvm/llvm-project/pull/90582


[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)

2024-04-30 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad milestoned 
https://github.com/llvm/llvm-project/pull/90582


[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)

2024-04-30 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/90582

On gfx12, image_msaa_load is actually encoded as a VSAMPLE instruction, so it
requires the sample-counter waitcnt variant (s_wait_samplecnt) rather than
s_wait_loadcnt.
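The following is a minimal standalone sketch of the decision `getVmemType()` now makes, and of the GFX12 wait instruction each result implies. It is not the SIInsertWaitcnts code. `ImageOpFlags`, `VmemKind`, `WaitCounter`, `classify` and `counterFor` are hypothetical names. The counter mapping assumes, following the comment in the patch, that pre-gfx12 targets use one vmcnt for every VMEM type.

```cpp
// Hypothetical, simplified model of getVmemType() after this fix.
// None of these names are LLVM's real types.
enum class VmemKind { NoSampler, Sampler, Bvh };

// Stand-in for the MIMGBaseOpcodeInfo flags the real code consults.
struct ImageOpFlags {
  bool BVH;     // BVH (ray-tracing) image instruction
  bool Sampler; // sampler-based image instruction
  bool MSAA;    // image_msaa_load
};

// MSAA loads are grouped with sampler ops because gfx12 encodes
// image_msaa_load as VSAMPLE.
constexpr VmemKind classify(ImageOpFlags F) {
  return F.BVH ? VmemKind::Bvh
         : (F.Sampler || F.MSAA) ? VmemKind::Sampler
                                 : VmemKind::NoSampler;
}

// Which wait instruction family covers an outstanding load of each kind.
enum class WaitCounter { VmCnt, LoadCnt, SampleCnt, BvhCnt };

constexpr WaitCounter counterFor(VmemKind K, bool HasExtendedWaitCounts) {
  return !HasExtendedWaitCounts   ? WaitCounter::VmCnt     // s_waitcnt vmcnt
         : K == VmemKind::Sampler ? WaitCounter::SampleCnt // s_wait_samplecnt
         : K == VmemKind::Bvh     ? WaitCounter::BvhCnt    // s_wait_bvhcnt
                                  : WaitCounter::LoadCnt;  // s_wait_loadcnt
}
```

With these definitions, `counterFor(classify({false, false, true}), true)` evaluates to `WaitCounter::SampleCnt`. That matches the `s_wait_samplecnt 0x0` the updated GFX12 checks expect after `image_msaa_load`.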


>From 17b75a9517891d662e677a357713c920bb79c43c Mon Sep 17 00:00:00 2001
From: David Stuttard 
Date: Tue, 30 Apr 2024 10:41:51 +0100
Subject: [PATCH] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201)

image_msaa_load is actually encoded as a VSAMPLE instruction and
requires the appropriate waitcnt variant.
---
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp   |  8 --
 .../AMDGPU/llvm.amdgcn.image.msaa.load.ll | 26 +--
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 6ecb1c8bf6e1db..97c55e4d9e41c2 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -187,8 +187,12 @@ VmemType getVmemType(const MachineInstr &Inst) {
   const AMDGPU::MIMGInfo *Info = AMDGPU::getMIMGInfo(Inst.getOpcode());
   const AMDGPU::MIMGBaseOpcodeInfo *BaseInfo =
       AMDGPU::getMIMGBaseOpcodeInfo(Info->BaseOpcode);
-  return BaseInfo->BVH ? VMEM_BVH
-                       : BaseInfo->Sampler ? VMEM_SAMPLER : VMEM_NOSAMPLER;
+  // The test for MSAA here is because gfx12+ image_msaa_load is actually
+  // encoded as VSAMPLE and requires the appropriate s_waitcnt variant for that.
+  // Pre-gfx12 doesn't care since all vmem types result in the same s_waitcnt.
+  return BaseInfo->BVH ? VMEM_BVH
+         : BaseInfo->Sampler || BaseInfo->MSAA ? VMEM_SAMPLER
+                                               : VMEM_NOSAMPLER;
 }
 
 unsigned (AMDGPU::Waitcnt , InstCounterType T) {
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll
index 1348315e72e7bc..8da48551855570 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll
@@ -12,7 +12,7 @@ define amdgpu_ps <4 x float> @load_2dmsaa(<8 x i32> inreg %rsrc, i32 %s, i32 %t,
 ; GFX12-LABEL: load_2dmsaa:
 ; GFX12:   ; %bb.0: ; %main_body
 ; GFX12-NEXT:image_msaa_load v[0:3], [v0, v1, v2], s[0:7] dmask:0x1 dim:SQ_RSRC_IMG_2D_MSAA unorm ; encoding: [0x06,0x20,0x46,0xe4,0x00,0x00,0x00,0x00,0x00,0x01,0x02,0x00]
-; GFX12-NEXT:s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:; return to shader part epilog
 main_body:
   %v = call <4 x float> @llvm.amdgcn.image.msaa.load.2dmsaa.v4f32.i32(i32 1, i32 %s, i32 %t, i32 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
@@ -32,7 +32,7 @@ define amdgpu_ps <4 x float> @load_2dmsaa_both(<8 x i32> inreg %rsrc, ptr addrsp
 ; GFX12:   ; %bb.0: ; %main_body
 ; GFX12-NEXT:image_msaa_load v[0:4], [v0, v1, v2], s[0:7] dmask:0x2 dim:SQ_RSRC_IMG_2D_MSAA unorm tfe lwe ; encoding: [0x0e,0x20,0x86,0xe4,0x00,0x01,0x00,0x00,0x00,0x01,0x02,0x00]
 ; GFX12-NEXT:v_mov_b32_e32 v5, 0 ; encoding: [0x80,0x02,0x0a,0x7e]
-; GFX12-NEXT:s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:global_store_b32 v5, v4, s[8:9] ; encoding: [0x08,0x80,0x06,0xee,0x00,0x00,0x00,0x02,0x05,0x00,0x00,0x00]
 ; GFX12-NEXT:; return to shader part epilog
 main_body:
@@ -53,7 +53,7 @@ define amdgpu_ps <4 x float> @load_2darraymsaa(<8 x i32> inreg %rsrc, i32 %s, i3
 ; GFX12-LABEL: load_2darraymsaa:
 ; GFX12:   ; %bb.0: ; %main_body
 ; GFX12-NEXT:image_msaa_load v[0:3], [v0, v1, v2, v3], s[0:7] dmask:0x4 dim:SQ_RSRC_IMG_2D_MSAA_ARRAY unorm ; encoding: [0x07,0x20,0x06,0xe5,0x00,0x00,0x00,0x00,0x00,0x01,0x02,0x03]
-; GFX12-NEXT:s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:; return to shader part epilog
 main_body:
   %v = call <4 x float> @llvm.amdgcn.image.msaa.load.2darraymsaa.v4f32.i32(i32 4, i32 %s, i32 %t, i32 %slice, i32 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
@@ -73,7 +73,7 @@ define amdgpu_ps <4 x float> @load_2darraymsaa_tfe(<8 x i32> inreg %rsrc, ptr ad
 ; GFX12:   ; %bb.0: ; %main_body
 ; GFX12-NEXT:image_msaa_load v[0:4], [v0, v1, v2, v3], s[0:7] dmask:0x8 dim:SQ_RSRC_IMG_2D_MSAA_ARRAY unorm tfe ; encoding: [0x0f,0x20,0x06,0xe6,0x00,0x00,0x00,0x00,0x00,0x01,0x02,0x03]
 ; GFX12-NEXT:v_mov_b32_e32 v5, 0 ; encoding: [0x80,0x02,0x0a,0x7e]
-; GFX12-NEXT:s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:global_store_b32 v5, v4, s[8:9] ; encoding: [0x08,0x80,0x06,0xee,0x00,0x00,0x00,0x02,0x05,0x00,0x00,0x00]
 ; GFX12-NEXT:; return to shader part epilog
 main_body:
@@ -94,7 +94,7 @@ define amdgpu_ps <4 x float> 

[llvm-branch-commits] [llvm] b544217 - [AMDGPU] Fix setting nontemporal in memory legalizer (#83815)

2024-04-26 Thread Jay Foad via llvm-branch-commits

Author: Mirko Brkušanin
Date: 2024-04-26T13:35:58+01:00
New Revision: b544217fb31ffafb9b072de53a28c71acc169cf8

URL: 
https://github.com/llvm/llvm-project/commit/b544217fb31ffafb9b072de53a28c71acc169cf8
DIFF: 
https://github.com/llvm/llvm-project/commit/b544217fb31ffafb9b072de53a28c71acc169cf8.diff

LOG: [AMDGPU] Fix setting nontemporal in memory legalizer (#83815)

Iterator MI can advance in insertWait(), but we need the original instruction
to set the temporal hint. Just move the hint setting before the volatile handling.

Added: 


Modified: 
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
llvm/test/CodeGen/AMDGPU/memory-legalizer-global-nontemporal.ll
llvm/test/CodeGen/AMDGPU/memory-legalizer-local-nontemporal.ll
llvm/test/CodeGen/AMDGPU/memory-legalizer-private-nontemporal.ll

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index 84b9330ef9633e..50d8bfa8750818 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -2358,6 +2358,11 @@ bool SIGfx12CacheControl::enableVolatileAndOrNonTemporal(
 
   bool Changed = false;
 
+  if (IsNonTemporal) {
+    // Set non-temporal hint for all cache levels.
+    Changed |= setTH(MI, AMDGPU::CPol::TH_NT);
+  }
+
   if (IsVolatile) {
     Changed |= setScope(MI, AMDGPU::CPol::SCOPE_SYS);
 
@@ -2370,11 +2375,6 @@ bool SIGfx12CacheControl::enableVolatileAndOrNonTemporal(
   Position::AFTER);
   }
 
-  if (IsNonTemporal) {
-    // Set non-temporal hint for all cache levels.
-    Changed |= setTH(MI, AMDGPU::CPol::TH_NT);
-  }
-
   return Changed;
 }
 

diff  --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
index a59c0394bebe20..ca7486536cf556 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
@@ -582,5 +582,170 @@ entry:
   ret void
 }
 
+define amdgpu_kernel void @flat_nontemporal_volatile_load(
+; GFX7-LABEL: flat_nontemporal_volatile_load:
+; GFX7:   ; %bb.0: ; %entry
+; GFX7-NEXT:s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX7-NEXT:s_waitcnt lgkmcnt(0)
+; GFX7-NEXT:v_mov_b32_e32 v0, s0
+; GFX7-NEXT:v_mov_b32_e32 v1, s1
+; GFX7-NEXT:flat_load_dword v2, v[0:1] glc
+; GFX7-NEXT:s_waitcnt vmcnt(0)
+; GFX7-NEXT:v_mov_b32_e32 v0, s2
+; GFX7-NEXT:v_mov_b32_e32 v1, s3
+; GFX7-NEXT:s_waitcnt lgkmcnt(0)
+; GFX7-NEXT:flat_store_dword v[0:1], v2
+; GFX7-NEXT:s_endpgm
+;
+; GFX10-WGP-LABEL: flat_nontemporal_volatile_load:
+; GFX10-WGP:   ; %bb.0: ; %entry
+; GFX10-WGP-NEXT:s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX10-WGP-NEXT:s_waitcnt lgkmcnt(0)
+; GFX10-WGP-NEXT:v_mov_b32_e32 v0, s0
+; GFX10-WGP-NEXT:v_mov_b32_e32 v1, s1
+; GFX10-WGP-NEXT:flat_load_dword v2, v[0:1] glc dlc
+; GFX10-WGP-NEXT:s_waitcnt vmcnt(0)
+; GFX10-WGP-NEXT:v_mov_b32_e32 v0, s2
+; GFX10-WGP-NEXT:v_mov_b32_e32 v1, s3
+; GFX10-WGP-NEXT:s_waitcnt lgkmcnt(0)
+; GFX10-WGP-NEXT:flat_store_dword v[0:1], v2
+; GFX10-WGP-NEXT:s_endpgm
+;
+; GFX10-CU-LABEL: flat_nontemporal_volatile_load:
+; GFX10-CU:   ; %bb.0: ; %entry
+; GFX10-CU-NEXT:s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX10-CU-NEXT:s_waitcnt lgkmcnt(0)
+; GFX10-CU-NEXT:v_mov_b32_e32 v0, s0
+; GFX10-CU-NEXT:v_mov_b32_e32 v1, s1
+; GFX10-CU-NEXT:flat_load_dword v2, v[0:1] glc dlc
+; GFX10-CU-NEXT:s_waitcnt vmcnt(0)
+; GFX10-CU-NEXT:v_mov_b32_e32 v0, s2
+; GFX10-CU-NEXT:v_mov_b32_e32 v1, s3
+; GFX10-CU-NEXT:s_waitcnt lgkmcnt(0)
+; GFX10-CU-NEXT:flat_store_dword v[0:1], v2
+; GFX10-CU-NEXT:s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: flat_nontemporal_volatile_load:
+; SKIP-CACHE-INV:   ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT:s_load_dwordx4 s[0:3], s[0:1], 0x0
+; SKIP-CACHE-INV-NEXT:s_waitcnt lgkmcnt(0)
+; SKIP-CACHE-INV-NEXT:v_mov_b32_e32 v0, s0
+; SKIP-CACHE-INV-NEXT:v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT:flat_load_dword v2, v[0:1] glc
+; SKIP-CACHE-INV-NEXT:s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT:v_mov_b32_e32 v0, s2
+; SKIP-CACHE-INV-NEXT:v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT:s_waitcnt lgkmcnt(0)
+; SKIP-CACHE-INV-NEXT:flat_store_dword v[0:1], v2
+; SKIP-CACHE-INV-NEXT:s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: flat_nontemporal_volatile_load:
+; GFX90A-NOTTGSPLIT:   ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT:s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX90A-NOTTGSPLIT-NEXT:s_waitcnt lgkmcnt(0)
+; GFX90A-NOTTGSPLIT-NEXT:v_mov_b32_e32 v0, s0
+; GFX90A-NOTTGSPLIT-NEXT:v_mov_b32_e32 v1, s1
+; GFX90A-NOTTGSPLIT-NEXT:flat_load_dword v2, v[0:1] glc
+; GFX90A-NOTTGSPLIT-NEXT:s_waitcnt vmcnt(0)

[llvm-branch-commits] [llvm] [AMDGPU] Fix setting nontemporal in memory legalizer (#83815) (PR #90204)

2024-04-26 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad milestoned 
https://github.com/llvm/llvm-project/pull/90204


[llvm-branch-commits] [llvm] [AMDGPU] Fix setting nontemporal in memory legalizer (#83815) (PR #90204)

2024-04-26 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/90204

Iterator MI can advance in insertWait(), but we need the original instruction
to set the temporal hint. Just move the hint setting before the volatile handling.
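The ordering bug can be reproduced in miniature with a `std::list`. This is a hedged sketch, not the SIMemoryLegalizer code. `Instr`, `insertWaitAfter`, `legalizeOldOrder` and `legalizeNewOrder` are hypothetical. The sketch assumes, per the description above, that the wait-insertion helper takes the iterator by reference and leaves it on the newly inserted wait.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical stand-in for a machine instruction: a name plus a flag
// playing the role of the TH_NT cache-policy bit.
struct Instr {
  std::string Name;
  bool NonTemporal;
};

using Block = std::list<Instr>;
using Iter = Block::iterator;

// Models an insertWait(..., Position::AFTER)-style helper that takes MI by
// reference. It inserts a wait after MI and leaves MI pointing at the new
// wait rather than at the original instruction.
void insertWaitAfter(Block &BB, Iter &MI) {
  assert(MI != BB.end() && "MI must refer to an instruction");
  MI = BB.insert(std::next(MI), Instr{"s_wait_loadcnt 0x0", false});
}

// Old order: volatile handling runs first, so the hint is set through an
// iterator that has already moved to the inserted wait.
void legalizeOldOrder(Block &BB, Iter MI, bool IsVolatile,
                      bool IsNonTemporal) {
  if (IsVolatile)
    insertWaitAfter(BB, MI);
  if (IsNonTemporal)
    MI->NonTemporal = true;
}

// Patched order: set the hint while MI still refers to the memory
// instruction, then handle volatile.
void legalizeNewOrder(Block &BB, Iter MI, bool IsVolatile,
                      bool IsNonTemporal) {
  if (IsNonTemporal)
    MI->NonTemporal = true;
  if (IsVolatile)
    insertWaitAfter(BB, MI);
}
```

Run both functions on a one-instruction block with both flags set. With the old order, the non-temporal flag ends up on the inserted wait. With the patched order, it stays on the load.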

>From b544217fb31ffafb9b072de53a28c71acc169cf8 Mon Sep 17 00:00:00 2001
From: Mirko Brkušanin 
Date: Mon, 4 Mar 2024 15:05:31 +0100
Subject: [PATCH] [AMDGPU] Fix setting nontemporal in memory legalizer (#83815)

Iterator MI can advance in insertWait(), but we need the original instruction
to set the temporal hint. Just move the hint setting before the volatile handling.
---
 llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp  |  10 +-
 .../memory-legalizer-flat-nontemporal.ll  | 165 ++
 .../memory-legalizer-global-nontemporal.ll| 158 ++
 .../memory-legalizer-local-nontemporal.ll | 179 +++
 .../memory-legalizer-private-nontemporal.ll   | 203 ++
 5 files changed, 710 insertions(+), 5 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index 84b9330ef9633e..50d8bfa8750818 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -2358,6 +2358,11 @@ bool SIGfx12CacheControl::enableVolatileAndOrNonTemporal(
 
   bool Changed = false;
 
+  if (IsNonTemporal) {
+    // Set non-temporal hint for all cache levels.
+    Changed |= setTH(MI, AMDGPU::CPol::TH_NT);
+  }
+
   if (IsVolatile) {
     Changed |= setScope(MI, AMDGPU::CPol::SCOPE_SYS);
 
@@ -2370,11 +2375,6 @@ bool SIGfx12CacheControl::enableVolatileAndOrNonTemporal(
   Position::AFTER);
   }
 
-  if (IsNonTemporal) {
-    // Set non-temporal hint for all cache levels.
-    Changed |= setTH(MI, AMDGPU::CPol::TH_NT);
-  }
-
   return Changed;
 }
 
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
index a59c0394bebe20..ca7486536cf556 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
@@ -582,5 +582,170 @@ entry:
   ret void
 }
 
+define amdgpu_kernel void @flat_nontemporal_volatile_load(
+; GFX7-LABEL: flat_nontemporal_volatile_load:
+; GFX7:   ; %bb.0: ; %entry
+; GFX7-NEXT:s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX7-NEXT:s_waitcnt lgkmcnt(0)
+; GFX7-NEXT:v_mov_b32_e32 v0, s0
+; GFX7-NEXT:v_mov_b32_e32 v1, s1
+; GFX7-NEXT:flat_load_dword v2, v[0:1] glc
+; GFX7-NEXT:s_waitcnt vmcnt(0)
+; GFX7-NEXT:v_mov_b32_e32 v0, s2
+; GFX7-NEXT:v_mov_b32_e32 v1, s3
+; GFX7-NEXT:s_waitcnt lgkmcnt(0)
+; GFX7-NEXT:flat_store_dword v[0:1], v2
+; GFX7-NEXT:s_endpgm
+;
+; GFX10-WGP-LABEL: flat_nontemporal_volatile_load:
+; GFX10-WGP:   ; %bb.0: ; %entry
+; GFX10-WGP-NEXT:s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX10-WGP-NEXT:s_waitcnt lgkmcnt(0)
+; GFX10-WGP-NEXT:v_mov_b32_e32 v0, s0
+; GFX10-WGP-NEXT:v_mov_b32_e32 v1, s1
+; GFX10-WGP-NEXT:flat_load_dword v2, v[0:1] glc dlc
+; GFX10-WGP-NEXT:s_waitcnt vmcnt(0)
+; GFX10-WGP-NEXT:v_mov_b32_e32 v0, s2
+; GFX10-WGP-NEXT:v_mov_b32_e32 v1, s3
+; GFX10-WGP-NEXT:s_waitcnt lgkmcnt(0)
+; GFX10-WGP-NEXT:flat_store_dword v[0:1], v2
+; GFX10-WGP-NEXT:s_endpgm
+;
+; GFX10-CU-LABEL: flat_nontemporal_volatile_load:
+; GFX10-CU:   ; %bb.0: ; %entry
+; GFX10-CU-NEXT:s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX10-CU-NEXT:s_waitcnt lgkmcnt(0)
+; GFX10-CU-NEXT:v_mov_b32_e32 v0, s0
+; GFX10-CU-NEXT:v_mov_b32_e32 v1, s1
+; GFX10-CU-NEXT:flat_load_dword v2, v[0:1] glc dlc
+; GFX10-CU-NEXT:s_waitcnt vmcnt(0)
+; GFX10-CU-NEXT:v_mov_b32_e32 v0, s2
+; GFX10-CU-NEXT:v_mov_b32_e32 v1, s3
+; GFX10-CU-NEXT:s_waitcnt lgkmcnt(0)
+; GFX10-CU-NEXT:flat_store_dword v[0:1], v2
+; GFX10-CU-NEXT:s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: flat_nontemporal_volatile_load:
+; SKIP-CACHE-INV:   ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT:s_load_dwordx4 s[0:3], s[0:1], 0x0
+; SKIP-CACHE-INV-NEXT:s_waitcnt lgkmcnt(0)
+; SKIP-CACHE-INV-NEXT:v_mov_b32_e32 v0, s0
+; SKIP-CACHE-INV-NEXT:v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT:flat_load_dword v2, v[0:1] glc
+; SKIP-CACHE-INV-NEXT:s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT:v_mov_b32_e32 v0, s2
+; SKIP-CACHE-INV-NEXT:v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT:s_waitcnt lgkmcnt(0)
+; SKIP-CACHE-INV-NEXT:flat_store_dword v[0:1], v2
+; SKIP-CACHE-INV-NEXT:s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: flat_nontemporal_volatile_load:
+; GFX90A-NOTTGSPLIT:   ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT:s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX90A-NOTTGSPLIT-NEXT:s_waitcnt lgkmcnt(0)
+; GFX90A-NOTTGSPLIT-NEXT:v_mov_b32_e32 v0, s0
+; GFX90A-NOTTGSPLIT-NEXT:v_mov_b32_e32 v1, s1
+; GFX90A-NOTTGSPLIT-NEXT:flat_load_dword v2, v[0:1] glc
+; 

[llvm-branch-commits] [llvm] release/18.x: Convert many LivePhysRegs uses to LiveRegUnits (PR #84118)

2024-03-07 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad requested changes to this pull request.

> this isn't fixing any known correctness issue

Exactly. I don't think there is any reason to backport this.

https://github.com/llvm/llvm-project/pull/84118


[llvm-branch-commits] [llvm] Backport 45d2d7757feb386186f69af6ef57bde7b5adc2db to release/18.x (PR #79839)

2024-01-29 Thread Jay Foad via llvm-branch-commits


@@ -6883,6 +6883,23 @@ bool AMDGPULegalizerInfo::legalizeStackSave(MachineInstr ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeWaveID(MachineInstr &MI,
+                                         MachineIRBuilder &B) const {
+  // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+  if (!ST.hasArchitectedSGPRs())
+    return false;
+  LLT S32 = LLT::scalar(32);
+  Register DstReg = MI.getOperand(0).getReg();
+  Register TTMP8 =
+      getFunctionLiveInPhysReg(B.getMF(), B.getTII(), AMDGPU::TTMP8,

jayfoad wrote:

True, 66c710ec9dcdbdec6cadd89b972d8945983dc92f improved this to avoid adding 
liveins. I wasn't going to bother backporting that since I didn't think it was 
required for correctness. But I have cherry-picked it into this PR now.
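The quoted hunk reads waveIDinGroup out of TTMP8 with an unsigned bitfield extract at offset 25, width 5. Below is a minimal scalar sketch of that arithmetic, assuming a 32-bit TTMP8 value. The helper names `ubfx` and `waveIdFromTTMP8` are hypothetical and are not LLVM API.

```cpp
#include <cstdint>

// Scalar model of an unsigned bitfield extract (the operation that
// B.buildUbfx emits as G_UBFX): take Width bits of Src starting at bit
// Offset. Assumes Width < 32.
constexpr std::uint32_t ubfx(std::uint32_t Src, unsigned Offset,
                             unsigned Width) {
  return (Src >> Offset) & ((1u << Width) - 1u);
}

// waveIDinGroup lives in TTMP8[29:25]: offset 25, width 5.
constexpr std::uint32_t waveIdFromTTMP8(std::uint32_t TTMP8) {
  return ubfx(TTMP8, 25, 5);
}
```

For example, a TTMP8 value with only bits 25 and 26 set (`3u << 25`) yields a wave ID of 3. Bits outside [29:25] are ignored.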

https://github.com/llvm/llvm-project/pull/79839


[llvm-branch-commits] [llvm] Backport 45d2d7757feb386186f69af6ef57bde7b5adc2db to release/18.x (PR #79839)

2024-01-29 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad updated 
https://github.com/llvm/llvm-project/pull/79839

>From c265c8527285075a58b2425198dbd4cca8b69477 Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Thu, 25 Jan 2024 07:48:06 +
Subject: [PATCH 1/2] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325)

This is only valid on targets with architected SGPRs.
---
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |  4 ++
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 19 ++
 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h  |  1 +
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 14 +
 llvm/lib/Target/AMDGPU/SIISelLowering.h   |  1 +
 .../CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll | 61 +++
 6 files changed, 100 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll

diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 9eb1ac8e27befb..c5f43d17d1c148 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -2777,6 +2777,10 @@ class AMDGPULoadTr:
 
 def int_amdgcn_global_load_tr : AMDGPULoadTr;
 
+// i32 @llvm.amdgcn.wave.id()
+def int_amdgcn_wave_id :
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
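+
 //===----------------------------------------------------------------------===//
 // Deep learning intrinsics.
 //===----------------------------------------------------------------------===//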
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index 615685822f91ee..e98ede88a7e2db 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -6883,6 +6883,23 @@ bool AMDGPULegalizerInfo::legalizeStackSave(MachineInstr &MI,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeWaveID(MachineInstr &MI,
+                                         MachineIRBuilder &B) const {
+  // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+  if (!ST.hasArchitectedSGPRs())
+    return false;
+  LLT S32 = LLT::scalar(32);
+  Register DstReg = MI.getOperand(0).getReg();
+  Register TTMP8 =
+      getFunctionLiveInPhysReg(B.getMF(), B.getTII(), AMDGPU::TTMP8,
+                               AMDGPU::SReg_32RegClass, B.getDebugLoc(), S32);
+  auto LSB = B.buildConstant(S32, 25);
+  auto Width = B.buildConstant(S32, 5);
+  B.buildUbfx(DstReg, TTMP8, LSB, Width);
+  MI.eraseFromParent();
+  return true;
+}
+
 bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
                                             MachineInstr &MI) const {
   MachineIRBuilder &B = Helper.MIRBuilder;
@@ -7005,6 +7022,8 @@ bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
   case Intrinsic::amdgcn_workgroup_id_z:
     return legalizePreloadedArgIntrin(MI, MRI, B,
                                       AMDGPUFunctionArgInfo::WORKGROUP_ID_Z);
+  case Intrinsic::amdgcn_wave_id:
+    return legalizeWaveID(MI, B);
   case Intrinsic::amdgcn_lds_kernel_id:
     return legalizePreloadedArgIntrin(MI, MRI, B,
                                       AMDGPUFunctionArgInfo::LDS_KERNEL_ID);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
index 56aabd4f6ab71b..ecbe42681c6690 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
@@ -212,6 +212,7 @@ class AMDGPULegalizerInfo final : public LegalizerInfo {
 
   bool legalizeFPTruncRound(MachineInstr &MI, MachineIRBuilder &B) const;
   bool legalizeStackSave(MachineInstr &MI, MachineIRBuilder &B) const;
+  bool legalizeWaveID(MachineInstr &MI, MachineIRBuilder &B) const;
 
   bool legalizeImageIntrinsic(
       MachineInstr &MI, MachineIRBuilder &B,
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index d60f511302613e..c5ad9da88ec2b3 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -7920,6 +7920,18 @@ SDValue SITargetLowering::lowerSBuffer(EVT VT, SDLoc DL, SDValue Rsrc,
   return Loads[0];
 }
 
+SDValue SITargetLowering::lowerWaveID(SelectionDAG &DAG, SDValue Op) const {
+  // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+  if (!Subtarget->hasArchitectedSGPRs())
+    return {};
+  SDLoc SL(Op);
+  MVT VT = MVT::i32;
+  SDValue TTMP8 = CreateLiveInRegister(DAG, &AMDGPU::SReg_32RegClass,
+                                       AMDGPU::TTMP8, VT, SL);
+  return DAG.getNode(AMDGPUISD::BFE_U32, SL, VT, TTMP8,
+                     DAG.getConstant(25, SL, VT), DAG.getConstant(5, SL, VT));
+}
+
 SDValue SITargetLowering::lowerWorkitemID(SelectionDAG , SDValue Op,
   unsigned Dim,
   const ArgDescriptor ) const {
@@ -8090,6 +8102,8 @@ SDValue SITargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
   case Intrinsic::amdgcn_workgroup_id_z:
 return getPreloadedValue(DAG, *MFI, 

[llvm-branch-commits] [llvm] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) (PR #79689)

2024-01-29 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> jayfoad closed this by deleting the head repository 3 hours ago

Sorry. Recreated as #79839

https://github.com/llvm/llvm-project/pull/79689


[llvm-branch-commits] [llvm] Backport 45d2d7757feb386186f69af6ef57bde7b5adc2db to release/18.x (PR #79839)

2024-01-29 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad milestoned 
https://github.com/llvm/llvm-project/pull/79839


[llvm-branch-commits] [llvm] Backport 45d2d7757feb386186f69af6ef57bde7b5adc2db to release/18.x (PR #79839)

2024-01-29 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/79839

This just missed the branch creation and is the last piece of functionality 
required to get AMDGPU GFX12 support working in the 18.x release.



>From c265c8527285075a58b2425198dbd4cca8b69477 Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Thu, 25 Jan 2024 07:48:06 +
Subject: [PATCH] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325)

This is only valid on targets with architected SGPRs.
---
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |  4 ++
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 19 ++
 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h  |  1 +
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 14 +
 llvm/lib/Target/AMDGPU/SIISelLowering.h   |  1 +
 .../CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll | 61 +++
 6 files changed, 100 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll

diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 9eb1ac8e27befb1..c5f43d17d1c1481 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -2777,6 +2777,10 @@ class AMDGPULoadTr:
 
 def int_amdgcn_global_load_tr : AMDGPULoadTr;
 
+// i32 @llvm.amdgcn.wave.id()
+def int_amdgcn_wave_id :
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
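+
 //===----------------------------------------------------------------------===//
 // Deep learning intrinsics.
 //===----------------------------------------------------------------------===//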
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index 615685822f91eeb..e98ede88a7e2db9 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -6883,6 +6883,23 @@ bool AMDGPULegalizerInfo::legalizeStackSave(MachineInstr &MI,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeWaveID(MachineInstr &MI,
+                                         MachineIRBuilder &B) const {
+  // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+  if (!ST.hasArchitectedSGPRs())
+    return false;
+  LLT S32 = LLT::scalar(32);
+  Register DstReg = MI.getOperand(0).getReg();
+  Register TTMP8 =
+      getFunctionLiveInPhysReg(B.getMF(), B.getTII(), AMDGPU::TTMP8,
+                               AMDGPU::SReg_32RegClass, B.getDebugLoc(), S32);
+  auto LSB = B.buildConstant(S32, 25);
+  auto Width = B.buildConstant(S32, 5);
+  B.buildUbfx(DstReg, TTMP8, LSB, Width);
+  MI.eraseFromParent();
+  return true;
+}
+
 bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
                                             MachineInstr &MI) const {
   MachineIRBuilder &B = Helper.MIRBuilder;
@@ -7005,6 +7022,8 @@ bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
   case Intrinsic::amdgcn_workgroup_id_z:
     return legalizePreloadedArgIntrin(MI, MRI, B,
                                       AMDGPUFunctionArgInfo::WORKGROUP_ID_Z);
+  case Intrinsic::amdgcn_wave_id:
+    return legalizeWaveID(MI, B);
   case Intrinsic::amdgcn_lds_kernel_id:
     return legalizePreloadedArgIntrin(MI, MRI, B,
                                       AMDGPUFunctionArgInfo::LDS_KERNEL_ID);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
index 56aabd4f6ab71b6..ecbe42681c6690c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
@@ -212,6 +212,7 @@ class AMDGPULegalizerInfo final : public LegalizerInfo {
 
   bool legalizeFPTruncRound(MachineInstr &MI, MachineIRBuilder &B) const;
   bool legalizeStackSave(MachineInstr &MI, MachineIRBuilder &B) const;
+  bool legalizeWaveID(MachineInstr &MI, MachineIRBuilder &B) const;
 
   bool legalizeImageIntrinsic(
       MachineInstr &MI, MachineIRBuilder &B,
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index d60f511302613e1..c5ad9da88ec2b31 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -7920,6 +7920,18 @@ SDValue SITargetLowering::lowerSBuffer(EVT VT, SDLoc DL, SDValue Rsrc,
   return Loads[0];
 }
 
+SDValue SITargetLowering::lowerWaveID(SelectionDAG &DAG, SDValue Op) const {
+  // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+  if (!Subtarget->hasArchitectedSGPRs())
+    return {};
+  SDLoc SL(Op);
+  MVT VT = MVT::i32;
+  SDValue TTMP8 = CreateLiveInRegister(DAG, &AMDGPU::SReg_32RegClass,
+                                       AMDGPU::TTMP8, VT, SL);
+  return DAG.getNode(AMDGPUISD::BFE_U32, SL, VT, TTMP8,
+                     DAG.getConstant(25, SL, VT), DAG.getConstant(5, SL, VT));
+}
+
 SDValue SITargetLowering::lowerWorkitemID(SelectionDAG , SDValue Op,
   unsigned Dim,
   const ArgDescriptor ) const {
@@ -8090,6 +8102,8 

[llvm-branch-commits] [llvm] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) (PR #79689)

2024-01-29 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/79689


[llvm-branch-commits] [llvm] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) (PR #79689)

2024-01-29 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

@tstellar does this backport PR look OK? I created it with `gh pr create -f -B 
release/18.x` and I wasn't sure if I had to edit anything, apart from adding 
the release milestone.

https://github.com/llvm/llvm-project/pull/79689


[llvm-branch-commits] [llvm] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) (PR #79689)

2024-01-27 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited 
https://github.com/llvm/llvm-project/pull/79689


[llvm-branch-commits] [llvm] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) (PR #79689)

2024-01-27 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad milestoned 
https://github.com/llvm/llvm-project/pull/79689


[llvm-branch-commits] [llvm] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) (PR #79689)

2024-01-27 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/79689

This is only valid on targets with architected SGPRs.

>From c5949b09b05e7417d0494b2301781b84d22b95ef Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Thu, 25 Jan 2024 07:48:06 +
Subject: [PATCH] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325)

This is only valid on targets with architected SGPRs.
---
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |  4 ++
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 19 ++
 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h  |  1 +
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 14 +
 llvm/lib/Target/AMDGPU/SIISelLowering.h   |  1 +
 .../CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll | 61 +++
 6 files changed, 100 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll

diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 9eb1ac8e27befb..c5f43d17d1c148 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -2777,6 +2777,10 @@ class AMDGPULoadTr:
 
 def int_amdgcn_global_load_tr : AMDGPULoadTr;
 
+// i32 @llvm.amdgcn.wave.id()
+def int_amdgcn_wave_id :
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
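+
 //===----------------------------------------------------------------------===//
 // Deep learning intrinsics.
 //===----------------------------------------------------------------------===//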
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index 32921bb248caf0..118c8b7c66690f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -6848,6 +6848,23 @@ bool AMDGPULegalizerInfo::legalizeStackSave(MachineInstr &MI,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeWaveID(MachineInstr &MI,
+                                         MachineIRBuilder &B) const {
+  // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+  if (!ST.hasArchitectedSGPRs())
+    return false;
+  LLT S32 = LLT::scalar(32);
+  Register DstReg = MI.getOperand(0).getReg();
+  Register TTMP8 =
+      getFunctionLiveInPhysReg(B.getMF(), B.getTII(), AMDGPU::TTMP8,
+                               AMDGPU::SReg_32RegClass, B.getDebugLoc(), S32);
+  auto LSB = B.buildConstant(S32, 25);
+  auto Width = B.buildConstant(S32, 5);
+  B.buildUbfx(DstReg, TTMP8, LSB, Width);
+  MI.eraseFromParent();
+  return true;
+}
+
 bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
                                             MachineInstr &MI) const {
   MachineIRBuilder &B = Helper.MIRBuilder;
@@ -6970,6 +6987,8 @@ bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
   case Intrinsic::amdgcn_workgroup_id_z:
     return legalizePreloadedArgIntrin(MI, MRI, B,
                                       AMDGPUFunctionArgInfo::WORKGROUP_ID_Z);
+  case Intrinsic::amdgcn_wave_id:
+    return legalizeWaveID(MI, B);
   case Intrinsic::amdgcn_lds_kernel_id:
     return legalizePreloadedArgIntrin(MI, MRI, B,
                                       AMDGPUFunctionArgInfo::LDS_KERNEL_ID);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
index 56aabd4f6ab71b..ecbe42681c6690 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
@@ -212,6 +212,7 @@ class AMDGPULegalizerInfo final : public LegalizerInfo {
 
   bool legalizeFPTruncRound(MachineInstr &MI, MachineIRBuilder &B) const;
   bool legalizeStackSave(MachineInstr &MI, MachineIRBuilder &B) const;
+  bool legalizeWaveID(MachineInstr &MI, MachineIRBuilder &B) const;
 
   bool legalizeImageIntrinsic(
       MachineInstr &MI, MachineIRBuilder &B,
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index d35b76c8ad54eb..9cbcf0012ea878 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -7890,6 +7890,18 @@ SDValue SITargetLowering::lowerSBuffer(EVT VT, SDLoc DL, SDValue Rsrc,
   return Loads[0];
 }
 
+SDValue SITargetLowering::lowerWaveID(SelectionDAG &DAG, SDValue Op) const {
+  // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+  if (!Subtarget->hasArchitectedSGPRs())
+    return {};
+  SDLoc SL(Op);
+  MVT VT = MVT::i32;
+  SDValue TTMP8 = CreateLiveInRegister(DAG, &AMDGPU::SReg_32RegClass,
+                                       AMDGPU::TTMP8, VT, SL);
+  return DAG.getNode(AMDGPUISD::BFE_U32, SL, VT, TTMP8,
+                     DAG.getConstant(25, SL, VT), DAG.getConstant(5, SL, VT));
+}
+
 SDValue SITargetLowering::lowerWorkitemID(SelectionDAG , SDValue Op,
   unsigned Dim,
   const ArgDescriptor ) const {
@@ -8060,6 +8072,8 @@ SDValue SITargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
   case 

[llvm-branch-commits] [llvm] PR for llvm/llvm-project#79451 (PR #79457)

2024-01-25 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> @jayfoad What do you think about merging this PR to the release branch?

LGTM, but it was me that requested it.

https://github.com/llvm/llvm-project/pull/79457


[llvm-branch-commits] [llvm] 14eea6b - [LegacyPM] Update InversedLastUser on the fly. NFC.

2021-01-22 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-22T09:48:54Z
New Revision: 14eea6b0ecddfe7d1c68754a8bfb7c21cde82df8

URL: 
https://github.com/llvm/llvm-project/commit/14eea6b0ecddfe7d1c68754a8bfb7c21cde82df8
DIFF: 
https://github.com/llvm/llvm-project/commit/14eea6b0ecddfe7d1c68754a8bfb7c21cde82df8.diff

LOG: [LegacyPM] Update InversedLastUser on the fly. NFC.

This speeds up setLastUser enough to give a 5% to 10% speed up on
trivial invocations of opt and llc, as measured by:

perf stat -r 100 opt -S -o /dev/null -O3 /dev/null
perf stat -r 100 llc -march=amdgcn /dev/null -filetype null

Don't dump last use information unless -debug-pass=Details to avoid
printing lots of spam that will break some existing lit tests. Before
this patch, dumping last use information was broken anyway, because it
used InversedLastUser before it had been populated.

Differential Revision: https://reviews.llvm.org/D92309

Added: 


Modified: 
llvm/include/llvm/IR/LegacyPassManagers.h
llvm/lib/IR/LegacyPassManager.cpp

Removed: 




diff  --git a/llvm/include/llvm/IR/LegacyPassManagers.h b/llvm/include/llvm/IR/LegacyPassManagers.h
index 498e736a0100..f4fae184e428 100644
--- a/llvm/include/llvm/IR/LegacyPassManagers.h
+++ b/llvm/include/llvm/IR/LegacyPassManagers.h
@@ -230,11 +230,11 @@ class PMTopLevelManager {
 
   // Map to keep track of last user of the analysis pass.
   // LastUser->second is the last user of Lastuser->first.
+  // This is kept in sync with InversedLastUser.
   DenseMap<Pass *, Pass *> LastUser;
 
   // Map to keep track of passes that are last used by a pass.
-  // This inverse map is initialized at PM->run() based on
-  // LastUser map.
+  // This is kept in sync with LastUser.
   DenseMap<Pass *, SmallPtrSet<Pass *, 8> > InversedLastUser;
 
   /// Immutable passes are managed by top level manager.

diff  --git a/llvm/lib/IR/LegacyPassManager.cpp b/llvm/lib/IR/LegacyPassManager.cpp
index 5575bc469a87..4547c3a01239 100644
--- a/llvm/lib/IR/LegacyPassManager.cpp
+++ b/llvm/lib/IR/LegacyPassManager.cpp
@@ -568,7 +568,12 @@ PMTopLevelManager::setLastUser(ArrayRef<Pass*> AnalysisPasses, Pass *P) {
     PDepth = P->getResolver()->getPMDataManager().getDepth();
 
   for (Pass *AP : AnalysisPasses) {
-    LastUser[AP] = P;
+    // Record P as the new last user of AP.
+    auto &LastUserOfAP = LastUser[AP];
+    if (LastUserOfAP)
+      InversedLastUser[LastUserOfAP].erase(AP);
+    LastUserOfAP = P;
+    InversedLastUser[P].insert(AP);
 
     if (P == AP)
       continue;
@@ -598,13 +603,13 @@ PMTopLevelManager::setLastUser(ArrayRef<Pass*> AnalysisPasses, Pass *P) {
     if (P->getResolver())
       setLastUser(LastPMUses, P->getResolver()->getPMDataManager().getAsPass());
 
-
     // If AP is the last user of other passes then make P last user of
     // such passes.
-    for (auto &LU : LastUser) {
-      if (LU.second == AP)
-        LU.second = P;
-    }
+    auto &LastUsedByAP = InversedLastUser[AP];
+    for (Pass *L : LastUsedByAP)
+      LastUser[L] = P;
+    InversedLastUser[P].insert(LastUsedByAP.begin(), LastUsedByAP.end());
+    LastUsedByAP.clear();
   }
 }
 
@@ -850,11 +855,6 @@ void PMTopLevelManager::initializeAllAnalysisInfo() {
   // Initailize other pass managers
   for (PMDataManager *IPM : IndirectPassManagers)
     IPM->initializeAnalysisInfo();
-
-  for (auto LU : LastUser) {
-    SmallPtrSet<Pass *, 8> &L = InversedLastUser[LU.second];
-    L.insert(LU.first);
-  }
 }
 
 /// Destructor
@@ -1151,6 +1151,8 @@ Pass *PMDataManager::findAnalysisPass(AnalysisID AID, bool SearchParent) {
 
 // Print list of passes that are last used by P.
 void PMDataManager::dumpLastUses(Pass *P, unsigned Offset) const{
+  if (PassDebugging < Details)
+    return;
 
   SmallVector LUses;
 





[llvm-branch-commits] [llvm] c0b3c5a - [AMDGPU][GlobalISel] Run SIAddImgInit

2021-01-21 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-21T15:54:54Z
New Revision: c0b3c5a06451aad4351e35c74ccf2fe5da917a41

URL: 
https://github.com/llvm/llvm-project/commit/c0b3c5a06451aad4351e35c74ccf2fe5da917a41
DIFF: 
https://github.com/llvm/llvm-project/commit/c0b3c5a06451aad4351e35c74ccf2fe5da917a41.diff

LOG: [AMDGPU][GlobalISel] Run SIAddImgInit

This pass is required to get correct codegen for image instructions with
the tfe or lwe bits set.

Differential Revision: https://reviews.llvm.org/D95132

Added: 


Modified: 
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.2d.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.2darraymsaa.a16.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.2darraymsaa.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.3d.a16.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.3d.ll

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 58c436836d19..7d8e8486602b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -1109,6 +1109,10 @@ bool GCNPassConfig::addRegBankSelect() {
 
 bool GCNPassConfig::addGlobalInstructionSelect() {
   addPass(new InstructionSelect());
+  // TODO: Fix instruction selection to do the right thing for image
+  // instructions with tfe or lwe in the first place, instead of running a
+  // separate pass to fix them up?
+  addPass(createSIAddIMGInitPass());
   return false;
 }
 

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll
index 36f3e63598ca..99ab3580b91d 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll
@@ -655,6 +655,7 @@ define amdgpu_ps <4 x half> @load_1d_v4f16_xyzw(<8 x i32> inreg %rsrc, i32 %s) {
 define amdgpu_ps float @load_1d_f16_tfe_dmask_x(<8 x i32> inreg %rsrc, i32 %s) {
 ; GFX8-UNPACKED-LABEL: load_1d_f16_tfe_dmask_x:
 ; GFX8-UNPACKED:   ; %bb.0:
+; GFX8-UNPACKED-NEXT:v_mov_b32_e32 v1, 0
 ; GFX8-UNPACKED-NEXT:s_mov_b32 s0, s2
 ; GFX8-UNPACKED-NEXT:s_mov_b32 s1, s3
 ; GFX8-UNPACKED-NEXT:s_mov_b32 s2, s4
@@ -663,13 +664,15 @@ define amdgpu_ps float @load_1d_f16_tfe_dmask_x(<8 x i32> inreg %rsrc, i32 %s) {
 ; GFX8-UNPACKED-NEXT:s_mov_b32 s5, s7
 ; GFX8-UNPACKED-NEXT:s_mov_b32 s6, s8
 ; GFX8-UNPACKED-NEXT:s_mov_b32 s7, s9
-; GFX8-UNPACKED-NEXT:image_load v[0:1], v0, s[0:7] dmask:0x1 unorm tfe d16
+; GFX8-UNPACKED-NEXT:v_mov_b32_e32 v2, v1
+; GFX8-UNPACKED-NEXT:image_load v[1:2], v0, s[0:7] dmask:0x1 unorm tfe d16
 ; GFX8-UNPACKED-NEXT:s_waitcnt vmcnt(0)
-; GFX8-UNPACKED-NEXT:v_mov_b32_e32 v0, v1
+; GFX8-UNPACKED-NEXT:v_mov_b32_e32 v0, v2
 ; GFX8-UNPACKED-NEXT:; return to shader part epilog
 ;
 ; GFX8-PACKED-LABEL: load_1d_f16_tfe_dmask_x:
 ; GFX8-PACKED:   ; %bb.0:
+; GFX8-PACKED-NEXT:v_mov_b32_e32 v1, 0
 ; GFX8-PACKED-NEXT:s_mov_b32 s0, s2
 ; GFX8-PACKED-NEXT:s_mov_b32 s1, s3
 ; GFX8-PACKED-NEXT:s_mov_b32 s2, s4
@@ -678,13 +681,15 @@ define amdgpu_ps float @load_1d_f16_tfe_dmask_x(<8 x i32> inreg %rsrc, i32 %s) {
 ; GFX8-PACKED-NEXT:s_mov_b32 s5, s7
 ; GFX8-PACKED-NEXT:s_mov_b32 s6, s8
 ; GFX8-PACKED-NEXT:s_mov_b32 s7, s9
-; GFX8-PACKED-NEXT:image_load v[0:1], v0, s[0:7] dmask:0x1 unorm tfe d16
+; GFX8-PACKED-NEXT:v_mov_b32_e32 v2, v1
+; GFX8-PACKED-NEXT:image_load v[1:2], v0, s[0:7] dmask:0x1 unorm tfe d16
 ; GFX8-PACKED-NEXT:s_waitcnt vmcnt(0)
-; GFX8-PACKED-NEXT:v_mov_b32_e32 v0, v1
+; GFX8-PACKED-NEXT:v_mov_b32_e32 v0, v2
 ; GFX8-PACKED-NEXT:; return to shader part epilog
 ;
 ; GFX9-LABEL: load_1d_f16_tfe_dmask_x:
 ; GFX9:   ; %bb.0:
+; GFX9-NEXT:v_mov_b32_e32 v1, 0
 ; GFX9-NEXT:s_mov_b32 s0, s2
 ; GFX9-NEXT:s_mov_b32 s1, s3
 ; GFX9-NEXT:s_mov_b32 s2, s4
@@ -693,13 +698,15 @@ define amdgpu_ps float @load_1d_f16_tfe_dmask_x(<8 x i32> inreg %rsrc, i32 %s) {
 ; GFX9-NEXT:s_mov_b32 s5, s7
 ; GFX9-NEXT:s_mov_b32 s6, s8
 ; GFX9-NEXT:s_mov_b32 s7, s9
-; GFX9-NEXT:image_load v[0:1], v0, s[0:7] dmask:0x1 unorm tfe d16
+; GFX9-NEXT:v_mov_b32_e32 v2, v1
+; GFX9-NEXT:image_load v[1:2], v0, s[0:7] dmask:0x1 unorm tfe d16
 ; GFX9-NEXT:s_waitcnt vmcnt(0)
-; GFX9-NEXT:v_mov_b32_e32 v0, v1
+; GFX9-NEXT:v_mov_b32_e32 v0, v2
 ; GFX9-NEXT:; return to shader part epilog
 ;
 ; GFX10-LABEL: load_1d_f16_tfe_dmask_x:
 ; GFX10:   ; %bb.0:
+; GFX10-NEXT:v_mov_b32_e32 v1, 0
 ; GFX10-NEXT:

[llvm-branch-commits] [llvm] 18cb744 - [AMDGPU] Simpler names for arch-specific ttmp registers. NFC.

2021-01-19 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-19T18:47:14Z
New Revision: 18cb7441b69a22565dcc340bac0e58bc9f301439

URL: 
https://github.com/llvm/llvm-project/commit/18cb7441b69a22565dcc340bac0e58bc9f301439
DIFF: 
https://github.com/llvm/llvm-project/commit/18cb7441b69a22565dcc340bac0e58bc9f301439.diff

LOG: [AMDGPU] Simpler names for arch-specific ttmp registers. NFC.

Rename the *_gfx9_gfx10 ttmp registers to *_gfx9plus for simplicity,
and use the corresponding isGFX9Plus predicate to decide when to use
them instead of the old *_vi versions.

Differential Revision: https://reviews.llvm.org/D94975

Added: 


Modified: 
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
llvm/lib/Target/AMDGPU/SIDefines.h
llvm/lib/Target/AMDGPU/SIRegisterInfo.td
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index 7f68174e506d..08b340c8fd66 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -997,8 +997,8 @@ unsigned AMDGPUDisassembler::getTtmpClassId(const OpWidthTy Width) const {
 int AMDGPUDisassembler::getTTmpIdx(unsigned Val) const {
   using namespace AMDGPU::EncValues;
 
-  unsigned TTmpMin = isGFX9Plus() ? TTMP_GFX9_GFX10_MIN : TTMP_VI_MIN;
-  unsigned TTmpMax = isGFX9Plus() ? TTMP_GFX9_GFX10_MAX : TTMP_VI_MAX;
+  unsigned TTmpMin = isGFX9Plus() ? TTMP_GFX9PLUS_MIN : TTMP_VI_MIN;
+  unsigned TTmpMax = isGFX9Plus() ? TTMP_GFX9PLUS_MAX : TTMP_VI_MAX;
 
   return (TTmpMin <= Val && Val <= TTmpMax)? Val - TTmpMin : -1;
 }

diff  --git a/llvm/lib/Target/AMDGPU/SIDefines.h b/llvm/lib/Target/AMDGPU/SIDefines.h
index b9a2bcf81903..f7555f0453bb 100644
--- a/llvm/lib/Target/AMDGPU/SIDefines.h
+++ b/llvm/lib/Target/AMDGPU/SIDefines.h
@@ -247,8 +247,8 @@ enum : unsigned {
   SGPR_MAX_GFX10 = 105,
   TTMP_VI_MIN = 112,
   TTMP_VI_MAX = 123,
-  TTMP_GFX9_GFX10_MIN = 108,
-  TTMP_GFX9_GFX10_MAX = 123,
+  TTMP_GFX9PLUS_MIN = 108,
+  TTMP_GFX9PLUS_MAX = 123,
   INLINE_INTEGER_C_MIN = 128,
   INLINE_INTEGER_C_POSITIVE_MAX = 192, // 64
   INLINE_INTEGER_C_MAX = 208,

diff  --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
index 378fc5df21e5..92390f1f3297 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
@@ -246,9 +246,9 @@ def TMA : RegisterWithSubRegs<"tma", [TMA_LO, TMA_HI]> {
 }
 
 foreach Index = 0...15 in {
-  defm TTMP#Index#_vi : SIRegLoHi16<"ttmp"#Index, !add(112, Index)>;
-  defm TTMP#Index#_gfx9_gfx10 : SIRegLoHi16<"ttmp"#Index, !add(108, Index)>;
-  defm TTMP#Index : SIRegLoHi16<"ttmp"#Index, 0>;
+  defm TTMP#Index#_vi   : SIRegLoHi16<"ttmp"#Index, !add(112, Index)>;
+  defm TTMP#Index#_gfx9plus : SIRegLoHi16<"ttmp"#Index, !add(108, Index)>;
+  defm TTMP#Index   : SIRegLoHi16<"ttmp"#Index, 0>;
 }
 
 multiclass FLAT_SCR_LOHI_m  ci_e, bits<16> vi_e> {
@@ -419,8 +419,8 @@ class TmpRegTuples.ret>;
 
 foreach Index = {0, 2, 4, 6, 8, 10, 12, 14} in {
-  def TTMP#Index#_TTMP#!add(Index,1)#_vi   : TmpRegTuples<"_vi",   2, Index>;
-  def TTMP#Index#_TTMP#!add(Index,1)#_gfx9_gfx10 : TmpRegTuples<"_gfx9_gfx10", 2, Index>;
+  def TTMP#Index#_TTMP#!add(Index,1)#_vi   : TmpRegTuples<"_vi",   2, Index>;
+  def TTMP#Index#_TTMP#!add(Index,1)#_gfx9plus : TmpRegTuples<"_gfx9plus", 2, Index>;
 }
 
 foreach Index = {0, 4, 8, 12} in {
@@ -429,7 +429,7 @@ foreach Index = {0, 4, 8, 12} in {
  _TTMP#!add(Index,3)#_vi : TmpRegTuples<"_vi",   4, Index>;
   def TTMP#Index#_TTMP#!add(Index,1)#
  _TTMP#!add(Index,2)#
- _TTMP#!add(Index,3)#_gfx9_gfx10 : TmpRegTuples<"_gfx9_gfx10", 4, Index>;
+ _TTMP#!add(Index,3)#_gfx9plus : TmpRegTuples<"_gfx9plus", 4, Index>;
 }
 
 foreach Index = {0, 4, 8} in {
@@ -446,7 +446,7 @@ foreach Index = {0, 4, 8} in {
  _TTMP#!add(Index,4)#
  _TTMP#!add(Index,5)#
  _TTMP#!add(Index,6)#
- _TTMP#!add(Index,7)#_gfx9_gfx10 : TmpRegTuples<"_gfx9_gfx10", 8, Index>;
+ _TTMP#!add(Index,7)#_gfx9plus : TmpRegTuples<"_gfx9plus", 8, Index>;
 }
 
 def TTMP0_TTMP1_TTMP2_TTMP3_TTMP4_TTMP5_TTMP6_TTMP7_TTMP8_TTMP9_TTMP10_TTMP11_TTMP12_TTMP13_TTMP14_TTMP15_vi :
@@ -456,12 +456,12 @@ def TTMP0_TTMP1_TTMP2_TTMP3_TTMP4_TTMP5_TTMP6_TTMP7_TTMP8_TTMP9_TTMP10_TTMP11_TT
 TTMP8_vi, TTMP9_vi, TTMP10_vi, TTMP11_vi,
 TTMP12_vi, TTMP13_vi, TTMP14_vi, TTMP15_vi]>;
 
-def TTMP0_TTMP1_TTMP2_TTMP3_TTMP4_TTMP5_TTMP6_TTMP7_TTMP8_TTMP9_TTMP10_TTMP11_TTMP12_TTMP13_TTMP14_TTMP15_gfx9_gfx10 :
+def 

[llvm-branch-commits] [llvm] 0808c70 - [AMDGPU] Fix test case for D94010

2021-01-19 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-19T16:46:47Z
New Revision: 0808c7009a06773e78772c7b74d254fd3572f0ea

URL: 
https://github.com/llvm/llvm-project/commit/0808c7009a06773e78772c7b74d254fd3572f0ea
DIFF: 
https://github.com/llvm/llvm-project/commit/0808c7009a06773e78772c7b74d254fd3572f0ea.diff

LOG: [AMDGPU] Fix test case for D94010

Added: 


Modified: 
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll

Removed: 




diff  --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
index 8df0215a6fe2..5c333f0ce97d 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1030 < %s | FileCheck -check-prefixes=GCN,SDAG %s
-; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1030 < %s | FileCheck -check-prefixes=GCN,GISEL %s
+; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1030 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1030 < %s | FileCheck -check-prefix=GCN %s
 
 define float @v_fma(float %a, float %b, float %c)  {
 ; GCN-LABEL: v_fma:





[llvm-branch-commits] [llvm] de2f942 - [AMDGPU] Simplify test case for D94010

2021-01-19 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-19T16:36:43Z
New Revision: de2f9423995d52a5457752256815dc54d317c8d1

URL: 
https://github.com/llvm/llvm-project/commit/de2f9423995d52a5457752256815dc54d317c8d1
DIFF: 
https://github.com/llvm/llvm-project/commit/de2f9423995d52a5457752256815dc54d317c8d1.diff

LOG: [AMDGPU] Simplify test case for D94010

Added: 


Modified: 
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll

Removed: 




diff  --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
index 03584312e2af..8df0215a6fe2 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
@@ -10,7 +10,6 @@ define float @v_fma(float %a, float %b, float %c)  {
 ; GCN-NEXT:v_fmac_legacy_f32_e64 v2, v0, v1
 ; GCN-NEXT:v_mov_b32_e32 v0, v2
 ; GCN-NEXT:s_setpc_b64 s[30:31]
-;
   %fma = call float @llvm.amdgcn.fma.legacy(float %a, float %b, float %c)
   ret float %fma
 }
@@ -22,7 +21,6 @@ define float @v_fabs_fma(float %a, float %b, float %c)  {
 ; GCN-NEXT:s_waitcnt_vscnt null, 0x0
 ; GCN-NEXT:v_fma_legacy_f32 v0, |v0|, v1, v2
 ; GCN-NEXT:s_setpc_b64 s[30:31]
-;
   %fabs.a = call float @llvm.fabs.f32(float %a)
   %fma = call float @llvm.amdgcn.fma.legacy(float %fabs.a, float %b, float %c)
   ret float %fma
@@ -35,7 +33,6 @@ define float @v_fneg_fabs_fma(float %a, float %b, float %c)  {
 ; GCN-NEXT:s_waitcnt_vscnt null, 0x0
 ; GCN-NEXT:v_fma_legacy_f32 v0, v0, -|v1|, v2
 ; GCN-NEXT:s_setpc_b64 s[30:31]
-;
   %fabs.b = call float @llvm.fabs.f32(float %b)
   %neg.fabs.b = fneg float %fabs.b
   %fma = call float @llvm.amdgcn.fma.legacy(float %a, float %neg.fabs.b, float %c)
@@ -49,92 +46,21 @@ define float @v_fneg_fma(float %a, float %b, float %c)  {
 ; GCN-NEXT:s_waitcnt_vscnt null, 0x0
 ; GCN-NEXT:v_fma_legacy_f32 v0, v0, v1, -v2
 ; GCN-NEXT:s_setpc_b64 s[30:31]
-;
   %neg.c = fneg float %c
   %fma = call float @llvm.amdgcn.fma.legacy(float %a, float %b, float %neg.c)
   ret float %fma
 }
 
-define amdgpu_ps <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> @main(<4 x i32> addrspace(6)* inreg noalias align 32 dereferenceable(18446744073709551615) %arg, <8 x i32> addrspace(6)* inreg noalias align 32 dereferenceable(18446744073709551615) %arg1, <4 x i32> addrspace(6)* inreg noalias align 32 dereferenceable(18446744073709551615) %arg2, <8 x i32> addrspace(6)* inreg noalias align 32 dereferenceable(18446744073709551615) %arg3, i32 inreg %arg4, i32 inreg %arg5, <2 x i32> %arg6, <2 x i32> %arg7, <2 x i32> %arg8, <3 x i32> %arg9, <2 x i32> %arg10, <2 x i32> %arg11, <2 x i32> %arg12, <3 x float> %arg13, float %arg14, float %arg15, float %arg16, float %arg17, i32 %arg18, i32 %arg19, float %arg20, i32 %arg21) #0 {
-; SDAG-LABEL: main:
-; SDAG:   ; %bb.0:
-; SDAG-NEXT:s_mov_b32 s16, exec_lo
-; SDAG-NEXT:v_mov_b32_e32 v14, v2
-; SDAG-NEXT:s_mov_b32 s0, s5
-; SDAG-NEXT:s_wqm_b32 exec_lo, exec_lo
-; SDAG-NEXT:s_mov_b32 s1, 0
-; SDAG-NEXT:s_mov_b32 m0, s7
-; SDAG-NEXT:s_clause 0x1
-; SDAG-NEXT:s_load_dwordx8 s[8:15], s[0:1], 0x400
-; SDAG-NEXT:s_load_dwordx4 s[0:3], s[0:1], 0x430
-; SDAG-NEXT:v_interp_p1_f32_e32 v2, v0, attr0.x
-; SDAG-NEXT:v_interp_p1_f32_e32 v3, v0, attr0.y
-; SDAG-NEXT:s_mov_b32 s4, s6
-; SDAG-NEXT:v_interp_p2_f32_e32 v2, v1, attr0.x
-; SDAG-NEXT:v_interp_p2_f32_e32 v3, v1, attr0.y
-; SDAG-NEXT:s_and_b32 exec_lo, exec_lo, s16
-; SDAG-NEXT:s_waitcnt lgkmcnt(0)
-; SDAG-NEXT:image_sample v[0:3], v[2:3], s[8:15], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D
-; SDAG-NEXT:s_waitcnt vmcnt(0)
-; SDAG-NEXT:v_fma_legacy_f32 v0, v0, 2.0, -1.0
-; SDAG-NEXT:v_fma_legacy_f32 v1, v1, 2.0, -1.0
-; SDAG-NEXT:; return to shader part epilog
-;
-; GISEL-LABEL: main:
-; GISEL:   ; %bb.0:
-; GISEL-NEXT:s_mov_b32 s16, exec_lo
-; GISEL-NEXT:s_mov_b32 s4, s6
-; GISEL-NEXT:s_mov_b32 m0, s7
-; GISEL-NEXT:s_wqm_b32 exec_lo, exec_lo
-; GISEL-NEXT:s_add_u32 s0, s5, 0x400
-; GISEL-NEXT:s_mov_b32 s1, 0
-; GISEL-NEXT:v_interp_p1_f32_e32 v3, v0, attr0.y
-; GISEL-NEXT:s_load_dwordx8 s[8:15], s[0:1], 0x0
-; GISEL-NEXT:s_add_u32 s0, s5, 0x430
-; GISEL-NEXT:v_mov_b32_e32 v14, v2
-; GISEL-NEXT:s_load_dwordx4 s[0:3], s[0:1], 0x0
-; GISEL-NEXT:v_interp_p1_f32_e32 v2, v0, attr0.x
-; GISEL-NEXT:v_interp_p2_f32_e32 v3, v1, attr0.y
-; GISEL-NEXT:v_interp_p2_f32_e32 v2, v1, attr0.x
-; GISEL-NEXT:s_and_b32 exec_lo, exec_lo, s16
-; GISEL-NEXT:s_waitcnt lgkmcnt(0)
-; GISEL-NEXT:image_sample v[0:3], v[2:3], s[8:15], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D
-; GISEL-NEXT:s_waitcnt vmcnt(0)
-; GISEL-NEXT:v_fma_legacy_f32 v0, v0, 2.0, -1.0
-; GISEL-NEXT:v_fma_legacy_f32 v1, 

[llvm-branch-commits] [llvm] 49dce85 - [AMDGPU] Simplify AMDGPUInstPrinter::printExpSrcN. NFC.

2021-01-19 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-19T10:39:56Z
New Revision: 49dce85584e34ee7fb973da9ba617169fd0f103c

URL: 
https://github.com/llvm/llvm-project/commit/49dce85584e34ee7fb973da9ba617169fd0f103c
DIFF: 
https://github.com/llvm/llvm-project/commit/49dce85584e34ee7fb973da9ba617169fd0f103c.diff

LOG: [AMDGPU] Simplify AMDGPUInstPrinter::printExpSrcN. NFC.

Change-Id: Idd7f47647bc0faa3ad6f61f44728c0f20540ec00
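The NFC claim rests on the new one-line adjustment `OpNo = OpNo - N + N / 2` matching the removed `if`/`else if` chain for every export source N in 0..3. A minimal standalone check, assuming (hypothetically) that the four source operands sit at consecutive indices, so source N arrives with OpNo equal to src0's index plus N:

```cpp
#include <cassert>

// Removed form: with compr set, sources are printed as src0, src0, src1, src1.
constexpr unsigned oldComprOpNo(unsigned OpNo, unsigned N) {
  if (N == 1 || N == 2)
    --OpNo;
  else if (N == 3)
    OpNo -= 2;
  return OpNo;
}

// New form from the patch: subtract N, then add back N / 2.
constexpr unsigned newComprOpNo(unsigned OpNo, unsigned N) {
  return OpNo - N + N / 2;
}

// With src0 at a hypothetical index 10, sources 0..3 map to 10, 10, 11, 11.
static_assert(newComprOpNo(10, 0) == 10 && newComprOpNo(11, 1) == 10 &&
                  newComprOpNo(12, 2) == 11 && newComprOpNo(13, 3) == 11,
              "src0, src0, src1, src1");
```

Because N is now an ordinary argument rather than a template parameter, a single out-of-line function replaces the four instantiations.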

Added: 


Modified: 
llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
index 574fba62f5f3..fcca32abdd5a 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
@@ -958,10 +958,9 @@ void AMDGPUInstPrinter::printSDWADstUnused(const MCInst *MI, unsigned OpNo,
   }
 }
 
-template <unsigned N>
 void AMDGPUInstPrinter::printExpSrcN(const MCInst *MI, unsigned OpNo,
- const MCSubtargetInfo &STI,
- raw_ostream &O) {
+ const MCSubtargetInfo &STI, raw_ostream &O,
+ unsigned N) {
   unsigned Opc = MI->getOpcode();
   int EnIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::en);
   unsigned En = MI->getOperand(EnIdx).getImm();
@@ -969,12 +968,8 @@ void AMDGPUInstPrinter::printExpSrcN(const MCInst *MI, unsigned OpNo,
   int ComprIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::compr);
 
   // If compr is set, print as src0, src0, src1, src1
-  if (MI->getOperand(ComprIdx).getImm()) {
-if (N == 1 || N == 2)
-  --OpNo;
-else if (N == 3)
-  OpNo -= 2;
-  }
+  if (MI->getOperand(ComprIdx).getImm())
+OpNo = OpNo - N + N / 2;
 
   if (En & (1 << N))
 printRegOperand(MI->getOperand(OpNo).getReg(), O, MRI);
@@ -985,25 +980,25 @@ void AMDGPUInstPrinter::printExpSrcN(const MCInst *MI, unsigned OpNo,
 void AMDGPUInstPrinter::printExpSrc0(const MCInst *MI, unsigned OpNo,
  const MCSubtargetInfo &STI,
  raw_ostream &O) {
-  printExpSrcN<0>(MI, OpNo, STI, O);
+  printExpSrcN(MI, OpNo, STI, O, 0);
 }
 
 void AMDGPUInstPrinter::printExpSrc1(const MCInst *MI, unsigned OpNo,
  const MCSubtargetInfo &STI,
  raw_ostream &O) {
-  printExpSrcN<1>(MI, OpNo, STI, O);
+  printExpSrcN(MI, OpNo, STI, O, 1);
 }
 
 void AMDGPUInstPrinter::printExpSrc2(const MCInst *MI, unsigned OpNo,
  const MCSubtargetInfo &STI,
  raw_ostream &O) {
-  printExpSrcN<2>(MI, OpNo, STI, O);
+  printExpSrcN(MI, OpNo, STI, O, 2);
 }
 
 void AMDGPUInstPrinter::printExpSrc3(const MCInst *MI, unsigned OpNo,
  const MCSubtargetInfo &STI,
  raw_ostream &O) {
-  printExpSrcN<3>(MI, OpNo, STI, O);
+  printExpSrcN(MI, OpNo, STI, O, 3);
 }
 
 void AMDGPUInstPrinter::printExpTgt(const MCInst *MI, unsigned OpNo,

diff  --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
index 64ccb9092ec4..8d13aa682211 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
@@ -179,10 +179,8 @@ class AMDGPUInstPrinter : public MCInstPrinter {
   void printDefaultVccOperand(unsigned OpNo, const MCSubtargetInfo &STI,
   raw_ostream &O);
 
-
-  template <unsigned N>
-  void printExpSrcN(const MCInst *MI, unsigned OpNo,
-const MCSubtargetInfo &STI, raw_ostream &O);
+  void printExpSrcN(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
+raw_ostream &O, unsigned N);
   void printExpSrc0(const MCInst *MI, unsigned OpNo,
 const MCSubtargetInfo &STI, raw_ostream &O);
   void printExpSrc1(const MCInst *MI, unsigned OpNo,





[llvm-branch-commits] [llvm] 868da2e - [SelectionDAG] Remove an early-out from computeKnownBits for smin/smax

2021-01-14 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-14T18:15:17Z
New Revision: 868da2ea939baf8c71a6dcb878cf6094ede9486e

URL: 
https://github.com/llvm/llvm-project/commit/868da2ea939baf8c71a6dcb878cf6094ede9486e
DIFF: 
https://github.com/llvm/llvm-project/commit/868da2ea939baf8c71a6dcb878cf6094ede9486e.diff

LOG: [SelectionDAG] Remove an early-out from computeKnownBits for smin/smax

Even if we know nothing about LHS, it can still be useful to know that
smax(LHS, RHS) >= RHS and smin(LHS, RHS) <= RHS.
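To see why the early-out was too pessimistic, the sketch below brute-forces which bits of smax(X, C) and smin(X, C) are fixed over every 8-bit X, i.e. with nothing at all known about the LHS. `KB8`, `knownOverAllX`, `smax` and `smin` are hypothetical helpers, not the real `llvm::KnownBits` API. For instance, smax(X, 0) always has its sign bit clear; presumably a fact of this kind is what lets the X86 test below lower the uitofp to a single signed `vcvtdq2ps` (the test's clamp constants are not shown in this message).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit stand-in for llvm::KnownBits: bits set in Zero are known
// 0, bits set in One are known 1.
struct KB8 {
  uint8_t Zero, One;
};

// Intersect the bit patterns of Op(X, C) over every possible X, so nothing is
// assumed about X; only C (the RHS) is known.
template <typename OpT> KB8 knownOverAllX(int8_t C, OpT Op) {
  KB8 K{0xFF, 0xFF}; // start "all known", keep only what every outcome shares
  for (int X = -128; X <= 127; ++X) {
    uint8_t V = static_cast<uint8_t>(Op(X, C));
    K.Zero &= static_cast<uint8_t>(~V);
    K.One &= V;
  }
  return K;
}

int smax(int A, int B) { return A > B ? A : B; }
int smin(int A, int B) { return A < B ? A : B; }
```

For example, `knownOverAllX(64, smax)` reports Zero == 0x80 and One == 0x40, because every result lies in 64..127.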

Differential Revision: https://reviews.llvm.org/D87145

Added: 


Modified: 
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
llvm/test/CodeGen/X86/known-bits-vector.ll

Removed: 




diff  --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 7084ab68524b5..82da553954d2f 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -3416,7 +3416,6 @@ KnownBits SelectionDAG::computeKnownBits(SDValue Op, const APInt &DemandedElts,
 }
 
 Known = computeKnownBits(Op.getOperand(0), DemandedElts, Depth + 1);
-if (Known.isUnknown()) break; // Early-out
 Known2 = computeKnownBits(Op.getOperand(1), DemandedElts, Depth + 1);
 if (IsMax)
   Known = KnownBits::smax(Known, Known2);

diff  --git a/llvm/test/CodeGen/X86/known-bits-vector.ll b/llvm/test/CodeGen/X86/known-bits-vector.ll
index 3b6912a9d9461..05bf984101abc 100644
--- a/llvm/test/CodeGen/X86/known-bits-vector.ll
+++ b/llvm/test/CodeGen/X86/known-bits-vector.ll
@@ -435,11 +435,7 @@ define <4 x float> @knownbits_smax_smin_shuffle_uitofp(<4 x i32> %a0) {
 ; X32-NEXT:vpminsd {{\.LCPI.*}}, %xmm0, %xmm0
 ; X32-NEXT:vpmaxsd {{\.LCPI.*}}, %xmm0, %xmm0
 ; X32-NEXT:vpshufd {{.*#+}} xmm0 = xmm0[0,0,3,3]
-; X32-NEXT:vpblendw {{.*#+}} xmm1 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
-; X32-NEXT:vpsrld $16, %xmm0, %xmm0
-; X32-NEXT:vpblendw {{.*#+}} xmm0 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
-; X32-NEXT:vsubps {{\.LCPI.*}}, %xmm0, %xmm0
-; X32-NEXT:vaddps %xmm0, %xmm1, %xmm0
+; X32-NEXT:vcvtdq2ps %xmm0, %xmm0
 ; X32-NEXT:retl
 ;
 ; X64-LABEL: knownbits_smax_smin_shuffle_uitofp:
@@ -447,11 +443,7 @@ define <4 x float> @knownbits_smax_smin_shuffle_uitofp(<4 x i32> %a0) {
 ; X64-NEXT:vpminsd {{.*}}(%rip), %xmm0, %xmm0
 ; X64-NEXT:vpmaxsd {{.*}}(%rip), %xmm0, %xmm0
 ; X64-NEXT:vpshufd {{.*#+}} xmm0 = xmm0[0,0,3,3]
-; X64-NEXT:vpblendw {{.*#+}} xmm1 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
-; X64-NEXT:vpsrld $16, %xmm0, %xmm0
-; X64-NEXT:vpblendw {{.*#+}} xmm0 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
-; X64-NEXT:vsubps {{.*}}(%rip), %xmm0, %xmm0
-; X64-NEXT:vaddps %xmm0, %xmm1, %xmm0
+; X64-NEXT:vcvtdq2ps %xmm0, %xmm0
 ; X64-NEXT:retq
   %1 = call <4 x i32> @llvm.x86.sse41.pminsd(<4 x i32> %a0, <4 x i32> )
   %2 = call <4 x i32> @llvm.x86.sse41.pmaxsd(<4 x i32> %1, <4 x i32> )





[llvm-branch-commits] [llvm] 90b310f - [Support] Simplify KnownBits::icmp helpers. NFC.

2021-01-14 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-14T14:02:43Z
New Revision: 90b310f6caf0b356075c70407c338b3c751eebb3

URL: 
https://github.com/llvm/llvm-project/commit/90b310f6caf0b356075c70407c338b3c751eebb3
DIFF: 
https://github.com/llvm/llvm-project/commit/90b310f6caf0b356075c70407c338b3c751eebb3.diff

LOG: [Support] Simplify KnownBits::icmp helpers. NFC.

Remove some special cases that aren't really any simpler than the
general case.
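Why the removed cases are redundant: for a constant, the smallest and largest values consistent with the known bits coincide, so the general range checks in ugt already decide every constant comparison. In eq, if one value range lies wholly below (or above) the other, then at the highest differing bit one side has a bit known 1 and the other a bit known 0, which the remaining intersection test catches. An exhaustive 8-bit sketch using a hypothetical `KB8` struct rather than the real `llvm::KnownBits`:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical 8-bit stand-in for llvm::KnownBits: bits set in Zero are known
// 0, bits set in One are known 1 (never both).
struct KB8 {
  uint8_t Zero, One;
};

constexpr KB8 makeConst(uint8_t C) { return {static_cast<uint8_t>(~C), C}; }
constexpr bool isConst(KB8 K) {
  return static_cast<uint8_t>(K.Zero | K.One) == 0xFF;
}
// Smallest and largest unsigned values consistent with the known bits.
constexpr uint8_t minU(KB8 K) { return K.One; }
constexpr uint8_t maxU(KB8 K) { return static_cast<uint8_t>(~K.Zero); }

// ugt without the constant special case: the range checks alone decide it.
std::optional<bool> ugt(KB8 L, KB8 R) {
  if (maxU(L) <= minU(R))
    return false; // every possible L is <= every possible R
  if (minU(L) > maxU(R))
    return true; // every possible L is > every possible R
  return std::nullopt;
}

// eq without the range check: a bit known 1 on one side and known 0 on the
// other already proves the values differ.
std::optional<bool> eq(KB8 L, KB8 R) {
  if (isConst(L) && isConst(R))
    return L.One == R.One;
  if ((L.One & R.Zero) != 0 || (R.One & L.Zero) != 0)
    return false;
  return std::nullopt;
}

// The range check that the patch removed from eq.
bool removedRangeCheck(KB8 L, KB8 R) {
  return maxU(L) < minU(R) || minU(L) > maxU(R);
}
```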

Differential Revision: https://reviews.llvm.org/D94595

Added: 


Modified: 
llvm/lib/Support/KnownBits.cpp

Removed: 




diff  --git a/llvm/lib/Support/KnownBits.cpp b/llvm/lib/Support/KnownBits.cpp
index 0147d21d153a..0f36c6a9ef1d 100644
--- a/llvm/lib/Support/KnownBits.cpp
+++ b/llvm/lib/Support/KnownBits.cpp
@@ -271,9 +271,6 @@ KnownBits KnownBits::ashr(const KnownBits &LHS, const KnownBits &RHS) {
 Optional<bool> KnownBits::eq(const KnownBits &LHS, const KnownBits &RHS) {
   if (LHS.isConstant() && RHS.isConstant())
 return Optional<bool>(LHS.getConstant() == RHS.getConstant());
-  if (LHS.getMaxValue().ult(RHS.getMinValue()) ||
-  LHS.getMinValue().ugt(RHS.getMaxValue()))
-return Optional<bool>(false);
   if (LHS.One.intersects(RHS.Zero) || RHS.One.intersects(LHS.Zero))
 return Optional<bool>(false);
   return None;
@@ -286,8 +283,6 @@ Optional<bool> KnownBits::ne(const KnownBits &LHS, const KnownBits &RHS) {
 }
 
 Optional<bool> KnownBits::ugt(const KnownBits &LHS, const KnownBits &RHS) {
-  if (LHS.isConstant() && RHS.isConstant())
-return Optional<bool>(LHS.getConstant().ugt(RHS.getConstant()));
   // LHS >u RHS -> false if umax(LHS) <= umax(RHS)
   if (LHS.getMaxValue().ule(RHS.getMinValue()))
 return Optional<bool>(false);
@@ -312,8 +307,6 @@ Optional<bool> KnownBits::ule(const KnownBits &LHS, const KnownBits &RHS) {
 }
 
 Optional<bool> KnownBits::sgt(const KnownBits &LHS, const KnownBits &RHS) {
-  if (LHS.isConstant() && RHS.isConstant())
-return Optional<bool>(LHS.getConstant().sgt(RHS.getConstant()));
   // LHS >s RHS -> false if smax(LHS) <= smax(RHS)
   if (LHS.getSignedMaxValue().sle(RHS.getSignedMinValue()))
 return Optional<bool>(false);





[llvm-branch-commits] [llvm] 517196e - [Analysis, CodeGen] Make use of KnownBits::makeConstant. NFC.

2021-01-14 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-14T14:02:43Z
New Revision: 517196e569129677be32d6ebcfa57bac552268a4

URL: 
https://github.com/llvm/llvm-project/commit/517196e569129677be32d6ebcfa57bac552268a4
DIFF: 
https://github.com/llvm/llvm-project/commit/517196e569129677be32d6ebcfa57bac552268a4.diff

LOG: [Analysis,CodeGen] Make use of KnownBits::makeConstant. NFC.
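Each hunk below replaces a hand-written `Zero = ~C; One = C;` pair with a single `KnownBits::makeConstant` call. A minimal sketch of that pattern on a hypothetical fixed-width `KB32` struct (the real helper takes an `APInt` of arbitrary width):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit stand-in for llvm::KnownBits.
struct KB32 {
  uint32_t Zero, One;
};

// Every bit is known: set bits of C are known one, clear bits known zero.
KB32 makeConstant(uint32_t C) { return {~C, C}; }

// A value is constant when every bit is known one way or the other.
bool isConstant(KB32 K) { return (K.Zero | K.One) == ~0u; }
```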

Differential Revision: https://reviews.llvm.org/D94588

Added: 


Modified: 
llvm/lib/Analysis/ValueTracking.cpp
llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp
llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

Removed: 




diff  --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index b138caa05610..61c992d0eedf 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -1337,8 +1337,8 @@ static void computeKnownBitsFromOperator(const Operator *I,
 AccConstIndices += IndexConst.sextOrTrunc(BitWidth);
 continue;
   } else {
-ScalingFactor.Zero = ~TypeSizeInBytes;
-ScalingFactor.One = TypeSizeInBytes;
+ScalingFactor =
+KnownBits::makeConstant(APInt(IndexBitWidth, TypeSizeInBytes));
   }
   IndexBits = KnownBits::computeForMul(IndexBits, ScalingFactor);
 
@@ -1353,9 +1353,7 @@ static void computeKnownBitsFromOperator(const Operator *I,
   /*Add=*/true, /*NSW=*/false, Known, IndexBits);
 }
 if (!Known.isUnknown() && !AccConstIndices.isNullValue()) {
-  KnownBits Index(BitWidth);
-  Index.Zero = ~AccConstIndices;
-  Index.One = AccConstIndices;
+  KnownBits Index = KnownBits::makeConstant(AccConstIndices);
   Known = KnownBits::computeForAddSub(
   /*Add=*/true, /*NSW=*/false, Known, Index);
 }
@@ -1818,8 +1816,7 @@ void computeKnownBits(const Value *V, const APInt &DemandedElts,
   const APInt *C;
   if (match(V, m_APInt(C))) {
 // We know all of the bits for a scalar constant or a splat vector constant!
-Known.One = *C;
-Known.Zero = ~Known.One;
+Known = KnownBits::makeConstant(*C);
 return;
   }
   // Null and aggregate-zero are all-zeros.

diff  --git a/llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp b/llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp
index 64c7fb486493..aac7a73e858f 100644
--- a/llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp
@@ -217,8 +217,7 @@ void GISelKnownBits::computeKnownBitsImpl(Register R, KnownBits &Known,
 auto CstVal = getConstantVRegVal(R, MRI);
 if (!CstVal)
   break;
-Known.One = *CstVal;
-Known.Zero = ~Known.One;
+Known = KnownBits::makeConstant(*CstVal);
 break;
   }
   case TargetOpcode::G_FRAME_INDEX: {

diff  --git a/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp b/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
index 0b830f462c90..32a4f60df097 100644
--- a/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
@@ -458,8 +458,7 @@ void FunctionLoweringInfo::ComputePHILiveOutRegInfo(const PHINode *PN) {
   if (ConstantInt *CI = dyn_cast<ConstantInt>(V)) {
 APInt Val = CI->getValue().zextOrTrunc(BitWidth);
 DestLOI.NumSignBits = Val.getNumSignBits();
-DestLOI.Known.Zero = ~Val;
-DestLOI.Known.One = Val;
+DestLOI.Known = KnownBits::makeConstant(Val);
   } else {
 assert(ValueMap.count(V) && "V should have been placed in ValueMap when its"
 "CopyToReg node was created.");

diff  --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index e080408bbe42..7084ab68524b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -3134,13 +3134,10 @@ KnownBits SelectionDAG::computeKnownBits(SDValue Op, const APInt &DemandedElts,
   }
 } else if (BitWidth == CstTy->getPrimitiveSizeInBits()) {
   if (auto *CInt = dyn_cast<ConstantInt>(Cst)) {
-const APInt &Value = CInt->getValue();
-Known.One = Value;
-Known.Zero = ~Value;
+Known = KnownBits::makeConstant(CInt->getValue());
   } else if (auto *CFP = dyn_cast<ConstantFP>(Cst)) {
-APInt Value = CFP->getValueAPF().bitcastToAPInt();
-Known.One = Value;
-Known.Zero = ~Value;
+Known =
+KnownBits::makeConstant(CFP->getValueAPF().bitcastToAPInt());
   }
 }
   }

diff  --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 173e45a4b18e..6ae0a39962b3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -912,15 +912,14 @@ bool 

[llvm-branch-commits] [llvm] a1cba5b - [SelectionDAG] Make use of KnownBits::commonBits. NFC.

2021-01-14 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-14T14:02:43Z
New Revision: a1cba5b7a1fb09d2d4082967e2466a5a89ed698a

URL: 
https://github.com/llvm/llvm-project/commit/a1cba5b7a1fb09d2d4082967e2466a5a89ed698a
DIFF: 
https://github.com/llvm/llvm-project/commit/a1cba5b7a1fb09d2d4082967e2466a5a89ed698a.diff

LOG: [SelectionDAG] Make use of KnownBits::commonBits. NFC.
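The rewritten code merges knowledge from several sources (both arms of a select, the demanded sub-vector and source elements, and so on) by keeping only the bits known identically in each, which is what the removed `&=` pairs did by hand. A sketch with a hypothetical `KB32` struct in place of the real `llvm::KnownBits`; note that starting from Zero and One both all-ones, as one hunk does with `setAllBits()`, makes the first merge an identity:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit stand-in for llvm::KnownBits.
struct KB32 {
  uint32_t Zero, One;
};

KB32 makeConstant(uint32_t C) { return {~C, C}; }

// Keep a bit only if both inputs know it, and know the same value for it.
KB32 commonBits(KB32 A, KB32 B) { return {A.Zero & B.Zero, A.One & B.One}; }
```

For example, merging the constants 0b1100 and 0b1010 leaves bit 3 known one, bits 1 and 2 unknown, and bit 0 plus all higher bits known zero.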

Differential Revision: https://reviews.llvm.org/D94587

Added: 


Modified: 
llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

Removed: 




diff  --git a/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp b/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
index 669bca966a7d..0b830f462c90 100644
--- a/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
@@ -509,8 +509,7 @@ void FunctionLoweringInfo::ComputePHILiveOutRegInfo(const PHINode *PN) {
   return;
 }
 DestLOI.NumSignBits = std::min(DestLOI.NumSignBits, SrcLOI->NumSignBits);
-DestLOI.Known.Zero &= SrcLOI->Known.Zero;
-DestLOI.Known.One &= SrcLOI->Known.One;
+DestLOI.Known = KnownBits::commonBits(DestLOI.Known, SrcLOI->Known);
   }
 }
 

diff  --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 7ea0b09ef9c9..173e45a4b18e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -1016,10 +1016,8 @@ bool TargetLowering::SimplifyDemandedBits(
  Depth + 1))
   return true;
 
-if (!!DemandedVecElts) {
-  Known.One &= KnownVec.One;
-  Known.Zero &= KnownVec.Zero;
-}
+if (!!DemandedVecElts)
+  Known = KnownBits::commonBits(Known, KnownVec);
 
 return false;
   }
@@ -1044,14 +1042,10 @@ bool TargetLowering::SimplifyDemandedBits(
 
 Known.Zero.setAllBits();
 Known.One.setAllBits();
-if (!!DemandedSubElts) {
-  Known.One &= KnownSub.One;
-  Known.Zero &= KnownSub.Zero;
-}
-if (!!DemandedSrcElts) {
-  Known.One &= KnownSrc.One;
-  Known.Zero &= KnownSrc.Zero;
-}
+if (!!DemandedSubElts)
+  Known = KnownBits::commonBits(Known, KnownSub);
+if (!!DemandedSrcElts)
+  Known = KnownBits::commonBits(Known, KnownSrc);
 
 // Attempt to avoid multi-use src if we don't need anything from it.
 if (!DemandedBits.isAllOnesValue() || !DemandedSubElts.isAllOnesValue() ||
@@ -1108,10 +1102,8 @@ bool TargetLowering::SimplifyDemandedBits(
Known2, TLO, Depth + 1))
 return true;
   // Known bits are shared by every demanded subvector element.
-  if (!!DemandedSubElts) {
-Known.One &= Known2.One;
-Known.Zero &= Known2.Zero;
-  }
+  if (!!DemandedSubElts)
+Known = KnownBits::commonBits(Known, Known2);
 }
 break;
   }
@@ -1149,15 +1141,13 @@ bool TargetLowering::SimplifyDemandedBits(
 if (SimplifyDemandedBits(Op0, DemandedBits, DemandedLHS, Known2, TLO,
  Depth + 1))
   return true;
-Known.One &= Known2.One;
-Known.Zero &= Known2.Zero;
+Known = KnownBits::commonBits(Known, Known2);
   }
   if (!!DemandedRHS) {
 if (SimplifyDemandedBits(Op1, DemandedBits, DemandedRHS, Known2, TLO,
  Depth + 1))
   return true;
-Known.One &= Known2.One;
-Known.Zero &= Known2.Zero;
+Known = KnownBits::commonBits(Known, Known2);
   }
 
   // Attempt to avoid multi-use ops if we don't need anything from them.
@@ -1384,8 +1374,7 @@ bool TargetLowering::SimplifyDemandedBits(
   return true;
 
 // Only known if known in both the LHS and RHS.
-Known.One &= Known2.One;
-Known.Zero &= Known2.Zero;
+Known = KnownBits::commonBits(Known, Known2);
 break;
   case ISD::SELECT_CC:
 if (SimplifyDemandedBits(Op.getOperand(3), DemandedBits, Known, TLO,
@@ -1402,8 +1391,7 @@ bool TargetLowering::SimplifyDemandedBits(
   return true;
 
 // Only known if known in both the LHS and RHS.
-Known.One &= Known2.One;
-Known.Zero &= Known2.Zero;
+Known = KnownBits::commonBits(Known, Known2);
 break;
   case ISD::SETCC: {
 SDValue Op0 = Op.getOperand(0);





[llvm-branch-commits] [llvm] f264f9a - [SlotIndexes] Fix and simplify basic block splitting

2021-01-12 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-12T10:50:14Z
New Revision: f264f9ad7df538357dfc8c5f318c5c8b0df3d99f

URL: 
https://github.com/llvm/llvm-project/commit/f264f9ad7df538357dfc8c5f318c5c8b0df3d99f
DIFF: 
https://github.com/llvm/llvm-project/commit/f264f9ad7df538357dfc8c5f318c5c8b0df3d99f.diff

LOG: [SlotIndexes] Fix and simplify basic block splitting

Remove the InsertionPoint argument from SlotIndexes::insertMBBInMaps
because it was confusing: what does it mean to insert a new block
between two instructions, in the middle of an existing block?

Instead, support the case that MachineBasicBlock::splitAt really needs,
where the new block contains some instructions that are already in the
maps because they have been moved there from the tail of the previous
block.

In all other use cases the new block is empty.

Based on work by Carl Ritson!
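The new rule can be modelled with a plain `std::list`: create a fresh start entry, insert it before the new block's first instruction (or, if the block is empty, before the previous block's end entry), make it the previous block's end, and give the new block the previous block's old end. This is a simplified sketch with hypothetical names (`IndexList`, `Range`, `insertBlockInMaps`), not the real `SlotIndexes` data structures; for simplicity it assumes the layout predecessor is the block described by the last entry of `Ranges`.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model, not the real SlotIndexes classes: IndexList stands in for
// the index list, and each Range is a block's [Start, End) span whose End is
// the next block's start entry (or a trailing sentinel), like MBBRanges.
using IndexList = std::list<std::string>;
struct Range {
  IndexList::iterator Start, End;
};

// Sketch of the new insertMBBInMaps rule. FirstInstr points at the new block's
// first instruction entry (already in the list because it was moved from the
// previous block's tail), or is empty when the new block has no instructions.
void insertBlockInMaps(IndexList &L, std::vector<Range> &Ranges,
                       std::optional<IndexList::iterator> FirstInstr,
                       const std::string &Label) {
  assert(!Ranges.empty() && "Can't insert a new block at the beginning");
  IndexList::iterator EndEntry = Ranges.back().End;
  IndexList::iterator InsPos = FirstInstr ? *FirstInstr : EndEntry;
  IndexList::iterator StartEntry = L.insert(InsPos, Label);
  Ranges.back().End = StartEntry;           // previous block now ends here
  Ranges.push_back({StartEntry, EndEntry}); // new block takes the old end
}
```

Splitting `B0: i0 i1 i2` before `i1` yields ranges `B0 i0` and `B1 i1 i2`; adding an empty block afterwards yields a range containing only its own start entry.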

Differential Revision: https://reviews.llvm.org/D94311

Added: 


Modified: 
llvm/include/llvm/CodeGen/LiveIntervals.h
llvm/include/llvm/CodeGen/SlotIndexes.h
llvm/lib/CodeGen/MachineBasicBlock.cpp
llvm/unittests/MI/LiveIntervalTest.cpp

Removed: 




diff  --git a/llvm/include/llvm/CodeGen/LiveIntervals.h b/llvm/include/llvm/CodeGen/LiveIntervals.h
index 1a6b59a8959e..fa08166791b0 100644
--- a/llvm/include/llvm/CodeGen/LiveIntervals.h
+++ b/llvm/include/llvm/CodeGen/LiveIntervals.h
@@ -256,9 +256,8 @@ class VirtRegMap;
   return Indexes->getMBBFromIndex(index);
 }
 
-void insertMBBInMaps(MachineBasicBlock *MBB,
- MachineInstr *InsertionPoint = nullptr) {
-  Indexes->insertMBBInMaps(MBB, InsertionPoint);
+void insertMBBInMaps(MachineBasicBlock *MBB) {
+  Indexes->insertMBBInMaps(MBB);
   assert(unsigned(MBB->getNumber()) == RegMaskBlocks.size() &&
  "Blocks must be added in order.");
   RegMaskBlocks.push_back(std::make_pair(RegMaskSlots.size(), 0));

diff  --git a/llvm/include/llvm/CodeGen/SlotIndexes.h b/llvm/include/llvm/CodeGen/SlotIndexes.h
index 19eab7ae5e35..b2133de93ea2 100644
--- a/llvm/include/llvm/CodeGen/SlotIndexes.h
+++ b/llvm/include/llvm/CodeGen/SlotIndexes.h
@@ -604,38 +604,27 @@ class raw_ostream;
 }
 
 /// Add the given MachineBasicBlock into the maps.
-/// If \p InsertionPoint is specified then the block will be placed
-/// before the given machine instr, otherwise it will be placed
-/// before the next block in MachineFunction insertion order.
-void insertMBBInMaps(MachineBasicBlock *mbb,
- MachineInstr *InsertionPoint = nullptr) {
-  MachineFunction::iterator nextMBB =
-std::next(MachineFunction::iterator(mbb));
-
-  IndexListEntry *startEntry = nullptr;
-  IndexListEntry *endEntry = nullptr;
-  IndexList::iterator newItr;
-  if (InsertionPoint) {
-startEntry = createEntry(nullptr, 0);
-endEntry = getInstructionIndex(*InsertionPoint).listEntry();
-newItr = indexList.insert(endEntry->getIterator(), startEntry);
-  } else if (nextMBB == mbb->getParent()->end()) {
-startEntry = ();
-endEntry = createEntry(nullptr, 0);
-newItr = indexList.insertAfter(startEntry->getIterator(), endEntry);
-  } else {
-startEntry = createEntry(nullptr, 0);
-endEntry = getMBBStartIdx(&*nextMBB).listEntry();
-newItr = indexList.insert(endEntry->getIterator(), startEntry);
-  }
+/// If it contains any instructions then they must already be in the maps.
+/// This is used after a block has been split by moving some suffix of its
+/// instructions into a newly created block.
+void insertMBBInMaps(MachineBasicBlock *mbb) {
+  assert(mbb != &mbb->getParent()->front() &&
+ "Can't insert a new block at the beginning of a function.");
+  auto prevMBB = std::prev(MachineFunction::iterator(mbb));
+
+  // Create a new entry to be used for the start of mbb and the end of
+  // prevMBB.
+  IndexListEntry *startEntry = createEntry(nullptr, 0);
+  IndexListEntry *endEntry = getMBBEndIdx(&*prevMBB).listEntry();
+  IndexListEntry *insEntry =
+  mbb->empty() ? endEntry
+   : getInstructionIndex(mbb->front()).listEntry();
+  IndexList::iterator newItr =
+  indexList.insert(insEntry->getIterator(), startEntry);
 
   SlotIndex startIdx(startEntry, SlotIndex::Slot_Block);
   SlotIndex endIdx(endEntry, SlotIndex::Slot_Block);
 
-  MachineFunction::iterator prevMBB(mbb);
-  assert(prevMBB != mbb->getParent()->end() &&
- "Can't insert a new block at the beginning of a function.");
-  --prevMBB;
   MBBRanges[prevMBB->getNumber()].second = startIdx;
 
   assert(unsigned(mbb->getNumber()) == MBBRanges.size() &&

diff  --git a/llvm/lib/CodeGen/MachineBasicBlock.cpp b/llvm/lib/CodeGen/MachineBasicBlock.cpp
index c7b404e075e1..fded4b15e67b 100644
--- 

[llvm-branch-commits] [llvm] 6dcf920 - [AMDGPU] Fix a urem combine test to test what it was supposed to

2021-01-11 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-11T13:32:34Z
New Revision: 6dcf9207df11f5cdb0126e5c5632e93532642ed9

URL: 
https://github.com/llvm/llvm-project/commit/6dcf9207df11f5cdb0126e5c5632e93532642ed9
DIFF: 
https://github.com/llvm/llvm-project/commit/6dcf9207df11f5cdb0126e5c5632e93532642ed9.diff

LOG: [AMDGPU] Fix a urem combine test to test what it was supposed to
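The combine rewrites `urem X, C` for a power-of-two C as `X & (C - 1)`, which the updated CHECK lines express as `G_ADD %const, -1` feeding `G_AND`. With the old constant 1 the mask C - 1 is 0, so the result is always 0 whatever `%var` holds; the old CHECK lines indeed contain neither `%var` nor a `G_AND`. A small arithmetic sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// True for 1, 2, 4, 8, ...
constexpr bool isPowerOf2(uint32_t C) { return C != 0 && (C & (C - 1)) == 0; }

// X urem C for a power-of-two C: add -1 (wrapping) to form the mask, then AND,
// mirroring the G_ADD %const, -1 and G_AND in the updated CHECK lines.
constexpr uint32_t uremPow2(uint32_t X, uint32_t C) {
  return X & (C + 0xFFFFFFFFu);
}

static_assert(uremPow2(7, 2) == 1, "odd values leave remainder 1");
static_assert(uremPow2(12345, 1) == 0, "divisor 1: the mask is 0");
```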

Added: 


Modified: 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-urem-pow-2.mir

Removed: 




diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-urem-pow-2.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-urem-pow-2.mir
index f92e32dab08f..da6c8480b25e 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-urem-pow-2.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-urem-pow-2.mir
@@ -48,12 +48,14 @@ body: |
 
 ; GCN-LABEL: name: urem_s32_var_const2
 ; GCN: liveins: $vgpr0
-; GCN: %const:_(s32) = G_CONSTANT i32 1
+; GCN: %var:_(s32) = COPY $vgpr0
+; GCN: %const:_(s32) = G_CONSTANT i32 2
 ; GCN: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 -1
 ; GCN: [[ADD:%[0-9]+]]:_(s32) = G_ADD %const, [[C]]
-; GCN: $vgpr0 = COPY [[ADD]](s32)
+; GCN: %rem:_(s32) = G_AND %var, [[ADD]]
+; GCN: $vgpr0 = COPY %rem(s32)
 %var:_(s32) = COPY $vgpr0
-%const:_(s32) = G_CONSTANT i32 1
+%const:_(s32) = G_CONSTANT i32 2
 %rem:_(s32) = G_UREM %var, %const
 $vgpr0 = COPY %rem
 ...





[llvm-branch-commits] [llvm] 3914beb - [AMDGPU] Handle v_fmac_legacy_f32 in SIFoldOperands

2021-01-05 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-05T11:55:33Z
New Revision: 3914bebe91f6b557e61d6d74117762f9043593e0

URL: 
https://github.com/llvm/llvm-project/commit/3914bebe91f6b557e61d6d74117762f9043593e0
DIFF: 
https://github.com/llvm/llvm-project/commit/3914bebe91f6b557e61d6d74117762f9043593e0.diff

LOG: [AMDGPU] Handle v_fmac_legacy_f32 in SIFoldOperands

Convert it to v_fma_legacy_f32 if it is profitable to do so, just like
other mac instructions that are converted to their mad equivalents.

Differential Revision: https://reviews.llvm.org/D94010

Added: 


Modified: 
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 6dc01c3d3c21..892dc1feb298 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -140,6 +140,8 @@ static unsigned macToMad(unsigned Opc) {
 return AMDGPU::V_FMA_F32;
   case AMDGPU::V_FMAC_F16_e64:
 return AMDGPU::V_FMA_F16_gfx9;
+  case AMDGPU::V_FMAC_LEGACY_F32_e64:
+return AMDGPU::V_FMA_LEGACY_F32;
   }
   return AMDGPU::INSTRUCTION_LIST_END;
 }

diff  --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
index 8bfb81d86ace..e641d12444cc 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
@@ -70,16 +70,10 @@ define amdgpu_ps <{ i32, i32, i32, i32, i32, float, float, float, float, float,
 ; SDAG-NEXT:v_interp_p2_f32_e32 v3, v1, attr0.y
 ; SDAG-NEXT:s_and_b32 exec_lo, exec_lo, s16
 ; SDAG-NEXT:s_waitcnt lgkmcnt(0)
-; SDAG-NEXT:image_sample v[7:10], v[2:3], s[8:15], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D
-; SDAG-NEXT:v_mov_b32_e32 v4, -1.0
-; SDAG-NEXT:v_mov_b32_e32 v5, -1.0
+; SDAG-NEXT:image_sample v[0:3], v[2:3], s[8:15], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D
 ; SDAG-NEXT:s_waitcnt vmcnt(0)
-; SDAG-NEXT:v_fmac_legacy_f32_e64 v4, v7, 2.0
-; SDAG-NEXT:v_fmac_legacy_f32_e64 v5, v8, 2.0
-; SDAG-NEXT:v_mov_b32_e32 v2, v9
-; SDAG-NEXT:v_mov_b32_e32 v3, v10
-; SDAG-NEXT:v_mov_b32_e32 v0, v4
-; SDAG-NEXT:v_mov_b32_e32 v1, v5
+; SDAG-NEXT:v_fma_legacy_f32 v0, v0, 2.0, -1.0
+; SDAG-NEXT:v_fma_legacy_f32 v1, v1, 2.0, -1.0
 ; SDAG-NEXT:; return to shader part epilog
 ;
 ; GISEL-LABEL: main:
@@ -100,16 +94,10 @@ define amdgpu_ps <{ i32, i32, i32, i32, i32, float, float, float, float, float,
 ; GISEL-NEXT:v_interp_p2_f32_e32 v2, v1, attr0.x
 ; GISEL-NEXT:s_and_b32 exec_lo, exec_lo, s16
 ; GISEL-NEXT:s_waitcnt lgkmcnt(0)
-; GISEL-NEXT:image_sample v[7:10], v[2:3], s[8:15], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D
-; GISEL-NEXT:v_mov_b32_e32 v4, -1.0
-; GISEL-NEXT:v_mov_b32_e32 v5, -1.0
+; GISEL-NEXT:image_sample v[0:3], v[2:3], s[8:15], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D
 ; GISEL-NEXT:s_waitcnt vmcnt(0)
-; GISEL-NEXT:v_fmac_legacy_f32_e64 v4, v7, 2.0
-; GISEL-NEXT:v_fmac_legacy_f32_e64 v5, v8, 2.0
-; GISEL-NEXT:v_mov_b32_e32 v2, v9
-; GISEL-NEXT:v_mov_b32_e32 v3, v10
-; GISEL-NEXT:v_mov_b32_e32 v0, v4
-; GISEL-NEXT:v_mov_b32_e32 v1, v5
+; GISEL-NEXT:v_fma_legacy_f32 v0, v0, 2.0, -1.0
+; GISEL-NEXT:v_fma_legacy_f32 v1, v1, 2.0, -1.0
 ; GISEL-NEXT:; return to shader part epilog
   %i = bitcast <2 x i32> %arg7 to <2 x float>
   %i22 = extractelement <2 x float> %i, i32 0





[llvm-branch-commits] [llvm] 639a50e - [AMDGPU] Precommit test case for D94010

2021-01-05 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-05T11:55:14Z
New Revision: 639a50e2f138ed3e647b00809a2871a1b9ae9012

URL: 
https://github.com/llvm/llvm-project/commit/639a50e2f138ed3e647b00809a2871a1b9ae9012
DIFF: 
https://github.com/llvm/llvm-project/commit/639a50e2f138ed3e647b00809a2871a1b9ae9012.diff

LOG: [AMDGPU] Precommit test case for D94010

Added: 


Modified: 
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll

Removed: 




diff  --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
index 27ba74c3f557..8bfb81d86ace 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 < %s | FileCheck -check-prefix=GCN %s
-; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1030 < %s | FileCheck -check-prefixes=GCN,SDAG %s
+; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1030 < %s | FileCheck -check-prefixes=GCN,GISEL %s
 
 define float @v_fma(float %a, float %b, float %c)  {
 ; GCN-LABEL: v_fma:
@@ -51,5 +51,98 @@ define float @v_fneg_fma(float %a, float %b, float %c)  {
   ret float %fma
 }
 
+define amdgpu_ps <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> @main(<4 x i32> addrspace(6)* inreg noalias align 32 dereferenceable(18446744073709551615) %arg, <8 x i32> addrspace(6)* inreg noalias align 32 dereferenceable(18446744073709551615) %arg1, <4 x i32> addrspace(6)* inreg noalias align 32 dereferenceable(18446744073709551615) %arg2, <8 x i32> addrspace(6)* inreg noalias align 32 dereferenceable(18446744073709551615) %arg3, i32 inreg %arg4, i32 inreg %arg5, <2 x i32> %arg6, <2 x i32> %arg7, <2 x i32> %arg8, <3 x i32> %arg9, <2 x i32> %arg10, <2 x i32> %arg11, <2 x i32> %arg12, <3 x float> %arg13, float %arg14, float %arg15, float %arg16, float %arg17, i32 %arg18, i32 %arg19, float %arg20, i32 %arg21) #0 {
+; SDAG-LABEL: main:
+; SDAG:   ; %bb.0:
+; SDAG-NEXT:s_mov_b32 s16, exec_lo
+; SDAG-NEXT:v_mov_b32_e32 v14, v2
+; SDAG-NEXT:s_mov_b32 s0, s5
+; SDAG-NEXT:s_wqm_b32 exec_lo, exec_lo
+; SDAG-NEXT:s_mov_b32 s1, 0
+; SDAG-NEXT:s_mov_b32 m0, s7
+; SDAG-NEXT:s_clause 0x1
+; SDAG-NEXT:s_load_dwordx8 s[8:15], s[0:1], 0x400
+; SDAG-NEXT:s_load_dwordx4 s[0:3], s[0:1], 0x430
+; SDAG-NEXT:v_interp_p1_f32_e32 v2, v0, attr0.x
+; SDAG-NEXT:v_interp_p1_f32_e32 v3, v0, attr0.y
+; SDAG-NEXT:s_mov_b32 s4, s6
+; SDAG-NEXT:v_interp_p2_f32_e32 v2, v1, attr0.x
+; SDAG-NEXT:v_interp_p2_f32_e32 v3, v1, attr0.y
+; SDAG-NEXT:s_and_b32 exec_lo, exec_lo, s16
+; SDAG-NEXT:s_waitcnt lgkmcnt(0)
+; SDAG-NEXT:image_sample v[7:10], v[2:3], s[8:15], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D
+; SDAG-NEXT:v_mov_b32_e32 v4, -1.0
+; SDAG-NEXT:v_mov_b32_e32 v5, -1.0
+; SDAG-NEXT:s_waitcnt vmcnt(0)
+; SDAG-NEXT:v_fmac_legacy_f32_e64 v4, v7, 2.0
+; SDAG-NEXT:v_fmac_legacy_f32_e64 v5, v8, 2.0
+; SDAG-NEXT:v_mov_b32_e32 v2, v9
+; SDAG-NEXT:v_mov_b32_e32 v3, v10
+; SDAG-NEXT:v_mov_b32_e32 v0, v4
+; SDAG-NEXT:v_mov_b32_e32 v1, v5
+; SDAG-NEXT:; return to shader part epilog
+;
+; GISEL-LABEL: main:
+; GISEL:   ; %bb.0:
+; GISEL-NEXT:s_mov_b32 s16, exec_lo
+; GISEL-NEXT:s_mov_b32 s4, s6
+; GISEL-NEXT:s_mov_b32 m0, s7
+; GISEL-NEXT:s_wqm_b32 exec_lo, exec_lo
+; GISEL-NEXT:s_add_u32 s0, s5, 0x400
+; GISEL-NEXT:s_mov_b32 s1, 0
+; GISEL-NEXT:v_interp_p1_f32_e32 v3, v0, attr0.y
+; GISEL-NEXT:s_load_dwordx8 s[8:15], s[0:1], 0x0
+; GISEL-NEXT:s_add_u32 s0, s5, 0x430
+; GISEL-NEXT:v_mov_b32_e32 v14, v2
+; GISEL-NEXT:s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GISEL-NEXT:v_interp_p1_f32_e32 v2, v0, attr0.x
+; GISEL-NEXT:v_interp_p2_f32_e32 v3, v1, attr0.y
+; GISEL-NEXT:v_interp_p2_f32_e32 v2, v1, attr0.x
+; GISEL-NEXT:s_and_b32 exec_lo, exec_lo, s16
+; GISEL-NEXT:s_waitcnt lgkmcnt(0)
+; GISEL-NEXT:image_sample v[7:10], v[2:3], s[8:15], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D
+; GISEL-NEXT:v_mov_b32_e32 v4, -1.0
+; GISEL-NEXT:v_mov_b32_e32 v5, -1.0
+; GISEL-NEXT:s_waitcnt vmcnt(0)
+; GISEL-NEXT:v_fmac_legacy_f32_e64 v4, v7, 2.0
+; GISEL-NEXT:v_fmac_legacy_f32_e64 v5, v8, 2.0
+; GISEL-NEXT:v_mov_b32_e32 v2, v9
+; GISEL-NEXT:v_mov_b32_e32 v3, v10
+; GISEL-NEXT:v_mov_b32_e32 v0, v4
+; GISEL-NEXT:v_mov_b32_e32 v1, v5
+; GISEL-NEXT:; return to shader part epilog
+  %i = bitcast <2 x i32> %arg7 to <2 x float>
+  %i22 = extractelement <2 x float> %i, i32 0
+  %i23 = extractelement <2 x float> %i, i32 1
+  %i24 = 

[llvm-branch-commits] [llvm] 4e6054a - [AMDGPU] Split out new helper function macToMad in SIFoldOperands. NFC.

2021-01-05 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-05T11:54:48Z
New Revision: 4e6054a86c0cb0697913007c99b59f3f65c9d04b

URL: 
https://github.com/llvm/llvm-project/commit/4e6054a86c0cb0697913007c99b59f3f65c9d04b
DIFF: 
https://github.com/llvm/llvm-project/commit/4e6054a86c0cb0697913007c99b59f3f65c9d04b.diff

LOG: [AMDGPU] Split out new helper function macToMad in SIFoldOperands. NFC.

Differential Revision: https://reviews.llvm.org/D94009

Added: 


Modified: 
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index d86527df5c3c..6dc01c3d3c21 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -129,6 +129,21 @@ char SIFoldOperands::ID = 0;
 
 char &llvm::SIFoldOperandsID = SIFoldOperands::ID;
 
+// Map multiply-accumulate opcode to corresponding multiply-add opcode if any.
+static unsigned macToMad(unsigned Opc) {
+  switch (Opc) {
+  case AMDGPU::V_MAC_F32_e64:
+return AMDGPU::V_MAD_F32;
+  case AMDGPU::V_MAC_F16_e64:
+return AMDGPU::V_MAD_F16;
+  case AMDGPU::V_FMAC_F32_e64:
+return AMDGPU::V_FMA_F32;
+  case AMDGPU::V_FMAC_F16_e64:
+return AMDGPU::V_FMA_F16_gfx9;
+  }
+  return AMDGPU::INSTRUCTION_LIST_END;
+}
+
 // Wrapper around isInlineConstant that understands special cases when
 // instruction types are replaced during operand folding.
 static bool isInlineConstantIfFolded(const SIInstrInfo *TII,
@@ -139,31 +154,18 @@ static bool isInlineConstantIfFolded(const SIInstrInfo *TII,
 return true;
 
   unsigned Opc = UseMI.getOpcode();
-  switch (Opc) {
-  case AMDGPU::V_MAC_F32_e64:
-  case AMDGPU::V_MAC_F16_e64:
-  case AMDGPU::V_FMAC_F32_e64:
-  case AMDGPU::V_FMAC_F16_e64: {
+  unsigned NewOpc = macToMad(Opc);
+  if (NewOpc != AMDGPU::INSTRUCTION_LIST_END) {
 // Special case for mac. Since this is replaced with mad when folded into
 // src2, we need to check the legality for the final instruction.
 int Src2Idx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src2);
 if (static_cast<int>(OpNo) == Src2Idx) {
-  bool IsFMA = Opc == AMDGPU::V_FMAC_F32_e64 ||
-   Opc == AMDGPU::V_FMAC_F16_e64;
-  bool IsF32 = Opc == AMDGPU::V_MAC_F32_e64 ||
-   Opc == AMDGPU::V_FMAC_F32_e64;
-
-  unsigned Opc = IsFMA ?
-(IsF32 ? AMDGPU::V_FMA_F32 : AMDGPU::V_FMA_F16_gfx9) :
-(IsF32 ? AMDGPU::V_MAD_F32 : AMDGPU::V_MAD_F16);
-  const MCInstrDesc &MadDesc = TII->get(Opc);
+  const MCInstrDesc &MadDesc = TII->get(NewOpc);
   return TII->isInlineConstant(OpToFold, MadDesc.OpInfo[OpNo].OperandType);
 }
-return false;
-  }
-  default:
-return false;
   }
+
+  return false;
 }
 
 // TODO: Add heuristic that the frame index might not fit in the addressing mode
@@ -346,17 +348,8 @@ static bool tryAddToFoldList(SmallVectorImpl<FoldCandidate> &FoldList,
   if (!TII->isOperandLegal(*MI, OpNo, OpToFold)) {
 // Special case for v_mac_{f16, f32}_e64 if we are trying to fold into src2
 unsigned Opc = MI->getOpcode();
-if ((Opc == AMDGPU::V_MAC_F32_e64 || Opc == AMDGPU::V_MAC_F16_e64 ||
- Opc == AMDGPU::V_FMAC_F32_e64 || Opc == AMDGPU::V_FMAC_F16_e64) &&
-(int)OpNo == AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src2)) {
-  bool IsFMA = Opc == AMDGPU::V_FMAC_F32_e64 ||
-   Opc == AMDGPU::V_FMAC_F16_e64;
-  bool IsF32 = Opc == AMDGPU::V_MAC_F32_e64 ||
-   Opc == AMDGPU::V_FMAC_F32_e64;
-  unsigned NewOpc = IsFMA ?
-(IsF32 ? AMDGPU::V_FMA_F32 : AMDGPU::V_FMA_F16_gfx9) :
-(IsF32 ? AMDGPU::V_MAD_F32 : AMDGPU::V_MAD_F16);
-
+unsigned NewOpc = macToMad(Opc);
+if (NewOpc != AMDGPU::INSTRUCTION_LIST_END) {
   // Check if changing this to a v_mad_{f16, f32} instruction will allow us
   // to fold the operand.
   MI->setDesc(TII->get(NewOpc));



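The refactoring is NFC only if `macToMad` returns the same opcode as the `IsFMA`/`IsF32` ternaries it replaces. A minimal C++ sketch checks this, using a hypothetical enum in place of the TableGen-generated `AMDGPU::` opcode values. As in the diff, `INSTRUCTION_LIST_END` serves as a "no MAD form" sentinel.

```cpp
#include <cassert>

// Hypothetical stand-ins for the AMDGPU opcodes in the diff; the real values
// are generated by TableGen.
enum Opcode {
  V_MAC_F32_e64, V_MAC_F16_e64, V_FMAC_F32_e64, V_FMAC_F16_e64,
  V_MAD_F32, V_MAD_F16, V_FMA_F32, V_FMA_F16_gfx9,
  V_ADD_F32_e64,        // an unrelated opcode with no MAD form
  INSTRUCTION_LIST_END  // sentinel meaning "no mapping"
};

// New helper: one switch, sentinel for anything that is not a MAC/FMAC.
unsigned macToMad(unsigned Opc) {
  switch (Opc) {
  case V_MAC_F32_e64:  return V_MAD_F32;
  case V_MAC_F16_e64:  return V_MAD_F16;
  case V_FMAC_F32_e64: return V_FMA_F32;
  case V_FMAC_F16_e64: return V_FMA_F16_gfx9;
  }
  return INSTRUCTION_LIST_END;
}

// Old logic duplicated at both call sites. Only meaningful for the four
// MAC/FMAC opcodes, which the callers had already checked for.
unsigned oldMacToMad(unsigned Opc) {
  bool IsFMA = Opc == V_FMAC_F32_e64 || Opc == V_FMAC_F16_e64;
  bool IsF32 = Opc == V_MAC_F32_e64 || Opc == V_FMAC_F32_e64;
  return IsFMA ? (IsF32 ? V_FMA_F32 : V_FMA_F16_gfx9)
               : (IsF32 ? V_MAD_F32 : V_MAD_F16);
}
```

Returning a sentinel lets both callers fold the opcode test and the mapping into one `if (NewOpc != AMDGPU::INSTRUCTION_LIST_END)`.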


[llvm-branch-commits] [llvm] 07e92e6 - [AMDGPU] Make use of HasSMemRealTime predicate. NFC.

2020-12-14 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2020-12-14T16:34:57Z
New Revision: 07e92e6b6002d95d438d24eaabf4452ad6e4ef8f

URL: 
https://github.com/llvm/llvm-project/commit/07e92e6b6002d95d438d24eaabf4452ad6e4ef8f
DIFF: 
https://github.com/llvm/llvm-project/commit/07e92e6b6002d95d438d24eaabf4452ad6e4ef8f.diff

LOG: [AMDGPU] Make use of HasSMemRealTime predicate. NFC.

We have this subtarget feature so it makes sense to use it here. This is
NFC because it's always defined by default on GFX8+.
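The NFC argument says the new predicate selects exactly the same targets as the old `isGFX8Plus` block. A small C++ model makes this concrete, assuming (as the commit message states) that the feature is on by default for every GFX8+ generation and for none before. The `Generation` enum here is hypothetical and does not use LLVM's values.

```cpp
#include <cassert>

// Hypothetical model of the SI-family generations (not LLVM's enum values).
enum Generation { SOUTHERN_ISLANDS, SEA_ISLANDS, VOLCANIC_ISLANDS, GFX9, GFX10 };

// Old gating: S_MEMREALTIME was defined inside the isGFX8Plus block.
bool isGFX8Plus(Generation G) { return G >= VOLCANIC_ISLANDS; }

// New gating: the FeatureSMemRealTime bit. Assumption, per the commit
// message: it is on by default for every GFX8+ generation only.
bool hasSMemRealTimeByDefault(Generation G) {
  switch (G) {
  case VOLCANIC_ISLANDS:
  case GFX9:
  case GFX10:
    return true;
  default:
    return false;
  }
}

// True if both predicates select the same generations.
bool predicatesAgree() {
  for (int I = SOUTHERN_ISLANDS; I <= GFX10; ++I) {
    Generation G = static_cast<Generation>(I);
    if (isGFX8Plus(G) != hasSMemRealTimeByDefault(G))
      return false;
  }
  return true;
}
```

A subtarget that explicitly disabled the feature would be treated differently under the new predicate. This is why the claim is scoped to the default feature sets.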

Differential Revision: https://reviews.llvm.org/D93202

Added: 


Modified: 
llvm/lib/Target/AMDGPU/AMDGPU.td
llvm/lib/Target/AMDGPU/SMInstructions.td

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 77063f370976..42d134de9229 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -1264,6 +1264,9 @@ def HasGetWaveIdInst : Predicate<"Subtarget->hasGetWaveIdInst()">,
 def HasMAIInsts : Predicate<"Subtarget->hasMAIInsts()">,
   AssemblerPredicate<(all_of FeatureMAIInsts)>;
 
+def HasSMemRealTime : Predicate<"Subtarget->hasSMemRealTime()">,
+  AssemblerPredicate<(all_of FeatureSMemRealTime)>;
+
 def HasSMemTimeInst : Predicate<"Subtarget->hasSMemTimeInst()">,
   AssemblerPredicate<(all_of FeatureSMemTimeInst)>;
 

diff  --git a/llvm/lib/Target/AMDGPU/SMInstructions.td b/llvm/lib/Target/AMDGPU/SMInstructions.td
index 70bf215c03f3..5b8896c21832 100644
--- a/llvm/lib/Target/AMDGPU/SMInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SMInstructions.td
@@ -332,7 +332,6 @@ let OtherPredicates = [HasScalarStores] in {
 def S_DCACHE_WB : SM_Inval_Pseudo <"s_dcache_wb", int_amdgcn_s_dcache_wb>;
 def S_DCACHE_WB_VOL : SM_Inval_Pseudo <"s_dcache_wb_vol", int_amdgcn_s_dcache_wb_vol>;
 } // End OtherPredicates = [HasScalarStores]
-def S_MEMREALTIME   : SM_Time_Pseudo <"s_memrealtime", int_amdgcn_s_memrealtime>;
 
 defm S_ATC_PROBE: SM_Pseudo_Probe <"s_atc_probe", SReg_64>;
 let is_buffer = 1 in {
@@ -340,6 +339,9 @@ defm S_ATC_PROBE_BUFFER : SM_Pseudo_Probe <"s_atc_probe_buffer", SReg_128>;
 }
 } // SubtargetPredicate = isGFX8Plus
 
+let SubtargetPredicate = HasSMemRealTime in
+def S_MEMREALTIME   : SM_Time_Pseudo <"s_memrealtime", int_amdgcn_s_memrealtime>;
+
 let SubtargetPredicate = isGFX10Plus in
 def S_GL1_INV : SM_Inval_Pseudo<"s_gl1_inv">;
 let SubtargetPredicate = HasGetWaveIdInst in





[llvm-branch-commits] [llvm] 4f25e53 - [AMDGPU] Make use of emitRemovedIntrinsicError. NFC.

2020-12-11 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2020-12-11T14:02:14Z
New Revision: 4f25e5398211c603e765ab6c30ab35ad286d505f

URL: 
https://github.com/llvm/llvm-project/commit/4f25e5398211c603e765ab6c30ab35ad286d505f
DIFF: 
https://github.com/llvm/llvm-project/commit/4f25e5398211c603e765ab6c30ab35ad286d505f.diff

LOG: [AMDGPU] Make use of emitRemovedIntrinsicError. NFC.

Change-Id: I482bbf528255f2eacd3878ddfe7edb9a8f63d5c2

Added: 


Modified: 
llvm/lib/Target/AMDGPU/SIISelLowering.cpp

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 1accee5ccd2a..5fb1924bdd9f 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -6588,11 +6588,7 @@ SDValue SITargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
 if (Subtarget->getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS)
   return SDValue();
 
-DiagnosticInfoUnsupported BadIntrin(
-  MF.getFunction(), "intrinsic not supported on subtarget",
-  DL.getDebugLoc());
-  DAG.getContext()->diagnose(BadIntrin);
-  return DAG.getUNDEF(VT);
+return emitRemovedIntrinsicError(DAG, DL, VT);
   }
   case Intrinsic::amdgcn_ldexp:
 return DAG.getNode(AMDGPUISD::LDEXP, DL, VT,





[llvm-branch-commits] [llvm] 03663e4 - [AMDGPU] Add occupancy level tests for GFX10.3. NFC.

2020-12-08 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2020-12-08T14:15:01Z
New Revision: 03663e4130d700c6c8ea28b357fcac4d31b617f7

URL: 
https://github.com/llvm/llvm-project/commit/03663e4130d700c6c8ea28b357fcac4d31b617f7
DIFF: 
https://github.com/llvm/llvm-project/commit/03663e4130d700c6c8ea28b357fcac4d31b617f7.diff

LOG: [AMDGPU] Add occupancy level tests for GFX10.3. NFC.

getMaxWavesPerEU and getVGPRAllocGranule both changed in GFX10.3 and
they both affect the occupancy calculation.
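The VGPR-limited checks in this test are consistent with a simple model. VGPR usage is rounded up to the allocation granule, the VGPR budget is divided by the rounded amount, and the result is capped at the maximum waves per EU. The C++ sketch below encodes that model. The budgets, granules and caps are assumptions, chosen because they reproduce the expected `Occupancy` values in this test, and were not read from `AMDGPUBaseInfo.cpp`. The model ignores SGPR, LDS and attribute limits, such as the ones behind `limited_occupancy_3`.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-target parameters for a VGPR-only occupancy model.
struct VGPRParams {
  unsigned TotalVGPRs;    // VGPR budget shared by all waves on one SIMD
  unsigned AllocGranule;  // allocations are rounded up to a multiple of this
  unsigned MaxWavesPerEU; // cap on waves per EU
};

// Assumed values: they reproduce the Occupancy checks in
// occupancy-levels.ll, but are not copied from LLVM's source.
const VGPRParams GFX9Params{256, 4, 10};
const VGPRParams GFX1010W32{1024, 8, 20};
const VGPRParams GFX1010W64{512, 4, 20};
const VGPRParams GFX1030W32{1024, 16, 16};
const VGPRParams GFX1030W64{512, 8, 16};

unsigned alignUp(unsigned X, unsigned Align) {
  return (X + Align - 1) / Align * Align;
}

// Waves per EU when VGPR usage is the only limiting resource.
unsigned occupancyForVGPRs(const VGPRParams &P, unsigned NumVGPRs) {
  unsigned Allocated = alignUp(std::max(1u, NumVGPRs), P.AllocGranule);
  unsigned Waves = std::max(1u, P.TotalVGPRs / Allocated);
  return std::min(Waves, P.MaxWavesPerEU);
}
```

For example, in this model 36 VGPRs round up to 40 on wave64 GFX10.3, and 512 / 40 = 12. That matches `GFX1030W64: ; Occupancy: 12` for `used_36_vgprs`. On GFX1010 wave64 the finer granule leaves 36, and 512 / 36 = 14.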

Differential Revision: https://reviews.llvm.org/D92839

Added: 


Modified: 
llvm/test/CodeGen/AMDGPU/occupancy-levels.ll

Removed: 




diff  --git a/llvm/test/CodeGen/AMDGPU/occupancy-levels.ll b/llvm/test/CodeGen/AMDGPU/occupancy-levels.ll
index db70c3d9387d..25e0376dd7ee 100644
--- a/llvm/test/CodeGen/AMDGPU/occupancy-levels.ll
+++ b/llvm/test/CodeGen/AMDGPU/occupancy-levels.ll
@@ -1,18 +1,21 @@
 ; RUN: llc -march=amdgcn -mcpu=gfx900 < %s | FileCheck --check-prefixes=GCN,GFX9 %s
-; RUN: llc -march=amdgcn -mcpu=gfx1010 < %s | FileCheck --check-prefixes=GCN,GFX1010,GFX1010W32 %s
-; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize64 < %s | FileCheck --check-prefixes=GCN,GFX1010,GFX1010W64 %s
+; RUN: llc -march=amdgcn -mcpu=gfx1010 < %s | FileCheck --check-prefixes=GCN,GFX10,GFX10W32,GFX1010,GFX1010W32 %s
+; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize64 < %s | FileCheck --check-prefixes=GCN,GFX10,GFX10W64,GFX1010,GFX1010W64 %s
+; RUN: llc -march=amdgcn -mcpu=gfx1030 < %s | FileCheck --check-prefixes=GCN,GFX10,GFX10W32,GFX1030,GFX1030W32 %s
+; RUN: llc -march=amdgcn -mcpu=gfx1030 -mattr=+wavefrontsize64 < %s | FileCheck --check-prefixes=GCN,GFX10,GFX10W64,GFX1030,GFX1030W64 %s
 
 ; GCN-LABEL: {{^}}max_occupancy:
 ; GFX9:   ; Occupancy: 10
 ; GFX1010:; Occupancy: 20
+; GFX1030:; Occupancy: 16
 define amdgpu_kernel void @max_occupancy() {
   ret void
 }
 
 ; GCN-LABEL: {{^}}limited_occupancy_3:
 ; GFX9:   ; Occupancy: 3
-; GFX1010W64: ; Occupancy: 3
-; GFX1010W32: ; Occupancy: 4
+; GFX10W64:   ; Occupancy: 3
+; GFX10W32:   ; Occupancy: 4
 define amdgpu_kernel void @limited_occupancy_3() #0 {
   ret void
 }
@@ -20,6 +23,7 @@ define amdgpu_kernel void @limited_occupancy_3() #0 {
 ; GCN-LABEL: {{^}}limited_occupancy_18:
 ; GFX9:   ; Occupancy: 10
 ; GFX1010:; Occupancy: 18
+; GFX1030:; Occupancy: 16
 define amdgpu_kernel void @limited_occupancy_18() #1 {
   ret void
 }
@@ -27,6 +31,7 @@ define amdgpu_kernel void @limited_occupancy_18() #1 {
 ; GCN-LABEL: {{^}}limited_occupancy_19:
 ; GFX9:   ; Occupancy: 10
 ; GFX1010:; Occupancy: 18
+; GFX1030:; Occupancy: 16
 define amdgpu_kernel void @limited_occupancy_19() #2 {
   ret void
 }
@@ -34,6 +39,7 @@ define amdgpu_kernel void @limited_occupancy_19() #2 {
 ; GCN-LABEL: {{^}}used_24_vgprs:
 ; GFX9:   ; Occupancy: 10
 ; GFX1010:; Occupancy: 20
+; GFX1030:; Occupancy: 16
 define amdgpu_kernel void @used_24_vgprs() {
   call void asm sideeffect "", "~{v23}" ()
   ret void
@@ -43,6 +49,7 @@ define amdgpu_kernel void @used_24_vgprs() {
 ; GFX9:   ; Occupancy: 9
 ; GFX1010W64: ; Occupancy: 18
 ; GFX1010W32: ; Occupancy: 20
+; GFX1030:; Occupancy: 16
 define amdgpu_kernel void @used_28_vgprs() {
   call void asm sideeffect "", "~{v27}" ()
   ret void
@@ -50,8 +57,9 @@ define amdgpu_kernel void @used_28_vgprs() {
 
 ; GCN-LABEL: {{^}}used_32_vgprs:
 ; GFX9:   ; Occupancy: 8
-; GFX1010W64: ; Occupancy: 16
+; GFX10W64:   ; Occupancy: 16
 ; GFX1010W32: ; Occupancy: 20
+; GFX1030W32: ; Occupancy: 16
 define amdgpu_kernel void @used_32_vgprs() {
   call void asm sideeffect "", "~{v31}" ()
   ret void
@@ -61,6 +69,8 @@ define amdgpu_kernel void @used_32_vgprs() {
 ; GFX9:   ; Occupancy: 7
 ; GFX1010W64: ; Occupancy: 14
 ; GFX1010W32: ; Occupancy: 20
+; GFX1030W64: ; Occupancy: 12
+; GFX1030W32: ; Occupancy: 16
 define amdgpu_kernel void @used_36_vgprs() {
   call void asm sideeffect "", "~{v35}" ()
   ret void
@@ -68,8 +78,9 @@ define amdgpu_kernel void @used_36_vgprs() {
 
 ; GCN-LABEL: {{^}}used_40_vgprs:
 ; GFX9:   ; Occupancy: 6
-; GFX1010W64: ; Occupancy: 12
+; GFX10W64:   ; Occupancy: 12
 ; GFX1010W32: ; Occupancy: 20
+; GFX1030W32: ; Occupancy: 16
 define amdgpu_kernel void @used_40_vgprs() {
   call void asm sideeffect "", "~{v39}" ()
   ret void
@@ -79,6 +90,8 @@ define amdgpu_kernel void @used_40_vgprs() {
 ; GFX9:   ; Occupancy: 5
 ; GFX1010W64: ; Occupancy: 11
 ; GFX1010W32: ; Occupancy: 20
+; GFX1030W64: ; Occupancy: 10
+; GFX1030W32: ; Occupancy: 16
 define amdgpu_kernel void @used_44_vgprs() {
   call void asm sideeffect "", "~{v43}" ()
   ret void
@@ -86,8 +99,9 @@ define amdgpu_kernel void @used_44_vgprs() {
 
 ; GCN-LABEL: {{^}}used_48_vgprs:
 ; GFX9:   ; Occupancy: 5
-; GFX1010W64: ; Occupancy: 10
+; GFX10W64:   ; Occupancy: 10
 ; GFX1010W32: ; Occupancy: 20
+; GFX1030W32: ; Occupancy: 16
 define 

[llvm-branch-commits] [llvm] 0f32e81 - [TableGen] Remove unused class RecordValResolver. NFC.

2020-12-03 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2020-12-03T13:36:58Z
New Revision: 0f32e81407d33ab8886081db5d8ed2c7407a15e8

URL: 
https://github.com/llvm/llvm-project/commit/0f32e81407d33ab8886081db5d8ed2c7407a15e8
DIFF: 
https://github.com/llvm/llvm-project/commit/0f32e81407d33ab8886081db5d8ed2c7407a15e8.diff

LOG: [TableGen] Remove unused class RecordValResolver. NFC.

Differential Revision: https://reviews.llvm.org/D92477

Added: 


Modified: 
llvm/include/llvm/TableGen/Record.h

Removed: 




diff  --git a/llvm/include/llvm/TableGen/Record.h b/llvm/include/llvm/TableGen/Record.h
index a26367a6fcc6..20b786dc6e42 100644
--- a/llvm/include/llvm/TableGen/Record.h
+++ b/llvm/include/llvm/TableGen/Record.h
@@ -2032,25 +2032,6 @@ class RecordResolver final : public Resolver {
   bool keepUnsetBits() const override { return true; }
 };
 
-/// Resolve all references to a specific RecordVal.
-//
-// TODO: This is used for resolving references to template arguments, in a
-//   rather inefficient way. Change those uses to resolve all template
-//   arguments simultaneously and get rid of this class.
-class RecordValResolver final : public Resolver {
-  const RecordVal *RV;
-
-public:
-  explicit RecordValResolver(Record &R, const RecordVal *RV)
-  : Resolver(&R), RV(RV) {}
-
-  Init *resolve(Init *VarName) override {
-if (VarName == RV->getNameInit())
-  return RV->getValue();
-return nullptr;
-  }
-};
-
 /// Delegate resolving to a sub-resolver, but shadow some variable names.
 class ShadowResolver final : public Resolver {
   Resolver &R;





[llvm-branch-commits] [llvm] 839c963 - [AMDGPU] Simplify some generation checks. NFC.

2020-12-01 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2020-12-01T10:15:32Z
New Revision: 839c9635edce4f6ed348b154a4e755ff8263d366

URL: 
https://github.com/llvm/llvm-project/commit/839c9635edce4f6ed348b154a4e755ff8263d366
DIFF: 
https://github.com/llvm/llvm-project/commit/839c9635edce4f6ed348b154a4e755ff8263d366.diff

LOG: [AMDGPU] Simplify some generation checks. NFC.

Added: 


Modified: 
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index b8b747ea8f99..d1e5fe59e910 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -4866,7 +4866,7 @@ bool AMDGPUAsmParser::subtargetHasRegister(const MCRegisterInfo &MRI,
   case AMDGPU::SRC_PRIVATE_BASE:
   case AMDGPU::SRC_PRIVATE_LIMIT:
   case AMDGPU::SRC_POPS_EXITING_WAVE_ID:
-return !isCI() && !isSI() && !isVI();
+return isGFX9Plus();
   case AMDGPU::TBA:
   case AMDGPU::TBA_LO:
   case AMDGPU::TBA_HI:
@@ -4877,7 +4877,7 @@ bool AMDGPUAsmParser::subtargetHasRegister(const MCRegisterInfo &MRI,
   case AMDGPU::XNACK_MASK:
   case AMDGPU::XNACK_MASK_LO:
   case AMDGPU::XNACK_MASK_HI:
-return !isCI() && !isSI() && !isGFX10Plus() && hasXNACK();
+return (isVI() || isGFX9()) && hasXNACK();
   case AMDGPU::SGPR_NULL:
 return isGFX10Plus();
   default:



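Both rewrites are NFC only if the assembler distinguishes exactly these generations: SI, CI, VI, GFX9 and GFX10. Under that assumption, a small C++ model can check the old and new conditions exhaustively. The `Gen` enum and the `old*`/`new*` function names are hypothetical stand-ins for the real subtarget queries.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical model: the generations the assembler distinguished at the
// time, in order (VI is GFX8). Not the real AMDGPU enum.
enum Gen { SI, CI, VI, GFX9, GFX10 };

bool isSI(Gen G) { return G == SI; }
bool isCI(Gen G) { return G == CI; }
bool isVI(Gen G) { return G == VI; }
bool isGFX9(Gen G) { return G == GFX9; }
bool isGFX9Plus(Gen G) { return G >= GFX9; }
bool isGFX10Plus(Gen G) { return G >= GFX10; }

// SRC_PRIVATE_BASE / SRC_POPS_EXITING_WAVE_ID case: old vs new condition.
bool oldSrcBaseCase(Gen G) { return !isCI(G) && !isSI(G) && !isVI(G); }
bool newSrcBaseCase(Gen G) { return isGFX9Plus(G); }

// XNACK_MASK case: old vs new condition.
bool oldXnackCase(Gen G, bool HasXNACK) {
  return !isCI(G) && !isSI(G) && !isGFX10Plus(G) && HasXNACK;
}
bool newXnackCase(Gen G, bool HasXNACK) {
  return (isVI(G) || isGFX9(G)) && HasXNACK;
}

// Exhaustively compare old and new forms over every modelled case.
bool conditionsAgree() {
  for (int I = SI; I <= GFX10; ++I) {
    Gen G = static_cast<Gen>(I);
    if (oldSrcBaseCase(G) != newSrcBaseCase(G))
      return false;
    for (bool HasXNACK : {false, true})
      if (oldXnackCase(G, HasXNACK) != newXnackCase(G, HasXNACK))
        return false;
  }
  return true;
}
```

The positive forms (`isGFX9Plus()`, `isVI() || isGFX9()`) state which targets have the registers, rather than listing the ones that lack them.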


[llvm-branch-commits] [llvm] e20efa3 - [LegacyPM] Simplify PMTopLevelManager::collectLastUses. NFC.

2020-11-30 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2020-11-30T10:36:19Z
New Revision: e20efa3dd5c75a79a47d40335aee0f63261f9c5b

URL: 
https://github.com/llvm/llvm-project/commit/e20efa3dd5c75a79a47d40335aee0f63261f9c5b
DIFF: 
https://github.com/llvm/llvm-project/commit/e20efa3dd5c75a79a47d40335aee0f63261f9c5b.diff

LOG: [LegacyPM] Simplify PMTopLevelManager::collectLastUses. NFC.

Added: 


Modified: 
llvm/lib/IR/LegacyPassManager.cpp

Removed: 




diff  --git a/llvm/lib/IR/LegacyPassManager.cpp b/llvm/lib/IR/LegacyPassManager.cpp
index 8fd35ef975e2..544c56a789a3 100644
--- a/llvm/lib/IR/LegacyPassManager.cpp
+++ b/llvm/lib/IR/LegacyPassManager.cpp
@@ -685,16 +685,12 @@ PMTopLevelManager::setLastUser(ArrayRef<Pass*> AnalysisPasses, Pass *P) {
 /// Collect passes whose last user is P
 void PMTopLevelManager::collectLastUses(SmallVectorImpl<Pass *> &LastUses,
 Pass *P) {
-  DenseMap<Pass *, SmallPtrSet<Pass *, 8> >::iterator DMI =
-InversedLastUser.find(P);
+  auto DMI = InversedLastUser.find(P);
   if (DMI == InversedLastUser.end())
 return;
 
-  SmallPtrSet<Pass *, 8> &LU = DMI->second;
-  for (Pass *LUP : LU) {
-LastUses.push_back(LUP);
-  }
-
+  auto &LU = DMI->second;
+  LastUses.append(LU.begin(), LU.end());
 }
 
 AnalysisUsage *PMTopLevelManager::findAnalysisUsage(Pass *P) {



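The simplified `collectLastUses` performs one lookup, returns early when there is no entry, binds the set by reference and does a single range append. A minimal sketch of the same shape in standard C++ follows. `std::unordered_map` and `std::set` stand in for LLVM's `DenseMap` and `SmallPtrSet`, strings stand in for `Pass *`, and `std::vector::insert` at `end()` plays the role of `SmallVectorImpl::append`.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for InversedLastUser: pass -> passes whose last user it is.
using InversedMap = std::unordered_map<std::string, std::set<std::string>>;

// Append every pass whose last user is P to LastUses.
void collectLastUses(const InversedMap &InversedLastUser,
                     std::vector<std::string> &LastUses,
                     const std::string &P) {
  auto DMI = InversedLastUser.find(P);  // single lookup
  if (DMI == InversedLastUser.end())
    return;                             // P is nobody's last user
  const auto &LU = DMI->second;         // bind by reference, no copy
  LastUses.insert(LastUses.end(), LU.begin(), LU.end());  // range append
}
```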


[llvm-branch-commits] [llvm] 68ed644 - [LegacyPM] Avoid a redundant map lookup in setLastUser. NFC.

2020-11-27 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2020-11-27T10:42:01Z
New Revision: 68ed6447855632b954b55f63807481eaa44705df

URL: 
https://github.com/llvm/llvm-project/commit/68ed6447855632b954b55f63807481eaa44705df
DIFF: 
https://github.com/llvm/llvm-project/commit/68ed6447855632b954b55f63807481eaa44705df.diff

LOG: [LegacyPM] Avoid a redundant map lookup in setLastUser. NFC.

As a bonus this makes it (IMO) obvious that the iterator is not
invalidated, so remove the comment explaining that.
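Assigning through the loop reference, rather than via `LastUser[LU.first] = P`, avoids a second hash lookup. Because no element is inserted or erased, the iteration stays valid. The sketch below shows the same idiom, with `std::unordered_map` substituted for LLVM's `DenseMap` and strings standing in for `Pass *`. `redirectLastUser` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for LastUser: maps a pass to the pass that last uses it.
using LastUserMap = std::unordered_map<std::string, std::string>;

// Make P the last user of every pass whose last user is currently AP.
// Writing through the reference updates the mapped value in place: there is
// no extra lookup, and nothing is inserted or erased during the loop.
void redirectLastUser(LastUserMap &LastUser, const std::string &AP,
                      const std::string &P) {
  for (auto &LU : LastUser)
    if (LU.second == AP)
      LU.second = P;
}
```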

Added: 


Modified: 
llvm/lib/IR/LegacyPassManager.cpp

Removed: 




diff  --git a/llvm/lib/IR/LegacyPassManager.cpp b/llvm/lib/IR/LegacyPassManager.cpp
index bb2661d36b56..8fd35ef975e2 100644
--- a/llvm/lib/IR/LegacyPassManager.cpp
+++ b/llvm/lib/IR/LegacyPassManager.cpp
@@ -675,11 +675,9 @@ PMTopLevelManager::setLastUser(ArrayRef<Pass*> AnalysisPasses, Pass *P) {
 
 // If AP is the last user of other passes then make P last user of
 // such passes.
-for (auto LU : LastUser) {
+for (auto &LU : LastUser) {
   if (LU.second == AP)
-// DenseMap iterator is not invalidated here because
-// this is just updating existing entries.
-LastUser[LU.first] = P;
+LU.second = P;
 }
   }
 }





[llvm-branch-commits] [llvm] 0d9166f - [LegacyPM] Remove unused undocumented parameter. NFC.

2020-11-27 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2020-11-27T10:41:38Z
New Revision: 0d9166ff79578c7e98cef8c554e1342ece8efee6

URL: 
https://github.com/llvm/llvm-project/commit/0d9166ff79578c7e98cef8c554e1342ece8efee6
DIFF: 
https://github.com/llvm/llvm-project/commit/0d9166ff79578c7e98cef8c554e1342ece8efee6.diff

LOG: [LegacyPM] Remove unused undocumented parameter. NFC.

The Direction parameter to AnalysisResolver::getAnalysisIfAvailable has
never been documented or used for anything.

Added: 


Modified: 
llvm/include/llvm/PassAnalysisSupport.h
llvm/lib/IR/LegacyPassManager.cpp
llvm/lib/IR/Pass.cpp

Removed: 




diff  --git a/llvm/include/llvm/PassAnalysisSupport.h b/llvm/include/llvm/PassAnalysisSupport.h
index 84df171d38d8..4e28466c4968 100644
--- a/llvm/include/llvm/PassAnalysisSupport.h
+++ b/llvm/include/llvm/PassAnalysisSupport.h
@@ -183,7 +183,7 @@ class AnalysisResolver {
   }
 
   /// Return analysis result or null if it doesn't exist.
-  Pass *getAnalysisIfAvailable(AnalysisID ID, bool Direction) const;
+  Pass *getAnalysisIfAvailable(AnalysisID ID) const;
 
 private:
   /// This keeps track of which passes implements the interfaces that are
@@ -207,7 +207,7 @@ AnalysisType *Pass::getAnalysisIfAvailable() const {
 
   const void *PI = &AnalysisType::ID;
 
-  Pass *ResultPass = Resolver->getAnalysisIfAvailable(PI, true);
+  Pass *ResultPass = Resolver->getAnalysisIfAvailable(PI);
   if (!ResultPass) return nullptr;
 
   // Because the AnalysisType may not be a subclass of pass (for

diff  --git a/llvm/lib/IR/LegacyPassManager.cpp b/llvm/lib/IR/LegacyPassManager.cpp
index 7f94d42d6ecd..bb2661d36b56 100644
--- a/llvm/lib/IR/LegacyPassManager.cpp
+++ b/llvm/lib/IR/LegacyPassManager.cpp
@@ -1392,8 +1392,8 @@ PMDataManager::~PMDataManager() {
 
//===--===//
 // NOTE: Is this the right place to define this method ?
 // getAnalysisIfAvailable - Return analysis result or null if it doesn't exist.
-Pass *AnalysisResolver::getAnalysisIfAvailable(AnalysisID ID, bool dir) const {
-  return PM.findAnalysisPass(ID, dir);
+Pass *AnalysisResolver::getAnalysisIfAvailable(AnalysisID ID) const {
+  return PM.findAnalysisPass(ID, true);
 }
 
 std::tuple

diff  --git a/llvm/lib/IR/Pass.cpp b/llvm/lib/IR/Pass.cpp
index a815da2bdc51..0750501a92c4 100644
--- a/llvm/lib/IR/Pass.cpp
+++ b/llvm/lib/IR/Pass.cpp
@@ -62,7 +62,7 @@ bool ModulePass::skipModule(Module ) const {
 }
 
 bool Pass::mustPreserveAnalysisID(char &AID) const {
-  return Resolver->getAnalysisIfAvailable(&AID, true) != nullptr;
+  return Resolver->getAnalysisIfAvailable(&AID) != nullptr;
 }
 
 // dumpPassStructure - Implement the -debug-pass=Structure option





[llvm-branch-commits] [llvm] 4f87d30 - [AMDGPU] Introduce and use isGFX10Plus. NFC.

2020-11-26 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2020-11-26T09:02:36Z
New Revision: 4f87d30a06dd08cec45cb595e9dbed6345c9a7c5

URL: 
https://github.com/llvm/llvm-project/commit/4f87d30a06dd08cec45cb595e9dbed6345c9a7c5
DIFF: 
https://github.com/llvm/llvm-project/commit/4f87d30a06dd08cec45cb595e9dbed6345c9a7c5.diff

LOG: [AMDGPU] Introduce and use isGFX10Plus. NFC.

It's more future-proof to use isGFX10Plus from the start, on the
assumption that future architectures will be based on current
architectures.

Also make use of the existing isGFX9Plus in a few places.
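The two predicates differ only on a generation newer than GFX10. A tiny C++ model shows the difference, with a hypothetical `LaterGen` standing in for any such future generation. It illustrates why `isGFX10Plus` is the safer default under the commit's assumption that later architectures build on current ones.

```cpp
#include <cassert>

// Hypothetical ordered generations; LaterGen stands for any generation
// released after GFX10 and exists only for illustration.
enum Gen { GFX9, GFX10, LaterGen };

// Matches exactly one generation.
bool isGFX10(Gen G) { return G == GFX10; }

// Matches GFX10 and every generation after it.
bool isGFX10Plus(Gen G) { return G >= GFX10; }
```

In this model, code guarded by `isGFX10` (like the HSA/PAL text-section padding in `AMDGPUAsmPrinter::doFinalization`) would silently stop applying on `LaterGen`. With `isGFX10Plus` it keeps applying until someone deliberately opts a new generation out.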

Differential Revision: https://reviews.llvm.org/D92092

Added: 


Modified: 
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
llvm/lib/Target/AMDGPU/MCTargetDesc/SIMCCodeEmitter.cpp
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index 8148d0487802..137f6896c87b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -338,7 +338,7 @@ bool AMDGPUAsmPrinter::doFinalization(Module &M) {
   // causing stale data in caches. Arguably this should be done by the linker,
   // which is why this isn't done for Mesa.
   const MCSubtargetInfo &STI = *getGlobalSTI();
-  if (AMDGPU::isGFX10(STI) &&
+  if (AMDGPU::isGFX10Plus(STI) &&
   (STI.getTargetTriple().getOS() == Triple::AMDHSA ||
STI.getTargetTriple().getOS() == Triple::AMDPAL)) {
 OutStreamer->SwitchSection(getObjFileLowering().getTextSection());

diff  --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
index 37a79ce4fa37..20b7c7849397 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
@@ -1485,7 +1485,7 @@ bool AMDGPUInstructionSelector::selectImageIntrinsic(
   const AMDGPU::MIMGMIPMappingInfo *MIPMappingInfo =
   AMDGPU::getMIMGMIPMappingInfo(Intr->BaseOpcode);
   unsigned IntrOpcode = Intr->BaseOpcode;
-  const bool IsGFX10 = STI.getGeneration() >= AMDGPUSubtarget::GFX10;
+  const bool IsGFX10Plus = AMDGPU::isGFX10Plus(STI);
 
   const unsigned ArgOffset = MI.getNumExplicitDefs() + 1;
 
@@ -1603,12 +1603,12 @@ bool AMDGPUInstructionSelector::selectImageIntrinsic(
 GLC = true; // TODO no-return optimization
 if (!parseCachePolicy(
 MI.getOperand(ArgOffset + Intr->CachePolicyIndex).getImm(), nullptr,
-&SLC, IsGFX10 ? &DLC : nullptr))
+&SLC, IsGFX10Plus ? &DLC : nullptr))
   return false;
   } else {
 if (!parseCachePolicy(
 MI.getOperand(ArgOffset + Intr->CachePolicyIndex).getImm(), &GLC,
-&SLC, IsGFX10 ? &DLC : nullptr))
+&SLC, IsGFX10Plus ? &DLC : nullptr))
   return false;
   }
 
@@ -1641,7 +1641,7 @@ bool AMDGPUInstructionSelector::selectImageIntrinsic(
 ++NumVDataDwords;
 
   int Opcode = -1;
-  if (IsGFX10) {
+  if (IsGFX10Plus) {
 Opcode = AMDGPU::getMIMGOpcode(IntrOpcode,
UseNSA ? AMDGPU::MIMGEncGfx10NSA
   : AMDGPU::MIMGEncGfx10Default,
@@ -1693,22 +1693,22 @@ bool AMDGPUInstructionSelector::selectImageIntrinsic(
 
   MIB.addImm(DMask); // dmask
 
-  if (IsGFX10)
+  if (IsGFX10Plus)
 MIB.addImm(DimInfo->Encoding);
   MIB.addImm(Unorm);
-  if (IsGFX10)
+  if (IsGFX10Plus)
 MIB.addImm(DLC);
 
   MIB.addImm(GLC);
   MIB.addImm(SLC);
   MIB.addImm(IsA16 &&  // a16 or r128
  STI.hasFeature(AMDGPU::FeatureR128A16) ? -1 : 0);
-  if (IsGFX10)
+  if (IsGFX10Plus)
 MIB.addImm(IsA16 ? -1 : 0);
 
   MIB.addImm(TFE); // tfe
   MIB.addImm(LWE); // lwe
-  if (!IsGFX10)
+  if (!IsGFX10Plus)
 MIB.addImm(DimInfo->DA ? -1 : 0);
   if (BaseOpcode->HasD16)
 MIB.addImm(IsD16 ? -1 : 0);

diff  --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 4f05ba5ab576..b8b747ea8f99 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -1232,6 +1232,8 @@ class AMDGPUAsmParser : public MCTargetAsmParser {
 return AMDGPU::isGFX10(getSTI());
   }
 
+  bool isGFX10Plus() const { return AMDGPU::isGFX10Plus(getSTI()); }
+
   bool isGFX10_BEncoding() const {
 return AMDGPU::isGFX10_BEncoding(getSTI());
   }
@@ -1248,9 +1250,7 @@ class AMDGPUAsmParser : public MCTargetAsmParser {
 return !isVI() && !isGFX9();
   }
 
-  bool 

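This patch replaces exact-generation tests (`IsGFX10`, `isGFX10()`) with at-least tests (`IsGFX10Plus`, `isGFX10Plus()`). A minimal sketch of why the distinction matters, assuming a hypothetical `Generation` enum and hypothetical predicates `isExactlyGFX10`/`isGFX10OrLater`; the real `AMDGPU::isGFX10`/`AMDGPU::isGFX10Plus` helpers query the subtarget info rather than taking an enum:

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a GPU generation; LLVM's real
// AMDGPU helpers query an MCSubtargetInfo instead of taking an enum.
enum class Generation { SI, CI, VI, GFX9, GFX10, GFX11 };

// Exact match: true only for GFX10 itself (the old IsGFX10 style of check).
constexpr bool isExactlyGFX10(Generation G) { return G == Generation::GFX10; }

// Ranged match: true for GFX10 and anything newer (the IsGFX10Plus style).
constexpr bool isGFX10OrLater(Generation G) { return G >= Generation::GFX10; }

// A newer generation fails the exact check but passes the ranged one, so
// operands guarded by the exact check would wrongly take the pre-GFX10
// path on such a target.
static_assert(!isExactlyGFX10(Generation::GFX11), "exact check rejects GFX11");
static_assert(isGFX10OrLater(Generation::GFX11), "ranged check accepts GFX11");
static_assert(!isGFX10OrLater(Generation::GFX9), "GFX9 predates GFX10");
```

With the ranged form, a later generation inherits the GFX10 operand layout by default instead of silently falling back to the older one.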
[llvm-branch-commits] [llvm] 000400c - Fix speling in comments. NFC.

2020-11-23 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2020-11-23T14:43:24Z
New Revision: 000400ca0aeb32e347eefd110a4ed58ebc23d333

URL: https://github.com/llvm/llvm-project/commit/000400ca0aeb32e347eefd110a4ed58ebc23d333
DIFF: https://github.com/llvm/llvm-project/commit/000400ca0aeb32e347eefd110a4ed58ebc23d333.diff

LOG: Fix speling in comments. NFC.

Added: 


Modified: 
llvm/include/llvm/ADT/DenseMap.h
llvm/lib/Analysis/GlobalsModRef.cpp
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
llvm/lib/Target/AMDGPU/SIDefines.h

Removed: 




diff  --git a/llvm/include/llvm/ADT/DenseMap.h b/llvm/include/llvm/ADT/DenseMap.h
index 34d397cc9793..42e4fc84175c 100644
--- a/llvm/include/llvm/ADT/DenseMap.h
+++ b/llvm/include/llvm/ADT/DenseMap.h
@@ -954,7 +954,7 @@ class SmallDenseMap
   std::swap(*LHSB, *RHSB);
   continue;
 }
-// Swap separately and handle any assymetry.
+// Swap separately and handle any asymmetry.
 std::swap(LHSB->getFirst(), RHSB->getFirst());
 if (hasLHSValue) {
   ::new (&RHSB->getSecond()) ValueT(std::move(LHSB->getSecond()));

diff  --git a/llvm/lib/Analysis/GlobalsModRef.cpp b/llvm/lib/Analysis/GlobalsModRef.cpp
index 37a345885b33..1a42c69b8b66 100644
--- a/llvm/lib/Analysis/GlobalsModRef.cpp
+++ b/llvm/lib/Analysis/GlobalsModRef.cpp
@@ -44,7 +44,7 @@ STATISTIC(NumIndirectGlobalVars, "Number of indirect global objects");
 // An option to enable unsafe alias results from the GlobalsModRef analysis.
 // When enabled, GlobalsModRef will provide no-alias results which in extremely
 // rare cases may not be conservatively correct. In particular, in the face of
-// transforms which cause assymetry between how effective getUnderlyingObject
+// transforms which cause asymmetry between how effective getUnderlyingObject
 // is for two pointers, it may produce incorrect results.
 //
 // These unsafe results have been returned by GMR for many years without

diff  --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index 351f532ad4a3..cbbb0755b124 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -1649,7 +1649,7 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
 // If the prologue didn't contain any SEH opcodes and didn't set the
 // MF.hasWinCFI() flag, assume the epilogue won't either, and skip the
 // EpilogStart - to avoid generating CFI for functions that don't need it.
-// (And as we didn't generate any prologue at all, it would be assymetrical
+// (And as we didn't generate any prologue at all, it would be asymmetrical
 // to the epilogue.) By the end of the function, we assert that
 // HasWinCFI is equal to MF.hasWinCFI(), to verify this assumption.
 HasWinCFI = true;

diff  --git a/llvm/lib/Target/AMDGPU/SIDefines.h b/llvm/lib/Target/AMDGPU/SIDefines.h
index 0abd96dc4607..65c486ef73e2 100644
--- a/llvm/lib/Target/AMDGPU/SIDefines.h
+++ b/llvm/lib/Target/AMDGPU/SIDefines.h
@@ -33,7 +33,7 @@ enum : uint64_t {
   VOP2 = 1 << 8,
   VOPC = 1 << 9,
 
- // TODO: Should this be spilt into VOP3 a and b?
+  // TODO: Should this be spilt into VOP3 a and b?
   VOP3 = 1 << 10,
   VOP3P = 1 << 12,
 



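The `SmallDenseMap` comment corrected in this commit ("Swap separately and handle any asymmetry") covers swapping two buckets when only one of them holds a constructed value. A minimal sketch of that pattern, assuming a hypothetical `Bucket` type with a `std::string` value in raw storage and hypothetical helpers `emplaceValue`/`swapBuckets`, rather than LLVM's actual bucket layout:

```cpp
#include <cassert>
#include <memory>   // std::destroy_at
#include <new>      // placement new, std::launder
#include <string>
#include <utility>  // std::move, std::swap

// Hypothetical simplified bucket: the key is always live, but the value sits
// in raw storage and is constructed only when HasValue is true.
struct Bucket {
  int Key = 0;
  bool HasValue = false;
  alignas(std::string) unsigned char Storage[sizeof(std::string)];

  Bucket() = default;
  Bucket(const Bucket &) = delete;
  Bucket &operator=(const Bucket &) = delete;
  ~Bucket() {
    if (HasValue)
      std::destroy_at(&value());
  }

  std::string &value() {
    return *std::launder(reinterpret_cast<std::string *>(Storage));
  }
};

// Construct a value in a bucket that currently holds none.
void emplaceValue(Bucket &B, std::string V) {
  ::new (static_cast<void *>(B.Storage)) std::string(std::move(V));
  B.HasValue = true;
}

// Swap two buckets, handling the case where only one side holds a value.
void swapBuckets(Bucket &L, Bucket &R) {
  std::swap(L.Key, R.Key);
  if (L.HasValue && R.HasValue) {
    // Symmetric case: both values are live, so an ordinary swap works.
    std::swap(L.value(), R.value());
    return;
  }
  // Asymmetric case: move the single live value into the other bucket's
  // raw storage, then destroy the moved-from original.
  if (L.HasValue) {
    ::new (static_cast<void *>(R.Storage)) std::string(std::move(L.value()));
    std::destroy_at(&L.value());
  } else if (R.HasValue) {
    ::new (static_cast<void *>(L.Storage)) std::string(std::move(R.value()));
    std::destroy_at(&R.value());
  }
  std::swap(L.HasValue, R.HasValue);
}
```

The asymmetric case cannot use a plain swap because one side's storage holds no live object, so the value is move-constructed into place and the source is destroyed explicitly.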