[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)
jayfoad wrote:

Too late to backport - no more 18.x releases are planned.

https://github.com/llvm/llvm-project/pull/90582
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/90582
[llvm-branch-commits] [llvm] release/18.x: [AMDGPU] Fix GFX12 encoding of s_wait_event export_ready (#89622) (PR #91034)
jayfoad wrote:

> Fixed encoding of AMDGPU instructions

I don't think the release notes should say that. It makes it sound like all encodings were wrong.

https://github.com/llvm/llvm-project/pull/91034
[llvm-branch-commits] [llvm] release/18.x: [AMDGPU] Fix GFX12 encoding of s_wait_event export_ready (#89622) (PR #91034)
https://github.com/jayfoad approved this pull request.

https://github.com/llvm/llvm-project/pull/91034
[llvm-branch-commits] [llvm] [AMDGPU] Fix setting nontemporal in memory legalizer (#83815) (PR #90204)
jayfoad wrote:

> Hi @jayfoad (or anyone else). If you would like to add a note about this fix
> in the release notes (completely optional). Please reply to this comment with
> a one or two sentence description of the fix. When you are done, please add
> the release:note label to this PR.

I don't think this fix is particularly noteworthy. Would there already be a list of bugs fixed in the release notes?

https://github.com/llvm/llvm-project/pull/90204
[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)
jayfoad wrote:

> Let's not backport this yet since @pendingchaos has pointed out a problem
> with #90201.

Fixed by #90710 which I have added to this PR.

https://github.com/llvm/llvm-project/pull/90582
[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)
https://github.com/jayfoad ready_for_review https://github.com/llvm/llvm-project/pull/90582
[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)
https://github.com/jayfoad updated https://github.com/llvm/llvm-project/pull/90582

>From 17b75a9517891d662e677a357713c920bb79c43c Mon Sep 17 00:00:00 2001
From: David Stuttard
Date: Tue, 30 Apr 2024 10:41:51 +0100
Subject: [PATCH 1/2] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201)

image_msaa_load is actually encoded as a VSAMPLE instruction and requires the appropriate waitcnt variant.
---
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | 8 --
 .../AMDGPU/llvm.amdgcn.image.msaa.load.ll | 26 +--
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 6ecb1c8bf6e1db..97c55e4d9e41c2 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -187,8 +187,12 @@ VmemType getVmemType(const MachineInstr &Inst) {
   const AMDGPU::MIMGInfo *Info = AMDGPU::getMIMGInfo(Inst.getOpcode());
   const AMDGPU::MIMGBaseOpcodeInfo *BaseInfo =
       AMDGPU::getMIMGBaseOpcodeInfo(Info->BaseOpcode);
-  return BaseInfo->BVH ? VMEM_BVH
-                       : BaseInfo->Sampler ? VMEM_SAMPLER : VMEM_NOSAMPLER;
+  // The test for MSAA here is because gfx12+ image_msaa_load is actually
+  // encoded as VSAMPLE and requires the appropriate s_waitcnt variant for that.
+  // Pre-gfx12 doesn't care since all vmem types result in the same s_waitcnt.
+  return BaseInfo->BVH ? VMEM_BVH
+         : BaseInfo->Sampler || BaseInfo->MSAA ? VMEM_SAMPLER
+                                               : VMEM_NOSAMPLER;
 }
 
 unsigned (AMDGPU::Waitcnt , InstCounterType T) {
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll
index 1348315e72e7bc..8da48551855570 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll
@@ -12,7 +12,7 @@ define amdgpu_ps <4 x float> @load_2dmsaa(<8 x i32> inreg %rsrc, i32 %s, i32 %t,
 ; GFX12-LABEL: load_2dmsaa:
 ; GFX12: ; %bb.0: ; %main_body
 ; GFX12-NEXT:    image_msaa_load v[0:3], [v0, v1, v2], s[0:7] dmask:0x1 dim:SQ_RSRC_IMG_2D_MSAA unorm ; encoding: [0x06,0x20,0x46,0xe4,0x00,0x00,0x00,0x00,0x00,0x01,0x02,0x00]
-; GFX12-NEXT:    s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:    s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:    ; return to shader part epilog
 main_body:
   %v = call <4 x float> @llvm.amdgcn.image.msaa.load.2dmsaa.v4f32.i32(i32 1, i32 %s, i32 %t, i32 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
@@ -32,7 +32,7 @@ define amdgpu_ps <4 x float> @load_2dmsaa_both(<8 x i32> inreg %rsrc, ptr addrsp
 ; GFX12: ; %bb.0: ; %main_body
 ; GFX12-NEXT:    image_msaa_load v[0:4], [v0, v1, v2], s[0:7] dmask:0x2 dim:SQ_RSRC_IMG_2D_MSAA unorm tfe lwe ; encoding: [0x0e,0x20,0x86,0xe4,0x00,0x01,0x00,0x00,0x00,0x01,0x02,0x00]
 ; GFX12-NEXT:    v_mov_b32_e32 v5, 0 ; encoding: [0x80,0x02,0x0a,0x7e]
-; GFX12-NEXT:    s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:    s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:    global_store_b32 v5, v4, s[8:9] ; encoding: [0x08,0x80,0x06,0xee,0x00,0x00,0x00,0x02,0x05,0x00,0x00,0x00]
 ; GFX12-NEXT:    ; return to shader part epilog
 main_body:
@@ -53,7 +53,7 @@ define amdgpu_ps <4 x float> @load_2darraymsaa(<8 x i32> inreg %rsrc, i32 %s, i3
 ; GFX12-LABEL: load_2darraymsaa:
 ; GFX12: ; %bb.0: ; %main_body
 ; GFX12-NEXT:    image_msaa_load v[0:3], [v0, v1, v2, v3], s[0:7] dmask:0x4 dim:SQ_RSRC_IMG_2D_MSAA_ARRAY unorm ; encoding: [0x07,0x20,0x06,0xe5,0x00,0x00,0x00,0x00,0x00,0x01,0x02,0x03]
-; GFX12-NEXT:    s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:    s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:    ; return to shader part epilog
 main_body:
   %v = call <4 x float> @llvm.amdgcn.image.msaa.load.2darraymsaa.v4f32.i32(i32 4, i32 %s, i32 %t, i32 %slice, i32 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
@@ -73,7 +73,7 @@ define amdgpu_ps <4 x float> @load_2darraymsaa_tfe(<8 x i32> inreg %rsrc, ptr ad
 ; GFX12: ; %bb.0: ; %main_body
 ; GFX12-NEXT:    image_msaa_load v[0:4], [v0, v1, v2, v3], s[0:7] dmask:0x8 dim:SQ_RSRC_IMG_2D_MSAA_ARRAY unorm tfe ; encoding: [0x0f,0x20,0x06,0xe6,0x00,0x00,0x00,0x00,0x00,0x01,0x02,0x03]
 ; GFX12-NEXT:    v_mov_b32_e32 v5, 0 ; encoding: [0x80,0x02,0x0a,0x7e]
-; GFX12-NEXT:    s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:    s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:    global_store_b32 v5, v4, s[8:9] ; encoding: [0x08,0x80,0x06,0xee,0x00,0x00,0x00,0x02,0x05,0x00,0x00,0x00]
 ; GFX12-NEXT:    ; return to shader part epilog
 main_body:
@@ -94,7 +94,7 @@ define amdgpu_ps <4 x float> @load_2dmsaa_glc(<8 x i32> inreg %rsrc, i32 %s, i32
 ; GFX12-LABEL: load_2dmsaa_glc:
 ; GFX12: ;
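The classification change in getVmemType() can be modelled in isolation. The sketch below is hypothetical: BaseOpcodeFlags stands in for the three AMDGPU::MIMGBaseOpcodeInfo fields the patch reads (BVH, Sampler, MSAA), and the enum only reuses the VmemType names from the diff; it is not the LLVM code itself.

```cpp
// Hypothetical stand-in for the fields of AMDGPU::MIMGBaseOpcodeInfo that
// getVmemType() consults; not the real LLVM type.
struct BaseOpcodeFlags {
  bool BVH;
  bool Sampler;
  bool MSAA;
};

enum VmemType { VMEM_NOSAMPLER, VMEM_SAMPLER, VMEM_BVH };

// Mirrors the patched expression: BVH wins, then sampler-or-MSAA opcodes are
// grouped with VSAMPLE, and everything else is a plain (non-sampler) load.
constexpr VmemType classify(BaseOpcodeFlags Info) {
  return Info.BVH                    ? VMEM_BVH
         : Info.Sampler || Info.MSAA ? VMEM_SAMPLER
                                     : VMEM_NOSAMPLER;
}

// image_msaa_load: not a sampler opcode, but MSAA, so it now counts as VSAMPLE.
static_assert(classify({false, false, true}) == VMEM_SAMPLER, "");
// A plain image load stays in the non-sampler group.
static_assert(classify({false, false, false}) == VMEM_NOSAMPLER, "");
```

Per the comment in the patch, pre-gfx12 targets are unaffected because every VMEM type results in the same s_waitcnt; the grouping only matters on gfx12, where the updated tests expect s_wait_samplecnt instead of s_wait_loadcnt after image_msaa_load.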
[llvm-branch-commits] [llvm] [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595) (PR #90719)
https://github.com/jayfoad milestoned https://github.com/llvm/llvm-project/pull/90719
[llvm-branch-commits] [llvm] [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595) (PR #90719)
https://github.com/jayfoad created https://github.com/llvm/llvm-project/pull/90719

Code to determine if a waitcnt is required before a barrier instruction only considered S_BARRIER. gfx12 adds barrier_signal/wait, so the existing code needs to be enhanced to look for a barrier start (which is just an S_BARRIER for earlier architectures).

>From e31113098e4669850f3ff924bead9e0fb9618f20 Mon Sep 17 00:00:00 2001
From: David Stuttard
Date: Wed, 1 May 2024 11:37:13 +0100
Subject: [PATCH] [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595)

Code to determine if a waitcnt is required before a barrier instruction only considered S_BARRIER. gfx12 adds barrier_signal/wait so need to enhance the existing code to look for a barrier start (which is just an S_BARRIER for earlier architectures).
---
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | 2 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h | 11 ++
 .../CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll | 2 ++
 .../AMDGPU/llvm.amdgcn.s.barrier.wait.ll | 22 +++
 4 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 6ecb1c8bf6e1db..7a3198612f86fc 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -1832,7 +1832,7 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
   // not, we need to ensure the subtarget is capable of backing off barrier
   // instructions in case there are any outstanding memory operations that may
   // cause an exception. Otherwise, insert an explicit S_WAITCNT 0 here.
-  if (MI.getOpcode() == AMDGPU::S_BARRIER &&
+  if (TII->isBarrierStart(MI.getOpcode()) &&
       !ST->hasAutoWaitcntBeforeBarrier() && !ST->supportsBackOffBarrier()) {
     Wait = Wait.combined(
         AMDGPU::Waitcnt::allZero(ST->hasExtendedWaitCounts(), ST->hasVscnt()));
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index 1c9dacc09f8154..626d903c0c6958 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -908,6 +908,17 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
     return MI.getDesc().TSFlags & SIInstrFlags::IsNeverUniform;
   }
 
+  // Check to see if opcode is for a barrier start. Pre gfx12 this is just the
+  // S_BARRIER, but after support for S_BARRIER_SIGNAL* / S_BARRIER_WAIT we want
+  // to check for the barrier start (S_BARRIER_SIGNAL*)
+  bool isBarrierStart(unsigned Opcode) const {
+    return Opcode == AMDGPU::S_BARRIER ||
+           Opcode == AMDGPU::S_BARRIER_SIGNAL_M0 ||
+           Opcode == AMDGPU::S_BARRIER_SIGNAL_ISFIRST_M0 ||
+           Opcode == AMDGPU::S_BARRIER_SIGNAL_IMM ||
+           Opcode == AMDGPU::S_BARRIER_SIGNAL_ISFIRST_IMM;
+  }
+
   static bool doesNotReadTiedSource(const MachineInstr &MI) {
     return MI.getDesc().TSFlags & SIInstrFlags::TiedSourceNotRead;
   }
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
index a7d3115af29bff..47c021769aa56f 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
@@ -96,6 +96,7 @@ define amdgpu_kernel void @test_barrier(ptr addrspace(1) %out, i32 %size) #0 {
 ; VARIANT4-NEXT:    s_wait_kmcnt 0x0
 ; VARIANT4-NEXT:    v_xad_u32 v1, v0, -1, s2
 ; VARIANT4-NEXT:    global_store_b32 v3, v0, s[0:1]
+; VARIANT4-NEXT:    s_wait_storecnt 0x0
 ; VARIANT4-NEXT:    s_barrier_signal -1
 ; VARIANT4-NEXT:    s_barrier_wait -1
 ; VARIANT4-NEXT:    v_ashrrev_i32_e32 v2, 31, v1
@@ -142,6 +143,7 @@ define amdgpu_kernel void @test_barrier(ptr addrspace(1) %out, i32 %size) #0 {
 ; VARIANT6-NEXT:    v_dual_mov_b32 v4, s1 :: v_dual_mov_b32 v3, s0
 ; VARIANT6-NEXT:    v_sub_nc_u32_e32 v1, s2, v0
 ; VARIANT6-NEXT:    global_store_b32 v5, v0, s[0:1]
+; VARIANT6-NEXT:    s_wait_storecnt 0x0
 ; VARIANT6-NEXT:    s_barrier_signal -1
 ; VARIANT6-NEXT:    s_barrier_wait -1
 ; VARIANT6-NEXT:    v_ashrrev_i32_e32 v2, 31, v1
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll
index 4ab5e97964a857..38a34ec6daf73c 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll
@@ -12,6 +12,7 @@ define amdgpu_kernel void @test1_s_barrier_signal(ptr addrspace(1) %out) #0 {
 ; GCN-NEXT:    v_sub_nc_u32_e32 v0, v1, v0
 ; GCN-NEXT:    s_wait_kmcnt 0x0
 ; GCN-NEXT:    global_store_b32 v3, v2, s[0:1]
+; GCN-NEXT:    s_wait_storecnt 0x0
 ; GCN-NEXT:    s_barrier_signal -1
 ; GCN-NEXT:    s_barrier_wait -1
 ; GCN-NEXT:    global_store_b32 v3, v0, s[0:1]
@@ -28,6 +29,7 @@ define amdgpu_kernel void @test1_s_barrier_signal(ptr addrspace(1) %out) #0 {
 ; GLOBAL-ISEL-NEXT:    v_sub_nc_u32_e32 v0, v1, v0
 ; GLOBAL-ISEL-NEXT:    s_wait_kmcnt 0x0
 ; GLOBAL-ISEL-NEXT:
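The barrier change reduces to an opcode predicate plus the two subtarget checks in generateWaitcntInstBefore(). A minimal sketch, assuming a hypothetical Opcode enum and Subtarget struct in place of the AMDGPU:: opcodes and the ST-> queries; only the shape of the condition is taken from the patch.

```cpp
// Hypothetical opcode enum; the real values are AMDGPU::S_BARRIER etc.
enum Opcode {
  S_BARRIER,
  S_BARRIER_SIGNAL_M0,
  S_BARRIER_SIGNAL_ISFIRST_M0,
  S_BARRIER_SIGNAL_IMM,
  S_BARRIER_SIGNAL_ISFIRST_IMM,
  S_BARRIER_WAIT,
  S_NOP
};

// Same opcode set as SIInstrInfo::isBarrierStart() in the patch.
constexpr bool isBarrierStart(Opcode Op) {
  return Op == S_BARRIER || Op == S_BARRIER_SIGNAL_M0 ||
         Op == S_BARRIER_SIGNAL_ISFIRST_M0 || Op == S_BARRIER_SIGNAL_IMM ||
         Op == S_BARRIER_SIGNAL_ISFIRST_IMM;
}

// Hypothetical flags standing in for ST->hasAutoWaitcntBeforeBarrier() and
// ST->supportsBackOffBarrier().
struct Subtarget {
  bool AutoWaitcntBeforeBarrier;
  bool BackOffBarrier;
};

// An explicit wait is forced only before a barrier start, and only on a
// subtarget that neither waits automatically nor can back off the barrier.
constexpr bool needsWaitBeforeBarrier(Opcode Op, Subtarget ST) {
  return isBarrierStart(Op) && !ST.AutoWaitcntBeforeBarrier &&
         !ST.BackOffBarrier;
}

// gfx12-style split barrier: the signal half gets the wait, the wait half does not.
static_assert(needsWaitBeforeBarrier(S_BARRIER_SIGNAL_IMM, {false, false}), "");
static_assert(!needsWaitBeforeBarrier(S_BARRIER_WAIT, {false, false}), "");
```

Keying on the signal half means outstanding stores are drained before s_barrier_signal, which is where the updated tests now expect s_wait_storecnt 0x0.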
[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)
https://github.com/jayfoad converted_to_draft https://github.com/llvm/llvm-project/pull/90582
[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)
jayfoad wrote: Let's not backport this yet since @pendingchaos has pointed out a problem with #90201. https://github.com/llvm/llvm-project/pull/90582
[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)
https://github.com/jayfoad milestoned https://github.com/llvm/llvm-project/pull/90582
[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)
https://github.com/jayfoad created https://github.com/llvm/llvm-project/pull/90582

image_msaa_load is actually encoded as a VSAMPLE instruction and requires the appropriate waitcnt variant.
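The test churn in the image_msaa_load patch changes one byte per wait: [0x00,0x00,0xc0,0xbf] becomes [0x00,0x00,0xc2,0xbf]. Assuming the usual SOPP layout (ENCODING[31:23] = 0b101111111, OP[22:16], SIMM16[15:0]) and little-endian byte order in the assembler's encoding comments, a small decoder shows that only the opcode field differs; SoppFields, wordFromBytes and decodeSopp are hypothetical helpers, not LLVM APIs.

```cpp
#include <cstdint>

// Fields of a 32-bit SOPP instruction word, assuming the layout
// ENCODING[31:23] = 0b101111111, OP[22:16], SIMM16[15:0].
struct SoppFields {
  bool IsSopp;
  std::uint32_t Op;
  std::uint32_t Simm16;
};

// Encoding comments list bytes least-significant first.
constexpr std::uint32_t wordFromBytes(std::uint8_t B0, std::uint8_t B1,
                                      std::uint8_t B2, std::uint8_t B3) {
  return std::uint32_t(B0) | std::uint32_t(B1) << 8 | std::uint32_t(B2) << 16 |
         std::uint32_t(B3) << 24;
}

constexpr SoppFields decodeSopp(std::uint32_t Word) {
  return {(Word >> 23) == 0x17F, (Word >> 16) & 0x7F, Word & 0xFFFF};
}

// s_wait_loadcnt 0x0   ; encoding: [0x00,0x00,0xc0,0xbf]
constexpr SoppFields LoadCnt = decodeSopp(wordFromBytes(0x00, 0x00, 0xc0, 0xbf));
// s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
constexpr SoppFields SampleCnt = decodeSopp(wordFromBytes(0x00, 0x00, 0xc2, 0xbf));

static_assert(LoadCnt.IsSopp && SampleCnt.IsSopp, "both are SOPP words");
static_assert(LoadCnt.Op == 0x40 && SampleCnt.Op == 0x42, "only OP differs");
static_assert(LoadCnt.Simm16 == 0 && SampleCnt.Simm16 == 0, "count is 0x0");
```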
[llvm-branch-commits] [llvm] b544217 - [AMDGPU] Fix setting nontemporal in memory legalizer (#83815)
Author: Mirko Brkušanin
Date: 2024-04-26T13:35:58+01:00
New Revision: b544217fb31ffafb9b072de53a28c71acc169cf8

URL: https://github.com/llvm/llvm-project/commit/b544217fb31ffafb9b072de53a28c71acc169cf8
DIFF: https://github.com/llvm/llvm-project/commit/b544217fb31ffafb9b072de53a28c71acc169cf8.diff

LOG: [AMDGPU] Fix setting nontemporal in memory legalizer (#83815)

Iterator MI can advance in insertWait() but we need original instruction to set temporal hint. Just move it before handling volatile.

Added:

Modified:
    llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
    llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
    llvm/test/CodeGen/AMDGPU/memory-legalizer-global-nontemporal.ll
    llvm/test/CodeGen/AMDGPU/memory-legalizer-local-nontemporal.ll
    llvm/test/CodeGen/AMDGPU/memory-legalizer-private-nontemporal.ll

Removed:

diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index 84b9330ef9633e..50d8bfa8750818 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -2358,6 +2358,11 @@ bool SIGfx12CacheControl::enableVolatileAndOrNonTemporal(
 
   bool Changed = false;
 
+  if (IsNonTemporal) {
+    // Set non-temporal hint for all cache levels.
+    Changed |= setTH(MI, AMDGPU::CPol::TH_NT);
+  }
+
   if (IsVolatile) {
     Changed |= setScope(MI, AMDGPU::CPol::SCOPE_SYS);
 
@@ -2370,11 +2375,6 @@ bool SIGfx12CacheControl::enableVolatileAndOrNonTemporal(
                           Position::AFTER);
   }
 
-  if (IsNonTemporal) {
-    // Set non-temporal hint for all cache levels.
-    Changed |= setTH(MI, AMDGPU::CPol::TH_NT);
-  }
-
   return Changed;
 }
 
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
index a59c0394bebe20..ca7486536cf556 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
@@ -582,5 +582,170 @@ entry:
   ret void
 }
 
+define amdgpu_kernel void @flat_nontemporal_volatile_load(
+; GFX7-LABEL: flat_nontemporal_volatile_load:
+; GFX7: ; %bb.0: ; %entry
+; GFX7-NEXT:    s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX7-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX7-NEXT:    v_mov_b32_e32 v0, s0
+; GFX7-NEXT:    v_mov_b32_e32 v1, s1
+; GFX7-NEXT:    flat_load_dword v2, v[0:1] glc
+; GFX7-NEXT:    s_waitcnt vmcnt(0)
+; GFX7-NEXT:    v_mov_b32_e32 v0, s2
+; GFX7-NEXT:    v_mov_b32_e32 v1, s3
+; GFX7-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX7-NEXT:    flat_store_dword v[0:1], v2
+; GFX7-NEXT:    s_endpgm
+;
+; GFX10-WGP-LABEL: flat_nontemporal_volatile_load:
+; GFX10-WGP: ; %bb.0: ; %entry
+; GFX10-WGP-NEXT:    s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX10-WGP-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX10-WGP-NEXT:    v_mov_b32_e32 v0, s0
+; GFX10-WGP-NEXT:    v_mov_b32_e32 v1, s1
+; GFX10-WGP-NEXT:    flat_load_dword v2, v[0:1] glc dlc
+; GFX10-WGP-NEXT:    s_waitcnt vmcnt(0)
+; GFX10-WGP-NEXT:    v_mov_b32_e32 v0, s2
+; GFX10-WGP-NEXT:    v_mov_b32_e32 v1, s3
+; GFX10-WGP-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX10-WGP-NEXT:    flat_store_dword v[0:1], v2
+; GFX10-WGP-NEXT:    s_endpgm
+;
+; GFX10-CU-LABEL: flat_nontemporal_volatile_load:
+; GFX10-CU: ; %bb.0: ; %entry
+; GFX10-CU-NEXT:    s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX10-CU-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX10-CU-NEXT:    v_mov_b32_e32 v0, s0
+; GFX10-CU-NEXT:    v_mov_b32_e32 v1, s1
+; GFX10-CU-NEXT:    flat_load_dword v2, v[0:1] glc dlc
+; GFX10-CU-NEXT:    s_waitcnt vmcnt(0)
+; GFX10-CU-NEXT:    v_mov_b32_e32 v0, s2
+; GFX10-CU-NEXT:    v_mov_b32_e32 v1, s3
+; GFX10-CU-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX10-CU-NEXT:    flat_store_dword v[0:1], v2
+; GFX10-CU-NEXT:    s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: flat_nontemporal_volatile_load:
+; SKIP-CACHE-INV: ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT:    s_load_dwordx4 s[0:3], s[0:1], 0x0
+; SKIP-CACHE-INV-NEXT:    s_waitcnt lgkmcnt(0)
+; SKIP-CACHE-INV-NEXT:    v_mov_b32_e32 v0, s0
+; SKIP-CACHE-INV-NEXT:    v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT:    flat_load_dword v2, v[0:1] glc
+; SKIP-CACHE-INV-NEXT:    s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT:    v_mov_b32_e32 v0, s2
+; SKIP-CACHE-INV-NEXT:    v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT:    s_waitcnt lgkmcnt(0)
+; SKIP-CACHE-INV-NEXT:    flat_store_dword v[0:1], v2
+; SKIP-CACHE-INV-NEXT:    s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: flat_nontemporal_volatile_load:
+; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT:    s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX90A-NOTTGSPLIT-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX90A-NOTTGSPLIT-NEXT:    v_mov_b32_e32 v0, s0
+; GFX90A-NOTTGSPLIT-NEXT:    v_mov_b32_e32 v1, s1
+; GFX90A-NOTTGSPLIT-NEXT:    flat_load_dword v2, v[0:1] glc
+; GFX90A-NOTTGSPLIT-NEXT:    s_waitcnt vmcnt(0)
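The ordering bug is easiest to see with a toy instruction list. This sketch assumes, as the commit message implies, that insertWait() with Position::AFTER takes MI by reference and can leave it on the newly inserted wait (modelled here as: step past MI, insert, step back). Instr, Block, insertWaitAfter and hintLandsOnLoad are hypothetical names, not SIMemoryLegalizer code.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical model of a basic block: each instruction has a name and a flag
// standing in for the TH_NT (non-temporal) cache-policy bit.
struct Instr {
  std::string Name;
  bool NonTemporal;
};
using Block = std::list<Instr>;

// Hypothetical stand-in for insertWait(..., Position::AFTER) taking MI by
// reference: step past MI, insert the wait there, step back. MI now points
// at the inserted wait, not at the original instruction.
void insertWaitAfter(Block &B, Block::iterator &MI) {
  assert(MI != B.end() && "MI must name an instruction");
  ++MI;
  B.insert(MI, Instr{"wait", false});
  --MI;
}

// Returns true if the non-temporal flag ends up on the load itself.
bool hintLandsOnLoad(bool SetHintBeforeWait) {
  Block B{Instr{"volatile nontemporal load", false}};
  Block::iterator MI = B.begin();
  if (SetHintBeforeWait)
    MI->NonTemporal = true; // patched order: hint first
  insertWaitAfter(B, MI);
  if (!SetHintBeforeWait)
    MI->NonTemporal = true; // old order: MI now names the inserted wait
  return B.front().NonTemporal;
}
```

hintLandsOnLoad(false) reproduces the old order, where the flag lands on the wait instead of the load; setting the hint first, as the patch does, avoids depending on where the iterator ends up.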
[llvm-branch-commits] [llvm] [AMDGPU] Fix setting nontemporal in memory legalizer (#83815) (PR #90204)
https://github.com/jayfoad milestoned https://github.com/llvm/llvm-project/pull/90204
[llvm-branch-commits] [llvm] [AMDGPU] Fix setting nontemporal in memory legalizer (#83815) (PR #90204)
https://github.com/jayfoad created https://github.com/llvm/llvm-project/pull/90204

Iterator MI can advance in insertWait() but we need the original instruction to set the temporal hint. Just move it before handling volatile.

>From b544217fb31ffafb9b072de53a28c71acc169cf8 Mon Sep 17 00:00:00 2001
From: Mirko Brkušanin
Date: Mon, 4 Mar 2024 15:05:31 +0100
Subject: [PATCH] [AMDGPU] Fix setting nontemporal in memory legalizer (#83815)

Iterator MI can advance in insertWait() but we need original instruction to set temporal hint. Just move it before handling volatile.
---
 llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | 10 +-
 .../memory-legalizer-flat-nontemporal.ll | 165 ++
 .../memory-legalizer-global-nontemporal.ll | 158 ++
 .../memory-legalizer-local-nontemporal.ll | 179 +++
 .../memory-legalizer-private-nontemporal.ll | 203 ++
 5 files changed, 710 insertions(+), 5 deletions(-)
[llvm-branch-commits] [llvm] release/18.x: Convert many LivePhysRegs uses to LiveRegUnits (PR #84118)
https://github.com/jayfoad requested changes to this pull request.

> this isn't fixing any known correctness issue

Exactly. I don't think there is any reason to backport this.

https://github.com/llvm/llvm-project/pull/84118
[llvm-branch-commits] [llvm] Backport 45d2d7757feb386186f69af6ef57bde7b5adc2db to release/18.x (PR #79839)
@@ -6883,6 +6883,23 @@ bool AMDGPULegalizerInfo::legalizeStackSave(MachineInstr &MI,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeWaveID(MachineInstr &MI,
+                                         MachineIRBuilder &B) const {
+  // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+  if (!ST.hasArchitectedSGPRs())
+    return false;
+  LLT S32 = LLT::scalar(32);
+  Register DstReg = MI.getOperand(0).getReg();
+  Register TTMP8 =
+      getFunctionLiveInPhysReg(B.getMF(), B.getTII(), AMDGPU::TTMP8,

jayfoad wrote:

True, 66c710ec9dcdbdec6cadd89b972d8945983dc92f improved this to avoid adding liveins. I wasn't going to bother backporting that since I didn't think it was required for correctness. But I have cherry-picked it into this PR now.

https://github.com/llvm/llvm-project/pull/79839
[llvm-branch-commits] [llvm] Backport 45d2d7757feb386186f69af6ef57bde7b5adc2db to release/18.x (PR #79839)
https://github.com/jayfoad updated https://github.com/llvm/llvm-project/pull/79839

>From c265c8527285075a58b2425198dbd4cca8b69477 Mon Sep 17 00:00:00 2001
From: Jay Foad
Date: Thu, 25 Jan 2024 07:48:06 +
Subject: [PATCH 1/2] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325)

This is only valid on targets with architected SGPRs.
---
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 4 ++
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 19 ++
 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h | 1 +
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 14 +
 llvm/lib/Target/AMDGPU/SIISelLowering.h | 1 +
 .../CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll | 61 +++
 6 files changed, 100 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll

diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 9eb1ac8e27befb..c5f43d17d1c148 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -2777,6 +2777,10 @@ class AMDGPULoadTr:
 
 def int_amdgcn_global_load_tr : AMDGPULoadTr;
 
+// i32 @llvm.amdgcn.wave.id()
+def int_amdgcn_wave_id :
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+
 //===--===//
 // Deep learning intrinsics.
 //===--===//
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index 615685822f91ee..e98ede88a7e2db 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -6883,6 +6883,23 @@ bool AMDGPULegalizerInfo::legalizeStackSave(MachineInstr &MI,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeWaveID(MachineInstr &MI,
+                                         MachineIRBuilder &B) const {
+  // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+  if (!ST.hasArchitectedSGPRs())
+    return false;
+  LLT S32 = LLT::scalar(32);
+  Register DstReg = MI.getOperand(0).getReg();
+  Register TTMP8 =
+      getFunctionLiveInPhysReg(B.getMF(), B.getTII(), AMDGPU::TTMP8,
+                               AMDGPU::SReg_32RegClass, B.getDebugLoc(), S32);
+  auto LSB = B.buildConstant(S32, 25);
+  auto Width = B.buildConstant(S32, 5);
+  B.buildUbfx(DstReg, TTMP8, LSB, Width);
+  MI.eraseFromParent();
+  return true;
+}
+
 bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
                                             MachineInstr &MI) const {
   MachineIRBuilder &B = Helper.MIRBuilder;
@@ -7005,6 +7022,8 @@ bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
   case Intrinsic::amdgcn_workgroup_id_z:
     return legalizePreloadedArgIntrin(MI, MRI, B,
                                       AMDGPUFunctionArgInfo::WORKGROUP_ID_Z);
+  case Intrinsic::amdgcn_wave_id:
+    return legalizeWaveID(MI, B);
   case Intrinsic::amdgcn_lds_kernel_id:
     return legalizePreloadedArgIntrin(MI, MRI, B,
                                       AMDGPUFunctionArgInfo::LDS_KERNEL_ID);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
index 56aabd4f6ab71b..ecbe42681c6690 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
@@ -212,6 +212,7 @@ class AMDGPULegalizerInfo final : public LegalizerInfo {
   bool legalizeFPTruncRound(MachineInstr &MI, MachineIRBuilder &B) const;
 
   bool legalizeStackSave(MachineInstr &MI, MachineIRBuilder &B) const;
+  bool legalizeWaveID(MachineInstr &MI, MachineIRBuilder &B) const;
 
   bool legalizeImageIntrinsic(
       MachineInstr &MI, MachineIRBuilder &B,
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index d60f511302613e..c5ad9da88ec2b3 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -7920,6 +7920,18 @@ SDValue SITargetLowering::lowerSBuffer(EVT VT, SDLoc DL, SDValue Rsrc,
   return Loads[0];
 }
 
+SDValue SITargetLowering::lowerWaveID(SelectionDAG &DAG, SDValue Op) const {
+  // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+  if (!Subtarget->hasArchitectedSGPRs())
+    return {};
+  SDLoc SL(Op);
+  MVT VT = MVT::i32;
+  SDValue TTMP8 = CreateLiveInRegister(DAG, &AMDGPU::SReg_32RegClass,
+                                       AMDGPU::TTMP8, VT, SL);
+  return DAG.getNode(AMDGPUISD::BFE_U32, SL, VT, TTMP8,
+                     DAG.getConstant(25, SL, VT), DAG.getConstant(5, SL, VT));
+}
+
 SDValue SITargetLowering::lowerWorkitemID(SelectionDAG &DAG, SDValue Op,
                                           unsigned Dim,
                                           const ArgDescriptor &Arg) const {
@@ -8090,6 +8102,8 @@ SDValue SITargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
   case Intrinsic::amdgcn_workgroup_id_z:
     return getPreloadedValue(DAG, *MFI,
[llvm-branch-commits] [llvm] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) (PR #79689)
jayfoad wrote: > jayfoad closed this by deleting the head repository 3 hours ago Sorry. Recreated as #79839 https://github.com/llvm/llvm-project/pull/79689 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Backport 45d2d7757feb386186f69af6ef57bde7b5adc2db to release/18.x (PR #79839)
https://github.com/jayfoad milestoned https://github.com/llvm/llvm-project/pull/79839
[llvm-branch-commits] [llvm] Backport 45d2d7757feb386186f69af6ef57bde7b5adc2db to release/18.x (PR #79839)
https://github.com/jayfoad created https://github.com/llvm/llvm-project/pull/79839 This just missed the branch creation and is the last piece of functionality required to get AMDGPU GFX12 support working in the 18.x release. >From c265c8527285075a58b2425198dbd4cca8b69477 Mon Sep 17 00:00:00 2001 From: Jay Foad Date: Thu, 25 Jan 2024 07:48:06 + Subject: [PATCH] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) This is only valid on targets with architected SGPRs. --- llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 4 ++ .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 19 ++ llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h | 1 + llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 14 + llvm/lib/Target/AMDGPU/SIISelLowering.h | 1 + .../CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll | 61 +++ 6 files changed, 100 insertions(+) create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td index 9eb1ac8e27befb1..c5f43d17d1c1481 100644 --- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td +++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td @@ -2777,6 +2777,10 @@ class AMDGPULoadTr: def int_amdgcn_global_load_tr : AMDGPULoadTr; +// i32 @llvm.amdgcn.wave.id() +def int_amdgcn_wave_id : + DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>; + //===--===// // Deep learning intrinsics. //===--===// diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index 615685822f91eeb..e98ede88a7e2db9 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -6883,6 +6883,23 @@ bool AMDGPULegalizerInfo::legalizeStackSave(MachineInstr &MI, return true; } +bool AMDGPULegalizerInfo::legalizeWaveID(MachineInstr &MI, + MachineIRBuilder &B) const { + // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+ if (!ST.hasArchitectedSGPRs()) +return false; + LLT S32 = LLT::scalar(32); + Register DstReg = MI.getOperand(0).getReg(); + Register TTMP8 = + getFunctionLiveInPhysReg(B.getMF(), B.getTII(), AMDGPU::TTMP8, + AMDGPU::SReg_32RegClass, B.getDebugLoc(), S32); + auto LSB = B.buildConstant(S32, 25); + auto Width = B.buildConstant(S32, 5); + B.buildUbfx(DstReg, TTMP8, LSB, Width); + MI.eraseFromParent(); + return true; +} + bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper, MachineInstr &MI) const { MachineIRBuilder &B = Helper.MIRBuilder; @@ -7005,6 +7022,8 @@ bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper, case Intrinsic::amdgcn_workgroup_id_z: return legalizePreloadedArgIntrin(MI, MRI, B, AMDGPUFunctionArgInfo::WORKGROUP_ID_Z); + case Intrinsic::amdgcn_wave_id: +return legalizeWaveID(MI, B); case Intrinsic::amdgcn_lds_kernel_id: return legalizePreloadedArgIntrin(MI, MRI, B, AMDGPUFunctionArgInfo::LDS_KERNEL_ID); diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h index 56aabd4f6ab71b6..ecbe42681c6690c 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h @@ -212,6 +212,7 @@ class AMDGPULegalizerInfo final : public LegalizerInfo { bool legalizeFPTruncRound(MachineInstr &MI, MachineIRBuilder &B) const; bool legalizeStackSave(MachineInstr &MI, MachineIRBuilder &B) const; + bool legalizeWaveID(MachineInstr &MI, MachineIRBuilder &B) const; bool legalizeImageIntrinsic( MachineInstr &MI, MachineIRBuilder &B, diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index d60f511302613e1..c5ad9da88ec2b31 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -7920,6 +7920,18 @@ SDValue SITargetLowering::lowerSBuffer(EVT VT, SDLoc DL, SDValue Rsrc, return Loads[0]; } +SDValue SITargetLowering::lowerWaveID(SelectionDAG &DAG, SDValue Op) const { + // With architected SGPRs, waveIDinGroup 
is in TTMP8[29:25]. + if (!Subtarget->hasArchitectedSGPRs()) +return {}; + SDLoc SL(Op); + MVT VT = MVT::i32; + SDValue TTMP8 = CreateLiveInRegister(DAG, &AMDGPU::SReg_32RegClass, + AMDGPU::TTMP8, VT, SL); + return DAG.getNode(AMDGPUISD::BFE_U32, SL, VT, TTMP8, + DAG.getConstant(25, SL, VT), DAG.getConstant(5, SL, VT)); +} + SDValue SITargetLowering::lowerWorkitemID(SelectionDAG &DAG, SDValue Op, unsigned Dim, const ArgDescriptor &Arg) const { @@ -8090,6 +8102,8
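For readers following the lowering: both the GlobalISel path (`buildUbfx`, which emits `G_UBFX`) and the SelectionDAG path (`AMDGPUISD::BFE_U32`) extract a 5-bit field at bit offset 25 of TTMP8, per the patch comment that waveIDinGroup lives in TTMP8[29:25]. The standalone C++ sketch below models that extraction on a plain 32-bit integer. `ubfx` and `waveIdFromTTMP8` are hypothetical helper names, not LLVM APIs, and the register is treated as an ordinary value.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned bitfield extract: take Width bits of Src starting at bit Offset.
// This is the operation that G_UBFX / BFE_U32 perform on a 32-bit value.
std::uint32_t ubfx(std::uint32_t Src, unsigned Offset, unsigned Width) {
  // Restrict the sketch to widths where the mask shift is well defined.
  assert(Width >= 1 && Width <= 31 && Offset + Width <= 32);
  std::uint32_t Mask = (1u << Width) - 1u;
  return (Src >> Offset) & Mask;
}

// waveIDinGroup is TTMP8[29:25]: 5 bits starting at bit 25.
std::uint32_t waveIdFromTTMP8(std::uint32_t TTMP8) {
  return ubfx(TTMP8, 25, 5);
}
```

For example, `waveIdFromTTMP8(7u << 25)` yields 7, and setting any bits outside [29:25] does not change the result.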
[llvm-branch-commits] [llvm] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) (PR #79689)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/79689
[llvm-branch-commits] [llvm] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) (PR #79689)
jayfoad wrote: @tstellar does this backport PR look OK? I created it with `gh pr create -f -B release/18.x` and I wasn't sure if I had to edit anything, apart from adding the release milestone. https://github.com/llvm/llvm-project/pull/79689
[llvm-branch-commits] [llvm] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) (PR #79689)
https://github.com/jayfoad edited https://github.com/llvm/llvm-project/pull/79689
[llvm-branch-commits] [llvm] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) (PR #79689)
https://github.com/jayfoad milestoned https://github.com/llvm/llvm-project/pull/79689
[llvm-branch-commits] [llvm] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) (PR #79689)
https://github.com/jayfoad created https://github.com/llvm/llvm-project/pull/79689 This is only valid on targets with architected SGPRs. >From c5949b09b05e7417d0494b2301781b84d22b95ef Mon Sep 17 00:00:00 2001 From: Jay Foad Date: Thu, 25 Jan 2024 07:48:06 + Subject: [PATCH] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) This is only valid on targets with architected SGPRs. --- llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 4 ++ .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 19 ++ llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h | 1 + llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 14 + llvm/lib/Target/AMDGPU/SIISelLowering.h | 1 + .../CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll | 61 +++ 6 files changed, 100 insertions(+) create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td index 9eb1ac8e27befb..c5f43d17d1c148 100644 --- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td +++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td @@ -2777,6 +2777,10 @@ class AMDGPULoadTr: def int_amdgcn_global_load_tr : AMDGPULoadTr; +// i32 @llvm.amdgcn.wave.id() +def int_amdgcn_wave_id : + DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>; + //===--===// // Deep learning intrinsics. //===--===// diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index 32921bb248caf0..118c8b7c66690f 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -6848,6 +6848,23 @@ bool AMDGPULegalizerInfo::legalizeStackSave(MachineInstr &MI, return true; } +bool AMDGPULegalizerInfo::legalizeWaveID(MachineInstr &MI, + MachineIRBuilder &B) const { + // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+ if (!ST.hasArchitectedSGPRs()) +return false; + LLT S32 = LLT::scalar(32); + Register DstReg = MI.getOperand(0).getReg(); + Register TTMP8 = + getFunctionLiveInPhysReg(B.getMF(), B.getTII(), AMDGPU::TTMP8, + AMDGPU::SReg_32RegClass, B.getDebugLoc(), S32); + auto LSB = B.buildConstant(S32, 25); + auto Width = B.buildConstant(S32, 5); + B.buildUbfx(DstReg, TTMP8, LSB, Width); + MI.eraseFromParent(); + return true; +} + bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper, MachineInstr &MI) const { MachineIRBuilder &B = Helper.MIRBuilder; @@ -6970,6 +6987,8 @@ bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper, case Intrinsic::amdgcn_workgroup_id_z: return legalizePreloadedArgIntrin(MI, MRI, B, AMDGPUFunctionArgInfo::WORKGROUP_ID_Z); + case Intrinsic::amdgcn_wave_id: +return legalizeWaveID(MI, B); case Intrinsic::amdgcn_lds_kernel_id: return legalizePreloadedArgIntrin(MI, MRI, B, AMDGPUFunctionArgInfo::LDS_KERNEL_ID); diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h index 56aabd4f6ab71b..ecbe42681c6690 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h @@ -212,6 +212,7 @@ class AMDGPULegalizerInfo final : public LegalizerInfo { bool legalizeFPTruncRound(MachineInstr &MI, MachineIRBuilder &B) const; bool legalizeStackSave(MachineInstr &MI, MachineIRBuilder &B) const; + bool legalizeWaveID(MachineInstr &MI, MachineIRBuilder &B) const; bool legalizeImageIntrinsic( MachineInstr &MI, MachineIRBuilder &B, diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index d35b76c8ad54eb..9cbcf0012ea878 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -7890,6 +7890,18 @@ SDValue SITargetLowering::lowerSBuffer(EVT VT, SDLoc DL, SDValue Rsrc, return Loads[0]; } +SDValue SITargetLowering::lowerWaveID(SelectionDAG &DAG, SDValue Op) const { + // With architected SGPRs, waveIDinGroup is 
in TTMP8[29:25]. + if (!Subtarget->hasArchitectedSGPRs()) +return {}; + SDLoc SL(Op); + MVT VT = MVT::i32; + SDValue TTMP8 = CreateLiveInRegister(DAG, &AMDGPU::SReg_32RegClass, + AMDGPU::TTMP8, VT, SL); + return DAG.getNode(AMDGPUISD::BFE_U32, SL, VT, TTMP8, + DAG.getConstant(25, SL, VT), DAG.getConstant(5, SL, VT)); +} + SDValue SITargetLowering::lowerWorkitemID(SelectionDAG &DAG, SDValue Op, unsigned Dim, const ArgDescriptor &Arg) const { @@ -8060,6 +8072,8 @@ SDValue SITargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op, case
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#79451 (PR #79457)
jayfoad wrote: > @jayfoad What do you think about merging this PR to the release branch? LGTM, but it was me that requested it. https://github.com/llvm/llvm-project/pull/79457
[llvm-branch-commits] [llvm] 14eea6b - [LegacyPM] Update InversedLastUser on the fly. NFC.
Author: Jay Foad Date: 2021-01-22T09:48:54Z New Revision: 14eea6b0ecddfe7d1c68754a8bfb7c21cde82df8 URL: https://github.com/llvm/llvm-project/commit/14eea6b0ecddfe7d1c68754a8bfb7c21cde82df8 DIFF: https://github.com/llvm/llvm-project/commit/14eea6b0ecddfe7d1c68754a8bfb7c21cde82df8.diff LOG: [LegacyPM] Update InversedLastUser on the fly. NFC. This speeds up setLastUser enough to give a 5% to 10% speed up on trivial invocations of opt and llc, as measured by: perf stat -r 100 opt -S -o /dev/null -O3 /dev/null perf stat -r 100 llc -march=amdgcn /dev/null -filetype null Don't dump last use information unless -debug-pass=Details to avoid printing lots of spam that will break some existing lit tests. Before this patch, dumping last use information was broken anyway, because it used InversedLastUser before it had been populated. Differential Revision: https://reviews.llvm.org/D92309 Added: Modified: llvm/include/llvm/IR/LegacyPassManagers.h llvm/lib/IR/LegacyPassManager.cpp Removed: diff --git a/llvm/include/llvm/IR/LegacyPassManagers.h b/llvm/include/llvm/IR/LegacyPassManagers.h index 498e736a0100..f4fae184e428 100644 --- a/llvm/include/llvm/IR/LegacyPassManagers.h +++ b/llvm/include/llvm/IR/LegacyPassManagers.h @@ -230,11 +230,11 @@ class PMTopLevelManager { // Map to keep track of last user of the analysis pass. // LastUser->second is the last user of Lastuser->first. + // This is kept in sync with InversedLastUser. DenseMap<Pass *, Pass *> LastUser; // Map to keep track of passes that are last used by a pass. - // This inverse map is initialized at PM->run() based on - // LastUser map. + // This is kept in sync with LastUser. DenseMap<Pass *, SmallPtrSet<Pass *, 8> > InversedLastUser; /// Immutable passes are managed by top level manager.
diff --git a/llvm/lib/IR/LegacyPassManager.cpp b/llvm/lib/IR/LegacyPassManager.cpp index 5575bc469a87..4547c3a01239 100644 --- a/llvm/lib/IR/LegacyPassManager.cpp +++ b/llvm/lib/IR/LegacyPassManager.cpp @@ -568,7 +568,12 @@ PMTopLevelManager::setLastUser(ArrayRef<Pass*> AnalysisPasses, Pass *P) { PDepth = P->getResolver()->getPMDataManager().getDepth(); for (Pass *AP : AnalysisPasses) { -LastUser[AP] = P; +// Record P as the new last user of AP. +auto &LastUserOfAP = LastUser[AP]; +if (LastUserOfAP) + InversedLastUser[LastUserOfAP].erase(AP); +LastUserOfAP = P; +InversedLastUser[P].insert(AP); if (P == AP) continue; @@ -598,13 +603,13 @@ PMTopLevelManager::setLastUser(ArrayRef<Pass*> AnalysisPasses, Pass *P) { if (P->getResolver()) setLastUser(LastPMUses, P->getResolver()->getPMDataManager().getAsPass()); - // If AP is the last user of other passes then make P last user of // such passes. -for (auto &LU : LastUser) { - if (LU.second == AP) -LU.second = P; -} +auto &LastUsedByAP = InversedLastUser[AP]; +for (Pass *L : LastUsedByAP) + LastUser[L] = P; +InversedLastUser[P].insert(LastUsedByAP.begin(), LastUsedByAP.end()); +LastUsedByAP.clear(); } } @@ -850,11 +855,6 @@ void PMTopLevelManager::initializeAllAnalysisInfo() { // Initailize other pass managers for (PMDataManager *IPM : IndirectPassManagers) IPM->initializeAnalysisInfo(); - - for (auto LU : LastUser) { -SmallPtrSet<Pass *, 8> &L = InversedLastUser[LU.second]; -L.insert(LU.first); - } } /// Destructor @@ -1151,6 +1151,8 @@ Pass *PMDataManager::findAnalysisPass(AnalysisID AID, bool SearchParent) { // Print list of passes that are last used by P. void PMDataManager::dumpLastUses(Pass *P, unsigned Offset) const{ + if (PassDebugging < Details) +return; SmallVector<Pass *, 12> LUses;
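The core of this change is keeping a forward map and its inverse in sync, so that retargeting "everything last used by AP" touches only InversedLastUser[AP] instead of scanning all of LastUser. A toy C++ model of that bookkeeping follows. It uses strings in place of `Pass *`, `std::map`/`std::set` in place of `DenseMap`/`SmallPtrSet`, and omits the recursive update of the enclosing pass managers, so treat it as an illustration of the invariant rather than the actual `PMTopLevelManager` code; `LastUserTracker` and `consistent` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// LastUser[A] == U means U is the last user of analysis A;
// InversedLastUser[U] is the set of all such A.
struct LastUserTracker {
  std::map<std::string, std::string> LastUser;
  std::map<std::string, std::set<std::string>> InversedLastUser;

  // Mirrors the per-analysis body of the patched setLastUser loop.
  void setLastUser(const std::string &AP, const std::string &P) {
    // Record P as the new last user of AP, unlinking AP's previous user.
    auto Old = LastUser.find(AP);
    if (Old != LastUser.end())
      InversedLastUser[Old->second].erase(AP);
    LastUser[AP] = P;
    InversedLastUser[P].insert(AP);
    if (P == AP)
      return;

    // Everything AP was the last user of is now last-used by P. Only the
    // entries of InversedLastUser[AP] are visited, not all of LastUser.
    std::set<std::string> &LastUsedByAP = InversedLastUser[AP];
    for (const std::string &L : LastUsedByAP)
      LastUser[L] = P;
    InversedLastUser[P].insert(LastUsedByAP.begin(), LastUsedByAP.end());
    LastUsedByAP.clear();
    assert(consistent() && "LastUser and InversedLastUser out of sync");
  }

  // Invariant: A is in InversedLastUser[U] exactly when LastUser[A] == U.
  bool consistent() const {
    std::size_t InverseEntries = 0;
    for (const auto &Entry : InversedLastUser) {
      for (const std::string &A : Entry.second) {
        auto It = LastUser.find(A);
        if (It == LastUser.end() || It->second != Entry.first)
          return false;
        ++InverseEntries;
      }
    }
    return InverseEntries == LastUser.size();
  }
};
```

After `setLastUser("A", "B")` and then `setLastUser("B", "C")`, both A and B have C as their last user, and InversedLastUser["B"] is empty.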
[llvm-branch-commits] [llvm] c0b3c5a - [AMDGPU][GlobalISel] Run SIAddImgInit
Author: Jay Foad Date: 2021-01-21T15:54:54Z New Revision: c0b3c5a06451aad4351e35c74ccf2fe5da917a41 URL: https://github.com/llvm/llvm-project/commit/c0b3c5a06451aad4351e35c74ccf2fe5da917a41 DIFF: https://github.com/llvm/llvm-project/commit/c0b3c5a06451aad4351e35c74ccf2fe5da917a41.diff LOG: [AMDGPU][GlobalISel] Run SIAddImgInit This pass is required to get correct codegen for image instructions with the tfe or lwe bits set. Differential Revision: https://reviews.llvm.org/D95132 Added: Modified: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.ll llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.2d.ll llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.2darraymsaa.a16.ll llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.2darraymsaa.ll llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.3d.a16.ll llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.3d.ll Removed: diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp index 58c436836d19..7d8e8486602b 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp @@ -1109,6 +1109,10 @@ bool GCNPassConfig::addRegBankSelect() { bool GCNPassConfig::addGlobalInstructionSelect() { addPass(new InstructionSelect()); + // TODO: Fix instruction selection to do the right thing for image + // instructions with tfe or lwe in the first place, instead of running a + // separate pass to fix them up? 
+ addPass(createSIAddIMGInitPass()); return false; } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll index 36f3e63598ca..99ab3580b91d 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll @@ -655,6 +655,7 @@ define amdgpu_ps <4 x half> @load_1d_v4f16_xyzw(<8 x i32> inreg %rsrc, i32 %s) { define amdgpu_ps float @load_1d_f16_tfe_dmask_x(<8 x i32> inreg %rsrc, i32 %s) { ; GFX8-UNPACKED-LABEL: load_1d_f16_tfe_dmask_x: ; GFX8-UNPACKED: ; %bb.0: +; GFX8-UNPACKED-NEXT:v_mov_b32_e32 v1, 0 ; GFX8-UNPACKED-NEXT:s_mov_b32 s0, s2 ; GFX8-UNPACKED-NEXT:s_mov_b32 s1, s3 ; GFX8-UNPACKED-NEXT:s_mov_b32 s2, s4 @@ -663,13 +664,15 @@ define amdgpu_ps float @load_1d_f16_tfe_dmask_x(<8 x i32> inreg %rsrc, i32 %s) { ; GFX8-UNPACKED-NEXT:s_mov_b32 s5, s7 ; GFX8-UNPACKED-NEXT:s_mov_b32 s6, s8 ; GFX8-UNPACKED-NEXT:s_mov_b32 s7, s9 -; GFX8-UNPACKED-NEXT:image_load v[0:1], v0, s[0:7] dmask:0x1 unorm tfe d16 +; GFX8-UNPACKED-NEXT:v_mov_b32_e32 v2, v1 +; GFX8-UNPACKED-NEXT:image_load v[1:2], v0, s[0:7] dmask:0x1 unorm tfe d16 ; GFX8-UNPACKED-NEXT:s_waitcnt vmcnt(0) -; GFX8-UNPACKED-NEXT:v_mov_b32_e32 v0, v1 +; GFX8-UNPACKED-NEXT:v_mov_b32_e32 v0, v2 ; GFX8-UNPACKED-NEXT:; return to shader part epilog ; ; GFX8-PACKED-LABEL: load_1d_f16_tfe_dmask_x: ; GFX8-PACKED: ; %bb.0: +; GFX8-PACKED-NEXT:v_mov_b32_e32 v1, 0 ; GFX8-PACKED-NEXT:s_mov_b32 s0, s2 ; GFX8-PACKED-NEXT:s_mov_b32 s1, s3 ; GFX8-PACKED-NEXT:s_mov_b32 s2, s4 @@ -678,13 +681,15 @@ define amdgpu_ps float @load_1d_f16_tfe_dmask_x(<8 x i32> inreg %rsrc, i32 %s) { ; GFX8-PACKED-NEXT:s_mov_b32 s5, s7 ; GFX8-PACKED-NEXT:s_mov_b32 s6, s8 ; GFX8-PACKED-NEXT:s_mov_b32 s7, s9 -; GFX8-PACKED-NEXT:image_load v[0:1], v0, s[0:7] dmask:0x1 unorm tfe d16 +; GFX8-PACKED-NEXT:v_mov_b32_e32 v2, v1 +; GFX8-PACKED-NEXT:image_load v[1:2], v0, s[0:7] 
dmask:0x1 unorm tfe d16 ; GFX8-PACKED-NEXT:s_waitcnt vmcnt(0) -; GFX8-PACKED-NEXT:v_mov_b32_e32 v0, v1 +; GFX8-PACKED-NEXT:v_mov_b32_e32 v0, v2 ; GFX8-PACKED-NEXT:; return to shader part epilog ; ; GFX9-LABEL: load_1d_f16_tfe_dmask_x: ; GFX9: ; %bb.0: +; GFX9-NEXT:v_mov_b32_e32 v1, 0 ; GFX9-NEXT:s_mov_b32 s0, s2 ; GFX9-NEXT:s_mov_b32 s1, s3 ; GFX9-NEXT:s_mov_b32 s2, s4 @@ -693,13 +698,15 @@ define amdgpu_ps float @load_1d_f16_tfe_dmask_x(<8 x i32> inreg %rsrc, i32 %s) { ; GFX9-NEXT:s_mov_b32 s5, s7 ; GFX9-NEXT:s_mov_b32 s6, s8 ; GFX9-NEXT:s_mov_b32 s7, s9 -; GFX9-NEXT:image_load v[0:1], v0, s[0:7] dmask:0x1 unorm tfe d16 +; GFX9-NEXT:v_mov_b32_e32 v2, v1 +; GFX9-NEXT:image_load v[1:2], v0, s[0:7] dmask:0x1 unorm tfe d16 ; GFX9-NEXT:s_waitcnt vmcnt(0) -; GFX9-NEXT:v_mov_b32_e32 v0, v1 +; GFX9-NEXT:v_mov_b32_e32 v0, v2 ; GFX9-NEXT:; return to shader part epilog ; ; GFX10-LABEL: load_1d_f16_tfe_dmask_x: ; GFX10: ; %bb.0: +; GFX10-NEXT:v_mov_b32_e32 v1, 0 ; GFX10-NEXT:
[llvm-branch-commits] [llvm] 18cb744 - [AMDGPU] Simpler names for arch-specific ttmp registers. NFC.
Author: Jay Foad Date: 2021-01-19T18:47:14Z New Revision: 18cb7441b69a22565dcc340bac0e58bc9f301439 URL: https://github.com/llvm/llvm-project/commit/18cb7441b69a22565dcc340bac0e58bc9f301439 DIFF: https://github.com/llvm/llvm-project/commit/18cb7441b69a22565dcc340bac0e58bc9f301439.diff LOG: [AMDGPU] Simpler names for arch-specific ttmp registers. NFC. Rename the *_gfx9_gfx10 ttmp registers to *_gfx9plus for simplicity, and use the corresponding isGFX9Plus predicate to decide when to use them instead of the old *_vi versions. Differential Revision: https://reviews.llvm.org/D94975 Added: Modified: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp llvm/lib/Target/AMDGPU/SIDefines.h llvm/lib/Target/AMDGPU/SIRegisterInfo.td llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp Removed: diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp index 7f68174e506d..08b340c8fd66 100644 --- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp +++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp @@ -997,8 +997,8 @@ unsigned AMDGPUDisassembler::getTtmpClassId(const OpWidthTy Width) const { int AMDGPUDisassembler::getTTmpIdx(unsigned Val) const { using namespace AMDGPU::EncValues; - unsigned TTmpMin = isGFX9Plus() ? TTMP_GFX9_GFX10_MIN : TTMP_VI_MIN; - unsigned TTmpMax = isGFX9Plus() ? TTMP_GFX9_GFX10_MAX : TTMP_VI_MAX; + unsigned TTmpMin = isGFX9Plus() ? TTMP_GFX9PLUS_MIN : TTMP_VI_MIN; + unsigned TTmpMax = isGFX9Plus() ? TTMP_GFX9PLUS_MAX : TTMP_VI_MAX; return (TTmpMin <= Val && Val <= TTmpMax)? 
Val - TTmpMin : -1; } diff --git a/llvm/lib/Target/AMDGPU/SIDefines.h b/llvm/lib/Target/AMDGPU/SIDefines.h index b9a2bcf81903..f7555f0453bb 100644 --- a/llvm/lib/Target/AMDGPU/SIDefines.h +++ b/llvm/lib/Target/AMDGPU/SIDefines.h @@ -247,8 +247,8 @@ enum : unsigned { SGPR_MAX_GFX10 = 105, TTMP_VI_MIN = 112, TTMP_VI_MAX = 123, - TTMP_GFX9_GFX10_MIN = 108, - TTMP_GFX9_GFX10_MAX = 123, + TTMP_GFX9PLUS_MIN = 108, + TTMP_GFX9PLUS_MAX = 123, INLINE_INTEGER_C_MIN = 128, INLINE_INTEGER_C_POSITIVE_MAX = 192, // 64 INLINE_INTEGER_C_MAX = 208, diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td index 378fc5df21e5..92390f1f3297 100644 --- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td +++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td @@ -246,9 +246,9 @@ def TMA : RegisterWithSubRegs<"tma", [TMA_LO, TMA_HI]> { } foreach Index = 0...15 in { - defm TTMP#Index#_vi : SIRegLoHi16<"ttmp"#Index, !add(112, Index)>; - defm TTMP#Index#_gfx9_gfx10 : SIRegLoHi16<"ttmp"#Index, !add(108, Index)>; - defm TTMP#Index : SIRegLoHi16<"ttmp"#Index, 0>; + defm TTMP#Index#_vi : SIRegLoHi16<"ttmp"#Index, !add(112, Index)>; + defm TTMP#Index#_gfx9plus : SIRegLoHi16<"ttmp"#Index, !add(108, Index)>; + defm TTMP#Index : SIRegLoHi16<"ttmp"#Index, 0>; } multiclass FLAT_SCR_LOHI_m ci_e, bits<16> vi_e> { @@ -419,8 +419,8 @@ class TmpRegTuples.ret>; foreach Index = {0, 2, 4, 6, 8, 10, 12, 14} in { - def TTMP#Index#_TTMP#!add(Index,1)#_vi : TmpRegTuples<"_vi", 2, Index>; - def TTMP#Index#_TTMP#!add(Index,1)#_gfx9_gfx10 : TmpRegTuples<"_gfx9_gfx10", 2, Index>; + def TTMP#Index#_TTMP#!add(Index,1)#_vi : TmpRegTuples<"_vi", 2, Index>; + def TTMP#Index#_TTMP#!add(Index,1)#_gfx9plus : TmpRegTuples<"_gfx9plus", 2, Index>; } foreach Index = {0, 4, 8, 12} in { @@ -429,7 +429,7 @@ foreach Index = {0, 4, 8, 12} in { _TTMP#!add(Index,3)#_vi : TmpRegTuples<"_vi", 4, Index>; def TTMP#Index#_TTMP#!add(Index,1)# _TTMP#!add(Index,2)# - _TTMP#!add(Index,3)#_gfx9_gfx10 : 
TmpRegTuples<"_gfx9_gfx10", 4, Index>; + _TTMP#!add(Index,3)#_gfx9plus : TmpRegTuples<"_gfx9plus", 4, Index>; } foreach Index = {0, 4, 8} in { @@ -446,7 +446,7 @@ foreach Index = {0, 4, 8} in { _TTMP#!add(Index,4)# _TTMP#!add(Index,5)# _TTMP#!add(Index,6)# - _TTMP#!add(Index,7)#_gfx9_gfx10 : TmpRegTuples<"_gfx9_gfx10", 8, Index>; + _TTMP#!add(Index,7)#_gfx9plus : TmpRegTuples<"_gfx9plus", 8, Index>; } def TTMP0_TTMP1_TTMP2_TTMP3_TTMP4_TTMP5_TTMP6_TTMP7_TTMP8_TTMP9_TTMP10_TTMP11_TTMP12_TTMP13_TTMP14_TTMP15_vi : @@ -456,12 +456,12 @@ def TTMP0_TTMP1_TTMP2_TTMP3_TTMP4_TTMP5_TTMP6_TTMP7_TTMP8_TTMP9_TTMP10_TTMP11_TT TTMP8_vi, TTMP9_vi, TTMP10_vi, TTMP11_vi, TTMP12_vi, TTMP13_vi, TTMP14_vi, TTMP15_vi]>; -def TTMP0_TTMP1_TTMP2_TTMP3_TTMP4_TTMP5_TTMP6_TTMP7_TTMP8_TTMP9_TTMP10_TTMP11_TTMP12_TTMP13_TTMP14_TTMP15_gfx9_gfx10 : +def
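The renamed constants define which operand encodings the disassembler treats as trap temporaries. Below is a standalone sketch of the `getTTmpIdx` logic from the hunk above, with the ranges copied from the SIDefines.h change. The disassembler's `isGFX9Plus()` subtarget query is replaced by a hypothetical `IsGFX9Plus` parameter.

```cpp
#include <cassert>

// Operand-encoding ranges for trap temporaries (from the SIDefines.h hunk).
enum : unsigned {
  TTMP_VI_MIN = 112,
  TTMP_VI_MAX = 123,
  TTMP_GFX9PLUS_MIN = 108,
  TTMP_GFX9PLUS_MAX = 123,
};

// Returns the ttmp index for an encoded operand value, or -1 if Val does not
// encode a trap temporary on the selected generation.
int getTTmpIdx(unsigned Val, bool IsGFX9Plus) {
  unsigned TTmpMin = IsGFX9Plus ? TTMP_GFX9PLUS_MIN : TTMP_VI_MIN;
  unsigned TTmpMax = IsGFX9Plus ? TTMP_GFX9PLUS_MAX : TTMP_VI_MAX;
  return (TTmpMin <= Val && Val <= TTmpMax) ? static_cast<int>(Val - TTmpMin)
                                            : -1;
}
```

With these ranges, encodings 108..123 map to ttmp0..ttmp15 on GFX9 and later, while on VI only 112..123 (ttmp0..ttmp11) are accepted.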
[llvm-branch-commits] [llvm] 0808c70 - [AMDGPU] Fix test case for D94010
Author: Jay Foad Date: 2021-01-19T16:46:47Z New Revision: 0808c7009a06773e78772c7b74d254fd3572f0ea URL: https://github.com/llvm/llvm-project/commit/0808c7009a06773e78772c7b74d254fd3572f0ea DIFF: https://github.com/llvm/llvm-project/commit/0808c7009a06773e78772c7b74d254fd3572f0ea.diff LOG: [AMDGPU] Fix test case for D94010 Added: Modified: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll Removed: diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll index 8df0215a6fe2..5c333f0ce97d 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll @@ -1,6 +1,6 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1030 < %s | FileCheck -check-prefixes=GCN,SDAG %s -; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1030 < %s | FileCheck -check-prefixes=GCN,GISEL %s +; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1030 < %s | FileCheck -check-prefix=GCN %s +; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1030 < %s | FileCheck -check-prefix=GCN %s define float @v_fma(float %a, float %b, float %c) { ; GCN-LABEL: v_fma:
[llvm-branch-commits] [llvm] de2f942 - [AMDGPU] Simplify test case for D94010
Author: Jay Foad Date: 2021-01-19T16:36:43Z New Revision: de2f9423995d52a5457752256815dc54d317c8d1 URL: https://github.com/llvm/llvm-project/commit/de2f9423995d52a5457752256815dc54d317c8d1 DIFF: https://github.com/llvm/llvm-project/commit/de2f9423995d52a5457752256815dc54d317c8d1.diff LOG: [AMDGPU] Simplify test case for D94010 Added: Modified: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll Removed: diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll index 03584312e2af..8df0215a6fe2 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll @@ -10,7 +10,6 @@ define float @v_fma(float %a, float %b, float %c) { ; GCN-NEXT:v_fmac_legacy_f32_e64 v2, v0, v1 ; GCN-NEXT:v_mov_b32_e32 v0, v2 ; GCN-NEXT:s_setpc_b64 s[30:31] -; %fma = call float @llvm.amdgcn.fma.legacy(float %a, float %b, float %c) ret float %fma } @@ -22,7 +21,6 @@ define float @v_fabs_fma(float %a, float %b, float %c) { ; GCN-NEXT:s_waitcnt_vscnt null, 0x0 ; GCN-NEXT:v_fma_legacy_f32 v0, |v0|, v1, v2 ; GCN-NEXT:s_setpc_b64 s[30:31] -; %fabs.a = call float @llvm.fabs.f32(float %a) %fma = call float @llvm.amdgcn.fma.legacy(float %fabs.a, float %b, float %c) ret float %fma @@ -35,7 +33,6 @@ define float @v_fneg_fabs_fma(float %a, float %b, float %c) { ; GCN-NEXT:s_waitcnt_vscnt null, 0x0 ; GCN-NEXT:v_fma_legacy_f32 v0, v0, -|v1|, v2 ; GCN-NEXT:s_setpc_b64 s[30:31] -; %fabs.b = call float @llvm.fabs.f32(float %b) %neg.fabs.b = fneg float %fabs.b %fma = call float @llvm.amdgcn.fma.legacy(float %a, float %neg.fabs.b, float %c) @@ -49,92 +46,21 @@ define float @v_fneg_fma(float %a, float %b, float %c) { ; GCN-NEXT:s_waitcnt_vscnt null, 0x0 ; GCN-NEXT:v_fma_legacy_f32 v0, v0, v1, -v2 ; GCN-NEXT:s_setpc_b64 s[30:31] -; %neg.c = fneg float %c %fma = call float @llvm.amdgcn.fma.legacy(float %a, float %b, float %neg.c) ret float %fma } -define amdgpu_ps <{ i32, i32, i32, i32, i32, float, 
float, float, float, float, float, float, float, float, float, float, float, float, float, float }> @main(<4 x i32> addrspace(6)* inreg noalias align 32 dereferenceable(18446744073709551615) %arg, <8 x i32> addrspace(6)* inreg noalias align 32 dereferenceable(18446744073709551615) %arg1, <4 x i32> addrspace(6)* inreg noalias align 32 dereferenceable(18446744073709551615) %arg2, <8 x i32> addrspace(6)* inreg noalias align 32 dereferenceable(18446744073709551615) %arg3, i32 inreg %arg4, i32 inreg %arg5, <2 x i32> %arg6, <2 x i32> %arg7, <2 x i32> %arg8, <3 x i32> %arg9, <2 x i32> %arg10, <2 x i32> %arg11, <2 x i32> %arg12, <3 x float> %arg13, float %arg14, float %arg15, float %arg16, float %arg17, i32 %arg18, i32 %arg19, float %arg20, i32 %arg21) #0 { -; SDAG-LABEL: main: -; SDAG: ; %bb.0: -; SDAG-NEXT:s_mov_b32 s16, exec_lo -; SDAG-NEXT:v_mov_b32_e32 v14, v2 -; SDAG-NEXT:s_mov_b32 s0, s5 -; SDAG-NEXT:s_wqm_b32 exec_lo, exec_lo -; SDAG-NEXT:s_mov_b32 s1, 0 -; SDAG-NEXT:s_mov_b32 m0, s7 -; SDAG-NEXT:s_clause 0x1 -; SDAG-NEXT:s_load_dwordx8 s[8:15], s[0:1], 0x400 -; SDAG-NEXT:s_load_dwordx4 s[0:3], s[0:1], 0x430 -; SDAG-NEXT:v_interp_p1_f32_e32 v2, v0, attr0.x -; SDAG-NEXT:v_interp_p1_f32_e32 v3, v0, attr0.y -; SDAG-NEXT:s_mov_b32 s4, s6 -; SDAG-NEXT:v_interp_p2_f32_e32 v2, v1, attr0.x -; SDAG-NEXT:v_interp_p2_f32_e32 v3, v1, attr0.y -; SDAG-NEXT:s_and_b32 exec_lo, exec_lo, s16 -; SDAG-NEXT:s_waitcnt lgkmcnt(0) -; SDAG-NEXT:image_sample v[0:3], v[2:3], s[8:15], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D -; SDAG-NEXT:s_waitcnt vmcnt(0) -; SDAG-NEXT:v_fma_legacy_f32 v0, v0, 2.0, -1.0 -; SDAG-NEXT:v_fma_legacy_f32 v1, v1, 2.0, -1.0 -; SDAG-NEXT:; return to shader part epilog -; -; GISEL-LABEL: main: -; GISEL: ; %bb.0: -; GISEL-NEXT:s_mov_b32 s16, exec_lo -; GISEL-NEXT:s_mov_b32 s4, s6 -; GISEL-NEXT:s_mov_b32 m0, s7 -; GISEL-NEXT:s_wqm_b32 exec_lo, exec_lo -; GISEL-NEXT:s_add_u32 s0, s5, 0x400 -; GISEL-NEXT:s_mov_b32 s1, 0 -; GISEL-NEXT:v_interp_p1_f32_e32 v3, v0, attr0.y -; 
GISEL-NEXT:s_load_dwordx8 s[8:15], s[0:1], 0x0 -; GISEL-NEXT:s_add_u32 s0, s5, 0x430 -; GISEL-NEXT:v_mov_b32_e32 v14, v2 -; GISEL-NEXT:s_load_dwordx4 s[0:3], s[0:1], 0x0 -; GISEL-NEXT:v_interp_p1_f32_e32 v2, v0, attr0.x -; GISEL-NEXT:v_interp_p2_f32_e32 v3, v1, attr0.y -; GISEL-NEXT:v_interp_p2_f32_e32 v2, v1, attr0.x -; GISEL-NEXT:s_and_b32 exec_lo, exec_lo, s16 -; GISEL-NEXT:s_waitcnt lgkmcnt(0) -; GISEL-NEXT:image_sample v[0:3], v[2:3], s[8:15], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D -; GISEL-NEXT:s_waitcnt vmcnt(0) -; GISEL-NEXT:v_fma_legacy_f32 v0, v0, 2.0, -1.0 -; GISEL-NEXT:v_fma_legacy_f32 v1,
[llvm-branch-commits] [llvm] 49dce85 - [AMDGPU] Simplify AMDGPUInstPrinter::printExpSrcN. NFC.
Author: Jay Foad
Date: 2021-01-19T10:39:56Z
New Revision: 49dce85584e34ee7fb973da9ba617169fd0f103c

URL: https://github.com/llvm/llvm-project/commit/49dce85584e34ee7fb973da9ba617169fd0f103c
DIFF: https://github.com/llvm/llvm-project/commit/49dce85584e34ee7fb973da9ba617169fd0f103c.diff

LOG: [AMDGPU] Simplify AMDGPUInstPrinter::printExpSrcN. NFC.

Change-Id: Idd7f47647bc0faa3ad6f61f44728c0f20540ec00

Added:

Modified:
    llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
    llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h

Removed:

diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
index 574fba62f5f3..fcca32abdd5a 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
@@ -958,10 +958,9 @@ void AMDGPUInstPrinter::printSDWADstUnused(const MCInst *MI, unsigned OpNo,
   }
 }
 
-template <unsigned N>
 void AMDGPUInstPrinter::printExpSrcN(const MCInst *MI, unsigned OpNo,
-                                     const MCSubtargetInfo &STI,
-                                     raw_ostream &O) {
+                                     const MCSubtargetInfo &STI, raw_ostream &O,
+                                     unsigned N) {
   unsigned Opc = MI->getOpcode();
   int EnIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::en);
   unsigned En = MI->getOperand(EnIdx).getImm();
@@ -969,12 +968,8 @@ void AMDGPUInstPrinter::printExpSrcN(const MCInst *MI, unsigned OpNo,
   int ComprIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::compr);
 
   // If compr is set, print as src0, src0, src1, src1
-  if (MI->getOperand(ComprIdx).getImm()) {
-    if (N == 1 || N == 2)
-      --OpNo;
-    else if (N == 3)
-      OpNo -= 2;
-  }
+  if (MI->getOperand(ComprIdx).getImm())
+    OpNo = OpNo - N + N / 2;
 
   if (En & (1 << N))
     printRegOperand(MI->getOperand(OpNo).getReg(), O, MRI);
@@ -985,25 +980,25 @@ void AMDGPUInstPrinter::printExpSrcN(const MCInst *MI, unsigned OpNo,
 void AMDGPUInstPrinter::printExpSrc0(const MCInst *MI, unsigned OpNo,
                                      const MCSubtargetInfo &STI,
                                      raw_ostream &O) {
-  printExpSrcN<0>(MI, OpNo, STI, O);
+  printExpSrcN(MI, OpNo, STI, O, 0);
 }
 
 void AMDGPUInstPrinter::printExpSrc1(const MCInst *MI, unsigned OpNo,
                                      const MCSubtargetInfo &STI,
                                      raw_ostream &O) {
-  printExpSrcN<1>(MI, OpNo, STI, O);
+  printExpSrcN(MI, OpNo, STI, O, 1);
 }
 
 void AMDGPUInstPrinter::printExpSrc2(const MCInst *MI, unsigned OpNo,
                                      const MCSubtargetInfo &STI,
                                      raw_ostream &O) {
-  printExpSrcN<2>(MI, OpNo, STI, O);
+  printExpSrcN(MI, OpNo, STI, O, 2);
 }
 
 void AMDGPUInstPrinter::printExpSrc3(const MCInst *MI, unsigned OpNo,
                                      const MCSubtargetInfo &STI,
                                      raw_ostream &O) {
-  printExpSrcN<3>(MI, OpNo, STI, O);
+  printExpSrcN(MI, OpNo, STI, O, 3);
 }
 
 void AMDGPUInstPrinter::printExpTgt(const MCInst *MI, unsigned OpNo,
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
index 64ccb9092ec4..8d13aa682211 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
@@ -179,10 +179,8 @@ class AMDGPUInstPrinter : public MCInstPrinter {
   void printDefaultVccOperand(unsigned OpNo, const MCSubtargetInfo &STI,
                               raw_ostream &O);
 
-
-  template <unsigned N>
-  void printExpSrcN(const MCInst *MI, unsigned OpNo,
-                    const MCSubtargetInfo &STI, raw_ostream &O);
+  void printExpSrcN(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
+                    raw_ostream &O, unsigned N);
   void printExpSrc0(const MCInst *MI, unsigned OpNo,
                     const MCSubtargetInfo &STI, raw_ostream &O);
   void printExpSrc1(const MCInst *MI, unsigned OpNo,
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
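The commit replaces three special cases with the single expression `OpNo - N + N / 2`. A quick way to check the two forms are equivalent is to compare them for every export source index N = 0..3. The sketch below is a standalone model and its helper names are hypothetical, not part of AMDGPUInstPrinter. It also assumes src0..src3 sit in consecutive operand slots starting at some Base, so that source N with compr set prints from slot Base + N / 2.

```cpp
#include <cassert>

// Operand-index adjustment for export source N (0..3) when the "compr" bit
// is set, so that the sources print as src0, src0, src1, src1.
// Old form: the explicit special cases removed by the commit.
unsigned comprOpNoOld(unsigned OpNo, unsigned N) {
  if (N == 1 || N == 2)
    --OpNo;
  else if (N == 3)
    OpNo -= 2;
  return OpNo;
}

// New form: the closed-form expression introduced by the commit.
// Any unsigned wrap-around in OpNo - N is undone by adding N / 2 back,
// because unsigned arithmetic is modular.
unsigned comprOpNoNew(unsigned OpNo, unsigned N) { return OpNo - N + N / 2; }

// Returns true if both forms agree for every source index at this OpNo.
bool formsAgree(unsigned OpNo) {
  for (unsigned N = 0; N < 4; ++N)
    if (comprOpNoOld(OpNo, N) != comprOpNoNew(OpNo, N))
      return false;
  return true;
}
```

For example, with src0 at operand 5, `comprOpNoNew(5 + 3, 3)` yields 6, so src3 prints the register of src1.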
[llvm-branch-commits] [llvm] 868da2e - [SelectionDAG] Remove an early-out from computeKnownBits for smin/smax
Author: Jay Foad
Date: 2021-01-14T18:15:17Z
New Revision: 868da2ea939baf8c71a6dcb878cf6094ede9486e

URL: https://github.com/llvm/llvm-project/commit/868da2ea939baf8c71a6dcb878cf6094ede9486e
DIFF: https://github.com/llvm/llvm-project/commit/868da2ea939baf8c71a6dcb878cf6094ede9486e.diff

LOG: [SelectionDAG] Remove an early-out from computeKnownBits for smin/smax

Even if we know nothing about LHS, it can still be useful to know that
smax(LHS, RHS) >= RHS and smin(LHS, RHS) <= RHS.

Differential Revision: https://reviews.llvm.org/D87145

Added:

Modified:
    llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
    llvm/test/CodeGen/X86/known-bits-vector.ll

Removed:

diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 7084ab68524b5..82da553954d2f 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -3416,7 +3416,6 @@ KnownBits SelectionDAG::computeKnownBits(SDValue Op, const APInt &DemandedElts,
     }
 
     Known = computeKnownBits(Op.getOperand(0), DemandedElts, Depth + 1);
-    if (Known.isUnknown()) break; // Early-out
     Known2 = computeKnownBits(Op.getOperand(1), DemandedElts, Depth + 1);
     if (IsMax)
       Known = KnownBits::smax(Known, Known2);
diff --git a/llvm/test/CodeGen/X86/known-bits-vector.ll b/llvm/test/CodeGen/X86/known-bits-vector.ll
index 3b6912a9d9461..05bf984101abc 100644
--- a/llvm/test/CodeGen/X86/known-bits-vector.ll
+++ b/llvm/test/CodeGen/X86/known-bits-vector.ll
@@ -435,11 +435,7 @@ define <4 x float> @knownbits_smax_smin_shuffle_uitofp(<4 x i32> %a0) {
 ; X32-NEXT:    vpminsd {{\.LCPI.*}}, %xmm0, %xmm0
 ; X32-NEXT:    vpmaxsd {{\.LCPI.*}}, %xmm0, %xmm0
 ; X32-NEXT:    vpshufd {{.*#+}} xmm0 = xmm0[0,0,3,3]
-; X32-NEXT:    vpblendw {{.*#+}} xmm1 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
-; X32-NEXT:    vpsrld $16, %xmm0, %xmm0
-; X32-NEXT:    vpblendw {{.*#+}} xmm0 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
-; X32-NEXT:    vsubps {{\.LCPI.*}}, %xmm0, %xmm0
-; X32-NEXT:    vaddps %xmm0, %xmm1, %xmm0
+; X32-NEXT:    vcvtdq2ps %xmm0, %xmm0
 ; X32-NEXT:    retl
 ;
 ; X64-LABEL: knownbits_smax_smin_shuffle_uitofp:
@@ -447,11 +443,7 @@ define <4 x float> @knownbits_smax_smin_shuffle_uitofp(<4 x i32> %a0) {
 ; X64-NEXT:    vpminsd {{.*}}(%rip), %xmm0, %xmm0
 ; X64-NEXT:    vpmaxsd {{.*}}(%rip), %xmm0, %xmm0
 ; X64-NEXT:    vpshufd {{.*#+}} xmm0 = xmm0[0,0,3,3]
-; X64-NEXT:    vpblendw {{.*#+}} xmm1 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
-; X64-NEXT:    vpsrld $16, %xmm0, %xmm0
-; X64-NEXT:    vpblendw {{.*#+}} xmm0 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
-; X64-NEXT:    vsubps {{.*}}(%rip), %xmm0, %xmm0
-; X64-NEXT:    vaddps %xmm0, %xmm1, %xmm0
+; X64-NEXT:    vcvtdq2ps %xmm0, %xmm0
 ; X64-NEXT:    retq
   %1 = call <4 x i32> @llvm.x86.sse41.pminsd(<4 x i32> %a0, <4 x i32> )
   %2 = call <4 x i32> @llvm.x86.sse41.pmaxsd(<4 x i32> %1, <4 x i32> )
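In the X86 test, the multi-instruction unsigned-to-float sequence collapses to a single `vcvtdq2ps`, which is a signed conversion. That rewrite is presumably valid because smax against a non-negative constant proves the sign bit is zero even when nothing is known about the input. The test's constant vectors are elided in this archive, so this reading is an inference.

The sketch below is an 8-bit analogue with hypothetical helper names. It checks the two facts from the commit message exhaustively, and then shows why a known-zero sign bit lets unsigned and signed conversion agree.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Exhaustive 8-bit check: with LHS completely unknown, smax(LHS, C) is still
// >= C, and if C is non-negative the result's sign bit is 0.
bool smaxKeepsLowerBound(int8_t C) {
  for (int V = -128; V <= 127; ++V) {
    int8_t R = std::max(static_cast<int8_t>(V), C); // smax(LHS, RHS)
    if (R < C)
      return false;
    if (C >= 0 && (static_cast<uint8_t>(R) & 0x80u) != 0)
      return false;
  }
  return true;
}

// The mirror-image fact for smin: smin(LHS, C) <= C for any LHS.
bool sminKeepsUpperBound(int8_t C) {
  for (int V = -128; V <= 127; ++V)
    if (std::min(static_cast<int8_t>(V), C) > C)
      return false;
  return true;
}

// When the sign bit is known to be zero, unsigned-to-float and
// signed-to-float conversions give the same result.
bool uitofpMatchesSitofp(int32_t V) {
  return static_cast<float>(static_cast<uint32_t>(V)) == static_cast<float>(V);
}
```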
[llvm-branch-commits] [llvm] 90b310f - [Support] Simplify KnownBits::icmp helpers. NFC.
Author: Jay Foad
Date: 2021-01-14T14:02:43Z
New Revision: 90b310f6caf0b356075c70407c338b3c751eebb3

URL: https://github.com/llvm/llvm-project/commit/90b310f6caf0b356075c70407c338b3c751eebb3
DIFF: https://github.com/llvm/llvm-project/commit/90b310f6caf0b356075c70407c338b3c751eebb3.diff

LOG: [Support] Simplify KnownBits::icmp helpers. NFC.

Remove some special cases that aren't really any simpler than the
general case.

Differential Revision: https://reviews.llvm.org/D94595

Added:

Modified:
    llvm/lib/Support/KnownBits.cpp

Removed:

diff --git a/llvm/lib/Support/KnownBits.cpp b/llvm/lib/Support/KnownBits.cpp
index 0147d21d153a..0f36c6a9ef1d 100644
--- a/llvm/lib/Support/KnownBits.cpp
+++ b/llvm/lib/Support/KnownBits.cpp
@@ -271,9 +271,6 @@ KnownBits KnownBits::ashr(const KnownBits &LHS, const KnownBits &RHS) {
 Optional<bool> KnownBits::eq(const KnownBits &LHS, const KnownBits &RHS) {
   if (LHS.isConstant() && RHS.isConstant())
     return Optional<bool>(LHS.getConstant() == RHS.getConstant());
-  if (LHS.getMaxValue().ult(RHS.getMinValue()) ||
-      LHS.getMinValue().ugt(RHS.getMaxValue()))
-    return Optional<bool>(false);
   if (LHS.One.intersects(RHS.Zero) || RHS.One.intersects(LHS.Zero))
     return Optional<bool>(false);
   return None;
@@ -286,8 +283,6 @@ Optional<bool> KnownBits::ne(const KnownBits &LHS, const KnownBits &RHS) {
 }
 
 Optional<bool> KnownBits::ugt(const KnownBits &LHS, const KnownBits &RHS) {
-  if (LHS.isConstant() && RHS.isConstant())
-    return Optional<bool>(LHS.getConstant().ugt(RHS.getConstant()));
   // LHS >u RHS -> false if umax(LHS) <= umax(RHS)
   if (LHS.getMaxValue().ule(RHS.getMinValue()))
     return Optional<bool>(false);
@@ -312,8 +307,6 @@ Optional<bool> KnownBits::ule(const KnownBits &LHS, const KnownBits &RHS) {
 }
 
 Optional<bool> KnownBits::sgt(const KnownBits &LHS, const KnownBits &RHS) {
-  if (LHS.isConstant() && RHS.isConstant())
-    return Optional<bool>(LHS.getConstant().sgt(RHS.getConstant()));
   // LHS >s RHS -> false if smax(LHS) <= smax(RHS)
   if (LHS.getSignedMaxValue().sle(RHS.getSignedMinValue()))
     return Optional<bool>(false);
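Why the removed special cases were redundant can be checked by brute force on a tiny bit width. The sketch below uses a hypothetical 4-bit model of KnownBits (C++17 for `std::optional`) and verifies two things:

- For fully known operands, the range-based rules in `ugt` already give the exact answer.
- In `eq`, disjoint unsigned ranges always imply that some bit is known 1 on one side and known 0 on the other.

The second rule in `ugt` (the `true` case) is assumed from the surrounding LLVM code, which this diff does not show.

```cpp
#include <cassert>
#include <optional>

// Hypothetical 4-bit model of llvm::KnownBits: Zero = bits known to be 0,
// One = bits known to be 1 (the two masks never overlap).
struct KB {
  unsigned Zero, One;
  unsigned minValue() const { return One; }          // unknown bits set to 0
  unsigned maxValue() const { return ~Zero & 0xFu; } // unknown bits set to 1
};

// ugt without the constant special case: range-based rules only.
std::optional<bool> ugt(KB L, KB R) {
  if (L.maxValue() <= R.minValue())
    return false;
  if (L.minValue() > R.maxValue()) // assumed general-case rule
    return true;
  return std::nullopt;
}

// For constant operands, min == max, so the range rules are already exact.
bool ugtExactOnConstants() {
  for (unsigned A = 0; A < 16; ++A)
    for (unsigned B = 0; B < 16; ++B) {
      std::optional<bool> Res = ugt({~A & 0xFu, A}, {~B & 0xFu, B});
      if (!Res || *Res != (A > B))
        return false;
    }
  return true;
}

// eq: whenever the unsigned ranges are disjoint, the bit-conflict test also
// fires, so the removed range check never decided anything new.
bool eqRangeCheckRedundant() {
  for (unsigned LZ = 0; LZ < 16; ++LZ)
    for (unsigned LO = 0; LO < 16; ++LO)
      for (unsigned RZ = 0; RZ < 16; ++RZ)
        for (unsigned RO = 0; RO < 16; ++RO) {
          if ((LZ & LO) || (RZ & RO))
            continue; // not a consistent KnownBits value
          KB L{LZ, LO}, R{RZ, RO};
          bool Disjoint = L.maxValue() < R.minValue() ||
                          L.minValue() > R.maxValue();
          bool Conflict = (L.One & R.Zero) || (R.One & L.Zero);
          if (Disjoint && !Conflict)
            return false;
        }
  return true;
}
```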
[llvm-branch-commits] [llvm] 517196e - [Analysis, CodeGen] Make use of KnownBits::makeConstant. NFC.
Author: Jay Foad
Date: 2021-01-14T14:02:43Z
New Revision: 517196e569129677be32d6ebcfa57bac552268a4

URL: https://github.com/llvm/llvm-project/commit/517196e569129677be32d6ebcfa57bac552268a4
DIFF: https://github.com/llvm/llvm-project/commit/517196e569129677be32d6ebcfa57bac552268a4.diff

LOG: [Analysis,CodeGen] Make use of KnownBits::makeConstant. NFC.

Differential Revision: https://reviews.llvm.org/D94588

Added:

Modified:
    llvm/lib/Analysis/ValueTracking.cpp
    llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp
    llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
    llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
    llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

Removed:

diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index b138caa05610..61c992d0eedf 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -1337,8 +1337,8 @@ static void computeKnownBitsFromOperator(const Operator *I,
         AccConstIndices += IndexConst.sextOrTrunc(BitWidth);
         continue;
       } else {
-        ScalingFactor.Zero = ~TypeSizeInBytes;
-        ScalingFactor.One = TypeSizeInBytes;
+        ScalingFactor =
+            KnownBits::makeConstant(APInt(IndexBitWidth, TypeSizeInBytes));
       }
       IndexBits = KnownBits::computeForMul(IndexBits, ScalingFactor);
 
@@ -1353,9 +1353,7 @@ static void computeKnownBitsFromOperator(const Operator *I,
           /*Add=*/true, /*NSW=*/false, Known, IndexBits);
     }
     if (!Known.isUnknown() && !AccConstIndices.isNullValue()) {
-      KnownBits Index(BitWidth);
-      Index.Zero = ~AccConstIndices;
-      Index.One = AccConstIndices;
+      KnownBits Index = KnownBits::makeConstant(AccConstIndices);
       Known = KnownBits::computeForAddSub(
           /*Add=*/true, /*NSW=*/false, Known, Index);
     }
@@ -1818,8 +1816,7 @@ void computeKnownBits(const Value *V, const APInt &DemandedElts,
   const APInt *C;
   if (match(V, m_APInt(C))) {
     // We know all of the bits for a scalar constant or a splat vector constant!
-    Known.One = *C;
-    Known.Zero = ~Known.One;
+    Known = KnownBits::makeConstant(*C);
     return;
   }
   // Null and aggregate-zero are all-zeros.
diff --git a/llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp b/llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp
index 64c7fb486493..aac7a73e858f 100644
--- a/llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp
@@ -217,8 +217,7 @@ void GISelKnownBits::computeKnownBitsImpl(Register R, KnownBits &Known,
     auto CstVal = getConstantVRegVal(R, MRI);
     if (!CstVal)
       break;
-    Known.One = *CstVal;
-    Known.Zero = ~Known.One;
+    Known = KnownBits::makeConstant(*CstVal);
     break;
   }
   case TargetOpcode::G_FRAME_INDEX: {
diff --git a/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp b/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
index 0b830f462c90..32a4f60df097 100644
--- a/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
@@ -458,8 +458,7 @@ void FunctionLoweringInfo::ComputePHILiveOutRegInfo(const PHINode *PN) {
     if (ConstantInt *CI = dyn_cast<ConstantInt>(V)) {
       APInt Val = CI->getValue().zextOrTrunc(BitWidth);
       DestLOI.NumSignBits = Val.getNumSignBits();
-      DestLOI.Known.Zero = ~Val;
-      DestLOI.Known.One = Val;
+      DestLOI.Known = KnownBits::makeConstant(Val);
     } else {
       assert(ValueMap.count(V) && "V should have been placed in ValueMap when its"
                                   "CopyToReg node was created.");
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index e080408bbe42..7084ab68524b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -3134,13 +3134,10 @@ KnownBits SelectionDAG::computeKnownBits(SDValue Op, const APInt &DemandedElts,
         }
       } else if (BitWidth == CstTy->getPrimitiveSizeInBits()) {
         if (auto *CInt = dyn_cast<ConstantInt>(Cst)) {
-          const APInt &Value = CInt->getValue();
-          Known.One = Value;
-          Known.Zero = ~Value;
+          Known = KnownBits::makeConstant(CInt->getValue());
         } else if (auto *CFP = dyn_cast<ConstantFP>(Cst)) {
-          APInt Value = CFP->getValueAPF().bitcastToAPInt();
-          Known.One = Value;
-          Known.Zero = ~Value;
+          Known =
+              KnownBits::makeConstant(CFP->getValueAPF().bitcastToAPInt());
         }
       }
     }
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 173e45a4b18e..6ae0a39962b3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -912,15 +912,14 @@ bool
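Every hunk above replaces the same two-assignment pattern, `One = C; Zero = ~C;`, with a call to `KnownBits::makeConstant(C)`. Below is a minimal fixed-width sketch of what that helper packages. The type name `KnownBits32` is hypothetical; the real class is built on APInt and supports any bit width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit stand-in for llvm::KnownBits.
struct KnownBits32 {
  uint32_t Zero = 0; // bits known to be 0
  uint32_t One = 0;  // bits known to be 1

  // Equivalent of the pattern the commit removes:
  //   Known.One = C; Known.Zero = ~C;
  static KnownBits32 makeConstant(uint32_t C) {
    KnownBits32 K;
    K.One = C;
    K.Zero = ~C;
    return K;
  }

  // Every bit is known, and no bit is claimed to be both 0 and 1.
  bool isConstant() const {
    return (Zero | One) == UINT32_MAX && (Zero & One) == 0;
  }
  uint32_t getConstant() const { return One; }
};
```

Centralizing the pattern in one helper also removes the chance of writing `Zero = ~Zero` or forgetting one of the two assignments at a call site.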
[llvm-branch-commits] [llvm] a1cba5b - [SelectionDAG] Make use of KnownBits::commonBits. NFC.
Author: Jay Foad
Date: 2021-01-14T14:02:43Z
New Revision: a1cba5b7a1fb09d2d4082967e2466a5a89ed698a

URL: https://github.com/llvm/llvm-project/commit/a1cba5b7a1fb09d2d4082967e2466a5a89ed698a
DIFF: https://github.com/llvm/llvm-project/commit/a1cba5b7a1fb09d2d4082967e2466a5a89ed698a.diff

LOG: [SelectionDAG] Make use of KnownBits::commonBits. NFC.

Differential Revision: https://reviews.llvm.org/D94587

Added:

Modified:
    llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
    llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

Removed:

diff --git a/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp b/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
index 669bca966a7d..0b830f462c90 100644
--- a/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
@@ -509,8 +509,7 @@ void FunctionLoweringInfo::ComputePHILiveOutRegInfo(const PHINode *PN) {
       return;
     }
     DestLOI.NumSignBits = std::min(DestLOI.NumSignBits, SrcLOI->NumSignBits);
-    DestLOI.Known.Zero &= SrcLOI->Known.Zero;
-    DestLOI.Known.One &= SrcLOI->Known.One;
+    DestLOI.Known = KnownBits::commonBits(DestLOI.Known, SrcLOI->Known);
   }
 }
 
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 7ea0b09ef9c9..173e45a4b18e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -1016,10 +1016,8 @@ bool TargetLowering::SimplifyDemandedBits(
                                Depth + 1))
         return true;
 
-    if (!!DemandedVecElts) {
-      Known.One &= KnownVec.One;
-      Known.Zero &= KnownVec.Zero;
-    }
+    if (!!DemandedVecElts)
+      Known = KnownBits::commonBits(Known, KnownVec);
 
     return false;
   }
@@ -1044,14 +1042,10 @@ bool TargetLowering::SimplifyDemandedBits(
 
     Known.Zero.setAllBits();
     Known.One.setAllBits();
-    if (!!DemandedSubElts) {
-      Known.One &= KnownSub.One;
-      Known.Zero &= KnownSub.Zero;
-    }
-    if (!!DemandedSrcElts) {
-      Known.One &= KnownSrc.One;
-      Known.Zero &= KnownSrc.Zero;
-    }
+    if (!!DemandedSubElts)
+      Known = KnownBits::commonBits(Known, KnownSub);
+    if (!!DemandedSrcElts)
+      Known = KnownBits::commonBits(Known, KnownSrc);
 
     // Attempt to avoid multi-use src if we don't need anything from it.
     if (!DemandedBits.isAllOnesValue() || !DemandedSubElts.isAllOnesValue() ||
@@ -1108,10 +1102,8 @@ bool TargetLowering::SimplifyDemandedBits(
                                  Known2, TLO, Depth + 1))
           return true;
         // Known bits are shared by every demanded subvector element.
-        if (!!DemandedSubElts) {
-          Known.One &= Known2.One;
-          Known.Zero &= Known2.Zero;
-        }
+        if (!!DemandedSubElts)
+          Known = KnownBits::commonBits(Known, Known2);
       }
       break;
     }
@@ -1149,15 +1141,13 @@ bool TargetLowering::SimplifyDemandedBits(
       if (SimplifyDemandedBits(Op0, DemandedBits, DemandedLHS, Known2, TLO,
                                Depth + 1))
         return true;
-      Known.One &= Known2.One;
-      Known.Zero &= Known2.Zero;
+      Known = KnownBits::commonBits(Known, Known2);
     }
     if (!!DemandedRHS) {
       if (SimplifyDemandedBits(Op1, DemandedBits, DemandedRHS, Known2, TLO,
                                Depth + 1))
         return true;
-      Known.One &= Known2.One;
-      Known.Zero &= Known2.Zero;
+      Known = KnownBits::commonBits(Known, Known2);
     }
 
     // Attempt to avoid multi-use ops if we don't need anything from them.
@@ -1384,8 +1374,7 @@ bool TargetLowering::SimplifyDemandedBits(
       return true;
 
     // Only known if known in both the LHS and RHS.
-    Known.One &= Known2.One;
-    Known.Zero &= Known2.Zero;
+    Known = KnownBits::commonBits(Known, Known2);
     break;
   case ISD::SELECT_CC:
     if (SimplifyDemandedBits(Op.getOperand(3), DemandedBits, Known, TLO,
@@ -1402,8 +1391,7 @@ bool TargetLowering::SimplifyDemandedBits(
       return true;
 
     // Only known if known in both the LHS and RHS.
-    Known.One &= Known2.One;
-    Known.Zero &= Known2.Zero;
+    Known = KnownBits::commonBits(Known, Known2);
     break;
   case ISD::SETCC: {
     SDValue Op0 = Op.getOperand(0);
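`commonBits` is the merge used wherever a value may come from either of two sources, such as a select, a PHI, or several demanded vector elements. A bit stays known only if both inputs agree on it. Below is an 8-bit sketch with hypothetical names (the real `KnownBits` uses APInt) that performs the same computation as the `&=` pairs being replaced.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit model of llvm::KnownBits.
struct KnownBits8 {
  uint8_t Zero = 0; // bits known to be 0
  uint8_t One = 0;  // bits known to be 1
};

KnownBits8 makeConstant(uint8_t C) {
  KnownBits8 K;
  K.One = C;
  K.Zero = static_cast<uint8_t>(~C);
  return K;
}

// Same as `Known.One &= X.One; Known.Zero &= X.Zero;`: keep only the
// facts that hold for both inputs.
KnownBits8 commonBits(KnownBits8 A, KnownBits8 B) {
  KnownBits8 R;
  R.Zero = static_cast<uint8_t>(A.Zero & B.Zero);
  R.One = static_cast<uint8_t>(A.One & B.One);
  return R;
}

// True if value V does not contradict anything K claims about it.
bool isConsistent(KnownBits8 K, uint8_t V) {
  return (V & K.Zero) == 0 && (V & K.One) == K.One;
}
```

For example, merging the constants 0x0C and 0x0A leaves bit 3 known one (0x08) and bits 0 and 4..7 known zero (0xF1). Both original values remain consistent with the result.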
[llvm-branch-commits] [llvm] f264f9a - [SlotIndexes] Fix and simplify basic block splitting
Author: Jay Foad
Date: 2021-01-12T10:50:14Z
New Revision: f264f9ad7df538357dfc8c5f318c5c8b0df3d99f

URL: https://github.com/llvm/llvm-project/commit/f264f9ad7df538357dfc8c5f318c5c8b0df3d99f
DIFF: https://github.com/llvm/llvm-project/commit/f264f9ad7df538357dfc8c5f318c5c8b0df3d99f.diff

LOG: [SlotIndexes] Fix and simplify basic block splitting

Remove the InsertionPoint argument from SlotIndexes::insertMBBInMaps
because it was confusing: what does it mean to insert a new block
between two instructions, in the middle of an existing block?

Instead, support the case that MachineBasicBlock::splitAt really needs,
where the new block contains some instructions that are already in the
maps because they have been moved there from the tail of the previous
block.

In all other use cases the new block is empty.

Based on work by Carl Ritson!

Differential Revision: https://reviews.llvm.org/D94311

Added:

Modified:
    llvm/include/llvm/CodeGen/LiveIntervals.h
    llvm/include/llvm/CodeGen/SlotIndexes.h
    llvm/lib/CodeGen/MachineBasicBlock.cpp
    llvm/unittests/MI/LiveIntervalTest.cpp

Removed:

diff --git a/llvm/include/llvm/CodeGen/LiveIntervals.h b/llvm/include/llvm/CodeGen/LiveIntervals.h
index 1a6b59a8959e..fa08166791b0 100644
--- a/llvm/include/llvm/CodeGen/LiveIntervals.h
+++ b/llvm/include/llvm/CodeGen/LiveIntervals.h
@@ -256,9 +256,8 @@ class VirtRegMap;
       return Indexes->getMBBFromIndex(index);
     }
 
-    void insertMBBInMaps(MachineBasicBlock *MBB,
-                         MachineInstr *InsertionPoint = nullptr) {
-      Indexes->insertMBBInMaps(MBB, InsertionPoint);
+    void insertMBBInMaps(MachineBasicBlock *MBB) {
+      Indexes->insertMBBInMaps(MBB);
       assert(unsigned(MBB->getNumber()) == RegMaskBlocks.size() &&
              "Blocks must be added in order.");
       RegMaskBlocks.push_back(std::make_pair(RegMaskSlots.size(), 0));
diff --git a/llvm/include/llvm/CodeGen/SlotIndexes.h b/llvm/include/llvm/CodeGen/SlotIndexes.h
index 19eab7ae5e35..b2133de93ea2 100644
--- a/llvm/include/llvm/CodeGen/SlotIndexes.h
+++ b/llvm/include/llvm/CodeGen/SlotIndexes.h
@@ -604,38 +604,27 @@ class raw_ostream;
     }
 
     /// Add the given MachineBasicBlock into the maps.
-    /// If \p InsertionPoint is specified then the block will be placed
-    /// before the given machine instr, otherwise it will be placed
-    /// before the next block in MachineFunction insertion order.
-    void insertMBBInMaps(MachineBasicBlock *mbb,
-                         MachineInstr *InsertionPoint = nullptr) {
-      MachineFunction::iterator nextMBB =
-        std::next(MachineFunction::iterator(mbb));
-
-      IndexListEntry *startEntry = nullptr;
-      IndexListEntry *endEntry = nullptr;
-      IndexList::iterator newItr;
-      if (InsertionPoint) {
-        startEntry = createEntry(nullptr, 0);
-        endEntry = getInstructionIndex(*InsertionPoint).listEntry();
-        newItr = indexList.insert(endEntry->getIterator(), startEntry);
-      } else if (nextMBB == mbb->getParent()->end()) {
-        startEntry = &indexList.back();
-        endEntry = createEntry(nullptr, 0);
-        newItr = indexList.insertAfter(startEntry->getIterator(), endEntry);
-      } else {
-        startEntry = createEntry(nullptr, 0);
-        endEntry = getMBBStartIdx(&*nextMBB).listEntry();
-        newItr = indexList.insert(endEntry->getIterator(), startEntry);
-      }
+    /// If it contains any instructions then they must already be in the maps.
+    /// This is used after a block has been split by moving some suffix of its
+    /// instructions into a newly created block.
+    void insertMBBInMaps(MachineBasicBlock *mbb) {
+      assert(mbb != &mbb->getParent()->front() &&
+             "Can't insert a new block at the beginning of a function.");
+      auto prevMBB = std::prev(MachineFunction::iterator(mbb));
+
+      // Create a new entry to be used for the start of mbb and the end of
+      // prevMBB.
+      IndexListEntry *startEntry = createEntry(nullptr, 0);
+      IndexListEntry *endEntry = getMBBEndIdx(&*prevMBB).listEntry();
+      IndexListEntry *insEntry =
+          mbb->empty() ? endEntry
+                       : getInstructionIndex(mbb->front()).listEntry();
+      IndexList::iterator newItr =
+          indexList.insert(insEntry->getIterator(), startEntry);
 
       SlotIndex startIdx(startEntry, SlotIndex::Slot_Block);
       SlotIndex endIdx(endEntry, SlotIndex::Slot_Block);
 
-      MachineFunction::iterator prevMBB(mbb);
-      assert(prevMBB != mbb->getParent()->end() &&
-             "Can't insert a new block at the beginning of a function.");
-      --prevMBB;
       MBBRanges[prevMBB->getNumber()].second = startIdx;
 
       assert(unsigned(mbb->getNumber()) == MBBRanges.size() &&
diff --git a/llvm/lib/CodeGen/MachineBasicBlock.cpp b/llvm/lib/CodeGen/MachineBasicBlock.cpp
index c7b404e075e1..fded4b15e67b 100644
---
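The new `insertMBBInMaps` creates one list entry that serves as both the start of the new block and the end of the previous block. That entry is inserted before the first moved instruction, or before the previous block's old end entry if the new block is empty. The new block then inherits the previous block's old end.

Below is a simplified model using `std::list<std::string>` in place of LLVM's intrusive `IndexList`. All names are hypothetical, and slot numbering is ignored.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical model: the slot index list is a flat sequence of entries, and
// a block is a [Start, End) pair of iterators into it, where End is the
// entry that starts the following block (or a sentinel).
using IndexList = std::list<std::string>;
struct BlockRange {
  IndexList::iterator Start, End;
};

// Sketch of the new strategy for a block split off the tail of Prev.
// FirstMoved is the entry of the first instruction moved into the new
// block, or Prev.End if the new block is empty.
BlockRange insertBlockAfter(IndexList &L, BlockRange &Prev,
                            IndexList::iterator FirstMoved,
                            const std::string &Name) {
  // One new entry is both the start of the new block and the end of Prev.
  IndexList::iterator NewEntry = L.insert(FirstMoved, Name);
  BlockRange New{NewEntry, Prev.End}; // new block inherits Prev's old end
  Prev.End = NewEntry;
  return New;
}
```

Splitting {bb0, I1, I2, I3, end} before I2 yields {bb0, I1, bb1, I2, I3, end}: bb0 now ends at bb1, and bb1 ends where bb0 used to.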
[llvm-branch-commits] [llvm] 6dcf920 - [AMDGPU] Fix a urem combine test to test what it was supposed to
Author: Jay Foad
Date: 2021-01-11T13:32:34Z
New Revision: 6dcf9207df11f5cdb0126e5c5632e93532642ed9

URL: https://github.com/llvm/llvm-project/commit/6dcf9207df11f5cdb0126e5c5632e93532642ed9
DIFF: https://github.com/llvm/llvm-project/commit/6dcf9207df11f5cdb0126e5c5632e93532642ed9.diff

LOG: [AMDGPU] Fix a urem combine test to test what it was supposed to

Added:

Modified:
    llvm/test/CodeGen/AMDGPU/GlobalISel/combine-urem-pow-2.mir

Removed:

diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-urem-pow-2.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-urem-pow-2.mir
index f92e32dab08f..da6c8480b25e 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-urem-pow-2.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-urem-pow-2.mir
@@ -48,12 +48,14 @@ body: |
 
     ; GCN-LABEL: name: urem_s32_var_const2
     ; GCN: liveins: $vgpr0
-    ; GCN: %const:_(s32) = G_CONSTANT i32 1
+    ; GCN: %var:_(s32) = COPY $vgpr0
+    ; GCN: %const:_(s32) = G_CONSTANT i32 2
     ; GCN: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 -1
     ; GCN: [[ADD:%[0-9]+]]:_(s32) = G_ADD %const, [[C]]
-    ; GCN: $vgpr0 = COPY [[ADD]](s32)
+    ; GCN: %rem:_(s32) = G_AND %var, [[ADD]]
+    ; GCN: $vgpr0 = COPY %rem(s32)
     %var:_(s32) = COPY $vgpr0
-    %const:_(s32) = G_CONSTANT i32 1
+    %const:_(s32) = G_CONSTANT i32 2
     %rem:_(s32) = G_UREM %var, %const
     $vgpr0 = COPY %rem
 ...
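The combine under test rewrites `x urem C` for a power-of-two C into `x & (C + -1)`, which matches the G_ADD/G_AND shape in the updated check lines. With the old constant 1 the mask is 0, so the AND presumably folds to a constant. That would explain why the old check lines only copied the G_ADD result out and never exercised the AND. The sketch below uses hypothetical helper names and checks the identity on ordinary 32-bit integers.

```cpp
#include <cassert>
#include <cstdint>

bool isPowerOf2(uint32_t C) { return C != 0 && (C & (C - 1)) == 0; }

// x urem C rewritten as x & (C + -1). Only valid when C is a power of 2.
uint32_t uremViaAnd(uint32_t X, uint32_t C) {
  uint32_t Mask = C + UINT32_MAX; // C + (-1) with 32-bit wrap-around
  return X & Mask;
}

// Compares the rewrite against the real remainder for X in [0, Limit).
bool rewriteMatchesUrem(uint32_t C, uint32_t Limit) {
  for (uint32_t X = 0; X < Limit; ++X)
    if (X % C != uremViaAnd(X, C))
      return false;
  return true;
}
```

With C = 1 the mask is 0 and every result is 0. With C = 2 the mask is 1, so the AND genuinely depends on `%var`. A non-power-of-two such as 6 breaks the identity.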
[llvm-branch-commits] [llvm] 3914beb - [AMDGPU] Handle v_fmac_legacy_f32 in SIFoldOperands
Author: Jay Foad
Date: 2021-01-05T11:55:33Z
New Revision: 3914bebe91f6b557e61d6d74117762f9043593e0

URL: https://github.com/llvm/llvm-project/commit/3914bebe91f6b557e61d6d74117762f9043593e0
DIFF: https://github.com/llvm/llvm-project/commit/3914bebe91f6b557e61d6d74117762f9043593e0.diff

LOG: [AMDGPU] Handle v_fmac_legacy_f32 in SIFoldOperands

Convert it to v_fma_legacy_f32 if it is profitable to do so, just like
other mac instructions that are converted to their mad equivalents.

Differential Revision: https://reviews.llvm.org/D94010

Added:

Modified:
    llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
    llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll

Removed:

diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 6dc01c3d3c21..892dc1feb298 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -140,6 +140,8 @@ static unsigned macToMad(unsigned Opc) {
     return AMDGPU::V_FMA_F32;
   case AMDGPU::V_FMAC_F16_e64:
     return AMDGPU::V_FMA_F16_gfx9;
+  case AMDGPU::V_FMAC_LEGACY_F32_e64:
+    return AMDGPU::V_FMA_LEGACY_F32;
   }
   return AMDGPU::INSTRUCTION_LIST_END;
 }
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
index 8bfb81d86ace..e641d12444cc 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
@@ -70,16 +70,10 @@ define amdgpu_ps <{ i32, i32, i32, i32, i32, float, float, float, float, float,
 ; SDAG-NEXT:    v_interp_p2_f32_e32 v3, v1, attr0.y
 ; SDAG-NEXT:    s_and_b32 exec_lo, exec_lo, s16
 ; SDAG-NEXT:    s_waitcnt lgkmcnt(0)
-; SDAG-NEXT:    image_sample v[7:10], v[2:3], s[8:15], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D
-; SDAG-NEXT:    v_mov_b32_e32 v4, -1.0
-; SDAG-NEXT:    v_mov_b32_e32 v5, -1.0
+; SDAG-NEXT:    image_sample v[0:3], v[2:3], s[8:15], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D
 ; SDAG-NEXT:    s_waitcnt vmcnt(0)
-; SDAG-NEXT:    v_fmac_legacy_f32_e64 v4, v7, 2.0
-; SDAG-NEXT:    v_fmac_legacy_f32_e64 v5, v8, 2.0
-; SDAG-NEXT:    v_mov_b32_e32 v2, v9
-; SDAG-NEXT:    v_mov_b32_e32 v3, v10
-; SDAG-NEXT:    v_mov_b32_e32 v0, v4
-; SDAG-NEXT:    v_mov_b32_e32 v1, v5
+; SDAG-NEXT:    v_fma_legacy_f32 v0, v0, 2.0, -1.0
+; SDAG-NEXT:    v_fma_legacy_f32 v1, v1, 2.0, -1.0
 ; SDAG-NEXT:    ; return to shader part epilog
 ;
 ; GISEL-LABEL: main:
@@ -100,16 +94,10 @@ define amdgpu_ps <{ i32, i32, i32, i32, i32, float, float, float, float, float,
 ; GISEL-NEXT:    v_interp_p2_f32_e32 v2, v1, attr0.x
 ; GISEL-NEXT:    s_and_b32 exec_lo, exec_lo, s16
 ; GISEL-NEXT:    s_waitcnt lgkmcnt(0)
-; GISEL-NEXT:    image_sample v[7:10], v[2:3], s[8:15], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D
-; GISEL-NEXT:    v_mov_b32_e32 v4, -1.0
-; GISEL-NEXT:    v_mov_b32_e32 v5, -1.0
+; GISEL-NEXT:    image_sample v[0:3], v[2:3], s[8:15], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D
 ; GISEL-NEXT:    s_waitcnt vmcnt(0)
-; GISEL-NEXT:    v_fmac_legacy_f32_e64 v4, v7, 2.0
-; GISEL-NEXT:    v_fmac_legacy_f32_e64 v5, v8, 2.0
-; GISEL-NEXT:    v_mov_b32_e32 v2, v9
-; GISEL-NEXT:    v_mov_b32_e32 v3, v10
-; GISEL-NEXT:    v_mov_b32_e32 v0, v4
-; GISEL-NEXT:    v_mov_b32_e32 v1, v5
+; GISEL-NEXT:    v_fma_legacy_f32 v0, v0, 2.0, -1.0
+; GISEL-NEXT:    v_fma_legacy_f32 v1, v1, 2.0, -1.0
 ; GISEL-NEXT:    ; return to shader part epilog
   %i = bitcast <2 x i32> %arg7 to <2 x float>
   %i22 = extractelement <2 x float> %i, i32 0
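The test diff shows why the conversion pays off:

- The mac form ties the addend to the destination, so -1.0 first had to be moved into v4/v5 before the `v_fmac_legacy_f32_e64` instructions, and the results then had to be copied back into v0..v3.
- The three-operand `v_fma_legacy_f32` takes -1.0 as an inline constant and can write its result in place.

Below is a scalar sketch of the two operand shapes, using `std::fma`. It deliberately ignores the legacy multiply semantics, whose special handling of zero operands is the same for both forms, and its function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Two-address (mac) form: D = A * B + D. The addend must already live in
// the destination, hence the v_mov_b32 of -1.0 in the old check lines.
void fmacTied(float &D, float A, float B) { D = std::fma(A, B, D); }

// Three-address (mad/fma) form: D = A * B + C. The addend is an independent
// operand, so a constant such as -1.0 can be passed directly.
float fmaUntied(float A, float B, float C) { return std::fma(A, B, C); }
```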
[llvm-branch-commits] [llvm] 639a50e - [AMDGPU] Precommit test case for D94010
Author: Jay Foad
Date: 2021-01-05T11:55:14Z
New Revision: 639a50e2f138ed3e647b00809a2871a1b9ae9012

URL: https://github.com/llvm/llvm-project/commit/639a50e2f138ed3e647b00809a2871a1b9ae9012
DIFF: https://github.com/llvm/llvm-project/commit/639a50e2f138ed3e647b00809a2871a1b9ae9012.diff

LOG: [AMDGPU] Precommit test case for D94010

Added:

Modified:
    llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll

Removed:

diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
index 27ba74c3f557..8bfb81d86ace 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 < %s | FileCheck -check-prefix=GCN %s
-; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1030 < %s | FileCheck -check-prefixes=GCN,SDAG %s
+; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1030 < %s | FileCheck -check-prefixes=GCN,GISEL %s
 
 define float @v_fma(float %a, float %b, float %c) {
 ; GCN-LABEL: v_fma:
@@ -51,5 +51,98 @@ define float @v_fneg_fma(float %a, float %b, float %c) {
   ret float %fma
 }
 
+define amdgpu_ps <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> @main(<4 x i32> addrspace(6)* inreg noalias align 32 dereferenceable(18446744073709551615) %arg, <8 x i32> addrspace(6)* inreg noalias align 32 dereferenceable(18446744073709551615) %arg1, <4 x i32> addrspace(6)* inreg noalias align 32 dereferenceable(18446744073709551615) %arg2, <8 x i32> addrspace(6)* inreg noalias align 32 dereferenceable(18446744073709551615) %arg3, i32 inreg %arg4, i32 inreg %arg5, <2 x i32> %arg6, <2 x i32> %arg7, <2 x i32> %arg8, <3 x i32> %arg9, <2 x i32> %arg10, <2 x i32> %arg11, <2 x i32> %arg12, <3 x float> %arg13, float %arg14, float %arg15, float %arg16, float %arg17, i32 %arg18, i32 %arg19, float %arg20, i32 %arg21) #0 {
+; SDAG-LABEL: main:
+; SDAG:       ; %bb.0:
+; SDAG-NEXT:    s_mov_b32 s16, exec_lo
+; SDAG-NEXT:    v_mov_b32_e32 v14, v2
+; SDAG-NEXT:    s_mov_b32 s0, s5
+; SDAG-NEXT:    s_wqm_b32 exec_lo, exec_lo
+; SDAG-NEXT:    s_mov_b32 s1, 0
+; SDAG-NEXT:    s_mov_b32 m0, s7
+; SDAG-NEXT:    s_clause 0x1
+; SDAG-NEXT:    s_load_dwordx8 s[8:15], s[0:1], 0x400
+; SDAG-NEXT:    s_load_dwordx4 s[0:3], s[0:1], 0x430
+; SDAG-NEXT:    v_interp_p1_f32_e32 v2, v0, attr0.x
+; SDAG-NEXT:    v_interp_p1_f32_e32 v3, v0, attr0.y
+; SDAG-NEXT:    s_mov_b32 s4, s6
+; SDAG-NEXT:    v_interp_p2_f32_e32 v2, v1, attr0.x
+; SDAG-NEXT:    v_interp_p2_f32_e32 v3, v1, attr0.y
+; SDAG-NEXT:    s_and_b32 exec_lo, exec_lo, s16
+; SDAG-NEXT:    s_waitcnt lgkmcnt(0)
+; SDAG-NEXT:    image_sample v[7:10], v[2:3], s[8:15], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D
+; SDAG-NEXT:    v_mov_b32_e32 v4, -1.0
+; SDAG-NEXT:    v_mov_b32_e32 v5, -1.0
+; SDAG-NEXT:    s_waitcnt vmcnt(0)
+; SDAG-NEXT:    v_fmac_legacy_f32_e64 v4, v7, 2.0
+; SDAG-NEXT:    v_fmac_legacy_f32_e64 v5, v8, 2.0
+; SDAG-NEXT:    v_mov_b32_e32 v2, v9
+; SDAG-NEXT:    v_mov_b32_e32 v3, v10
+; SDAG-NEXT:    v_mov_b32_e32 v0, v4
+; SDAG-NEXT:    v_mov_b32_e32 v1, v5
+; SDAG-NEXT:    ; return to shader part epilog
+;
+; GISEL-LABEL: main:
+; GISEL:       ; %bb.0:
+; GISEL-NEXT:    s_mov_b32 s16, exec_lo
+; GISEL-NEXT:    s_mov_b32 s4, s6
+; GISEL-NEXT:    s_mov_b32 m0, s7
+; GISEL-NEXT:    s_wqm_b32 exec_lo, exec_lo
+; GISEL-NEXT:    s_add_u32 s0, s5, 0x400
+; GISEL-NEXT:    s_mov_b32 s1, 0
+; GISEL-NEXT:    v_interp_p1_f32_e32 v3, v0, attr0.y
+; GISEL-NEXT:    s_load_dwordx8 s[8:15], s[0:1], 0x0
+; GISEL-NEXT:    s_add_u32 s0, s5, 0x430
+; GISEL-NEXT:    v_mov_b32_e32 v14, v2
+; GISEL-NEXT:    s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GISEL-NEXT:    v_interp_p1_f32_e32 v2, v0, attr0.x
+; GISEL-NEXT:    v_interp_p2_f32_e32 v3, v1, attr0.y
+; GISEL-NEXT:    v_interp_p2_f32_e32 v2, v1, attr0.x
+; GISEL-NEXT:    s_and_b32 exec_lo, exec_lo, s16
+; GISEL-NEXT:    s_waitcnt lgkmcnt(0)
+; GISEL-NEXT:    image_sample v[7:10], v[2:3], s[8:15], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D
+; GISEL-NEXT:    v_mov_b32_e32 v4, -1.0
+; GISEL-NEXT:    v_mov_b32_e32 v5, -1.0
+; GISEL-NEXT:    s_waitcnt vmcnt(0)
+; GISEL-NEXT:    v_fmac_legacy_f32_e64 v4, v7, 2.0
+; GISEL-NEXT:    v_fmac_legacy_f32_e64 v5, v8, 2.0
+; GISEL-NEXT:    v_mov_b32_e32 v2, v9
+; GISEL-NEXT:    v_mov_b32_e32 v3, v10
+; GISEL-NEXT:    v_mov_b32_e32 v0, v4
+; GISEL-NEXT:    v_mov_b32_e32 v1, v5
+; GISEL-NEXT:    ; return to shader part epilog
+  %i = bitcast <2 x i32> %arg7 to <2 x float>
+  %i22 = extractelement <2 x float> %i, i32 0
+  %i23 = extractelement <2 x float> %i, i32 1
+  %i24 =
[llvm-branch-commits] [llvm] 4e6054a - [AMDGPU] Split out new helper function macToMad in SIFoldOperands. NFC.
Author: Jay Foad
Date: 2021-01-05T11:54:48Z
New Revision: 4e6054a86c0cb0697913007c99b59f3f65c9d04b

URL: https://github.com/llvm/llvm-project/commit/4e6054a86c0cb0697913007c99b59f3f65c9d04b
DIFF: https://github.com/llvm/llvm-project/commit/4e6054a86c0cb0697913007c99b59f3f65c9d04b.diff

LOG: [AMDGPU] Split out new helper function macToMad in SIFoldOperands. NFC.

Differential Revision: https://reviews.llvm.org/D94009

Added:

Modified:
    llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

Removed:

diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index d86527df5c3c..6dc01c3d3c21 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -129,6 +129,21 @@ char SIFoldOperands::ID = 0;
 
 char &llvm::SIFoldOperandsID = SIFoldOperands::ID;
 
+// Map multiply-accumulate opcode to corresponding multiply-add opcode if any.
+static unsigned macToMad(unsigned Opc) {
+  switch (Opc) {
+  case AMDGPU::V_MAC_F32_e64:
+    return AMDGPU::V_MAD_F32;
+  case AMDGPU::V_MAC_F16_e64:
+    return AMDGPU::V_MAD_F16;
+  case AMDGPU::V_FMAC_F32_e64:
+    return AMDGPU::V_FMA_F32;
+  case AMDGPU::V_FMAC_F16_e64:
+    return AMDGPU::V_FMA_F16_gfx9;
+  }
+  return AMDGPU::INSTRUCTION_LIST_END;
+}
+
 // Wrapper around isInlineConstant that understands special cases when
 // instruction types are replaced during operand folding.
 static bool isInlineConstantIfFolded(const SIInstrInfo *TII,
@@ -139,31 +154,18 @@ static bool isInlineConstantIfFolded(const SIInstrInfo *TII,
     return true;
 
   unsigned Opc = UseMI.getOpcode();
-  switch (Opc) {
-  case AMDGPU::V_MAC_F32_e64:
-  case AMDGPU::V_MAC_F16_e64:
-  case AMDGPU::V_FMAC_F32_e64:
-  case AMDGPU::V_FMAC_F16_e64: {
+  unsigned NewOpc = macToMad(Opc);
+  if (NewOpc != AMDGPU::INSTRUCTION_LIST_END) {
     // Special case for mac. Since this is replaced with mad when folded into
     // src2, we need to check the legality for the final instruction.
     int Src2Idx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src2);
     if (static_cast<int>(OpNo) == Src2Idx) {
-      bool IsFMA = Opc == AMDGPU::V_FMAC_F32_e64 ||
-                   Opc == AMDGPU::V_FMAC_F16_e64;
-      bool IsF32 = Opc == AMDGPU::V_MAC_F32_e64 ||
-                   Opc == AMDGPU::V_FMAC_F32_e64;
-
-      unsigned Opc = IsFMA ?
-        (IsF32 ? AMDGPU::V_FMA_F32 : AMDGPU::V_FMA_F16_gfx9) :
-        (IsF32 ? AMDGPU::V_MAD_F32 : AMDGPU::V_MAD_F16);
-      const MCInstrDesc &MadDesc = TII->get(Opc);
+      const MCInstrDesc &MadDesc = TII->get(NewOpc);
       return TII->isInlineConstant(OpToFold, MadDesc.OpInfo[OpNo].OperandType);
     }
-    return false;
-  }
-  default:
-    return false;
   }
+
+  return false;
 }
 
 // TODO: Add heuristic that the frame index might not fit in the addressing mode
@@ -346,17 +348,8 @@ static bool tryAddToFoldList(SmallVectorImpl<FoldCandidate> &FoldList,
   if (!TII->isOperandLegal(*MI, OpNo, OpToFold)) {
     // Special case for v_mac_{f16, f32}_e64 if we are trying to fold into src2
     unsigned Opc = MI->getOpcode();
-    if ((Opc == AMDGPU::V_MAC_F32_e64 || Opc == AMDGPU::V_MAC_F16_e64 ||
-         Opc == AMDGPU::V_FMAC_F32_e64 || Opc == AMDGPU::V_FMAC_F16_e64) &&
-        (int)OpNo == AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src2)) {
-      bool IsFMA = Opc == AMDGPU::V_FMAC_F32_e64 ||
-                   Opc == AMDGPU::V_FMAC_F16_e64;
-      bool IsF32 = Opc == AMDGPU::V_MAC_F32_e64 ||
-                   Opc == AMDGPU::V_FMAC_F32_e64;
-      unsigned NewOpc = IsFMA ?
-        (IsF32 ? AMDGPU::V_FMA_F32 : AMDGPU::V_FMA_F16_gfx9) :
-        (IsF32 ? AMDGPU::V_MAD_F32 : AMDGPU::V_MAD_F16);
-
+    unsigned NewOpc = macToMad(Opc);
+    if (NewOpc != AMDGPU::INSTRUCTION_LIST_END) {
       // Check if changing this to a v_mad_{f16, f32} instruction will allow us
       // to fold the operand.
       MI->setDesc(TII->get(NewOpc));
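The refactoring replaces two copies of the opcode list and the `IsFMA`/`IsF32` ternaries with one lookup function. That function returns a sentinel (`INSTRUCTION_LIST_END`) when there is no mad equivalent, so callers only test for the sentinel. This shape is also what let the later v_fmac_legacy_f32 change add a single `case`.

Below is a standalone sketch of the pattern. The enum is a hypothetical stand-in for the TableGen-generated `AMDGPU::` opcodes, and `V_ADD_F32_e64` is included only as an example of an opcode with no mapping.

```cpp
#include <cassert>

// Hypothetical stand-in for the generated AMDGPU:: opcode enumeration.
enum Opcode {
  V_MAC_F32_e64, V_MAC_F16_e64, V_FMAC_F32_e64, V_FMAC_F16_e64,
  V_MAD_F32, V_MAD_F16, V_FMA_F32, V_FMA_F16_gfx9,
  V_ADD_F32_e64,
  INSTRUCTION_LIST_END // sentinel: "no mapping"
};

// Same shape as the helper in the commit: one switch, sentinel otherwise.
Opcode macToMad(Opcode Opc) {
  switch (Opc) {
  case V_MAC_F32_e64:  return V_MAD_F32;
  case V_MAC_F16_e64:  return V_MAD_F16;
  case V_FMAC_F32_e64: return V_FMA_F32;
  case V_FMAC_F16_e64: return V_FMA_F16_gfx9;
  default:             return INSTRUCTION_LIST_END;
  }
}

// Callers test for the sentinel instead of repeating the opcode list.
bool hasMadForm(Opcode Opc) { return macToMad(Opc) != INSTRUCTION_LIST_END; }
```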
[llvm-branch-commits] [llvm] 07e92e6 - [AMDGPU] Make use of HasSMemRealTime predicate. NFC.
Author: Jay Foad Date: 2020-12-14T16:34:57Z New Revision: 07e92e6b6002d95d438d24eaabf4452ad6e4ef8f URL: https://github.com/llvm/llvm-project/commit/07e92e6b6002d95d438d24eaabf4452ad6e4ef8f DIFF: https://github.com/llvm/llvm-project/commit/07e92e6b6002d95d438d24eaabf4452ad6e4ef8f.diff LOG: [AMDGPU] Make use of HasSMemRealTime predicate. NFC. We have this subtarget feature so it makes sense to use it here. This is NFC because it's always defined by default on GFX8+. Differential Revision: https://reviews.llvm.org/D93202 Added: Modified: llvm/lib/Target/AMDGPU/AMDGPU.td llvm/lib/Target/AMDGPU/SMInstructions.td Removed: diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index 77063f370976..42d134de9229 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++ b/llvm/lib/Target/AMDGPU/AMDGPU.td @@ -1264,6 +1264,9 @@ def HasGetWaveIdInst : Predicate<"Subtarget->hasGetWaveIdInst()">, def HasMAIInsts : Predicate<"Subtarget->hasMAIInsts()">, AssemblerPredicate<(all_of FeatureMAIInsts)>; +def HasSMemRealTime : Predicate<"Subtarget->hasSMemRealTime()">, + AssemblerPredicate<(all_of FeatureSMemRealTime)>; + def HasSMemTimeInst : Predicate<"Subtarget->hasSMemTimeInst()">, AssemblerPredicate<(all_of FeatureSMemTimeInst)>; diff --git a/llvm/lib/Target/AMDGPU/SMInstructions.td b/llvm/lib/Target/AMDGPU/SMInstructions.td index 70bf215c03f3..5b8896c21832 100644 --- a/llvm/lib/Target/AMDGPU/SMInstructions.td +++ b/llvm/lib/Target/AMDGPU/SMInstructions.td @@ -332,7 +332,6 @@ let OtherPredicates = [HasScalarStores] in { def S_DCACHE_WB : SM_Inval_Pseudo <"s_dcache_wb", int_amdgcn_s_dcache_wb>; def S_DCACHE_WB_VOL : SM_Inval_Pseudo <"s_dcache_wb_vol", int_amdgcn_s_dcache_wb_vol>; } // End OtherPredicates = [HasScalarStores] -def S_MEMREALTIME : SM_Time_Pseudo <"s_memrealtime", int_amdgcn_s_memrealtime>; defm S_ATC_PROBE: SM_Pseudo_Probe <"s_atc_probe", SReg_64>; let is_buffer = 1 in { @@ -340,6 +339,9 @@ defm S_ATC_PROBE_BUFFER : SM_Pseudo_Probe 
<"s_atc_probe_buffer", SReg_128>; } } // SubtargetPredicate = isGFX8Plus +let SubtargetPredicate = HasSMemRealTime in +def S_MEMREALTIME : SM_Time_Pseudo <"s_memrealtime", int_amdgcn_s_memrealtime>; + let SubtargetPredicate = isGFX10Plus in def S_GL1_INV : SM_Inval_Pseudo<"s_gl1_inv">; let SubtargetPredicate = HasGetWaveIdInst in
[llvm-branch-commits] [llvm] 4f25e53 - [AMDGPU] Make use of emitRemovedIntrinsicError. NFC.
Author: Jay Foad Date: 2020-12-11T14:02:14Z New Revision: 4f25e5398211c603e765ab6c30ab35ad286d505f URL: https://github.com/llvm/llvm-project/commit/4f25e5398211c603e765ab6c30ab35ad286d505f DIFF: https://github.com/llvm/llvm-project/commit/4f25e5398211c603e765ab6c30ab35ad286d505f.diff LOG: [AMDGPU] Make use of emitRemovedIntrinsicError. NFC. Change-Id: I482bbf528255f2eacd3878ddfe7edb9a8f63d5c2 Added: Modified: llvm/lib/Target/AMDGPU/SIISelLowering.cpp Removed: diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 1accee5ccd2a..5fb1924bdd9f 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -6588,11 +6588,7 @@ SDValue SITargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op, if (Subtarget->getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS) return SDValue(); -DiagnosticInfoUnsupported BadIntrin( -  MF.getFunction(), "intrinsic not supported on subtarget", -  DL.getDebugLoc()); -  DAG.getContext()->diagnose(BadIntrin); -  return DAG.getUNDEF(VT); +return emitRemovedIntrinsicError(DAG, DL, VT); } case Intrinsic::amdgcn_ldexp: return DAG.getNode(AMDGPUISD::LDEXP, DL, VT,
[llvm-branch-commits] [llvm] 03663e4 - [AMDGPU] Add occupancy level tests for GFX10.3. NFC.
Author: Jay Foad Date: 2020-12-08T14:15:01Z New Revision: 03663e4130d700c6c8ea28b357fcac4d31b617f7 URL: https://github.com/llvm/llvm-project/commit/03663e4130d700c6c8ea28b357fcac4d31b617f7 DIFF: https://github.com/llvm/llvm-project/commit/03663e4130d700c6c8ea28b357fcac4d31b617f7.diff LOG: [AMDGPU] Add occupancy level tests for GFX10.3. NFC. getMaxWavesPerEU and getVGPRAllocGranule both changed in GFX10.3 and they both affect the occupancy calculation. Differential Revision: https://reviews.llvm.org/D92839 Added: Modified: llvm/test/CodeGen/AMDGPU/occupancy-levels.ll Removed: diff --git a/llvm/test/CodeGen/AMDGPU/occupancy-levels.ll b/llvm/test/CodeGen/AMDGPU/occupancy-levels.ll index db70c3d9387d..25e0376dd7ee 100644 --- a/llvm/test/CodeGen/AMDGPU/occupancy-levels.ll +++ b/llvm/test/CodeGen/AMDGPU/occupancy-levels.ll @@ -1,18 +1,21 @@ ; RUN: llc -march=amdgcn -mcpu=gfx900 < %s | FileCheck --check-prefixes=GCN,GFX9 %s -; RUN: llc -march=amdgcn -mcpu=gfx1010 < %s | FileCheck --check-prefixes=GCN,GFX1010,GFX1010W32 %s -; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize64 < %s | FileCheck --check-prefixes=GCN,GFX1010,GFX1010W64 %s +; RUN: llc -march=amdgcn -mcpu=gfx1010 < %s | FileCheck --check-prefixes=GCN,GFX10,GFX10W32,GFX1010,GFX1010W32 %s +; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize64 < %s | FileCheck --check-prefixes=GCN,GFX10,GFX10W64,GFX1010,GFX1010W64 %s +; RUN: llc -march=amdgcn -mcpu=gfx1030 < %s | FileCheck --check-prefixes=GCN,GFX10,GFX10W32,GFX1030,GFX1030W32 %s +; RUN: llc -march=amdgcn -mcpu=gfx1030 -mattr=+wavefrontsize64 < %s | FileCheck --check-prefixes=GCN,GFX10,GFX10W64,GFX1030,GFX1030W64 %s ; GCN-LABEL: {{^}}max_occupancy: ; GFX9: ; Occupancy: 10 ; GFX1010:; Occupancy: 20 +; GFX1030:; Occupancy: 16 define amdgpu_kernel void @max_occupancy() { ret void } ; GCN-LABEL: {{^}}limited_occupancy_3: ; GFX9: ; Occupancy: 3 -; GFX1010W64: ; Occupancy: 3 -; GFX1010W32: ; Occupancy: 4 +; GFX10W64: ; Occupancy: 3 +; GFX10W32: ; 
Occupancy: 4 define amdgpu_kernel void @limited_occupancy_3() #0 { ret void } @@ -20,6 +23,7 @@ define amdgpu_kernel void @limited_occupancy_3() #0 { ; GCN-LABEL: {{^}}limited_occupancy_18: ; GFX9: ; Occupancy: 10 ; GFX1010:; Occupancy: 18 +; GFX1030:; Occupancy: 16 define amdgpu_kernel void @limited_occupancy_18() #1 { ret void } @@ -27,6 +31,7 @@ define amdgpu_kernel void @limited_occupancy_18() #1 { ; GCN-LABEL: {{^}}limited_occupancy_19: ; GFX9: ; Occupancy: 10 ; GFX1010:; Occupancy: 18 +; GFX1030:; Occupancy: 16 define amdgpu_kernel void @limited_occupancy_19() #2 { ret void } @@ -34,6 +39,7 @@ define amdgpu_kernel void @limited_occupancy_19() #2 { ; GCN-LABEL: {{^}}used_24_vgprs: ; GFX9: ; Occupancy: 10 ; GFX1010:; Occupancy: 20 +; GFX1030:; Occupancy: 16 define amdgpu_kernel void @used_24_vgprs() { call void asm sideeffect "", "~{v23}" () ret void @@ -43,6 +49,7 @@ define amdgpu_kernel void @used_24_vgprs() { ; GFX9: ; Occupancy: 9 ; GFX1010W64: ; Occupancy: 18 ; GFX1010W32: ; Occupancy: 20 +; GFX1030:; Occupancy: 16 define amdgpu_kernel void @used_28_vgprs() { call void asm sideeffect "", "~{v27}" () ret void @@ -50,8 +57,9 @@ define amdgpu_kernel void @used_28_vgprs() { ; GCN-LABEL: {{^}}used_32_vgprs: ; GFX9: ; Occupancy: 8 -; GFX1010W64: ; Occupancy: 16 +; GFX10W64: ; Occupancy: 16 ; GFX1010W32: ; Occupancy: 20 +; GFX1030W32: ; Occupancy: 16 define amdgpu_kernel void @used_32_vgprs() { call void asm sideeffect "", "~{v31}" () ret void @@ -61,6 +69,8 @@ define amdgpu_kernel void @used_32_vgprs() { ; GFX9: ; Occupancy: 7 ; GFX1010W64: ; Occupancy: 14 ; GFX1010W32: ; Occupancy: 20 +; GFX1030W64: ; Occupancy: 12 +; GFX1030W32: ; Occupancy: 16 define amdgpu_kernel void @used_36_vgprs() { call void asm sideeffect "", "~{v35}" () ret void @@ -68,8 +78,9 @@ define amdgpu_kernel void @used_36_vgprs() { ; GCN-LABEL: {{^}}used_40_vgprs: ; GFX9: ; Occupancy: 6 -; GFX1010W64: ; Occupancy: 12 +; GFX10W64: ; Occupancy: 12 ; GFX1010W32: ; Occupancy: 20 +; GFX1030W32: ; 
Occupancy: 16 define amdgpu_kernel void @used_40_vgprs() { call void asm sideeffect "", "~{v39}" () ret void @@ -79,6 +90,8 @@ define amdgpu_kernel void @used_40_vgprs() { ; GFX9: ; Occupancy: 5 ; GFX1010W64: ; Occupancy: 11 ; GFX1010W32: ; Occupancy: 20 +; GFX1030W64: ; Occupancy: 10 +; GFX1030W32: ; Occupancy: 16 define amdgpu_kernel void @used_44_vgprs() { call void asm sideeffect "", "~{v43}" () ret void @@ -86,8 +99,9 @@ define amdgpu_kernel void @used_44_vgprs() { ; GCN-LABEL: {{^}}used_48_vgprs: ; GFX9: ; Occupancy: 5 -; GFX1010W64: ; Occupancy: 10 +; GFX10W64: ; Occupancy: 10 ; GFX1010W32: ; Occupancy: 20 +; GFX1030W32: ; Occupancy: 16 define
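The VGPR-driven occupancy numbers in this test can be reproduced with a simplified model: round the VGPR count up to the allocation granule, divide the per-SIMD VGPR budget by the result, and cap at the maximum waves per EU. The sketch below is an assumption, not LLVM's implementation. The real calculation also considers SGPRs, LDS and function attributes (which is what drives the `limited_occupancy_*` kernels), and the budgets and granules here were inferred from the expected values in the test. They are consistent with the log message: GFX10.3 lowers the maximum waves per EU from 20 to 16 and doubles the allocation granule.

```cpp
#include <algorithm>
#include <cassert>

// Simplified, VGPR-only occupancy model (illustrative assumption).
struct VGPRLimits {
  unsigned TotalVGPRs;    // VGPR budget per SIMD for this wave size
  unsigned Granule;       // VGPRs are allocated in multiples of this
  unsigned MaxWavesPerEU; // cap on waves per execution unit
};

// Values inferred from the expected occupancies in occupancy-levels.ll.
constexpr VGPRLimits GFX9{256, 4, 10};
constexpr VGPRLimits GFX1010W64{512, 4, 20};
constexpr VGPRLimits GFX1010W32{1024, 8, 20};
constexpr VGPRLimits GFX1030W64{512, 8, 16};
constexpr VGPRLimits GFX1030W32{1024, 16, 16};

unsigned occupancyForVGPRs(const VGPRLimits &L, unsigned NumVGPRs) {
  assert(L.Granule > 0 && "granule must be non-zero");
  // Treat an empty kernel as using one VGPR so we never divide by zero.
  unsigned Used = std::max(NumVGPRs, 1u);
  // Round up to the allocation granule.
  unsigned Allocated = (Used + L.Granule - 1) / L.Granule * L.Granule;
  return std::min(L.MaxWavesPerEU, L.TotalVGPRs / Allocated);
}
```

For example, 44 VGPRs stay at 44 on GFX1010 wave64 (512 / 44 gives 11 waves) but round up to 48 on GFX1030 wave64 (512 / 48 gives 10 waves), matching the used_44_vgprs checks.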
[llvm-branch-commits] [llvm] 0f32e81 - [TableGen] Remove unused class RecordValResolver. NFC.
Author: Jay Foad Date: 2020-12-03T13:36:58Z New Revision: 0f32e81407d33ab8886081db5d8ed2c7407a15e8 URL: https://github.com/llvm/llvm-project/commit/0f32e81407d33ab8886081db5d8ed2c7407a15e8 DIFF: https://github.com/llvm/llvm-project/commit/0f32e81407d33ab8886081db5d8ed2c7407a15e8.diff LOG: [TableGen] Remove unused class RecordValResolver. NFC. Differential Revision: https://reviews.llvm.org/D92477 Added: Modified: llvm/include/llvm/TableGen/Record.h Removed: diff --git a/llvm/include/llvm/TableGen/Record.h b/llvm/include/llvm/TableGen/Record.h index a26367a6fcc6..20b786dc6e42 100644 --- a/llvm/include/llvm/TableGen/Record.h +++ b/llvm/include/llvm/TableGen/Record.h @@ -2032,25 +2032,6 @@ class RecordResolver final : public Resolver { bool keepUnsetBits() const override { return true; } }; -/// Resolve all references to a specific RecordVal. -// -// TODO: This is used for resolving references to template arguments, in a -// rather inefficient way. Change those uses to resolve all template -// arguments simultaneously and get rid of this class. -class RecordValResolver final : public Resolver { -  const RecordVal *RV; - -public: -  explicit RecordValResolver(Record &R, const RecordVal *RV) -  : Resolver(&R), RV(RV) {} - -  Init *resolve(Init *VarName) override { -if (VarName == RV->getNameInit()) -  return RV->getValue(); -return nullptr; -  } -}; - /// Delegate resolving to a sub-resolver, but shadow some variable names. class ShadowResolver final : public Resolver { Resolver &R;
[llvm-branch-commits] [llvm] 839c963 - [AMDGPU] Simplify some generation checks. NFC.
Author: Jay Foad Date: 2020-12-01T10:15:32Z New Revision: 839c9635edce4f6ed348b154a4e755ff8263d366 URL: https://github.com/llvm/llvm-project/commit/839c9635edce4f6ed348b154a4e755ff8263d366 DIFF: https://github.com/llvm/llvm-project/commit/839c9635edce4f6ed348b154a4e755ff8263d366.diff LOG: [AMDGPU] Simplify some generation checks. NFC. Added: Modified: llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp Removed: diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp index b8b747ea8f99..d1e5fe59e910 100644 --- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp +++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp @@ -4866,7 +4866,7 @@ bool AMDGPUAsmParser::subtargetHasRegister(const MCRegisterInfo &MRI, case AMDGPU::SRC_PRIVATE_BASE: case AMDGPU::SRC_PRIVATE_LIMIT: case AMDGPU::SRC_POPS_EXITING_WAVE_ID: -return !isCI() && !isSI() && !isVI(); +return isGFX9Plus(); case AMDGPU::TBA: case AMDGPU::TBA_LO: case AMDGPU::TBA_HI: @@ -4877,7 +4877,7 @@ bool AMDGPUAsmParser::subtargetHasRegister(const MCRegisterInfo &MRI, case AMDGPU::XNACK_MASK: case AMDGPU::XNACK_MASK_LO: case AMDGPU::XNACK_MASK_HI: -return !isCI() && !isSI() && !isGFX10Plus() && hasXNACK(); +return (isVI() || isGFX9()) && hasXNACK(); case AMDGPU::SGPR_NULL: return isGFX10Plus(); default:
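Both rewrites rely on the generations being totally ordered (SI < CI < VI < GFX9 < GFX10 at the time of this commit), so the old negated checks and the new positive ones select the same generations. A small exhaustive check of that equivalence follows. It uses a hypothetical `Gen` enum in place of the real subtarget queries and leaves out the `hasXNACK()` term, which appears on both sides of the second rewrite.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical generation enum, declared in chronological order.
enum class Gen { SI, CI, VI, GFX9, GFX10 };

bool isSI(Gen G) { return G == Gen::SI; }
bool isCI(Gen G) { return G == Gen::CI; }
bool isVI(Gen G) { return G == Gen::VI; }
bool isGFX9(Gen G) { return G == Gen::GFX9; }
bool isGFX9Plus(Gen G) { return G >= Gen::GFX9; }
bool isGFX10Plus(Gen G) { return G >= Gen::GFX10; }

// Returns true if the old and new predicates agree on every generation.
bool simplifiedChecksAgree() {
  for (Gen G : {Gen::SI, Gen::CI, Gen::VI, Gen::GFX9, Gen::GFX10}) {
    // SRC_PRIVATE_BASE and related registers.
    bool OldPrivate = !isCI(G) && !isSI(G) && !isVI(G);
    if (OldPrivate != isGFX9Plus(G))
      return false;
    // XNACK_MASK registers (the shared hasXNACK() term is omitted).
    bool OldXnack = !isCI(G) && !isSI(G) && !isGFX10Plus(G);
    bool NewXnack = isVI(G) || isGFX9(G);
    if (OldXnack != NewXnack)
      return false;
  }
  return true;
}
```

Note that both forms of the XNACK check also exclude any generation added after GFX10, so the rewrite keeps that behaviour for future targets too.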
[llvm-branch-commits] [llvm] e20efa3 - [LegacyPM] Simplify PMTopLevelManager::collectLastUses. NFC.
Author: Jay Foad Date: 2020-11-30T10:36:19Z New Revision: e20efa3dd5c75a79a47d40335aee0f63261f9c5b URL: https://github.com/llvm/llvm-project/commit/e20efa3dd5c75a79a47d40335aee0f63261f9c5b DIFF: https://github.com/llvm/llvm-project/commit/e20efa3dd5c75a79a47d40335aee0f63261f9c5b.diff LOG: [LegacyPM] Simplify PMTopLevelManager::collectLastUses. NFC. Added: Modified: llvm/lib/IR/LegacyPassManager.cpp Removed: diff --git a/llvm/lib/IR/LegacyPassManager.cpp b/llvm/lib/IR/LegacyPassManager.cpp index 8fd35ef975e2..544c56a789a3 100644 --- a/llvm/lib/IR/LegacyPassManager.cpp +++ b/llvm/lib/IR/LegacyPassManager.cpp @@ -685,16 +685,12 @@ PMTopLevelManager::setLastUser(ArrayRef AnalysisPasses, Pass *P) { /// Collect passes whose last user is P void PMTopLevelManager::collectLastUses(SmallVectorImpl<Pass *> &LastUses, Pass *P) { -  DenseMap >::iterator DMI = -InversedLastUser.find(P); +  auto DMI = InversedLastUser.find(P); if (DMI == InversedLastUser.end()) return; -  SmallPtrSet = DMI->second; -  for (Pass *LUP : LU) { -LastUses.push_back(LUP); -  } - +  auto &LU = DMI->second; +  LastUses.append(LU.begin(), LU.end()); } AnalysisUsage *PMTopLevelManager::findAnalysisUsage(Pass *P) {
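The new code replaces an explicit push_back loop with a single range append. A rough standard-library analogue is sketched below. `Pass` is an empty placeholder struct, `std::unordered_map` and `std::set` stand in for LLVM's DenseMap and SmallPtrSet, and `std::vector::insert(end, first, last)` plays the role of `SmallVectorImpl::append`. All of these substitutions are assumptions made for illustration.

```cpp
#include <cassert>
#include <set>
#include <unordered_map>
#include <vector>

struct Pass {}; // placeholder for llvm::Pass

// Stand-in for InversedLastUser: pass -> set of passes it is the last user of.
using InverseMap = std::unordered_map<Pass *, std::set<Pass *>>;

// Append every pass whose last user is P; do nothing if P has no entry.
void collectLastUses(const InverseMap &InversedLastUser,
                     std::vector<Pass *> &LastUses, Pass *P) {
  auto DMI = InversedLastUser.find(P);
  if (DMI == InversedLastUser.end())
    return;
  const auto &LU = DMI->second;
  // One range insert instead of a push_back per element.
  LastUses.insert(LastUses.end(), LU.begin(), LU.end());
}

// Usage: P is the last user of A and B; Q is the last user of nothing.
std::vector<Pass *> collectDemo(bool QueryP) {
  static Pass P, Q, A, B;
  InverseMap M;
  M[&P] = {&A, &B};
  std::vector<Pass *> Out;
  collectLastUses(M, Out, QueryP ? &P : &Q);
  return Out;
}
```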
[llvm-branch-commits] [llvm] 68ed644 - [LegacyPM] Avoid a redundant map lookup in setLastUser. NFC.
Author: Jay Foad Date: 2020-11-27T10:42:01Z New Revision: 68ed6447855632b954b55f63807481eaa44705df URL: https://github.com/llvm/llvm-project/commit/68ed6447855632b954b55f63807481eaa44705df DIFF: https://github.com/llvm/llvm-project/commit/68ed6447855632b954b55f63807481eaa44705df.diff LOG: [LegacyPM] Avoid a redundant map lookup in setLastUser. NFC. As a bonus this makes it (IMO) obvious that the iterator is not invalidated, so remove the comment explaining that. Added: Modified: llvm/lib/IR/LegacyPassManager.cpp Removed: diff --git a/llvm/lib/IR/LegacyPassManager.cpp b/llvm/lib/IR/LegacyPassManager.cpp index bb2661d36b56..8fd35ef975e2 100644 --- a/llvm/lib/IR/LegacyPassManager.cpp +++ b/llvm/lib/IR/LegacyPassManager.cpp @@ -675,11 +675,9 @@ PMTopLevelManager::setLastUser(ArrayRef AnalysisPasses, Pass *P) { // If AP is the last user of other passes then make P last user of // such passes. -for (auto LU : LastUser) { +for (auto &LU : LastUser) { if (LU.second == AP) -// DenseMap iterator is not invalidated here because -// this is just updating existing entries. -LastUser[LU.first] = P; +LU.second = P; } } }
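The saving comes from iterating the map by reference: `LU.second = P` writes through the element being visited, whereas `LastUser[LU.first] = P` performed a second lookup for a key that was already in hand. Because only mapped values change and no keys are inserted or erased, the iteration stays valid. A sketch with `std::map` in place of DenseMap and a placeholder `Pass` type (both assumptions for illustration):

```cpp
#include <cassert>
#include <map>

struct Pass {}; // placeholder for llvm::Pass

// Stand-in for LastUser: pass -> the pass that uses it last.
using LastUserMap = std::map<Pass *, Pass *>;

// Make P the last user of everything whose last user was AP.
void redirectLastUser(LastUserMap &LastUser, Pass *AP, Pass *P) {
  for (auto &LU : LastUser) // by reference: no copy, no second lookup
    if (LU.second == AP)
      LU.second = P; // only the mapped value changes
}

// Usage: X was last used by AP and Y by Other; only X is redirected.
bool redirectDemo() {
  static Pass X, Y, AP, P, Other;
  LastUserMap M;
  M[&X] = &AP;
  M[&Y] = &Other;
  redirectLastUser(M, &AP, &P);
  return M[&X] == &P && M[&Y] == &Other;
}
```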
[llvm-branch-commits] [llvm] 0d9166f - [LegacyPM] Remove unused undocumented parameter. NFC.
Author: Jay Foad Date: 2020-11-27T10:41:38Z New Revision: 0d9166ff79578c7e98cef8c554e1342ece8efee6 URL: https://github.com/llvm/llvm-project/commit/0d9166ff79578c7e98cef8c554e1342ece8efee6 DIFF: https://github.com/llvm/llvm-project/commit/0d9166ff79578c7e98cef8c554e1342ece8efee6.diff LOG: [LegacyPM] Remove unused undocumented parameter. NFC. The Direction parameter to AnalysisResolver::getAnalysisIfAvailable has never been documented or used for anything. Added: Modified: llvm/include/llvm/PassAnalysisSupport.h llvm/lib/IR/LegacyPassManager.cpp llvm/lib/IR/Pass.cpp Removed: diff --git a/llvm/include/llvm/PassAnalysisSupport.h b/llvm/include/llvm/PassAnalysisSupport.h index 84df171d38d8..4e28466c4968 100644 --- a/llvm/include/llvm/PassAnalysisSupport.h +++ b/llvm/include/llvm/PassAnalysisSupport.h @@ -183,7 +183,7 @@ class AnalysisResolver { } /// Return analysis result or null if it doesn't exist. -  Pass *getAnalysisIfAvailable(AnalysisID ID, bool Direction) const; +  Pass *getAnalysisIfAvailable(AnalysisID ID) const; private: /// This keeps track of which passes implements the interfaces that are @@ -207,7 +207,7 @@ AnalysisType *Pass::getAnalysisIfAvailable() const { const void *PI = &AnalysisType::ID; -  Pass *ResultPass = Resolver->getAnalysisIfAvailable(PI, true); +  Pass *ResultPass = Resolver->getAnalysisIfAvailable(PI); if (!ResultPass) return nullptr; // Because the AnalysisType may not be a subclass of pass (for diff --git a/llvm/lib/IR/LegacyPassManager.cpp b/llvm/lib/IR/LegacyPassManager.cpp index 7f94d42d6ecd..bb2661d36b56 100644 --- a/llvm/lib/IR/LegacyPassManager.cpp +++ b/llvm/lib/IR/LegacyPassManager.cpp @@ -1392,8 +1392,8 @@ PMDataManager::~PMDataManager() { //===--===// // NOTE: Is this the right place to define this method ? // getAnalysisIfAvailable - Return analysis result or null if it doesn't exist.
-Pass *AnalysisResolver::getAnalysisIfAvailable(AnalysisID ID, bool dir) const { -  return PM.findAnalysisPass(ID, dir); +Pass *AnalysisResolver::getAnalysisIfAvailable(AnalysisID ID) const { +  return PM.findAnalysisPass(ID, true); } std::tuple diff --git a/llvm/lib/IR/Pass.cpp b/llvm/lib/IR/Pass.cpp index a815da2bdc51..0750501a92c4 100644 --- a/llvm/lib/IR/Pass.cpp +++ b/llvm/lib/IR/Pass.cpp @@ -62,7 +62,7 @@ bool ModulePass::skipModule(Module &M) const { } bool Pass::mustPreserveAnalysisID(char &AID) const { -  return Resolver->getAnalysisIfAvailable(&AID, true) != nullptr; +  return Resolver->getAnalysisIfAvailable(&AID) != nullptr; } // dumpPassStructure - Implement the -debug-pass=Structure option
[llvm-branch-commits] [llvm] 4f87d30 - [AMDGPU] Introduce and use isGFX10Plus. NFC.
Author: Jay Foad Date: 2020-11-26T09:02:36Z New Revision: 4f87d30a06dd08cec45cb595e9dbed6345c9a7c5 URL: https://github.com/llvm/llvm-project/commit/4f87d30a06dd08cec45cb595e9dbed6345c9a7c5 DIFF: https://github.com/llvm/llvm-project/commit/4f87d30a06dd08cec45cb595e9dbed6345c9a7c5.diff LOG: [AMDGPU] Introduce and use isGFX10Plus. NFC. It's more future-proof to use isGFX10Plus from the start, on the assumption that future architectures will be based on current architectures. Also make use of the existing isGFX9Plus in a few places. Differential Revision: https://reviews.llvm.org/D92092 Added: Modified: llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp llvm/lib/Target/AMDGPU/MCTargetDesc/SIMCCodeEmitter.cpp llvm/lib/Target/AMDGPU/SIISelLowering.cpp llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h Removed: diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp index 8148d0487802..137f6896c87b 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp @@ -338,7 +338,7 @@ bool AMDGPUAsmPrinter::doFinalization(Module &M) { // causing stale data in caches. Arguably this should be done by the linker, // which is why this isn't done for Mesa.
const MCSubtargetInfo &STI = *getGlobalSTI(); -  if (AMDGPU::isGFX10(STI) && +  if (AMDGPU::isGFX10Plus(STI) && (STI.getTargetTriple().getOS() == Triple::AMDHSA || STI.getTargetTriple().getOS() == Triple::AMDPAL)) { OutStreamer->SwitchSection(getObjFileLowering().getTextSection()); diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp index 37a79ce4fa37..20b7c7849397 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp @@ -1485,7 +1485,7 @@ bool AMDGPUInstructionSelector::selectImageIntrinsic( const AMDGPU::MIMGMIPMappingInfo *MIPMappingInfo = AMDGPU::getMIMGMIPMappingInfo(Intr->BaseOpcode); unsigned IntrOpcode = Intr->BaseOpcode; -  const bool IsGFX10 = STI.getGeneration() >= AMDGPUSubtarget::GFX10; +  const bool IsGFX10Plus = AMDGPU::isGFX10Plus(STI); const unsigned ArgOffset = MI.getNumExplicitDefs() + 1; @@ -1603,12 +1603,12 @@ bool AMDGPUInstructionSelector::selectImageIntrinsic( GLC = true; // TODO no-return optimization if (!parseCachePolicy( MI.getOperand(ArgOffset + Intr->CachePolicyIndex).getImm(), nullptr, -&SLC, IsGFX10 ? &DLC : nullptr)) +&SLC, IsGFX10Plus ? &DLC : nullptr)) return false; } else { if (!parseCachePolicy( MI.getOperand(ArgOffset + Intr->CachePolicyIndex).getImm(), &GLC, -&SLC, IsGFX10 ? &DLC : nullptr)) +&SLC, IsGFX10Plus ? &DLC : nullptr)) return false; } @@ -1641,7 +1641,7 @@ bool AMDGPUInstructionSelector::selectImageIntrinsic( ++NumVDataDwords; int Opcode = -1; -  if (IsGFX10) { +  if (IsGFX10Plus) { Opcode = AMDGPU::getMIMGOpcode(IntrOpcode, UseNSA ?
AMDGPU::MIMGEncGfx10NSA : AMDGPU::MIMGEncGfx10Default, @@ -1693,22 +1693,22 @@ bool AMDGPUInstructionSelector::selectImageIntrinsic( MIB.addImm(DMask); // dmask - if (IsGFX10) + if (IsGFX10Plus) MIB.addImm(DimInfo->Encoding); MIB.addImm(Unorm); - if (IsGFX10) + if (IsGFX10Plus) MIB.addImm(DLC); MIB.addImm(GLC); MIB.addImm(SLC); MIB.addImm(IsA16 && // a16 or r128 STI.hasFeature(AMDGPU::FeatureR128A16) ? -1 : 0); - if (IsGFX10) + if (IsGFX10Plus) MIB.addImm(IsA16 ? -1 : 0); MIB.addImm(TFE); // tfe MIB.addImm(LWE); // lwe - if (!IsGFX10) + if (!IsGFX10Plus) MIB.addImm(DimInfo->DA ? -1 : 0); if (BaseOpcode->HasD16) MIB.addImm(IsD16 ? -1 : 0); diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp index 4f05ba5ab576..b8b747ea8f99 100644 --- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp +++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp @@ -1232,6 +1232,8 @@ class AMDGPUAsmParser : public MCTargetAsmParser { return AMDGPU::isGFX10(getSTI()); } + bool isGFX10Plus() const { return AMDGPU::isGFX10Plus(getSTI()); } + bool isGFX10_BEncoding() const { return AMDGPU::isGFX10_BEncoding(getSTI()); } @@ -1248,9 +1250,7 @@ class AMDGPUAsmParser : public MCTargetAsmParser { return !isVI() && !isGFX9(); } - bool
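The rationale for isGFX10Plus is that an exact-match test like isGFX10 stops matching as soon as a newer generation exists, while a `>=` test carries GFX10 behaviour forward by default. A sketch with a hypothetical `Gen` enum follows. `GFX11` is only a placeholder for "some later generation", and `needsDLCOperand` is an illustrative helper loosely modelled on the `if (IsGFX10Plus) MIB.addImm(DLC);` lines in the diff, not an LLVM function.

```cpp
#include <cassert>

// Hypothetical generation enum; GFX11 is only a placeholder for
// "some later generation" and is not taken from the commit.
enum class Gen { GFX9, GFX10, GFX11 };

// Exact match: stops applying as soon as a newer generation exists.
bool isGFX10(Gen G) { return G == Gen::GFX10; }

// Open-ended: later generations inherit GFX10 behaviour by default.
bool isGFX10Plus(Gen G) { return G >= Gen::GFX10; }

// Illustrative only: gate a GFX10-style operand on the open-ended check.
bool needsDLCOperand(Gen G) { return isGFX10Plus(G); }
```

The trade-off is that a later generation which drops a GFX10 feature must then opt out explicitly, which matches the commit's stated assumption that future architectures build on current ones.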
[llvm-branch-commits] [llvm] 000400c - Fix speling in comments. NFC.
Author: Jay Foad Date: 2020-11-23T14:43:24Z New Revision: 000400ca0aeb32e347eefd110a4ed58ebc23d333 URL: https://github.com/llvm/llvm-project/commit/000400ca0aeb32e347eefd110a4ed58ebc23d333 DIFF: https://github.com/llvm/llvm-project/commit/000400ca0aeb32e347eefd110a4ed58ebc23d333.diff LOG: Fix speling in comments. NFC. Added: Modified: llvm/include/llvm/ADT/DenseMap.h llvm/lib/Analysis/GlobalsModRef.cpp llvm/lib/Target/AArch64/AArch64FrameLowering.cpp llvm/lib/Target/AMDGPU/SIDefines.h Removed: diff --git a/llvm/include/llvm/ADT/DenseMap.h b/llvm/include/llvm/ADT/DenseMap.h index 34d397cc9793..42e4fc84175c 100644 --- a/llvm/include/llvm/ADT/DenseMap.h +++ b/llvm/include/llvm/ADT/DenseMap.h @@ -954,7 +954,7 @@ class SmallDenseMap std::swap(*LHSB, *RHSB); continue; } -// Swap separately and handle any assymetry. +// Swap separately and handle any asymmetry. std::swap(LHSB->getFirst(), RHSB->getFirst()); if (hasLHSValue) { ::new (&RHSB->getSecond()) ValueT(std::move(LHSB->getSecond())); diff --git a/llvm/lib/Analysis/GlobalsModRef.cpp b/llvm/lib/Analysis/GlobalsModRef.cpp index 37a345885b33..1a42c69b8b66 100644 --- a/llvm/lib/Analysis/GlobalsModRef.cpp +++ b/llvm/lib/Analysis/GlobalsModRef.cpp @@ -44,7 +44,7 @@ STATISTIC(NumIndirectGlobalVars, "Number of indirect global objects"); // An option to enable unsafe alias results from the GlobalsModRef analysis. // When enabled, GlobalsModRef will provide no-alias results which in extremely // rare cases may not be conservatively correct. In particular, in the face of -// transforms which cause assymetry between how effective getUnderlyingObject +// transforms which cause asymmetry between how effective getUnderlyingObject // is for two pointers, it may produce incorrect results.
// // These unsafe results have been returned by GMR for many years without diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp index 351f532ad4a3..cbbb0755b124 100644 --- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp @@ -1649,7 +1649,7 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF, // If the prologue didn't contain any SEH opcodes and didn't set the // MF.hasWinCFI() flag, assume the epilogue won't either, and skip the // EpilogStart - to avoid generating CFI for functions that don't need it. -// (And as we didn't generate any prologue at all, it would be assymetrical +// (And as we didn't generate any prologue at all, it would be asymmetrical // to the epilogue.) By the end of the function, we assert that // HasWinCFI is equal to MF.hasWinCFI(), to verify this assumption. HasWinCFI = true; diff --git a/llvm/lib/Target/AMDGPU/SIDefines.h b/llvm/lib/Target/AMDGPU/SIDefines.h index 0abd96dc4607..65c486ef73e2 100644 --- a/llvm/lib/Target/AMDGPU/SIDefines.h +++ b/llvm/lib/Target/AMDGPU/SIDefines.h @@ -33,7 +33,7 @@ enum : uint64_t { VOP2 = 1 << 8, VOPC = 1 << 9, -  // TODO: Should this be spilt into VOP3 a and b? +  // TODO: Should this be spilt into VOP3 a and b? VOP3 = 1 << 10, VOP3P = 1 << 12,