[PATCH] D156121: [Clang][AArch64] svldr_vnum/svstr_vnum should use cntsb iso vscale for the offset
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGa8cbd27d1f23: [Clang][AArch64] svldr_vnum/svstr_vnum should use cntsb iso vscale for the… (authored by sdesmalen).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156121/new/

https://reviews.llvm.org/D156121

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_ldr.c
  clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_str.c

Index: clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_str.c
===================================================================
--- clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_str.c
+++ clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_str.c
@@ -18,8 +18,8 @@
 // CHECK-C-LABEL: @test_svstr_vnum_za_1(
 // CHECK-CXX-LABEL: @_Z20test_svstr_vnum_za_1jPv(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VSCALE:%.*]] = tail call i64 @llvm.vscale.i64()
-// CHECK-NEXT:    [[MULVL:%.*]] = mul nuw nsw i64 [[VSCALE]], 240
+// CHECK-NEXT:    [[SVLB:%.*]] = tail call i64 @llvm.aarch64.sme.cntsb()
+// CHECK-NEXT:    [[MULVL:%.*]] = mul i64 [[SVLB]], 15
 // CHECK-NEXT:    [[TMP0:%.*]] = getelementptr i8, ptr [[PTR:%.*]], i64 [[MULVL]]
 // CHECK-NEXT:    [[TILESLICE:%.*]] = add i32 [[SLICE_BASE:%.*]], 15
 // CHECK-NEXT:    tail call void @llvm.aarch64.sme.str(i32 [[TILESLICE]], ptr [[TMP0]])
Index: clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_ldr.c
===================================================================
--- clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_ldr.c
+++ clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_ldr.c
@@ -18,8 +18,8 @@
 // CHECK-C-LABEL: @test_svldr_vnum_za_1(
 // CHECK-CXX-LABEL: @_Z20test_svldr_vnum_za_1jPKv(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VSCALE:%.*]] = tail call i64 @llvm.vscale.i64()
-// CHECK-NEXT:    [[MULVL:%.*]] = mul nuw nsw i64 [[VSCALE]], 240
+// CHECK-NEXT:    [[SVLB:%.*]] = tail call i64 @llvm.aarch64.sme.cntsb()
+// CHECK-NEXT:    [[MULVL:%.*]] = mul i64 [[SVLB]], 15
 // CHECK-NEXT:    [[TMP0:%.*]] = getelementptr i8, ptr [[PTR:%.*]], i64 [[MULVL]]
 // CHECK-NEXT:    [[TILESLICE:%.*]] = add i32 [[SLICE_BASE:%.*]], 15
 // CHECK-NEXT:    tail call void @llvm.aarch64.sme.ldr(i32 [[TILESLICE]], ptr [[TMP0]])
Index: clang/lib/CodeGen/CGBuiltin.cpp
===================================================================
--- clang/lib/CodeGen/CGBuiltin.cpp
+++ clang/lib/CodeGen/CGBuiltin.cpp
@@ -9508,11 +9508,11 @@
 Value *CodeGenFunction::EmitSMELdrStr(SVETypeFlags TypeFlags,
                                       SmallVectorImpl<Value *> &Ops,
                                       unsigned IntID) {
-  Function *Vscale = CGM.getIntrinsic(Intrinsic::vscale, Int64Ty);
-  llvm::Value *VscaleCall = Builder.CreateCall(Vscale, {}, "vscale");
+  Function *Cntsb = CGM.getIntrinsic(Intrinsic::aarch64_sme_cntsb);
+  llvm::Value *CntsbCall = Builder.CreateCall(Cntsb, {}, "svlb");
   llvm::Value *MulVL = Builder.CreateMul(
-      VscaleCall,
-      Builder.getInt64(16 * cast<llvm::ConstantInt>(Ops[1])->getZExtValue()),
+      CntsbCall,
+      Builder.getInt64(cast<llvm::ConstantInt>(Ops[1])->getZExtValue()),
       "mulvl");
   Ops[2] = Builder.CreateGEP(Int8Ty, Ops[2], MulVL);
   Ops[0] = EmitTileslice(Ops[1], Ops[0]);
[PATCH] D156121: [Clang][AArch64] svldr_vnum/svstr_vnum should use cntsb iso vscale for the offset
bryanpkc accepted this revision.
bryanpkc added a comment.
This revision is now accepted and ready to land.

LGTM. Sorry for not catching this earlier.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156121/new/

https://reviews.llvm.org/D156121
[PATCH] D156121: [Clang][AArch64] svldr_vnum/svstr_vnum should use cntsb iso vscale for the offset
sdesmalen created this revision.
sdesmalen added reviewers: bryanpkc, CarolineConcatto, dtemirbulatov.
Herald added subscribers: ctetreau, kristof.beyls.
Herald added a project: All.
sdesmalen requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

The specification for LDR/STR says that:

  The ZA array vector is selected by the sum of the vector select register
  and immediate offset, modulo the number of bytes in a Streaming SVE vector.
  [..]
  This instruction does not require the PE to be in Streaming SVE mode

When the instruction is used outside of streaming mode, 'vscale' will result
in the wrong value being used for the offset, because LLVM's code generator
will emit the non-streaming 'RDVL/ADDVL' instructions instead of the
'RDSVL/ADDSVL' instructions, which are used to get the Streaming-SVE vector
length.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D156121

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_ldr.c
  clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_str.c

Index: clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_str.c
===================================================================
--- clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_str.c
+++ clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_str.c
@@ -18,8 +18,8 @@
 // CHECK-C-LABEL: @test_svstr_vnum_za_1(
 // CHECK-CXX-LABEL: @_Z20test_svstr_vnum_za_1jPv(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VSCALE:%.*]] = tail call i64 @llvm.vscale.i64()
-// CHECK-NEXT:    [[MULVL:%.*]] = mul nuw nsw i64 [[VSCALE]], 240
+// CHECK-NEXT:    [[SVLB:%.*]] = tail call i64 @llvm.aarch64.sme.cntsb()
+// CHECK-NEXT:    [[MULVL:%.*]] = mul i64 [[SVLB]], 15
 // CHECK-NEXT:    [[TMP0:%.*]] = getelementptr i8, ptr [[PTR:%.*]], i64 [[MULVL]]
 // CHECK-NEXT:    [[TILESLICE:%.*]] = add i32 [[SLICE_BASE:%.*]], 15
 // CHECK-NEXT:    tail call void @llvm.aarch64.sme.str(i32 [[TILESLICE]], ptr [[TMP0]])
Index: clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_ldr.c
===================================================================
--- clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_ldr.c
+++ clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_ldr.c
@@ -18,8 +18,8 @@
 // CHECK-C-LABEL: @test_svldr_vnum_za_1(
 // CHECK-CXX-LABEL: @_Z20test_svldr_vnum_za_1jPKv(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VSCALE:%.*]] = tail call i64 @llvm.vscale.i64()
-// CHECK-NEXT:    [[MULVL:%.*]] = mul nuw nsw i64 [[VSCALE]], 240
+// CHECK-NEXT:    [[SVLB:%.*]] = tail call i64 @llvm.aarch64.sme.cntsb()
+// CHECK-NEXT:    [[MULVL:%.*]] = mul i64 [[SVLB]], 15
 // CHECK-NEXT:    [[TMP0:%.*]] = getelementptr i8, ptr [[PTR:%.*]], i64 [[MULVL]]
 // CHECK-NEXT:    [[TILESLICE:%.*]] = add i32 [[SLICE_BASE:%.*]], 15
 // CHECK-NEXT:    tail call void @llvm.aarch64.sme.ldr(i32 [[TILESLICE]], ptr [[TMP0]])
Index: clang/lib/CodeGen/CGBuiltin.cpp
===================================================================
--- clang/lib/CodeGen/CGBuiltin.cpp
+++ clang/lib/CodeGen/CGBuiltin.cpp
@@ -9508,11 +9508,11 @@
 Value *CodeGenFunction::EmitSMELdrStr(SVETypeFlags TypeFlags,
                                       SmallVectorImpl<Value *> &Ops,
                                       unsigned IntID) {
-  Function *Vscale = CGM.getIntrinsic(Intrinsic::vscale, Int64Ty);
-  llvm::Value *VscaleCall = Builder.CreateCall(Vscale, {}, "vscale");
+  Function *Cntsb = CGM.getIntrinsic(Intrinsic::aarch64_sme_cntsb);
+  llvm::Value *CntsbCall = Builder.CreateCall(Cntsb, {}, "svlb");
   llvm::Value *MulVL = Builder.CreateMul(
-      VscaleCall,
-      Builder.getInt64(16 * cast<llvm::ConstantInt>(Ops[1])->getZExtValue()),
+      CntsbCall,
+      Builder.getInt64(cast<llvm::ConstantInt>(Ops[1])->getZExtValue()),
       "mulvl");
   Ops[2] = Builder.CreateGEP(Int8Ty, Ops[2], MulVL);
   Ops[0] = EmitTileslice(Ops[1], Ops[0]);
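
The offset arithmetic behind the change (240 = 16 * 15 under vscale, versus 15 * SVL-in-bytes under cntsb) can be modelled in a few lines of standalone C++. This is only a sketch and not part of the patch: the VectorLengths struct and the old_offset/new_offset helpers are hypothetical names made up for illustration, and the model assumes that vscale equals the non-streaming vector length in bytes divided by 16 when the code runs outside streaming mode.

```cpp
#include <cstdint>

// Hypothetical model (not from the patch) of the two vector lengths, in bytes.
struct VectorLengths {
  std::uint64_t vl_bytes;   // non-streaming SVE vector length, as read by RDVL
  std::uint64_t svl_bytes;  // Streaming SVE vector length, as read by RDSVL (cntsb)
};

// Old lowering: vscale * (16 * vnum). Assumption: outside streaming mode,
// vscale reflects the non-streaming vector length, i.e. vl_bytes / 16.
constexpr std::uint64_t old_offset(VectorLengths vls, std::uint64_t vnum) {
  return (vls.vl_bytes / 16) * (16 * vnum);
}

// New lowering: cntsb * vnum, which always uses the streaming vector length.
constexpr std::uint64_t new_offset(VectorLengths vls, std::uint64_t vnum) {
  return vls.svl_bytes * vnum;
}

// When both lengths match (256-bit vectors here) the two lowerings agree.
static_assert(old_offset(VectorLengths{32, 32}, 15) == 480, "equal VL/SVL");
static_assert(new_offset(VectorLengths{32, 32}, 15) == 480, "equal VL/SVL");

// With a 128-bit non-streaming VL and a 512-bit SVL they diverge: the old
// form steps 16 bytes per vector where the ZA vector is 64 bytes wide.
static_assert(old_offset(VectorLengths{16, 64}, 15) == 240, "old uses VL");
static_assert(new_offset(VectorLengths{16, 64}, 15) == 960, "new uses SVL");
```

The file has no main; compiling it (for example with -fsyntax-only) is enough to evaluate the static_asserts. The vnum value 15 matches the constant used in the updated tests.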