[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
https://github.com/Lukacma closed https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
https://github.com/Lukacma updated https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
@@ -5464,6 +5464,15 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
       Builder.CreateStore(errorValue, swiftErrorTemp);
     }
+    // Mfloat8 type is loaded as a scalar type, but is treated as a single
+    // vector type for other operations. We need to bitcast it to the vector
+    // type here.
+    if (auto *EltTy =

paulwalker-arm wrote:

The underlying storage for `__mfp8` is an FPR, and until we decide whether to use a dedicated target type, or LLVM gains an opaque 8-bit floating-point type, our only option is to represent it as an i8 vector type. The reason for using `i8` was some specific code reuse, but as this PR showed, that reuse is not total, so I'd rather we just be honest and insert the relevant bitcasts when necessary. This will put us in good stead if we decide to go the target-type route.

https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
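As an aside for readers following the codegen details: the hunk quoted above boils down to reinterpreting the scalar call result as the single-element vector type the rest of codegen expects. A minimal C++ sketch of the idea, not the exact upstream code (`CI` and `RetIRTy` are assumed names for the emitted call instruction and the converted return type):

// Sketch: __mfp8 comes back from the call as a scalar i8, but the rest of
// codegen models the type as <1 x i8>; a bitcast reinterprets the bits.
llvm::Value *V = CI; // scalar i8 result of the emitted call
if (auto *VecTy = llvm::dyn_cast<llvm::FixedVectorType>(RetIRTy))
  if (VecTy->getNumElements() == 1 && V->getType() == VecTy->getElementType())
    V = Builder.CreateBitCast(V, VecTy); // i8 -> <1 x i8>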
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
@@ -5464,6 +5464,15 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
       Builder.CreateStore(errorValue, swiftErrorTemp);
     }
+    // Mfloat8 type is loaded as a scalar type, but is treated as a single
+    // vector type for other operations. We need to bitcast it to the vector
+    // type here.
+    if (auto *EltTy =

Lukacma wrote:

I'm not yet confident in my understanding of the trade-offs between the two approaches, besides the fact that one impacts target-specific code while the other affects target-independent code. As such, I don't feel well-positioned to contribute meaningfully to this discussion. That said, I'd appreciate it if we could reach alignment here, as I'd like to merge this patch soon.

https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
@@ -4179,9 +4183,19 @@ Value *CodeGenFunction::EmitSVEMaskedLoad(const CallExpr *E,
                                           unsigned IntrinsicID,
                                           bool IsZExtReturn) {
   QualType LangPTy = E->getArg(1)->getType();
-  llvm::Type *MemEltTy = CGM.getTypes().ConvertTypeForMem(
+  llvm::Type *MemEltTy = CGM.getTypes().ConvertType(
       LangPTy->castAs<PointerType>()->getPointeeType());
+  // Mfloat8 type is stored as a vector, so extra work
+  // to extract the scalar element type is necessary.
+  if (MemEltTy->isVectorTy()) {
+#ifndef NDEBUG

paulwalker-arm wrote:

With the asserts simplified you should no longer require `NDEBUG`.

https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
@@ -5464,6 +5464,15 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
       Builder.CreateStore(errorValue, swiftErrorTemp);
     }
+    // Mfloat8 type is loaded as a scalar type, but is treated as a single
+    // vector type for other operations. We need to bitcast it to the vector
+    // type here.
+    if (auto *EltTy =

momchil-velikov wrote:

> Does the ABI say this?

It doesn't. Unfortunately this discussion was split and I didn't replicate all my comments here.

> Momchil Velikov, 15 Apr at 16:11
> The ABI spec (naturally) does not say anything about <1 x i8>. It says (in a
> somewhat obscure way) that the value is passed in an FPR.
> And then clang/llvm decide to implement the ABI by mapping to <1 x T>.

I consider the "natural" mapping of `__mfp8` to LLVM types to be `i8`, and `<1 x i8>` to be merely a hack coming from the peculiar way ABIs are implemented in clang/llvm (by implicit contracts and "mutual understanding"). As such, `<1 x i8>` ought to be applicable only to values that are arguments passed in registers.

https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
https://github.com/paulwalker-arm edited https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
@@ -4226,9 +4242,21 @@ Value *CodeGenFunction::EmitSVEMaskedStore(const CallExpr *E,
                                            SmallVectorImpl<Value *> &Ops,
                                            unsigned IntrinsicID) {
   QualType LangPTy = E->getArg(1)->getType();
-  llvm::Type *MemEltTy = CGM.getTypes().ConvertTypeForMem(
+  llvm::Type *MemEltTy = CGM.getTypes().ConvertType(
       LangPTy->castAs<PointerType>()->getPointeeType());
+  // Mfloat8 type is stored as a vector, so extra work
+  // to extract the scalar element type is necessary.
+  if (MemEltTy->isVectorTy()) {
+#ifndef NDEBUG
+    auto *VecTy = cast(MemEltTy);
+    ElementCount EC = VecTy->getElementCount();
+    assert(EC.isScalar() && VecTy->getElementType() == Int8Ty &&
+           "Only <1 x i8> expected");
+#endif

paulwalker-arm wrote:

As above.

https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
@@ -4179,9 +4183,21 @@ Value *CodeGenFunction::EmitSVEMaskedLoad(const CallExpr *E,
                                           unsigned IntrinsicID,
                                           bool IsZExtReturn) {
   QualType LangPTy = E->getArg(1)->getType();
-  llvm::Type *MemEltTy = CGM.getTypes().ConvertTypeForMem(
+  llvm::Type *MemEltTy = CGM.getTypes().ConvertType(
       LangPTy->castAs<PointerType>()->getPointeeType());
+  // Mfloat8 type is stored as a vector, so extra work
+  // to extract the scalar element type is necessary.
+  if (MemEltTy->isVectorTy()) {
+#ifndef NDEBUG
+    auto *VecTy = cast(MemEltTy);
+    ElementCount EC = VecTy->getElementCount();
+    assert(EC.isScalar() && VecTy->getElementType() == Int8Ty &&
+           "Only <1 x i8> expected");
+#endif

paulwalker-arm wrote:

I think `assert(MemEltTy == FixedVectorType::get(Int8Ty, 1) && ` should work here?

https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
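Taken literally, Paul's suggestion collapses the guarded block into a single statement. A sketch of how the load path could then read (the last line, recovering the scalar element type, is an assumption based on the surrounding diff rather than the final upstream code):

// assert() compiles away in release builds, so no #ifndef NDEBUG guard or
// helper locals are needed once the type comparison is made directly.
assert(MemEltTy == llvm::FixedVectorType::get(Int8Ty, 1) &&
       "Only <1 x i8> expected");
// Presumed follow-up: lower the masked load/store on the scalar element type.
MemEltTy = llvm::cast<llvm::VectorType>(MemEltTy)->getElementType();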
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
https://github.com/paulwalker-arm approved this pull request. I've not verified every line of the test files but what I've seen looks good, as do the code changes. Other than a few stylistic suggestions this looks good to me. https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
@@ -4179,9 +4183,21 @@ Value *CodeGenFunction::EmitSVEMaskedLoad(const CallExpr *E,
                                           unsigned IntrinsicID,
                                           bool IsZExtReturn) {
   QualType LangPTy = E->getArg(1)->getType();
-  llvm::Type *MemEltTy = CGM.getTypes().ConvertTypeForMem(
+  llvm::Type *MemEltTy = CGM.getTypes().ConvertType(

paulwalker-arm wrote:

Is this change necessary? ConvertTypeForMem should now return the same vector type for mfloat8?

https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
@@ -2056,9 +2056,21 @@ void NeonEmitter::createIntrinsic(const Record *R,
   auto &Entry = IntrinsicMap[Name];
   for (auto &I : NewTypeSpecs) {
+
+    // MFloat8 type is only available on AArch64. If encountered set ArchGuard
+    // correctly.
+    std::string savedArchGuard = ArchGuard;
+    if (Type(I.first, ".").isMFloat8()) {
+      if (ArchGuard.empty()) {
+        ArchGuard = "defined(__aarch64__)";
+      } else if (ArchGuard.find("defined(__aarch64__)") == std::string::npos) {
+        ArchGuard = "defined(__aarch64__) && (" + savedArchGuard + ")";
+      }
+    }
     Entry.emplace_back(R, Name, Proto, I.first, I.second, CK, Body, *this,
                        ArchGuard, TargetGuard, IsUnavailable, BigEndianSafe);
     Out.push_back(&Entry.back());
+    ArchGuard = savedArchGuard;

Lukacma wrote:

Sorry! Fixed now.

https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
@@ -5464,6 +5464,15 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
       Builder.CreateStore(errorValue, swiftErrorTemp);
     }
+    // Mfloat8 type is loaded as a scalar type, but is treated as a single
+    // vector type for other operations. We need to bitcast it to the vector
+    // type here.
+    if (auto *EltTy =

Lukacma wrote:

Done

https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
@@ -5464,6 +5464,15 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
       Builder.CreateStore(errorValue, swiftErrorTemp);
     }
+    // Mfloat8 type is loaded as a scalar type, but is treated as a single
+    // vector type for other operations. We need to bitcast it to the vector
+    // type here.
+    if (auto *EltTy =

paulwalker-arm wrote:

Not sure what the fallout will be from this, but I think the problem here is that we should not have loaded a scalar in the first place. Looking at `CodeGenTypes::ConvertTypeForMem()` I can see that we're using a different type for the memory representation than the normal one, which I think is a mistake. Changing this so the types are consistent will remove the need for this code, but I suspect it'll prompt further work elsewhere. My hope is that the work sits in target-specific areas relating to modelling the builtin, so that seems reasonable. Please shout though if it starts to get out of control.

https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
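For context, both conversion entry points mentioned here are real CodeGenTypes APIs; the disagreement is only over whether they should return different types for `__mfp8`. A sketch of the distinction (`PointeeTy` is a hypothetical stand-in for the builtin argument's pointee type):

// CodeGenTypes keeps two views of a QualType: the normal value
// representation and the representation used when the value sits in memory.
llvm::Type *ValueTy = CGM.getTypes().ConvertType(PointeeTy);       // <1 x i8> for __mfp8
llvm::Type *MemTy   = CGM.getTypes().ConvertTypeForMem(PointeeTy); // was plain i8
// Paul's point: once the two agree, the hunk's switch from ConvertTypeForMem
// to ConvertType (and the bitcast after calls) should become unnecessary.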
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
@@ -2056,9 +2056,21 @@ void NeonEmitter::createIntrinsic(const Record *R,
   auto &Entry = IntrinsicMap[Name];
   for (auto &I : NewTypeSpecs) {
+
+    // MFloat8 type is only available on AArch64. If encountered set ArchGuard
+    // correctly.
+    std::string savedArchGuard = ArchGuard;
+    if (Type(I.first, ".").isMFloat8()) {
+      if (ArchGuard.empty()) {
+        ArchGuard = "defined(__aarch64__)";
+      } else if (ArchGuard.find("defined(__aarch64__)") == std::string::npos) {
+        ArchGuard = "defined(__aarch64__) && (" + savedArchGuard + ")";
+      }
+    }
     Entry.emplace_back(R, Name, Proto, I.first, I.second, CK, Body, *this,
                        ArchGuard, TargetGuard, IsUnavailable, BigEndianSafe);
     Out.push_back(&Entry.back());
+    ArchGuard = savedArchGuard;

paulwalker-arm wrote:

If you use the new variable (whose current name breaks the coding standard) in the call to `Entry.emplace_back`, you'll not need to restore ArchGuard.

https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
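Concretely, the suggestion is to compute the guard into a local (LLVM style: UpperCamelCase) and pass that local straight to `Entry.emplace_back`, so `ArchGuard` is never mutated and nothing needs restoring. A sketch, with `MF8ArchGuard` as an illustrative name rather than the one from the patch:

// MFloat8 exists only on AArch64, so force the guard for mfloat8 type specs.
std::string MF8ArchGuard = ArchGuard;
if (Type(I.first, ".").isMFloat8()) {
  if (MF8ArchGuard.empty())
    MF8ArchGuard = "defined(__aarch64__)";
  else if (MF8ArchGuard.find("defined(__aarch64__)") == std::string::npos)
    MF8ArchGuard = "defined(__aarch64__) && (" + MF8ArchGuard + ")";
}
// Pass the local instead of the outer variable: no mutation, no restore.
Entry.emplace_back(R, Name, Proto, I.first, I.second, CK, Body, *this,
                   MF8ArchGuard, TargetGuard, IsUnavailable, BigEndianSafe);
Out.push_back(&Entry.back());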
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
Lukacma wrote: I have adjusted NeonEmitter to automatically emit the correct attribute for mfloat8 intrinsics and merged them into the original records. https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
https://github.com/Lukacma updated https://github.com/llvm/llvm-project/pull/128019

>From c331c4c260b6432b6ae96723f78c16b189e9297a Mon Sep 17 00:00:00 2001
From: Marian Lukac
Date: Thu, 20 Feb 2025 15:35:45 +
Subject: [PATCH 1/2] [Clang][AArch64] Add fp8 variants for untyped NEON
 intrinsics

This patch adds fp8 variants to existing intrinsics, whose operation
doesn't depend on arguments being a specific type.
---
 clang/include/clang/Basic/arm_neon.td      |   74 +-
 clang/lib/AST/Type.cpp                     |    5 +
 clang/lib/CodeGen/CGCall.cpp               |    9 +
 clang/lib/CodeGen/TargetBuiltins/ARM.cpp   |   20 +
 clang/lib/Sema/SemaInit.cpp                |    2 +
 .../fp8-intrinsics/acle_neon_fp8_untyped.c | 1114 +
 6 files changed, 1220 insertions(+), 4 deletions(-)
 create mode 100644 clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_untyped.c

diff --git a/clang/include/clang/Basic/arm_neon.td b/clang/include/clang/Basic/arm_neon.td
index ab0051efe5159..90f0e90e4a7f8 100644
--- a/clang/include/clang/Basic/arm_neon.td
+++ b/clang/include/clang/Basic/arm_neon.td
@@ -2090,17 +2090,17 @@ let ArchGuard = "defined(__aarch64__) || defined(__arm64ec__)", TargetGuard = "r
 // Lookup table read with 2-bit/4-bit indices
 let ArchGuard = "defined(__aarch64__)", TargetGuard = "lut" in {
-  def VLUTI2_B    : SInst<"vluti2_lane", "Q.(qU)I", "cUcPcQcQUcQPc",
+  def VLUTI2_B    : SInst<"vluti2_lane", "Q.(qU)I", "cUcPcmQcQUcQPcQm",
                           [ImmCheck<2, ImmCheck0_1>]>;
-  def VLUTI2_B_Q  : SInst<"vluti2_laneq", "Q.(QU)I", "cUcPcQcQUcQPc",
+  def VLUTI2_B_Q  : SInst<"vluti2_laneq", "Q.(QU)I", "cUcPcmQcQUcQPcQm",
                           [ImmCheck<2, ImmCheck0_3>]>;
   def VLUTI2_H    : SInst<"vluti2_lane", "Q.(]>;
   def VLUTI2_H_Q  : SInst<"vluti2_laneq", "Q.(]>;
-  def VLUTI4_B    : SInst<"vluti4_lane", "..(qU)I", "QcQUcQPc",
+  def VLUTI4_B    : SInst<"vluti4_lane", "..(qU)I", "QcQUcQPcQm",
                           [ImmCheck<2, ImmCheck0_0>]>;
-  def VLUTI4_B_Q  : SInst<"vluti4_laneq", "..UI", "QcQUcQPc",
+  def VLUTI4_B_Q  : SInst<"vluti4_laneq", "..UI", "QcQUcQPcQm",
                           [ImmCheck<2, ImmCheck0_1>]>;
   def VLUTI4_H_X2 : SInst<"vluti4_lane_x2", ".2(]>;
@@ -2194,4 +2194,70 @@ let ArchGuard = "defined(__aarch64__)", TargetGuard = "fp8,neon" in {
   // fscale
   def FSCALE_V128 : WInst<"vscale", "..(.S)", "QdQfQh">;
   def FSCALE_V64  : WInst<"vscale", "(.q)(.q)(.qS)", "fh">;
+}
+
+//FP8 versions of untyped intrinsics
+let ArchGuard = "defined(__aarch64__)" in {
+  def VGET_LANE_MF8 : IInst<"vget_lane", "1.I", "mQm", [ImmCheck<1, ImmCheckLaneIndex, 0>]>;
+  def SPLAT_MF8     : WInst<"splat_lane", ".(!q)I", "mQm", [ImmCheck<1, ImmCheckLaneIndex, 0>]>;
+  def SPLATQ_MF8    : WInst<"splat_laneq", ".(!Q)I", "mQm", [ImmCheck<1, ImmCheckLaneIndex, 0>]>;
+  def VSET_LANE_MF8 : IInst<"vset_lane", ".1.I", "mQm", [ImmCheck<2, ImmCheckLaneIndex, 1>]>;
+  def VCREATE_MF8   : NoTestOpInst<"vcreate", ".(IU>)", "m", OP_CAST> { let BigEndianSafe = 1; }
+  let InstName = "vmov" in {
+    def VDUP_N_MF8 : WOpInst<"vdup_n", ".1", "mQm", OP_DUP>;
+    def VMOV_N_MF8 : WOpInst<"vmov_n", ".1", "mQm", OP_DUP>;
+  }
+  let InstName = "" in
+    def VDUP_LANE_MF8: WOpInst<"vdup_lane", ".qI", "mQm", OP_DUP_LN>;
+  def VCOMBINE_MF8 : NoTestOpInst<"vcombine", "Q..", "m", OP_CONC>;
+  let InstName = "vmov" in {
+    def VGET_HIGH_MF8 : NoTestOpInst<"vget_high", ".Q", "m", OP_HI>;
+    def VGET_LOW_MF8  : NoTestOpInst<"vget_low", ".Q", "m", OP_LO>;
+  }
+  let InstName = "vtbl" in {
+    def VTBL1_MF8 : WInst<"vtbl1", "..p", "m">;
+    def VTBL2_MF8 : WInst<"vtbl2", ".2p", "m">;
+    def VTBL3_MF8 : WInst<"vtbl3", ".3p", "m">;
+    def VTBL4_MF8 : WInst<"vtbl4", ".4p", "m">;
+  }
+  let InstName = "vtbx" in {
+    def VTBX1_MF8 : WInst<"vtbx1", "...p", "m">;
+    def VTBX2_MF8 : WInst<"vtbx2", "..2p", "m">;
+    def VTBX3_MF8 : WInst<"vtbx3", "..3p", "m">;
+    def VTBX4_MF8 : WInst<"vtbx4", "..4p", "m">;
+  }
+  def VEXT_MF8   : WInst<"vext", "...I", "mQm", [ImmCheck<2, ImmCheckLaneIndex, 0>]>;
+  def VREV64_MF8 : WOpInst<"vrev64", "..", "mQm", OP_REV64>;
+  def VREV32_MF8 : WOpInst<"vrev32", "..", "mQm", OP_REV32>;
+  def VREV16_MF8 : WOpInst<"vrev16", "..", "mQm", OP_REV16>;
+  let isHiddenLInst = 1 in
+  def VBSL_MF8 : SInst<"vbsl", ".U..", "mQm">;
+  def VTRN_MF8 : WInst<"vtrn", "2..", "mQm">;
+  def VZIP_MF8 : WInst<"vzip", "2..", "mQm">;
+  def VUZP_MF8 : WInst<"vuzp", "2..", "mQm">;
+  def COPY_LANE_MF8   : IOpInst<"vcopy_lane", "..I.I", "m", OP_COPY_LN>;
+  def COPYQ_LANE_MF8  : IOpInst<"vcopy_lane", "..IqI", "Qm", OP_COPY_LN>;
+  def COPY_LANEQ_MF8  : IOpInst<"vcopy_laneq", "..IQI", "m", OP_COPY_LN>;
+  def COPYQ_LANEQ_MF8 : IOpInst<"vcopy_laneq", "..I.I", "Qm", OP_COPY_LN>;
+  def VDUP_LANE2_MF8  : WOpInst<"vdup_laneq", ".QI", "mQm", OP_DUP_LN>;
+  def VTRN1_MF8 : SOpInst<"vtrn1", "...", "mQm", OP_TRN1>;
+  def VZIP1_MF8 : SOpInst<"vzip1", "..."
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
https://github.com/Lukacma updated https://github.com/llvm/llvm-project/pull/128019 (the attached patch is identical to the one quoted above) ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
paulwalker-arm wrote:

> @paulwalker-arm the reasoning behind creating separate records is that the mfloat type is not available on AArch32 architectures, and therefore all intrinsics using it need to be gated behind `ArchGuard = "defined(__aarch64__)"`.

I see. How practical would it be for NeonEmitter to infer the ArchGuard based on the type? I'm assuming ArchGuard is either unset or set to what we need for all the cases we care about. This is not a firm ask, but it would be nice to reuse the existing definitions if possible.

https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
Lukacma wrote: @paulwalker-arm the reasoning behind creating separate records is that the mfloat type is not available on AArch32 architectures, and therefore all intrinsics using it need to be gated behind `ArchGuard = "defined(__aarch64__)"`. https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
Lukacma wrote:

> For my education can you explain why the fp8 variants are broken out into
> their own definitions? Taking VREV64_MF8 as an example, it looks like you
> should be able to add the new type strings to the current definition?

That's a good question. It's been a while since I implemented this patch, so I have forgotten my reasoning, but I think it might be because I was originally not sure whether we wanted to target-guard these behind the fp8 feature flag. Since it looks like we are not doing that, I can merge them back into their original intrinsics.

https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
@@ -5464,6 +5464,15 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
       Builder.CreateStore(errorValue, swiftErrorTemp);
     }
+    // Mfloat8 type is loaded as a scalar type, but is treated as a single
+    // vector type for other operations. We need to bitcast it to the vector
+    // type here.
+    if (auto *EltTy =

paulwalker-arm wrote:

Does the ABI say this? My understanding is that values of type `_mfp8` are 8-bit floating-point values that are passed as `_mfp8`. Pretending it's an `i8` in some cases and a `<1 x i8>` in others is purely an implementation detail within clang. This is not to say the code is invalid, but we should be cautious with how far down the rabbit hole we go.

FYI: As part of @MacDue's work to improve streaming-mode code generation I asked him to add the MVT `aarch64mfp8` along with support to load and store it. I expect over time we'll migrate away from using `i8` as our scalar type.

https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
paulwalker-arm wrote: For my education, can you explain why the fp8 variants are broken out into their own definitions? Taking `VREV64_MF8` as an example, it looks like you should be able to add the new type strings to the current definition? https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
@@ -5464,6 +5464,15 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
       Builder.CreateStore(errorValue, swiftErrorTemp);
     }
+    // Mfloat8 type is loaded as a scalar type, but is treated as a single
+    // vector type for other operations. We need to bitcast it to the vector
+    // type here.
+    if (auto *EltTy =

momchil-velikov wrote:

I don't see an issue here. That is exactly what should happen, regardless of the target architecture, any time the ABI for that architecture says values of type `T` are passed as `<1 x T>`.

https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
Lukacma wrote: Failures should be fixed now. https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
@@ -5464,6 +5464,15 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
       Builder.CreateStore(errorValue, swiftErrorTemp);
     }
+    // Mfloat8 type is loaded as a scalar type, but is treated as a single
+    // vector type for other operations. We need to bitcast it to the vector
+    // type here.
+    if (auto *EltTy =

Lukacma wrote:

I am not sure whether this is the best way to solve this issue, so I would appreciate your feedback on it.

https://github.com/llvm/llvm-project/pull/128019 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)
https://github.com/Lukacma updated https://github.com/llvm/llvm-project/pull/128019 (the attached patch is identical to the one quoted above) ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits