[PATCH] D53633: [AArch64] Implement FP16FML intrinsics
SjoerdMeijer added a comment. FYI: a new ACLE version has been published, please find it here: https://developer.arm.com/architectures/system-architectures/software-standards/acle The "Neon Intrinsics" section contains these new intrinsics. Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D53633/new/ https://reviews.llvm.org/D53633 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D53633: [AArch64] Implement FP16FML intrinsics
ab added inline comments. Comment at: cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c:12 + +float32x2_t test_vfmlal_low_u32(float32x2_t a, float16x4_t b, float16x4_t c) { +// CHECK-LABEL: define <2 x float> @test_vfmlal_low_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c) SjoerdMeijer wrote: > SjoerdMeijer wrote: > > ab wrote: > > > Hey folks, I'm curious: where does the "_u32" suffix come from? Should it > > > be _f16? > > > > > > Also, are there any new ACLE/intrinsic list documents? As far as I can > > > tell there hasn't been any release since IHI0073B/IHI0053D. > > > Also, are there any new ACLE/intrinsic list documents? As far as I can > > > tell there hasn't been any release since IHI0073B/IHI0053D. > > > > I've checked, and an updated ACLE that includes these FP16FML intrinsics is > > coming soon. > > > > > where does the "_u32" suffix come from? Should it be _f16? > > > > Good question. It could probably be _f32 or _f16, but _u32 doesn't seem to > > make much sense. Looks like the spec says _u32, and that's also what GCC > > has implemented. I think we want to update the spec and fix the name before > > the updated spec is available. Will chase this, and let you know once I > > know more. > An update on this: we should change this to _f32 (because the first suffixes > were refering to the ouput type). The ACLE will be updated accordingly, and > also GCC will change its current implementation (from _u32 to _f32). Many > thanks for raising this issue. > Is there a volunteer to prepare a patch? Or do you have one already? :-) I > could look at it, but that will be towards the end of next week. > I've checked, and an updated ACLE that includes these FP16FML intrinsics is > coming soon. Great, thanks! > An update on this: we should change this to _f32 (because the first suffixes > were refering to the ouput type). Hmm, I was thinking _f16 based on the vmlal intrinsics: they seem to be named after the multiplication type rather than that of the accumulator/output. Either way seems fine to me though, I'll defer to you folks. > The ACLE will be updated accordingly, and also GCC will change its current > implementation (from _u32 to _f32). Many thanks for raising this issue. Is there a volunteer to prepare a patch? Or do you have one already? :-) I could look at it, but that will be towards the end of next week. Sure: D58306 (with _f16 though, let me know what you think of vmlal) Thanks for checking! Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D53633/new/ https://reviews.llvm.org/D53633 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D53633: [AArch64] Implement FP16FML intrinsics
SjoerdMeijer added inline comments. Comment at: cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c:12 + +float32x2_t test_vfmlal_low_u32(float32x2_t a, float16x4_t b, float16x4_t c) { +// CHECK-LABEL: define <2 x float> @test_vfmlal_low_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c) SjoerdMeijer wrote: > ab wrote: > > Hey folks, I'm curious: where does the "_u32" suffix come from? Should it > > be _f16? > > > > Also, are there any new ACLE/intrinsic list documents? As far as I can tell > > there hasn't been any release since IHI0073B/IHI0053D. > > Also, are there any new ACLE/intrinsic list documents? As far as I can tell > > there hasn't been any release since IHI0073B/IHI0053D. > > I've checked, and an updated ACLE that includes these FP16FML intrinsics is > coming soon. > > > where does the "_u32" suffix come from? Should it be _f16? > > Good question. It could probably be _f32 or _f16, but _u32 doesn't seem to > make much sense. Looks like the spec says _u32, and that's also what GCC has > implemented. I think we want to update the spec and fix the name before the > updated spec is available. Will chase this, and let you know once I know more. An update on this: we should change this to _f32 (because the first suffixes were refering to the ouput type). The ACLE will be updated accordingly, and also GCC will change its current implementation (from _u32 to _f32). Many thanks for raising this issue. Is there a volunteer to prepare a patch? Or do you have one already? :-) I could look at it, but that will be towards the end of next week. Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D53633/new/ https://reviews.llvm.org/D53633 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D53633: [AArch64] Implement FP16FML intrinsics
SjoerdMeijer added inline comments. Comment at: cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c:12 + +float32x2_t test_vfmlal_low_u32(float32x2_t a, float16x4_t b, float16x4_t c) { +// CHECK-LABEL: define <2 x float> @test_vfmlal_low_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c) ab wrote: > Hey folks, I'm curious: where does the "_u32" suffix come from? Should it be > _f16? > > Also, are there any new ACLE/intrinsic list documents? As far as I can tell > there hasn't been any release since IHI0073B/IHI0053D. > Also, are there any new ACLE/intrinsic list documents? As far as I can tell > there hasn't been any release since IHI0073B/IHI0053D. I've checked, and an updated ACLE that includes these FP16FML intrinsics is coming soon. > where does the "_u32" suffix come from? Should it be _f16? Good question. It could probably be _f32 or _f16, but _u32 doesn't seem to make much sense. Looks like the spec says _u32, and that's also what GCC has implemented. I think we want to update the spec and fix the name before the updated spec is available. Will chase this, and let you know once I know more. Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D53633/new/ https://reviews.llvm.org/D53633 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D53633: [AArch64] Implement FP16FML intrinsics
ab added inline comments. Herald added a subscriber: jdoerfert. Herald added a project: LLVM. Comment at: cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c:12 + +float32x2_t test_vfmlal_low_u32(float32x2_t a, float16x4_t b, float16x4_t c) { +// CHECK-LABEL: define <2 x float> @test_vfmlal_low_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c) Hey folks, I'm curious: where does the "_u32" suffix come from? Should it be _f16? Also, are there any new ACLE/intrinsic list documents? As far as I can tell there hasn't been any release since IHI0073B/IHI0053D. Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D53633/new/ https://reviews.llvm.org/D53633 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D53633: [AArch64] Implement FP16FML intrinsics
This revision was automatically updated to reflect the committed changes. Closed by commit rL345344: [AArch64] Implement FP16FML intrinsics (authored by bryanpkc, committed by ). Herald added a subscriber: llvm-commits. Changed prior to commit: https://reviews.llvm.org/D53633?vs=170811=171230#toc Repository: rL LLVM https://reviews.llvm.org/D53633 Files: cfe/trunk/include/clang/Basic/arm_neon.td cfe/trunk/include/clang/Basic/arm_neon_incl.td cfe/trunk/lib/Basic/Targets/AArch64.cpp cfe/trunk/lib/Basic/Targets/AArch64.h cfe/trunk/lib/CodeGen/CGBuiltin.cpp cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c cfe/trunk/test/Preprocessor/aarch64-target-features.c cfe/trunk/utils/TableGen/NeonEmitter.cpp Index: cfe/trunk/utils/TableGen/NeonEmitter.cpp === --- cfe/trunk/utils/TableGen/NeonEmitter.cpp +++ cfe/trunk/utils/TableGen/NeonEmitter.cpp @@ -494,6 +494,7 @@ std::pair emitDagSaveTemp(DagInit *DI); std::pair emitDagSplat(DagInit *DI); std::pair emitDagDup(DagInit *DI); +std::pair emitDagDupTyped(DagInit *DI); std::pair emitDagShuffle(DagInit *DI); std::pair emitDagCast(DagInit *DI, bool IsBitCast); std::pair emitDagCall(DagInit *DI); @@ -897,6 +898,18 @@ Float = true; ElementBitwidth = 16; break; + case '0': +Float = true; +if (AppliedQuad) + Bitwidth /= 2; +ElementBitwidth = 16; +break; + case '1': +Float = true; +if (!AppliedQuad) + Bitwidth *= 2; +ElementBitwidth = 16; +break; case 'g': if (AppliedQuad) Bitwidth /= 2; @@ -1507,6 +1520,8 @@ return emitDagShuffle(DI); if (Op == "dup") return emitDagDup(DI); + if (Op == "dup_typed") +return emitDagDupTyped(DI); if (Op == "splat") return emitDagSplat(DI); if (Op == "save_temp") @@ -1771,6 +1786,28 @@ return std::make_pair(T, S); } +std::pair Intrinsic::DagEmitter::emitDagDupTyped(DagInit *DI) { + assert_with_loc(DI->getNumArgs() == 2, "dup_typed() expects two arguments"); + std::pair A = emitDagArg(DI->getArg(0), + DI->getArgNameStr(0)); + std::pair B = emitDagArg(DI->getArg(1), + DI->getArgNameStr(1)); + assert_with_loc(B.first.isScalar(), + "dup_typed() requires a scalar as the second argument"); + + Type T = A.first; + assert_with_loc(T.isVector(), "dup_typed() used but target type is scalar!"); + std::string S = "(" + T.str() + ") {"; + for (unsigned I = 0; I < T.getNumElements(); ++I) { +if (I != 0) + S += ", "; +S += B.second; + } + S += "}"; + + return std::make_pair(T, S); +} + std::pair Intrinsic::DagEmitter::emitDagSplat(DagInit *DI) { assert_with_loc(DI->getNumArgs() == 2, "splat() expects two arguments"); std::pair A = emitDagArg(DI->getArg(0), Index: cfe/trunk/include/clang/Basic/arm_neon.td === --- cfe/trunk/include/clang/Basic/arm_neon.td +++ cfe/trunk/include/clang/Basic/arm_neon.td @@ -206,6 +206,15 @@ : Op<(call "vdot", $p0, $p1, (bitcast $p1, (splat(bitcast "uint32x4_t", $p2), $p3)))>; +def OP_FMLAL_LN : Op<(call "vfmlal_low", $p0, $p1, + (dup_typed $p1, (call "vget_lane", $p2, $p3)))>; +def OP_FMLSL_LN : Op<(call "vfmlsl_low", $p0, $p1, + (dup_typed $p1, (call "vget_lane", $p2, $p3)))>; +def OP_FMLAL_LN_Hi : Op<(call "vfmlal_high", $p0, $p1, + (dup_typed $p1, (call "vget_lane", $p2, $p3)))>; +def OP_FMLSL_LN_Hi : Op<(call "vfmlsl_high", $p0, $p1, + (dup_typed $p1, (call "vget_lane", $p2, $p3)))>; + //===--===// // Instructions //===--===// @@ -1640,3 +1649,21 @@ // Variants indexing into a 128-bit vector are A64 only. def UDOT_LANEQ : SOpInst<"vdot_laneq", "dd89i", "iUiQiQUi", OP_DOT_LNQ>; } + +// v8.2-A FP16 fused multiply-add long instructions. +let ArchGuard = "defined(__ARM_FEATURE_FP16FML) && defined(__aarch64__)" in { + def VFMLAL_LOW : SInst<"vfmlal_low", "ffHH", "UiQUi">; + def VFMLSL_LOW : SInst<"vfmlsl_low", "ffHH", "UiQUi">; + def VFMLAL_HIGH : SInst<"vfmlal_high", "ffHH", "UiQUi">; + def VFMLSL_HIGH : SInst<"vfmlsl_high", "ffHH", "UiQUi">; + + def VFMLAL_LANE_LOW : SOpInst<"vfmlal_lane_low", "ffH0i", "UiQUi", OP_FMLAL_LN>; + def VFMLSL_LANE_LOW : SOpInst<"vfmlsl_lane_low", "ffH0i", "UiQUi", OP_FMLSL_LN>; + def VFMLAL_LANE_HIGH : SOpInst<"vfmlal_lane_high", "ffH0i", "UiQUi", OP_FMLAL_LN_Hi>; + def VFMLSL_LANE_HIGH : SOpInst<"vfmlsl_lane_high", "ffH0i", "UiQUi", OP_FMLSL_LN_Hi>; + + def VFMLAL_LANEQ_LOW : SOpInst<"vfmlal_laneq_low", "ffH1i", "UiQUi", OP_FMLAL_LN>; + def VFMLSL_LANEQ_LOW : SOpInst<"vfmlsl_laneq_low",
[PATCH] D53633: [AArch64] Implement FP16FML intrinsics
bryanpkc added a comment. In https://reviews.llvm.org/D53633#1274621, @t.p.northover wrote: > I think this is reasonable. Thanks Tim. Could you also review https://reviews.llvm.org/D53632, which is the LLVM part of this implementation? Repository: rC Clang https://reviews.llvm.org/D53633 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D53633: [AArch64] Implement FP16FML intrinsics
t.p.northover accepted this revision. t.p.northover added a comment. This revision is now accepted and ready to land. I think this is reasonable. Repository: rC Clang https://reviews.llvm.org/D53633 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D53633: [AArch64] Implement FP16FML intrinsics
bryanpkc created this revision. bryanpkc added reviewers: SjoerdMeijer, bogden, efriedma, t.p.northover. Herald added subscribers: cfe-commits, kristof.beyls, javed.absar. Generate the FP16FML intrinsics into arm_neon.h (AArch64 only for now). Add two new type modifiers to NeonEmitter to handle the new prototypes. Define __ARM_FEATURE_FP16FML when +fp16fml is enabled and guard the intrinsics with the macro in arm_neon.h. Based on a patch by Gao Yiling. Repository: rC Clang https://reviews.llvm.org/D53633 Files: include/clang/Basic/arm_neon.td include/clang/Basic/arm_neon_incl.td lib/Basic/Targets/AArch64.cpp lib/Basic/Targets/AArch64.h lib/CodeGen/CGBuiltin.cpp test/CodeGen/aarch64-neon-fp16fml.c test/Preprocessor/aarch64-target-features.c utils/TableGen/NeonEmitter.cpp Index: utils/TableGen/NeonEmitter.cpp === --- utils/TableGen/NeonEmitter.cpp +++ utils/TableGen/NeonEmitter.cpp @@ -494,6 +494,7 @@ std::pair emitDagSaveTemp(DagInit *DI); std::pair emitDagSplat(DagInit *DI); std::pair emitDagDup(DagInit *DI); +std::pair emitDagDupTyped(DagInit *DI); std::pair emitDagShuffle(DagInit *DI); std::pair emitDagCast(DagInit *DI, bool IsBitCast); std::pair emitDagCall(DagInit *DI); @@ -897,6 +898,18 @@ Float = true; ElementBitwidth = 16; break; + case '0': +Float = true; +if (AppliedQuad) + Bitwidth /= 2; +ElementBitwidth = 16; +break; + case '1': +Float = true; +if (!AppliedQuad) + Bitwidth *= 2; +ElementBitwidth = 16; +break; case 'g': if (AppliedQuad) Bitwidth /= 2; @@ -1507,6 +1520,8 @@ return emitDagShuffle(DI); if (Op == "dup") return emitDagDup(DI); + if (Op == "dup_typed") +return emitDagDupTyped(DI); if (Op == "splat") return emitDagSplat(DI); if (Op == "save_temp") @@ -1771,6 +1786,28 @@ return std::make_pair(T, S); } +std::pair Intrinsic::DagEmitter::emitDagDupTyped(DagInit *DI) { + assert_with_loc(DI->getNumArgs() == 2, "dup_typed() expects two arguments"); + std::pair A = emitDagArg(DI->getArg(0), + DI->getArgNameStr(0)); + std::pair B = emitDagArg(DI->getArg(1), + DI->getArgNameStr(1)); + assert_with_loc(B.first.isScalar(), + "dup_typed() requires a scalar as the second argument"); + + Type T = A.first; + assert_with_loc(T.isVector(), "dup_typed() used but target type is scalar!"); + std::string S = "(" + T.str() + ") {"; + for (unsigned I = 0; I < T.getNumElements(); ++I) { +if (I != 0) + S += ", "; +S += B.second; + } + S += "}"; + + return std::make_pair(T, S); +} + std::pair Intrinsic::DagEmitter::emitDagSplat(DagInit *DI) { assert_with_loc(DI->getNumArgs() == 2, "splat() expects two arguments"); std::pair A = emitDagArg(DI->getArg(0), Index: test/Preprocessor/aarch64-target-features.c === --- test/Preprocessor/aarch64-target-features.c +++ test/Preprocessor/aarch64-target-features.c @@ -93,16 +93,20 @@ // RUN: %clang -target aarch64-none-linux-gnu -march=armv8.2a+dotprod -x c -E -dM %s -o - | FileCheck --check-prefix=CHECK-DOTPROD %s // CHECK-DOTPROD: __ARM_FEATURE_DOTPROD 1 -// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16fml+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s -// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s -// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+fp16+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s -// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s -// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s -// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16fml+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s -// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s -// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s -// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s -// RUN: %clang -target