[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics
c-rhodes added inline comments.

Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll:809
+define @rev_bf16( %a) {
+; CHECK-LABEL: rev_bf16

sdesmalen wrote:
> Does this test not need the `+bf16` attribute to work? (which implies the
> patterns are missing the right predicate)

It should do, sorry I missed that. I've tried:

```
diff --git a/llvm/lib/Target/AArch64/SVEInstrFormats.td b/llvm/lib/Target/AArch64/SVEInstrFormats.td
index 46cca2a..5ab2502 100644
--- a/llvm/lib/Target/AArch64/SVEInstrFormats.td
+++ b/llvm/lib/Target/AArch64/SVEInstrFormats.td
@@ -1124,10 +1124,13 @@ multiclass sve_int_perm_reverse_z {
   def : SVE_1_Op_Pat(NAME # _S)>;
   def : SVE_1_Op_Pat(NAME # _D)>;
 
-  def : SVE_1_Op_Pat(NAME # _H)>;
   def : SVE_1_Op_Pat(NAME # _H)>;
   def : SVE_1_Op_Pat(NAME # _S)>;
   def : SVE_1_Op_Pat(NAME # _D)>;
+
+  let Predicates = [HasBF16] in {
+def : SVE_1_Op_Pat(NAME # _H)>;
+  }
 }
```

but this still works without `+bf16`. I noticed in your patch D82187 you check `Subtarget->hasBF16()` for `MVT::nxv8bf16` at select phase of ISEL, I guess it's different here with patterns. I also noticed we add the register class for `MVT::nxv8bf16` in AArch64ISelLowering without checking `Subtarget->hasBF16()` which I suspect is a bug. This test requires `+bf16` with that fixed but I wonder why the predicate isn't being recognised.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82182/new/

https://reviews.llvm.org/D82182

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics
c-rhodes marked an inline comment as done. c-rhodes added inline comments. Comment at: clang/include/clang/Basic/arm_sve.td:1115 +let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in { +def SVREV_BF16: SInst<"svrev[_{d}]","dd", "b", MergeNone, "aarch64_sve_rev">; c-rhodes wrote: > fpetrogalli wrote: > > c-rhodes wrote: > > > c-rhodes wrote: > > > > fpetrogalli wrote: > > > > > nit: could create a multiclass here like @sdesmalen have done in > > > > > https://reviews.llvm.org/D82187, seems quite a nice way to keep the > > > > > definition of the intrinsics together (look for `multiclass > > > > > StructLoad`, for example) > > > > it might be a bit tedious having separate multiclasses, what do you > > > > think about: > > > > ```multiclass SInstBF16 > > > string i = "", > > > > list ft = [], list ch = []> { > > > > def : SInst; > > > > let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in { > > > > def : SInst; > > > > } > > > > } > > > > > > > > defm SVREV: SInstBF16<"svrev[_{d}]","dd", "csilUcUsUiUlhfd", > > > > MergeNone, "aarch64_sve_rev">; > > > > defm SVSEL: SInstBF16<"svsel[_{d}]","dPdd", "csilUcUsUiUlhfd", > > > > MergeNone, "aarch64_sve_sel">; > > > > defm SVSPLICE : SInstBF16<"svsplice[_{d}]", "dPdd", "csilUcUsUiUlhfd", > > > > MergeNone, "aarch64_sve_splice">; > > > > defm SVTRN1 : SInstBF16<"svtrn1[_{d}]", "ddd", "csilUcUsUiUlhfd", > > > > MergeNone, "aarch64_sve_trn1">; > > > > defm SVTRN2 : SInstBF16<"svtrn2[_{d}]", "ddd", "csilUcUsUiUlhfd", > > > > MergeNone, "aarch64_sve_trn2">; > > > > defm SVUZP1 : SInstBF16<"svuzp1[_{d}]", "ddd", "csilUcUsUiUlhfd", > > > > MergeNone, "aarch64_sve_uzp1">; > > > > defm SVUZP2 : SInstBF16<"svuzp2[_{d}]", "ddd", "csilUcUsUiUlhfd", > > > > MergeNone, "aarch64_sve_uzp2">; > > > > defm SVZIP1 : SInstBF16<"svzip1[_{d}]", "ddd", "csilUcUsUiUlhfd", > > > > MergeNone, "aarch64_sve_zip1">; > > > > defm SVZIP2 : SInstBF16<"svzip2[_{d}]", "ddd", "csilUcUsUiUlhfd", > > > > MergeNone, "aarch64_sve_zip2">;``` > > > > > > > > ? 
> > > I've played around with this and it works great for instructions guarded > > > on a single feature flag but falls apart for the .Q forms that also > > > require `__ARM_FEATURE_SVE_MATMUL_FP64`. I suspect there's a nice way of > > > handling it in tablegen by passing the features as a list of strings and > > > joining them but I spent long enough trying to get that to work so I'm > > > going to keep it simple for now. > > > it might be a bit tedious having separate multiclasses, what do you think > > > about: > > > > Sorry I think I misunderstood you when we last discussed this. I didn't > > mean to write a multiclass that would work for ALL intrinsics that uses > > regular types and bfloats I just meant to merge together those who were > > using the same archguard and that you are adding in this patch. > > > > I think you could keep both macros in a single ArchGuard string: > > > > ``` > > multiclass SInstPerm { > > def : SInst; > > let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in { > > def : SInst; > > } > > } > > > > defm SVREV: SInstPerm<"svrev[_{d}]","dd",MergeNone, > > "aarch64_sve_rev">; > > ... > > > > multiclass SInstPermMatmul { > > def : SInst; > > let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16) && > > defined(__ARM_FEATURE_SVE_MATMUL_FP64)" in { > > def : SInst; > > } > > } > > > > def SVTRN1Q : SInstPermMatmul ... > > ... > > ``` > Sure, I understood you meant separate multiclasses for each intrinsic / group > similar to what Sander implemented for structured loads / stores but I > thought it would be quite abit of extra code to implement that, hence why I > proposed a single multiclass that could handle this. I've experimented with > the `SInstBF16` multiclass I mentioned above and have it working with an > extra arg for arch features. I'll create a follow up patch and if people are > happy with it we'll move forward with that, otherwise I'm happy to implement > your suggestion. 
> I'll create a follow up patch

https://reviews.llvm.org/D82450
[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics
sdesmalen added inline comments.

Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll:809
+define @rev_bf16( %a) {
+; CHECK-LABEL: rev_bf16

Does this test not need the `+bf16` attribute to work? (which implies the patterns are missing the right predicate)
[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics
This revision was automatically updated to reflect the committed changes.
Closed by commit rG26502ad60922: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics (authored by c-rhodes).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82182/new/

https://reviews.llvm.org/D82182

Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_rev-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_sel-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_splice-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1-fp64-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2-fp64-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1-fp64-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2-fp64-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1-fp64-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2-fp64-bfloat.c
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-sel.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-sel.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-sel.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-sel.ll
@@ -57,6 +57,16 @@
   ret %out
 }
 
+define @sel_bf16( %pg, %a, %b) {
+; CHECK-LABEL: sel_bf16:
+; CHECK: sel z0.h, p0, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call @llvm.aarch64.sve.sel.nxv8bf16( %pg,
+%a,
+%b)
+  ret %out
+}
+
 define @sel_f16( %pg, %a, %b) {
 ; CHECK-LABEL: sel_f16:
 ; CHECK: sel z0.h, p0, z0.h, z1.h
@@ -92,6 +102,7 @@
 declare @llvm.aarch64.sve.sel.nxv8i16(, , )
 declare @llvm.aarch64.sve.sel.nxv4i32(, , )
 declare @llvm.aarch64.sve.sel.nxv2i64(, , )
+declare @llvm.aarch64.sve.sel.nxv8bf16(, , )
 declare @llvm.aarch64.sve.sel.nxv8f16(, , )
 declare @llvm.aarch64.sve.sel.nxv4f32(, , )
 declare @llvm.aarch64.sve.sel.nxv2f64(, , )
Index: llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
@@ -806,6 +806,14 @@
   ret %res
 }
 
+define @rev_bf16( %a) {
+; CHECK-LABEL: rev_bf16
+; CHECK: rev z0.h, z0.h
+; CHECK-NEXT: ret
+  %res = call @llvm.aarch64.sve.rev.nxv8bf16( %a)
+  ret %res
+}
+
 define @rev_f16( %a) {
 ; CHECK-LABEL: rev_f16
 ; CHECK: rev z0.h, z0.h
@@ -874,6 +882,16 @@
   ret %out
 }
 
+define @splice_bf16( %pg, %a, %b) {
+; CHECK-LABEL: splice_bf16:
+; CHECK: splice z0.h, p0, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call @llvm.aarch64.sve.splice.nxv8bf16( %pg,
+ %a,
+ %b)
+  ret %out
+}
+
 define @splice_f16( %pg, %a, %b) {
 ; CHECK-LABEL: splice_f16:
 ; CHECK: splice z0.h, p0, z0.h, z1.h
@@ -1168,6 +1186,15 @@
   ret %out
 }
 
+define @trn1_bf16( %a, %b) {
+; CHECK-LABEL: trn1_bf16:
+; CHECK: trn1 z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call @llvm.aarch64.sve.trn1.nxv8bf16( %a,
+ %b)
+  ret %out
+}
+
 define @trn1_f16( %a, %b) {
 ; CHECK-LABEL: trn1_f16:
 ; CHECK: trn1 z0.h, z0.h, z1.h
@@ -1280,6 +1307,15 @@
   ret %out
 }
 
+define @trn2_bf16( %a, %b) {
+; CHECK-LABEL: trn2_bf16:
+; CHECK: trn2 z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call @llvm.aarch64.sve.trn2.nxv8bf16( %a,
+ %b)
+  ret %out
+}
+
 define @trn2_f16( %a, %b) {
 ; CHECK-LABEL: trn2_f16:
 ; CHECK: trn2 z0.h, z0.h, z1.h
@@ -1392,6 +1428,15 @@
   ret %out
 }
 
+define @uzp1_bf16( %a, %b) {
+; CHECK-LABEL: uzp1_bf16:
+; CHECK: uzp1 z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call @llvm.aarch64.sve.uzp1.nxv8bf16( %a,
+ %b)
+  ret %out
+}
+
 define @uzp1_f16( %a, %b) {
 ; CHECK-LABEL: uzp1_f16:
 ; CHECK: uzp1 z0.h, z0.h, z1.h
@@ -1504,6 +1549,15 @@
   ret %out
 }
 
+define @uzp2_bf16( %a, %b) {
+; CHECK-LABEL: uzp2_bf16:
+; CHECK: uzp2 z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call @llvm.aarch64.sve.uzp2.nxv8bf16( %a,
[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics
c-rhodes added inline comments. Comment at: clang/include/clang/Basic/arm_sve.td:1115 +let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in { +def SVREV_BF16: SInst<"svrev[_{d}]","dd", "b", MergeNone, "aarch64_sve_rev">; fpetrogalli wrote: > c-rhodes wrote: > > c-rhodes wrote: > > > fpetrogalli wrote: > > > > nit: could create a multiclass here like @sdesmalen have done in > > > > https://reviews.llvm.org/D82187, seems quite a nice way to keep the > > > > definition of the intrinsics together (look for `multiclass > > > > StructLoad`, for example) > > > it might be a bit tedious having separate multiclasses, what do you think > > > about: > > > ```multiclass SInstBF16 > > string i = "", > > > list ft = [], list ch = []> { > > > def : SInst; > > > let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in { > > > def : SInst; > > > } > > > } > > > > > > defm SVREV: SInstBF16<"svrev[_{d}]","dd", "csilUcUsUiUlhfd", > > > MergeNone, "aarch64_sve_rev">; > > > defm SVSEL: SInstBF16<"svsel[_{d}]","dPdd", "csilUcUsUiUlhfd", > > > MergeNone, "aarch64_sve_sel">; > > > defm SVSPLICE : SInstBF16<"svsplice[_{d}]", "dPdd", "csilUcUsUiUlhfd", > > > MergeNone, "aarch64_sve_splice">; > > > defm SVTRN1 : SInstBF16<"svtrn1[_{d}]", "ddd", "csilUcUsUiUlhfd", > > > MergeNone, "aarch64_sve_trn1">; > > > defm SVTRN2 : SInstBF16<"svtrn2[_{d}]", "ddd", "csilUcUsUiUlhfd", > > > MergeNone, "aarch64_sve_trn2">; > > > defm SVUZP1 : SInstBF16<"svuzp1[_{d}]", "ddd", "csilUcUsUiUlhfd", > > > MergeNone, "aarch64_sve_uzp1">; > > > defm SVUZP2 : SInstBF16<"svuzp2[_{d}]", "ddd", "csilUcUsUiUlhfd", > > > MergeNone, "aarch64_sve_uzp2">; > > > defm SVZIP1 : SInstBF16<"svzip1[_{d}]", "ddd", "csilUcUsUiUlhfd", > > > MergeNone, "aarch64_sve_zip1">; > > > defm SVZIP2 : SInstBF16<"svzip2[_{d}]", "ddd", "csilUcUsUiUlhfd", > > > MergeNone, "aarch64_sve_zip2">;``` > > > > > > ? 
> > I've played around with this and it works great for instructions guarded on
> > a single feature flag but falls apart for the .Q forms that also require
> > `__ARM_FEATURE_SVE_MATMUL_FP64`. I suspect there's a nice way of handling
> > it in tablegen by passing the features as a list of strings and joining
> > them but I spent long enough trying to get that to work so I'm going to
> > keep it simple for now.
>
> > it might be a bit tedious having separate multiclasses, what do you think
> > about:
>
> Sorry I think I misunderstood you when we last discussed this. I didn't mean
> to write a multiclass that would work for ALL intrinsics that uses regular
> types and bfloats I just meant to merge together those who were using the
> same archguard and that you are adding in this patch.
>
> I think you could keep both macros in a single ArchGuard string:
>
> ```
> multiclass SInstPerm {
>   def : SInst;
>   let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
>     def : SInst;
>   }
> }
>
> defm SVREV: SInstPerm<"svrev[_{d}]","dd",MergeNone, "aarch64_sve_rev">;
> ...
>
> multiclass SInstPermMatmul {
>   def : SInst;
>   let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16) && defined(__ARM_FEATURE_SVE_MATMUL_FP64)" in {
>     def : SInst;
>   }
> }
>
> def SVTRN1Q : SInstPermMatmul ...
> ...
> ```

Sure, I understood you meant separate multiclasses for each intrinsic / group, similar to what Sander implemented for structured loads / stores, but I thought it would be quite a bit of extra code to implement that, which is why I proposed a single multiclass that could handle this. I've experimented with the `SInstBF16` multiclass I mentioned above and have it working with an extra arg for arch features. I'll create a follow-up patch and if people are happy with it we'll move forward with that; otherwise I'm happy to implement your suggestion.
[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics
fpetrogalli added inline comments. Comment at: clang/include/clang/Basic/arm_sve.td:1115 +let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in { +def SVREV_BF16: SInst<"svrev[_{d}]","dd", "b", MergeNone, "aarch64_sve_rev">; c-rhodes wrote: > c-rhodes wrote: > > fpetrogalli wrote: > > > nit: could create a multiclass here like @sdesmalen have done in > > > https://reviews.llvm.org/D82187, seems quite a nice way to keep the > > > definition of the intrinsics together (look for `multiclass StructLoad`, > > > for example) > > it might be a bit tedious having separate multiclasses, what do you think > > about: > > ```multiclass SInstBF16 > i = "", > > list ft = [], list ch = []> { > > def : SInst; > > let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in { > > def : SInst; > > } > > } > > > > defm SVREV: SInstBF16<"svrev[_{d}]","dd", "csilUcUsUiUlhfd", > > MergeNone, "aarch64_sve_rev">; > > defm SVSEL: SInstBF16<"svsel[_{d}]","dPdd", "csilUcUsUiUlhfd", > > MergeNone, "aarch64_sve_sel">; > > defm SVSPLICE : SInstBF16<"svsplice[_{d}]", "dPdd", "csilUcUsUiUlhfd", > > MergeNone, "aarch64_sve_splice">; > > defm SVTRN1 : SInstBF16<"svtrn1[_{d}]", "ddd", "csilUcUsUiUlhfd", > > MergeNone, "aarch64_sve_trn1">; > > defm SVTRN2 : SInstBF16<"svtrn2[_{d}]", "ddd", "csilUcUsUiUlhfd", > > MergeNone, "aarch64_sve_trn2">; > > defm SVUZP1 : SInstBF16<"svuzp1[_{d}]", "ddd", "csilUcUsUiUlhfd", > > MergeNone, "aarch64_sve_uzp1">; > > defm SVUZP2 : SInstBF16<"svuzp2[_{d}]", "ddd", "csilUcUsUiUlhfd", > > MergeNone, "aarch64_sve_uzp2">; > > defm SVZIP1 : SInstBF16<"svzip1[_{d}]", "ddd", "csilUcUsUiUlhfd", > > MergeNone, "aarch64_sve_zip1">; > > defm SVZIP2 : SInstBF16<"svzip2[_{d}]", "ddd", "csilUcUsUiUlhfd", > > MergeNone, "aarch64_sve_zip2">;``` > > > > ? > I've played around with this and it works great for instructions guarded on a > single feature flag but falls apart for the .Q forms that also require > `__ARM_FEATURE_SVE_MATMUL_FP64`. 
> I suspect there's a nice way of handling it in tablegen by passing the
> features as a list of strings and joining them but I spent long enough
> trying to get that to work so I'm going to keep it simple for now.

> it might be a bit tedious having separate multiclasses, what do you think
> about:

Sorry, I think I misunderstood you when we last discussed this. I didn't mean to write a multiclass that would work for ALL intrinsics that use regular types and bfloats; I just meant to merge together those that use the same ArchGuard and that you are adding in this patch.

I think you could keep both macros in a single ArchGuard string:

```
multiclass SInstPerm {
  def : SInst;
  let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
    def : SInst;
  }
}

defm SVREV: SInstPerm<"svrev[_{d}]","dd",MergeNone, "aarch64_sve_rev">;
...

multiclass SInstPermMatmul {
  def : SInst;
  let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16) && defined(__ARM_FEATURE_SVE_MATMUL_FP64)" in {
    def : SInst;
  }
}

defm SVTRN1Q : SInstPermMatmul ...
...
```
[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics
c-rhodes updated this revision to Diff 272745.
c-rhodes added a comment.

Changes:

- Moved bfloat tests to separate files.
- Added checks to test intrinsics are guarded by feature flag, this is by omitting the feature macro `__ARM_FEATURE_SVE_BF16` for now but will eventually be updated to omit `+bf16` once the feature flag implies the macro.
[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics
c-rhodes added inline comments. Comment at: clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_sel.c:2 // REQUIRES: aarch64-registered-target -// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s -// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s -// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -o - %s >/dev/null 2>%t +// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -D__ARM_FEATURE_SVE_BF16 -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -D__ARM_FEATURE_SVE_BF16 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s sdesmalen wrote: > Can you move the clang bfloat tests to separate files and add a RUN line > similar to what we've done for the sve2 tests (to check that we get a > diagnostic if +bf16 is not specified) ? 
I can move the bfloat tests to separate files but I'm not sure about the RUN line. If `+bf16` is omitted we get the following:

```
/home/culrho01/llvm-project/build/bin/clang -cc1 -internal-isystem /home/culrho01/llvm-project/build/lib/clang/11.0.0/include -nostdsysteminc -D__ARM_FEATURE_SVE -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -D__ARM_FEATURE_SVE_BF16 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -fsyntax-only -verify /home/culrho01/llvm-project/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_rev-bfloat.c
error: no expected directives found: consider use of 'expected-no-diagnostics'
error: 'error' diagnostics seen but not expected:
  File /home/culrho01/llvm-project/build/lib/clang/11.0.0/include/arm_bf16.h Line 14: __bf16 is not supported on this target
  File /home/culrho01/llvm-project/build/lib/clang/11.0.0/include/arm_sve.h Line 52: __bf16 is not supported on this target
3 errors generated.
```

Whereas I think the behaviour we want to test, as we do for sve2, is that the intrinsics are guarded by the right feature flag. At the moment that would mean omitting `-D__ARM_FEATURE_SVE_BF16` from the RUN line, at least until `+bf16` implies `-D__ARM_FEATURE_SVE_BF16`, which is when the ACLE is fully complete. Should I do that? I guess we'd then want to update this RUN line to omit `+bf16` once it implies the feature macro.
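As a rough illustration of the guarding discussed here, the C sketch below gates a call to `svrev_bf16` on the same feature macros, so the file still compiles when they are not defined. The names `reverse_bf16`, `have_sve_bf16_intrinsics` and `HAVE_SVE_BF16_INTRINSICS` are made up for this example; it is not taken from the patch or its tests.

```c
#include <assert.h>

/* Assumption: the bfloat16 SVE intrinsics are only usable when both macros
   below are defined, mirroring the ArchGuard used in arm_sve.td. */
#if defined(__ARM_FEATURE_SVE) && defined(__ARM_FEATURE_SVE_BF16)
#include <arm_sve.h>

/* Hypothetical wrapper: reverses the lanes of a bfloat16 vector. */
svbfloat16_t reverse_bf16(svbfloat16_t v) { return svrev_bf16(v); }

#define HAVE_SVE_BF16_INTRINSICS 1
#else
/* Without the macros the wrapper is simply not compiled. */
#define HAVE_SVE_BF16_INTRINSICS 0
#endif

/* Reports at run time which branch of the guard was compiled. */
int have_sve_bf16_intrinsics(void) { return HAVE_SVE_BF16_INTRINSICS; }
```

On a target where the macros are not defined, `have_sve_bf16_intrinsics()` returns 0 and `reverse_bf16` does not exist, which is the situation the proposed RUN line is meant to check.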
[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics
sdesmalen added inline comments.

Comment at: clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_sel.c:2
 // REQUIRES: aarch64-registered-target
-// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
-// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
-// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -o - %s >/dev/null 2>%t
+// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -D__ARM_FEATURE_SVE_BF16 -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -D__ARM_FEATURE_SVE_BF16 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s

Can you move the clang bfloat tests to separate files and add a RUN line similar to what we've done for the sve2 tests (to check that we get a diagnostic if `+bf16` is not specified)?
[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics
c-rhodes added inline comments. Comment at: clang/include/clang/Basic/arm_sve.td:1115 +let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in { +def SVREV_BF16: SInst<"svrev[_{d}]","dd", "b", MergeNone, "aarch64_sve_rev">; c-rhodes wrote: > fpetrogalli wrote: > > nit: could create a multiclass here like @sdesmalen have done in > > https://reviews.llvm.org/D82187, seems quite a nice way to keep the > > definition of the intrinsics together (look for `multiclass StructLoad`, > > for example) > it might be a bit tedious having separate multiclasses, what do you think > about: > ```multiclass SInstBF16 = "", > list ft = [], list ch = []> { > def : SInst; > let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in { > def : SInst; > } > } > > defm SVREV: SInstBF16<"svrev[_{d}]","dd", "csilUcUsUiUlhfd", > MergeNone, "aarch64_sve_rev">; > defm SVSEL: SInstBF16<"svsel[_{d}]","dPdd", "csilUcUsUiUlhfd", > MergeNone, "aarch64_sve_sel">; > defm SVSPLICE : SInstBF16<"svsplice[_{d}]", "dPdd", "csilUcUsUiUlhfd", > MergeNone, "aarch64_sve_splice">; > defm SVTRN1 : SInstBF16<"svtrn1[_{d}]", "ddd", "csilUcUsUiUlhfd", > MergeNone, "aarch64_sve_trn1">; > defm SVTRN2 : SInstBF16<"svtrn2[_{d}]", "ddd", "csilUcUsUiUlhfd", > MergeNone, "aarch64_sve_trn2">; > defm SVUZP1 : SInstBF16<"svuzp1[_{d}]", "ddd", "csilUcUsUiUlhfd", > MergeNone, "aarch64_sve_uzp1">; > defm SVUZP2 : SInstBF16<"svuzp2[_{d}]", "ddd", "csilUcUsUiUlhfd", > MergeNone, "aarch64_sve_uzp2">; > defm SVZIP1 : SInstBF16<"svzip1[_{d}]", "ddd", "csilUcUsUiUlhfd", > MergeNone, "aarch64_sve_zip1">; > defm SVZIP2 : SInstBF16<"svzip2[_{d}]", "ddd", "csilUcUsUiUlhfd", > MergeNone, "aarch64_sve_zip2">;``` > > ? I've played around with this and it works great for instructions guarded on a single feature flag but falls apart for the .Q forms that also require `__ARM_FEATURE_SVE_MATMUL_FP64`. 
I suspect there's a nice way of handling it in tablegen by passing the features as a list of strings and joining them but I spent long enough trying to get that to work so I'm going to keep it simple for now.
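The idea of passing the features as a list of strings and joining them can be sketched outside TableGen. The C helper below is purely illustrative: `join_arch_guard` is a hypothetical name, and nothing here reflects how the SVE emitter or the follow-up patch actually builds the guard string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "defined(A) && defined(B) ..." into out.
   Returns the number of characters written (excluding the terminating NUL),
   or -1 if the buffer is too small. */
int join_arch_guard(const char *const *features, size_t count,
                    char *out, size_t out_size) {
    size_t used = 0;
    if (out_size == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < count; ++i) {
        /* Every feature after the first is prefixed with " && ". */
        int n = snprintf(out + used, out_size - used, "%sdefined(%s)",
                         i == 0 ? "" : " && ", features[i]);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1; /* output was truncated or an encoding error occurred */
        used += (size_t)n;
    }
    return (int)used;
}
```

With the two macros from this review, the helper produces the same string as the combined ArchGuard suggested for the .Q forms; a single-element list yields the plain `defined(__ARM_FEATURE_SVE_BF16)` guard.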
[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics
c-rhodes added inline comments.

Comment at: clang/include/clang/Basic/arm_sve.td:1115
+let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
+def SVREV_BF16: SInst<"svrev[_{d}]","dd", "b", MergeNone, "aarch64_sve_rev">;

fpetrogalli wrote:
> nit: could create a multiclass here like @sdesmalen have done in
> https://reviews.llvm.org/D82187, seems quite a nice way to keep the
> definition of the intrinsics together (look for `multiclass StructLoad`, for
> example)

It might be a bit tedious having separate multiclasses, what do you think about:

```
multiclass SInstBF16 ft = [], list ch = []> {
  def : SInst;
  let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
    def : SInst;
  }
}

defm SVREV: SInstBF16<"svrev[_{d}]","dd", "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_rev">;
defm SVSEL: SInstBF16<"svsel[_{d}]","dPdd", "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_sel">;
defm SVSPLICE : SInstBF16<"svsplice[_{d}]", "dPdd", "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_splice">;
defm SVTRN1 : SInstBF16<"svtrn1[_{d}]", "ddd", "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_trn1">;
defm SVTRN2 : SInstBF16<"svtrn2[_{d}]", "ddd", "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_trn2">;
defm SVUZP1 : SInstBF16<"svuzp1[_{d}]", "ddd", "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_uzp1">;
defm SVUZP2 : SInstBF16<"svuzp2[_{d}]", "ddd", "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_uzp2">;
defm SVZIP1 : SInstBF16<"svzip1[_{d}]", "ddd", "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_zip1">;
defm SVZIP2 : SInstBF16<"svzip2[_{d}]", "ddd", "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_zip2">;
```

?
[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics
david-arm accepted this revision.
david-arm added a comment.
This revision is now accepted and ready to land.

LGTM!
[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics
fpetrogalli added inline comments.


Comment at: clang/include/clang/Basic/arm_sve.td:1115
+let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
+def SVREV_BF16: SInst<"svrev[_{d}]","dd", "b", MergeNone, "aarch64_sve_rev">;

nit: could create a multiclass here like @sdesmalen has done in https://reviews.llvm.org/D82187; it seems quite a nice way to keep the definitions of the intrinsics together (look for `multiclass StructLoad`, for example).


Comment at: clang/include/clang/Basic/arm_sve.td:1298
+let ArchGuard = "defined(__ARM_FEATURE_SVE_MATMUL_FP64) && defined(__ARM_FEATURE_SVE_BF16)" in {
+def SVTRN1Q_BF16 : SInst<"svtrn1q[_{d}]", "ddd", "b", MergeNone, "aarch64_sve_trn1q">;

Same here: could use a multiclass to merge the "regular" intrinsic definitions with the BF16 ones.
[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics
c-rhodes created this revision.
c-rhodes added reviewers: sdesmalen, efriedma, stuij, david-arm, fpetrogalli, kmclaughlin.
Herald added subscribers: danielkiss, psnobl, rkruppe, hiraditya, kristof.beyls, tschuett.
Herald added projects: clang, LLVM.

Added for the following intrinsics:

- zip1, zip2, zip1q, zip2q
- trn1, trn2, trn1q, trn2q
- uzp1, uzp2, uzp1q, uzp2q
- splice
- rev
- sel

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D82182

Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_rev.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_sel.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_splice.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1-fp64.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2-fp64.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1-fp64.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2-fp64.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1-fp64.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2-fp64.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2.c
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-sel.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-sel.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-sel.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-sel.ll
@@ -57,6 +57,16 @@
   ret %out
 }
 
+define @sel_bf16( %pg, %a, %b) {
+; CHECK-LABEL: sel_bf16:
+; CHECK: sel z0.h, p0, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call @llvm.aarch64.sve.sel.nxv8bf16( %pg,
+                                              %a,
+                                              %b)
+  ret %out
+}
+
 define @sel_f16( %pg, %a, %b) {
 ; CHECK-LABEL: sel_f16:
 ; CHECK: sel z0.h, p0, z0.h, z1.h
@@ -92,6 +102,7 @@
 declare @llvm.aarch64.sve.sel.nxv8i16(, , )
 declare @llvm.aarch64.sve.sel.nxv4i32(, , )
 declare @llvm.aarch64.sve.sel.nxv2i64(, , )
+declare @llvm.aarch64.sve.sel.nxv8bf16(, , )
 declare @llvm.aarch64.sve.sel.nxv8f16(, , )
 declare @llvm.aarch64.sve.sel.nxv4f32(, , )
 declare @llvm.aarch64.sve.sel.nxv2f64(, , )
Index: llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
@@ -803,6 +803,14 @@
   ret %res
 }
 
+define @rev_bf16( %a) {
+; CHECK-LABEL: rev_bf16
+; CHECK: rev z0.h, z0.h
+; CHECK-NEXT: ret
+  %res = call @llvm.aarch64.sve.rev.nxv8bf16( %a)
+  ret %res
+}
+
 define @rev_f16( %a) {
 ; CHECK-LABEL: rev_f16
 ; CHECK: rev z0.h, z0.h
@@ -871,6 +879,16 @@
   ret %out
 }
 
+define @splice_bf16( %pg, %a, %b) {
+; CHECK-LABEL: splice_bf16:
+; CHECK: splice z0.h, p0, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call @llvm.aarch64.sve.splice.nxv8bf16( %pg,
+                                                 %a,
+                                                 %b)
+  ret %out
+}
+
 define @splice_f16( %pg, %a, %b) {
 ; CHECK-LABEL: splice_f16:
 ; CHECK: splice z0.h, p0, z0.h, z1.h
@@ -1165,6 +1183,15 @@
   ret %out
 }
 
+define @trn1_bf16( %a, %b) {
+; CHECK-LABEL: trn1_bf16:
+; CHECK: trn1 z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call @llvm.aarch64.sve.trn1.nxv8bf16( %a,
+                                               %b)
+  ret %out
+}
+
 define @trn1_f16( %a, %b) {
 ; CHECK-LABEL: trn1_f16:
 ; CHECK: trn1 z0.h, z0.h, z1.h
@@ -1277,6 +1304,15 @@
   ret %out
 }
 
+define @trn2_bf16( %a, %b) {
+; CHECK-LABEL: trn2_bf16:
+; CHECK: trn2 z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call @llvm.aarch64.sve.trn2.nxv8bf16( %a,
+                                               %b)
+  ret %out
+}
+
 define @trn2_f16( %a, %b) {
 ; CHECK-LABEL: trn2_f16:
 ; CHECK: trn2 z0.h, z0.h, z1.h
@@ -1389,6 +1425,15 @@
   ret %out
 }
 
+define @uzp1_bf16( %a, %b) {
+; CHECK-LABEL: uzp1_bf16:
+; CHECK: uzp1 z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call @llvm.aarch64.sve.uzp1.nxv8bf16( %a,
+                                               %b)
+  ret %out
+}
+
 define @uzp1_f16( %a, %b) {
 ; CHECK-LABEL: uzp1_f16:
 ; CHECK: uzp1 z0.h, z0.h, z1.h
@@ -1501,6 +1546,15 @@
   ret %out
 }
 
+define @uzp2_bf16( %a, %b) {
+; CHECK-LABEL: uzp2_bf16:
+; CHECK: uzp2 z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call