[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics

2020-06-24 Thread Cullen Rhodes via Phabricator via cfe-commits
c-rhodes added inline comments.



Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll:809
 
+define <vscale x 8 x bfloat> @rev_bf16(<vscale x 8 x bfloat> %a) {
+; CHECK-LABEL: rev_bf16

sdesmalen wrote:
> Does this test not need the `+bf16` attribute to work? (which implies the 
> patterns are missing the right predicate)
It should do, sorry I missed that. I've tried:
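```
diff --git a/llvm/lib/Target/AArch64/SVEInstrFormats.td b/llvm/lib/Target/AArch64/SVEInstrFormats.td
index 46cca2a..5ab2502 100644
--- a/llvm/lib/Target/AArch64/SVEInstrFormats.td
+++ b/llvm/lib/Target/AArch64/SVEInstrFormats.td
@@ -1124,10 +1124,13 @@ multiclass sve_int_perm_reverse_z<string asm, SDPatternOperator op> {
   def : SVE_1_Op_Pat<nxv4i32, op, nxv4i32, !cast<Instruction>(NAME # _S)>;
   def : SVE_1_Op_Pat<nxv2i64, op, nxv2i64, !cast<Instruction>(NAME # _D)>;
 
-  def : SVE_1_Op_Pat<nxv8bf16, op, nxv8bf16, !cast<Instruction>(NAME # _H)>;
   def : SVE_1_Op_Pat<nxv8f16, op, nxv8f16, !cast<Instruction>(NAME # _H)>;
   def : SVE_1_Op_Pat<nxv4f32, op, nxv4f32, !cast<Instruction>(NAME # _S)>;
   def : SVE_1_Op_Pat<nxv2f64, op, nxv2f64, !cast<Instruction>(NAME # _D)>;
+
+  let Predicates = [HasBF16] in {
+    def : SVE_1_Op_Pat<nxv8bf16, op, nxv8bf16, !cast<Instruction>(NAME # _H)>;
+  }
 }
```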

but the test still passes without `+bf16`. I noticed that in your patch D82187
you check `Subtarget->hasBF16()` for `MVT::nxv8bf16` at the select phase of
ISel; I guess it's different here with patterns. I also noticed that we add the
register class for `MVT::nxv8bf16` in AArch64ISelLowering without checking
`Subtarget->hasBF16()`, which I suspect is a bug. With that fixed, this test
does require `+bf16`, but I still wonder why the predicate isn't being
recognised.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82182/new/

https://reviews.llvm.org/D82182



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics

2020-06-24 Thread Cullen Rhodes via Phabricator via cfe-commits
c-rhodes marked an inline comment as done.
c-rhodes added inline comments.



Comment at: clang/include/clang/Basic/arm_sve.td:1115
 
+let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
+def SVREV_BF16: SInst<"svrev[_{d}]","dd",   "b", MergeNone, 
"aarch64_sve_rev">;

c-rhodes wrote:
> fpetrogalli wrote:
> > c-rhodes wrote:
> > > c-rhodes wrote:
> > > > fpetrogalli wrote:
> > > > > nit: could create a multiclass here like @sdesmalen have done in 
> > > > > https://reviews.llvm.org/D82187, seems quite a nice way to keep the 
> > > > > definition of the intrinsics together (look for `multiclass 
> > > > > StructLoad`, for example)
> > > > it might be a bit tedious having separate multiclasses, what do you 
> > > > think about:
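> > > > ```multiclass SInstBF16<string n, string p, string t, MergeType mt,
> > > >  string i = "", list<FlagType> ft = [], list<ImmCheck> ch = []> {
> > > >   def : SInst<n, p, t, mt, i, ft, ch>;
> > > >   let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
> > > >     def : SInst<n, p, "b", mt, i, ft, ch>;
> > > >   }
> > > > }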
> > > > 
> > > > defm SVREV: SInstBF16<"svrev[_{d}]","dd",   "csilUcUsUiUlhfd", 
> > > > MergeNone, "aarch64_sve_rev">;
> > > > defm SVSEL: SInstBF16<"svsel[_{d}]","dPdd", "csilUcUsUiUlhfd", 
> > > > MergeNone, "aarch64_sve_sel">;
> > > > defm SVSPLICE : SInstBF16<"svsplice[_{d}]", "dPdd", "csilUcUsUiUlhfd", 
> > > > MergeNone, "aarch64_sve_splice">;
> > > > defm SVTRN1   : SInstBF16<"svtrn1[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> > > > MergeNone, "aarch64_sve_trn1">;
> > > > defm SVTRN2   : SInstBF16<"svtrn2[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> > > > MergeNone, "aarch64_sve_trn2">;
> > > > defm SVUZP1   : SInstBF16<"svuzp1[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> > > > MergeNone, "aarch64_sve_uzp1">;
> > > > defm SVUZP2   : SInstBF16<"svuzp2[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> > > > MergeNone, "aarch64_sve_uzp2">;
> > > > defm SVZIP1   : SInstBF16<"svzip1[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> > > > MergeNone, "aarch64_sve_zip1">;
> > > > defm SVZIP2   : SInstBF16<"svzip2[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> > > > MergeNone, "aarch64_sve_zip2">;```
> > > > 
> > > > ?
> > > I've played around with this and it works great for instructions guarded 
> > > on a single feature flag but falls apart for the .Q forms that also 
> > > require `__ARM_FEATURE_SVE_MATMUL_FP64`. I suspect there's a nice way of 
> > > handling it in tablegen by passing the features as a list of strings and 
> > > joining them but I spent long enough trying to get that to work so I'm 
> > > going to keep it simple for now.
> > > it might be a bit tedious having separate multiclasses, what do you think 
> > > about:
> > 
> > Sorry I think I misunderstood you when we last discussed this. I didn't 
> > mean to write a multiclass that would work for ALL intrinsics that uses 
> > regular types and bfloats I just meant to merge together those who were 
> > using the same archguard and that you are adding in this patch.
> > 
> > I think you could keep both macros in a single ArchGuard string:
> > 
> > ```
> > multiclass SInstPerm<string n, string p, MergeType mt, string i> {
> >   def : SInst<n, p, "csilUcUsUiUlhfd", mt, i>;
> >   let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
> >     def : SInst<n, p, "b", mt, i>;
> >   }
> > }
> > 
> > defm SVREV : SInstPerm<"svrev[_{d}]", "dd", MergeNone, "aarch64_sve_rev">;
> > ...
> > 
> > multiclass SInstPermMatmul<string n, string p, MergeType mt, string i> {
> >   def : SInst<n, p, "csilUcUsUiUlhfd", mt, i>;
> >   let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16) && defined(__ARM_FEATURE_SVE_MATMUL_FP64)" in {
> >     def : SInst<n, p, "b", mt, i>;
> >   }
> > }
> > 
> > defm SVTRN1Q : SInstPermMatmul ...
> > ...
> > ```
> Sure, I understood you meant separate multiclasses for each intrinsic / group 
> similar to what Sander implemented for structured loads / stores but I 
> thought it would be quite abit of extra code to implement that, hence why I 
> proposed a single multiclass that could handle this. I've experimented with 
> the `SInstBF16` multiclass I mentioned above and have it working with an 
> extra arg for arch features. I'll create a follow up patch and if people are 
> happy with it we'll move forward with that, otherwise I'm happy to implement 
> your suggestion.
> I'll create a follow up patch

https://reviews.llvm.org/D82450




[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics

2020-06-24 Thread Sander de Smalen via Phabricator via cfe-commits
sdesmalen added inline comments.



Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll:809
 
+define <vscale x 8 x bfloat> @rev_bf16(<vscale x 8 x bfloat> %a) {
+; CHECK-LABEL: rev_bf16

Does this test not need the `+bf16` attribute to work? (which implies the 
patterns are missing the right predicate)




[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics

2020-06-24 Thread Cullen Rhodes via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG26502ad60922: [AArch64][SVE] Add bfloat16 support to perm 
and select intrinsics (authored by c-rhodes).


Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_rev-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_sel-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_splice-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1-fp64-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2-fp64-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1-fp64-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2-fp64-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1-fp64-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2-fp64-bfloat.c
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-sel.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-sel.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-sel.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-sel.ll
@@ -57,6 +57,16 @@
   ret  %out
 }
 
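+define <vscale x 8 x bfloat> @sel_bf16(<vscale x 8 x i1> %pg, <vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: sel_bf16:
+; CHECK: sel z0.h, p0, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call <vscale x 8 x bfloat> @llvm.aarch64.sve.sel.nxv8bf16(<vscale x 8 x i1> %pg,
+                                                                   <vscale x 8 x bfloat> %a,
+                                                                   <vscale x 8 x bfloat> %b)
+  ret <vscale x 8 x bfloat> %out
+}
+
 define <vscale x 8 x half> @sel_f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {
 ; CHECK-LABEL: sel_f16:
 ; CHECK: sel z0.h, p0, z0.h, z1.h
@@ -92,6 +102,7 @@
 declare <vscale x 8 x i16> @llvm.aarch64.sve.sel.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
 declare <vscale x 4 x i32> @llvm.aarch64.sve.sel.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
 declare <vscale x 2 x i64> @llvm.aarch64.sve.sel.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
+declare <vscale x 8 x bfloat> @llvm.aarch64.sve.sel.nxv8bf16(<vscale x 8 x i1>, <vscale x 8 x bfloat>, <vscale x 8 x bfloat>)
 declare <vscale x 8 x half> @llvm.aarch64.sve.sel.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
 declare <vscale x 4 x float> @llvm.aarch64.sve.sel.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
 declare <vscale x 2 x double> @llvm.aarch64.sve.sel.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)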
Index: llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
@@ -806,6 +806,14 @@
   ret  %res
 }
 
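+define <vscale x 8 x bfloat> @rev_bf16(<vscale x 8 x bfloat> %a) {
+; CHECK-LABEL: rev_bf16
+; CHECK: rev z0.h, z0.h
+; CHECK-NEXT: ret
+  %res = call <vscale x 8 x bfloat> @llvm.aarch64.sve.rev.nxv8bf16(<vscale x 8 x bfloat> %a)
+  ret <vscale x 8 x bfloat> %res
+}
+
 define <vscale x 8 x half> @rev_f16(<vscale x 8 x half> %a) {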
 ; CHECK-LABEL: rev_f16
 ; CHECK: rev z0.h, z0.h
@@ -874,6 +882,16 @@
   ret  %out
 }
 
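+define <vscale x 8 x bfloat> @splice_bf16(<vscale x 8 x i1> %pg, <vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: splice_bf16:
+; CHECK: splice z0.h, p0, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call <vscale x 8 x bfloat> @llvm.aarch64.sve.splice.nxv8bf16(<vscale x 8 x i1> %pg,
+                                                                      <vscale x 8 x bfloat> %a,
+                                                                      <vscale x 8 x bfloat> %b)
+  ret <vscale x 8 x bfloat> %out
+}
+
 define <vscale x 8 x half> @splice_f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {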
 ; CHECK-LABEL: splice_f16:
 ; CHECK: splice z0.h, p0, z0.h, z1.h
@@ -1168,6 +1186,15 @@
   ret  %out
 }
 
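+define <vscale x 8 x bfloat> @trn1_bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: trn1_bf16:
+; CHECK: trn1 z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call <vscale x 8 x bfloat> @llvm.aarch64.sve.trn1.nxv8bf16(<vscale x 8 x bfloat> %a,
+                                                                    <vscale x 8 x bfloat> %b)
+  ret <vscale x 8 x bfloat> %out
+}
+
 define <vscale x 8 x half> @trn1_f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) {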
 ; CHECK-LABEL: trn1_f16:
 ; CHECK: trn1 z0.h, z0.h, z1.h
@@ -1280,6 +1307,15 @@
   ret  %out
 }
 
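+define <vscale x 8 x bfloat> @trn2_bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: trn2_bf16:
+; CHECK: trn2 z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call <vscale x 8 x bfloat> @llvm.aarch64.sve.trn2.nxv8bf16(<vscale x 8 x bfloat> %a,
+                                                                    <vscale x 8 x bfloat> %b)
+  ret <vscale x 8 x bfloat> %out
+}
+
 define <vscale x 8 x half> @trn2_f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) {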
 ; CHECK-LABEL: trn2_f16:
 ; CHECK: trn2 z0.h, z0.h, z1.h
@@ -1392,6 +1428,15 @@
   ret  %out
 }
 
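+define <vscale x 8 x bfloat> @uzp1_bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: uzp1_bf16:
+; CHECK: uzp1 z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call <vscale x 8 x bfloat> @llvm.aarch64.sve.uzp1.nxv8bf16(<vscale x 8 x bfloat> %a,
+                                                                    <vscale x 8 x bfloat> %b)
+  ret <vscale x 8 x bfloat> %out
+}
+
 define <vscale x 8 x half> @uzp1_f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) {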
 ; CHECK-LABEL: uzp1_f16:
 ; CHECK: uzp1 z0.h, z0.h, z1.h
@@ -1504,6 +1549,15 @@
   ret  %out
 }
 
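+define <vscale x 8 x bfloat> @uzp2_bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: uzp2_bf16:
+; CHECK: uzp2 z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call <vscale x 8 x bfloat> @llvm.aarch64.sve.uzp2.nxv8bf16(<vscale x 8 x bfloat> %a,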
+  
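
As a reading aid for the tests above, here is a rough Python reference model of
what the permute and select operations do to the elements. It is only a sketch:
it assumes a fixed 8-lane vector represented as a plain list (real SVE vectors
are scalable), and the function names simply mirror the instruction mnemonics;
none of them is an LLVM or ACLE API.

```python
# Hypothetical element-level model; vectors are plain Python lists.

def rev(a):
    """Reverse the order of all elements."""
    return a[::-1]

def sel(pg, a, b):
    """Take a[i] where the predicate is active, otherwise b[i]."""
    return [x if p else y for p, x, y in zip(pg, a, b)]

def trn1(a, b):
    """Interleave the even-indexed elements of a and b."""
    out = []
    for i in range(0, len(a), 2):
        out += [a[i], b[i]]
    return out

def trn2(a, b):
    """Interleave the odd-indexed elements of a and b."""
    out = []
    for i in range(1, len(a), 2):
        out += [a[i], b[i]]
    return out

def uzp1(a, b):
    """Even-indexed elements of the concatenation a ++ b."""
    return (a + b)[0::2]

def uzp2(a, b):
    """Odd-indexed elements of the concatenation a ++ b."""
    return (a + b)[1::2]

def zip1(a, b):
    """Interleave the low halves of a and b."""
    half = len(a) // 2
    return [v for pair in zip(a[:half], b[:half]) for v in pair]

def zip2(a, b):
    """Interleave the high halves of a and b."""
    half = len(a) // 2
    return [v for pair in zip(a[half:], b[half:]) for v in pair]

def splice(pg, a, b):
    """Copy a[first active .. last active], then fill from the start of b."""
    active = [i for i, p in enumerate(pg) if p]
    if not active:
        return list(b)
    segment = a[active[0]:active[-1] + 1]
    return segment + b[:len(a) - len(segment)]

a = list(range(8))        # [0, 1, ..., 7]
b = list(range(10, 18))   # [10, 11, ..., 17]
alternating = [True, False] * 4
window = [False, False, True, False, True, False, False, False]
```

The element type never enters the model, which is consistent with the bfloat
tests expecting the same `.h` instructions as the f16 tests.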

[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics

2020-06-24 Thread Cullen Rhodes via Phabricator via cfe-commits
c-rhodes added inline comments.



Comment at: clang/include/clang/Basic/arm_sve.td:1115
 
+let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
+def SVREV_BF16: SInst<"svrev[_{d}]","dd",   "b", MergeNone, 
"aarch64_sve_rev">;

fpetrogalli wrote:
> c-rhodes wrote:
> > c-rhodes wrote:
> > > fpetrogalli wrote:
> > > > nit: could create a multiclass here like @sdesmalen have done in 
> > > > https://reviews.llvm.org/D82187, seems quite a nice way to keep the 
> > > > definition of the intrinsics together (look for `multiclass 
> > > > StructLoad`, for example)
> > > it might be a bit tedious having separate multiclasses, what do you think 
> > > about:
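> > > ```multiclass SInstBF16<string n, string p, string t, MergeType mt,
> > >  string i = "", list<FlagType> ft = [], list<ImmCheck> ch = []> {
> > >   def : SInst<n, p, t, mt, i, ft, ch>;
> > >   let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
> > >     def : SInst<n, p, "b", mt, i, ft, ch>;
> > >   }
> > > }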
> > > 
> > > defm SVREV: SInstBF16<"svrev[_{d}]","dd",   "csilUcUsUiUlhfd", 
> > > MergeNone, "aarch64_sve_rev">;
> > > defm SVSEL: SInstBF16<"svsel[_{d}]","dPdd", "csilUcUsUiUlhfd", 
> > > MergeNone, "aarch64_sve_sel">;
> > > defm SVSPLICE : SInstBF16<"svsplice[_{d}]", "dPdd", "csilUcUsUiUlhfd", 
> > > MergeNone, "aarch64_sve_splice">;
> > > defm SVTRN1   : SInstBF16<"svtrn1[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> > > MergeNone, "aarch64_sve_trn1">;
> > > defm SVTRN2   : SInstBF16<"svtrn2[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> > > MergeNone, "aarch64_sve_trn2">;
> > > defm SVUZP1   : SInstBF16<"svuzp1[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> > > MergeNone, "aarch64_sve_uzp1">;
> > > defm SVUZP2   : SInstBF16<"svuzp2[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> > > MergeNone, "aarch64_sve_uzp2">;
> > > defm SVZIP1   : SInstBF16<"svzip1[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> > > MergeNone, "aarch64_sve_zip1">;
> > > defm SVZIP2   : SInstBF16<"svzip2[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> > > MergeNone, "aarch64_sve_zip2">;```
> > > 
> > > ?
> > I've played around with this and it works great for instructions guarded on 
> > a single feature flag but falls apart for the .Q forms that also require 
> > `__ARM_FEATURE_SVE_MATMUL_FP64`. I suspect there's a nice way of handling 
> > it in tablegen by passing the features as a list of strings and joining 
> > them but I spent long enough trying to get that to work so I'm going to 
> > keep it simple for now.
> > it might be a bit tedious having separate multiclasses, what do you think 
> > about:
> 
> Sorry I think I misunderstood you when we last discussed this. I didn't mean 
> to write a multiclass that would work for ALL intrinsics that uses regular 
> types and bfloats I just meant to merge together those who were using the 
> same archguard and that you are adding in this patch.
> 
> I think you could keep both macros in a single ArchGuard string:
> 
> ```
> multiclass SInstPerm<string n, string p, MergeType mt, string i> {
>   def : SInst<n, p, "csilUcUsUiUlhfd", mt, i>;
>   let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
>     def : SInst<n, p, "b", mt, i>;
>   }
> }
> 
> defm SVREV : SInstPerm<"svrev[_{d}]", "dd", MergeNone, "aarch64_sve_rev">;
> ...
> 
> multiclass SInstPermMatmul<string n, string p, MergeType mt, string i> {
>   def : SInst<n, p, "csilUcUsUiUlhfd", mt, i>;
>   let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16) && defined(__ARM_FEATURE_SVE_MATMUL_FP64)" in {
>     def : SInst<n, p, "b", mt, i>;
>   }
> }
> 
> defm SVTRN1Q : SInstPermMatmul ...
> ...
> ```
Sure, I understood that you meant separate multiclasses for each intrinsic /
group, similar to what Sander implemented for structured loads / stores, but I
thought it would take quite a bit of extra code to implement that, which is why
I proposed a single multiclass that could handle this. I've experimented with
the `SInstBF16` multiclass I mentioned above and have it working with an extra
arg for arch features. I'll create a follow-up patch and, if people are happy
with it, we'll move forward with that; otherwise I'm happy to implement your
suggestion.
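
To make the "extra arg for arch features" idea concrete, here is a small Python
model of how one such multiclass instantiation might expand. This is only a
sketch under assumptions: `Record`, `guard_for`, and `sinst_bf16` are
hypothetical names invented for illustration, not TableGen or clang APIs, and
the real change is the one in the follow-up patch.

```python
from dataclasses import dataclass

BF16_MACRO = "__ARM_FEATURE_SVE_BF16"

@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for one generated SInst definition."""
    name: str
    proto: str
    types: str
    intrinsic: str
    guard: str  # preprocessor condition; empty string means unguarded

def guard_for(features):
    """Build a guard such as 'defined(A) && defined(B)' from macro names."""
    return " && ".join(f"defined({f})" for f in features)

def sinst_bf16(name, proto, types, intrinsic, features=()):
    """Expand to a regular-types record plus a bfloat record.

    The bfloat record is always guarded by the BF16 macro in addition to any
    extra features the instruction itself needs.
    """
    regular = Record(name, proto, types, intrinsic, guard_for(features))
    bfloat = Record(name, proto, "b", intrinsic,
                    guard_for((BF16_MACRO, *features)))
    return [regular, bfloat]

rev = sinst_bf16("svrev[_{d}]", "dd", "csilUcUsUiUlhfd", "aarch64_sve_rev")
trn1q = sinst_bf16("svtrn1q[_{d}]", "ddd", "csilUcUsUiUlhfd",
                   "aarch64_sve_trn1q",
                   features=("__ARM_FEATURE_SVE_MATMUL_FP64",))
```

With this shape, the single-feature intrinsics and the .Q forms go through the
same helper, and only the `features` argument differs.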




[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics

2020-06-23 Thread Francesco Petrogalli via Phabricator via cfe-commits
fpetrogalli added inline comments.



Comment at: clang/include/clang/Basic/arm_sve.td:1115
 
+let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
+def SVREV_BF16: SInst<"svrev[_{d}]","dd",   "b", MergeNone, 
"aarch64_sve_rev">;

c-rhodes wrote:
> c-rhodes wrote:
> > fpetrogalli wrote:
> > > nit: could create a multiclass here like @sdesmalen have done in 
> > > https://reviews.llvm.org/D82187, seems quite a nice way to keep the 
> > > definition of the intrinsics together (look for `multiclass StructLoad`, 
> > > for example)
> > it might be a bit tedious having separate multiclasses, what do you think 
> > about:
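> > ```multiclass SInstBF16<string n, string p, string t, MergeType mt,
> >  string i = "", list<FlagType> ft = [], list<ImmCheck> ch = []> {
> >   def : SInst<n, p, t, mt, i, ft, ch>;
> >   let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
> >     def : SInst<n, p, "b", mt, i, ft, ch>;
> >   }
> > }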
> > 
> > defm SVREV: SInstBF16<"svrev[_{d}]","dd",   "csilUcUsUiUlhfd", 
> > MergeNone, "aarch64_sve_rev">;
> > defm SVSEL: SInstBF16<"svsel[_{d}]","dPdd", "csilUcUsUiUlhfd", 
> > MergeNone, "aarch64_sve_sel">;
> > defm SVSPLICE : SInstBF16<"svsplice[_{d}]", "dPdd", "csilUcUsUiUlhfd", 
> > MergeNone, "aarch64_sve_splice">;
> > defm SVTRN1   : SInstBF16<"svtrn1[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> > MergeNone, "aarch64_sve_trn1">;
> > defm SVTRN2   : SInstBF16<"svtrn2[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> > MergeNone, "aarch64_sve_trn2">;
> > defm SVUZP1   : SInstBF16<"svuzp1[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> > MergeNone, "aarch64_sve_uzp1">;
> > defm SVUZP2   : SInstBF16<"svuzp2[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> > MergeNone, "aarch64_sve_uzp2">;
> > defm SVZIP1   : SInstBF16<"svzip1[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> > MergeNone, "aarch64_sve_zip1">;
> > defm SVZIP2   : SInstBF16<"svzip2[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> > MergeNone, "aarch64_sve_zip2">;```
> > 
> > ?
> I've played around with this and it works great for instructions guarded on a 
> single feature flag but falls apart for the .Q forms that also require 
> `__ARM_FEATURE_SVE_MATMUL_FP64`. I suspect there's a nice way of handling it 
> in tablegen by passing the features as a list of strings and joining them but 
> I spent long enough trying to get that to work so I'm going to keep it simple 
> for now.
> it might be a bit tedious having separate multiclasses, what do you think 
> about:

Sorry, I think I misunderstood you when we last discussed this. I didn't mean
to write a multiclass that would work for ALL intrinsics that use regular types
and bfloats; I just meant to merge together the ones that use the same
ArchGuard and that you are adding in this patch.

I think you could keep both macros in a single ArchGuard string:

```
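multiclass SInstPerm<string n, string p, MergeType mt, string i> {
  def : SInst<n, p, "csilUcUsUiUlhfd", mt, i>;
  let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
    def : SInst<n, p, "b", mt, i>;
  }
}

defm SVREV : SInstPerm<"svrev[_{d}]", "dd", MergeNone, "aarch64_sve_rev">;
...

multiclass SInstPermMatmul<string n, string p, MergeType mt, string i> {
  def : SInst<n, p, "csilUcUsUiUlhfd", mt, i>;
  let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16) && defined(__ARM_FEATURE_SVE_MATMUL_FP64)" in {
    def : SInst<n, p, "b", mt, i>;
  }
}

defm SVTRN1Q : SInstPermMatmul ...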
...
```




[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics

2020-06-23 Thread Cullen Rhodes via Phabricator via cfe-commits
c-rhodes updated this revision to Diff 272745.
c-rhodes added a comment.

Changes:

- Moved bfloat tests to separate files.
- Added checks that the intrinsics are guarded by the feature flag. For now this
is done by omitting the feature macro `__ARM_FEATURE_SVE_BF16`, but the checks
will eventually be updated to omit `+bf16` once the feature flag implies the
macro.



Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_rev-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_sel-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_splice-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1-fp64-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2-fp64-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1-fp64-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2-fp64-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1-fp64-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2-fp64-bfloat.c
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-sel.ll


[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics

2020-06-23 Thread Cullen Rhodes via Phabricator via cfe-commits
c-rhodes added inline comments.



Comment at: clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_sel.c:2
 // REQUIRES: aarch64-registered-target
-// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -triple aarch64-none-linux-gnu 
-target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall 
-emit-llvm -o - %s | FileCheck %s
-// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -DSVE_OVERLOADED_FORMS -triple 
aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns 
-S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
-// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -triple aarch64-none-linux-gnu 
-target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -o 
- %s >/dev/null 2>%t
+// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC 
-D__ARM_FEATURE_SVE_BF16 -triple aarch64-none-linux-gnu -target-feature +sve 
-target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall 
-emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC 
-D__ARM_FEATURE_SVE_BF16 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns 
-S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s

sdesmalen wrote:
> Can you move the clang bfloat tests to separate files and add a RUN line 
> similar to what we've done for the sve2 tests (to check that we get a 
> diagnostic if +bf16 is not specified) ?
I can move the bfloat tests to separate files, but I'm not sure about the RUN
line. If `+bf16` is omitted we get the following:

```
/home/culrho01/llvm-project/build/bin/clang -cc1 -internal-isystem 
/home/culrho01/llvm-project/build/lib/clang/11.0.0/include -nostdsysteminc 
-D__ARM_FEATURE_SVE -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC 
-D__ARM_FEATURE_SVE_BF16 -triple aarch64-none-linux-gnu -target-feature +sve 
-fallow-half-arguments-and-returns -fsyntax-only -verify 
/home/culrho01/llvm-project/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_rev-bfloat.c
error: no expected directives found: consider use of 'expected-no-diagnostics'
error: 'error' diagnostics seen but not expected:
  File /home/culrho01/llvm-project/build/lib/clang/11.0.0/include/arm_bf16.h Line 14: __bf16 is not supported on this target
  File /home/culrho01/llvm-project/build/lib/clang/11.0.0/include/arm_sve.h Line 52: __bf16 is not supported on this target
3 errors generated.
```

Whereas I think the behaviour we want to test, as we do for sve2, is that the
intrinsics are guarded by the right feature flag. At the moment that means
omitting `-D__ARM_FEATURE_SVE_BF16` from the RUN line, at least until `+bf16`
implies `-D__ARM_FEATURE_SVE_BF16`, which will be the case once the ACLE is
fully complete. Should I do that? I guess we'd then want to update this RUN
line to omit `+bf16` once it implies the feature macro.
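
To illustrate the check being proposed (the intrinsic is visible only when the
guarding macro is defined), here is a small Python sketch. It assumes guards
consist solely of `defined(X)` terms joined by `&&`, as in the `ArchGuard`
strings in this review; `guard_holds` is a hypothetical helper, not part of
clang.

```python
import re

def guard_holds(guard, defined_macros):
    """Evaluate a guard made only of `defined(X)` terms joined by `&&`."""
    names = []
    for term in (t.strip() for t in guard.split("&&")):
        match = re.fullmatch(r"defined\((\w+)\)", term)
        if match is None:
            raise ValueError(f"unsupported guard term: {term!r}")
        names.append(match.group(1))
    return all(name in defined_macros for name in names)

guard = "defined(__ARM_FEATURE_SVE_BF16)"

# Macros passed with -D on the positive RUN line.
with_macro = {
    "__ARM_FEATURE_SVE",
    "__ARM_FEATURE_BF16_SCALAR_ARITHMETIC",
    "__ARM_FEATURE_SVE_BF16",
}
# Negative RUN line: same flags, minus the SVE bfloat feature macro.
without_macro = with_macro - {"__ARM_FEATURE_SVE_BF16"}

visible_on_positive_run = guard_holds(guard, with_macro)
visible_on_negative_run = guard_holds(guard, without_macro)
```

In this model the negative run differs from the positive one only in the
missing macro, so any diagnostic it produces can be attributed to the guard.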




[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics

2020-06-23 Thread Sander de Smalen via Phabricator via cfe-commits
sdesmalen added inline comments.



Comment at: clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_sel.c:2
 // REQUIRES: aarch64-registered-target
-// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -triple aarch64-none-linux-gnu 
-target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall 
-emit-llvm -o - %s | FileCheck %s
-// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -DSVE_OVERLOADED_FORMS -triple 
aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns 
-S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
-// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -triple aarch64-none-linux-gnu 
-target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -o 
- %s >/dev/null 2>%t
+// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC 
-D__ARM_FEATURE_SVE_BF16 -triple aarch64-none-linux-gnu -target-feature +sve 
-target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall 
-emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC 
-D__ARM_FEATURE_SVE_BF16 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns 
-S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s

Can you move the clang bfloat tests to separate files and add a RUN line 
similar to what we've done for the sve2 tests (to check that we get a 
diagnostic if +bf16 is not specified) ?




[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics

2020-06-23 Thread Cullen Rhodes via Phabricator via cfe-commits
c-rhodes added inline comments.



Comment at: clang/include/clang/Basic/arm_sve.td:1115
 
+let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
+def SVREV_BF16: SInst<"svrev[_{d}]","dd",   "b", MergeNone, 
"aarch64_sve_rev">;

c-rhodes wrote:
> fpetrogalli wrote:
> > nit: could create a multiclass here like @sdesmalen have done in 
> > https://reviews.llvm.org/D82187, seems quite a nice way to keep the 
> > definition of the intrinsics together (look for `multiclass StructLoad`, 
> > for example)
> it might be a bit tedious having separate multiclasses, what do you think 
> about:
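> ```multiclass SInstBF16<string n, string p, string t, MergeType mt,
>  string i = "", list<FlagType> ft = [], list<ImmCheck> ch = []> {
>   def : SInst<n, p, t, mt, i, ft, ch>;
>   let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
>     def : SInst<n, p, "b", mt, i, ft, ch>;
>   }
> }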
> 
> defm SVREV: SInstBF16<"svrev[_{d}]","dd",   "csilUcUsUiUlhfd", 
> MergeNone, "aarch64_sve_rev">;
> defm SVSEL: SInstBF16<"svsel[_{d}]","dPdd", "csilUcUsUiUlhfd", 
> MergeNone, "aarch64_sve_sel">;
> defm SVSPLICE : SInstBF16<"svsplice[_{d}]", "dPdd", "csilUcUsUiUlhfd", 
> MergeNone, "aarch64_sve_splice">;
> defm SVTRN1   : SInstBF16<"svtrn1[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> MergeNone, "aarch64_sve_trn1">;
> defm SVTRN2   : SInstBF16<"svtrn2[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> MergeNone, "aarch64_sve_trn2">;
> defm SVUZP1   : SInstBF16<"svuzp1[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> MergeNone, "aarch64_sve_uzp1">;
> defm SVUZP2   : SInstBF16<"svuzp2[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> MergeNone, "aarch64_sve_uzp2">;
> defm SVZIP1   : SInstBF16<"svzip1[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> MergeNone, "aarch64_sve_zip1">;
> defm SVZIP2   : SInstBF16<"svzip2[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
> MergeNone, "aarch64_sve_zip2">;```
> 
> ?
I've played around with this and it works great for instructions guarded on a
single feature flag, but it falls apart for the .Q forms, which also require
`__ARM_FEATURE_SVE_MATMUL_FP64`. I suspect there's a nice way of handling it in
TableGen by passing the features as a list of strings and joining them, but
I've spent long enough trying to get that to work, so I'm going to keep it
simple for now.
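
The "list of strings, joined" idea can be sketched outside TableGen. The Python
below only illustrates the intended string result; `arch_guard` is a
hypothetical helper, and nothing here claims to show how TableGen itself would
express the join.

```python
def arch_guard(features):
    """Join feature macro names into one preprocessor guard expression."""
    return " && ".join(f"defined({name})" for name in features)

single = arch_guard(["__ARM_FEATURE_SVE_BF16"])
both = arch_guard(["__ARM_FEATURE_SVE_BF16", "__ARM_FEATURE_SVE_MATMUL_FP64"])
print(single)
print(both)
```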




[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics

2020-06-19 Thread Cullen Rhodes via Phabricator via cfe-commits
c-rhodes added inline comments.



Comment at: clang/include/clang/Basic/arm_sve.td:1115
 
+let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
+def SVREV_BF16: SInst<"svrev[_{d}]","dd",   "b", MergeNone, 
"aarch64_sve_rev">;

fpetrogalli wrote:
> nit: could create a multiclass here like @sdesmalen have done in 
> https://reviews.llvm.org/D82187, seems quite a nice way to keep the 
> definition of the intrinsics together (look for `multiclass StructLoad`, for 
> example)
it might be a bit tedious having separate multiclasses, what do you think about:
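```multiclass SInstBF16<string n, string p, string t, MergeType mt, string i = "",
                     list<FlagType> ft = [], list<ImmCheck> ch = []> {
  def : SInst<n, p, t, mt, i, ft, ch>;
  let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
    def : SInst<n, p, "b", mt, i, ft, ch>;
  }
}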

defm SVREV: SInstBF16<"svrev[_{d}]","dd",   "csilUcUsUiUlhfd", 
MergeNone, "aarch64_sve_rev">;
defm SVSEL: SInstBF16<"svsel[_{d}]","dPdd", "csilUcUsUiUlhfd", 
MergeNone, "aarch64_sve_sel">;
defm SVSPLICE : SInstBF16<"svsplice[_{d}]", "dPdd", "csilUcUsUiUlhfd", 
MergeNone, "aarch64_sve_splice">;
defm SVTRN1   : SInstBF16<"svtrn1[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
MergeNone, "aarch64_sve_trn1">;
defm SVTRN2   : SInstBF16<"svtrn2[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
MergeNone, "aarch64_sve_trn2">;
defm SVUZP1   : SInstBF16<"svuzp1[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
MergeNone, "aarch64_sve_uzp1">;
defm SVUZP2   : SInstBF16<"svuzp2[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
MergeNone, "aarch64_sve_uzp2">;
defm SVZIP1   : SInstBF16<"svzip1[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
MergeNone, "aarch64_sve_zip1">;
defm SVZIP2   : SInstBF16<"svzip2[_{d}]",   "ddd",  "csilUcUsUiUlhfd", 
MergeNone, "aarch64_sve_zip2">;```

?
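
A rough model of what this proposal is meant to expand to, assuming each 
`defm` emits the existing definition unchanged plus a bfloat16-only copy (type 
string "b") under the BF16 guard. This is a Python sketch rather than TableGen; 
`SInstRecord` and `expand_sinst_bf16` are hypothetical names used only for 
illustration.

```python
from dataclasses import dataclass

BF16_GUARD = "defined(__ARM_FEATURE_SVE_BF16)"


@dataclass(frozen=True)
class SInstRecord:
    """Hypothetical stand-in for one SInst definition."""
    name: str
    proto: str
    types: str
    intrinsic: str
    arch_guard: str = ""


def expand_sinst_bf16(name, proto, types, intrinsic):
    """Return the two records one SInstBF16 instantiation would produce."""
    return [
        # The regular definition, with no extra guard.
        SInstRecord(name, proto, types, intrinsic),
        # The bfloat16 variant ("b"), only available under the BF16 guard.
        SInstRecord(name, proto, "b", intrinsic, BF16_GUARD),
    ]


for rec in expand_sinst_bf16("svrev[_{d}]", "dd", "csilUcUsUiUlhfd", "aarch64_sve_rev"):
    print(rec)
```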


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82182/new/

https://reviews.llvm.org/D82182





[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics

2020-06-19 Thread David Sherwood via Phabricator via cfe-commits
david-arm accepted this revision.
david-arm added a comment.
This revision is now accepted and ready to land.

LGTM!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82182/new/

https://reviews.llvm.org/D82182





[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics

2020-06-19 Thread Francesco Petrogalli via Phabricator via cfe-commits
fpetrogalli added inline comments.



Comment at: clang/include/clang/Basic/arm_sve.td:1115
 
+let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
+def SVREV_BF16: SInst<"svrev[_{d}]","dd",   "b", MergeNone, 
"aarch64_sve_rev">;

nit: could create a multiclass here like @sdesmalen has done in 
https://reviews.llvm.org/D82187, seems quite a nice way to keep the definition 
of the intrinsics together (look for `multiclass StructLoad`, for example)



Comment at: clang/include/clang/Basic/arm_sve.td:1298
 
+let ArchGuard = "defined(__ARM_FEATURE_SVE_MATMUL_FP64) && 
defined(__ARM_FEATURE_SVE_BF16)" in {
+def SVTRN1Q_BF16  : SInst<"svtrn1q[_{d}]", "ddd",  "b", MergeNone, 
"aarch64_sve_trn1q">;

Same here, could use a multiclass to merge the "regular" intrinsics definition 
with the BF ones.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82182/new/

https://reviews.llvm.org/D82182





[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics

2020-06-19 Thread Cullen Rhodes via Phabricator via cfe-commits
c-rhodes created this revision.
c-rhodes added reviewers: sdesmalen, efriedma, stuij, david-arm, fpetrogalli, 
kmclaughlin.
Herald added subscribers: danielkiss, psnobl, rkruppe, hiraditya, 
kristof.beyls, tschuett.
Herald added projects: clang, LLVM.

Adds bfloat16 support for the following intrinsics:

- zip1, zip2, zip1q, zip2q
- trn1, trn2, trn1q, trn2q
- uzp1, uzp2, uzp1q, uzp2q
- splice
- rev
- sel
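
For readers unfamiliar with these operations, the element-level behaviour can 
be sketched on plain Python lists. This is an illustrative model only, 
assuming equal, even-length inputs; it ignores the scalable vector length, and 
the function names are hypothetical stand-ins for the instructions. The `q` 
variants apply the same patterns to 128-bit quadword elements and are not 
modelled here.

```python
def zip1(a, b):
    """Interleave the low halves of a and b."""
    half = len(a) // 2
    return [x for pair in zip(a[:half], b[:half]) for x in pair]


def zip2(a, b):
    """Interleave the high halves of a and b."""
    half = len(a) // 2
    return [x for pair in zip(a[half:], b[half:]) for x in pair]


def trn1(a, b):
    """Interleave the even-indexed elements of a and b."""
    return [x for pair in zip(a[0::2], b[0::2]) for x in pair]


def trn2(a, b):
    """Interleave the odd-indexed elements of a and b."""
    return [x for pair in zip(a[1::2], b[1::2]) for x in pair]


def uzp1(a, b):
    """Even-indexed elements of a followed by those of b."""
    return (a + b)[0::2]


def uzp2(a, b):
    """Odd-indexed elements of a followed by those of b."""
    return (a + b)[1::2]


def rev(a):
    """Reverse the element order."""
    return a[::-1]


def sel(pg, a, b):
    """Take a where the predicate is set, b elsewhere."""
    return [x if p else y for p, x, y in zip(pg, a, b)]


def splice(pg, a, b):
    """Copy a from first to last active element, then fill from the low end of b."""
    active = [i for i, p in enumerate(pg) if p]
    if not active:
        return list(b)
    seg = a[active[0]:active[-1] + 1]
    return seg + b[:len(a) - len(seg)]


a = list(range(8))
b = list(range(10, 18))
print(zip1(a, b), trn1(a, b), uzp1(a, b))
```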


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D82182

Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_rev.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_sel.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_splice.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1-fp64.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2-fp64.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1-fp64.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2-fp64.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1-fp64.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2-fp64.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2.c
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-sel.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-sel.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-sel.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-sel.ll
@@ -57,6 +57,16 @@
   ret  %out
 }
 
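+define <vscale x 8 x bfloat> @sel_bf16(<vscale x 8 x i1> %pg, <vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: sel_bf16:
+; CHECK: sel z0.h, p0, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call <vscale x 8 x bfloat> @llvm.aarch64.sve.sel.nxv8bf16(<vscale x 8 x i1> %pg,
+                                                                   <vscale x 8 x bfloat> %a,
+                                                                   <vscale x 8 x bfloat> %b)
+  ret <vscale x 8 x bfloat> %out
+}
+
 define <vscale x 8 x half> @sel_f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {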
 ; CHECK-LABEL: sel_f16:
 ; CHECK: sel z0.h, p0, z0.h, z1.h
@@ -92,6 +102,7 @@
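 declare <vscale x 8 x i16> @llvm.aarch64.sve.sel.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
 declare <vscale x 4 x i32> @llvm.aarch64.sve.sel.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
 declare <vscale x 2 x i64> @llvm.aarch64.sve.sel.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
+declare <vscale x 8 x bfloat> @llvm.aarch64.sve.sel.nxv8bf16(<vscale x 8 x i1>, <vscale x 8 x bfloat>, <vscale x 8 x bfloat>)
 declare <vscale x 8 x half> @llvm.aarch64.sve.sel.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
 declare <vscale x 4 x float> @llvm.aarch64.sve.sel.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
 declare <vscale x 2 x double> @llvm.aarch64.sve.sel.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)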
Index: llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
@@ -803,6 +803,14 @@
   ret  %res
 }
 
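+define <vscale x 8 x bfloat> @rev_bf16(<vscale x 8 x bfloat> %a) {
+; CHECK-LABEL: rev_bf16
+; CHECK: rev z0.h, z0.h
+; CHECK-NEXT: ret
+  %res = call <vscale x 8 x bfloat> @llvm.aarch64.sve.rev.nxv8bf16(<vscale x 8 x bfloat> %a)
+  ret <vscale x 8 x bfloat> %res
+}
+
 define <vscale x 8 x half> @rev_f16(<vscale x 8 x half> %a) {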
 ; CHECK-LABEL: rev_f16
 ; CHECK: rev z0.h, z0.h
@@ -871,6 +879,16 @@
   ret  %out
 }
 
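+define <vscale x 8 x bfloat> @splice_bf16(<vscale x 8 x i1> %pg, <vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: splice_bf16:
+; CHECK: splice z0.h, p0, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call <vscale x 8 x bfloat> @llvm.aarch64.sve.splice.nxv8bf16(<vscale x 8 x i1> %pg,
+                                                                      <vscale x 8 x bfloat> %a,
+                                                                      <vscale x 8 x bfloat> %b)
+  ret <vscale x 8 x bfloat> %out
+}
+
 define <vscale x 8 x half> @splice_f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {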
 ; CHECK-LABEL: splice_f16:
 ; CHECK: splice z0.h, p0, z0.h, z1.h
@@ -1165,6 +1183,15 @@
   ret  %out
 }
 
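+define <vscale x 8 x bfloat> @trn1_bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: trn1_bf16:
+; CHECK: trn1 z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call <vscale x 8 x bfloat> @llvm.aarch64.sve.trn1.nxv8bf16(<vscale x 8 x bfloat> %a,
+                                                                    <vscale x 8 x bfloat> %b)
+  ret <vscale x 8 x bfloat> %out
+}
+
 define <vscale x 8 x half> @trn1_f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) {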
 ; CHECK-LABEL: trn1_f16:
 ; CHECK: trn1 z0.h, z0.h, z1.h
@@ -1277,6 +1304,15 @@
   ret  %out
 }
 
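+define <vscale x 8 x bfloat> @trn2_bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: trn2_bf16:
+; CHECK: trn2 z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call <vscale x 8 x bfloat> @llvm.aarch64.sve.trn2.nxv8bf16(<vscale x 8 x bfloat> %a,
+                                                                    <vscale x 8 x bfloat> %b)
+  ret <vscale x 8 x bfloat> %out
+}
+
 define <vscale x 8 x half> @trn2_f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) {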
 ; CHECK-LABEL: trn2_f16:
 ; CHECK: trn2 z0.h, z0.h, z1.h
@@ -1389,6 +1425,15 @@
   ret  %out
 }
 
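+define <vscale x 8 x bfloat> @uzp1_bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: uzp1_bf16:
+; CHECK: uzp1 z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call <vscale x 8 x bfloat> @llvm.aarch64.sve.uzp1.nxv8bf16(<vscale x 8 x bfloat> %a,
+                                                                    <vscale x 8 x bfloat> %b)
+  ret <vscale x 8 x bfloat> %out
+}
+
 define <vscale x 8 x half> @uzp1_f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) {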
 ; CHECK-LABEL: uzp1_f16:
 ; CHECK: uzp1 z0.h, z0.h, z1.h
@@ -1501,6 +1546,15 @@
   ret  %out
 }
 
+define <vscale x 8 x bfloat> @uzp2_bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: uzp2_bf16:
+; CHECK: uzp2 z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call