[PATCH] D80712: [SVE] Add checks for no warnings in SVE tests

2020-05-29 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Hi @efriedma, at least amongst all the tests in llvm/test/CodeGen/AArch64/sve-*
there are still 66 with warnings. @sdesmalen and I discussed this, and our
reason for adding checks for warnings is mainly that we are still fixing up
cases and implementing SVE codegen support, so there is a chance that somewhere
we'll introduce a regression without realising it. That's the main rationale
for this, but I do realise that ultimately getVectorNumElements() will be
deprecated. I also realised that the warnings I've added to the clang tests
aren't that useful unless we're compiling them to real assembly, so I may add
some new RUN lines for those.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80712/new/

https://reviews.llvm.org/D80712





[PATCH] D79587: [CodeGen][SVE] Legalisation of extends with scalable types

2020-05-29 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Hi Kerry, just a couple of comments about the use of getVectorNumElements():
we're trying to remove calls to this function, so it would be good if you could
use getVectorElementCount() instead. Thanks!




Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13837
+
+  if (InVT.getVectorNumElements() != (VT.getVectorNumElements()*2))
+    return;

I think we want to move away from calling getVectorNumElements(), so this might
need to change to something like:

ElementCount ResEC = VT.getVectorElementCount();
if (InVT.getVectorElementCount() != (ResEC * 2))



Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13845
+  unsigned Index = CIndex->getZExtValue();
+  if ((Index != 0) && (Index != VT.getVectorNumElements()))
+    return;

And here you could then change this to:

if ((Index != 0) && (Index != ResEC.Min))
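
Putting the two suggestions together, the hunk would read roughly as follows
(a sketch only, reusing the names from the patch under review; ResEC is the
new local from the first comment):

```
ElementCount ResEC = VT.getVectorElementCount();
if (InVT.getVectorElementCount() != (ResEC * 2))
  return;
unsigned Index = CIndex->getZExtValue();
if ((Index != 0) && (Index != ResEC.Min))
  return;
```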



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79587/new/

https://reviews.llvm.org/D79587





[PATCH] D79587: [CodeGen][SVE] Legalisation of extends with scalable types

2020-05-29 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Sorry, I forgot to mention that I think we have an existing test file for extends:

llvm/test/CodeGen/AArch64/sve-sext-zext.ll

It might be worth adding these cases to that file instead of creating a new one?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79587/new/

https://reviews.llvm.org/D79587





[PATCH] D80712: [SVE] Add checks for no warnings in SVE tests

2020-06-16 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Hi @sdesmalen @efriedma, hopefully I've addressed your review comments with my
latest patch!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80712/new/

https://reviews.llvm.org/D80712





[PATCH] D82943: [SVE] Add more warnings checks to clang and LLVM SVE tests

2020-07-07 Thread David Sherwood via Phabricator via cfe-commits
This revision was not accepted when it landed; it landed in state "Needs
Review".
This revision was automatically updated to reflect the committed changes.
Closed by commit rG9a1a7d888b53: [SVE] Add more warnings checks to clang and 
LLVM SVE tests (authored by david-arm).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82943/new/

https://reviews.llvm.org/D82943

Files:
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acge.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acgt.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acle.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_aclt.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpeq.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpge.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpgt.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmple.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmplt.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpne.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpuo.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dup.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dupq.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_index.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sb.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sh.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sw.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1ub.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uh.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uw.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sb.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sh.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sw.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1ub.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uh.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uw.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sb.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sh.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sw.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1ub.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uh.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uw.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_pnext.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ptrue.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_rev.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_setffr.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_undef.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpkhi.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpklo.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilele.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilelt.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2.c
  llvm/test/CodeGen/AArch64/sve-callbyref-notailcall.ll
  llvm/test/CodeGen/AArch64/sve-calling-convention-byref.ll
  llvm/test/CodeGen/AArch64/sve-fcmp.ll
  llvm/test/CodeGen/AArch64/sve-gather-scatter-dag-combine.ll
  llvm/test/CodeGen/AArch64/sve-gep.ll
  llvm/test/CodeGen/AArch64/sve-int-arith-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-scaled-offsets.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-unscaled-offsets.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-scaled-offset.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-unscaled-offset.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-imm-offset.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-scalar-offset.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-scaled-offsets.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-unscaled-offsets.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-scaled-offset.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-unscaled-offset.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-imm-offset.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-scalar-offset.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-reg.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1ro-addressing-mode-reg-reg.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-loads-ff.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll

Index: llvm/test/CodeG


[PATCH] D82182: [AArch64][SVE] Add bfloat16 support to perm and select intrinsics

2020-06-19 Thread David Sherwood via Phabricator via cfe-commits
david-arm accepted this revision.
david-arm added a comment.
This revision is now accepted and ready to land.

LGTM!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82182/new/

https://reviews.llvm.org/D82182





[PATCH] D82298: [AArch64][SVE] Add bfloat16 support to load intrinsics

2020-06-23 Thread David Sherwood via Phabricator via cfe-commits
david-arm added inline comments.



Comment at: clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1.c:2-4
+// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -D__ARM_FEATURE_SVE_BF16 -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -D__ARM_FEATURE_SVE_BF16 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -D__ARM_FEATURE_SVE_BF16 -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -o - %s >/dev/null 2>%t

fpetrogalli wrote:
> With @sdesmalen we were thinking that maybe it is better to duplicate the
> run lines to have the BF16 intrinsics tested separately:
> 
> ```
>  RUN: %clang_cc1 -D__ARM_FEATURE_SVE ... -target-feature +sve ...
>  RUN: %clang_cc1 -DENABLE_BF16_TEST -D__ARM_FEATURE_SVE 
> -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -D__ARM_FEATURE_SVE_BF16 ... 
> -target-feature +sve -target-feature +bf16 ... 
> ```
> 
> and wrap the BF16 tests in `#ifdef ENABLE_BF16_TEST ... #endif`.
> 
> This will make sure that the non-BF16 tests will not be erroneously
> associated with the BF16 flags.
> 
> Please apply this to all the run lines involving BF16 modified in this patch.
> 
Is that definite? I mean, there is a difference between "we were thinking" and
"this is how we are going to do things in future". :) Just to avoid unnecessary
code changes, that's all. I presume existing tests already written in the same
way (committed in the last week or so) would be changed too?
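
For concreteness, a minimal sketch of the proposed wrapping (the test body and
function name here are placeholders, not taken from the patch):

```
#include <arm_sve.h>

// Only compiled by the RUN lines that pass -DENABLE_BF16_TEST together with
// -target-feature +bf16 and the BF16 feature macros.
#ifdef ENABLE_BF16_TEST
svbfloat16_t test_svld1_bf16(svbool_t pg, const bfloat16_t *base) {
  return svld1_bf16(pg, base);
}
#endif
```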


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82298/new/

https://reviews.llvm.org/D82298





[PATCH] D82501: [sve][acle] Add reinterpret intrinsics for brain float.

2020-06-24 Thread David Sherwood via Phabricator via cfe-commits
david-arm added inline comments.



Comment at: clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret-bfloat.c:5
+
+#include <arm_sve.h>
+

Hi @fpetrogalli, in the same way that you asked @kmclaughlin if she could add
the ASM-NOT check line in her patch, are you able to do that here? You'd need
to add an additional RUN line to compile to assembly, though (see the sketch
below). Don't worry if it's not possible!
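
For reference, the pattern being asked for is the one used elsewhere in this
series (see, for example, the acle_sve_st2.c hunk from D82746 below): an extra
RUN line that compiles to real assembly and redirects stderr to a file, plus a
FileCheck invocation with an ASM-NOT check. The '...' stands for the test's
existing cc1 flags:

```
// RUN: %clang_cc1 ... -S -O1 -Werror -Wall -o - %s >/dev/null 2>%t
// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t
// ASM-NOT: warning
```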


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82501/new/

https://reviews.llvm.org/D82501





[PATCH] D82501: [sve][acle] Add reinterpret intrinsics for brain float.

2020-06-26 Thread David Sherwood via Phabricator via cfe-commits
david-arm accepted this revision.
david-arm added a comment.
This revision is now accepted and ready to land.

Can you remove the duplicate tests before submitting? Otherwise LGTM!




Comment at: llvm/test/CodeGen/AArch64/sve-bitcast-bfloat.ll:8
+
+define <vscale x 16 x i8> @bitcast_bfloat_to_i8(<vscale x 8 x bfloat> %v) {
+; CHECK-LABEL: bitcast_bfloat_to_i8:

Aren't these tests all duplicates of ones in 
llvm/test/CodeGen/AArch64/sve-bitcast.ll? Looks like you can remove this file 
completely.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82501/new/

https://reviews.llvm.org/D82501





[PATCH] D82746: [CodeGen] Fix warning in getNode for EXTRACT_SUBVECTOR

2020-06-30 Thread David Sherwood via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGc02332a69399: [CodeGen] Fix warning in getNode for 
EXTRACT_SUBVECTOR (authored by david-arm).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Changed prior to commit:
  https://reviews.llvm.org/D82746?vs=274022&id=274350#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82746/new/

https://reviews.llvm.org/D82746

Files:
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get2.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get3.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get4.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st2.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st3.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st4.c
  llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
===
--- llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -5566,7 +5566,7 @@
 // the concat have the same type as the extract.
 if (N2C && N1.getOpcode() == ISD::CONCAT_VECTORS &&
 N1.getNumOperands() > 0 && VT == N1.getOperand(0).getValueType()) {
-  unsigned Factor = VT.getVectorNumElements();
+  unsigned Factor = VT.getVectorMinNumElements();
   return N1.getOperand(N2C->getZExtValue() / Factor);
 }
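
For context: on scalable vectors getVectorNumElements() triggers the warning
these tests check for, whereas getVectorMinNumElements() does not. A rough
sketch of the distinction (simplified commentary, not the actual
implementation):

```
// For MVT::nxv4i32, i.e. <vscale x 4 x i32>:
//   VT.getVectorNumElements()    -> 4, but warns, since the true element
//                                   count is really 4 * vscale
//   VT.getVectorMinNumElements() -> 4, with no warning; callers must treat
//                                   the result as a minimum, not an exact count
unsigned Factor = VT.getVectorMinNumElements();
```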
 
Index: clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st4.c
===
--- clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st4.c
+++ clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st4.c
@@ -1,6 +1,11 @@
+// REQUIRES: aarch64-registered-target
 // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
 // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -o - %s >/dev/null 2>%t
+// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t
 
+// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it.
+// ASM-NOT: warning
 #include <arm_sve.h>
 
 #ifdef SVE_OVERLOADED_FORMS
Index: clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st3.c
===
--- clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st3.c
+++ clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st3.c
@@ -1,6 +1,11 @@
+// REQUIRES: aarch64-registered-target
 // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
 // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -o - %s >/dev/null 2>%t
+// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t
 
+// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it.
+// ASM-NOT: warning
 #include <arm_sve.h>
 
 #ifdef SVE_OVERLOADED_FORMS
Index: clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st2.c
===
--- clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st2.c
+++ clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st2.c
@@ -1,6 +1,11 @@
+// REQUIRES: aarch64-registered-target
 // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
 // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -o - %s >/dev/null 2>%t
+// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t
 
+// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README for instructions on how to resolve it.
+// ASM-NOT: warning
 #include <arm_sve.h>
 
 #ifdef SVE_OVERLOADED_FORMS
Index: clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get4.c
===
--- clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get4.c
+++ clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get4.c
@@ -1,6 +1,11 @@
+// REQUIRES: aarch64-registered-target
 // RUN: %clang_cc1 -tri

[PATCH] D82943: [SVE] Add more warnings checks to clang and LLVM SVE tests

2020-07-01 Thread David Sherwood via Phabricator via cfe-commits
david-arm created this revision.
david-arm added reviewers: sdesmalen, ctetreau, kmclaughlin.
Herald added subscribers: llvm-commits, cfe-commits, psnobl, arphaman, rkruppe, 
tschuett.
Herald added a reviewer: rengolin.
Herald added a reviewer: efriedma.
Herald added projects: clang, LLVM.

There are now more SVE tests in LLVM and Clang that do not
emit warnings related to invalid use of EVT::getVectorNumElements()
and VectorType::getNumElements(). For these tests I have added
additional checks that there are no warnings in order to prevent
any future regressions.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D82943

Files:
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acge.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acgt.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acle.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_aclt.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpeq.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpge.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpgt.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmple.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmplt.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpne.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpuo.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dup.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dupq.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_index.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sb.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sh.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sw.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1ub.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uh.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uw.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sb.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sh.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sw.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1ub.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uh.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uw.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sb.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sh.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sw.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1ub.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uh.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uw.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_pnext.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ptrue.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_rev.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_setffr.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_undef.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpkhi.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpklo.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilele.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilelt.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2.c
  llvm/test/CodeGen/AArch64/sve-callbyref-notailcall.ll
  llvm/test/CodeGen/AArch64/sve-calling-convention-byref.ll
  llvm/test/CodeGen/AArch64/sve-fcmp.ll
  llvm/test/CodeGen/AArch64/sve-gather-scatter-dag-combine.ll
  llvm/test/CodeGen/AArch64/sve-gep.ll
  llvm/test/CodeGen/AArch64/sve-int-arith-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-scaled-offsets.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-unscaled-offsets.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-scaled-offset.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-unscaled-offset.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-imm-offset.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-scalar-offset.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-scaled-offsets.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-unscaled-offsets.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-scaled-offset.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-unscaled-offset.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-imm-offset.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-scalar-offset.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-imm.ll
  llvm/test/CodeGen

[PATCH] D85743: [CodeGen][AArch64] Support arm_sve_vector_bits attribute

2020-08-14 Thread David Sherwood via Phabricator via cfe-commits
david-arm added inline comments.



Comment at: clang/lib/CodeGen/CGCall.cpp:1342
+  if (SrcSize.getKnownMinSize() <= DstSize.getKnownMinSize() ||
+      (isa<llvm::ScalableVectorType>(SrcTy) ||
+       isa<llvm::ScalableVectorType>(DstTy))) {

I think if you restructure the code here you could do:

if (isa<llvm::ScalableVectorType>(SrcTy) ||
    isa<llvm::ScalableVectorType>(DstTy) ||
    SrcSize.getFixedSize() <= DstSize.getFixedSize())

since you know that the scalable types have been eliminated by the time we do 
the "<=" comparison.




Comment at: clang/lib/CodeGen/CGCall.cpp:1361
+Tmp.getAlignment().getAsAlign(),
+llvm::ConstantInt::get(CGF.IntPtrTy, DstSize.getKnownMinSize()));
   }

c-rhodes wrote:
> @efriedma If we're happy with the element bitcast above this can also be
> fixed, but I wasn't sure if that was ok, although it's pretty much what was
> implemented in the original codegen patch.
Given the if statement above has eliminated scalable vector types, I think it's
safe to use DstSize.getFixedSize() here.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85743/new/

https://reviews.llvm.org/D85743



[PATCH] D86065: [SVE] Make ElementCount members private

2020-08-17 Thread David Sherwood via Phabricator via cfe-commits
david-arm created this revision.
david-arm added reviewers: sdesmalen, ctetreau, efriedma, fpetrogalli, 
kmclaughlin, c-rhodes.
Herald added subscribers: llvm-commits, cfe-commits, psnobl, hiraditya, 
tschuett.
Herald added projects: clang, LLVM.
david-arm requested review of this revision.

This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the getter
functions getKnownMinValue() and isScalable(). This is now in line
with the TypeSize class.

In addition I've added some other member functions for more
commonly used operations. Hopefully this makes the class more
useful and will reduce the need for calling getKnownMinValue().
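
In practice the change reads like the following sketch (the diff below shows
the real instances):

```
ElementCount EC = VTy->getElementCount();
// Before this patch (public members):
//   unsigned Min = EC.Min;
//   bool Scalable = EC.Scalable;
// After this patch (accessors, in line with TypeSize):
unsigned Min = EC.getKnownMinValue();
bool Scalable = EC.isScalable();
```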


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D86065

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  llvm/include/llvm/Analysis/VectorUtils.h
  llvm/include/llvm/CodeGen/ValueTypes.h
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/DerivedTypes.h
  llvm/include/llvm/IR/Instructions.h
  llvm/include/llvm/IR/Intrinsics.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/include/llvm/Support/TypeSize.h
  llvm/lib/Analysis/InstructionSimplify.cpp
  llvm/lib/Analysis/VFABIDemangling.cpp
  llvm/lib/Analysis/ValueTracking.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/CodeGenPrepare.cpp
  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/lib/CodeGen/TargetLoweringBase.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/ConstantFold.cpp
  llvm/lib/IR/Constants.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/IRBuilder.cpp
  llvm/lib/IR/Instructions.cpp
  llvm/lib/IR/IntrinsicInst.cpp
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
  llvm/lib/Transforms/Utils/FunctionComparator.cpp
  llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
  llvm/unittests/IR/VectorTypesTest.cpp

Index: llvm/unittests/IR/VectorTypesTest.cpp
===
--- llvm/unittests/IR/VectorTypesTest.cpp
+++ llvm/unittests/IR/VectorTypesTest.cpp
@@ -119,8 +119,8 @@
   EXPECT_EQ(ConvTy->getElementType()->getScalarSizeInBits(), 64U);
 
   EltCnt = V8Int64Ty->getElementCount();
-  EXPECT_EQ(EltCnt.Min, 8U);
-  ASSERT_FALSE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
+  ASSERT_FALSE(EltCnt.isScalable());
 }
 
 TEST(VectorTypesTest, Scalable) {
@@ -215,8 +215,8 @@
   EXPECT_EQ(ConvTy->getElementType()->getScalarSizeInBits(), 64U);
 
   EltCnt = ScV8Int64Ty->getElementCount();
-  EXPECT_EQ(EltCnt.Min, 8U);
-  ASSERT_TRUE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
+  ASSERT_TRUE(EltCnt.isScalable());
 }
 
 TEST(VectorTypesTest, BaseVectorType) {
@@ -249,7 +249,7 @@
 // test I == J
 VectorType *VI = VTys[I];
 ElementCount ECI = VI->getElementCount();
-EXPECT_EQ(isa(VI), ECI.Scalable);
+EXPECT_EQ(isa(VI), ECI.isScalable());
 
 for (size_t J = I + 1, JEnd = VTys.size(); J < JEnd; ++J) {
   // test I < J
Index: llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
===
--- llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
+++ llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
@@ -71,8 +71,8 @@
 
   // Check fields inside llvm::ElementCount
   EltCnt = Vnx4i32.getVectorElementCount();
-  EXPECT_EQ(EltCnt.Min, 4U);
-  ASSERT_TRUE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 4U);
+  ASSERT_TRUE(EltCnt.isScalable());
 
   // Check that fixed-length vector types aren't scalable.
   EVT V8i32 = EVT::getVectorVT(Ctx, MVT::i32, 8);
@@ -82,8 +82,8 @@
 
   // Check that llvm::ElementCount works for fixed-length types.
   EltCnt = V8i32.getVectorElementCount();
-  EXPECT_EQ(EltCnt.Min, 8U);
-  ASSERT_FALSE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
+  ASSERT_FALSE(EltCnt.isScalable());
 }
 
 TEST(ScalableVectorMVTsTest, IRToVTTranslation) {
Index: llvm/lib/Transforms/Utils/FunctionComparator.cpp
===
--- llvm/lib/Transforms/Utils/FunctionComparator.cpp
+++ llvm/lib/Transforms/Utils/FunctionComparator.cpp
@@ -488,12 +488,13 @@
   case Type::ScalableVectorTyID: {
 auto *STyL = cast(TyL);
 auto *STyR = cast(TyR);
-if (STyL->getElementCount().Scalable != STyR->getElementCount().Scalable)
-  return cmpNumbers(STyL->getElementCount().Scalable,
-STyR->getElementCount().Scalable);
-if (STyL->getElementCount().Min != STyR->getElementCount().Min)
-  return cmpNumbers(STyL->getElementCount().Min,
-STyR->getElementCount().Min);
+if (STyL->getElementCount().isScalable() !=
+STyR->getElementCount().isScalable())
+  return cmpNumbers(STyL->getElementCount().is

[PATCH] D85977: [release][docs] Update contributions to LLVM 11 for SVE.

2020-08-17 Thread David Sherwood via Phabricator via cfe-commits
david-arm added inline comments.



Comment at: llvm/docs/ReleaseNotes.rst:70
+  ``VFDatabase`` class. When scanning through the set of vector
+  functions associated to a scalar call, the loop vectorizer now
+  relies on ``VFDatabase``, instead of ``TargetLibraryInfo``.

I think this should probably be "associated with a ..."



Comment at: llvm/docs/ReleaseNotes.rst:84
+* LLVM IR now supports two distinct ``llvm::FixedVectorType`` and
+  ``llvm::ScalableVectorType``, both derived from the base class
+  ``llvm::VectorType``. A number of algorithms dealing with IR vector

Perhaps this should be "now support two distinct ... and ... vector types, ..." 
?



Comment at: llvm/docs/ReleaseNotes.rst:117
 
+* Clang supports to the following macros that enable the C-intrinsics
+  from the `Arm C language extensions for SVE

"Clang adds support for the ..." perhaps?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85977/new/

https://reviews.llvm.org/D85977



[PATCH] D79587: [CodeGen][SVE] Legalisation of extends with scalable types

2020-05-10 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Hi Kerry, I think we already have some suitable test files where you could
perhaps add your tests instead of creating new files. For example, there are:

CodeGen/AArch64/sve-int-arith.ll (perhaps integer divides and shifts could live 
there?)
CodeGen/AArch64/sve-int-div-pred.ll (some divides already in here)
CodeGen/AArch64/sve-sext-zext.ll (extend tests recently added)

Thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79587/new/

https://reviews.llvm.org/D79587





[PATCH] D79587: [CodeGen][SVE] Legalisation of extends with scalable types

2020-05-10 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Hi @efriedma, is there a target-independent equivalent of SUNPKHI? From a quick
glance at the codebase where X86 uses ISD::SIGN_EXTEND_VECTOR_INREG, it seems
vector shuffles are still required for the Hi part, which is fine for
fixed-length vectors I guess.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79587/new/

https://reviews.llvm.org/D79587





[PATCH] D129137: [NFC][LoopVectorize] Explicitly disable tail-folding on some SVE tests

2022-07-21 Thread David Sherwood via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGceb6c23b708d: [NFC][LoopVectorize] Explicitly disable 
tail-folding on some SVE tests (authored by david-arm).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129137/new/

https://reviews.llvm.org/D129137

Files:
  clang/test/CodeGen/aarch64-sve-vector-bits-codegen.c
  llvm/test/Transforms/LoopVectorize/AArch64/gather-do-not-vectorize-addressing.ll
  llvm/test/Transforms/LoopVectorize/AArch64/i1-reg-usage.ll
  llvm/test/Transforms/LoopVectorize/AArch64/scalable-call.ll
  llvm/test/Transforms/LoopVectorize/AArch64/scalable-reduction-inloop-cond.ll
  llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll
  llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
  llvm/test/Transforms/LoopVectorize/AArch64/scalarize-store-with-predication.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-cond-inv-loads.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-fneg.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-gather-scatter-cost.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-illegal-type.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-inv-loads.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-inv-store.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-large-strides.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-masked-loadstore.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-runtime-check-size-based-threshold.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse-mask4.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll
  llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll
  llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse.ll

Index: llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse.ll
===
--- llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse.ll
+++ llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse.ll
@@ -5,7 +5,8 @@
 ;  for (int i = N-1; i >= 0; --i)
 ;a[i] = b[i] + 1.0;
 
-; RUN: opt -loop-vectorize -dce  -mtriple aarch64-linux-gnu -S < %s | FileCheck %s
+; RUN: opt -loop-vectorize -dce  -mtriple aarch64-linux-gnu -S \
+; RUN:   -prefer-predicate-over-epilogue=scalar-epilogue < %s | FileCheck %s
 
 define void @vector_reverse_f64(i64 %N, double* %a, double* %b) #0 {
 ; CHECK-LABEL: vector_reverse_f64
Index: llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll
===
--- llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll
+++ llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll
@@ -11,7 +11,8 @@
 
 ; The test checks if the mask is being correctly created, reverted  and used
 
-; RUN: opt -loop-vectorize -dce -instcombine -mtriple aarch64-linux-gnu -S < %s | FileCheck %s
+; RUN: opt -loop-vectorize -dce -instcombine -mtriple aarch64-linux-gnu -S \
+; RUN:   -prefer-predicate-over-epilogue=scalar-epilogue < %s | FileCheck %s
 
 target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
 target triple = "aarch64-unknown-linux-gnu"
Index: llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll
===
--- llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll
+++ llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt -mtriple aarch64-linux-gnu -mattr=+sve -loop-vectorize -dce -instcombine -S < %s | FileCheck %s
+; RUN: opt -mtriple aarch64-linux-gnu -mattr=+sve -loop-vectorize -dce -instcombine -S \
+; RUN:   -prefer-predicate-over-epilogue=scalar-epilogue < %s | FileCheck %s
 
 ; Ensure that we can vectorize loops such as:
 ;   int *ptr = c;
Index: llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll
===
--- llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll
+++ llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll
@@ -1,6 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ;

[PATCH] D87700: [SVE] Replace / operator in TypeSize/ElementCount with divideCoefficientBy

2020-09-28 Thread David Sherwood via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGbafdd11326a4: [SVE] Replace / operator in 
TypeSize/ElementCount with divideCoefficientBy (authored by david-arm).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Changed prior to commit:
  https://reviews.llvm.org/D87700?vs=294351&id=294617#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87700/new/

https://reviews.llvm.org/D87700

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  llvm/include/llvm/CodeGen/ValueTypes.h
  llvm/include/llvm/IR/DerivedTypes.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/include/llvm/Support/TypeSize.h
  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
  llvm/lib/CodeGen/TargetLoweringBase.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
  llvm/unittests/IR/VectorTypesTest.cpp

Index: llvm/unittests/IR/VectorTypesTest.cpp
===
--- llvm/unittests/IR/VectorTypesTest.cpp
+++ llvm/unittests/IR/VectorTypesTest.cpp
@@ -71,8 +71,8 @@
   EXPECT_EQ(V4Int64Ty->getNumElements(), 4U);
   EXPECT_EQ(V4Int64Ty->getElementType()->getScalarSizeInBits(), 64U);
 
-  auto *V2Int64Ty =
-  dyn_cast(VectorType::get(Int64Ty, EltCnt / 2));
+  auto *V2Int64Ty = dyn_cast(
+  VectorType::get(Int64Ty, EltCnt.divideCoefficientBy(2)));
   ASSERT_NE(nullptr, V2Int64Ty);
   EXPECT_EQ(V2Int64Ty->getNumElements(), 2U);
   EXPECT_EQ(V2Int64Ty->getElementType()->getScalarSizeInBits(), 64U);
@@ -166,8 +166,8 @@
   EXPECT_EQ(ScV4Int64Ty->getMinNumElements(), 4U);
   EXPECT_EQ(ScV4Int64Ty->getElementType()->getScalarSizeInBits(), 64U);
 
-  auto *ScV2Int64Ty =
-  dyn_cast(VectorType::get(Int64Ty, EltCnt / 2));
+  auto *ScV2Int64Ty = dyn_cast(
+  VectorType::get(Int64Ty, EltCnt.divideCoefficientBy(2)));
   ASSERT_NE(nullptr, ScV2Int64Ty);
   EXPECT_EQ(ScV2Int64Ty->getMinNumElements(), 2U);
   EXPECT_EQ(ScV2Int64Ty->getElementType()->getScalarSizeInBits(), 64U);
Index: llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
===
--- llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
+++ llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
@@ -61,9 +61,10 @@
   EXPECT_EQ(Vnx2i32.widenIntegerVectorElementType(Ctx), Vnx2i64);
   EXPECT_EQ(Vnx4i32.getHalfNumVectorElementsVT(Ctx), Vnx2i32);
 
-  // Check that overloaded '*' and '/' operators work
+  // Check that operators work
   EXPECT_EQ(EVT::getVectorVT(Ctx, MVT::i64, EltCnt * 2), MVT::nxv4i64);
-  EXPECT_EQ(EVT::getVectorVT(Ctx, MVT::i64, EltCnt / 2), MVT::nxv1i64);
+  EXPECT_EQ(EVT::getVectorVT(Ctx, MVT::i64, EltCnt.divideCoefficientBy(2)),
+MVT::nxv1i64);
 
   // Check that float->int conversion works
   EVT Vnx2f64 = EVT::getVectorVT(Ctx, MVT::f64, ElementCount::getScalable(2));
Index: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
===
--- llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -10598,8 +10598,9 @@
   assert(VT.getVectorElementCount().getKnownMinValue() % N == 0 &&
  "invalid tuple vector type!");
 
-  EVT SplitVT = EVT::getVectorVT(*DAG.getContext(), VT.getVectorElementType(),
- VT.getVectorElementCount() / N);
+  EVT SplitVT =
+  EVT::getVectorVT(*DAG.getContext(), VT.getVectorElementType(),
+   VT.getVectorElementCount().divideCoefficientBy(N));
   assert(isTypeLegal(SplitVT));
 
   SmallVector VTs(N, SplitVT);
@@ -14393,9 +14394,7 @@
 assert((EltTy == MVT::i8 || EltTy == MVT::i16 || EltTy == MVT::i32) &&
"Sign extending from an invalid type");
 
-EVT ExtVT = EVT::getVectorVT(*DAG.getContext(),
- VT.getVectorElementType(),
- VT.getVectorElementCount() * 2);
+EVT ExtVT = VT.getDoubleNumVectorElementsVT(*DAG.getContext());
 
 SDValue Ext = DAG.getNode(ISD::SIGN_EXTEND_INREG, DL, ExtOp.getValueType(),
   ExtOp, DAG.getValueType(ExtVT));
Index: llvm/lib/CodeGen/TargetLoweringBase.cpp
===
--- llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -831,9 +831,7 @@
"Promote may not follow Expand or Promote");
 
 if (LA == TypeSplitVector)
-  return LegalizeKind(LA,
-  EVT::getVectorVT(Context, SVT.getVectorElementType(),
-   SVT.getVectorElementCount() / 2));
+  return LegalizeKind(LA, EVT(SVT).getHalfNumVectorElementsVT(Context));
 if (LA == TypeScalarizeVector)
   return LegalizeKind(LA, SVT.getVectorElementType());
 return LegalizeKind(LA, NVT);
@@ -8

[PATCH] D86065: [SVE] Make ElementCount members private

2020-08-20 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Hi @ctetreau, I agree with @efriedma that keeping the two classes distinct for
now seems best. The reason is that I spent quite a lot of time trying to unify
these classes already, and I hit a stumbling block: TypeSize has the ugly
uint64_t() cast operator, which makes unifying difficult. I didn't want to
introduce a templated cast operator that ElementCount would then have too. I
also tried making TypeSize derive from a templated parent, but that was pretty
ugly too. Perhaps once we've removed the TypeSize -> uint64_t conversion we
might be better placed to consider it?
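
A minimal sketch of that stumbling block (simplified declarations for
illustration, not LLVM's actual code):

```
struct TypeSize {
  // Legacy implicit conversion, still relied on by many fixed-size callers.
  operator uint64_t() const;
};
// If TypeSize and ElementCount shared a templated base class providing this
// operator, ElementCount would silently convert to uint64_t as well, which is
// exactly the kind of scalable-unsafe implicit use we are trying to remove.
```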




Comment at: llvm/include/llvm/Support/TypeSize.h:56
 
+  friend bool operator>(const ElementCount &LHS, const ElementCount &RHS) {
+assert(LHS.Scalable == RHS.Scalable &&

ctetreau wrote:
> fpetrogalli wrote:
> > I think that @ctetreau is right on 
> > https://reviews.llvm.org/D85794#inline-793909. We should not overload a 
> > comparison operator on this class because the set it represent it cannot be 
> > ordered.
> > 
> > Chris suggests an approach of writing a static function that can be used as 
> > a comparison operator,  so that we can make it explicit of what kind of 
> > comparison we  are doing. 
> In C++, it's common to overload the comparison operators for the purposes of 
> being able to std::sort and use ordered sets. Normally, I would be OK with 
> such usages. However, since `ElementCount` is basically a numeric type, and 
> they only have a partial ordering, I think this is dangerous. I'm concerned 
> that this will result in more bugs whereby somebody didn't remember that 
> vectors can be scalable.
> 
> I don't have a strong opinion what the comparator function should be called, 
> but I strongly prefer that it not be a comparison operator.
Hi @ctetreau, yeah, I understand. The reason I chose to use operators was
simply to be consistent with what we have already in TypeSize. Also, we have
existing "==" and "!=" operators in ElementCount too, although these are
essentially testing that two ElementCounts are identical, i.e. for two given
polynomials (a + bx) and (c + dx) we're essentially asking if both a == c and
b == d.

If I introduce a new comparison function, I'll probably keep the asserts in for
now, but in general we can do better than simply asserting if something is
scalable or not. For example, we know that (vscale * 4) is definitely >= 4
because vscale is at least 1. I'm just not sure if we have that need yet.
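
To illustrate that last point, here is a hypothetical comparison helper (not
an existing LLVM API, purely a sketch of doing better than asserting, relying
on vscale >= 1):

```
static bool isKnownGE(ElementCount LHS, ElementCount RHS) {
  // Same scalability: any vscale factor cancels, so compare minimum counts.
  if (LHS.isScalable() == RHS.isScalable())
    return LHS.getKnownMinValue() >= RHS.getKnownMinValue();
  // (vscale * a) >= b holds for every vscale >= 1 whenever a >= b.
  if (LHS.isScalable())
    return LHS.getKnownMinValue() >= RHS.getKnownMinValue();
  // A fixed count can't be proven >= a scalable one (vscale is unbounded).
  return false;
}
```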


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86065/new/

https://reviews.llvm.org/D86065



[PATCH] D86065: [SVE] Make ElementCount members private

2020-08-21 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 287003.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86065/new/

https://reviews.llvm.org/D86065

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CodeGenTypes.cpp
  llvm/include/llvm/Analysis/VectorUtils.h
  llvm/include/llvm/CodeGen/ValueTypes.h
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/DerivedTypes.h
  llvm/include/llvm/IR/Instructions.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/include/llvm/Support/TypeSize.h
  llvm/lib/Analysis/InstructionSimplify.cpp
  llvm/lib/Analysis/VFABIDemangling.cpp
  llvm/lib/Analysis/ValueTracking.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/CodeGenPrepare.cpp
  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/lib/CodeGen/TargetLoweringBase.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/ConstantFold.cpp
  llvm/lib/IR/Constants.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/IRBuilder.cpp
  llvm/lib/IR/Instructions.cpp
  llvm/lib/IR/IntrinsicInst.cpp
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
  llvm/lib/Transforms/Utils/FunctionComparator.cpp
  llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
  llvm/unittests/IR/VectorTypesTest.cpp

Index: llvm/unittests/IR/VectorTypesTest.cpp
===
--- llvm/unittests/IR/VectorTypesTest.cpp
+++ llvm/unittests/IR/VectorTypesTest.cpp
@@ -119,8 +119,8 @@
   EXPECT_EQ(ConvTy->getElementType()->getScalarSizeInBits(), 64U);
 
   EltCnt = V8Int64Ty->getElementCount();
-  EXPECT_EQ(EltCnt.Min, 8U);
-  ASSERT_FALSE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
+  ASSERT_FALSE(EltCnt.isScalable());
 }
 
 TEST(VectorTypesTest, Scalable) {
@@ -215,8 +215,8 @@
   EXPECT_EQ(ConvTy->getElementType()->getScalarSizeInBits(), 64U);
 
   EltCnt = ScV8Int64Ty->getElementCount();
-  EXPECT_EQ(EltCnt.Min, 8U);
-  ASSERT_TRUE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
+  ASSERT_TRUE(EltCnt.isScalable());
 }
 
 TEST(VectorTypesTest, BaseVectorType) {
@@ -250,7 +250,7 @@
 // test I == J
 VectorType *VI = VTys[I];
 ElementCount ECI = VI->getElementCount();
-EXPECT_EQ(isa(VI), ECI.Scalable);
+EXPECT_EQ(isa(VI), ECI.isScalable());
 
 for (size_t J = I + 1, JEnd = VTys.size(); J < JEnd; ++J) {
   // test I < J
Index: llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
===
--- llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
+++ llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
@@ -71,8 +71,8 @@
 
   // Check fields inside llvm::ElementCount
   EltCnt = Vnx4i32.getVectorElementCount();
-  EXPECT_EQ(EltCnt.Min, 4U);
-  ASSERT_TRUE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 4U);
+  ASSERT_TRUE(EltCnt.isScalable());
 
   // Check that fixed-length vector types aren't scalable.
   EVT V8i32 = EVT::getVectorVT(Ctx, MVT::i32, 8);
@@ -82,8 +82,8 @@
 
   // Check that llvm::ElementCount works for fixed-length types.
   EltCnt = V8i32.getVectorElementCount();
-  EXPECT_EQ(EltCnt.Min, 8U);
-  ASSERT_FALSE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
+  ASSERT_FALSE(EltCnt.isScalable());
 }
 
 TEST(ScalableVectorMVTsTest, IRToVTTranslation) {
Index: llvm/lib/Transforms/Utils/FunctionComparator.cpp
===
--- llvm/lib/Transforms/Utils/FunctionComparator.cpp
+++ llvm/lib/Transforms/Utils/FunctionComparator.cpp
@@ -488,12 +488,13 @@
   case Type::ScalableVectorTyID: {
 auto *STyL = cast(TyL);
 auto *STyR = cast(TyR);
-if (STyL->getElementCount().Scalable != STyR->getElementCount().Scalable)
-  return cmpNumbers(STyL->getElementCount().Scalable,
-STyR->getElementCount().Scalable);
-if (STyL->getElementCount().Min != STyR->getElementCount().Min)
-  return cmpNumbers(STyL->getElementCount().Min,
-STyR->getElementCount().Min);
+if (STyL->getElementCount().isScalable() !=
+STyR->getElementCount().isScalable())
+  return cmpNumbers(STyL->getElementCount().isScalable(),
+STyR->getElementCount().isScalable());
+if (STyL->getElementCount() != STyR->getElementCount())
+  return cmpNumbers(STyL->getElementCount().getKnownMinValue(),
+STyR->getElementCount().getKnownMinValue());
 return cmpTypes(STyL->getElementType(), STyR->getElementType());
   }
   }
Index: llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
===
--- llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
+++ llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
@@ -34

[PATCH] D86065: [SVE] Make ElementCount members private

2020-08-25 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Hi @fpetrogalli, if you don't mind, I think I'll stick with Paul's idea for ogt
because it matches the IR neatly, i.e. "fcmp ogt". Also, for me personally,
it's much simpler and more intuitive.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86065/new/

https://reviews.llvm.org/D86065



[PATCH] D86065: [SVE] Make ElementCount members private

2020-08-25 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 287672.
david-arm added a comment.
Herald added a subscriber: rogfer01.

- Changed the comparison function from gt to ogt, and added an olt (less than)
comparison function too.
- Instead of adding the ">>=" operator I've added "/=", as I think this is more
common. In places where ">>= 1" was used we now do "/= 2".
- After rebasing it was necessary to add a "*=" operator too, for the Loop
Vectorizer.
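
For example (illustrative only, using the operations added in this patch):

```
ElementCount EC = ElementCount::getScalable(8); // represents vscale x 8
EC /= 2; // previously written as EC >>= 1; EC is now vscale x 4
EC *= 4; // EC is now vscale x 16
```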


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86065/new/

https://reviews.llvm.org/D86065

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CodeGenTypes.cpp
  llvm/include/llvm/Analysis/TargetTransformInfo.h
  llvm/include/llvm/Analysis/VectorUtils.h
  llvm/include/llvm/CodeGen/ValueTypes.h
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/DerivedTypes.h
  llvm/include/llvm/IR/Instructions.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/include/llvm/Support/TypeSize.h
  llvm/lib/Analysis/InstructionSimplify.cpp
  llvm/lib/Analysis/VFABIDemangling.cpp
  llvm/lib/Analysis/ValueTracking.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/CodeGenPrepare.cpp
  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/lib/CodeGen/TargetLoweringBase.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/ConstantFold.cpp
  llvm/lib/IR/Constants.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/IRBuilder.cpp
  llvm/lib/IR/Instructions.cpp
  llvm/lib/IR/IntrinsicInst.cpp
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
  llvm/lib/Transforms/Utils/FunctionComparator.cpp
  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  llvm/lib/Transforms/Vectorize/VPlan.cpp
  llvm/lib/Transforms/Vectorize/VPlan.h
  llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
  llvm/unittests/IR/VectorTypesTest.cpp

Index: llvm/unittests/IR/VectorTypesTest.cpp
===
--- llvm/unittests/IR/VectorTypesTest.cpp
+++ llvm/unittests/IR/VectorTypesTest.cpp
@@ -119,8 +119,8 @@
   EXPECT_EQ(ConvTy->getElementType()->getScalarSizeInBits(), 64U);
 
   EltCnt = V8Int64Ty->getElementCount();
-  EXPECT_EQ(EltCnt.Min, 8U);
-  ASSERT_FALSE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
+  ASSERT_FALSE(EltCnt.isScalable());
 }
 
 TEST(VectorTypesTest, Scalable) {
@@ -215,8 +215,8 @@
   EXPECT_EQ(ConvTy->getElementType()->getScalarSizeInBits(), 64U);
 
   EltCnt = ScV8Int64Ty->getElementCount();
-  EXPECT_EQ(EltCnt.Min, 8U);
-  ASSERT_TRUE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
+  ASSERT_TRUE(EltCnt.isScalable());
 }
 
 TEST(VectorTypesTest, BaseVectorType) {
@@ -250,7 +250,7 @@
 // test I == J
 VectorType *VI = VTys[I];
 ElementCount ECI = VI->getElementCount();
-EXPECT_EQ(isa(VI), ECI.Scalable);
+EXPECT_EQ(isa(VI), ECI.isScalable());
 
 for (size_t J = I + 1, JEnd = VTys.size(); J < JEnd; ++J) {
   // test I < J
Index: llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
===
--- llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
+++ llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
@@ -71,8 +71,8 @@
 
   // Check fields inside llvm::ElementCount
   EltCnt = Vnx4i32.getVectorElementCount();
-  EXPECT_EQ(EltCnt.Min, 4U);
-  ASSERT_TRUE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 4U);
+  ASSERT_TRUE(EltCnt.isScalable());
 
   // Check that fixed-length vector types aren't scalable.
   EVT V8i32 = EVT::getVectorVT(Ctx, MVT::i32, 8);
@@ -82,8 +82,8 @@
 
   // Check that llvm::ElementCount works for fixed-length types.
   EltCnt = V8i32.getVectorElementCount();
-  EXPECT_EQ(EltCnt.Min, 8U);
-  ASSERT_FALSE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
+  ASSERT_FALSE(EltCnt.isScalable());
 }
 
 TEST(ScalableVectorMVTsTest, IRToVTTranslation) {
Index: llvm/lib/Transforms/Vectorize/VPlan.h
===
--- llvm/lib/Transforms/Vectorize/VPlan.h
+++ llvm/lib/Transforms/Vectorize/VPlan.h
@@ -151,14 +151,15 @@
   /// \return True if the map has a scalar entry for \p Key and \p Instance.
   bool hasScalarValue(Value *Key, const VPIteration &Instance) const {
 assert(Instance.Part < UF && "Queried Scalar Part is too large.");
-assert(Instance.Lane < VF.Min && "Queried Scalar Lane is too large.");
-assert(!VF.Scalable && "VF is assumed to be non scalable.");
+assert(Instance.Lane < VF.getKnownMinValue() &&
+   "Queried Scalar Lane is too large.");
+assert(!VF.isScalable() && "VF is assumed to be non scalable.");
 
 if (!hasAnyScalarValue(Key))
   return false;
 const ScalarParts &Entry = ScalarMapStorage.find(Key)->second;
 assert(Entry.size

[PATCH] D86065: [SVE] Make ElementCount members private

2020-08-27 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Hi @ctetreau, ok for now I'm going to completely remove the operators and
revert the code using those operators to how it was before. I'm not sure
what you mean about the predicate functions, so I've left those for now,
since they aren't needed for this patch. The purpose of this patch was
originally supposed to be mechanical anyway - just making the members
private. I only added the operators as an afterthought, to be consistent
with how TypeSize dealt with the identical problem. For what it's worth, I
believe that GCC solved this exact same problem by adding two kinds of
comparison functions - one set that absolutely required an answer to
">,<,>=,<=" and asserted if it wasn't known at compile time, and another
set that returned an additional boolean value indicating whether the answer
was known or not. Perhaps my knowledge is out of date, but I believe this
was the accepted solution and it seemed to work well.
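
A minimal C++ sketch of that two-flavour scheme (the names and shapes here
are hypothetical, not GCC's actual API):

  #include <cassert>
  #include <optional>

  struct EC {
    unsigned Min;  // known minimum element count
    bool Scalable; // true means the real count is Min * vscale

    // Flavour 1: demand an answer and assert if it isn't knowable at
    // compile time (mixed fixed/scalable order depends on vscale).
    bool strictGT(const EC &RHS) const {
      assert(Scalable == RHS.Scalable && "order unknown at compile time");
      return Min > RHS.Min; // same vscale factor on both sides
    }

    // Flavour 2: additionally report whether the answer is known.
    std::optional<bool> knownGT(const EC &RHS) const {
      if (Scalable == RHS.Scalable)
        return Min > RHS.Min;
      return std::nullopt; // depends on the runtime value of vscale
    }
  };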


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86065/new/

https://reviews.llvm.org/D86065


[PATCH] D86065: [SVE] Make ElementCount members private

2020-08-27 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 288260.
david-arm edited the summary of this revision.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86065/new/

https://reviews.llvm.org/D86065

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CodeGenTypes.cpp
  llvm/include/llvm/Analysis/TargetTransformInfo.h
  llvm/include/llvm/Analysis/VectorUtils.h
  llvm/include/llvm/CodeGen/ValueTypes.h
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/DerivedTypes.h
  llvm/include/llvm/IR/Instructions.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/include/llvm/Support/TypeSize.h
  llvm/lib/Analysis/InstructionSimplify.cpp
  llvm/lib/Analysis/VFABIDemangling.cpp
  llvm/lib/Analysis/ValueTracking.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/CodeGenPrepare.cpp
  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/lib/CodeGen/TargetLoweringBase.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/ConstantFold.cpp
  llvm/lib/IR/Constants.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/IRBuilder.cpp
  llvm/lib/IR/Instructions.cpp
  llvm/lib/IR/IntrinsicInst.cpp
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
  llvm/lib/Transforms/Utils/FunctionComparator.cpp
  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  llvm/lib/Transforms/Vectorize/VPlan.cpp
  llvm/lib/Transforms/Vectorize/VPlan.h
  llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
  llvm/unittests/IR/VectorTypesTest.cpp

Index: llvm/unittests/IR/VectorTypesTest.cpp
===
--- llvm/unittests/IR/VectorTypesTest.cpp
+++ llvm/unittests/IR/VectorTypesTest.cpp
@@ -119,8 +119,8 @@
   EXPECT_EQ(ConvTy->getElementType()->getScalarSizeInBits(), 64U);
 
   EltCnt = V8Int64Ty->getElementCount();
-  EXPECT_EQ(EltCnt.Min, 8U);
-  ASSERT_FALSE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
+  ASSERT_FALSE(EltCnt.isScalable());
 }
 
 TEST(VectorTypesTest, Scalable) {
@@ -215,8 +215,8 @@
   EXPECT_EQ(ConvTy->getElementType()->getScalarSizeInBits(), 64U);
 
   EltCnt = ScV8Int64Ty->getElementCount();
-  EXPECT_EQ(EltCnt.Min, 8U);
-  ASSERT_TRUE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
+  ASSERT_TRUE(EltCnt.isScalable());
 }
 
 TEST(VectorTypesTest, BaseVectorType) {
@@ -250,7 +250,7 @@
 // test I == J
 VectorType *VI = VTys[I];
 ElementCount ECI = VI->getElementCount();
-EXPECT_EQ(isa(VI), ECI.Scalable);
+EXPECT_EQ(isa(VI), ECI.isScalable());
 
 for (size_t J = I + 1, JEnd = VTys.size(); J < JEnd; ++J) {
   // test I < J
Index: llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
===
--- llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
+++ llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
@@ -71,8 +71,8 @@
 
   // Check fields inside llvm::ElementCount
   EltCnt = Vnx4i32.getVectorElementCount();
-  EXPECT_EQ(EltCnt.Min, 4U);
-  ASSERT_TRUE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 4U);
+  ASSERT_TRUE(EltCnt.isScalable());
 
   // Check that fixed-length vector types aren't scalable.
   EVT V8i32 = EVT::getVectorVT(Ctx, MVT::i32, 8);
@@ -82,8 +82,8 @@
 
   // Check that llvm::ElementCount works for fixed-length types.
   EltCnt = V8i32.getVectorElementCount();
-  EXPECT_EQ(EltCnt.Min, 8U);
-  ASSERT_FALSE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
+  ASSERT_FALSE(EltCnt.isScalable());
 }
 
 TEST(ScalableVectorMVTsTest, IRToVTTranslation) {
Index: llvm/lib/Transforms/Vectorize/VPlan.h
===
--- llvm/lib/Transforms/Vectorize/VPlan.h
+++ llvm/lib/Transforms/Vectorize/VPlan.h
@@ -151,14 +151,15 @@
   /// \return True if the map has a scalar entry for \p Key and \p Instance.
   bool hasScalarValue(Value *Key, const VPIteration &Instance) const {
 assert(Instance.Part < UF && "Queried Scalar Part is too large.");
-assert(Instance.Lane < VF.Min && "Queried Scalar Lane is too large.");
-assert(!VF.Scalable && "VF is assumed to be non scalable.");
+assert(Instance.Lane < VF.getKnownMinValue() &&
+   "Queried Scalar Lane is too large.");
+assert(!VF.isScalable() && "VF is assumed to be non scalable.");
 
 if (!hasAnyScalarValue(Key))
   return false;
 const ScalarParts &Entry = ScalarMapStorage.find(Key)->second;
 assert(Entry.size() == UF && "ScalarParts has wrong dimensions.");
-assert(Entry[Instance.Part].size() == VF.Min &&
+assert(Entry[Instance.Part].size() == VF.getKnownMinValue() &&
"ScalarParts has wrong dimensions.");
 return Entry[Instance.Part][Instance.Lane] != nullptr;
   }
@@ -197,7 +198,7 @@
   // TODO: Consider storing uniform value

[PATCH] D86065: [SVE] Make ElementCount members private

2020-08-27 Thread David Sherwood via Phabricator via cfe-commits
david-arm added inline comments.



Comment at: llvm/include/llvm/Support/TypeSize.h:108
+
+  bool isPowerOf2() const { return isPowerOf2_32(Min); }
 };

paulwalker-arm wrote:
> I don't believe this is safe.  For example we know SVE supported vector 
> lengths only have to be a multiple of 128bits.  So for scalable vectors we 
> cannot know the element count is a power of 2 unless we perform a runtime 
> check.
Ok, but if that's true, how is the code in
llvm/lib/CodeGen/TargetLoweringBase.cpp ever safe for scalable vectors? I
thought that the question being asked wasn't whether the total size was a
power of 2, but whether or not it was safe to split the vector. The answer
should be the same even if vscale is 3, for example. I thought the problem
here is that the legaliser simply needs to know in what way it should break
down different types, and that whatever approach it took would work when
scaled up. The vector breakdown algorithm relies upon having an answer here
- perhaps this is just a case of changing the question and the name of the
function?
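
To make the hazard concrete, here is a small self-contained example
(illustrative only, not code from this patch). For a scalable type the
runtime element count is Min * vscale, and since SVE only requires the
register length to be a multiple of 128 bits, vscale may be any positive
integer, e.g. 3:

  #include <cstdio>

  int main() {
    unsigned KnownMin = 4; // e.g. <vscale x 4 x i32>: Min is a power of 2
    for (unsigned VScale : {1u, 2u, 3u}) {
      unsigned Count = KnownMin * VScale;
      bool Pow2 = Count && !(Count & (Count - 1));
      std::printf("vscale=%u -> %u elements, power of 2: %s\n", VScale,
                  Count, Pow2 ? "yes" : "no");
    }
  }

With vscale=3 the vector holds 12 elements, so a power-of-2 minimum does
not make the runtime count a power of 2.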


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86065/new/

https://reviews.llvm.org/D86065


[PATCH] D86065: [SVE] Make ElementCount members private

2020-08-27 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 288370.
david-arm added a comment.

- Removed the isPowerOf2() function since it is potentially misleading -
it's only the known minimum value that we're checking.
- Renamed isEven to isKnownEven to make it clear that returning true
indicates we definitely know the total number of elements is even, whereas
returning false could mean either that the element count is odd or that we
don't know (see the sketch below).
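
A hedged sketch of what the new predicate amounts to (the committed
signature may differ):

  // True only when the total element count is provably even. For a
  // scalable count the total is Min * vscale: an even Min stays even
  // for every vscale, but an odd Min might still be even at runtime
  // (e.g. vscale == 2), so the only safe answer there is false.
  bool isKnownEven(unsigned Min, bool Scalable) {
    (void)Scalable; // the rule is the same either way
    return (Min & 1) == 0;
  }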


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86065/new/

https://reviews.llvm.org/D86065

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CodeGenTypes.cpp
  llvm/include/llvm/Analysis/TargetTransformInfo.h
  llvm/include/llvm/Analysis/VectorUtils.h
  llvm/include/llvm/CodeGen/ValueTypes.h
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/DerivedTypes.h
  llvm/include/llvm/IR/Instructions.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/include/llvm/Support/TypeSize.h
  llvm/lib/Analysis/InstructionSimplify.cpp
  llvm/lib/Analysis/VFABIDemangling.cpp
  llvm/lib/Analysis/ValueTracking.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/CodeGenPrepare.cpp
  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/lib/CodeGen/TargetLoweringBase.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/ConstantFold.cpp
  llvm/lib/IR/Constants.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/IRBuilder.cpp
  llvm/lib/IR/Instructions.cpp
  llvm/lib/IR/IntrinsicInst.cpp
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
  llvm/lib/Transforms/Utils/FunctionComparator.cpp
  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  llvm/lib/Transforms/Vectorize/VPlan.cpp
  llvm/lib/Transforms/Vectorize/VPlan.h
  llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
  llvm/unittests/IR/VectorTypesTest.cpp

Index: llvm/unittests/IR/VectorTypesTest.cpp
===
--- llvm/unittests/IR/VectorTypesTest.cpp
+++ llvm/unittests/IR/VectorTypesTest.cpp
@@ -119,8 +119,8 @@
   EXPECT_EQ(ConvTy->getElementType()->getScalarSizeInBits(), 64U);
 
   EltCnt = V8Int64Ty->getElementCount();
-  EXPECT_EQ(EltCnt.Min, 8U);
-  ASSERT_FALSE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
+  ASSERT_FALSE(EltCnt.isScalable());
 }
 
 TEST(VectorTypesTest, Scalable) {
@@ -215,8 +215,8 @@
   EXPECT_EQ(ConvTy->getElementType()->getScalarSizeInBits(), 64U);
 
   EltCnt = ScV8Int64Ty->getElementCount();
-  EXPECT_EQ(EltCnt.Min, 8U);
-  ASSERT_TRUE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
+  ASSERT_TRUE(EltCnt.isScalable());
 }
 
 TEST(VectorTypesTest, BaseVectorType) {
@@ -250,7 +250,7 @@
 // test I == J
 VectorType *VI = VTys[I];
 ElementCount ECI = VI->getElementCount();
-EXPECT_EQ(isa(VI), ECI.Scalable);
+EXPECT_EQ(isa(VI), ECI.isScalable());
 
 for (size_t J = I + 1, JEnd = VTys.size(); J < JEnd; ++J) {
   // test I < J
Index: llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
===
--- llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
+++ llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
@@ -71,8 +71,8 @@
 
   // Check fields inside llvm::ElementCount
   EltCnt = Vnx4i32.getVectorElementCount();
-  EXPECT_EQ(EltCnt.Min, 4U);
-  ASSERT_TRUE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 4U);
+  ASSERT_TRUE(EltCnt.isScalable());
 
   // Check that fixed-length vector types aren't scalable.
   EVT V8i32 = EVT::getVectorVT(Ctx, MVT::i32, 8);
@@ -82,8 +82,8 @@
 
   // Check that llvm::ElementCount works for fixed-length types.
   EltCnt = V8i32.getVectorElementCount();
-  EXPECT_EQ(EltCnt.Min, 8U);
-  ASSERT_FALSE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
+  ASSERT_FALSE(EltCnt.isScalable());
 }
 
 TEST(ScalableVectorMVTsTest, IRToVTTranslation) {
Index: llvm/lib/Transforms/Vectorize/VPlan.h
===
--- llvm/lib/Transforms/Vectorize/VPlan.h
+++ llvm/lib/Transforms/Vectorize/VPlan.h
@@ -151,14 +151,15 @@
   /// \return True if the map has a scalar entry for \p Key and \p Instance.
   bool hasScalarValue(Value *Key, const VPIteration &Instance) const {
 assert(Instance.Part < UF && "Queried Scalar Part is too large.");
-assert(Instance.Lane < VF.Min && "Queried Scalar Lane is too large.");
-assert(!VF.Scalable && "VF is assumed to be non scalable.");
+assert(Instance.Lane < VF.getKnownMinValue() &&
+   "Queried Scalar Lane is too large.");
+assert(!VF.isScalable() && "VF is assumed to be non scalable.");
 
 if (!hasAnyScalarValue(Key))
   return false;
 const ScalarParts &Entry = ScalarMapStorage.find(Key)->second;
 assert(Entry.size(

[PATCH] D86065: [SVE] Make ElementCount members private

2020-08-28 Thread David Sherwood via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGf4257c5832aa: [SVE] Make ElementCount members private 
(authored by david-arm).

Changed prior to commit:
  https://reviews.llvm.org/D86065?vs=288370&id=288594#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86065/new/

https://reviews.llvm.org/D86065

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGDebugInfo.cpp
  clang/lib/CodeGen/CodeGenTypes.cpp
  llvm/include/llvm/Analysis/TargetTransformInfo.h
  llvm/include/llvm/Analysis/VectorUtils.h
  llvm/include/llvm/CodeGen/ValueTypes.h
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/DerivedTypes.h
  llvm/include/llvm/IR/Instructions.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/include/llvm/Support/TypeSize.h
  llvm/lib/Analysis/InstructionSimplify.cpp
  llvm/lib/Analysis/VFABIDemangling.cpp
  llvm/lib/Analysis/ValueTracking.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/CodeGenPrepare.cpp
  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/lib/CodeGen/TargetLoweringBase.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/ConstantFold.cpp
  llvm/lib/IR/Constants.cpp
  llvm/lib/IR/Core.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/IRBuilder.cpp
  llvm/lib/IR/Instructions.cpp
  llvm/lib/IR/IntrinsicInst.cpp
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
  llvm/lib/Transforms/Utils/FunctionComparator.cpp
  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  llvm/lib/Transforms/Vectorize/VPlan.cpp
  llvm/lib/Transforms/Vectorize/VPlan.h
  llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
  llvm/unittests/IR/VectorTypesTest.cpp

Index: llvm/unittests/IR/VectorTypesTest.cpp
===
--- llvm/unittests/IR/VectorTypesTest.cpp
+++ llvm/unittests/IR/VectorTypesTest.cpp
@@ -119,8 +119,8 @@
   EXPECT_EQ(ConvTy->getElementType()->getScalarSizeInBits(), 64U);
 
   EltCnt = V8Int64Ty->getElementCount();
-  EXPECT_EQ(EltCnt.Min, 8U);
-  ASSERT_FALSE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
+  ASSERT_FALSE(EltCnt.isScalable());
 }
 
 TEST(VectorTypesTest, Scalable) {
@@ -215,8 +215,8 @@
   EXPECT_EQ(ConvTy->getElementType()->getScalarSizeInBits(), 64U);
 
   EltCnt = ScV8Int64Ty->getElementCount();
-  EXPECT_EQ(EltCnt.Min, 8U);
-  ASSERT_TRUE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
+  ASSERT_TRUE(EltCnt.isScalable());
 }
 
 TEST(VectorTypesTest, BaseVectorType) {
@@ -250,7 +250,7 @@
 // test I == J
 VectorType *VI = VTys[I];
 ElementCount ECI = VI->getElementCount();
-EXPECT_EQ(isa(VI), ECI.Scalable);
+EXPECT_EQ(isa(VI), ECI.isScalable());
 
 for (size_t J = I + 1, JEnd = VTys.size(); J < JEnd; ++J) {
   // test I < J
Index: llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
===
--- llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
+++ llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
@@ -71,8 +71,8 @@
 
   // Check fields inside llvm::ElementCount
   EltCnt = Vnx4i32.getVectorElementCount();
-  EXPECT_EQ(EltCnt.Min, 4U);
-  ASSERT_TRUE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 4U);
+  ASSERT_TRUE(EltCnt.isScalable());
 
   // Check that fixed-length vector types aren't scalable.
   EVT V8i32 = EVT::getVectorVT(Ctx, MVT::i32, 8);
@@ -82,8 +82,8 @@
 
   // Check that llvm::ElementCount works for fixed-length types.
   EltCnt = V8i32.getVectorElementCount();
-  EXPECT_EQ(EltCnt.Min, 8U);
-  ASSERT_FALSE(EltCnt.Scalable);
+  EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
+  ASSERT_FALSE(EltCnt.isScalable());
 }
 
 TEST(ScalableVectorMVTsTest, IRToVTTranslation) {
Index: llvm/lib/Transforms/Vectorize/VPlan.h
===
--- llvm/lib/Transforms/Vectorize/VPlan.h
+++ llvm/lib/Transforms/Vectorize/VPlan.h
@@ -151,14 +151,15 @@
   /// \return True if the map has a scalar entry for \p Key and \p Instance.
   bool hasScalarValue(Value *Key, const VPIteration &Instance) const {
 assert(Instance.Part < UF && "Queried Scalar Part is too large.");
-assert(Instance.Lane < VF.Min && "Queried Scalar Lane is too large.");
-assert(!VF.Scalable && "VF is assumed to be non scalable.");
+assert(Instance.Lane < VF.getKnownMinValue() &&
+   "Queried Scalar Lane is too large.");
+assert(!VF.isScalable() && "VF is assumed to be non scalable.");
 
 if (!hasAnyScalarValue(Key))
   return false;
 const ScalarParts &Entry = ScalarMapStorage.find(Key)->second;
 assert(Entry.size() == UF && "ScalarParts has wrong dimensions.");
-assert(Entry[Instance.Part].size() ==

[PATCH] D86720: [clang][aarch64] Drop experimental from __ARM_FEATURE_SVE_BITS macro

2020-09-02 Thread David Sherwood via Phabricator via cfe-commits
david-arm accepted this revision.
david-arm added a comment.
This revision is now accepted and ready to land.

LGTM




Comment at: clang/lib/Basic/Targets/AArch64.cpp:381
   if (Opts.ArmSveVectorBits)
-Builder.defineMacro("__ARM_FEATURE_SVE_BITS_EXPERIMENTAL",
+Builder.defineMacro("__ARM_FEATURE_SVE_BITS",
 Twine(Opts.ArmSveVectorBits));

nit: Perhaps just reformat the code before submitting?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86720/new/

https://reviews.llvm.org/D86720


[PATCH] D89031: [SVE] Add support to vectorize_width loop pragma for scalable vectors

2020-12-18 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 312802.
david-arm edited the summary of this revision.
Herald added a subscriber: NickHung.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89031/new/

https://reviews.llvm.org/D89031

Files:
  clang/docs/LanguageExtensions.rst
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/DiagnosticParseKinds.td
  clang/lib/AST/AttrImpl.cpp
  clang/lib/CodeGen/CGLoopInfo.cpp
  clang/lib/CodeGen/CGLoopInfo.h
  clang/lib/Parse/ParsePragma.cpp
  clang/lib/Sema/SemaStmtAttr.cpp
  clang/test/CodeGenCXX/pragma-loop-pr27643.cpp
  clang/test/CodeGenCXX/pragma-loop.cpp
  clang/test/Parser/pragma-loop.cpp

Index: clang/test/Parser/pragma-loop.cpp
===
--- clang/test/Parser/pragma-loop.cpp
+++ clang/test/Parser/pragma-loop.cpp
@@ -60,7 +60,8 @@
 
 template 
 void test_nontype_template_badarg(int *List, int Length) {
-  /* expected-error {{use of undeclared identifier 'Vec'}} */ #pragma clang loop vectorize_width(Vec) interleave_count(I)
+  /* expected-error {{use of undeclared identifier 'Vec'}} */ #pragma clang loop vectorize_width(Vec) interleave_count(I) /*
+ expected-note {{vectorize_width loop hint malformed; use vectorize_width(X, 'fixed' or 'scalable') where X is an integer, or vectorize_width('fixed' or 'scalable')}} */
   /* expected-error {{use of undeclared identifier 'Int'}} */ #pragma clang loop vectorize_width(V) interleave_count(Int)
   for (int i = 0; i < Length; i++) {
 List[i] = i;
@@ -189,12 +190,15 @@
 /* expected-warning {{extra tokens at end of '#pragma clang loop'}} */ #pragma clang loop vectorize_width(1 +) 1
 /* expected-warning {{extra tokens at end of '#pragma clang loop'}} */ #pragma clang loop vectorize_width(1) +1
 const int VV = 4;
-/* expected-error {{expected expression}} */ #pragma clang loop vectorize_width(VV +/ 2)
-/* expected-error {{use of undeclared identifier 'undefined'}} */ #pragma clang loop vectorize_width(VV+undefined)
+/* expected-error {{expected expression}} */ #pragma clang loop vectorize_width(VV +/ 2) /*
+   expected-note {{vectorize_width loop hint malformed; use vectorize_width(X, 'fixed' or 'scalable') where X is an integer, or vectorize_width('fixed' or 'scalable')}} */
+/* expected-error {{use of undeclared identifier 'undefined'}} */ #pragma clang loop vectorize_width(VV+undefined) /*
+   expected-note {{vectorize_width loop hint malformed; use vectorize_width(X, 'fixed' or 'scalable') where X is an integer, or vectorize_width('fixed' or 'scalable')}} */
 /* expected-error {{expected ')'}} */ #pragma clang loop vectorize_width(1+(^*/2 * ()
 /* expected-warning {{extra tokens at end of '#pragma clang loop' - ignored}} */ #pragma clang loop vectorize_width(1+(-0[0]))
 
-/* expected-error {{use of undeclared identifier 'badvalue'}} */ #pragma clang loop vectorize_width(badvalue)
+/* expected-error {{use of undeclared identifier 'badvalue'}} */ #pragma clang loop vectorize_width(badvalue) /*
+   expected-note {{vectorize_width loop hint malformed; use vectorize_width(X, 'fixed' or 'scalable') where X is an integer, or vectorize_width('fixed' or 'scalable')}} */
 /* expected-error {{use of undeclared identifier 'badvalue'}} */ #pragma clang loop interleave_count(badvalue)
 /* expected-error {{use of undeclared identifier 'badvalue'}} */ #pragma clang loop unroll_count(badvalue)
   while (i-6 < Length) {
@@ -215,7 +219,7 @@
 /* expected-error {{invalid argument; expected 'enable', 'assume_safety' or 'disable'}} */ #pragma clang loop interleave(*)
 /* expected-error {{invalid argument; expected 'enable', 'full' or 'disable'}} */ #pragma clang loop unroll(=)
 /* expected-error {{invalid argument; expected 'enable' or 'disable'}} */ #pragma clang loop distribute(+)
-/* expected-error {{type name requires a specifier or qualifier}} expected-error {{expected expression}} */ #pragma clang loop vectorize_width(^)
+/* expected-error {{type name requires a specifier or qualifier}} expected-error {{expected expression}} */ #pragma clang loop vectorize_width(^) /* expected-note {{vectorize_width loop hint malformed; use vectorize_width(X, 'fixed' or 'scalable') where X is an integer, or vectorize_width('fixed' or 'scalable')}} */
 /* expected-error {{expected expression}} expected-error {{expected expression}} */ #pragma clang loop interleave_count(/)
 /* expected-error {{expected expression}} expected-error {{expected expression}} */ #pragma clang loop unroll_count(==)
   while (i-8 < Length) {
Index: clang/test/CodeGenCXX/pragma-loop.cpp
===
--- clang/test/CodeGenCXX/pragma-loop.cpp
+++ clang/test/CodeGenCXX/pragma-loop.cpp
@@ -158,51 +158,88 @@
   for_template_constant_expression_test(List, Length);
 }
 
+// Verify for loop is performing fixed width vectorization
+void for_test_fixed_16(int *List, int Length) {
+#pragma clang loop vectorize_width(16, fixed) interleave_count(4) unroll(disab

[PATCH] D89031: [SVE] Add support to vectorize_width loop pragma for scalable vectors

2020-12-21 Thread David Sherwood via Phabricator via cfe-commits
david-arm added inline comments.



Comment at: clang/include/clang/Basic/Attr.td:3356
  EnumArgument<"State", "LoopHintState",
   ["enable", "disable", "numeric", "assume_safety", "full"],
   ["Enable", "Disable", "Numeric", "AssumeSafety", "Full"]>,
   ["enable", "disable", "numeric", "fixed_width", "scalable_width", "assume_safety", "full"],
   ["Enable", "Disable", "Numeric", "FixedWidth", "ScalableWidth", "AssumeSafety", "Full"]>,

aaron.ballman wrote:
> Should the documentation in AttrDocs.td be updated for this change?
Hi @aaron.ballman, I had a look at LoopHintDocs in AttrDocs.td and it
didn't explicitly mention these states, i.e. "assume_safety", "numeric",
etc., so I'm not sure it's necessary to add anything there?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89031/new/

https://reviews.llvm.org/D89031


[PATCH] D89031: [SVE] Add support to vectorize_width loop pragma for scalable vectors

2020-11-04 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 302773.
david-arm marked an inline comment as done.
david-arm edited the summary of this revision.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89031/new/

https://reviews.llvm.org/D89031

Files:
  clang/docs/LanguageExtensions.rst
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/DiagnosticParseKinds.td
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/include/clang/Basic/TargetInfo.h
  clang/lib/Basic/Targets/AArch64.h
  clang/lib/CodeGen/CGLoopInfo.cpp
  clang/lib/CodeGen/CGLoopInfo.h
  clang/lib/Parse/ParsePragma.cpp
  clang/lib/Sema/SemaStmtAttr.cpp
  clang/test/CodeGenCXX/pragma-loop.cpp
  clang/test/CodeGenCXX/pragma-scalable-loop.cpp

Index: clang/test/CodeGenCXX/pragma-scalable-loop.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/pragma-scalable-loop.cpp
@@ -0,0 +1,18 @@
+// RUN: %clang_cc1 -triple aarch64-linux-gnu -target-feature +sve -std=c++11 -emit-llvm -o - %s | FileCheck %s
+
+// Verify do loop is performing scalable vectorization
+void for_test_scalable(int *List, int Length) {
+#pragma clang loop vectorize_width(16, scalable) interleave_count(4) unroll(disable) distribute(disable)
+  for (int i = 0; i < Length; i++) {
+// CHECK: br label {{.*}}, !llvm.loop ![[LOOP_1:.*]]
+List[i] = i * 2;
+  }
+}
+
+// CHECK: ![[LOOP_1]] = distinct !{![[LOOP_1]], ![[UNROLL_DISABLE:.*]], ![[DISTRIBUTE_DISABLE:.*]], ![[WIDTH_16_SCALABLE:.*]], ![[INTERLEAVE_4:.*]], ![[VECTORIZE_ENABLE:.*]]}
+// CHECK: ![[UNROLL_DISABLE]] = !{!"llvm.loop.unroll.disable"}
+// CHECK: ![[DISTRIBUTE_DISABLE]] = !{!"llvm.loop.distribute.enable", i1 false}
+// CHECK: ![[WIDTH_16_SCALABLE]] = !{!"llvm.loop.vectorize.width", ![[ELEMENT_COUNT_16_SCALABLE:.*]]}
+// CHECK: ![[ELEMENT_COUNT_16_SCALABLE]] = !{i32 16, i1 true}
+// CHECK: ![[INTERLEAVE_4]] = !{!"llvm.loop.interleave.count", i32 4}
+// CHECK: ![[VECTORIZE_ENABLE]] = !{!"llvm.loop.vectorize.enable", i1 true}
Index: clang/test/CodeGenCXX/pragma-loop.cpp
===
--- clang/test/CodeGenCXX/pragma-loop.cpp
+++ clang/test/CodeGenCXX/pragma-loop.cpp
@@ -1,4 +1,5 @@
-// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++11 -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++11 -emit-llvm -o - %s 2>%t | FileCheck %s
+// RUN: FileCheck --check-prefix=CHECK-SCALABLE %s < %t
 
 // Verify while loop is recognized after sequence of pragma clang loop directives.
 void while_test(int *List, int Length) {
@@ -158,6 +159,26 @@
   for_template_constant_expression_test(List, Length);
 }
 
+// Verify for loop is performing fixed width vectorization
+void for_test_fixed(int *List, int Length) {
+#pragma clang loop vectorize_width(16, fixed) interleave_count(4) unroll(disable) distribute(disable)
+  for (int i = 0; i < Length; i++) {
+// CHECK: br label {{.*}}, !llvm.loop ![[LOOP_15:.*]]
+List[i] = i * 2;
+  }
+}
+
+// Verify for loop rejects scalable vectorization due to lack of target support
+// CHECK-SCALABLE: ignoring scalable vectorize_width flag due to lack of target support
+void for_test_scalable(int *List, int Length) {
+#pragma clang loop vectorize_width(16, scalable) interleave_count(4) unroll(disable) distribute(disable)
+  for (int i = 0; i < Length; i++) {
+// CHECK: br label {{.*}}, !llvm.loop ![[LOOP_16:.*]]
+// CHECK-SVE: br label {{.*}}, !llvm.loop ![[LOOP_16_SVE:.*]]
+List[i] = i * 2;
+  }
+}
+
 // CHECK: ![[LOOP_1]] = distinct !{![[LOOP_1]], ![[UNROLL_FULL:.*]]}
 // CHECK: ![[UNROLL_FULL]] = !{!"llvm.loop.unroll.full"}
 
@@ -215,3 +236,8 @@
 
 // CHECK: ![[LOOP_14]] = distinct !{![[LOOP_14]], ![[WIDTH_10:.*]], ![[VECTORIZE_ENABLE]]}
 // CHECK: ![[WIDTH_10]] = !{!"llvm.loop.vectorize.width", i32 10}
+
+// CHECK: ![[LOOP_15]] = distinct !{![[LOOP_15]], ![[UNROLL_DISABLE:.*]], ![[DISTRIBUTE_DISABLE:.*]], ![[WIDTH_16_FIXED:.*]], ![[INTERLEAVE_4:.*]], ![[VECTORIZE_ENABLE:.*]]}
+// CHECK: ![[WIDTH_16_FIXED]] = !{!"llvm.loop.vectorize.width", i32 16}
+
+// CHECK: ![[LOOP_16]] = distinct !{![[LOOP_16]], ![[UNROLL_DISABLE:.*]], ![[DISTRIBUTE_DISABLE:.*]], ![[WIDTH_16_FIXED:.*]], ![[INTERLEAVE_4:.*]], ![[VECTORIZE_ENABLE:.*]]}
Index: clang/lib/Sema/SemaStmtAttr.cpp
===
--- clang/lib/Sema/SemaStmtAttr.cpp
+++ clang/lib/Sema/SemaStmtAttr.cpp
@@ -14,6 +14,7 @@
 #include "clang/Sema/SemaInternal.h"
 #include "clang/AST/ASTContext.h"
 #include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TargetInfo.h"
 #include "clang/Sema/DelayedDiagnostic.h"
 #include "clang/Sema/Lookup.h"
 #include "clang/Sema/ScopeInfo.h"
@@ -139,10 +140,21 @@
LoopHintAttr::PipelineInitiationInterval)
  .Case("distribute", LoopHintAttr::Distribute)
  .Default(LoopHintAttr::Vectorize);
-if (Option == LoopHintAttr::VectorizeWidth ||
- 

[PATCH] D89031: [SVE] Add support to vectorize_width loop pragma for scalable vectors

2020-11-05 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

I'll hold off on any more changes for now to give @fhahn a chance to reply to 
your comment @sdesmalen about the fallback behaviour when scalable 
vectorisation is unsupported.




Comment at: clang/include/clang/Basic/DiagnosticSemaKinds.td:939
+def warn_pragma_attribute_scalable_unused : Warning<
+  "ignoring scalable vectorize_width flag due to lack of target support">,
+  InGroup;

sdesmalen wrote:
> From what I can see, the vectorize_width flag is not ignored, only the 
> scalable property is. That means this should be:
>   'scalable' not supported by the target so assuming 'fixed' instead.
OK. I guess it's just that when the warning comes out it appears at the
start of the line, so I wanted to emphasise that this relates to the
scalable property passed to the vectorize_width attribute (rather than to
other attributes), as there could potentially be several pragmas on one
line. I think it would be good to mention the vectorize_width
pragma/attribute somewhere in the warning message to make it clear. I'll
see if I can reword it.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89031/new/

https://reviews.llvm.org/D89031


[PATCH] D89031: [SVE] Add support to vectorize_width loop pragma for scalable vectors

2020-11-11 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 304488.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89031/new/

https://reviews.llvm.org/D89031

Files:
  clang/docs/LanguageExtensions.rst
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/DiagnosticParseKinds.td
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/include/clang/Basic/TargetInfo.h
  clang/lib/Basic/Targets/AArch64.h
  clang/lib/CodeGen/CGLoopInfo.cpp
  clang/lib/CodeGen/CGLoopInfo.h
  clang/lib/Parse/ParsePragma.cpp
  clang/lib/Sema/SemaStmtAttr.cpp
  clang/test/CodeGenCXX/pragma-loop.cpp
  clang/test/CodeGenCXX/pragma-scalable-loop.cpp

Index: clang/test/CodeGenCXX/pragma-scalable-loop.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/pragma-scalable-loop.cpp
@@ -0,0 +1,18 @@
+// RUN: %clang_cc1 -triple aarch64-linux-gnu -target-feature +sve -std=c++11 -emit-llvm -o - %s | FileCheck %s
+
+// Verify do loop is performing scalable vectorization
+void for_test_scalable(int *List, int Length) {
+#pragma clang loop vectorize_width(16, scalable) interleave_count(4) unroll(disable) distribute(disable)
+  for (int i = 0; i < Length; i++) {
+// CHECK: br label {{.*}}, !llvm.loop ![[LOOP_1:.*]]
+List[i] = i * 2;
+  }
+}
+
+// CHECK: ![[LOOP_1]] = distinct !{![[LOOP_1]], ![[UNROLL_DISABLE:.*]], ![[DISTRIBUTE_DISABLE:.*]], ![[WIDTH_16_SCALABLE:.*]], ![[INTERLEAVE_4:.*]], ![[VECTORIZE_ENABLE:.*]]}
+// CHECK: ![[UNROLL_DISABLE]] = !{!"llvm.loop.unroll.disable"}
+// CHECK: ![[DISTRIBUTE_DISABLE]] = !{!"llvm.loop.distribute.enable", i1 false}
+// CHECK: ![[WIDTH_16_SCALABLE]] = !{!"llvm.loop.vectorize.width", ![[ELEMENT_COUNT_16_SCALABLE:.*]]}
+// CHECK: ![[ELEMENT_COUNT_16_SCALABLE]] = !{i32 16, i1 true}
+// CHECK: ![[INTERLEAVE_4]] = !{!"llvm.loop.interleave.count", i32 4}
+// CHECK: ![[VECTORIZE_ENABLE]] = !{!"llvm.loop.vectorize.enable", i1 true}
Index: clang/test/CodeGenCXX/pragma-loop.cpp
===
--- clang/test/CodeGenCXX/pragma-loop.cpp
+++ clang/test/CodeGenCXX/pragma-loop.cpp
@@ -1,4 +1,5 @@
-// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++11 -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++11 -emit-llvm -o - %s 2>%t | FileCheck %s
+// RUN: FileCheck --check-prefix=CHECK-SCALABLE %s < %t
 
 // Verify while loop is recognized after sequence of pragma clang loop directives.
 void while_test(int *List, int Length) {
@@ -158,6 +159,26 @@
   for_template_constant_expression_test(List, Length);
 }
 
+// Verify for loop is performing fixed width vectorization
+void for_test_fixed(int *List, int Length) {
+#pragma clang loop vectorize_width(16, fixed) interleave_count(4) unroll(disable) distribute(disable)
+  for (int i = 0; i < Length; i++) {
+// CHECK: br label {{.*}}, !llvm.loop ![[LOOP_15:.*]]
+List[i] = i * 2;
+  }
+}
+
+// Verify for loop rejects scalable vectorization due to lack of target support
+// CHECK-SCALABLE: warning: the 'scalable' property of #pragma vectorize_width is unsupported by the target; assuming 'fixed' instead
+void for_test_scalable(int *List, int Length) {
+#pragma clang loop vectorize_width(16, scalable) interleave_count(4) unroll(disable) distribute(disable)
+  for (int i = 0; i < Length; i++) {
+// CHECK: br label {{.*}}, !llvm.loop ![[LOOP_16:.*]]
+// CHECK-SVE: br label {{.*}}, !llvm.loop ![[LOOP_16_SVE:.*]]
+List[i] = i * 2;
+  }
+}
+
 // CHECK: ![[LOOP_1]] = distinct !{![[LOOP_1]], ![[UNROLL_FULL:.*]]}
 // CHECK: ![[UNROLL_FULL]] = !{!"llvm.loop.unroll.full"}
 
@@ -215,3 +236,8 @@
 
 // CHECK: ![[LOOP_14]] = distinct !{![[LOOP_14]], ![[WIDTH_10:.*]], ![[VECTORIZE_ENABLE]]}
 // CHECK: ![[WIDTH_10]] = !{!"llvm.loop.vectorize.width", i32 10}
+
+// CHECK: ![[LOOP_15]] = distinct !{![[LOOP_15]], ![[UNROLL_DISABLE:.*]], ![[DISTRIBUTE_DISABLE:.*]], ![[WIDTH_16_FIXED:.*]], ![[INTERLEAVE_4:.*]], ![[VECTORIZE_ENABLE:.*]]}
+// CHECK: ![[WIDTH_16_FIXED]] = !{!"llvm.loop.vectorize.width", i32 16}
+
+// CHECK: ![[LOOP_16]] = distinct !{![[LOOP_16]], ![[UNROLL_DISABLE:.*]], ![[DISTRIBUTE_DISABLE:.*]], ![[WIDTH_16_FIXED:.*]], ![[INTERLEAVE_4:.*]], ![[VECTORIZE_ENABLE:.*]]}
Index: clang/lib/Sema/SemaStmtAttr.cpp
===
--- clang/lib/Sema/SemaStmtAttr.cpp
+++ clang/lib/Sema/SemaStmtAttr.cpp
@@ -14,6 +14,7 @@
 #include "clang/Sema/SemaInternal.h"
 #include "clang/AST/ASTContext.h"
 #include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TargetInfo.h"
 #include "clang/Sema/DelayedDiagnostic.h"
 #include "clang/Sema/Lookup.h"
 #include "clang/Sema/ScopeInfo.h"
@@ -139,10 +140,21 @@
LoopHintAttr::PipelineInitiationInterval)
  .Case("distribute", LoopHintAttr::Distribute)
  .Default(LoopHintAttr::Vectorize);
-if (Option == LoopHintAttr::VectorizeWidth ||
-Option == LoopHintAttr::InterleaveCoun

[PATCH] D89031: [SVE] Add support to vectorize_width loop pragma for scalable vectors

2020-11-12 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Hi @SjoerdMeijer, given that we now support scalable vectors we thought it
made sense to be able to specify whether the user wants 'fixed' or
'scalable' vectorisation along with the vector width, although without the
additional property the default remains 'fixed'. However, what you said
about having a vectorize_scalable pragma is correct, and we intend to add a
pragma like that in a future patch.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89031/new/

https://reviews.llvm.org/D89031


[PATCH] D89031: [SVE] Add support to vectorize_width loop pragma for scalable vectors

2020-10-08 Thread David Sherwood via Phabricator via cfe-commits
david-arm created this revision.
david-arm added reviewers: sdesmalen, ctetreau, fhahn, c-rhodes.
Herald added subscribers: cfe-commits, psnobl, tschuett.
Herald added a reviewer: efriedma.
Herald added a reviewer: aaron.ballman.
Herald added a project: clang.
david-arm requested review of this revision.

This patch adds support for an optional second parameter passed to
the vectorize_width pragma, which indicates whether the user wishes
to use fixed-width or scalable vectorization. For example, the user
can now write something like:

  #pragma clang loop vectorize_width(4, fixed)

or

  #pragma clang loop vectorize_width(4, scalable)

I have added a new 'scalable_numeric' state to the LoopHintAttr class
to indicate whether the numeric vectorization width is scalable or
not. When generating IR we make use of the new format for the
llvm.loop.vectorize.width attribute that allows us to effectively pass
an ElementCount that contains the vectorization factor and a scalable
flag.

Tests were added to

  clang/test/CodeGenCXX/pragma-loop.cpp

for both the 'fixed' and 'scalable' optional parameter.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D89031

Files:
  clang/docs/LanguageExtensions.rst
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/DiagnosticParseKinds.td
  clang/lib/CodeGen/CGLoopInfo.cpp
  clang/lib/CodeGen/CGLoopInfo.h
  clang/lib/Parse/ParsePragma.cpp
  clang/lib/Sema/SemaStmtAttr.cpp
  clang/test/CodeGenCXX/pragma-loop.cpp

Index: clang/test/CodeGenCXX/pragma-loop.cpp
===
--- clang/test/CodeGenCXX/pragma-loop.cpp
+++ clang/test/CodeGenCXX/pragma-loop.cpp
@@ -158,6 +158,30 @@
   for_template_constant_expression_test(List, Length);
 }
 
+// Verify do loop is performing fixed width vectorization
+void do_test_fixed(int *List, int Length) {
+  int i = 0;
+
+#pragma clang loop vectorize_width(16, fixed) interleave_count(4) unroll(disable) distribute(disable)
+  do {
+// CHECK: br i1 {{.*}}, label {{.*}}, label {{.*}}, !llvm.loop ![[LOOP_15:.*]]
+List[i] = i * 2;
+i++;
+  } while (i < Length);
+}
+
+// Verify do loop is performing scalable vectorization
+void do_test_scalable(int *List, int Length) {
+  int i = 0;
+
+#pragma clang loop vectorize_width(16, scalable) interleave_count(4) unroll(disable) distribute(disable)
+  do {
+// CHECK: br i1 {{.*}}, label {{.*}}, label {{.*}}, !llvm.loop ![[LOOP_16:.*]]
+List[i] = i * 2;
+i++;
+  } while (i < Length);
+}
+
 // CHECK: ![[LOOP_1]] = distinct !{![[LOOP_1]], ![[UNROLL_FULL:.*]]}
 // CHECK: ![[UNROLL_FULL]] = !{!"llvm.loop.unroll.full"}
 
@@ -215,3 +239,10 @@
 
 // CHECK: ![[LOOP_14]] = distinct !{![[LOOP_14]], ![[WIDTH_10:.*]], ![[VECTORIZE_ENABLE]]}
 // CHECK: ![[WIDTH_10]] = !{!"llvm.loop.vectorize.width", i32 10}
+
+// CHECK: ![[LOOP_15]] = distinct !{![[LOOP_15]], ![[UNROLL_DISABLE:.*]], ![[DISTRIBUTE_DISABLE:.*]], ![[WIDTH_16_FIXED:.*]], ![[INTERLEAVE_4:.*]], ![[VECTORIZE_ENABLE:.*]]}
+// CHECK: ![[WIDTH_16_FIXED]] = !{!"llvm.loop.vectorize.width", i32 16}
+
+// CHECK: ![[LOOP_16]] = distinct !{![[LOOP_16]], ![[UNROLL_DISABLE:.*]], ![[DISTRIBUTE_DISABLE:.*]], ![[WIDTH_16_SCALABLE:.*]], ![[INTERLEAVE_4:.*]], ![[VECTORIZE_ENABLE:.*]]}
+// CHECK: ![[WIDTH_16_SCALABLE]] = !{!"llvm.loop.vectorize.width", ![[ELEMENT_COUNT_16_SCALABLE:.*]]}
+// CHECK: ![[ELEMENT_COUNT_16_SCALABLE]] = !{i32 16, i32 1}
Index: clang/lib/Sema/SemaStmtAttr.cpp
===
--- clang/lib/Sema/SemaStmtAttr.cpp
+++ clang/lib/Sema/SemaStmtAttr.cpp
@@ -139,10 +139,17 @@
LoopHintAttr::PipelineInitiationInterval)
  .Case("distribute", LoopHintAttr::Distribute)
  .Default(LoopHintAttr::Vectorize);
-if (Option == LoopHintAttr::VectorizeWidth ||
-Option == LoopHintAttr::InterleaveCount ||
-Option == LoopHintAttr::UnrollCount ||
-Option == LoopHintAttr::PipelineInitiationInterval) {
+if (Option == LoopHintAttr::VectorizeWidth) {
+  assert(ValueExpr && "Attribute must have a valid value expression.");
+  if (S.CheckLoopHintExpr(ValueExpr, St->getBeginLoc()))
+return nullptr;
+  if (StateLoc && StateLoc->Ident && StateLoc->Ident->isStr("scalable"))
+State = LoopHintAttr::ScalableNumeric;
+  else
+State = LoopHintAttr::Numeric;
+} else if (Option == LoopHintAttr::InterleaveCount ||
+   Option == LoopHintAttr::UnrollCount ||
+   Option == LoopHintAttr::PipelineInitiationInterval) {
   assert(ValueExpr && "Attribute must have a valid value expression.");
   if (S.CheckLoopHintExpr(ValueExpr, St->getBeginLoc()))
 return nullptr;
Index: clang/lib/Parse/ParsePragma.cpp
===
--- clang/lib/Parse/ParsePragma.cpp
+++ clang/lib/Parse/ParsePragma.cpp
@@ -1093,7 +1093,6 @@
   assert(Tok.is(tok::annot_prag

[PATCH] D89031: [SVE] Add support to vectorize_width loop pragma for scalable vectors

2020-10-19 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 299014.
david-arm added a comment.

- Rebase.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89031/new/

https://reviews.llvm.org/D89031

Files:
  clang/docs/LanguageExtensions.rst
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/DiagnosticParseKinds.td
  clang/lib/CodeGen/CGLoopInfo.cpp
  clang/lib/CodeGen/CGLoopInfo.h
  clang/lib/Parse/ParsePragma.cpp
  clang/lib/Sema/SemaStmtAttr.cpp
  clang/test/CodeGenCXX/pragma-loop.cpp

Index: clang/test/CodeGenCXX/pragma-loop.cpp
===
--- clang/test/CodeGenCXX/pragma-loop.cpp
+++ clang/test/CodeGenCXX/pragma-loop.cpp
@@ -158,6 +158,30 @@
   for_template_constant_expression_test(List, Length);
 }
 
+// Verify do loop is performing fixed width vectorization
+void do_test_fixed(int *List, int Length) {
+  int i = 0;
+
+#pragma clang loop vectorize_width(16, fixed) interleave_count(4) unroll(disable) distribute(disable)
+  do {
+// CHECK: br i1 {{.*}}, label {{.*}}, label {{.*}}, !llvm.loop ![[LOOP_15:.*]]
+List[i] = i * 2;
+i++;
+  } while (i < Length);
+}
+
+// Verify do loop is performing scalable vectorization
+void do_test_scalable(int *List, int Length) {
+  int i = 0;
+
+#pragma clang loop vectorize_width(16, scalable) interleave_count(4) unroll(disable) distribute(disable)
+  do {
+// CHECK: br i1 {{.*}}, label {{.*}}, label {{.*}}, !llvm.loop ![[LOOP_16:.*]]
+List[i] = i * 2;
+i++;
+  } while (i < Length);
+}
+
 // CHECK: ![[LOOP_1]] = distinct !{![[LOOP_1]], ![[UNROLL_FULL:.*]]}
 // CHECK: ![[UNROLL_FULL]] = !{!"llvm.loop.unroll.full"}
 
@@ -215,3 +239,10 @@
 
 // CHECK: ![[LOOP_14]] = distinct !{![[LOOP_14]], ![[WIDTH_10:.*]], ![[VECTORIZE_ENABLE]]}
 // CHECK: ![[WIDTH_10]] = !{!"llvm.loop.vectorize.width", i32 10}
+
+// CHECK: ![[LOOP_15]] = distinct !{![[LOOP_15]], ![[UNROLL_DISABLE:.*]], ![[DISTRIBUTE_DISABLE:.*]], ![[WIDTH_16_FIXED:.*]], ![[INTERLEAVE_4:.*]], ![[VECTORIZE_ENABLE:.*]]}
+// CHECK: ![[WIDTH_16_FIXED]] = !{!"llvm.loop.vectorize.width", i32 16}
+
+// CHECK: ![[LOOP_16]] = distinct !{![[LOOP_16]], ![[UNROLL_DISABLE:.*]], ![[DISTRIBUTE_DISABLE:.*]], ![[WIDTH_16_SCALABLE:.*]], ![[INTERLEAVE_4:.*]], ![[VECTORIZE_ENABLE:.*]]}
+// CHECK: ![[WIDTH_16_SCALABLE]] = !{!"llvm.loop.vectorize.width", ![[ELEMENT_COUNT_16_SCALABLE:.*]]}
+// CHECK: ![[ELEMENT_COUNT_16_SCALABLE]] = !{i32 16, i32 1}
Index: clang/lib/Sema/SemaStmtAttr.cpp
===
--- clang/lib/Sema/SemaStmtAttr.cpp
+++ clang/lib/Sema/SemaStmtAttr.cpp
@@ -139,10 +139,17 @@
LoopHintAttr::PipelineInitiationInterval)
  .Case("distribute", LoopHintAttr::Distribute)
  .Default(LoopHintAttr::Vectorize);
-if (Option == LoopHintAttr::VectorizeWidth ||
-Option == LoopHintAttr::InterleaveCount ||
-Option == LoopHintAttr::UnrollCount ||
-Option == LoopHintAttr::PipelineInitiationInterval) {
+if (Option == LoopHintAttr::VectorizeWidth) {
+  assert(ValueExpr && "Attribute must have a valid value expression.");
+  if (S.CheckLoopHintExpr(ValueExpr, St->getBeginLoc()))
+return nullptr;
+  if (StateLoc && StateLoc->Ident && StateLoc->Ident->isStr("scalable"))
+State = LoopHintAttr::ScalableNumeric;
+  else
+State = LoopHintAttr::Numeric;
+} else if (Option == LoopHintAttr::InterleaveCount ||
+   Option == LoopHintAttr::UnrollCount ||
+   Option == LoopHintAttr::PipelineInitiationInterval) {
   assert(ValueExpr && "Attribute must have a valid value expression.");
   if (S.CheckLoopHintExpr(ValueExpr, St->getBeginLoc()))
 return nullptr;
Index: clang/lib/Parse/ParsePragma.cpp
===
--- clang/lib/Parse/ParsePragma.cpp
+++ clang/lib/Parse/ParsePragma.cpp
@@ -1193,6 +1193,24 @@
 
 ExprResult R = ParseConstantExpression();
 
+if (OptionInfo && OptionInfo->getName() == "vectorize_width" &&
+Tok.is(tok::comma)) {
+  PP.Lex(Tok); // ,
+
+  SourceLocation StateLoc = Tok.getLocation();
+  IdentifierInfo *StateInfo = Tok.getIdentifierInfo();
+  StringRef IsScalableStr = StateInfo->getName();
+
+  if (IsScalableStr != "scalable" && IsScalableStr != "fixed") {
+Diag(Tok.getLocation(), diag::err_pragma_loop_invalid_vectorize_option);
+return false;
+  }
+  PP.Lex(Tok); // Identifier
+
+  Hint.StateLoc =
+  IdentifierLoc::create(Actions.Context, StateLoc, StateInfo);
+}
+
 // Tokens following an error in an ill-formed constant expression will
 // remain in the token stream and must be removed.
 if (Tok.isNot(tok::eof)) {
Index: clang/lib/CodeGen/CGLoopInfo.h
===
--- clang/lib/CodeGen/CGLoopInfo.h
+++ clang/lib/CodeGen/CGLoopInfo.h
@@ -19,6 +

[PATCH] D89031: [SVE] Add support to vectorize_width loop pragma for scalable vectors

2020-10-19 Thread David Sherwood via Phabricator via cfe-commits
david-arm marked an inline comment as done.
david-arm added inline comments.



Comment at: clang/lib/Sema/SemaStmtAttr.cpp:144
+  assert(ValueExpr && "Attribute must have a valid value expression.");
+  if (S.CheckLoopHintExpr(ValueExpr, St->getBeginLoc()))
+return nullptr;

fhahn wrote:
> Is there a way to only accept `fixed_width/scalable` for targets that support 
> it? Not sure if we have enough information here, but we might be able to 
> reject it eg per target basis or something
Hi @fhahn, if possible we'd prefer not to reject scalable vectors at this
point. Theoretically there is no reason why we can't perform scalable
vectorisation for targets that don't have hardware support for scalable
vectors; in that case it simply means that vscale is 1. If you want, we
could add some kind of opt-remark in the vectoriser that says something
like "target does not support scalable vectors, vectorising for vscale=1"?
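
To spell out the fallback (a sketch of the reasoning, not code from the
patch):

  // A target without scalable-vector hardware can be treated as having
  // vscale == 1, so a scalable VF of <vscale x 4> simply degenerates to
  // a fixed VF of 4.
  unsigned effectiveVF(unsigned KnownMinVF, unsigned VScale) {
    return KnownMinVF * VScale; // VScale == 1 on such targets
  }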


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89031/new/

https://reviews.llvm.org/D89031


[PATCH] D90230: [SVE] Add fatal error for unnamed SVE variadic arguments

2020-10-30 Thread David Sherwood via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGcea69fa4dcc4: [SVE] Add fatal error for unnamed SVE variadic 
arguments (authored by david-arm).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90230/new/

https://reviews.llvm.org/D90230

Files:
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGen/aarch64-varargs-sve.c
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/test/CodeGen/AArch64/sve-varargs-callee-broken.ll
  llvm/test/CodeGen/AArch64/sve-varargs-caller-broken.ll
  llvm/test/CodeGen/AArch64/sve-varargs.ll

Index: llvm/test/CodeGen/AArch64/sve-varargs.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-varargs.ll
@@ -0,0 +1,26 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s
+; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t
+
+; If this check fails please read test/CodeGen/AArch64/README for instructions on how to resolve it.
+; WARN-NOT: warning
+
+declare i32 @sve_printf(i8*, , ...)
+
+@.str_1 = internal constant [6 x i8] c"boo!\0A\00"
+
+define void @foo( %x) {
+; CHECK-LABEL: foo:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT:.cfi_def_cfa_offset 16
+; CHECK-NEXT:.cfi_offset w30, -16
+; CHECK-NEXT:adrp x0, .str_1
+; CHECK-NEXT:add x0, x0, :lo12:.str_1
+; CHECK-NEXT:bl sve_printf
+; CHECK-NEXT:ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT:ret
+  %f = getelementptr [6 x i8], [6 x i8]* @.str_1, i64 0, i64 0
+  call i32 (i8*, , ...) @sve_printf(i8* %f,  %x)
+  ret void
+}
Index: llvm/test/CodeGen/AArch64/sve-varargs-caller-broken.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-varargs-caller-broken.ll
@@ -0,0 +1,12 @@
+; RUN: not --crash llc -mtriple aarch64-linux-gnu -mattr=+sve <%s 2>&1 | FileCheck %s
+
+declare i32 @sve_printf(i8*, , ...)
+
+@.str_1 = internal constant [6 x i8] c"boo!\0A\00"
+
+; CHECK: Passing SVE types to variadic functions is currently not supported
+define void @foo( %x) {
+  %f = getelementptr [6 x i8], [6 x i8]* @.str_1, i64 0, i64 0
+  call i32 (i8*, , ...) @sve_printf(i8* %f,  %x,  %x)
+  ret void
+}
Index: llvm/test/CodeGen/AArch64/sve-varargs-callee-broken.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-varargs-callee-broken.ll
@@ -0,0 +1,22 @@
+; RUN: not --crash llc -mtriple arm64-apple-ios7 -mattr=+sve < %s 2>&1 | FileCheck %s
+
+; CHECK: Passing SVE types to variadic functions is currently not supported
+
+@.str = private unnamed_addr constant [4 x i8] c"fmt\00", align 1
+define void @foo(i8* %fmt, ...) nounwind {
+entry:
+  %fmt.addr = alloca i8*, align 8
+  %args = alloca i8*, align 8
+  %vc = alloca i32, align 4
+  %vv = alloca , align 16
+  store i8* %fmt, i8** %fmt.addr, align 8
+  %args1 = bitcast i8** %args to i8*
+  call void @llvm.va_start(i8* %args1)
+  %0 = va_arg i8** %args, i32
+  store i32 %0, i32* %vc, align 4
+  %1 = va_arg i8** %args, 
+  store  %1, * %vv, align 16
+  ret void
+}
+
+declare void @llvm.va_start(i8*) nounwind
Index: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
===
--- llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -4807,6 +4807,10 @@
 
 for (unsigned i = 0; i != NumArgs; ++i) {
   MVT ArgVT = Outs[i].VT;
+  if (!Outs[i].IsFixed && ArgVT.isScalableVector())
+report_fatal_error("Passing SVE types to variadic functions is "
+   "currently not supported");
+
   ISD::ArgFlagsTy ArgFlags = Outs[i].Flags;
   CCAssignFn *AssignFn = CCAssignFnForCall(CallConv,
/*IsVarArg=*/ !Outs[i].IsFixed);
@@ -6606,6 +6610,10 @@
   Chain = VAList.getValue(1);
   VAList = DAG.getZExtOrTrunc(VAList, DL, PtrVT);
 
+  if (VT.isScalableVector())
+report_fatal_error("Passing SVE types to variadic functions is "
+   "currently not supported");
+
   if (Align && *Align > MinSlotSize) {
 VAList = DAG.getNode(ISD::ADD, DL, PtrVT, VAList,
  DAG.getConstant(Align->value() - 1, DL, PtrVT));
Index: clang/test/CodeGen/aarch64-varargs-sve.c
===
--- /dev/null
+++ clang/test/CodeGen/aarch64-varargs-sve.c
@@ -0,0 +1,21 @@
+// REQUIRES: aarch64-registered-target
+// RUN: not %clang_cc1 -triple aarch64-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -emit-llvm -o - %s 2>&1 | FileCheck %s
+// RUN: not %clang_cc1 -triple arm64-apple-ios7 -target-

[PATCH] D109883: [Analysis] Add support for vscale in computeKnownBitsFromOperator

2021-09-20 Thread David Sherwood via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGf988f680649a: [Analysis] Add support for vscale in 
computeKnownBitsFromOperator (authored by david-arm).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109883/new/

https://reviews.llvm.org/D109883

Files:
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cntb.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cntd.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cnth.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cntw.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_len-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_len.c
  llvm/lib/Analysis/ValueTracking.cpp
  llvm/test/Transforms/InstCombine/icmp-vscale.ll
  llvm/test/Transforms/InstSimplify/vscale.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll

Index: llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll
===
--- llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll
+++ llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll
@@ -17,8 +17,8 @@
 ; CHECK:   vector.body:
 ; CHECK-NEXT:[[POINTER_PHI:%.*]] = phi i32* [ %c, %vector.ph ], [ %[[PTR_IND:.*]], %vector.body ]
 ; CHECK: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:[[TMP6:%.*]] = shl i64 [[TMP5]], 2
-; CHECK-NEXT:[[TMP7:%.*]] = shl i64 [[TMP5]], 4
+; CHECK-NEXT:[[TMP6:%.*]] = shl nuw nsw i64 [[TMP5]], 2
+; CHECK-NEXT:[[TMP7:%.*]] = shl nuw nsw i64 [[TMP5]], 4
 ; CHECK-NEXT:[[TMP8:%.*]] = call  @llvm.experimental.stepvector.nxv4i64()
 ; CHECK-NEXT:[[VECTOR_GEP:%.*]] = shl  [[TMP8]], shufflevector ( insertelement ( poison, i64 1, i32 0),  poison,  zeroinitializer)
 ; CHECK-NEXT:[[TMP9:%.*]] = getelementptr i32, i32* [[POINTER_PHI]],  [[VECTOR_GEP]]
@@ -80,16 +80,16 @@
 ; CHECK-NEXT:%[[LPTR1:.*]] = bitcast i32* %[[LGEP1]] to *
 ; CHECK-NEXT:%{{.*}} = load , * %[[LPTR1]], align 4
 ; CHECK-NEXT:%[[VSCALE1:.*]] = call i32 @llvm.vscale.i32()
-; CHECK-NEXT:%[[TMP1:.*]] = shl i32 %[[VSCALE1]], 2
-; CHECK-NEXT:%[[TMP2:.*]] = sext i32 %[[TMP1]] to i64
+; CHECK-NEXT:%[[TMP1:.*]] = shl nuw nsw i32 %[[VSCALE1]], 2
+; CHECK-NEXT:%[[TMP2:.*]] = zext i32 %[[TMP1]] to i64
 ; CHECK-NEXT:%[[LGEP2:.*]] = getelementptr i32, i32* %[[LGEP1]], i64 %[[TMP2]]
 ; CHECK-NEXT:%[[LPTR2:.*]] = bitcast i32* %[[LGEP2]] to *
 ; CHECK-NEXT:%{{.*}} = load , * %[[LPTR2]], align 4
 ; CHECK: %[[SPTR1:.*]] = bitcast i32* %[[SGEP1]] to *
 ; CHECK-NEXT:store  %{{.*}}, * %[[SPTR1]], align 4
 ; CHECK-NEXT:%[[VSCALE2:.*]] = call i32 @llvm.vscale.i32()
-; CHECK-NEXT:%[[TMP3:.*]] = shl i32 %[[VSCALE2]], 2
-; CHECK-NEXT:%[[TMP4:.*]] = sext i32 %[[TMP3]] to i64
+; CHECK-NEXT:%[[TMP3:.*]] = shl nuw nsw i32 %[[VSCALE2]], 2
+; CHECK-NEXT:%[[TMP4:.*]] = zext i32 %[[TMP3]] to i64
 ; CHECK-NEXT:%[[SGEP2:.*]] = getelementptr i32, i32* %[[SGEP1]], i64 %[[TMP4]]
 ; CHECK-NEXT:%[[SPTR2:.*]] = bitcast i32* %[[SGEP2]] to *
 ; CHECK-NEXT:store  %{{.*}}, * %[[SPTR2]], align 4
@@ -133,7 +133,7 @@
 ; CHECK-NEXT:  %[[APTRS1:.*]] = getelementptr i32, i32* %a,  %[[VECIND1]]
 ; CHECK-NEXT:  %[[GEPA1:.*]] = getelementptr i32, i32* %a, i64 %[[IDX]]
 ; CHECK-NEXT:  %[[VSCALE64:.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:  %[[VSCALE64X2:.*]] = shl i64 %[[VSCALE64]], 1
+; CHECK-NEXT:  %[[VSCALE64X2:.*]] = shl nuw nsw i64 %[[VSCALE64]], 1
 ; CHECK-NEXT:  %[[TMP3:.*]] = insertelement  poison, i64 %[[VSCALE64X2]], i32 0
 ; CHECK-NEXT:  %[[TMP4:.*]] = shufflevector  %[[TMP3]],  poison,  zeroinitializer
 ; CHECK-NEXT:  %[[TMP5:.*]] = add  %[[TMP4]], %[[STEPVEC]]
@@ -147,8 +147,8 @@
 ; CHECK:   %[[BPTR1:.*]] = bitcast i32** %[[GEPB1]] to *
 ; CHECK-NEXT:  store  %[[APTRS1]], * %[[BPTR1]], align 8
 ; CHECK:   %[[VSCALE32:.*]] = call i32 @llvm.vscale.i32()
-; CHECK-NEXT:  %[[VSCALE32X2:.*]] = shl i32 %[[VSCALE32]], 1
-; CHECK-NEXT:  %[[TMP6:.*]] = sext i32 %[[VSCALE32X2]] to i64
+; CHECK-NEXT:  %[[VSCALE32X2:.*]] = shl nuw nsw i32 %[[VSCALE32]], 1
+; CHECK-NEXT:  %[[TMP6:.*]] = zext i32 %[[VSCALE32X2]] to i64
 ; CHECK-NEXT:  %[[GEPB2:.*]] = getelementptr i32*, i32** %[[GEPB1]], i64 %[[TMP6]]
 ; CHECK-NEXT:  %[[BPTR2:.*]] = bitcast i32** %[[GEPB2]] to *
 ; CHECK-NEXT   store  %[[APTRS2]], * %[[BPTR2]], align 8
Index: llvm/test/Transforms/InstSimplify/vscale.ll
===
--- llvm/test/Transforms/InstSimplify/vscale.ll
+++ llvm/test/Transforms/InstSimplify/vscale.ll
@@ -128,6 +128,18 @@
   ret i32 %r
 }
 
+; Known values of vscale intrinsic
+
+define i64 @vscale64_range4_4() vscale_range(4,4) {
+; CHECK-LABEL: @vscale64_range4_4(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:ret i64

[PATCH] D110258: [AArch64][Clang] Always add -tune-cpu argument to -cc1 driver

2021-09-22 Thread David Sherwood via Phabricator via cfe-commits
david-arm created this revision.
david-arm added reviewers: sdesmalen, c-rhodes, peterwaller-arm, dmgreen.
Herald added a subscriber: kristof.beyls.
david-arm requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

This patch ensures that we always tune for a given CPU on AArch64
targets. If the user explicitly specifies a CPU to tune for, we use
that; otherwise, when the "-mcpu=" flag is not set, we tune for a
generic CPU.

Tests added here:

  clang/test/Driver/aarch64-mtune.c


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D110258

Files:
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/Driver/aarch64-mtune.c


Index: clang/test/Driver/aarch64-mtune.c
===
--- /dev/null
+++ clang/test/Driver/aarch64-mtune.c
@@ -0,0 +1,42 @@
+// Ensure we support the -mtune flag.
+
+// Default mtune should be generic.
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s 2>&1 \
+// RUN:   | FileCheck %s -check-prefix=notune
+// notune: "-tune-cpu" "generic"
+
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s -mtune=generic 2>&1 \
+// RUN:   | FileCheck %s -check-prefix=generic
+// generic: "-tune-cpu" "generic"
+
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s -mtune=neoverse-n1 2>&1 \
+// RUN:   | FileCheck %s -check-prefix=neoverse-n1
+// neoverse-n1: "-tune-cpu" "neoverse-n1"
+
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s -mtune=thunderx2t99 2>&1 \
+// RUN:   | FileCheck %s -check-prefix=thunderx2t99
+// thunderx2t99: "-tune-cpu" "thunderx2t99"
+
+// Check interaction between march and mtune.
+
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s -march=armv8-a 2>&1 \
+// RUN:   | FileCheck %s -check-prefix=marcharmv8a
+// marcharmv8a: "-target-cpu" "generic"
+// marcharmv8a: "-tune-cpu" "generic"
+
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s -march=armv8-a -mtune=cortex-a75 2>&1 \
+// RUN:   | FileCheck %s -check-prefix=marcharmv8a-a75
+// marcharmv8a-a75: "-target-cpu" "generic"
+// marcharmv8a-a75: "-tune-cpu" "cortex-a75"
+
+// Check interaction between mcpu and mtune.
+
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s -mcpu=thunderx 2>&1 \
+// RUN:   | FileCheck %s -check-prefix=mcputhunderx
+// mcputhunderx: "-target-cpu" "thunderx"
+// mcputhunderx-NOT: "-tune-cpu"
+
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s -mcpu=cortex-a75 -mtune=cortex-a57 2>&1 \
+// RUN:   | FileCheck %s -check-prefix=mcpua75-mtunea57
+// mcpua75-mtunea57: "-target-cpu" "cortex-a75"
+// mcpua75-mtunea57: "-tune-cpu" "cortex-a57"
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -1837,6 +1837,27 @@
   }
 
   AddAAPCSVolatileBitfieldArgs(Args, CmdArgs);
+
+  if (const Arg *A = Args.getLastArg(clang::driver::options::OPT_mtune_EQ)) {
+StringRef Name = A->getValue();
+
+std::string TuneCPU;
+if (Name == "native") {
+  Name = llvm::sys::getHostCPUName();
+  if (!Name.empty())
+TuneCPU = std::string(Name);
+  else
+TuneCPU = "generic";
+} else
+  TuneCPU = std::string(Name);
+
+CmdArgs.push_back("-tune-cpu");
+CmdArgs.push_back(Args.MakeArgString(TuneCPU));
+  }
+  else if (!Args.getLastArg(clang::driver::options::OPT_mcpu_EQ)) {
+CmdArgs.push_back("-tune-cpu");
+CmdArgs.push_back("generic");
+  }
 }
 
 void Clang::AddMIPSTargetArgs(const ArgList &Args,



[PATCH] D110258: [AArch64][Clang] Always add -tune-cpu argument to -cc1 driver

2021-09-22 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Hi @dmgreen, this is specifically being introduced for SVE targets to help make
informed cost model decisions regarding the value of vscale - see D110259.
We thought that using the "tune-cpu" attribute might be a good way of doing
this.
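
To illustrate, all this does at the IR level is attach an extra function
attribute that the backend can later query (a minimal sketch; the function
itself is a made-up example):

  define void @foo() #0 {
    ret void
  }
  attributes #0 = { "target-cpu"="generic" "tune-cpu"="neoverse-v1" }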


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110258/new/

https://reviews.llvm.org/D110258

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110258: [AArch64] Always add -tune-cpu argument to -cc1 driver

2021-09-27 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 375169.
david-arm retitled this revision from "[AArch64][Clang] Always add -tune-cpu 
argument to -cc1 driver" to "[AArch64] Always add -tune-cpu argument to -cc1 
driver".
david-arm edited the summary of this revision.
david-arm added a comment.
Herald added subscribers: llvm-commits, hiraditya.
Herald added a project: LLVM.

- Updated the patch to use mtune properly and set the scheduling model 
accordingly.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110258/new/

https://reviews.llvm.org/D110258

Files:
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/Driver/aarch64-mtune.c
  llvm/lib/Target/AArch64/AArch64Subtarget.cpp
  llvm/lib/Target/AArch64/AArch64Subtarget.h
  llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
  llvm/unittests/Target/AArch64/InstSizes.cpp
  llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp

Index: llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp
===
--- llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp
+++ llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp
@@ -26,6 +26,7 @@
 
 std::unique_ptr createInstrInfo(TargetMachine *TM) {
   AArch64Subtarget ST(TM->getTargetTriple(), std::string(TM->getTargetCPU()),
+  std::string(TM->getTargetCPU()),
   std::string(TM->getTargetFeatureString()), *TM,
   /* isLittle */ false);
   return std::make_unique(ST);
Index: llvm/unittests/Target/AArch64/InstSizes.cpp
===
--- llvm/unittests/Target/AArch64/InstSizes.cpp
+++ llvm/unittests/Target/AArch64/InstSizes.cpp
@@ -29,6 +29,7 @@
 
 std::unique_ptr createInstrInfo(TargetMachine *TM) {
   AArch64Subtarget ST(TM->getTargetTriple(), std::string(TM->getTargetCPU()),
+  std::string(TM->getTargetCPU()),
   std::string(TM->getTargetFeatureString()), *TM,
   /* isLittle */ false);
   return std::make_unique(ST);
Index: llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
===
--- llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+++ llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
@@ -354,10 +354,13 @@
 const AArch64Subtarget *
 AArch64TargetMachine::getSubtargetImpl(const Function &F) const {
   Attribute CPUAttr = F.getFnAttribute("target-cpu");
+  Attribute TuneAttr = F.getFnAttribute("tune-cpu");
   Attribute FSAttr = F.getFnAttribute("target-features");
 
   std::string CPU =
   CPUAttr.isValid() ? CPUAttr.getValueAsString().str() : TargetCPU;
+  std::string TuneCPU =
+  TuneAttr.isValid() ? TuneAttr.getValueAsString().str() : CPU;
   std::string FS =
   FSAttr.isValid() ? FSAttr.getValueAsString().str() : TargetFS;
 
@@ -398,6 +401,7 @@
   Key += "SVEMax";
   Key += std::to_string(MaxSVEVectorSize);
   Key += CPU;
+  Key += TuneCPU;
   Key += FS;
 
   auto &I = SubtargetMap[Key];
@@ -406,7 +410,7 @@
 // creation will depend on the TM and the code generation flags on the
 // function that reside in TargetOptions.
 resetTargetOptions(F);
-I = std::make_unique(TargetTriple, CPU, FS, *this,
+I = std::make_unique(TargetTriple, CPU, TuneCPU, FS, *this,
isLittle, MinSVEVectorSize,
MaxSVEVectorSize);
   }
Index: llvm/lib/Target/AArch64/AArch64Subtarget.h
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -293,7 +293,8 @@
   /// passed in feature string so that we can use initializer lists for
   /// subtarget initialization.
   AArch64Subtarget &initializeSubtargetDependencies(StringRef FS,
-StringRef CPUString);
+StringRef CPUString,
+StringRef TuneCPUString);
 
   /// Initialize properties based on the selected processor family.
   void initializeProperties();
@@ -301,7 +302,7 @@
 public:
   /// This constructor initializes the data members to match that
   /// of the specified triple.
-  AArch64Subtarget(const Triple &TT, const std::string &CPU,
+  AArch64Subtarget(const Triple &TT, const std::string &CPU, const std::string &TuneCPU,
const std::string &FS, const TargetMachine &TM,
bool LittleEndian,
unsigned MinSVEVectorSizeInBitsOverride = 0,
Index: llvm/lib/Target/AArch64/AArch64Subtarget.cpp
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -52,13 +52,17 @@
 
 AArch64Subtarget &
 AArch64Subtarget::initializeSubtargetDependencies(StringRef FS,
-   

[PATCH] D110258: [AArch64] Always add -tune-cpu argument to -cc1 driver

2021-09-27 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

In D110258#3023818, @dmgreen wrote:

> Sounds great.  Glad to see us taking this route.
>
> Unfortunately I think we do need to split the subtargetfeatures up into arch 
> flags and tune flags. Same for the details in 
> AArch64Subtarget::initializeProperties. It is hopefully a fairly mechanical 
> process, but they are an important part of tuning and without them -mtune is 
> only a half-working option.
>
> Are you happy to give that a go too?

Hi @dmgreen, sure I can try. The only problem is that I don't really understand
what to do here. I used the X86Subtarget as guidance, and there the TuneCPU
flag only seems to be used for scheduling and nothing else. It's not obvious to
me how the TuneCPU flag should decide the features, as I thought it was purely
for scheduling?
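
For reference, the X86 pattern I've been copying boils down to something like
this (a simplified sketch, not the exact X86 code): the tune CPU is only
threaded through to the scheduling model, while the features still come from
the target CPU and feature string.

  X86Subtarget &
  X86Subtarget::initializeSubtargetDependencies(StringRef CPU, StringRef TuneCPU,
                                                StringRef FS) {
    // TuneCPU selects the scheduling model; CPU/FS decide the features.
    ParseSubtargetFeatures(CPU, TuneCPU, FS);
    return *this;
  }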


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110258/new/

https://reviews.llvm.org/D110258

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110258: [AArch64] Always add -tune-cpu argument to -cc1 driver

2021-09-27 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Hi @dmgreen, would you be happy for me to do the splitting-out of arch and
tuning features in a separate follow-on patch? I think it's a good idea and I
don't object to doing it, but I'm not sure that it really needs to hold up this
initial patch. I personally think it makes sense for this to live in a separate
patch because it seems riskier in terms of possible effects on performance. As
far as I understand it, this isn't a functional requirement, but more for completeness.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110258/new/

https://reviews.llvm.org/D110258

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D96270: [release][docs] Update contributions to LLVM 12 for scalable vectors.

2021-02-08 Thread David Sherwood via Phabricator via cfe-commits
david-arm created this revision.
david-arm added reviewers: sdesmalen, willlovett, c-rhodes.
david-arm requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D96270

Files:
  clang/docs/ReleaseNotes.rst


Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -144,6 +144,17 @@
 
 - ...
 
+Modified Pragmas in Clang
+-
+
+- The #pragma "clang loop vectorize_width" has been extended to support an
+  optional 'fixed|scalable' argument, which can be used to indicate that the
+  compiler should use fixed-width or scalable vectorization.  Fixed-width is
+  assumed by default.  Scalable vectorization is an experimental feature for
+  targets that support it, such as Arm targets that support the Scalable Vector
+  Extension (SVE) feature. For more information please refer to the
+  Clang Language Extensions documentation.
+
 Attribute Changes in Clang
 --
 


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D96270: [release][docs] Update contributions to LLVM 12 for scalable vectors.

2021-02-09 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 322316.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96270/new/

https://reviews.llvm.org/D96270

Files:
  clang/docs/ReleaseNotes.rst


Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -144,6 +144,18 @@
 
 - ...
 
+Modified Pragmas in Clang
+-
+
+- The "#pragma clang loop vectorize_width" has been extended to support an
+  optional 'fixed|scalable' argument, which can be used to indicate that the
+  compiler should use fixed-width or scalable vectorization.  Fixed-width is
+  assumed by default.
+
+  Scalable or vector length agnostic vectorization is an experimental feature
+  for targets that support scalable vectors. For more information please refer
+  to the Clang Language Extensions documentation.
+
 Attribute Changes in Clang
 --
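
For readers unfamiliar with the extended pragma, usage looks like this (a
minimal sketch; the loop itself is a made-up example):

  #pragma clang loop vectorize_width(4, scalable)
  for (int i = 0; i < N; ++i)
    a[i] = b[i] + c[i];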
 


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D96270: [release][docs] Update contributions to LLVM 12 for scalable vectors.

2021-02-18 Thread David Sherwood via Phabricator via cfe-commits
david-arm closed this revision.
david-arm added a comment.

Merged to LLVM Release 12.x branch


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96270/new/

https://reviews.llvm.org/D96270

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D108138: [SimplifyCFG] Remove switch statements before vectorization

2021-08-17 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

In D108138#2947229, @lebedev.ri wrote:

> I'm not sure i'm sold on this, even though i'm aware that selects hurt 
> vectorization.
> How does this Simplify the CFG? I think it would be best to teach LV selects,
> or at worst do this in LV itself.

Hi @lebedev.ri, I'm under the impression that the vectoriser has a policy of 
never making scalar transformations, so I doubt it would be acceptable to do 
this in the vectoriser pass. I think the only realistic alternative is to teach 
LV how to vectorise switch statements and create the vector compares and 
selects directly in the code, or to scalarise them in the vector loop with the 
creation of new blocks. @fhahn and @craig.topper, do you have any thoughts on 
this or a preference?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108138/new/

https://reviews.llvm.org/D108138

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D108138: [SimplifyCFG] Remove switch statements before vectorization

2021-08-17 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

In D108138#2948975, @dmgreen wrote:

>> I'm under the impression that the vectoriser has a policy of never making 
>> scalar transformations
>
> I'm not sure what you mean. I've not looked into the details, but it could 
> presumably be done as some sort of VPlan transformation, possibly in the 
> constructions of vplans to treat switches like multiple icmp's/branches?

Hi @dmgreen, I just meant that if LV makes a scalar transformation prior to the 
legality/cost-model checks and then for some reason we don't vectorise, we end 
up with a changed scalar body without any vectorisation.
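
To make that concrete, the rewrite being debated looks roughly like this
(hypothetical example):

  // Small switch in a loop body that currently blocks vectorisation:
  switch (x[i]) {
  case 0:  y[i] = a; break;
  case 10: y[i] = b; break;
  default: y[i] = c; break;
  }

  // Equivalent branch-free form that vectorises as compares and selects:
  y[i] = (x[i] == 0) ? a : ((x[i] == 10) ? b : c);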


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108138/new/

https://reviews.llvm.org/D108138

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D104852: [AArch64][SVEIntrinsicOpts] Convert cntb/h/w/d to vscale intrinsic or constant.

2021-06-28 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Hi @junparser, the patch looks sensible to me! I just had a couple of minor 
comments if that's ok.




Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:658
+return IC.replaceInstUsesWith(II, VScale);
+  } else if (Pattern >= AArch64SVEPredPattern::vl1 &&
+ Pattern <= AArch64SVEPredPattern::vl8 && NumElts >= Pattern) {

nit: Do you need to fix the clang-tidy warning here?



Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:662
+return IC.replaceInstUsesWith(II, StepVal);
+  } else if (Pattern == AArch64SVEPredPattern::vl16 && NumElts == 16) {
+Constant *StepVal = ConstantInt::get(II.getType(), NumElts);

Could you potentially fold these two cases into one somehow? Maybe with a 
switch-case statement? I'm just imagining a situation where we might want other 
patterns too like vl32, vl64, etc.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104852/new/

https://reviews.llvm.org/D104852

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D104852: [AArch64][SVEIntrinsicOpts] Convert cntb/h/w/d to vscale intrinsic or constant.

2021-06-28 Thread David Sherwood via Phabricator via cfe-commits
david-arm added inline comments.



Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:671
+  case AArch64SVEPredPattern::vl8:
+return NumElts >= Pattern
+   ? Optional(IC.replaceInstUsesWith(

I was actually wondering if we could commonise this code somehow. Perhaps by 
setting a MinNumElts variable in the case statements, i.e.

  unsigned MinNumElts;
  ...
  case AArch64SVEPredPattern::vl8:
MinNumElts = Pattern;
break;
  case AArch64SVEPredPattern::vl16:
MinNumElts = 16;
break;
  }

  if (NumElts < MinNumElts) return None;

  return Optional(IC.replaceInstUsesWith(
   II, ConstantInt::get(II.getType(), NumElts)));



Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:662
+return IC.replaceInstUsesWith(II, StepVal);
+  } else if (Pattern == AArch64SVEPredPattern::vl16 && NumElts == 16) {
+Constant *StepVal = ConstantInt::get(II.getType(), NumElts);

junparser wrote:
> david-arm wrote:
> > Could you potentially fold these two cases into one somehow? Maybe with a 
> > switch-case statement? I'm just imagining a situation where we might want 
> > other patterns too like vl32, vl64, etc.
> > 
> There is no other special pattern except vl16. But I do think a switch-case
> is more straightforward.
OK, thanks for making this a switch statement. I was just thinking that in the 
developer manual we say we also support vl1-vl256 so at some point we may add 
more enums in LLVM too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104852/new/

https://reviews.llvm.org/D104852

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D104852: [AArch64][SVEIntrinsicOpts] Convert cntb/h/w/d to vscale intrinsic or constant.

2021-06-30 Thread David Sherwood via Phabricator via cfe-commits
david-arm accepted this revision.
david-arm added a comment.
This revision is now accepted and ready to land.

LGTM! Thanks a lot for making the changes. :)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104852/new/

https://reviews.llvm.org/D104852

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D113394: [IR] In ConstantFoldShuffleVectorInstruction use zeroinitializer for splats of 0

2021-11-10 Thread David Sherwood via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
david-arm marked an inline comment as done.
Closed by commit rG2a48b6993a97: [IR] In ConstantFoldShuffleVectorInstruction 
use zeroinitializer for splats of 0 (authored by david-arm).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Changed prior to commit:
  https://reviews.llvm.org/D113394?vs=385764&id=386094#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113394/new/

https://reviews.llvm.org/D113394
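
In IR terms the fold is simply this (a sketch distilled from the test diffs
below; the element type is just an example):

  ; before: a splat of zero built from insertelement + shufflevector
  shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 0, i32 0),
                 <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)

  ; after: folded to the canonical zero constant
  <vscale x 4 x i32> zeroinitializer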

Files:
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dupq.c
  llvm/lib/IR/ConstantFold.cpp
  llvm/test/Bitcode/vscale-round-trip.ll
  llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-cond-inv-loads.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-inv-store.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll
  llvm/test/Transforms/LoopVectorize/scalable-inductions.ll
  llvm/test/Transforms/LoopVectorize/scalable-reduction-inloop.ll

Index: llvm/test/Transforms/LoopVectorize/scalable-reduction-inloop.ll
===
--- llvm/test/Transforms/LoopVectorize/scalable-reduction-inloop.ll
+++ llvm/test/Transforms/LoopVectorize/scalable-reduction-inloop.ll
@@ -7,8 +7,8 @@
 ; CHECK-LABEL: @reduction_add_trunc(
 ; CHECK:   vector.body:
 ; CHECK-NEXT:[[INDEX:%.*]] = phi i32 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
-; CHECK-NEXT:[[VEC_PHI:%.*]] = phi  [ insertelement ( shufflevector ( insertelement ( poison, i32 0, i32 0),  poison,  zeroinitializer), i32 255, i32 0), %vector.ph ], [ [[TMP34:%.*]], %vector.body ]
-; CHECK-NEXT:[[VEC_PHI1:%.*]] = phi  [ shufflevector ( insertelement ( poison, i32 0, i32 0),  poison,  zeroinitializer), %vector.ph ], [ [[TMP36:%.*]], %vector.body ]
+; CHECK-NEXT:[[VEC_PHI:%.*]] = phi  [ insertelement ( zeroinitializer, i32 255, i32 0), %vector.ph ], [ [[TMP34:%.*]], %vector.body ]
+; CHECK-NEXT:[[VEC_PHI1:%.*]] = phi  [ zeroinitializer, %vector.ph ], [ [[TMP36:%.*]], %vector.body ]
 ; CHECK: [[TMP14:%.*]] = and  [[VEC_PHI]], shufflevector ( insertelement ( poison, i32 255, i32 0),  poison,  zeroinitializer)
 ; CHECK-NEXT:[[TMP15:%.*]] = and  [[VEC_PHI1]], shufflevector ( insertelement ( poison, i32 255, i32 0),  poison,  zeroinitializer)
 ; CHECK: [[WIDE_LOAD:%.*]] = load , *
Index: llvm/test/Transforms/LoopVectorize/scalable-inductions.ll
===
--- llvm/test/Transforms/LoopVectorize/scalable-inductions.ll
+++ llvm/test/Transforms/LoopVectorize/scalable-inductions.ll
@@ -143,7 +143,7 @@
 ; CHECK:  %[[STEPVEC:.*]] = call  @llvm.experimental.stepvector.nxv4i32()
 ; CHECK-NEXT: %[[TMP1:.*]] = uitofp  %[[STEPVEC]] to 
 ; CHECK-NEXT: %[[TMP2:.*]] = fmul  %[[TMP1]], shufflevector ( insertelement ( poison, float 2.00e+00, i32 0),  poison,  zeroinitializer)
-; CHECK-NEXT: %[[INDINIT:.*]] = fadd  %[[TMP2]], shufflevector ( insertelement ( poison, float 0.00e+00, i32 0),  poison,  zeroinitializer)
+; CHECK-NEXT: %[[INDINIT:.*]] = fadd  %[[TMP2]], zeroinitializer
 ; CHECK-NEXT: %[[VSCALE:.*]] = call i32 @llvm.vscale.i32()
 ; CHECK-NEXT: %[[TMP3:.*]] = shl i32 %8, 2
 ; CHECK-NEXT: %[[TMP4:.*]] = uitofp i32 %[[TMP3]] to float
Index: llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll
===
--- llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll
+++ llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll
@@ -47,7 +47,7 @@
 ; CHECK-NEXT:[[TMP6:%.*]] = call  @llvm.experimental.stepvector.nxv2i64()
 ; CHECK-NEXT:[[DOTSPLATINSERT:%.*]] = insertelement  poison, i64 [[INDEX]], i32 0
 ; CHECK-NEXT:[[DOTSPLAT:%.*]] = shufflevector  [[DOTSPLATINSERT]],  poison,  zeroinitializer
-; CHECK-NEXT:[[TMP7:%.*]] = add  shufflevector ( insertelement ( poison, i64 0, i32 0),  poison,  zeroinitializer), [[TMP6]]
+; CHECK-NEXT:[[TMP7:%.*]] = add  zeroinitializer, [[TMP6]]
 ; CHECK-NEXT:[[TMP8:%.*]] = add  [[DOTSPLAT]], [[TMP7]]
 ; CHECK-NEXT:[[NEXT_GEP4:%.*]] = getelementptr i8, i8* [[START_2]],  [[TMP8]]
 ; CHECK-NEXT:[[TMP9:%.*]] = add i64 [[INDEX]], 0
@@ -126,7 +126,7 @@
 ; CHECK-NEXT:[[TMP5:%.*]] = call  @llvm.experimental.stepvector.nxv2i64()
 ; CHECK-NEXT:[[DOTSPLATINSERT:%.*]] = insertelement  poison, i64 [[INDEX1]], i32 0
 ; CHECK-NEXT:[[DOTSPLAT:%.*]] = shufflevector  [[DOTSPLATINSERT]],  poison,  zeroinitializer
-; CHECK-NEXT:[[TMP6:%.*]] = add  shufflevector ( insertelement ( poison, i64 0, i32 0),  poison,  zeroinitializer), [[TMP5]]
+; CHECK-NEXT:[[TMP6:%.*]] = add  zeroinitializer, [[TMP5]]
 ; CHECK-NEXT:[[TMP7:%.*]] = add  [[DOTSPLAT]

[PATCH] D127910: [Clang][AArch64] Add SME C intrinsics for load and store

2022-10-26 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Hi @sagarkulkarni19, just a gentle ping to see if you are still planning to do 
more work on this patch?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127910/new/

https://reviews.llvm.org/D127910

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134352: [AArch64] Add Neoverse V2 CPU support

2022-09-21 Thread David Sherwood via Phabricator via cfe-commits
david-arm created this revision.
david-arm added reviewers: sdesmalen, paulwalker-arm, dmgreen, MarkMurrayARM, 
CarolineConcatto.
Herald added subscribers: hiraditya, kristof.beyls.
Herald added a project: All.
david-arm requested review of this revision.
Herald added projects: clang, LLVM.
Herald added subscribers: llvm-commits, cfe-commits.

Adds support for the Neoverse V2 CPU to the AArch64 backend.
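
Once this lands, the new CPU can be selected in the usual way, e.g. (an
illustrative invocation):

  clang --target=aarch64-unknown-linux-gnu -mcpu=neoverse-v2 -O2 -c test.c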


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D134352

Files:
  clang/test/Driver/aarch64-mcpu.c
  clang/test/Misc/target-invalid-cpu-note.c
  llvm/include/llvm/Support/AArch64TargetParser.def
  llvm/lib/Support/Host.cpp
  llvm/lib/Target/AArch64/AArch64.td
  llvm/lib/Target/AArch64/AArch64Subtarget.cpp
  llvm/lib/Target/AArch64/AArch64Subtarget.h
  llvm/test/CodeGen/AArch64/cpus.ll
  llvm/unittests/Support/TargetParserTest.cpp

Index: llvm/unittests/Support/TargetParserTest.cpp
===
--- llvm/unittests/Support/TargetParserTest.cpp
+++ llvm/unittests/Support/TargetParserTest.cpp
@@ -1026,6 +1026,16 @@
 AArch64::AEK_PROFILE | AArch64::AEK_RAND |
 AArch64::AEK_FP16FML | AArch64::AEK_I8MM,
 "8.4-A"),
+ARMCPUTestParams(
+"neoverse-v2", "armv9-a", "neon-fp-armv8",
+AArch64::AEK_RAS | AArch64::AEK_SVE | AArch64::AEK_SSBS |
+AArch64::AEK_RCPC | AArch64::AEK_CRC | AArch64::AEK_FP |
+AArch64::AEK_SIMD | AArch64::AEK_RAS | AArch64::AEK_LSE |
+AArch64::AEK_RDM | AArch64::AEK_RCPC | AArch64::AEK_DOTPROD |
+AArch64::AEK_FP16 | AArch64::AEK_BF16 | AArch64::AEK_SVE2 |
+AArch64::AEK_PROFILE | AArch64::AEK_FP16FML |
+AArch64::AEK_I8MM,
+"9-A"),
 ARMCPUTestParams("cortex-r82", "armv8-r", "crypto-neon-fp-armv8",
  AArch64::AEK_CRC | AArch64::AEK_RDM |
  AArch64::AEK_SSBS | AArch64::AEK_DOTPROD |
@@ -1257,7 +1267,7 @@
  AArch64::AEK_LSE | AArch64::AEK_RDM,
  "8.2-A")));
 
-static constexpr unsigned NumAArch64CPUArchs = 54;
+static constexpr unsigned NumAArch64CPUArchs = 55;
 
 TEST(TargetParserTest, testAArch64CPUArchList) {
   SmallVector List;
Index: llvm/test/CodeGen/AArch64/cpus.ll
===
--- llvm/test/CodeGen/AArch64/cpus.ll
+++ llvm/test/CodeGen/AArch64/cpus.ll
@@ -23,6 +23,7 @@
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-512tvb 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-v1 2>&1 | FileCheck %s
+; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-v2 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m3 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m4 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m5 2>&1 | FileCheck %s
Index: llvm/lib/Target/AArch64/AArch64Subtarget.h
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -74,6 +74,7 @@
 NeoverseN2,
 Neoverse512TVB,
 NeoverseV1,
+NeoverseV2,
 Saphira,
 ThunderX2T99,
 ThunderX,
Index: llvm/lib/Target/AArch64/AArch64Subtarget.cpp
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -210,6 +210,12 @@
 MaxBytesForLoopAlignment = 16;
 VScaleForTuning = 2;
 break;
+  case NeoverseV2:
+PrefFunctionLogAlignment = 4;
+PrefLoopLogAlignment = 5;
+MaxBytesForLoopAlignment = 16;
+VScaleForTuning = 1;
+break;
   case Neoverse512TVB:
 PrefFunctionLogAlignment = 4;
 VScaleForTuning = 1;
Index: llvm/lib/Target/AArch64/AArch64.td
===
--- llvm/lib/Target/AArch64/AArch64.td
+++ llvm/lib/Target/AArch64/AArch64.td
@@ -936,6 +936,10 @@
   FeatureLSLFast,
   FeaturePostRAScheduler]>;
 
+def TuneNeoverseV2 : SubtargetFeature<"neoversev2", "ARMProcFamily", "NeoverseV2",
+  "Neoverse V2 ARM processors", [
+  FeaturePostRAScheduler]>;
+
 def TuneSaphira  : SubtargetFeature<"saphira", "ARMProcFamily", "Saphira",
"Qualcomm Saphira processors", [
FeatureCustomCheapAsMoveHandling,
@@ -1100,6 +1104,10 @@
FeatureFullFP16, FeatureMatMulInt8, FeatureNEON,
FeaturePerfMon, FeatureRandGen,

[PATCH] D134352: [AArch64] Add Neoverse V2 CPU support

2022-09-21 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 461898.
david-arm added a comment.

- Changed lists of tuning features.
- Removed redundant arch features from list.
- Combined neoverse-v2 and neoverse-n2 cases together in AArch64Subtarget.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134352/new/

https://reviews.llvm.org/D134352

Files:
  clang/test/Driver/aarch64-mcpu.c
  clang/test/Misc/target-invalid-cpu-note.c
  llvm/include/llvm/Support/AArch64TargetParser.def
  llvm/lib/Support/Host.cpp
  llvm/lib/Target/AArch64/AArch64.td
  llvm/lib/Target/AArch64/AArch64Subtarget.cpp
  llvm/lib/Target/AArch64/AArch64Subtarget.h
  llvm/test/CodeGen/AArch64/cpus.ll
  llvm/unittests/Support/TargetParserTest.cpp

Index: llvm/unittests/Support/TargetParserTest.cpp
===
--- llvm/unittests/Support/TargetParserTest.cpp
+++ llvm/unittests/Support/TargetParserTest.cpp
@@ -1026,6 +1026,16 @@
 AArch64::AEK_PROFILE | AArch64::AEK_RAND |
 AArch64::AEK_FP16FML | AArch64::AEK_I8MM,
 "8.4-A"),
+ARMCPUTestParams(
+"neoverse-v2", "armv9-a", "neon-fp-armv8",
+AArch64::AEK_RAS | AArch64::AEK_SVE | AArch64::AEK_SSBS |
+AArch64::AEK_RCPC | AArch64::AEK_CRC | AArch64::AEK_FP |
+AArch64::AEK_SIMD | AArch64::AEK_RAS | AArch64::AEK_LSE |
+AArch64::AEK_RDM | AArch64::AEK_RCPC | AArch64::AEK_DOTPROD |
+AArch64::AEK_FP16 | AArch64::AEK_BF16 | AArch64::AEK_SVE2 |
+AArch64::AEK_PROFILE | AArch64::AEK_FP16FML |
+AArch64::AEK_I8MM,
+"9-A"),
 ARMCPUTestParams("cortex-r82", "armv8-r", "crypto-neon-fp-armv8",
  AArch64::AEK_CRC | AArch64::AEK_RDM |
  AArch64::AEK_SSBS | AArch64::AEK_DOTPROD |
@@ -1257,7 +1267,7 @@
  AArch64::AEK_LSE | AArch64::AEK_RDM,
  "8.2-A")));
 
-static constexpr unsigned NumAArch64CPUArchs = 54;
+static constexpr unsigned NumAArch64CPUArchs = 55;
 
 TEST(TargetParserTest, testAArch64CPUArchList) {
   SmallVector List;
Index: llvm/test/CodeGen/AArch64/cpus.ll
===
--- llvm/test/CodeGen/AArch64/cpus.ll
+++ llvm/test/CodeGen/AArch64/cpus.ll
@@ -23,6 +23,7 @@
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-512tvb 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-v1 2>&1 | FileCheck %s
+; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-v2 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m3 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m4 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m5 2>&1 | FileCheck %s
Index: llvm/lib/Target/AArch64/AArch64Subtarget.h
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -74,6 +74,7 @@
 NeoverseN2,
 Neoverse512TVB,
 NeoverseV1,
+NeoverseV2,
 Saphira,
 ThunderX2T99,
 ThunderX,
Index: llvm/lib/Target/AArch64/AArch64Subtarget.cpp
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -199,6 +199,7 @@
 MaxBytesForLoopAlignment = 16;
 break;
   case NeoverseN2:
+  case NeoverseV2:
 PrefFunctionLogAlignment = 4;
 PrefLoopLogAlignment = 5;
 MaxBytesForLoopAlignment = 16;
Index: llvm/lib/Target/AArch64/AArch64.td
===
--- llvm/lib/Target/AArch64/AArch64.td
+++ llvm/lib/Target/AArch64/AArch64.td
@@ -936,6 +936,12 @@
   FeatureLSLFast,
   FeaturePostRAScheduler]>;
 
+def TuneNeoverseV2 : SubtargetFeature<"neoversev2", "ARMProcFamily", "NeoverseV2",
+  "Neoverse V2 ARM processors", [
+  FeatureFuseAES,
+  FeatureLSLFast,
+  FeaturePostRAScheduler]>;
+
 def TuneSaphira  : SubtargetFeature<"saphira", "ARMProcFamily", "Saphira",
"Qualcomm Saphira processors", [
FeatureCustomCheapAsMoveHandling,
@@ -1100,6 +1106,10 @@
FeatureFullFP16, FeatureMatMulInt8, FeatureNEON,
FeaturePerfMon, FeatureRandGen, FeatureSPE,
FeatureSSBS, FeatureSVE];
+  list NeoverseV2 = [HasV9_0aOps, FeatureBF16, FeatureSPE,
+   

[PATCH] D134352: [AArch64] Add Neoverse V2 CPU support

2022-09-21 Thread David Sherwood via Phabricator via cfe-commits
david-arm marked 3 inline comments as done.
david-arm added a comment.

Thanks for the quick review @dmgreen!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134352/new/

https://reviews.llvm.org/D134352

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134352: [AArch64] Add Neoverse V2 CPU support

2022-09-22 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 462124.
david-arm added a comment.

- Added FEAT_RNG to the neoverse-v2 CPU.
- Added message to release notes.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134352/new/

https://reviews.llvm.org/D134352

Files:
  clang/docs/ReleaseNotes.rst
  clang/test/Driver/aarch64-mcpu.c
  clang/test/Misc/target-invalid-cpu-note.c
  llvm/include/llvm/Support/AArch64TargetParser.def
  llvm/lib/Support/Host.cpp
  llvm/lib/Target/AArch64/AArch64.td
  llvm/lib/Target/AArch64/AArch64Subtarget.cpp
  llvm/lib/Target/AArch64/AArch64Subtarget.h
  llvm/test/CodeGen/AArch64/cpus.ll
  llvm/unittests/Support/TargetParserTest.cpp

Index: llvm/unittests/Support/TargetParserTest.cpp
===
--- llvm/unittests/Support/TargetParserTest.cpp
+++ llvm/unittests/Support/TargetParserTest.cpp
@@ -1026,6 +1026,16 @@
 AArch64::AEK_PROFILE | AArch64::AEK_RAND |
 AArch64::AEK_FP16FML | AArch64::AEK_I8MM,
 "8.4-A"),
+ARMCPUTestParams(
+"neoverse-v2", "armv9-a", "neon-fp-armv8",
+AArch64::AEK_RAS | AArch64::AEK_SVE | AArch64::AEK_SSBS |
+AArch64::AEK_RCPC | AArch64::AEK_CRC | AArch64::AEK_FP |
+AArch64::AEK_SIMD | AArch64::AEK_RAS | AArch64::AEK_LSE |
+AArch64::AEK_RDM | AArch64::AEK_RCPC | AArch64::AEK_DOTPROD |
+AArch64::AEK_FP16 | AArch64::AEK_BF16 | AArch64::AEK_SVE2 |
+AArch64::AEK_PROFILE | AArch64::AEK_FP16FML |
+AArch64::AEK_I8MM | AArch64::AEK_RAND,
+"9-A"),
 ARMCPUTestParams("cortex-r82", "armv8-r", "crypto-neon-fp-armv8",
  AArch64::AEK_CRC | AArch64::AEK_RDM |
  AArch64::AEK_SSBS | AArch64::AEK_DOTPROD |
@@ -1257,7 +1267,7 @@
  AArch64::AEK_LSE | AArch64::AEK_RDM,
  "8.2-A")));
 
-static constexpr unsigned NumAArch64CPUArchs = 54;
+static constexpr unsigned NumAArch64CPUArchs = 55;
 
 TEST(TargetParserTest, testAArch64CPUArchList) {
   SmallVector List;
Index: llvm/test/CodeGen/AArch64/cpus.ll
===
--- llvm/test/CodeGen/AArch64/cpus.ll
+++ llvm/test/CodeGen/AArch64/cpus.ll
@@ -23,6 +23,7 @@
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-512tvb 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-v1 2>&1 | FileCheck %s
+; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-v2 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m3 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m4 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m5 2>&1 | FileCheck %s
Index: llvm/lib/Target/AArch64/AArch64Subtarget.h
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -74,6 +74,7 @@
 NeoverseN2,
 Neoverse512TVB,
 NeoverseV1,
+NeoverseV2,
 Saphira,
 ThunderX2T99,
 ThunderX,
Index: llvm/lib/Target/AArch64/AArch64Subtarget.cpp
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -199,6 +199,7 @@
 MaxBytesForLoopAlignment = 16;
 break;
   case NeoverseN2:
+  case NeoverseV2:
 PrefFunctionLogAlignment = 4;
 PrefLoopLogAlignment = 5;
 MaxBytesForLoopAlignment = 16;
Index: llvm/lib/Target/AArch64/AArch64.td
===
--- llvm/lib/Target/AArch64/AArch64.td
+++ llvm/lib/Target/AArch64/AArch64.td
@@ -936,6 +936,12 @@
   FeatureLSLFast,
   FeaturePostRAScheduler]>;
 
+def TuneNeoverseV2 : SubtargetFeature<"neoversev2", "ARMProcFamily", "NeoverseV2",
+  "Neoverse V2 ARM processors", [
+  FeatureFuseAES,
+  FeatureLSLFast,
+  FeaturePostRAScheduler]>;
+
 def TuneSaphira  : SubtargetFeature<"saphira", "ARMProcFamily", "Saphira",
"Qualcomm Saphira processors", [
FeatureCustomCheapAsMoveHandling,
@@ -1100,6 +1106,10 @@
FeatureFullFP16, FeatureMatMulInt8, FeatureNEON,
FeaturePerfMon, FeatureRandGen, FeatureSPE,
FeatureSSBS, FeatureSVE];
+  list NeoverseV2 = [HasV9_0aOps, FeatureBF16, FeatureSPE,
+   FeaturePerfMon, Fea

[PATCH] D134352: [AArch64] Add Neoverse V2 CPU support

2022-09-22 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

In D134352#3806896, @tschuett wrote:

> `VScaleForTuning` is 1 for N2 and V2. It is 2 for V1. I thought the V2 was
> more like the V1 than the N2?
> Sorry. This is throughput, right?

Hi @tschuett, Neoverse V2 will have 128-bit SVE vector lengths, hence 
VScaleForTuning should be 1.
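
As a quick sanity check, vscale here is just the SVE vector length divided by
the 128-bit granule:

  VScaleForTuning = VL_bits / 128   // V1: 256/128 = 2, V2: 128/128 = 1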


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134352/new/

https://reviews.llvm.org/D134352

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134352: [AArch64] Add Neoverse V2 CPU support

2022-09-23 Thread David Sherwood via Phabricator via cfe-commits
david-arm added inline comments.



Comment at: llvm/lib/Target/AArch64/AArch64.td:1112
+   FeatureNEON, FeatureSVE2BitPerm, FeatureFP16FML,
+   FeatureMTE, FeatureRandGen];
   list Saphira= [HasV8_4aOps, FeatureCrypto, FeatureFPARMv8,

Matt wrote:
> Shouldn't `FeatureMTE` (Enable Memory Tagging Extension) be present, too (as 
> in NeoverseN2)?
It is already included in the patch, i.e. see above:

  FeatureMTE, FeatureRandGen];


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134352/new/

https://reviews.llvm.org/D134352

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134352: [AArch64] Add Neoverse V2 CPU support

2022-09-23 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 462420.
david-arm added a comment.

- Added AEK_MTE to target parser flags.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134352/new/

https://reviews.llvm.org/D134352

Files:
  clang/docs/ReleaseNotes.rst
  clang/test/Driver/aarch64-mcpu.c
  clang/test/Misc/target-invalid-cpu-note.c
  llvm/include/llvm/Support/AArch64TargetParser.def
  llvm/lib/Support/Host.cpp
  llvm/lib/Target/AArch64/AArch64.td
  llvm/lib/Target/AArch64/AArch64Subtarget.cpp
  llvm/lib/Target/AArch64/AArch64Subtarget.h
  llvm/test/CodeGen/AArch64/cpus.ll
  llvm/unittests/Support/TargetParserTest.cpp

Index: llvm/unittests/Support/TargetParserTest.cpp
===
--- llvm/unittests/Support/TargetParserTest.cpp
+++ llvm/unittests/Support/TargetParserTest.cpp
@@ -1026,6 +1026,16 @@
 AArch64::AEK_PROFILE | AArch64::AEK_RAND |
 AArch64::AEK_FP16FML | AArch64::AEK_I8MM,
 "8.4-A"),
+ARMCPUTestParams(
+"neoverse-v2", "armv9-a", "neon-fp-armv8",
+AArch64::AEK_RAS | AArch64::AEK_SVE | AArch64::AEK_SSBS |
+AArch64::AEK_RCPC | AArch64::AEK_CRC | AArch64::AEK_FP |
+AArch64::AEK_SIMD | AArch64::AEK_MTE | AArch64::AEK_LSE |
+AArch64::AEK_RDM | AArch64::AEK_RCPC | AArch64::AEK_DOTPROD |
+AArch64::AEK_FP16 | AArch64::AEK_BF16 | AArch64::AEK_SVE2 |
+AArch64::AEK_PROFILE | AArch64::AEK_FP16FML |
+AArch64::AEK_I8MM | AArch64::AEK_RAND,
+"9-A"),
 ARMCPUTestParams("cortex-r82", "armv8-r", "crypto-neon-fp-armv8",
  AArch64::AEK_CRC | AArch64::AEK_RDM |
  AArch64::AEK_SSBS | AArch64::AEK_DOTPROD |
@@ -1257,7 +1267,7 @@
  AArch64::AEK_LSE | AArch64::AEK_RDM,
  "8.2-A")));
 
-static constexpr unsigned NumAArch64CPUArchs = 54;
+static constexpr unsigned NumAArch64CPUArchs = 55;
 
 TEST(TargetParserTest, testAArch64CPUArchList) {
   SmallVector List;
Index: llvm/test/CodeGen/AArch64/cpus.ll
===
--- llvm/test/CodeGen/AArch64/cpus.ll
+++ llvm/test/CodeGen/AArch64/cpus.ll
@@ -23,6 +23,7 @@
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-512tvb 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-v1 2>&1 | FileCheck %s
+; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-v2 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m3 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m4 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m5 2>&1 | FileCheck %s
Index: llvm/lib/Target/AArch64/AArch64Subtarget.h
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -74,6 +74,7 @@
 NeoverseN2,
 Neoverse512TVB,
 NeoverseV1,
+NeoverseV2,
 Saphira,
 ThunderX2T99,
 ThunderX,
Index: llvm/lib/Target/AArch64/AArch64Subtarget.cpp
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -199,6 +199,7 @@
 MaxBytesForLoopAlignment = 16;
 break;
   case NeoverseN2:
+  case NeoverseV2:
 PrefFunctionLogAlignment = 4;
 PrefLoopLogAlignment = 5;
 MaxBytesForLoopAlignment = 16;
Index: llvm/lib/Target/AArch64/AArch64.td
===
--- llvm/lib/Target/AArch64/AArch64.td
+++ llvm/lib/Target/AArch64/AArch64.td
@@ -936,6 +936,12 @@
   FeatureLSLFast,
   FeaturePostRAScheduler]>;
 
+def TuneNeoverseV2 : SubtargetFeature<"neoversev2", "ARMProcFamily", "NeoverseV2",
+  "Neoverse V2 ARM processors", [
+  FeatureFuseAES,
+  FeatureLSLFast,
+  FeaturePostRAScheduler]>;
+
 def TuneSaphira  : SubtargetFeature<"saphira", "ARMProcFamily", "Saphira",
"Qualcomm Saphira processors", [
FeatureCustomCheapAsMoveHandling,
@@ -1100,6 +1106,10 @@
FeatureFullFP16, FeatureMatMulInt8, FeatureNEON,
FeaturePerfMon, FeatureRandGen, FeatureSPE,
FeatureSSBS, FeatureSVE];
+  list NeoverseV2 = [HasV9_0aOps, FeatureBF16, FeatureSPE,
+   FeaturePerfMon, FeatureETE, FeatureMatMulInt8,
+  

[PATCH] D134352: [AArch64] Add Neoverse V2 CPU support

2022-09-26 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 462833.
david-arm added a comment.

- Added SVE2BITPERM to AArch64TargetParser.def and updated the unit test.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134352/new/

https://reviews.llvm.org/D134352

Files:
  clang/docs/ReleaseNotes.rst
  clang/test/Driver/aarch64-mcpu.c
  clang/test/Misc/target-invalid-cpu-note.c
  llvm/include/llvm/Support/AArch64TargetParser.def
  llvm/lib/Support/Host.cpp
  llvm/lib/Target/AArch64/AArch64.td
  llvm/lib/Target/AArch64/AArch64Subtarget.cpp
  llvm/lib/Target/AArch64/AArch64Subtarget.h
  llvm/test/CodeGen/AArch64/cpus.ll
  llvm/unittests/Support/TargetParserTest.cpp

Index: llvm/unittests/Support/TargetParserTest.cpp
===
--- llvm/unittests/Support/TargetParserTest.cpp
+++ llvm/unittests/Support/TargetParserTest.cpp
@@ -1026,6 +1026,17 @@
 AArch64::AEK_PROFILE | AArch64::AEK_RAND |
 AArch64::AEK_FP16FML | AArch64::AEK_I8MM,
 "8.4-A"),
+ARMCPUTestParams(
+"neoverse-v2", "armv9-a", "neon-fp-armv8",
+AArch64::AEK_RAS | AArch64::AEK_SVE | AArch64::AEK_SSBS |
+AArch64::AEK_RCPC | AArch64::AEK_CRC | AArch64::AEK_FP |
+AArch64::AEK_SIMD | AArch64::AEK_MTE | AArch64::AEK_LSE |
+AArch64::AEK_RDM | AArch64::AEK_RCPC | AArch64::AEK_DOTPROD |
+AArch64::AEK_FP16 | AArch64::AEK_BF16 | AArch64::AEK_SVE2 |
+AArch64::AEK_PROFILE | AArch64::AEK_FP16FML |
+AArch64::AEK_I8MM | AArch64::AEK_SVE2BITPERM |
+AArch64::AEK_RAND,
+"9-A"),
 ARMCPUTestParams("cortex-r82", "armv8-r", "crypto-neon-fp-armv8",
  AArch64::AEK_CRC | AArch64::AEK_RDM |
  AArch64::AEK_SSBS | AArch64::AEK_DOTPROD |
@@ -1257,7 +1268,7 @@
  AArch64::AEK_LSE | AArch64::AEK_RDM,
  "8.2-A")));
 
-static constexpr unsigned NumAArch64CPUArchs = 54;
+static constexpr unsigned NumAArch64CPUArchs = 55;
 
 TEST(TargetParserTest, testAArch64CPUArchList) {
   SmallVector List;
Index: llvm/test/CodeGen/AArch64/cpus.ll
===
--- llvm/test/CodeGen/AArch64/cpus.ll
+++ llvm/test/CodeGen/AArch64/cpus.ll
@@ -23,6 +23,7 @@
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-512tvb 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-v1 2>&1 | FileCheck %s
+; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-v2 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m3 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m4 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m5 2>&1 | FileCheck %s
Index: llvm/lib/Target/AArch64/AArch64Subtarget.h
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -74,6 +74,7 @@
 NeoverseN2,
 Neoverse512TVB,
 NeoverseV1,
+NeoverseV2,
 Saphira,
 ThunderX2T99,
 ThunderX,
Index: llvm/lib/Target/AArch64/AArch64Subtarget.cpp
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -199,6 +199,7 @@
 MaxBytesForLoopAlignment = 16;
 break;
   case NeoverseN2:
+  case NeoverseV2:
 PrefFunctionLogAlignment = 4;
 PrefLoopLogAlignment = 5;
 MaxBytesForLoopAlignment = 16;
Index: llvm/lib/Target/AArch64/AArch64.td
===
--- llvm/lib/Target/AArch64/AArch64.td
+++ llvm/lib/Target/AArch64/AArch64.td
@@ -936,6 +936,12 @@
   FeatureLSLFast,
   FeaturePostRAScheduler]>;
 
+def TuneNeoverseV2 : SubtargetFeature<"neoversev2", "ARMProcFamily", "NeoverseV2",
+  "Neoverse V2 ARM processors", [
+  FeatureFuseAES,
+  FeatureLSLFast,
+  FeaturePostRAScheduler]>;
+
 def TuneSaphira  : SubtargetFeature<"saphira", "ARMProcFamily", "Saphira",
"Qualcomm Saphira processors", [
FeatureCustomCheapAsMoveHandling,
@@ -1100,6 +1106,10 @@
FeatureFullFP16, FeatureMatMulInt8, FeatureNEON,
FeaturePerfMon, FeatureRandGen, FeatureSPE,
FeatureSSBS, FeatureSVE];
+  list NeoverseV2 = [HasV9_0aOps, FeatureBF16, FeatureSPE,
+   

[PATCH] D134352: [AArch64] Add Neoverse V2 CPU support

2022-09-26 Thread David Sherwood via Phabricator via cfe-commits
david-arm marked 3 inline comments as done.
david-arm added inline comments.



Comment at: llvm/include/llvm/Support/AArch64TargetParser.def:239
+AARCH64_CPU_NAME("neoverse-v2", ARMV9A, FK_NEON_FP_ARMV8, false,
+ (AArch64::AEK_SVE | AArch64::AEK_SVE2 | AArch64::AEK_SSBS |
+  AArch64::AEK_FP16 | AArch64::AEK_BF16 | AArch64::AEK_RAND |

Matt wrote:
> Should `AEK_SVE2BITPERM` be present? (Noticed that N2 has ` AArch64::AEK_SVE2 
> | AArch64::AEK_SVE2BITPERM`).
Good spot! I've made this consistent with the definition in AArch64.td.



Comment at: llvm/lib/Target/AArch64/AArch64.td:
+   FeaturePerfMon, FeatureETE, FeatureMatMulInt8,
+   FeatureNEON, FeatureSVE2BitPerm, FeatureFP16FML,
+   FeatureMTE, FeatureRandGen];

Matt wrote:
> `FeatureNEON` may be redundant (note that it's in `HasV8_3aOps`).
> 
> OTOH, `NeoverseV1` also has `FeatureCrypto`: is this no longer the case for 
> `NeoverseV2`?
HasV8_3aOps does imply FeatureNEON, but only indirectly through 
FeatureComplxNum. I thought it was clearer to explicitly add it, since it 
doesn't do any harm.

With regards to FeatureCrypto, I am following the precedent set for 
Cortex-A510, Cortex-A710 and Cortex-X2 where it also wasn't enabled by default. 
The user can always enable crypto with -mcpu=neoverse-v2+crypto if required.



Comment at: llvm/lib/Target/AArch64/AArch64Subtarget.cpp:202
   case NeoverseN2:
+  case NeoverseV2:
 PrefFunctionLogAlignment = 4;

Matt wrote:
> Are `CacheLineSize` (`= 0` by default) and `MaxInterleaveFactor` (`= 2` by 
> default) the same / correct for both N2 and V2?
I don't know the answer for neoverse-v2 I'm afraid, and neoverse-n2 isn't part 
of this patch. Performance tuning can be done in a later patch I think.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134352/new/

https://reviews.llvm.org/D134352

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134352: [AArch64] Add Neoverse V2 CPU support

2022-09-27 Thread David Sherwood via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
david-arm marked 3 inline comments as done.
Closed by commit rGfbb119412f14: [AArch64] Add Neoverse V2 CPU support 
(authored by david-arm).

Changed prior to commit:
  https://reviews.llvm.org/D134352?vs=462833&id=463132#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134352/new/

https://reviews.llvm.org/D134352

Files:
  clang/docs/ReleaseNotes.rst
  clang/test/Driver/aarch64-mcpu.c
  clang/test/Misc/target-invalid-cpu-note.c
  llvm/include/llvm/Support/AArch64TargetParser.def
  llvm/lib/Support/Host.cpp
  llvm/lib/Target/AArch64/AArch64.td
  llvm/lib/Target/AArch64/AArch64Subtarget.cpp
  llvm/lib/Target/AArch64/AArch64Subtarget.h
  llvm/test/CodeGen/AArch64/cpus.ll
  llvm/unittests/Support/TargetParserTest.cpp

Index: llvm/unittests/Support/TargetParserTest.cpp
===
--- llvm/unittests/Support/TargetParserTest.cpp
+++ llvm/unittests/Support/TargetParserTest.cpp
@@ -1026,6 +1026,17 @@
 AArch64::AEK_PROFILE | AArch64::AEK_RAND |
 AArch64::AEK_FP16FML | AArch64::AEK_I8MM,
 "8.4-A"),
+ARMCPUTestParams(
+"neoverse-v2", "armv9-a", "neon-fp-armv8",
+AArch64::AEK_RAS | AArch64::AEK_SVE | AArch64::AEK_SSBS |
+AArch64::AEK_RCPC | AArch64::AEK_CRC | AArch64::AEK_FP |
+AArch64::AEK_SIMD | AArch64::AEK_MTE | AArch64::AEK_LSE |
+AArch64::AEK_RDM | AArch64::AEK_RCPC | AArch64::AEK_DOTPROD |
+AArch64::AEK_FP16 | AArch64::AEK_BF16 | AArch64::AEK_SVE2 |
+AArch64::AEK_PROFILE | AArch64::AEK_FP16FML |
+AArch64::AEK_I8MM | AArch64::AEK_SVE2BITPERM |
+AArch64::AEK_RAND,
+"9-A"),
 ARMCPUTestParams("cortex-r82", "armv8-r", "crypto-neon-fp-armv8",
  AArch64::AEK_CRC | AArch64::AEK_RDM |
  AArch64::AEK_SSBS | AArch64::AEK_DOTPROD |
@@ -1284,7 +1295,7 @@
  AArch64::AEK_LSE | AArch64::AEK_RDM,
  "8.2-A")));
 
-static constexpr unsigned NumAArch64CPUArchs = 57;
+static constexpr unsigned NumAArch64CPUArchs = 58;
 
 TEST(TargetParserTest, testAArch64CPUArchList) {
   SmallVector List;
Index: llvm/test/CodeGen/AArch64/cpus.ll
===
--- llvm/test/CodeGen/AArch64/cpus.ll
+++ llvm/test/CodeGen/AArch64/cpus.ll
@@ -23,6 +23,7 @@
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-512tvb 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-v1 2>&1 | FileCheck %s
+; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=neoverse-v2 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m3 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m4 2>&1 | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-unknown-unknown -mcpu=exynos-m5 2>&1 | FileCheck %s
Index: llvm/lib/Target/AArch64/AArch64Subtarget.h
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -76,6 +76,7 @@
 NeoverseN2,
 Neoverse512TVB,
 NeoverseV1,
+NeoverseV2,
 Saphira,
 ThunderX2T99,
 ThunderX,
Index: llvm/lib/Target/AArch64/AArch64Subtarget.cpp
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -201,6 +201,7 @@
 MaxBytesForLoopAlignment = 16;
 break;
   case NeoverseN2:
+  case NeoverseV2:
 PrefFunctionLogAlignment = 4;
 PrefLoopLogAlignment = 5;
 MaxBytesForLoopAlignment = 16;
Index: llvm/lib/Target/AArch64/AArch64.td
===
--- llvm/lib/Target/AArch64/AArch64.td
+++ llvm/lib/Target/AArch64/AArch64.td
@@ -984,6 +984,12 @@
   FeatureLSLFast,
   FeaturePostRAScheduler]>;
 
+def TuneNeoverseV2 : SubtargetFeature<"neoversev2", "ARMProcFamily", "NeoverseV2",
+  "Neoverse V2 ARM processors", [
+  FeatureFuseAES,
+  FeatureLSLFast,
+  FeaturePostRAScheduler]>;
+
 def TuneSaphira  : SubtargetFeature<"saphira", "ARMProcFamily", "Saphira",
"Qualcomm Saphira processors", [
FeatureCustomCheapAsMoveHandling,
@@ -1155,6 +1161,10 @@
FeatureFullFP16, FeatureMatMulI

[PATCH] D127910: [Clang][AArch64] Add SME C intrinsics for load and store

2022-10-05 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Hi @sagarkulkarni19, thank you for working on the ACLE builtins for SME! I've 
had a look through and I have a few comments, mostly around how the code is 
structured. It would be good if you could try to separate SVE from SME in this 
implementation, in the same way that NEON and SVE are distinct. It's possible 
to do this whilst reusing much of the code in SveEmitter.cpp.




Comment at: clang/include/clang/Basic/arm_sve.td:210
+def IsSME : FlagType<0x8>;
+def IsSMELd1  : FlagType<0x10>;
+def IsSMESt1  : FlagType<0x20>;

We don't need these new flags 'IsSMELd1' and 'IsSMESt1'. Please can you reuse 
the existing `IsLoad` and `IsStore` flags?



Comment at: clang/include/clang/Basic/arm_sve.td:549
 
+def SVLD1_HOR_ZA8 : MInst<"svld1_hor_za8", "vimiPQ", "", [IsOverloadNone, 
IsSME, IsSMELd1], MemEltTyDefault, "aarch64_sme_ld1b_horiz">;
+def SVLD1_HOR_ZA16 : MInst<"svld1_hor_za16", "vimiPQ", "", [IsOverloadNone, 
IsSME, IsSMELd1], MemEltTyDefault, "aarch64_sme_ld1h_horiz">;

SME is really a distinct architecture and so I think it should live in its own 
arm_sme.td file in the same way that we have arm_neon.td and arm_sve.td. It's 
possible to do this and still reuse the SveEmitter.cpp code. If you look at 
SveEmitter.cpp you'll see these functions:

  void EmitSveHeader(RecordKeeper &Records, raw_ostream &OS) {
SVEEmitter(Records).createHeader(OS);
  }

  void EmitSveBuiltins(RecordKeeper &Records, raw_ostream &OS) {
SVEEmitter(Records).createBuiltins(OS);
  }

  void EmitSveBuiltinCG(RecordKeeper &Records, raw_ostream &OS) {
SVEEmitter(Records).createCodeGenMap(OS);
  }

  void EmitSveRangeChecks(RecordKeeper &Records, raw_ostream &OS) {
SVEEmitter(Records).createRangeChecks(OS);
  }

It would be good to add similar ones for SME, i.e. `EmitSmeRangeChecks`, etc.
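
For example, just to sketch the shape I have in mind (note the `createSME*`
member names are my invention here and would need adding to SVEEmitter):

  void EmitSmeHeader(RecordKeeper &Records, raw_ostream &OS) {
    SVEEmitter(Records).createSMEHeader(OS);
  }

  void EmitSmeBuiltins(RecordKeeper &Records, raw_ostream &OS) {
    SVEEmitter(Records).createSMEBuiltins(OS);
  }

  void EmitSmeBuiltinCG(RecordKeeper &Records, raw_ostream &OS) {
    SVEEmitter(Records).createSMECodeGenMap(OS);
  }

  void EmitSmeRangeChecks(RecordKeeper &Records, raw_ostream &OS) {
    SVEEmitter(Records).createSMERangeChecks(OS);
  }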



Comment at: clang/include/clang/Basic/arm_sve.td:209
 def IsTupleSet: FlagType<0x4>;
+def IsSME : FlagType<0x8>;
+def IsSMELoadStore: FlagType<0x10>;

sagarkulkarni19 wrote:
> sdesmalen wrote:
> > Is there value in having both `IsSME` and `IsSMELoadStore`?
> `IsSME` flag is checked in the SveEmitter.cpp : createSMEHeader(...) to 
> generate arm_sme.h with only the SME intrinsics, whereas `IsSMELoadStore` is 
> used to correctly CodeGen (CGBuiltins.cpp) load and store intrinsics. 
You don't need the `IsSME` flag either because in 
`CodeGenFunction::EmitAArch64BuiltinExpr` you can do exactly the same thing as 
SVE, i.e. something like

  if (BuiltinID >= AArch64::FirstSMEBuiltin &&
  BuiltinID <= AArch64::LastSMEBuiltin)
return EmitAArch64SMEBuiltinExpr(BuiltinID, E);



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9230
 return EmitSVEMaskedStore(E, Ops, Builtin->LLVMIntrinsic);
+  else if (TypeFlags.isSMELd1() || TypeFlags.isSMESt1())
+return EmitSMELd1St1(TypeFlags, Ops, Builtin->LLVMIntrinsic);

I would prefer this to be handled inside it's own `EmitAArch64SMEBuiltinExpr`, 
since it's a separate architecture.



Comment at: clang/utils/TableGen/SveEmitter.cpp:867
+if (this->Flags & Emitter.getEnumValueForFlag("IsSMELd1"))
+  this->SMEAttributes = "arm_streaming, arm_shared_za";
+else if (this->Flags & Emitter.getEnumValueForFlag("IsSMESt1"))

I would prefer this to be done more precisely via separate attribute flags, 
i.e. in the .td file decorate each ACLE builtin with something like `IsShared`, 
`IsStreaming`, `IsStreamingCompatible`, etc. Otherwise you'll end up needing 
loads of flags for all the different possible combinations. That way you can do:

  if (this->Flags & Emitter.getEnumValueForFlag("IsStreaming"))
this->SMEAttributes += "arm_streaming";

etc.
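
To keep the attribute list well-formed when several flags are set, it's
probably worth a small helper too, e.g. (a sketch only; this assumes
SMEAttributes is a std::string):

  // Append an attribute, comma-separating it from any earlier ones.
  auto AppendAttr = [&](llvm::StringRef Attr) {
    if (!this->SMEAttributes.empty())
      this->SMEAttributes += ", ";
    this->SMEAttributes += Attr.str();
  };
  if (this->Flags & Emitter.getEnumValueForFlag("IsStreaming"))
    AppendAttr("arm_streaming");
  if (this->Flags & Emitter.getEnumValueForFlag("IsSharedZA"))
    AppendAttr("arm_shared_za");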



Comment at: clang/utils/TableGen/SveEmitter.cpp:1042
 
+  bool SMEFlag = Flags & getEnumValueForFlag("IsSME");
+  if (SMEFlag != IsSME)

If you move the builtins to their own file in arm_sme.td and emit the records 
using interfaces like EmitSmeBuiltins, etc. then you already know they are SME 
builtins and so don't need the flag.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127910/new/

https://reviews.llvm.org/D127910

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D140983: [IRBuilder] Use canonical i64 type for insertelement index used by vector splats.

2023-01-05 Thread David Sherwood via Phabricator via cfe-commits
david-arm accepted this revision.
david-arm added a comment.
This revision is now accepted and ready to land.

LGTM! The change seems sensible to me, although perhaps worth waiting a day or 
two in case anyone else has objections? I noticed that we still create inserts 
using 32-bit indices in a few other places such as the SLPVectorizer, but I 
imagine these can be tidied up over time if necessary.
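
For anyone skimming this later, the idiom being canonicalised is roughly the
following (a simplified sketch of IRBuilder::CreateVectorSplat, not the exact
upstream code):

  // Splat scalar V to a vector of EC elements; the insertelement index is
  // now a canonical i64 zero rather than i32.
  llvm::Value *splat(llvm::IRBuilderBase &B, llvm::ElementCount EC,
                     llvm::Value *V) {
    auto *VecTy = llvm::VectorType::get(V->getType(), EC);
    llvm::Value *Ins = B.CreateInsertElement(llvm::PoisonValue::get(VecTy),
                                             V, B.getInt64(0));
    llvm::SmallVector<int, 16> ZeroMask(EC.getKnownMinValue(), 0);
    return B.CreateShuffleVector(Ins, ZeroMask);
  }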


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140983/new/

https://reviews.llvm.org/D140983

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141056: [SVE][CGBuiltins] Remove need for instcombine from ACLE tests.

2023-01-11 Thread David Sherwood via Phabricator via cfe-commits
david-arm added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9092
+  if (BytesPerElt > 1) {
+Value *Scale = ConstantInt::get(Int64Ty, Log2_32(BytesPerElt));
+Ops[2] = Builder.CreateShl(Ops[2], Scale);

Given this seems a frequent idiom, is it worth putting this into a helper 
routine? i.e. something like

  Ops[2] = getScaledOffset(Ops[2], BytesPerElt);

where

  Value *getScaledOffset(Value *Offset, unsigned Bytes) {
Value *Scale = ConstantInt::get(Int64Ty, Log2_32(Bytes));
return Builder.CreateShl(Offset, Scale);
  }


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141056/new/

https://reviews.llvm.org/D141056

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D127910: [Clang][AArch64][SME] Add vector load/store (ld1/st1) intrinsics

2023-01-30 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Hi @bryanpkc, this is looking a lot better now and thanks for addressing the 
comments! I've not reviewed all of the patch yet, but I do have a few more 
comments. The most important ones are about performing immediate range checks 
for the builtins and not defining the __ARM_FEATURE_SME macro yet.




Comment at: clang/include/clang/Basic/TargetBuiltins.h:312
+  /// Flags to identify the types for overloaded SME builtins.
+  class SMETypeFlags {
+uint64_t Flags;

I actually don't think you need to add this class - we should be able to just 
reuse the existing SVETypeFlags structure. I think this is fine because you've 
commonised the flags between SME and SVE.



Comment at: clang/include/clang/Basic/arm_sme.td:21
+
+def SVLD1_HOR_ZA8 : MInst<"svld1_hor_za8", "vimiPQ", "c", [IsLoad, 
IsOverloadNone, IsStreaming, IsSharedZA], MemEltTyDefault, 
"aarch64_sme_ld1b_horiz">;
+def SVLD1_HOR_ZA16 : MInst<"svld1_hor_za16", "vimiPQ", "s", [IsLoad, 
IsOverloadNone, IsStreaming, IsSharedZA], MemEltTyDefault, 
"aarch64_sme_ld1h_horiz">;

I think all the load and store instructions need immediate checks for the tile 
and slice_offset here such as:

[ImmCheck<0, ImmCheck0_0>, ImmCheck<2, ImmCheck0_15>]

for SVLD1_HOR_ZA8 and the others. It's mentioned in the ACLE - 
https://arm-software.github.io/acle/main/acle.html#sme-language-extensions-and-intrinsics:

  15.4.3.1 Common Rules

  ...
  Every argument named tile, slice_offset or tile_mask must be an integer 
constant expression in the range of the underlying instruction.




Comment at: clang/include/clang/Basic/arm_sve_sme_incl.td:126
+// Z: const pointer to uint64_t
+
+class MergeType {

Please can you add a comment here for the new Prototype modifier you added - 
'%'?



Comment at: clang/lib/Basic/Targets/AArch64.cpp:438
 
+  if (HasSME)
+Builder.defineMacro("__ARM_FEATURE_SME", "1");

Can you remove this please? We can't really set this macro until the SME ABI 
and ACLE are feature complete.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9315
+  unsigned IntID) {
+  switch (IntID) {
+  case Intrinsic::aarch64_sme_ld1h_horiz:

I think that instead of this switch statement you should just be able to write 
something like:

  Ops[3] = EmitSVEPredicateCast(
      Ops[3], getSVEVectorForElementType(SVEBuiltinMemEltTy(TypeFlags)));




Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9357
+Function *StreamingVectorLength =
+CGM.getIntrinsic(Intrinsic::aarch64_sme_cntsb, {});
+llvm::Value *StreamingVectorLengthCall =

I think you can just call

  CGM.getIntrinsic(Intrinsic::aarch64_sme_cntsb)



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9359
+llvm::Value *StreamingVectorLengthCall =
+Builder.CreateCall(StreamingVectorLength, {});
+llvm::Value *Mulvl =

Again, I think you can just do

  Builder.CreateCall(StreamingVectorLength)



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9368
+  NewOps.push_back(EmitTileslice(Ops[2], Ops[1]));
+  Function *F = CGM.getIntrinsic(IntID, {});
+  return Builder.CreateCall(F, NewOps);

nit: `Function *F = CGM.getIntrinsic(IntID);`



Comment at: 
clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_int_const_expr_error.c:5
+
+__attribute__((arm_streaming)) void test_svld1_hor_za8(uint64_t tile, uint32_t 
slice_base, uint64_t slice_offset, svbool_t pg, void *ptr) {
+  svld1_hor_za8(tile, slice_base, 0, pg, ptr);  // expected-error 
{{argument to 'svld1_hor_za8' must be a constant integer}}

Once you've added the immediate range checks for the loads and stores it would 
be good add checks here for immediates outside the range for each instruction.



Comment at: clang/utils/TableGen/SveEmitter.cpp:1634
+void SVEEmitter::createSMETypeFlags(raw_ostream &OS) {
+  OS << "#ifdef LLVM_GET_SME_TYPEFLAGS\n";
+  for (auto &KV : FlagTypes)

If you reuse the existing SVETypeFlags rather than create a new SMETypeFlags 
then you only need the LLVM_GET_SME_IMMCHECKTYPES bit.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127910/new/

https://reviews.llvm.org/D127910

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141056: [SVE][CGBuiltins] Remove need for instcombine from ACLE tests.

2023-01-12 Thread David Sherwood via Phabricator via cfe-commits
david-arm accepted this revision.
david-arm added a comment.
This revision is now accepted and ready to land.

LGTM! Thanks for making the changes @paulwalker-arm. :)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141056/new/

https://reviews.llvm.org/D141056

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156115: [Clang][SVE] Permit specific predicate-as-counter registers in inline assembly

2023-07-24 Thread David Sherwood via Phabricator via cfe-commits
david-arm created this revision.
david-arm added reviewers: sdesmalen, MattDevereau, hassnaa-arm.
Herald added subscribers: ctetreau, psnobl, kristof.beyls.
Herald added a reviewer: efriedma.
Herald added a project: All.
david-arm requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

This patch adds the predicate-as-counter registers pn0-pn15 to the
list of supported registers used when writing inline assembly.

Tests added to

  clang/test/CodeGen/aarch64-sve-inline-asm.c


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D156115

Files:
  clang/lib/Basic/Targets/AArch64.cpp
  clang/test/CodeGen/aarch64-sve-inline-asm.c


Index: clang/test/CodeGen/aarch64-sve-inline-asm.c
===
--- clang/test/CodeGen/aarch64-sve-inline-asm.c
+++ clang/test/CodeGen/aarch64-sve-inline-asm.c
@@ -1,4 +1,5 @@
-// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -emit-llvm -o - %s | FileCheck %s -check-prefix=CHECK
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +sve2p1 \
+// RUN:   -emit-llvm -o - %s | FileCheck %s -check-prefix=CHECK
 
 void test_sve_asm(void) {
   asm volatile(
@@ -9,5 +10,13 @@
   :
   :
   : "z0", "z31", "p0", "p15");
+  // CHECK-LABEL: @test_sve_asm
   // CHECK: "~{z0},~{z31},~{p0},~{p15}"
 }
+
+void test_sve2p1_asm(void) {
+  register __SVCount_t x2 asm("pn0");
+  asm("ptrue pn8.b" ::: "pn8");
+  // CHECK-LABEL: @test_sve2p1_asm
+  // CHECK: "~{pn8}"
+}
Index: clang/lib/Basic/Targets/AArch64.cpp
===
--- clang/lib/Basic/Targets/AArch64.cpp
+++ clang/lib/Basic/Targets/AArch64.cpp
@@ -1164,7 +1164,11 @@
 
 // SVE predicate registers
 "p0",  "p1",  "p2",  "p3",  "p4",  "p5",  "p6",  "p7",  "p8",  "p9",  "p10",
-"p11", "p12", "p13", "p14", "p15"
+"p11", "p12", "p13", "p14", "p15",
+
+// SVE predicate-as-counter registers
+"pn0",  "pn1",  "pn2",  "pn3",  "pn4",  "pn5",  "pn6",  "pn7",  "pn8",
+"pn9",  "pn10", "pn11", "pn12", "pn13", "pn14", "pn15"
 };
 
 ArrayRef AArch64TargetInfo::getGCCRegNames() const {
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156115: [Clang][SVE] Permit specific predicate-as-counter registers in inline assembly

2023-07-24 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 543529.
david-arm added a comment.

- Addressed review comments on tests


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156115/new/

https://reviews.llvm.org/D156115

Files:
  clang/lib/Basic/Targets/AArch64.cpp
  clang/test/CodeGen/aarch64-sve-inline-asm.c


Index: clang/test/CodeGen/aarch64-sve-inline-asm.c
===
--- clang/test/CodeGen/aarch64-sve-inline-asm.c
+++ clang/test/CodeGen/aarch64-sve-inline-asm.c
@@ -1,4 +1,8 @@
-// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -emit-llvm -o - %s | FileCheck %s -check-prefix=CHECK
+// REQUIRES: aarch64-registered-target
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +sve2p1 \
+// RUN:   -emit-llvm -o - %s | FileCheck %s -check-prefix=CHECK
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +sve2p1 \
+// RUN:   -S -o /dev/null
 
 void test_sve_asm(void) {
   asm volatile(
@@ -9,5 +13,16 @@
   :
   :
   : "z0", "z31", "p0", "p15");
+  // CHECK-LABEL: @test_sve_asm
   // CHECK: "~{z0},~{z31},~{p0},~{p15}"
 }
+
+void test_sve2p1_asm(void) {
+  asm("pfalse pn0.b\n"
+  "ptrue pn8.d\n"
+  "ptrue pn15.b\n"
+  "pext p3.b, pn8[1]\n"
+  ::: "pn0", "pn8", "pn15", "p3");
+  // CHECK-LABEL: @test_sve2p1_asm
+  // CHECK: "~{pn0},~{pn8},~{pn15},~{p3}"
+}
Index: clang/lib/Basic/Targets/AArch64.cpp
===
--- clang/lib/Basic/Targets/AArch64.cpp
+++ clang/lib/Basic/Targets/AArch64.cpp
@@ -1164,7 +1164,11 @@
 
 // SVE predicate registers
 "p0",  "p1",  "p2",  "p3",  "p4",  "p5",  "p6",  "p7",  "p8",  "p9",  "p10",
-"p11", "p12", "p13", "p14", "p15"
+"p11", "p12", "p13", "p14", "p15",
+
+// SVE predicate-as-counter registers
+"pn0",  "pn1",  "pn2",  "pn3",  "pn4",  "pn5",  "pn6",  "pn7",  "pn8",
+"pn9",  "pn10", "pn11", "pn12", "pn13", "pn14", "pn15"
 };
 
 ArrayRef AArch64TargetInfo::getGCCRegNames() const {
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156115: [Clang][SVE] Permit specific predicate-as-counter registers in inline assembly

2023-07-25 Thread David Sherwood via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG4cf11d8a65df: [Clang][SVE] Permit specific 
predicate-as-counter registers in inline assembly (authored by david-arm).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156115/new/

https://reviews.llvm.org/D156115

Files:
  clang/lib/Basic/Targets/AArch64.cpp
  clang/test/CodeGen/aarch64-sve-inline-asm.c


Index: clang/test/CodeGen/aarch64-sve-inline-asm.c
===
--- clang/test/CodeGen/aarch64-sve-inline-asm.c
+++ clang/test/CodeGen/aarch64-sve-inline-asm.c
@@ -1,4 +1,8 @@
-// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -emit-llvm -o - %s | FileCheck %s -check-prefix=CHECK
+// REQUIRES: aarch64-registered-target
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +sve2p1 \
+// RUN:   -emit-llvm -o - %s | FileCheck %s -check-prefix=CHECK
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +sve2p1 \
+// RUN:   -S -o /dev/null
 
 void test_sve_asm(void) {
   asm volatile(
@@ -9,5 +13,16 @@
   :
   :
   : "z0", "z31", "p0", "p15");
+  // CHECK-LABEL: @test_sve_asm
   // CHECK: "~{z0},~{z31},~{p0},~{p15}"
 }
+
+void test_sve2p1_asm(void) {
+  asm("pfalse pn0.b\n"
+  "ptrue pn8.d\n"
+  "ptrue pn15.b\n"
+  "pext p3.b, pn8[1]\n"
+  ::: "pn0", "pn8", "pn15", "p3");
+  // CHECK-LABEL: @test_sve2p1_asm
+  // CHECK: "~{pn0},~{pn8},~{pn15},~{p3}"
+}
Index: clang/lib/Basic/Targets/AArch64.cpp
===
--- clang/lib/Basic/Targets/AArch64.cpp
+++ clang/lib/Basic/Targets/AArch64.cpp
@@ -1164,7 +1164,11 @@
 
 // SVE predicate registers
 "p0",  "p1",  "p2",  "p3",  "p4",  "p5",  "p6",  "p7",  "p8",  "p9",  "p10",
-"p11", "p12", "p13", "p14", "p15"
+"p11", "p12", "p13", "p14", "p15",
+
+// SVE predicate-as-counter registers
+"pn0",  "pn1",  "pn2",  "pn3",  "pn4",  "pn5",  "pn6",  "pn7",  "pn8",
+"pn9",  "pn10", "pn11", "pn12", "pn13", "pn14", "pn15"
 };
 
 ArrayRef AArch64TargetInfo::getGCCRegNames() const {
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D159188: [AArch64][SME] Make the overloaded svreinterpret_* functions streaming-compatible.

2023-08-31 Thread David Sherwood via Phabricator via cfe-commits
david-arm added inline comments.



Comment at: clang/include/clang/Basic/Attr.td:418
 def TargetARM : TargetArch<["arm", "thumb", "armeb", "thumbeb"]>;
-def TargetAArch64 : TargetArch<["aarch64"]>;
+def TargetAArch64 : TargetArch<["aarch64", "aarch64_be"]>;
 def TargetAnyArm : TargetArch;

I'm not sure why this change is included in this patch?



Comment at: clang/test/CodeGen/attr-arm-sve-vector-bits-types.c:6
 // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve 
-target-feature +bf16 -mvscale-min=16 -mvscale-max=16 -S -emit-llvm -o - %s | 
FileCheck %s --check-prefix=CHECK-2048
-// RUN: %clang_cc1 -triple aarch64_32-unknown-darwin -target-feature +sve 
-target-feature +bf16 -mvscale-min=4 -mvscale-max=4 -S -emit-llvm -o - %s | 
FileCheck %s --check-prefix=CHECK-ILP32
 

I can see this line looks redundant, but is it supposed to be related to the 
other changes?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159188/new/

https://reviews.llvm.org/D159188

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D138788: [SVE] Change some bfloat lane intrinsics to use i32 immediates

2022-11-28 Thread David Sherwood via Phabricator via cfe-commits
david-arm created this revision.
david-arm added reviewers: sdesmalen, paulwalker-arm, kmclaughlin.
Herald added subscribers: ctetreau, psnobl, hiraditya, tschuett.
Herald added a reviewer: efriedma.
Herald added a project: All.
david-arm requested review of this revision.
Herald added projects: clang, LLVM.
Herald added subscribers: llvm-commits, cfe-commits.

Almost all of the other SVE LLVM IR intrinsics take i32 values
for lane indices or other immediates. We should bring the bfloat
intrinsics in line with that. It will also make it easier to
add support for the SVE2.1 float intrinsics in future, since
they reuse the same underlying instruction classes.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D138788

Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfdot.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmlalb.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmlalt.c
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll
@@ -19,7 +19,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[0]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 0)
+  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i32 0)
   ret  %out
 }
 
@@ -28,7 +28,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[1]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 1)
+  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i32 1)
   ret  %out
 }
 
@@ -37,7 +37,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[2]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 2)
+  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i32 2)
   ret  %out
 }
 
@@ -46,7 +46,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[3]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 3)
+  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i32 3)
   ret  %out
 }
 
@@ -68,7 +68,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[0]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 0)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i32 0)
   ret  %out
 }
 
@@ -77,7 +77,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[1]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 1)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i32 1)
   ret  %out
 }
 
@@ -86,7 +86,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[2]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 2)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i32 2)
   ret  %out
 }
 
@@ -95,7 +95,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[3]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 3)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i32 3)
   ret  %out
 }
 
@@ -104,7 +104,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[4]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 4)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i32 4)
   ret  %out
 }
 
@@ -113,7 +113,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[5]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 5)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i32 5)
   ret  %out
 }
 
@@ -122,7 +122,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[6]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 6)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i32 6)
   ret  %out
 }
 
@@ -131,7 +131,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[7]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 7)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i32 7)
   ret  %out
 }
 
@@ -153,7 +153,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalt z0.s, z1.h, z2.h[0]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalt.lane( %a,  %b,  %c, i64 0)
+  %out = call  @llvm.aarch64.sve.bfmlalt.lane( %a,  %b,  %c, i32 0)
   ret  %out
 }
 
@@ -162,7 +162,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalt z0.s, z1.h, z2.h[1]
 ; CHECK-NEXT:ret
-  %out = cal

[PATCH] D138788: [SVE] Change some bfloat lane intrinsics to use i32 immediates

2022-11-29 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 478494.
david-arm edited the summary of this revision.
david-arm added a comment.

- Changed patch to use autoupgrade mechanism.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138788/new/

https://reviews.llvm.org/D138788

Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfdot.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmlalb.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmlalt.c
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/IR/AutoUpgrade.cpp
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll
@@ -19,7 +19,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[0]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 0)
+  %out = call  @llvm.aarch64.sve.bfdot.lane.i32( %a,  %b,  %c, i32 0)
   ret  %out
 }
 
@@ -28,7 +28,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[1]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 1)
+  %out = call  @llvm.aarch64.sve.bfdot.lane.i32( %a,  %b,  %c, i32 1)
   ret  %out
 }
 
@@ -37,7 +37,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[2]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 2)
+  %out = call  @llvm.aarch64.sve.bfdot.lane.i32( %a,  %b,  %c, i32 2)
   ret  %out
 }
 
@@ -46,7 +46,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[3]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 3)
+  %out = call  @llvm.aarch64.sve.bfdot.lane.i32( %a,  %b,  %c, i32 3)
   ret  %out
 }
 
@@ -68,7 +68,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[0]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 0)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.i32( %a,  %b,  %c, i32 0)
   ret  %out
 }
 
@@ -77,7 +77,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[1]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 1)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.i32( %a,  %b,  %c, i32 1)
   ret  %out
 }
 
@@ -86,7 +86,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[2]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 2)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.i32( %a,  %b,  %c, i32 2)
   ret  %out
 }
 
@@ -95,7 +95,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[3]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 3)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.i32( %a,  %b,  %c, i32 3)
   ret  %out
 }
 
@@ -104,7 +104,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[4]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 4)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.i32( %a,  %b,  %c, i32 4)
   ret  %out
 }
 
@@ -113,7 +113,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[5]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 5)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.i32( %a,  %b,  %c, i32 5)
   ret  %out
 }
 
@@ -122,7 +122,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[6]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 6)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.i32( %a,  %b,  %c, i32 6)
   ret  %out
 }
 
@@ -131,7 +131,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[7]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 7)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.i32( %a,  %b,  %c, i32 7)
   ret  %out
 }
 
@@ -153,7 +153,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalt z0.s, z1.h, z2.h[0]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalt.lane( %a,  %b,  %c, i64 0)
+  %out = call  @llvm.aarch64.sve.bfmlalt.lane.i32( %a,  %b,  %c, i32 0)
   ret  %out
 }
 
@@ -162,7 +162,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalt z0.s, z1.h, z2.h[1]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalt.lane( %a,  %b,  %c, i64 1)
+  %out = call  @llvm.aarch64.sve.bfmlalt.lane.i32( %a,  %b,  %c, i32 1)
   ret  %out
 }
 
@@ -171,7 +171,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalt z0.s, z1.h, z2.h[2]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalt.lane( %a,  %b,  %c, i64 2)
+  %out = call  @llvm.aarch64.sve.bfmlalt.lane.i32( %a,  

[PATCH] D138788: [SVE] Change some bfloat lane intrinsics to use i32 immediates

2022-11-30 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 478868.
david-arm added a comment.

- Actually done the upgrade stuff properly this time. :)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138788/new/

https://reviews.llvm.org/D138788

Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfdot.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmlalb.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmlalt.c
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/IR/AutoUpgrade.cpp
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/Bitcode/upgrade-aarch64-sve-intrinsics.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll
@@ -19,7 +19,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[0]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 0)
+  %out = call  @llvm.aarch64.sve.bfdot.lane.i32( %a,  %b,  %c, i32 0)
   ret  %out
 }
 
@@ -28,7 +28,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[1]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 1)
+  %out = call  @llvm.aarch64.sve.bfdot.lane.i32( %a,  %b,  %c, i32 1)
   ret  %out
 }
 
@@ -37,7 +37,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[2]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 2)
+  %out = call  @llvm.aarch64.sve.bfdot.lane.i32( %a,  %b,  %c, i32 2)
   ret  %out
 }
 
@@ -46,7 +46,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[3]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 3)
+  %out = call  @llvm.aarch64.sve.bfdot.lane.i32( %a,  %b,  %c, i32 3)
   ret  %out
 }
 
@@ -68,7 +68,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[0]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 0)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.i32( %a,  %b,  %c, i32 0)
   ret  %out
 }
 
@@ -77,7 +77,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[1]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 1)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.i32( %a,  %b,  %c, i32 1)
   ret  %out
 }
 
@@ -86,7 +86,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[2]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 2)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.i32( %a,  %b,  %c, i32 2)
   ret  %out
 }
 
@@ -95,7 +95,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[3]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 3)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.i32( %a,  %b,  %c, i32 3)
   ret  %out
 }
 
@@ -104,7 +104,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[4]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 4)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.i32( %a,  %b,  %c, i32 4)
   ret  %out
 }
 
@@ -113,7 +113,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[5]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 5)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.i32( %a,  %b,  %c, i32 5)
   ret  %out
 }
 
@@ -122,7 +122,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[6]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 6)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.i32( %a,  %b,  %c, i32 6)
   ret  %out
 }
 
@@ -131,7 +131,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[7]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 7)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.i32( %a,  %b,  %c, i32 7)
   ret  %out
 }
 
@@ -153,7 +153,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalt z0.s, z1.h, z2.h[0]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalt.lane( %a,  %b,  %c, i64 0)
+  %out = call  @llvm.aarch64.sve.bfmlalt.lane.i32( %a,  %b,  %c, i32 0)
   ret  %out
 }
 
@@ -162,7 +162,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalt z0.s, z1.h, z2.h[1]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalt.lane( %a,  %b,  %c, i64 1)
+  %out = call  @llvm.aarch64.sve.bfmlalt.lane.i32( %a,  %b,  %c, i32 1)
   ret  %out
 }
 
@@ -171,7 +171,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalt z0.s, z1.h, z2.h[2]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalt.lane( %a,  %b,  %c, i64 2)
+  %out = call  @llvm.aarch64.sve.bfmla

[PATCH] D138788: [SVE] Change some bfloat lane intrinsics to use i32 immediates

2022-11-30 Thread David Sherwood via Phabricator via cfe-commits
david-arm marked an inline comment as done.
david-arm added inline comments.



Comment at: llvm/include/llvm/IR/IntrinsicsAArch64.td:2527
 
-def int_aarch64_sve_bfdot_lane   : SVE_4Vec_BF16_Indexed;
-def int_aarch64_sve_bfmlalb_lane : SVE_4Vec_BF16_Indexed;
-def int_aarch64_sve_bfmlalt_lane : SVE_4Vec_BF16_Indexed;
+def int_aarch64_sve_bfdot_lane   : SVE_4Vec_BF16_Indexed;
+def int_aarch64_sve_bfdot_lane_i32   : SVE_4Vec_BF16_Indexed_I32;

sdesmalen wrote:
> do you also want to remove the old intrinsics?
Good spot! Turns out that not only had I done this wrong, but I'd also missed 
out upgrades for bfmlalb/t too. :)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138788/new/

https://reviews.llvm.org/D138788

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D150953: [Clang][SVE2.1] Add clang support for prototypes using svcount_t

2023-05-23 Thread David Sherwood via Phabricator via cfe-commits
david-arm accepted this revision.
david-arm added a comment.
This revision is now accepted and ready to land.

LGTM!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150953/new/

https://reviews.llvm.org/D150953

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D127910: [Clang][AArch64][SME] Add vector load/store (ld1/st1) intrinsics

2023-02-09 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Thanks a lot for making all the changes @bryanpkc - it's looking really good 
now! I just have a few minor comments/suggestions and then I think it looks 
good to go.




Comment at: clang/include/clang/Basic/arm_sme.td:22
+let TargetGuard = "sme" in {
+  def SVLD1_HOR_ZA8 : MInst<"svld1_hor_za8", "vimiPQ", "c", [IsLoad, 
IsOverloadNone, IsStreaming, IsSharedZA], MemEltTyDefault, 
"aarch64_sme_ld1b_horiz", [ImmCheck<0, ImmCheck0_0>, ImmCheck<2, 
ImmCheck0_15>]>;
+  def SVLD1_HOR_ZA16 : MInst<"svld1_hor_za16", "vimiPQ", "s", [IsLoad, 
IsOverloadNone, IsStreaming, IsSharedZA], MemEltTyDefault, 
"aarch64_sme_ld1h_horiz", [ImmCheck<0, ImmCheck0_1>, ImmCheck<2, ImmCheck0_7>]>;

This is just a suggestion, but you could reduce the lines of code here if you 
want by creating a multiclass that creates both the horizontal and vertical 
variants for each size, i.e. something like

  multiclass MyMultiClass<..> {
def NAME # _H : MInst<...>
def NAME # _V : MInst<...>
  }

  defm SVLD1_ZA8 : MyMultiClass<...>;

or whatever naming scheme you prefer, and same for the stores. Feel free to 
ignore this suggestion though if it doesn't help you!



Comment at: clang/lib/Basic/Targets/AArch64.cpp:438
 
+  if (HasSME)
+Builder.defineMacro("__ARM_FEATURE_SME", "1");

bryanpkc wrote:
> david-arm wrote:
> > Can you remove this please? We can't really set this macro until the SME 
> > ABI and ACLE is feature complete.
> OK. Could you educate me what else is needed for SME ABI and ACLE to be 
> feature-complete? How can I help?
It should have complete support for the SME ABI and ACLE in terms of supporting 
the C/C++ level attributes as described here - 
https://arm-software.github.io/acle/main/acle.html#sme-language-extensions-and-intrinsics.
 For example, the compiler should support cases where a normal function calls an 
`arm_streaming` function and sets up the state correctly, etc. You can see 
@sdesmalen's patch to add the clang-side attributes here: D127762. There should 
also be full support for all of the SME ACLE builtins.
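
To make that concrete, here's a hypothetical example (attribute spelling as in 
the ACLE draft):

  // The callee runs in streaming mode, so around this call the compiler
  // must emit the streaming-mode entry/exit (smstart/smstop) plus any
  // save/restore of state that the SME ABI requires.
  __attribute__((arm_streaming)) void streaming_callee(void);

  void normal_caller(void) {
    streaming_callee();
  }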



Comment at: clang/test/Sema/aarch64-sme-intrinsics/acle_sme_imm.cpp:3
+
+// RUN: %clang_cc1 -D__ARM_FEATURE_SME -triple aarch64-none-linux-gnu 
-target-feature +sve -target-feature +sme -fsyntax-only -verify 
-verify-ignore-unexpected=error %s
+// RUN: %clang_cc1 -D__ARM_FEATURE_SME -DSVE_OVERLOADED_FORMS -triple 
aarch64-none-linux-gnu -target-feature +sve -target-feature +sme -fsyntax-only 
-verify -verify-ignore-unexpected=error %s

I think you can remove the `-target-feature +sve` flags from the RUN lines 
because `+sme` should imply that.



Comment at: clang/test/Sema/aarch64-sme-intrinsics/acle_sme_imm.cpp:16
+__attribute__((arm_streaming))
+void test_range_0_0(svbool_t pg, void *ptr) {
+  // expected-error-re@+1 {{argument value 0 is outside the valid range [0, 
0]}}

These tests are great! Thanks for doing this. Could you also add the `_vnum` 
variants?
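
Something along these lines, perhaps (hypothetical - I've assumed the _vnum 
builtins take the same arguments as the non-vnum forms plus a trailing vnum 
operand):

  __attribute__((arm_streaming))
  void test_range_0_0_vnum(svbool_t pg, void *ptr, int64_t vnum) {
    // expected-error-re@+1 {{argument value 1 is outside the valid range [0, 0]}}
    svld1_hor_vnum_za8(1, 0, 0, pg, ptr, vnum);
  }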


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127910/new/

https://reviews.llvm.org/D127910

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D151461: [Clang][SVE2.1] Add builtins and intrinsics for SVBFMLSLB/T

2023-06-05 Thread David Sherwood via Phabricator via cfe-commits
david-arm accepted this revision.
david-arm added a comment.
This revision is now accepted and ready to land.

LGTM! I left a nit, which you could just address before landing the patch.




Comment at: clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_bfmlsl.c:48
+//
+svfloat32_t test_bfmlslb_idx(svfloat32_t zda, svbfloat16_t zn, svbfloat16_t zm)
+{

nit: Just a minor issue, but perhaps worth calling this `test_bfmlslb_lane` to 
be consistent with the builtin?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151461/new/

https://reviews.llvm.org/D151461

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D138788: [SVE] Change some bfloat lane intrinsics to use i32 immediates

2022-12-06 Thread David Sherwood via Phabricator via cfe-commits
david-arm marked an inline comment as done.
david-arm added inline comments.



Comment at: llvm/include/llvm/IR/IntrinsicsAArch64.td:2527
 
-def int_aarch64_sve_bfdot_lane   : SVE_4Vec_BF16_Indexed;
-def int_aarch64_sve_bfmlalb_lane : SVE_4Vec_BF16_Indexed;
-def int_aarch64_sve_bfmlalt_lane : SVE_4Vec_BF16_Indexed;
+def int_aarch64_sve_bfdot_lane   : SVE_4Vec_BF16_Indexed;
+def int_aarch64_sve_bfdot_lane_i32   : SVE_4Vec_BF16_Indexed_I32;

paulwalker-arm wrote:
> david-arm wrote:
> > sdesmalen wrote:
> > > do you also want to remove the old intrinsics?
> > Good spot! Turns out that not only had I done this wrong, but I'd also 
> > missed out upgrades for bfmlalb/t too. :)
> Having `_i32` in the name is confusing because it'll come out as `.i32` when 
> printed in IR, which looks like a type suffix but in this case it's actually 
> part of the name.
> 
> With that said, do you have to change the name?  That seems unfortunate given 
> this is a bug fix.
> 
> If it's absolutely necessary then I suggestion using `_v2` to signify this is 
> the second version of this intrinsic.
Sadly @paulwalker-arm the auto-upgrade mechanism forbids you from upgrading to 
an intrinsic of the same name. I don't really know why we have that restriction 
though, since in theory I could decide to upgrade only when the index is an i64 
type. I'll use _v2 then!
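
For anyone following along, the remap in AutoUpgrade.cpp then ends up with 
roughly this shape (a sketch rather than the exact code in the patch):

  // Map the old i64-immediate intrinsic onto the new .v2 variant; the
  // call-site rewrite later narrows the immediate operand to i32.
  if (Name.startswith("aarch64.sve.bfdot.lane") &&
      F->getFunctionType()->getParamType(3)->isIntegerTy(64)) {
    NewFn = Intrinsic::getDeclaration(F->getParent(),
                                      Intrinsic::aarch64_sve_bfdot_lane_v2);
    return true;
  }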


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138788/new/

https://reviews.llvm.org/D138788

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D138788: [SVE] Change some bfloat lane intrinsics to use i32 immediates

2022-12-06 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 480451.
david-arm added a comment.

I did try to avoid having to create an intrinsic with a different name, but I 
was thwarted at every turn! It is with great regret that I report these two 
problems:

1. When you write an IR test file containing a declaration of the same 
intrinsic with a different type and try to use the upgrade mechanism, the 
compiler crashes in `llvm::Intrinsic::getDeclaration`. This is presumably 
because we've declared an intrinsic in the module with a different signature to 
that in IntrinsicsAArch64.td and it seems deeply unhappy about it.
2. Secondly, in `llvm::UpgradeIntrinsicFunction` we would also hit the assert 
`assert(F != NewFn && "Intrinsic function upgraded to the same function");`

Let us bemoan the evil that befalls us @paulwalker-arm!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138788/new/

https://reviews.llvm.org/D138788

Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfdot.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmlalb.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmlalt.c
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/IR/AutoUpgrade.cpp
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/Bitcode/upgrade-aarch64-sve-intrinsics.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll
@@ -19,7 +19,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[0]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 0)
+  %out = call  @llvm.aarch64.sve.bfdot.lane.v2( %a,  %b,  %c, i32 0)
   ret  %out
 }
 
@@ -28,7 +28,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[1]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 1)
+  %out = call  @llvm.aarch64.sve.bfdot.lane.v2( %a,  %b,  %c, i32 1)
   ret  %out
 }
 
@@ -37,7 +37,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[2]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 2)
+  %out = call  @llvm.aarch64.sve.bfdot.lane.v2( %a,  %b,  %c, i32 2)
   ret  %out
 }
 
@@ -46,7 +46,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[3]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 3)
+  %out = call  @llvm.aarch64.sve.bfdot.lane.v2( %a,  %b,  %c, i32 3)
   ret  %out
 }
 
@@ -68,7 +68,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[0]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 0)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.v2( %a,  %b,  %c, i32 0)
   ret  %out
 }
 
@@ -77,7 +77,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[1]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 1)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.v2( %a,  %b,  %c, i32 1)
   ret  %out
 }
 
@@ -86,7 +86,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[2]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 2)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.v2( %a,  %b,  %c, i32 2)
   ret  %out
 }
 
@@ -95,7 +95,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[3]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 3)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.v2( %a,  %b,  %c, i32 3)
   ret  %out
 }
 
@@ -104,7 +104,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[4]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 4)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.v2( %a,  %b,  %c, i32 4)
   ret  %out
 }
 
@@ -113,7 +113,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[5]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 5)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.v2( %a,  %b,  %c, i32 5)
   ret  %out
 }
 
@@ -122,7 +122,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[6]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 6)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.v2( %a,  %b,  %c, i32 6)
   ret  %out
 }
 
@@ -131,7 +131,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[7]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 7)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.v2( %a,  %b,  %c, i32 7)
   ret  %out
 }
 
@@ -153,7 +153,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalt z0.s, z1.h, z2.h[0]

[PATCH] D138788: [SVE] Change some bfloat lane intrinsics to use i32 immediates

2022-12-07 Thread David Sherwood via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGbfb6f47e9ea4: [SVE] Change some bfloat lane intrinsics to 
use i32 immediates (authored by david-arm).

Changed prior to commit:
  https://reviews.llvm.org/D138788?vs=480451&id=480808#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138788/new/

https://reviews.llvm.org/D138788

Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfdot.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmlalb.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmlalt.c
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/IR/AutoUpgrade.cpp
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/Bitcode/upgrade-aarch64-sve-intrinsics.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll
@@ -19,7 +19,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[0]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 0)
+  %out = call  @llvm.aarch64.sve.bfdot.lane.v2( %a,  %b,  %c, i32 0)
   ret  %out
 }
 
@@ -28,7 +28,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[1]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 1)
+  %out = call  @llvm.aarch64.sve.bfdot.lane.v2( %a,  %b,  %c, i32 1)
   ret  %out
 }
 
@@ -37,7 +37,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[2]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 2)
+  %out = call  @llvm.aarch64.sve.bfdot.lane.v2( %a,  %b,  %c, i32 2)
   ret  %out
 }
 
@@ -46,7 +46,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfdot z0.s, z1.h, z2.h[3]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfdot.lane( %a,  %b,  %c, i64 3)
+  %out = call  @llvm.aarch64.sve.bfdot.lane.v2( %a,  %b,  %c, i32 3)
   ret  %out
 }
 
@@ -68,7 +68,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[0]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 0)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.v2( %a,  %b,  %c, i32 0)
   ret  %out
 }
 
@@ -77,7 +77,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[1]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 1)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.v2( %a,  %b,  %c, i32 1)
   ret  %out
 }
 
@@ -86,7 +86,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[2]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 2)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.v2( %a,  %b,  %c, i32 2)
   ret  %out
 }
 
@@ -95,7 +95,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[3]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 3)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.v2( %a,  %b,  %c, i32 3)
   ret  %out
 }
 
@@ -104,7 +104,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[4]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 4)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.v2( %a,  %b,  %c, i32 4)
   ret  %out
 }
 
@@ -113,7 +113,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[5]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 5)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.v2( %a,  %b,  %c, i32 5)
   ret  %out
 }
 
@@ -122,7 +122,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[6]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 6)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.v2( %a,  %b,  %c, i32 6)
   ret  %out
 }
 
@@ -131,7 +131,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalb z0.s, z1.h, z2.h[7]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalb.lane( %a,  %b,  %c, i64 7)
+  %out = call  @llvm.aarch64.sve.bfmlalb.lane.v2( %a,  %b,  %c, i32 7)
   ret  %out
 }
 
@@ -153,7 +153,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalt z0.s, z1.h, z2.h[0]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalt.lane( %a,  %b,  %c, i64 0)
+  %out = call  @llvm.aarch64.sve.bfmlalt.lane.v2( %a,  %b,  %c, i32 0)
   ret  %out
 }
 
@@ -162,7 +162,7 @@
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:bfmlalt z0.s, z1.h, z2.h[1]
 ; CHECK-NEXT:ret
-  %out = call  @llvm.aarch64.sve.bfmlalt.lane( %a,  %b,  %c, i64 1)
+  %out = call  @llvm.aarch64.sve.bfmlalt.lane.v2( %a,  %b,  %c, i32 1)
   ret  %out

[PATCH] D89031: [SVE] Add support to vectorize_width loop pragma for scalable vectors

2021-01-06 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Hi everyone, I realise that most people have probably been on holiday recently, 
but just a gentle ping here to see if anyone could take another look? Thanks!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89031/new/

https://reviews.llvm.org/D89031

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D89031: [SVE] Add support to vectorize_width loop pragma for scalable vectors

2021-01-07 Thread David Sherwood via Phabricator via cfe-commits
david-arm added inline comments.



Comment at: clang/lib/CodeGen/CGLoopInfo.cpp:751-753
+  // they effectively want vectorization disabled. We leave the
+  // scalable flag unspecified in this case to avoid setting the
+  // vectorize.enable flag later on.

sdesmalen wrote:
> is that not something to fix in the code that conditionally sets 
> vectorize.enable later on instead of working around it here?
I did originally try to do that, but I had trouble with it and found it broke
other places too. It ended up being simpler to fix it here, but I can play
around with it again. Even if this is still the simplest solution, I can at
least come back with a more detailed explanation!
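
For context, the user-facing syntax this patch adds looks like the following
(a minimal sketch based on the pragma forms named in the diagnostic text below;
the function and loop body are illustrative assumptions, not code from the patch):

  void f(float *a, float *b, int n) {
    // Request scalable vectorisation with a base width of 4 elements
    // (i.e. <vscale x 4 x float> on an SVE target).
    #pragma clang loop vectorize_width(4, scalable)
    for (int i = 0; i < n; i++)
      a[i] += b[i];
  }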


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89031/new/

https://reviews.llvm.org/D89031

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D89031: [SVE] Add support to vectorize_width loop pragma for scalable vectors

2021-01-07 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 315121.
david-arm added a comment.

- Updated documentation as per review comments.
- Fixed an issue with using value->prettyPrint on a null ptr.
- Reworked the code that sets vectorize.enable.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89031/new/

https://reviews.llvm.org/D89031

Files:
  clang/docs/LanguageExtensions.rst
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/DiagnosticParseKinds.td
  clang/lib/AST/AttrImpl.cpp
  clang/lib/CodeGen/CGLoopInfo.cpp
  clang/lib/CodeGen/CGLoopInfo.h
  clang/lib/Parse/ParsePragma.cpp
  clang/lib/Sema/SemaStmtAttr.cpp
  clang/test/AST/ast-print-pragmas.cpp
  clang/test/CodeGenCXX/pragma-loop-pr27643.cpp
  clang/test/CodeGenCXX/pragma-loop.cpp
  clang/test/Parser/pragma-loop.cpp

Index: clang/test/Parser/pragma-loop.cpp
===
--- clang/test/Parser/pragma-loop.cpp
+++ clang/test/Parser/pragma-loop.cpp
@@ -60,7 +60,8 @@
 
 template <int V, int I>
 void test_nontype_template_badarg(int *List, int Length) {
-  /* expected-error {{use of undeclared identifier 'Vec'}} */ #pragma clang loop vectorize_width(Vec) interleave_count(I)
+  /* expected-error {{use of undeclared identifier 'Vec'}} */ #pragma clang loop vectorize_width(Vec) interleave_count(I) /*
+ expected-note {{vectorize_width loop hint malformed; use vectorize_width(X, fixed) or vectorize_width(X, scalable) where X is an integer, or vectorize_width('fixed' or 'scalable')}} */
   /* expected-error {{use of undeclared identifier 'Int'}} */ #pragma clang loop vectorize_width(V) interleave_count(Int)
   for (int i = 0; i < Length; i++) {
 List[i] = i;
@@ -189,12 +190,15 @@
 /* expected-warning {{extra tokens at end of '#pragma clang loop'}} */ #pragma clang loop vectorize_width(1 +) 1
 /* expected-warning {{extra tokens at end of '#pragma clang loop'}} */ #pragma clang loop vectorize_width(1) +1
 const int VV = 4;
-/* expected-error {{expected expression}} */ #pragma clang loop vectorize_width(VV +/ 2)
-/* expected-error {{use of undeclared identifier 'undefined'}} */ #pragma clang loop vectorize_width(VV+undefined)
+/* expected-error {{expected expression}} */ #pragma clang loop vectorize_width(VV +/ 2) /*
+   expected-note {{vectorize_width loop hint malformed; use vectorize_width(X, fixed) or vectorize_width(X, scalable) where X is an integer, or vectorize_width('fixed' or 'scalable')}} */
+/* expected-error {{use of undeclared identifier 'undefined'}} */ #pragma clang loop vectorize_width(VV+undefined) /*
+   expected-note {{vectorize_width loop hint malformed; use vectorize_width(X, fixed) or vectorize_width(X, scalable) where X is an integer, or vectorize_width('fixed' or 'scalable')}} */
 /* expected-error {{expected ')'}} */ #pragma clang loop vectorize_width(1+(^*/2 * ()
 /* expected-warning {{extra tokens at end of '#pragma clang loop' - ignored}} */ #pragma clang loop vectorize_width(1+(-0[0]))
 
-/* expected-error {{use of undeclared identifier 'badvalue'}} */ #pragma clang loop vectorize_width(badvalue)
+/* expected-error {{use of undeclared identifier 'badvalue'}} */ #pragma clang loop vectorize_width(badvalue) /*
+   expected-note {{vectorize_width loop hint malformed; use vectorize_width(X, fixed) or vectorize_width(X, scalable) where X is an integer, or vectorize_width('fixed' or 'scalable')}} */
 /* expected-error {{use of undeclared identifier 'badvalue'}} */ #pragma clang loop interleave_count(badvalue)
 /* expected-error {{use of undeclared identifier 'badvalue'}} */ #pragma clang loop unroll_count(badvalue)
   while (i-6 < Length) {
@@ -215,7 +219,7 @@
 /* expected-error {{invalid argument; expected 'enable', 'assume_safety' or 'disable'}} */ #pragma clang loop interleave(*)
 /* expected-error {{invalid argument; expected 'enable', 'full' or 'disable'}} */ #pragma clang loop unroll(=)
 /* expected-error {{invalid argument; expected 'enable' or 'disable'}} */ #pragma clang loop distribute(+)
-/* expected-error {{type name requires a specifier or qualifier}} expected-error {{expected expression}} */ #pragma clang loop vectorize_width(^)
+/* expected-error {{type name requires a specifier or qualifier}} expected-error {{expected expression}} */ #pragma clang loop vectorize_width(^) /* expected-note {{vectorize_width loop hint malformed; use vectorize_width(X, fixed) or vectorize_width(X, scalable) where X is an integer, or vectorize_width('fixed' or 'scalable')}} */
 /* expected-error {{expected expression}} expected-error {{expected expression}} */ #pragma clang loop interleave_count(/)
 /* expected-error {{expected expression}} expected-error {{expected expression}} */ #pragma clang loop unroll_count(==)
   while (i-8 < Length) {
Index: clang/test/CodeGenCXX/pragma-loop.cpp
===
--- clang/test/CodeGenCXX/pragma-loop.cpp
+++ clang/test/CodeGenCXX/pragma-loop.cpp
@@ -158,51 +158,97 @@
   for_template_constant_ex

[PATCH] D89031: [SVE] Add support to vectorize_width loop pragma for scalable vectors

2021-01-08 Thread David Sherwood via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG38d18d93534d: [SVE] Add support to vectorize_width loop 
pragma for scalable vectors (authored by david-arm).

Changed prior to commit:
  https://reviews.llvm.org/D89031?vs=315121&id=315341#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89031/new/

https://reviews.llvm.org/D89031

Files:
  clang/docs/LanguageExtensions.rst
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/DiagnosticParseKinds.td
  clang/lib/AST/AttrImpl.cpp
  clang/lib/CodeGen/CGLoopInfo.cpp
  clang/lib/CodeGen/CGLoopInfo.h
  clang/lib/Parse/ParsePragma.cpp
  clang/lib/Sema/SemaStmtAttr.cpp
  clang/test/AST/ast-print-pragmas.cpp
  clang/test/CodeGenCXX/pragma-loop-pr27643.cpp
  clang/test/CodeGenCXX/pragma-loop.cpp
  clang/test/Parser/pragma-loop.cpp

Index: clang/test/Parser/pragma-loop.cpp
===
--- clang/test/Parser/pragma-loop.cpp
+++ clang/test/Parser/pragma-loop.cpp
@@ -60,7 +60,8 @@
 
 template <int V, int I>
 void test_nontype_template_badarg(int *List, int Length) {
-  /* expected-error {{use of undeclared identifier 'Vec'}} */ #pragma clang loop vectorize_width(Vec) interleave_count(I)
+  /* expected-error {{use of undeclared identifier 'Vec'}} */ #pragma clang loop vectorize_width(Vec) interleave_count(I) /*
+ expected-note {{vectorize_width loop hint malformed; use vectorize_width(X, fixed) or vectorize_width(X, scalable) where X is an integer, or vectorize_width('fixed' or 'scalable')}} */
   /* expected-error {{use of undeclared identifier 'Int'}} */ #pragma clang loop vectorize_width(V) interleave_count(Int)
   for (int i = 0; i < Length; i++) {
 List[i] = i;
@@ -74,6 +75,11 @@
   for (int i = 0; i < Length; i++) {
 List[i] = i;
   }
+
+  /* expected-error {{invalid value '-1'; must be positive}} */ #pragma clang loop vectorize_width(Value, fixed)
+  for (int i = 0; i < Length; i++) {
+List[i] = i;
+  }
 }
 
 void test(int *List, int Length) {
@@ -189,12 +195,15 @@
 /* expected-warning {{extra tokens at end of '#pragma clang loop'}} */ #pragma clang loop vectorize_width(1 +) 1
 /* expected-warning {{extra tokens at end of '#pragma clang loop'}} */ #pragma clang loop vectorize_width(1) +1
 const int VV = 4;
-/* expected-error {{expected expression}} */ #pragma clang loop vectorize_width(VV +/ 2)
-/* expected-error {{use of undeclared identifier 'undefined'}} */ #pragma clang loop vectorize_width(VV+undefined)
+/* expected-error {{expected expression}} */ #pragma clang loop vectorize_width(VV +/ 2) /*
+   expected-note {{vectorize_width loop hint malformed; use vectorize_width(X, fixed) or vectorize_width(X, scalable) where X is an integer, or vectorize_width('fixed' or 'scalable')}} */
+/* expected-error {{use of undeclared identifier 'undefined'}} */ #pragma clang loop vectorize_width(VV+undefined) /*
+   expected-note {{vectorize_width loop hint malformed; use vectorize_width(X, fixed) or vectorize_width(X, scalable) where X is an integer, or vectorize_width('fixed' or 'scalable')}} */
 /* expected-error {{expected ')'}} */ #pragma clang loop vectorize_width(1+(^*/2 * ()
 /* expected-warning {{extra tokens at end of '#pragma clang loop' - ignored}} */ #pragma clang loop vectorize_width(1+(-0[0]))
 
-/* expected-error {{use of undeclared identifier 'badvalue'}} */ #pragma clang loop vectorize_width(badvalue)
+/* expected-error {{use of undeclared identifier 'badvalue'}} */ #pragma clang loop vectorize_width(badvalue) /*
+   expected-note {{vectorize_width loop hint malformed; use vectorize_width(X, fixed) or vectorize_width(X, scalable) where X is an integer, or vectorize_width('fixed' or 'scalable')}} */
 /* expected-error {{use of undeclared identifier 'badvalue'}} */ #pragma clang loop interleave_count(badvalue)
 /* expected-error {{use of undeclared identifier 'badvalue'}} */ #pragma clang loop unroll_count(badvalue)
   while (i-6 < Length) {
@@ -215,7 +224,7 @@
 /* expected-error {{invalid argument; expected 'enable', 'assume_safety' or 'disable'}} */ #pragma clang loop interleave(*)
 /* expected-error {{invalid argument; expected 'enable', 'full' or 'disable'}} */ #pragma clang loop unroll(=)
 /* expected-error {{invalid argument; expected 'enable' or 'disable'}} */ #pragma clang loop distribute(+)
-/* expected-error {{type name requires a specifier or qualifier}} expected-error {{expected expression}} */ #pragma clang loop vectorize_width(^)
+/* expected-error {{type name requires a specifier or qualifier}} expected-error {{expected expression}} */ #pragma clang loop vectorize_width(^) /* expected-note {{vectorize_width loop hint malformed; use vectorize_width(X, fixed) or vectorize_width(X, scalable) where X is an integer, or vectorize_width('fixed' or 'scalable')}} */
 /* expected-error {{expected expression}} expected-error {{expected expression}} */ #pragma clang loop interleave_count(/)
 /* 

[PATCH] D110258: [AArch64] Always add -tune-cpu argument to -cc1 driver

2021-10-08 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 378180.
david-arm marked an inline comment as done.
david-arm added a comment.

- Fixed formatting issues.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110258/new/

https://reviews.llvm.org/D110258

Files:
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/Driver/aarch64-mtune.c
  llvm/lib/Target/AArch64/AArch64Subtarget.cpp
  llvm/lib/Target/AArch64/AArch64Subtarget.h
  llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
  llvm/unittests/Target/AArch64/InstSizes.cpp
  llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp

Index: llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp
===
--- llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp
+++ llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp
@@ -26,6 +26,7 @@
 
 std::unique_ptr<AArch64InstrInfo> createInstrInfo(TargetMachine *TM) {
   AArch64Subtarget ST(TM->getTargetTriple(), std::string(TM->getTargetCPU()),
+  std::string(TM->getTargetCPU()),
   std::string(TM->getTargetFeatureString()), *TM,
   /* isLittle */ false);
   return std::make_unique<AArch64InstrInfo>(ST);
Index: llvm/unittests/Target/AArch64/InstSizes.cpp
===
--- llvm/unittests/Target/AArch64/InstSizes.cpp
+++ llvm/unittests/Target/AArch64/InstSizes.cpp
@@ -29,6 +29,7 @@
 
 std::unique_ptr<AArch64InstrInfo> createInstrInfo(TargetMachine *TM) {
   AArch64Subtarget ST(TM->getTargetTriple(), std::string(TM->getTargetCPU()),
+  std::string(TM->getTargetCPU()),
   std::string(TM->getTargetFeatureString()), *TM,
   /* isLittle */ false);
   return std::make_unique<AArch64InstrInfo>(ST);
Index: llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
===
--- llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+++ llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
@@ -355,10 +355,13 @@
 const AArch64Subtarget *
 AArch64TargetMachine::getSubtargetImpl(const Function &F) const {
   Attribute CPUAttr = F.getFnAttribute("target-cpu");
+  Attribute TuneAttr = F.getFnAttribute("tune-cpu");
   Attribute FSAttr = F.getFnAttribute("target-features");
 
   std::string CPU =
   CPUAttr.isValid() ? CPUAttr.getValueAsString().str() : TargetCPU;
+  std::string TuneCPU =
+  TuneAttr.isValid() ? TuneAttr.getValueAsString().str() : CPU;
   std::string FS =
   FSAttr.isValid() ? FSAttr.getValueAsString().str() : TargetFS;
 
@@ -399,6 +402,7 @@
   Key += "SVEMax";
   Key += std::to_string(MaxSVEVectorSize);
   Key += CPU;
+  Key += TuneCPU;
   Key += FS;
 
   auto &I = SubtargetMap[Key];
@@ -407,8 +411,8 @@
 // creation will depend on the TM and the code generation flags on the
 // function that reside in TargetOptions.
 resetTargetOptions(F);
-I = std::make_unique<AArch64Subtarget>(TargetTriple, CPU, FS, *this,
-   isLittle, MinSVEVectorSize,
+I = std::make_unique<AArch64Subtarget>(TargetTriple, CPU, TuneCPU, FS,
+   *this, isLittle, MinSVEVectorSize,
MaxSVEVectorSize);
   }
   return I.get();
Index: llvm/lib/Target/AArch64/AArch64Subtarget.h
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -293,7 +293,8 @@
   /// passed in feature string so that we can use initializer lists for
   /// subtarget initialization.
   AArch64Subtarget &initializeSubtargetDependencies(StringRef FS,
-StringRef CPUString);
+StringRef CPUString,
+StringRef TuneCPUString);
 
   /// Initialize properties based on the selected processor family.
   void initializeProperties();
@@ -302,8 +303,8 @@
   /// This constructor initializes the data members to match that
   /// of the specified triple.
   AArch64Subtarget(const Triple &TT, const std::string &CPU,
-   const std::string &FS, const TargetMachine &TM,
-   bool LittleEndian,
+   const std::string &TuneCPU, const std::string &FS,
+   const TargetMachine &TM, bool LittleEndian,
unsigned MinSVEVectorSizeInBitsOverride = 0,
unsigned MaxSVEVectorSizeInBitsOverride = 0);
 
Index: llvm/lib/Target/AArch64/AArch64Subtarget.cpp
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -50,15 +50,17 @@
 static cl::opt<bool> UseAA("aarch64-use-aa", cl::init(true),
cl::desc("Enable the use of AA during codegen."));
 
-AArch64Subtarget &
-AArch64Subtarget::initializeSubtargetDependencies(StringRef 

[PATCH] D110258: [AArch64] Always add -tune-cpu argument to -cc1 driver

2021-10-08 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Gentle ping!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110258/new/

https://reviews.llvm.org/D110258

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110258: [AArch64] Always add -tune-cpu argument to -cc1 driver

2021-10-12 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

In D110258#3055488, @dmgreen wrote:

> If D111551 was folded into this patch, would it be possible to add tests
> for -tune-cpu enabling/disabling features at the correct times?

Similar to the comment I left on D110259, I don't want D111551 to hold up the
cost model changes, which are critical. I'd prefer them to be independent. I
expect D111551 to take longer to get approval, and even once it is approved
and merged, if for any reason D111551 causes issues we only have to revert
that one change to AArch64.td.




Comment at: clang/test/Driver/aarch64-mtune.c:3
+
+// There shouldn't be a default -mtune.
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s 2>&1 \

dmgreen wrote:
> Why do we not want to add a default tune-cpu?
This was in response to an earlier review comment by @paulwalker-arm asking 
what benefit "-mtune=generic" provided, and about restricting the patch to only 
add `tune-cpu` when the user has explicitly specified one.



Comment at: clang/test/Driver/aarch64-mtune.c:34
+
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s -mcpu=thunderx 2>&1 \
+// RUN:   | FileCheck %s -check-prefix=mcputhunderx

dmgreen wrote:
> My understanding is that -mcpu=cpu is the same as -march=something + 
> -mtune=cpu. Why would this case not add a -tune-cpu too? Is it because that 
> gets handled in llvm?
I thought this was pretty standard behaviour? We're already adding -target-cpu, 
which implies the arch + tuning, so isn't adding -tune-cpu redundant? I'm not 
sure what value "-target-cpu=thunderx -tune-cpu=thunderx" really adds.
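
For illustration, the behaviour under discussion, sketched in the style of the
aarch64-mtune.c test in this patch (the CHECK lines here are my reconstruction,
not copied from the test):

  // RUN: %clang -target aarch64-unknown-unknown -c -### %s -mtune=cortex-a57 2>&1 \
  // RUN:   | FileCheck %s -check-prefix=mtunea57
  // mtunea57: "-tune-cpu" "cortex-a57"

  // With -mcpu alone we only expect -target-cpu, which already implies the tuning:
  // RUN: %clang -target aarch64-unknown-unknown -c -### %s -mcpu=thunderx 2>&1 \
  // RUN:   | FileCheck %s -check-prefix=mcputhunderx
  // mcputhunderx: "-target-cpu" "thunderx"
  // mcputhunderx-NOT: "-tune-cpu"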


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110258/new/

https://reviews.llvm.org/D110258

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110258: [AArch64] Always add -tune-cpu argument to -cc1 driver

2021-10-18 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 380325.
david-arm added a comment.

- Added something to the ReleaseNotes file.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110258/new/

https://reviews.llvm.org/D110258

Files:
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/Driver/aarch64-mtune.c
  llvm/docs/ReleaseNotes.rst
  llvm/lib/Target/AArch64/AArch64Subtarget.cpp
  llvm/lib/Target/AArch64/AArch64Subtarget.h
  llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
  llvm/unittests/Target/AArch64/InstSizes.cpp
  llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp

Index: llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp
===
--- llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp
+++ llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp
@@ -26,6 +26,7 @@
 
 std::unique_ptr<AArch64InstrInfo> createInstrInfo(TargetMachine *TM) {
   AArch64Subtarget ST(TM->getTargetTriple(), std::string(TM->getTargetCPU()),
+  std::string(TM->getTargetCPU()),
   std::string(TM->getTargetFeatureString()), *TM,
   /* isLittle */ false);
   return std::make_unique<AArch64InstrInfo>(ST);
Index: llvm/unittests/Target/AArch64/InstSizes.cpp
===
--- llvm/unittests/Target/AArch64/InstSizes.cpp
+++ llvm/unittests/Target/AArch64/InstSizes.cpp
@@ -29,6 +29,7 @@
 
 std::unique_ptr<AArch64InstrInfo> createInstrInfo(TargetMachine *TM) {
   AArch64Subtarget ST(TM->getTargetTriple(), std::string(TM->getTargetCPU()),
+  std::string(TM->getTargetCPU()),
   std::string(TM->getTargetFeatureString()), *TM,
   /* isLittle */ false);
   return std::make_unique<AArch64InstrInfo>(ST);
Index: llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
===
--- llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+++ llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
@@ -355,10 +355,13 @@
 const AArch64Subtarget *
 AArch64TargetMachine::getSubtargetImpl(const Function &F) const {
   Attribute CPUAttr = F.getFnAttribute("target-cpu");
+  Attribute TuneAttr = F.getFnAttribute("tune-cpu");
   Attribute FSAttr = F.getFnAttribute("target-features");
 
   std::string CPU =
   CPUAttr.isValid() ? CPUAttr.getValueAsString().str() : TargetCPU;
+  std::string TuneCPU =
+  TuneAttr.isValid() ? TuneAttr.getValueAsString().str() : CPU;
   std::string FS =
   FSAttr.isValid() ? FSAttr.getValueAsString().str() : TargetFS;
 
@@ -399,6 +402,7 @@
   Key += "SVEMax";
   Key += std::to_string(MaxSVEVectorSize);
   Key += CPU;
+  Key += TuneCPU;
   Key += FS;
 
   auto &I = SubtargetMap[Key];
@@ -407,8 +411,8 @@
 // creation will depend on the TM and the code generation flags on the
 // function that reside in TargetOptions.
 resetTargetOptions(F);
-I = std::make_unique<AArch64Subtarget>(TargetTriple, CPU, FS, *this,
-   isLittle, MinSVEVectorSize,
+I = std::make_unique<AArch64Subtarget>(TargetTriple, CPU, TuneCPU, FS,
+   *this, isLittle, MinSVEVectorSize,
MaxSVEVectorSize);
   }
   return I.get();
Index: llvm/lib/Target/AArch64/AArch64Subtarget.h
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -297,7 +297,8 @@
   /// passed in feature string so that we can use initializer lists for
   /// subtarget initialization.
   AArch64Subtarget &initializeSubtargetDependencies(StringRef FS,
-StringRef CPUString);
+StringRef CPUString,
+StringRef TuneCPUString);
 
   /// Initialize properties based on the selected processor family.
   void initializeProperties();
@@ -306,8 +307,8 @@
   /// This constructor initializes the data members to match that
   /// of the specified triple.
   AArch64Subtarget(const Triple &TT, const std::string &CPU,
-   const std::string &FS, const TargetMachine &TM,
-   bool LittleEndian,
+   const std::string &TuneCPU, const std::string &FS,
+   const TargetMachine &TM, bool LittleEndian,
unsigned MinSVEVectorSizeInBitsOverride = 0,
unsigned MaxSVEVectorSizeInBitsOverride = 0);
 
Index: llvm/lib/Target/AArch64/AArch64Subtarget.cpp
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -50,15 +50,17 @@
 static cl::opt<bool> UseAA("aarch64-use-aa", cl::init(true),
cl::desc("Enable the use of AA during codegen."));
 
-AArch64Subtarget &
-AArch64Subtarget::initializeSubtargetDependencies(StringRe

[PATCH] D110258: [AArch64] Always add -tune-cpu argument to -cc1 driver

2021-10-19 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 380649.
david-arm added a comment.

- Added release notes for both clang and llvm.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110258/new/

https://reviews.llvm.org/D110258

Files:
  clang/docs/ReleaseNotes.rst
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/Driver/aarch64-mtune.c
  llvm/docs/ReleaseNotes.rst
  llvm/lib/Target/AArch64/AArch64Subtarget.cpp
  llvm/lib/Target/AArch64/AArch64Subtarget.h
  llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
  llvm/unittests/Target/AArch64/InstSizes.cpp
  llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp

Index: llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp
===
--- llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp
+++ llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp
@@ -26,6 +26,7 @@
 
 std::unique_ptr<AArch64InstrInfo> createInstrInfo(TargetMachine *TM) {
   AArch64Subtarget ST(TM->getTargetTriple(), std::string(TM->getTargetCPU()),
+  std::string(TM->getTargetCPU()),
   std::string(TM->getTargetFeatureString()), *TM,
   /* isLittle */ false);
   return std::make_unique<AArch64InstrInfo>(ST);
Index: llvm/unittests/Target/AArch64/InstSizes.cpp
===
--- llvm/unittests/Target/AArch64/InstSizes.cpp
+++ llvm/unittests/Target/AArch64/InstSizes.cpp
@@ -29,6 +29,7 @@
 
 std::unique_ptr<AArch64InstrInfo> createInstrInfo(TargetMachine *TM) {
   AArch64Subtarget ST(TM->getTargetTriple(), std::string(TM->getTargetCPU()),
+  std::string(TM->getTargetCPU()),
   std::string(TM->getTargetFeatureString()), *TM,
   /* isLittle */ false);
   return std::make_unique<AArch64InstrInfo>(ST);
Index: llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
===
--- llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+++ llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
@@ -355,10 +355,13 @@
 const AArch64Subtarget *
 AArch64TargetMachine::getSubtargetImpl(const Function &F) const {
   Attribute CPUAttr = F.getFnAttribute("target-cpu");
+  Attribute TuneAttr = F.getFnAttribute("tune-cpu");
   Attribute FSAttr = F.getFnAttribute("target-features");
 
   std::string CPU =
   CPUAttr.isValid() ? CPUAttr.getValueAsString().str() : TargetCPU;
+  std::string TuneCPU =
+  TuneAttr.isValid() ? TuneAttr.getValueAsString().str() : CPU;
   std::string FS =
   FSAttr.isValid() ? FSAttr.getValueAsString().str() : TargetFS;
 
@@ -399,6 +402,7 @@
   Key += "SVEMax";
   Key += std::to_string(MaxSVEVectorSize);
   Key += CPU;
+  Key += TuneCPU;
   Key += FS;
 
   auto &I = SubtargetMap[Key];
@@ -407,8 +411,8 @@
 // creation will depend on the TM and the code generation flags on the
 // function that reside in TargetOptions.
 resetTargetOptions(F);
-I = std::make_unique<AArch64Subtarget>(TargetTriple, CPU, FS, *this,
-   isLittle, MinSVEVectorSize,
+I = std::make_unique<AArch64Subtarget>(TargetTriple, CPU, TuneCPU, FS,
+   *this, isLittle, MinSVEVectorSize,
MaxSVEVectorSize);
   }
   return I.get();
Index: llvm/lib/Target/AArch64/AArch64Subtarget.h
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -297,7 +297,8 @@
   /// passed in feature string so that we can use initializer lists for
   /// subtarget initialization.
   AArch64Subtarget &initializeSubtargetDependencies(StringRef FS,
-StringRef CPUString);
+StringRef CPUString,
+StringRef TuneCPUString);
 
   /// Initialize properties based on the selected processor family.
   void initializeProperties();
@@ -306,8 +307,8 @@
   /// This constructor initializes the data members to match that
   /// of the specified triple.
   AArch64Subtarget(const Triple &TT, const std::string &CPU,
-   const std::string &FS, const TargetMachine &TM,
-   bool LittleEndian,
+   const std::string &TuneCPU, const std::string &FS,
+   const TargetMachine &TM, bool LittleEndian,
unsigned MinSVEVectorSizeInBitsOverride = 0,
unsigned MaxSVEVectorSizeInBitsOverride = 0);
 
Index: llvm/lib/Target/AArch64/AArch64Subtarget.cpp
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -50,15 +50,17 @@
 static cl::opt<bool> UseAA("aarch64-use-aa", cl::init(true),
cl::desc("Enable the use of AA during codegen."));
 
-AArch64Subtarget &
-AArch64Subtarget::initial

[PATCH] D110258: [AArch64] Always add -tune-cpu argument to -cc1 driver

2021-10-19 Thread David Sherwood via Phabricator via cfe-commits
david-arm added inline comments.



Comment at: clang/test/Driver/aarch64-mtune.c:5
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s 2>&1 \
+// RUN:   | FileCheck %s -check-prefix=notune
+// notune-NOT: "-tune-cpu" "generic"

sdesmalen wrote:
> nit: Did you make these prefixes lower-case on purpose?
Sort of, in that I copied this test from x86-mtune.c, which has the same CHECK 
prefixes. :) I can capitalise them.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110258/new/

https://reviews.llvm.org/D110258

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110258: [AArch64] Always add -tune-cpu argument to -cc1 driver

2021-10-19 Thread David Sherwood via Phabricator via cfe-commits
david-arm updated this revision to Diff 380663.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110258/new/

https://reviews.llvm.org/D110258

Files:
  clang/docs/ReleaseNotes.rst
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/Driver/aarch64-mtune.c
  llvm/docs/ReleaseNotes.rst
  llvm/lib/Target/AArch64/AArch64Subtarget.cpp
  llvm/lib/Target/AArch64/AArch64Subtarget.h
  llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
  llvm/unittests/Target/AArch64/InstSizes.cpp
  llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp

Index: llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp
===
--- llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp
+++ llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp
@@ -26,6 +26,7 @@
 
 std::unique_ptr<AArch64InstrInfo> createInstrInfo(TargetMachine *TM) {
   AArch64Subtarget ST(TM->getTargetTriple(), std::string(TM->getTargetCPU()),
+  std::string(TM->getTargetCPU()),
   std::string(TM->getTargetFeatureString()), *TM,
   /* isLittle */ false);
   return std::make_unique<AArch64InstrInfo>(ST);
Index: llvm/unittests/Target/AArch64/InstSizes.cpp
===
--- llvm/unittests/Target/AArch64/InstSizes.cpp
+++ llvm/unittests/Target/AArch64/InstSizes.cpp
@@ -29,6 +29,7 @@
 
 std::unique_ptr<AArch64InstrInfo> createInstrInfo(TargetMachine *TM) {
   AArch64Subtarget ST(TM->getTargetTriple(), std::string(TM->getTargetCPU()),
+  std::string(TM->getTargetCPU()),
   std::string(TM->getTargetFeatureString()), *TM,
   /* isLittle */ false);
   return std::make_unique<AArch64InstrInfo>(ST);
Index: llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
===
--- llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+++ llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
@@ -355,10 +355,13 @@
 const AArch64Subtarget *
 AArch64TargetMachine::getSubtargetImpl(const Function &F) const {
   Attribute CPUAttr = F.getFnAttribute("target-cpu");
+  Attribute TuneAttr = F.getFnAttribute("tune-cpu");
   Attribute FSAttr = F.getFnAttribute("target-features");
 
   std::string CPU =
   CPUAttr.isValid() ? CPUAttr.getValueAsString().str() : TargetCPU;
+  std::string TuneCPU =
+  TuneAttr.isValid() ? TuneAttr.getValueAsString().str() : CPU;
   std::string FS =
   FSAttr.isValid() ? FSAttr.getValueAsString().str() : TargetFS;
 
@@ -399,6 +402,7 @@
   Key += "SVEMax";
   Key += std::to_string(MaxSVEVectorSize);
   Key += CPU;
+  Key += TuneCPU;
   Key += FS;
 
   auto &I = SubtargetMap[Key];
@@ -407,8 +411,8 @@
 // creation will depend on the TM and the code generation flags on the
 // function that reside in TargetOptions.
 resetTargetOptions(F);
-I = std::make_unique<AArch64Subtarget>(TargetTriple, CPU, FS, *this,
-   isLittle, MinSVEVectorSize,
+I = std::make_unique<AArch64Subtarget>(TargetTriple, CPU, TuneCPU, FS,
+   *this, isLittle, MinSVEVectorSize,
MaxSVEVectorSize);
   }
   return I.get();
Index: llvm/lib/Target/AArch64/AArch64Subtarget.h
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -297,7 +297,8 @@
   /// passed in feature string so that we can use initializer lists for
   /// subtarget initialization.
   AArch64Subtarget &initializeSubtargetDependencies(StringRef FS,
-StringRef CPUString);
+StringRef CPUString,
+StringRef TuneCPUString);
 
   /// Initialize properties based on the selected processor family.
   void initializeProperties();
@@ -306,8 +307,8 @@
   /// This constructor initializes the data members to match that
   /// of the specified triple.
   AArch64Subtarget(const Triple &TT, const std::string &CPU,
-   const std::string &FS, const TargetMachine &TM,
-   bool LittleEndian,
+   const std::string &TuneCPU, const std::string &FS,
+   const TargetMachine &TM, bool LittleEndian,
unsigned MinSVEVectorSizeInBitsOverride = 0,
unsigned MaxSVEVectorSizeInBitsOverride = 0);
 
Index: llvm/lib/Target/AArch64/AArch64Subtarget.cpp
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -50,15 +50,17 @@
 static cl::opt<bool> UseAA("aarch64-use-aa", cl::init(true),
cl::desc("Enable the use of AA during codegen."));
 
-AArch64Subtarget &
-AArch64Subtarget::initializeSubtargetDependencies(StringRef FS,
-

[PATCH] D110258: [AArch64] Always add -tune-cpu argument to -cc1 driver

2021-10-19 Thread David Sherwood via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG607fb1bb8c91: [AArch64] Always add -tune-cpu argument to 
-cc1 driver (authored by david-arm).

Changed prior to commit:
  https://reviews.llvm.org/D110258?vs=380663&id=380681#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110258/new/

https://reviews.llvm.org/D110258

Files:
  clang/docs/ReleaseNotes.rst
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/Driver/aarch64-mtune.c
  llvm/docs/ReleaseNotes.rst
  llvm/lib/Target/AArch64/AArch64Subtarget.cpp
  llvm/lib/Target/AArch64/AArch64Subtarget.h
  llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
  llvm/unittests/Target/AArch64/InstSizes.cpp
  llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp

Index: llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp
===
--- llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp
+++ llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp
@@ -26,6 +26,7 @@
 
 std::unique_ptr<AArch64InstrInfo> createInstrInfo(TargetMachine *TM) {
   AArch64Subtarget ST(TM->getTargetTriple(), std::string(TM->getTargetCPU()),
+  std::string(TM->getTargetCPU()),
   std::string(TM->getTargetFeatureString()), *TM,
   /* isLittle */ false);
   return std::make_unique<AArch64InstrInfo>(ST);
Index: llvm/unittests/Target/AArch64/InstSizes.cpp
===
--- llvm/unittests/Target/AArch64/InstSizes.cpp
+++ llvm/unittests/Target/AArch64/InstSizes.cpp
@@ -29,6 +29,7 @@
 
 std::unique_ptr<AArch64InstrInfo> createInstrInfo(TargetMachine *TM) {
   AArch64Subtarget ST(TM->getTargetTriple(), std::string(TM->getTargetCPU()),
+  std::string(TM->getTargetCPU()),
   std::string(TM->getTargetFeatureString()), *TM,
   /* isLittle */ false);
   return std::make_unique<AArch64InstrInfo>(ST);
Index: llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
===
--- llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+++ llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
@@ -355,10 +355,13 @@
 const AArch64Subtarget *
 AArch64TargetMachine::getSubtargetImpl(const Function &F) const {
   Attribute CPUAttr = F.getFnAttribute("target-cpu");
+  Attribute TuneAttr = F.getFnAttribute("tune-cpu");
   Attribute FSAttr = F.getFnAttribute("target-features");
 
   std::string CPU =
   CPUAttr.isValid() ? CPUAttr.getValueAsString().str() : TargetCPU;
+  std::string TuneCPU =
+  TuneAttr.isValid() ? TuneAttr.getValueAsString().str() : CPU;
   std::string FS =
   FSAttr.isValid() ? FSAttr.getValueAsString().str() : TargetFS;
 
@@ -399,6 +402,7 @@
   Key += "SVEMax";
   Key += std::to_string(MaxSVEVectorSize);
   Key += CPU;
+  Key += TuneCPU;
   Key += FS;
 
   auto &I = SubtargetMap[Key];
@@ -407,8 +411,8 @@
 // creation will depend on the TM and the code generation flags on the
 // function that reside in TargetOptions.
 resetTargetOptions(F);
-I = std::make_unique<AArch64Subtarget>(TargetTriple, CPU, FS, *this,
-   isLittle, MinSVEVectorSize,
+I = std::make_unique<AArch64Subtarget>(TargetTriple, CPU, TuneCPU, FS,
+   *this, isLittle, MinSVEVectorSize,
MaxSVEVectorSize);
   }
   return I.get();
Index: llvm/lib/Target/AArch64/AArch64Subtarget.h
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -298,7 +298,8 @@
   /// passed in feature string so that we can use initializer lists for
   /// subtarget initialization.
   AArch64Subtarget &initializeSubtargetDependencies(StringRef FS,
-StringRef CPUString);
+StringRef CPUString,
+StringRef TuneCPUString);
 
   /// Initialize properties based on the selected processor family.
   void initializeProperties();
@@ -307,8 +308,8 @@
   /// This constructor initializes the data members to match that
   /// of the specified triple.
   AArch64Subtarget(const Triple &TT, const std::string &CPU,
-   const std::string &FS, const TargetMachine &TM,
-   bool LittleEndian,
+   const std::string &TuneCPU, const std::string &FS,
+   const TargetMachine &TM, bool LittleEndian,
unsigned MinSVEVectorSizeInBitsOverride = 0,
unsigned MaxSVEVectorSizeInBitsOverride = 0);
 
Index: llvm/lib/Target/AArch64/AArch64Subtarget.cpp
===
--- llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ llvm

[PATCH] D108138: [SimplifyCFG] Remove switch statements before vectorization

2021-08-26 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

In D108138#2967100, @lebedev.ri wrote:

> IMO anything other than enhancing LV is wrong.

Hi @lebedev.ri, I personally disagree here. Adding support to LV for this is 
significantly more work (and IMO unnecessary), because there are cases where LV 
has to handle a lot more than just the obvious flattened vectorisation case 
using vector comparisons and select instructions. We would also need to add 
support for vectorisation factors of 1 (with interleaving) and cases where 
VF>1, but we have to scalarise the switch statement. These latter two cases 
require doing exactly the same thing as @kmclaughlin's patch does here, i.e. 
unswitching the switch statement into compares/branches and new blocks. It 
seems far simpler to have a small pass that runs prior to the vectoriser (when 
enabled) that does the unswitching.
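
To make the transformation concrete, here is a rough C-level sketch of the
unswitching being described (hand-written for illustration; not the pass's
actual IR output, and the values are invented):

  void before(int *a, const int *x, int n) {
    for (int i = 0; i < n; i++) {
      switch (x[i]) {
      case 0:  a[i] = 1; break;
      case 10: a[i] = 2; break;
      default: a[i] = 3; break;
      }
    }
  }

  // After unswitching, the body becomes equivalent compares/branches, which
  // LV can flatten into vector compares and selects (or scalarise) as needed:
  void after(int *a, const int *x, int n) {
    for (int i = 0; i < n; i++) {
      if (x[i] == 0)
        a[i] = 1;
      else if (x[i] == 10)
        a[i] = 2;
      else
        a[i] = 3;
    }
  }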

Not sure what others think here?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108138/new/

https://reviews.llvm.org/D108138

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D108138: [SimplifyCFG] Remove switch statements before vectorization

2021-08-26 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

In D108138#2967133, @lebedev.ri wrote:

> How is it conceptually different to break apart IR in LV itself, or do the 
> same in a special pass just before that?
> If we want to go this road, we need to completely make `switch`es 
> illegal/non-canonical before LV.

If I understand correctly, you're suggesting that LV makes a scalar 
transformation prior to its legalisation checks/cost model analysis? If that's 
the case then I don't think we can do that, as this is beyond LV's remit, and I 
don't see how that's any different from making a scalar transformation in a 
separate pass prior to LV.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108138/new/

https://reviews.llvm.org/D108138

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D107764: [OpenMP][OpenMPIRBuilder] Implement loop unrolling.

2021-09-02 Thread David Sherwood via Phabricator via cfe-commits
david-arm added a comment.

Hi, this has broken the build for me too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107764/new/

https://reviews.llvm.org/D107764

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits