[PATCH] D151433: [Clang][SVE2.1] Add builtins for Multi-vector load and store

2023-10-19 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin accepted this revision.
kmclaughlin added a comment.
This revision is now accepted and ready to land.

Thanks @CarolineConcatto, LGTM!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151433/new/

https://reviews.llvm.org/D151433

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D151433: [Clang][SVE2.1] Add builtins for Multi-vector load and store

2023-10-18 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin added inline comments.



Comment at: clang/include/clang/Basic/arm_sve.td:1920
+
+def SVST1B_X2 : MInst<"svst1[_{2}_x2]", "v}p2", "cUc", [IsStructStore,], 
MemEltTyDefault, "aarch64_sve_st1_pn_x2">;
+def SVST1H_X2 : MInst<"svst1[_{2}_x2]", "v}p2", "sUshb", [IsStructStore,], 
MemEltTyDefault, "aarch64_sve_st1_pn_x2">;

Please can you remove the extra comma from these lists?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151433/new/

https://reviews.llvm.org/D151433

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D151307: [Clang][SVE2.1] Add svwhile (predicate-as-counter) builtins

2023-10-18 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin accepted this revision.
kmclaughlin added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151307/new/

https://reviews.llvm.org/D151307

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D151197: [Clang][SVE2p1] Add svpsel builtins

2023-10-18 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin accepted this revision.
kmclaughlin added a comment.
This revision is now accepted and ready to land.

Thank you for updating this @CarolineConcatto, LGTM




Comment at: clang/include/clang/Basic/arm_sve.td:1886
 
+
+

nit: extra whitespace


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151197/new/

https://reviews.llvm.org/D151197

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D128648: [Clang][AArch64][SME] Add vector read/write (mova) intrinsics

2023-03-15 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin accepted this revision.
kmclaughlin added a comment.
This revision is now accepted and ready to land.

Thank you @bryanpkc, this LGTM




Comment at: clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_read.c:2
+// REQUIRES: aarch64-registered-target
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme 
-target-feature +sve -S -O1 -Werror -emit-llvm -o - %s | FileCheck %s 
-check-prefixes=CHECK,CHECK-C
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme 
-target-feature +sve -S -O1 -Werror -emit-llvm -o - -x c++ %s | FileCheck %s 
-check-prefixes=CHECK,CHECK-CXX

bryanpkc wrote:
> kmclaughlin wrote:
> > I think `-target-feature +sve` can be removed from this test and 
> > `acle_sme_write.c`
> Doing that will cause errors like these:
> ```
> error: SVE vector type 'svbool_t' (aka '__SVBool_t') cannot be used in a 
> target without sve
> ```
> As I have explained in [D127910](https://reviews.llvm.org/D127910#4137844), 
> `-target-feature +sme` does not imply `-target-feature +sve`. But `-march=` 
> processing will work as expected when D142702 lands.
Apologies for leaving the same comment; I missed this on the previous patch


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128648/new/

https://reviews.llvm.org/D128648

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D128648: [Clang][AArch64][SME] Add vector read/write (mova) intrinsics

2023-02-28 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin added a comment.

Hi @bryanpkc, thank you for updating this patch & applying the previous review 
comments here too.
I just have a couple of minor suggestions:




Comment at: clang/include/clang/Basic/arm_sme.td:103
+def NAME # _H : SInst<"svwrite_hor_" # n_suffix # "[_{d}]", "vimiPd", t, 
MergeOp1,
+  "aarch64_sme_write" # !cond(!eq(n_suffix, "za128") : 
"q", true: "") # "_horiz",
+  [IsWrite, IsStreaming, IsSharedZA], ch>;

This is only a suggestion, but would it make the multiclasses simpler to just 
pass in either `"q"` or `""` depending on the instruction, and append this to 
`aarch64_sme_read/write`?



Comment at: clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_read.c:2
+// REQUIRES: aarch64-registered-target
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme 
-target-feature +sve -S -O1 -Werror -emit-llvm -o - %s | FileCheck %s 
-check-prefixes=CHECK,CHECK-C
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme 
-target-feature +sve -S -O1 -Werror -emit-llvm -o - -x c++ %s | FileCheck %s 
-check-prefixes=CHECK,CHECK-CXX

I think `-target-feature +sve` can be removed from this test and 
`acle_sme_write.c`


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128648/new/

https://reviews.llvm.org/D128648

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D127910: [Clang][AArch64][SME] Add vector load/store (ld1/st1) intrinsics

2023-02-23 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin accepted this revision.
kmclaughlin added a comment.
This revision is now accepted and ready to land.

Thank you for checking and removing EltTypeBool128. I think you have addressed 
all of the other comments on this patch too, so it looks good to me!

Please can you update the commit message before landing, to replace the 
reference to `arm_sme_experimental.h` with 
`arm_sme_draft_spec_subject_to_change.h`?




Comment at: clang/lib/CodeGen/CGBuiltin.cpp:8874
   case SVETypeFlags::EltTyBool64:
+  case SVETypeFlags::EltTyBool128:
 return Builder.getInt1Ty();

bryanpkc wrote:
> kmclaughlin wrote:
> > Is it necessary to add an `EltTyBool128`? I think the 
> > EmitSVEPredicateCast call in EmitSMELd1St1 is creating a vector based on 
> > the memory element type and not the predicate type?
> You are right, `EltTyBool128` is not used right now; I can remove it. When 
> we started working on the pre-ACLE intrinsics in our downstream compiler, we 
> were planning to add something like `svptrue_b128`, even though the hardware 
> `PTRUE` instruction doesn't support the Q size specifier. It is still not 
> clear to me how the ACLE wants users to create a predicate vector for a call 
> to a `_za128` intrinsic.
I think `svptrue_b128` wasn't added because, as you say, there is no `ptrue.q` 
instruction, and any of the other forms (.b, .h, etc.) can be used with the 
128-bit instructions since they only require every 16th lane.
A user could create a 128-bit predicate with either svunpk or svzip of an 
svbool_t with pfalse where necessary. Otherwise, using one of the other forms, 
e.g. `svptrue_b64()`, will achieve the same result with fewer instructions 
(see the sketch below).
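
A rough sketch of both options (illustrative only, assuming the usual
`arm_sve.h` predicate builtins `svptrue_b64`, `svpfalse_b` and `svzip1_b64`):

```
#include <arm_sve.h>

// Two ways a caller could build a governing predicate for a _za128 intrinsic,
// given that there is no svptrue_b128(). The 128-bit forms only consume every
// 16th predicate bit, so a wider-element ptrue already has the right bits set.
svbool_t pred_simple(void) {
  return svptrue_b64();                          // fewest instructions
}

svbool_t pred_one_bit_per_128(void) {
  // Explicitly one active bit per 128 bits: interleave a .d ptrue with pfalse.
  return svzip1_b64(svptrue_b64(), svpfalse_b());
}
```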


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127910/new/

https://reviews.llvm.org/D127910

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D127910: [Clang][AArch64][SME] Add vector load/store (ld1/st1) intrinsics

2023-02-21 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:8874
   case SVETypeFlags::EltTyBool64:
+  case SVETypeFlags::EltTyBool128:
 return Builder.getInt1Ty();

Is it necessary to add an `EltTyBool128`? I think the EmitSVEPredicateCast 
call in EmitSMELd1St1 is creating a vector based on the memory element type and 
not the predicate type?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127910/new/

https://reviews.llvm.org/D127910

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70253: [AArch64][SVE2] Implement remaining SVE2 floating-point intrinsics

2022-12-16 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin added inline comments.



Comment at: 
llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm.ll:31
+; CHECK-NEXT: ret
+  %out = call <vscale x 2 x i64> @llvm.aarch64.sve.flogb.nxv2f64(<vscale x 2 x i64> %a,
+  <vscale x 2 x i1> %pg,

Allen wrote:
> Hi @kmclaughlin,
>   Sorry for the naive question:
>   flogb is a unary instruction in the generated assembly, so why do we need
> %a as an **input** operand of the intrinsic? Could it instead be something like
> ```
> %a = call <vscale x 2 x i64> @llvm.aarch64.sve.flogb.nxv2f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %b)
> ```
Hi @Allen,
The first input to this intrinsic is the passthru, which provides the values 
used for the lanes left inactive by the predicate `%pg`. The inactive lanes can 
be set to zero, merged with a separate vector, or left as unknown (don't-care) 
values.
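
For reference, the same three choices are what the ACLE exposes as the `_z`,
`_m` and `_x` suffixes. A minimal C sketch, assuming the SVE2 `svlogb` builtins
from `arm_sve.h` (the wrapper function names are made up for illustration):

```
#include <arm_sve.h>

// Hedged sketch: how the intrinsic's passthru / inactive-lane policies map
// onto the ACLE suffixes, assuming the SVE2 svlogb_f64 builtins.
svint64_t logb_zeroing(svbool_t pg, svfloat64_t op) {
  return svlogb_f64_z(pg, op);             // inactive lanes become zero
}
svint64_t logb_merging(svint64_t inactive, svbool_t pg, svfloat64_t op) {
  return svlogb_f64_m(inactive, pg, op);   // inactive lanes taken from 'inactive'
}
svint64_t logb_dontcare(svbool_t pg, svfloat64_t op) {
  return svlogb_f64_x(pg, op);             // inactive lanes are unspecified
}
```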


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70253/new/

https://reviews.llvm.org/D70253

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D108138: [WIP] Remove switch statements before vectorization

2021-10-08 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin abandoned this revision.
kmclaughlin added a comment.

I just wanted to give an update on this patch, which I'm abandoning for the 
time being:

@lebedev.ri raised some good questions about the approach taken and whether the 
additional compile time spent would be worth the additional opportunities for 
vectorisation. After posting the last update, I collected some benchmark 
results using SPEC2017 to get a better understanding of the impact of these 
changes and found that several benchmarks showed performance regressions for 
fixed-width vectorisation.

The biggest outliers (in terms of percentage runtime change) were:
520.omnetpp_r: -3.00%
500.perlbench_r: -2.00%
502.gcc_r: -1.52%

I also collected the results after adding in a threshold number of cases to be 
unswitched (set to 4), as was included in the first draft of this patch. This 
also showed some regressions in the benchmarks run and no significant 
improvements. Both sets of results showed increased compile times for many 
benchmarks.

The same benchmarks as above, with the threshold of 4 set:
520.omnetpp_r: -3.46%
500.perlbench_r: -1.20%
502.gcc_r: -1.22%

Results were collected on a Neoverse-N1 machine. Given that these results 
indicate this isn't the best approach to take, I'm abandoning the patch for 
now. When this is picked up in future, it will likely be better to follow 
either the suggestion to prevent canonicalisation of branches & compares into 
switch statements (under a given number of cases) in the first place, or to 
teach the loop vectoriser to recognise switches.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108138/new/

https://reviews.llvm.org/D108138

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D108138: [WIP] Remove switch statements before vectorization

2021-09-15 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin added a comment.

Hi all, I've updated this to take a different approach - the new patch runs 
LowerSwitch just before the vectoriser, where it only considers simple 
switches which are part of a loop. For these switches, the pass creates a 
series of branches and compares which SimplifyCFG is able to replace with a 
switch again later if the vectoriser did not make any changes.

I'm happy to split this patch up to make it easier to review, but I thought I 
would first post the changes I have so far to gather some thoughts on whether 
this is a better direction than before. Thanks!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108138/new/

https://reviews.llvm.org/D108138

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D108138: [WIP] Remove switch statements before vectorization

2021-09-15 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 372706.
kmclaughlin retitled this revision from "[SimplifyCFG] Remove switch statements 
before vectorization" to "[WIP] Remove switch statements before vectorization".
kmclaughlin edited the summary of this revision.
kmclaughlin added a comment.
Herald added subscribers: kerbowa, nhaehnle, jvesely.

- Removed changes to SimplifyCFG and instead run LowerSwitch before 
vectorisation.
- Added `SimpleSwitchConvert` to LowerSwitch which is used if the pass is run 
before vectorisation - this only considers simple switches (where each 
destination block is unique) which are also part of a loop.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108138/new/

https://reviews.llvm.org/D108138

Files:
  clang/test/Frontend/optimization-remark-analysis.c
  llvm/include/llvm/Transforms/Utils/LowerSwitch.h
  llvm/lib/Passes/PassBuilder.cpp
  llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
  llvm/lib/Transforms/Utils/FixIrreducible.cpp
  llvm/lib/Transforms/Utils/LowerSwitch.cpp
  llvm/lib/Transforms/Utils/UnifyLoopExits.cpp
  llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
  llvm/test/Other/new-pm-defaults.ll
  llvm/test/Other/new-pm-lto-defaults.ll
  llvm/test/Other/new-pm-thinlto-defaults.ll
  llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
  llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-remove-switches.ll
  llvm/test/Transforms/LoopVectorize/remove-switches.ll
  llvm/test/Transforms/LowerSwitch/simple-switches.ll
  llvm/test/Transforms/StructurizeCFG/workarounds/needs-fr-ule.ll

Index: llvm/test/Transforms/StructurizeCFG/workarounds/needs-fr-ule.ll
===
--- llvm/test/Transforms/StructurizeCFG/workarounds/needs-fr-ule.ll
+++ llvm/test/Transforms/StructurizeCFG/workarounds/needs-fr-ule.ll
@@ -13,32 +13,32 @@
 ; CHECK-NEXT:[[PRED11_INV:%.*]] = xor i1 [[PRED11:%.*]], true
 ; CHECK-NEXT:[[PRED12_INV:%.*]] = xor i1 [[PRED12:%.*]], true
 ; CHECK-NEXT:[[PRED13_INV:%.*]] = xor i1 [[PRED13:%.*]], true
-; CHECK-NEXT:br i1 [[PRED0_INV]], label [[IF_THEN:%.*]], label [[FLOW19:%.*]]
-; CHECK:   Flow19:
+; CHECK-NEXT:br i1 [[PRED0_INV]], label [[IF_THEN:%.*]], label [[FLOW18:%.*]]
+; CHECK:   Flow18:
 ; CHECK-NEXT:[[TMP0:%.*]] = phi i1 [ false, [[FLOW3:%.*]] ], [ true, [[ENTRY:%.*]] ]
-; CHECK-NEXT:br i1 [[TMP0]], label [[IF_END:%.*]], label [[FLOW20:%.*]]
+; CHECK-NEXT:br i1 [[TMP0]], label [[IF_END:%.*]], label [[FLOW19:%.*]]
 ; CHECK:   if.end:
-; CHECK-NEXT:br i1 [[PRED1_INV]], label [[IF_ELSE:%.*]], label [[FLOW18:%.*]]
-; CHECK:   Flow18:
+; CHECK-NEXT:br i1 [[PRED1_INV]], label [[IF_ELSE:%.*]], label [[FLOW17:%.*]]
+; CHECK:   Flow17:
 ; CHECK-NEXT:[[TMP1:%.*]] = phi i1 [ false, [[IF_ELSE]] ], [ true, [[IF_END]] ]
 ; CHECK-NEXT:br i1 [[TMP1]], label [[IF_THEN7:%.*]], label [[IF_END16:%.*]]
 ; CHECK:   if.then7:
 ; CHECK-NEXT:br label [[IF_END16]]
 ; CHECK:   if.else:
-; CHECK-NEXT:br label [[FLOW18]]
-; CHECK:   Flow20:
+; CHECK-NEXT:br label [[FLOW17]]
+; CHECK:   Flow19:
 ; CHECK-NEXT:br label [[EXIT:%.*]]
 ; CHECK:   if.end16:
-; CHECK-NEXT:br i1 [[PRED2_INV]], label [[IF_THEN39:%.*]], label [[FLOW16:%.*]]
-; CHECK:   Flow16:
+; CHECK-NEXT:br i1 [[PRED2_INV]], label [[IF_THEN39:%.*]], label [[FLOW15:%.*]]
+; CHECK:   Flow15:
 ; CHECK-NEXT:[[TMP2:%.*]] = phi i1 [ false, [[FLOW5:%.*]] ], [ true, [[IF_END16]] ]
-; CHECK-NEXT:br i1 [[TMP2]], label [[WHILE_COND_PREHEADER:%.*]], label [[FLOW17:%.*]]
+; CHECK-NEXT:br i1 [[TMP2]], label [[WHILE_COND_PREHEADER:%.*]], label [[FLOW16:%.*]]
 ; CHECK:   while.cond.preheader:
 ; CHECK-NEXT:br label [[WHILE_COND:%.*]]
-; CHECK:   Flow17:
-; CHECK-NEXT:br label [[FLOW20]]
+; CHECK:   Flow16:
+; CHECK-NEXT:br label [[FLOW19]]
 ; CHECK:   while.cond:
-; CHECK-NEXT:br i1 [[PRED3_INV]], label [[LOR_RHS:%.*]], label [[FLOW12:%.*]]
+; CHECK-NEXT:br i1 [[PRED3_INV]], label [[LOR_RHS:%.*]], label [[FLOW11:%.*]]
 ; CHECK:   Flow7:
 ; CHECK-NEXT:[[TMP3:%.*]] = phi i1 [ [[PRED7:%.*]], [[COND_END61:%.*]] ], [ false, [[IRR_GUARD:%.*]] ]
 ; CHECK-NEXT:[[TMP4:%.*]] = phi i1 [ false, [[COND_END61]] ], [ true, [[IRR_GUARD]] ]
@@ -46,30 +46,30 @@
 ; CHECK:   cond.true49:
 ; CHECK-NEXT:br label [[FLOW8]]
 ; CHECK:   Flow8:
-; CHECK-NEXT:[[TMP5:%.*]] = phi i1 [ false, [[COND_TRUE49]] ], [ true, [[FLOW7:%.*]] ]
-; CHECK-NEXT:[[TMP6:%.*]] = phi i1 [ [[PRED4_INV]], [[COND_TRUE49]] ], [ [[TMP3]], [[FLOW7]] ]
-; CHECK-NEXT:br i1 [[TMP6]], label [[WHILE_BODY63:%.*]], label [[FLOW9:%.*]]
+; CHECK-NEXT:[[TMP5:%.*]] = phi i1 [ true, [[COND_TRUE49]] ], [ false, [[FLOW7:%.*]] ]
+; CHECK-NEXT:[[TMP6:%.*]] = phi i1 [ false, [[COND_TRUE49]] ], [ true, [[FLOW7]] ]
+; CHECK-NEXT:[[TMP7:%.*]] = phi i1 [ [[PRED4_INV]], [[COND_TRUE49]] ], [ 

[PATCH] D108138: [SimplifyCFG] Remove switch statements before vectorization

2021-08-26 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin added a comment.

Thanks all for the suggestions on this patch :)

I had a look at the LowerSwitch pass as suggested by @junparser, and found 
that running it before vectorisation transforms the switch and allows the same 
loops to be vectorised. However, if the loop is not vectorised then the switch 
is not created again later by SimplifyCFG (possibly because the pass is also 
arbitrarily splitting cases into ranges and creating multiple branches to the 
default block?). Tests such as 
Transforms/PhaseOrdering/X86/simplifycfg-late.ll, which attempts to convert a 
switch statement into a lookup table, then fail.

For example, running the @switch_no_vectorize test (from remove-switches.ll) 
with -lowerswitch results in:

  for.body: ; preds = %L3, %entry
%i = phi i64 [ %inc, %L3 ], [ 0, %entry ]
%sum.033 = phi float [ %conv20, %L3 ], [ 2.00e+00, %entry ]
%arrayidx = getelementptr inbounds i32, i32* %a, i64 %i
%0 = load i32, i32* %arrayidx, align 4
br label %NodeBlock
  
  NodeBlock:; preds = %for.body
%Pivot = icmp slt i32 %0, 3
br i1 %Pivot, label %LeafBlock, label %LeafBlock1
  
  LeafBlock1:   ; preds = %NodeBlock
%SwitchLeaf2 = icmp eq i32 %0, 3
br i1 %SwitchLeaf2, label %L3, label %NewDefault
  
  LeafBlock:; preds = %NodeBlock
%SwitchLeaf = icmp eq i32 %0, 2
br i1 %SwitchLeaf, label %L2, label %NewDefault
  
  NewDefault:   ; preds = %LeafBlock1, 
%LeafBlock
br label %L1

I also found that any weights assigned to the switch statement are ignored when 
creating the new branches in LowerSwitch.

I'm not sure what the best approach to this is - I could try to change 
LowerSwitch to create branches which SimplifyCFG will be able to recognise and 
replace with a switch, or try to change SimplifyCFG to recognise this pattern 
of compares & branches. Alternatively, the changes in this patch could be used 
as the basis for a new pass which runs before the vectoriser. I wondered if 
anyone has any thoughts or preferences on which would be the best option here?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108138/new/

https://reviews.llvm.org/D108138

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D108138: [SimplifyCFG] Remove switch statements before vectorization

2021-08-16 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: david-arm, fhahn, dmgreen, craig.topper, 
lebedev.ri.
Herald added subscribers: ctetreau, ormris, wenlei, steven_wu, hiraditya, 
kristof.beyls.
kmclaughlin requested review of this revision.
Herald added projects: clang, LLVM.
Herald added subscribers: llvm-commits, cfe-commits.

This patch adds a new function, TurnSmallSwitchIntoICmps, to SimplifyCFG
which attempts to replace small switch statements with a series of conditional
branches and compares. The purpose of this is to enable vectorization of loops
that is currently not possible due to the presence of switch statements.
We now run SimplifyCFG to unswitch just before the vectorizer; if the loop was
not vectorized, the switch is added back afterwards.

Two new options have been added. The first is `-remove-switch-blocks`, which
enables/disables this feature and is on by default. The second is
`-switch-removal-threshold`, which sets the maximum number of switch cases
that will be converted to branches & compares; above this threshold we do not
attempt to convert the switch. If unspecified, the default value used here
initially is 4.
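
To illustrate the kind of loop this targets, here is a small hand-written C
example (a sketch for illustration only, not taken from the patch's tests):

```
// A small switch inside a loop: today the switch blocks the loop vectorizer,
// even though it is equivalent to a short chain of integer compares and branches.
void scale(int *a, const int *b, long n) {
  for (long i = 0; i < n; ++i) {
    switch (a[i]) {
    case 2:  a[i] = b[i] * 2; break;
    case 3:  a[i] = b[i] + 3; break;
    default: a[i] = b[i];     break;
    }
  }
}
```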

The following tests have been added:

- SimplifyCFG/remove-switches.ll: Tests the changes to SimplifyCFG to replace 
switch statements & ensures branch weights are updated correctly if provided.
- LoopVectorize/AArch64/sve-remove-switches.ll: Tests that we can vectorize 
loops with switch statements using scalable vectors. Also tests that, where 
vectorization is not possible, the switch statement is created again.
- LoopVectorize/remove-switches.ll: Ensures that we do not vectorize the loop 
if the target doesn't support masked loads & stores, where the cost would be 
too high.

Patch originally by David Sherwood


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D108138

Files:
  clang/test/Frontend/optimization-remark-analysis.c
  llvm/include/llvm/Transforms/Utils/SimplifyCFGOptions.h
  llvm/lib/Passes/PassBuilder.cpp
  llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp
  llvm/lib/Transforms/Utils/SimplifyCFG.cpp
  llvm/test/Other/new-pm-defaults.ll
  llvm/test/Other/new-pm-lto-defaults.ll
  llvm/test/Other/new-pm-thinlto-defaults.ll
  llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
  llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
  llvm/test/Transforms/LoopVectorize/AArch64/sve-remove-switches.ll
  llvm/test/Transforms/LoopVectorize/remove-switches.ll
  llvm/test/Transforms/SimplifyCFG/nomerge.ll
  llvm/test/Transforms/SimplifyCFG/remove-switches.ll

Index: llvm/test/Transforms/SimplifyCFG/remove-switches.ll
===
--- /dev/null
+++ llvm/test/Transforms/SimplifyCFG/remove-switches.ll
@@ -0,0 +1,142 @@
+; RUN: opt < %s -simplifycfg -switch-removal-threshold=4 -S | FileCheck %s
+
+define void @unswitch(i32* nocapture %a, i32* nocapture readonly %b, i32* nocapture readonly %c, i64 %N){
+; CHECK-LABEL: @unswitch(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br label [[FOR_BODY:%.*]]
+; CHECK:   for.body:
+; CHECK-NEXT:[[I:%.*]] = phi i64 [ [[INC:%.*]], [[L4:%.*]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:[[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[I]]
+; CHECK-NEXT:[[TMP0:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
+; CHECK-NEXT:[[SWITCH:%.*]] = icmp eq i32 [[TMP0]], 4
+; CHECK-NEXT:br i1 [[SWITCH]], label [[L4]], label [[FOR_BODY_SWITCH:%.*]], !prof !0
+; CHECK:   for.body.switch:
+; CHECK-NEXT:[[SWITCH1:%.*]] = icmp eq i32 [[TMP0]], 2
+; CHECK-NEXT:br i1 [[SWITCH1]], label [[L2:%.*]], label [[FOR_BODY_SWITCH2:%.*]], !prof !1
+; CHECK:   for.body.switch2:
+; CHECK-NEXT:[[SWITCH3:%.*]] = icmp eq i32 [[TMP0]], 3
+; CHECK-NEXT:br i1 [[SWITCH3]], label [[L3:%.*]], label [[FOR_BODY_SWITCH4:%.*]], !prof !2
+; CHECK:   for.body.switch4:
+; CHECK-NEXT:[[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[I]]
+; CHECK-NEXT:[[TMP1:%.*]] = load i32, i32* [[ARRAYIDX5]], align 4
+; CHECK-NEXT:[[MUL:%.*]] = mul nsw i32 [[TMP1]], [[TMP0]]
+; CHECK-NEXT:[[ADD:%.*]] = add nsw i32 [[MUL]], [[TMP0]]
+; CHECK-NEXT:store i32 [[ADD]], i32* [[ARRAYIDX]], align 4
+; CHECK-NEXT:br label [[L2]]
+entry:
+  br label %for.body
+
+for.body:
+  %i = phi i64 [ %inc, %L4 ], [ 0, %entry ]
+  %arrayidx = getelementptr inbounds i32, i32* %a, i64 %i
+  %0 = load i32, i32* %arrayidx
+  switch i32 %0, label %L1 [
+  i32 4, label %L4
+  i32 2, label %L2
+  i32 3, label %L3
+  ], !prof !0
+
+L1:
+  %arrayidx5 = getelementptr inbounds i32, i32* %b, i64 %i
+  %1 = load i32, i32* %arrayidx5
+  %mul = mul nsw i32 %1, %0
+  %add = add nsw i32 %mul, %0
+  store i32 %add, i32* %arrayidx
+  br label %L2
+
+L2:
+  %2 = phi i32 [ %0, %for.body ], [ %add, %L1 ]
+  %arrayidx7 = getelementptr inbounds i32, i32* %b, i64 %i
+  %3 = load i32, i32* %arrayidx7, align 4
+  %mul9 = mul nsw i32 %3, %3
+  %add11 

[PATCH] D100294: [AArch64][SVE] Fix dup/dupq intrinsics for C++.

2021-04-12 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin accepted this revision.
kmclaughlin added a comment.
This revision is now accepted and ready to land.

LGTM!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100294/new/

https://reviews.llvm.org/D100294

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D82943: [SVE] Add more warnings checks to clang and LLVM SVE tests

2020-07-06 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin accepted this revision.
kmclaughlin added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82943/new/

https://reviews.llvm.org/D82943



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D82448: [AArch64][SVE] Add bfloat16 support to store intrinsics

2020-06-26 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGedcfef8fee13: [AArch64][SVE] Add bfloat16 support to store 
intrinsics (authored by kmclaughlin).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82448/new/

https://reviews.llvm.org/D82448

Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_stnt1-bfloat.c
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-reg.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-stores.ll
  llvm/test/CodeGen/AArch64/sve-masked-ldst-nonext.ll
  llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
  llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-reg.ll
  
llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll
  
llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll

Index: llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll
===
--- llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll
+++ llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll
@@ -94,6 +94,20 @@
   ret void
 }
 
+define void @test_masked_ldst_sv8bf16(bfloat* %base,  %mask, i64 %offset) nounwind #0 {
+; CHECK-LABEL: test_masked_ldst_sv8bf16:
+; CHECK-NEXT: ldnt1h { z[[DATA:[0-9]+]].h }, p0/z, [x0, x1, lsl #1]
+; CHECK-NEXT: stnt1h { z[[DATA]].h }, p0, [x0, x1, lsl #1]
+; CHECK-NEXT: ret
+  %gep = getelementptr bfloat, bfloat* %base, i64 %offset
+  %data = call  @llvm.aarch64.sve.ldnt1.nxv8bf16( %mask,
+  bfloat* %gep)
+  call void @llvm.aarch64.sve.stnt1.nxv8bf16( %data,
+  %mask,
+ bfloat* %gep)
+  ret void
+}
+
 ; 16-lane non-temporal load/stores.
 
 define void @test_masked_ldst_sv16i8(i8* %base,  %mask, i64 %offset) nounwind {
@@ -121,6 +135,7 @@
 ; 8-element non-temporal loads.
 declare  @llvm.aarch64.sve.ldnt1.nxv8i16(, i16*)
 declare  @llvm.aarch64.sve.ldnt1.nxv8f16(, half*)
+declare  @llvm.aarch64.sve.ldnt1.nxv8bf16(, bfloat*)
 
 ; 16-element non-temporal loads.
 declare  @llvm.aarch64.sve.ldnt1.nxv16i8(, i8*)
@@ -128,14 +143,18 @@
 ; 2-element non-temporal stores.
 declare void @llvm.aarch64.sve.stnt1.nxv2i64(, , i64*)
 declare void @llvm.aarch64.sve.stnt1.nxv2f64(, , double*)
-  
-; 4-element non-temporal stores.
+
+; 4-element non-temporal stores.
 declare void @llvm.aarch64.sve.stnt1.nxv4i32(, , i32*)
 declare void @llvm.aarch64.sve.stnt1.nxv4f32(, , float*)
-  
-; 8-element non-temporal stores.
+
+; 8-element non-temporal stores.
 declare void @llvm.aarch64.sve.stnt1.nxv8i16(, , i16*)
 declare void @llvm.aarch64.sve.stnt1.nxv8f16(, , half*)
+declare void @llvm.aarch64.sve.stnt1.nxv8bf16(, , bfloat*)
 
 ; 16-element non-temporal stores.
 declare void @llvm.aarch64.sve.stnt1.nxv16i8(, , i8*)
+
+; +bf16 is required for the bfloat version.
+attributes #0 = { "target-features"="+sve,+bf16" }
Index: llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll
===
--- llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll
+++ llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll
@@ -139,6 +139,23 @@
   ret void
 }
 
+define void @test_masked_ldst_sv8bf16( * %base,  %mask) nounwind #0 {
+; CHECK-LABEL: test_masked_ldst_sv8bf16:
+; CHECK-NEXT: ldnt1h { z[[DATA:[0-9]+]].h }, p0/z, [x0, #-1, mul vl]
+; CHECK-NEXT: stnt1h { z[[DATA]].h }, p0, [x0, #2, mul vl]
+; CHECK-NEXT: ret
+  %base_load = getelementptr , * %base, i64 -1
+  %base_load_bc = bitcast * %base_load to bfloat*
+  %data = call  @llvm.aarch64.sve.ldnt1.nxv8bf16( %mask,
+  bfloat* %base_load_bc)
+  %base_store = getelementptr ,  * %base, i64 2
+  %base_store_bc = bitcast * %base_store to bfloat*
+  call void @llvm.aarch64.sve.stnt1.nxv8bf16( %data,
+  %mask,
+ bfloat* %base_store_bc)
+  ret void
+}
+
 ; 16-lane non-temporal load/stores.
 
 define void @test_masked_ldst_sv16i8( * %base,  %mask) nounwind {
@@ -169,6 +186,7 @@
 ; 

[PATCH] D82448: [AArch64][SVE] Add bfloat16 support to store intrinsics

2020-06-25 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 273331.
kmclaughlin added a comment.

- Added HasSVE to Predicates in AArch64SVEInstrInfo.td
- Removed unnecessary indentation changes in AArch64SVEInstrInfo.td
- Removed hasBF16 variable from performST1Combine/performSTNT1Combine


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82448/new/

https://reviews.llvm.org/D82448

Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_stnt1-bfloat.c
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-reg.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-stores.ll
  llvm/test/CodeGen/AArch64/sve-masked-ldst-nonext.ll
  llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
  llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-reg.ll
  
llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll
  
llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll

Index: llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll
===
--- llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll
+++ llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll
@@ -94,6 +94,20 @@
   ret void
 }
 
+define void @test_masked_ldst_sv8bf16(bfloat* %base,  %mask, i64 %offset) nounwind #0 {
+; CHECK-LABEL: test_masked_ldst_sv8bf16:
+; CHECK-NEXT: ldnt1h { z[[DATA:[0-9]+]].h }, p0/z, [x0, x1, lsl #1]
+; CHECK-NEXT: stnt1h { z[[DATA]].h }, p0, [x0, x1, lsl #1]
+; CHECK-NEXT: ret
+  %gep = getelementptr bfloat, bfloat* %base, i64 %offset
+  %data = call  @llvm.aarch64.sve.ldnt1.nxv8bf16( %mask,
+  bfloat* %gep)
+  call void @llvm.aarch64.sve.stnt1.nxv8bf16( %data,
+  %mask,
+ bfloat* %gep)
+  ret void
+}
+
 ; 16-lane non-temporal load/stores.
 
 define void @test_masked_ldst_sv16i8(i8* %base,  %mask, i64 %offset) nounwind {
@@ -121,6 +135,7 @@
 ; 8-element non-temporal loads.
 declare  @llvm.aarch64.sve.ldnt1.nxv8i16(, i16*)
 declare  @llvm.aarch64.sve.ldnt1.nxv8f16(, half*)
+declare  @llvm.aarch64.sve.ldnt1.nxv8bf16(, bfloat*)
 
 ; 16-element non-temporal loads.
 declare  @llvm.aarch64.sve.ldnt1.nxv16i8(, i8*)
@@ -128,14 +143,18 @@
 ; 2-element non-temporal stores.
 declare void @llvm.aarch64.sve.stnt1.nxv2i64(, , i64*)
 declare void @llvm.aarch64.sve.stnt1.nxv2f64(, , double*)
-  
-; 4-element non-temporal stores.
+
+; 4-element non-temporal stores.
 declare void @llvm.aarch64.sve.stnt1.nxv4i32(, , i32*)
 declare void @llvm.aarch64.sve.stnt1.nxv4f32(, , float*)
-  
-; 8-element non-temporal stores.
+
+; 8-element non-temporal stores.
 declare void @llvm.aarch64.sve.stnt1.nxv8i16(, , i16*)
 declare void @llvm.aarch64.sve.stnt1.nxv8f16(, , half*)
+declare void @llvm.aarch64.sve.stnt1.nxv8bf16(, , bfloat*)
 
 ; 16-element non-temporal stores.
 declare void @llvm.aarch64.sve.stnt1.nxv16i8(, , i8*)
+
+; +bf16 is required for the bfloat version.
+attributes #0 = { "target-features"="+sve,+bf16" }
Index: llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll
===
--- llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll
+++ llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll
@@ -139,6 +139,23 @@
   ret void
 }
 
+define void @test_masked_ldst_sv8bf16( * %base,  %mask) nounwind #0 {
+; CHECK-LABEL: test_masked_ldst_sv8bf16:
+; CHECK-NEXT: ldnt1h { z[[DATA:[0-9]+]].h }, p0/z, [x0, #-1, mul vl]
+; CHECK-NEXT: stnt1h { z[[DATA]].h }, p0, [x0, #2, mul vl]
+; CHECK-NEXT: ret
+  %base_load = getelementptr , * %base, i64 -1
+  %base_load_bc = bitcast * %base_load to bfloat*
+  %data = call  @llvm.aarch64.sve.ldnt1.nxv8bf16( %mask,
+  bfloat* %base_load_bc)
+  %base_store = getelementptr ,  * %base, i64 2
+  %base_store_bc = bitcast * %base_store to bfloat*
+  call void @llvm.aarch64.sve.stnt1.nxv8bf16( %data,
+  %mask,
+ bfloat* %base_store_bc)
+  ret void
+}
+
 ; 16-lane non-temporal load/stores.
 
 define void @test_masked_ldst_sv16i8( * %base,  

[PATCH] D82448: [AArch64][SVE] Add bfloat16 support to store intrinsics

2020-06-25 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin marked 4 inline comments as done.
kmclaughlin added a comment.

Thanks for reviewing this again, @fpetrogalli!




Comment at: clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st1-bfloat.c:4
+// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC 
-triple aarch64-none-linux-gnu -target-feature +sve -target-feature +bf16 
-fallow-half-arguments-and-returns -fsyntax-only -verify 
-verify-ignore-unexpected=error -verify-ignore-unexpected=note %s
+
+#include 

fpetrogalli wrote:
> Nit: is it worth adding the `ASM-NOT: warning` check that is used in other 
> tests? Of course, only if it doesn't fail, because in that case we would 
> have to address the problem in a separate patch.
> 
> (Same for all the new C tests added in this patch).
Adding the check doesn't fail, but I will add these checks to the load & store 
tests in a separate patch


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82448/new/

https://reviews.llvm.org/D82448



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D82448: [AArch64][SVE] Add bfloat16 support to store intrinsics

2020-06-24 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 273092.
kmclaughlin added a comment.

- Added [HasBF16] predicate to new store pattern in AArch64SVEInstrInfo.td
- Check hasBF16() is true for bfloat16 types in 
performST1Combine/performSTNT1Combine
- Added bfloat16 test to sve-pred-contiguous-ldst-addressing-mode-reg-reg.ll


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82448/new/

https://reviews.llvm.org/D82448

Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_stnt1-bfloat.c
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-reg.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-stores.ll
  llvm/test/CodeGen/AArch64/sve-masked-ldst-nonext.ll
  llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
  llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-reg.ll
  
llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll
  
llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll

Index: llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll
===
--- llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll
+++ llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll
@@ -1,4 +1,4 @@
-; RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve --asm-verbose=false < %s 2>%t | FileCheck %s
+; RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve,+bf16 --asm-verbose=false < %s 2>%t | FileCheck %s
 ; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t
 
 ; WARN-NOT: warning
@@ -94,6 +94,20 @@
   ret void
 }
 
+define void @test_masked_ldst_sv8bf16(bfloat* %base,  %mask, i64 %offset) nounwind {
+; CHECK-LABEL: test_masked_ldst_sv8bf16:
+; CHECK-NEXT: ldnt1h { z[[DATA:[0-9]+]].h }, p0/z, [x0, x1, lsl #1]
+; CHECK-NEXT: stnt1h { z[[DATA]].h }, p0, [x0, x1, lsl #1]
+; CHECK-NEXT: ret
+  %gep = getelementptr bfloat, bfloat* %base, i64 %offset
+  %data = call  @llvm.aarch64.sve.ldnt1.nxv8bf16( %mask,
+  bfloat* %gep)
+  call void @llvm.aarch64.sve.stnt1.nxv8bf16( %data,
+  %mask,
+ bfloat* %gep)
+  ret void
+}
+
 ; 16-lane non-temporal load/stores.
 
 define void @test_masked_ldst_sv16i8(i8* %base,  %mask, i64 %offset) nounwind {
@@ -121,6 +135,7 @@
 ; 8-element non-temporal loads.
 declare  @llvm.aarch64.sve.ldnt1.nxv8i16(, i16*)
 declare  @llvm.aarch64.sve.ldnt1.nxv8f16(, half*)
+declare  @llvm.aarch64.sve.ldnt1.nxv8bf16(, bfloat*)
 
 ; 16-element non-temporal loads.
 declare  @llvm.aarch64.sve.ldnt1.nxv16i8(, i8*)
@@ -128,14 +143,15 @@
 ; 2-element non-temporal stores.
 declare void @llvm.aarch64.sve.stnt1.nxv2i64(, , i64*)
 declare void @llvm.aarch64.sve.stnt1.nxv2f64(, , double*)
-  
-; 4-element non-temporal stores.
+
+; 4-element non-temporal stores.
 declare void @llvm.aarch64.sve.stnt1.nxv4i32(, , i32*)
 declare void @llvm.aarch64.sve.stnt1.nxv4f32(, , float*)
-  
-; 8-element non-temporal stores.
+
+; 8-element non-temporal stores.
 declare void @llvm.aarch64.sve.stnt1.nxv8i16(, , i16*)
 declare void @llvm.aarch64.sve.stnt1.nxv8f16(, , half*)
+declare void @llvm.aarch64.sve.stnt1.nxv8bf16(, , bfloat*)
 
 ; 16-element non-temporal stores.
 declare void @llvm.aarch64.sve.stnt1.nxv16i8(, , i8*)
Index: llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll
===
--- llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll
+++ llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll
@@ -1,4 +1,4 @@
-; RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve --asm-verbose=false < %s 2>%t | FileCheck %s
+; RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve,+bf16 --asm-verbose=false < %s 2>%t | FileCheck %s
 ; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t
 
 ; WARN-NOT: warning
@@ -139,6 +139,23 @@
   ret void
 }
 
+define void @test_masked_ldst_sv8bf16( * %base,  %mask) nounwind {
+; CHECK-LABEL: test_masked_ldst_sv8bf16:
+; CHECK-NEXT: ldnt1h { z[[DATA:[0-9]+]].h }, p0/z, [x0, #-1, mul vl]
+; CHECK-NEXT: stnt1h { z[[DATA]].h }, p0, [x0, #2, mul vl]
+; CHECK-NEXT: ret
+  %base_load = getelementptr , * %base, i64 -1
+  %base_load_bc = bitcast * %base_load to bfloat*
+  

[PATCH] D82448: [AArch64][SVE] Add bfloat16 support to store intrinsics

2020-06-24 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: sdesmalen, c-rhodes, fpetrogalli, efriedma, stuij, 
david-arm.
Herald added subscribers: llvm-commits, cfe-commits, danielkiss, psnobl, 
rkruppe, hiraditya, kristof.beyls, tschuett.
Herald added projects: clang, LLVM.

Bfloat16 support added for the following intrinsics:

- ST1
- STNT1
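
For context, a hedged sketch of how these builtins might be exercised from C,
assuming the `svst1_bf16`/`svstnt1_bf16` forms in `arm_sve.h` (requires the
sve and bf16 target features):

```
#include <arm_sve.h>

// Illustrative only: contiguous (ST1H) and non-temporal (STNT1H) stores of a
// bfloat16 vector under a predicate.
void store_bf16(svbool_t pg, bfloat16_t *dst, svbfloat16_t data) {
  svst1_bf16(pg, dst, data);    // contiguous store
  svstnt1_bf16(pg, dst, data);  // non-temporal store
}
```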


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D82448

Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_stnt1-bfloat.c
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-reg.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-stores.ll
  llvm/test/CodeGen/AArch64/sve-masked-ldst-nonext.ll
  llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
  
llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll
  
llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll

Index: llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll
===
--- llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll
+++ llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll
@@ -94,6 +94,20 @@
   ret void
 }
 
+define void @test_masked_ldst_sv8bf16(bfloat* %base,  %mask, i64 %offset) nounwind {
+; CHECK-LABEL: test_masked_ldst_sv8bf16:
+; CHECK-NEXT: ldnt1h { z[[DATA:[0-9]+]].h }, p0/z, [x0, x1, lsl #1]
+; CHECK-NEXT: stnt1h { z[[DATA]].h }, p0, [x0, x1, lsl #1]
+; CHECK-NEXT: ret
+  %gep = getelementptr bfloat, bfloat* %base, i64 %offset
+  %data = call  @llvm.aarch64.sve.ldnt1.nxv8bf16( %mask,
+  bfloat* %gep)
+  call void @llvm.aarch64.sve.stnt1.nxv8bf16( %data,
+  %mask,
+ bfloat* %gep)
+  ret void
+}
+
 ; 16-lane non-temporal load/stores.
 
 define void @test_masked_ldst_sv16i8(i8* %base,  %mask, i64 %offset) nounwind {
@@ -121,6 +135,7 @@
 ; 8-element non-temporal loads.
 declare  @llvm.aarch64.sve.ldnt1.nxv8i16(, i16*)
 declare  @llvm.aarch64.sve.ldnt1.nxv8f16(, half*)
+declare  @llvm.aarch64.sve.ldnt1.nxv8bf16(, bfloat*)
 
 ; 16-element non-temporal loads.
 declare  @llvm.aarch64.sve.ldnt1.nxv16i8(, i8*)
@@ -128,14 +143,15 @@
 ; 2-element non-temporal stores.
 declare void @llvm.aarch64.sve.stnt1.nxv2i64(, , i64*)
 declare void @llvm.aarch64.sve.stnt1.nxv2f64(, , double*)
-  
-; 4-element non-temporal stores.
+
+; 4-element non-temporal stores.
 declare void @llvm.aarch64.sve.stnt1.nxv4i32(, , i32*)
 declare void @llvm.aarch64.sve.stnt1.nxv4f32(, , float*)
-  
-; 8-element non-temporal stores.
+
+; 8-element non-temporal stores.
 declare void @llvm.aarch64.sve.stnt1.nxv8i16(, , i16*)
 declare void @llvm.aarch64.sve.stnt1.nxv8f16(, , half*)
+declare void @llvm.aarch64.sve.stnt1.nxv8bf16(, , bfloat*)
 
 ; 16-element non-temporal stores.
 declare void @llvm.aarch64.sve.stnt1.nxv16i8(, , i8*)
Index: llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll
===
--- llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll
+++ llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll
@@ -139,6 +139,23 @@
   ret void
 }
 
+define void @test_masked_ldst_sv8bf16( * %base,  %mask) nounwind {
+; CHECK-LABEL: test_masked_ldst_sv8bf16:
+; CHECK-NEXT: ldnt1h { z[[DATA:[0-9]+]].h }, p0/z, [x0, #-1, mul vl]
+; CHECK-NEXT: stnt1h { z[[DATA]].h }, p0, [x0, #2, mul vl]
+; CHECK-NEXT: ret
+  %base_load = getelementptr , * %base, i64 -1
+  %base_load_bc = bitcast * %base_load to bfloat*
+  %data = call  @llvm.aarch64.sve.ldnt1.nxv8bf16( %mask,
+  bfloat* %base_load_bc)
+  %base_store = getelementptr ,  * %base, i64 2
+  %base_store_bc = bitcast * %base_store to bfloat*
+  call void @llvm.aarch64.sve.stnt1.nxv8bf16( %data,
+  %mask,
+ bfloat* %base_store_bc)
+  ret void
+}
+
 ; 16-lane non-temporal load/stores.
 
 define void @test_masked_ldst_sv16i8( * %base,  %mask) nounwind {
@@ -169,6 +186,7 @@
 ; 8-element non-temporal loads.
 declare  @llvm.aarch64.sve.ldnt1.nxv8i16(, i16*)
 declare  

[PATCH] D82298: [AArch64][SVE] Add bfloat16 support to load intrinsics

2020-06-24 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG3d6cab271c7c: [AArch64][SVE] Add bfloat16 support to load 
intrinsics (authored by kmclaughlin).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82298/new/

https://reviews.llvm.org/D82298

Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1rq-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnt1-bfloat.c
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-reg.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-loads-ff.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
  llvm/test/CodeGen/AArch64/sve-masked-ldst-nonext.ll

Index: llvm/test/CodeGen/AArch64/sve-masked-ldst-nonext.ll
===
--- llvm/test/CodeGen/AArch64/sve-masked-ldst-nonext.ll
+++ llvm/test/CodeGen/AArch64/sve-masked-ldst-nonext.ll
@@ -87,6 +87,14 @@
   ret  %load
 }
 
+define  @masked_load_nxv8bf16( *%a,  %mask) nounwind {
+; CHECK-LABEL: masked_load_nxv8bf16:
+; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %load = call  @llvm.masked.load.nxv8bf16( *%a, i32 2,  %mask,  undef)
+  ret  %load
+}
+
 ;
 ; Masked Stores
 ;
@@ -182,6 +190,7 @@
 declare  @llvm.masked.load.nxv4f32(*, i32, , )
 declare  @llvm.masked.load.nxv4f16(*, i32, , )
 declare  @llvm.masked.load.nxv8f16(*, i32, , )
+declare  @llvm.masked.load.nxv8bf16(*, i32, , )
 
 declare void @llvm.masked.store.nxv2i64(, *, i32, )
 declare void @llvm.masked.store.nxv4i32(, *, i32, )
Index: llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
@@ -97,6 +97,23 @@
   ret  %res
 }
 
+define  @ld1rqh_bf16( %pred, bfloat* %addr) {
+; CHECK-LABEL: ld1rqh_bf16:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8bf16( %pred, bfloat* %addr)
+  ret  %res
+}
+
+define  @ld1rqh_bf16_imm( %pred, bfloat* %addr) {
+; CHECK-LABEL: ld1rqh_bf16_imm:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0, #-16]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds bfloat, bfloat* %addr, i16 -8
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8bf16( %pred, bfloat* %ptr)
+  ret  %res
+}
+
 ;
 ; LD1RQW
 ;
@@ -208,6 +225,15 @@
   ret  %res
 }
 
+define  @ldnt1h_bf16( %pred, bfloat* %addr) {
+; CHECK-LABEL: ldnt1h_bf16:
+; CHECK: ldnt1h { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ldnt1.nxv8bf16( %pred,
+ bfloat* %addr)
+  ret  %res
+}
+
 ;
 ; LDNT1W
 ;
@@ -498,6 +524,7 @@
 declare  @llvm.aarch64.sve.ld1rq.nxv4i32(, i32*)
 declare  @llvm.aarch64.sve.ld1rq.nxv2i64(, i64*)
 declare  @llvm.aarch64.sve.ld1rq.nxv8f16(, half*)
+declare  @llvm.aarch64.sve.ld1rq.nxv8bf16(, bfloat*)
 declare  @llvm.aarch64.sve.ld1rq.nxv4f32(, float*)
 declare  @llvm.aarch64.sve.ld1rq.nxv2f64(, double*)
 
@@ -506,6 +533,7 @@
 declare  @llvm.aarch64.sve.ldnt1.nxv4i32(, i32*)
 declare  @llvm.aarch64.sve.ldnt1.nxv2i64(, i64*)
 declare  @llvm.aarch64.sve.ldnt1.nxv8f16(, half*)
+declare  @llvm.aarch64.sve.ldnt1.nxv8bf16(, bfloat*)
 declare  @llvm.aarch64.sve.ldnt1.nxv4f32(, float*)
 declare  @llvm.aarch64.sve.ldnt1.nxv2f64(, double*)
 
Index: llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll
@@ -140,6 +140,14 @@
   ret  %load
 }
 
+define  @ldnf1h_bf16( %pg, bfloat* %a) {
+; CHECK-LABEL: ldnf1h_bf16:
+; CHECK: ldnf1h { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %load = call  @llvm.aarch64.sve.ldnf1.nxv8bf16( %pg, bfloat* %a)
+  ret  %load
+}
+
 define  @ldnf1h_f16_inbound( %pg, half* %a) {
 ; CHECK-LABEL: ldnf1h_f16_inbound:
 ; CHECK: ldnf1h { z0.h }, p0/z, [x0, #1, mul vl]
@@ -151,6 +159,17 @@
   ret  %load
 }
 
+define  @ldnf1h_bf16_inbound( %pg, bfloat* %a) {
+; CHECK-LABEL: ldnf1h_bf16_inbound:
+; CHECK: ldnf1h { z0.h }, p0/z, [x0, #1, mul vl]
+; CHECK-NEXT: ret
+  %base_scalable = bitcast bfloat* %a to *
+  %base = getelementptr , * %base_scalable, i64 1
+  %base_scalar = bitcast * %base to bfloat*
+  %load = call  @llvm.aarch64.sve.ldnf1.nxv8bf16( %pg, bfloat* %base_scalar)
+  ret  %load
+}
+
 define  

[PATCH] D82298: [AArch64][SVE] Add bfloat16 support to load intrinsics

2020-06-23 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 272759.
kmclaughlin added a comment.

- Moved bfloat tests into separate files
- Added checks to the bfloat test files which test the warnings given when 
ARM_FEATURE_SVE_BF16 is omitted in the RUN line


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82298/new/

https://reviews.llvm.org/D82298

Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1rq-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1-bfloat.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnt1-bfloat.c
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-reg.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-loads-ff.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
  llvm/test/CodeGen/AArch64/sve-masked-ldst-nonext.ll

Index: llvm/test/CodeGen/AArch64/sve-masked-ldst-nonext.ll
===
--- llvm/test/CodeGen/AArch64/sve-masked-ldst-nonext.ll
+++ llvm/test/CodeGen/AArch64/sve-masked-ldst-nonext.ll
@@ -87,6 +87,14 @@
   ret  %load
 }
 
+define  @masked_load_nxv8bf16( *%a,  %mask) nounwind {
+; CHECK-LABEL: masked_load_nxv8bf16:
+; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %load = call  @llvm.masked.load.nxv8bf16( *%a, i32 2,  %mask,  undef)
+  ret  %load
+}
+
 ;
 ; Masked Stores
 ;
@@ -182,6 +190,7 @@
 declare  @llvm.masked.load.nxv4f32(*, i32, , )
 declare  @llvm.masked.load.nxv4f16(*, i32, , )
 declare  @llvm.masked.load.nxv8f16(*, i32, , )
+declare  @llvm.masked.load.nxv8bf16(*, i32, , )
 
 declare void @llvm.masked.store.nxv2i64(, *, i32, )
 declare void @llvm.masked.store.nxv4i32(, *, i32, )
Index: llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
@@ -97,6 +97,23 @@
   ret  %res
 }
 
+define  @ld1rqh_bf16( %pred, bfloat* %addr) {
+; CHECK-LABEL: ld1rqh_bf16:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8bf16( %pred, bfloat* %addr)
+  ret  %res
+}
+
+define  @ld1rqh_bf16_imm( %pred, bfloat* %addr) {
+; CHECK-LABEL: ld1rqh_bf16_imm:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0, #-16]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds bfloat, bfloat* %addr, i16 -8
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8bf16( %pred, bfloat* %ptr)
+  ret  %res
+}
+
 ;
 ; LD1RQW
 ;
@@ -208,6 +225,15 @@
   ret  %res
 }
 
+define  @ldnt1h_bf16( %pred, bfloat* %addr) {
+; CHECK-LABEL: ldnt1h_bf16:
+; CHECK: ldnt1h { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ldnt1.nxv8bf16( %pred,
+ bfloat* %addr)
+  ret  %res
+}
+
 ;
 ; LDNT1W
 ;
@@ -498,6 +524,7 @@
 declare  @llvm.aarch64.sve.ld1rq.nxv4i32(, i32*)
 declare  @llvm.aarch64.sve.ld1rq.nxv2i64(, i64*)
 declare  @llvm.aarch64.sve.ld1rq.nxv8f16(, half*)
+declare  @llvm.aarch64.sve.ld1rq.nxv8bf16(, bfloat*)
 declare  @llvm.aarch64.sve.ld1rq.nxv4f32(, float*)
 declare  @llvm.aarch64.sve.ld1rq.nxv2f64(, double*)
 
@@ -506,6 +533,7 @@
 declare  @llvm.aarch64.sve.ldnt1.nxv4i32(, i32*)
 declare  @llvm.aarch64.sve.ldnt1.nxv2i64(, i64*)
 declare  @llvm.aarch64.sve.ldnt1.nxv8f16(, half*)
+declare  @llvm.aarch64.sve.ldnt1.nxv8bf16(, bfloat*)
 declare  @llvm.aarch64.sve.ldnt1.nxv4f32(, float*)
 declare  @llvm.aarch64.sve.ldnt1.nxv2f64(, double*)
 
Index: llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll
@@ -140,6 +140,14 @@
   ret  %load
 }
 
+define  @ldnf1h_bf16( %pg, bfloat* %a) {
+; CHECK-LABEL: ldnf1h_bf16:
+; CHECK: ldnf1h { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %load = call  @llvm.aarch64.sve.ldnf1.nxv8bf16( %pg, bfloat* %a)
+  ret  %load
+}
+
 define  @ldnf1h_f16_inbound( %pg, half* %a) {
 ; CHECK-LABEL: ldnf1h_f16_inbound:
 ; CHECK: ldnf1h { z0.h }, p0/z, [x0, #1, mul vl]
@@ -151,6 +159,17 @@
   ret  %load
 }
 
+define  @ldnf1h_bf16_inbound( %pg, bfloat* %a) {
+; CHECK-LABEL: ldnf1h_bf16_inbound:
+; CHECK: ldnf1h { z0.h }, p0/z, [x0, #1, mul vl]
+; CHECK-NEXT: ret
+  %base_scalable = bitcast bfloat* %a to *
+  %base = getelementptr , * %base_scalable, i64 1
+  %base_scalar = bitcast * %base to bfloat*
+  %load = call  @llvm.aarch64.sve.ldnf1.nxv8bf16( %pg, bfloat* %base_scalar)
+  ret  %load
+}
+

[PATCH] D79167: [SVE][CodeGen] Legalisation of vsetcc with scalable types

2020-06-23 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG508050317403: [SVE][CodeGen] Legalisation of vsetcc with 
scalable types (authored by kmclaughlin).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79167/new/

https://reviews.llvm.org/D79167

Files:
  llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
  llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll

Index: llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
===
--- llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
+++ llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
@@ -165,6 +165,95 @@
   ret  %min
 }
 
+define  @smin_split_i8( %a,  %b,  %c) {
+; CHECK-LABEL: smin_split_i8:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.b
+; CHECK-NEXT:smin z0.b, p0/m, z0.b, z2.b
+; CHECK-NEXT:smin z1.b, p0/m, z1.b, z3.b
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_split_i16( %a,  %b,  %c) {
+; CHECK-LABEL: smin_split_i16:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.h
+; CHECK-NEXT:smin z0.h, p0/m, z0.h, z4.h
+; CHECK-NEXT:smin z1.h, p0/m, z1.h, z5.h
+; CHECK-NEXT:smin z2.h, p0/m, z2.h, z6.h
+; CHECK-NEXT:smin z3.h, p0/m, z3.h, z7.h
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_split_i32( %a,  %b,  %c) {
+; CHECK-LABEL: smin_split_i32:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.s
+; CHECK-NEXT:smin z0.s, p0/m, z0.s, z2.s
+; CHECK-NEXT:smin z1.s, p0/m, z1.s, z3.s
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_split_i64( %a,  %b,  %c) {
+; CHECK-LABEL: smin_split_i64:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.d
+; CHECK-NEXT:smin z0.d, p0/m, z0.d, z2.d
+; CHECK-NEXT:smin z1.d, p0/m, z1.d, z3.d
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_promote_i8( %a,  %b,  %c) {
+; CHECK-LABEL: smin_promote_i8:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.h
+; CHECK-NEXT:sxtb z1.h, p0/m, z1.h
+; CHECK-NEXT:sxtb z0.h, p0/m, z0.h
+; CHECK-NEXT:smin z0.h, p0/m, z0.h, z1.h
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_promote_i16( %a,  %b,  %c) {
+; CHECK-LABEL: smin_promote_i16:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.s
+; CHECK-NEXT:sxth z1.s, p0/m, z1.s
+; CHECK-NEXT:sxth z0.s, p0/m, z0.s
+; CHECK-NEXT:smin z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_promote_i32( %a,  %b,  %c) {
+; CHECK-LABEL: smin_promote_i32:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.d
+; CHECK-NEXT:sxtw z1.d, p0/m, z1.d
+; CHECK-NEXT:sxtw z0.d, p0/m, z0.d
+; CHECK-NEXT:smin z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
 ;
 ; UMIN
 ;
@@ -213,6 +302,31 @@
   ret  %min
 }
 
+define  @umin_split_i64( %a,  %b,  %c) {
+; CHECK-LABEL: umin_split_i64:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.d
+; CHECK-NEXT:umin z0.d, p0/m, z0.d, z2.d
+; CHECK-NEXT:umin z1.d, p0/m, z1.d, z3.d
+; CHECK-NEXT:ret
+  %cmp = icmp ult  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @umin_promote_i8( %a,  %b,  %c) {
+; CHECK-LABEL: umin_promote_i8:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.h
+; CHECK-NEXT:and z1.h, z1.h, #0xff
+; CHECK-NEXT:and z0.h, z0.h, #0xff
+; CHECK-NEXT:umin z0.h, p0/m, z0.h, z1.h
+; CHECK-NEXT:ret
+  %cmp = icmp ult  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
 ;
 ; SMAX
 ;
@@ -224,8 +338,8 @@
 ; CHECK-NEXT:smax z0.b, p0/m, z0.b, z1.b
 ; CHECK-NEXT:ret
   %cmp = icmp sgt  %a, %b
-  %min = select  %cmp,  %a,  %b
-  ret  %min
+  %max = select  %cmp,  %a,  %b
+  ret  %max
 }
 
 define  @smax_i16( %a,  %b,  %c) {
@@ -235,8 +349,8 @@
 ; CHECK-NEXT:smax z0.h, p0/m, z0.h, z1.h
 ; CHECK-NEXT:ret
   %cmp = icmp sgt  %a, %b
-  %min = select  %cmp,  %a,  %b
-  ret  %min
+  %max = select  %cmp,  %a,  %b
+  ret  %max
 }
 
 define  @smax_i32( %a,  %b,  %c) {
@@ -246,8 +360,8 @@
 ; CHECK-NEXT:smax z0.s, p0/m, z0.s, z1.s
 ; CHECK-NEXT:ret
   %cmp = icmp sgt  %a, %b
-  %min = select  %cmp,  %a,  %b
-  ret  %min
+  %max = select  %cmp,  %a,  %b
+  ret  %max
 }
 
 define  @smax_i64( %a,  %b,  %c) {
@@ -257,8 +371,33 @@
 ; CHECK-NEXT:smax z0.d, p0/m, z0.d, z1.d
 ; CHECK-NEXT:ret
   %cmp = icmp sgt  %a, %b
-  %min = select  %cmp,  %a,  %b
-  ret  %min
+  %max = select  %cmp,  %a,  %b
+  ret  %max
+}
+
+define  @smax_split_i32( %a,  %b,  %c) {
+; CHECK-LABEL: smax_split_i32:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue 

[PATCH] D79167: [SVE][CodeGen] Legalisation of vsetcc with scalable types

2020-06-22 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 272483.
kmclaughlin added a comment.

Added tests to llvm-ir-to-intrinsic.ll which check the results of compare 
instructions


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79167/new/

https://reviews.llvm.org/D79167

Files:
  llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
  llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll

Index: llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
===
--- llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
+++ llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
@@ -165,6 +165,95 @@
   ret  %min
 }
 
+define  @smin_split_i8( %a,  %b,  %c) {
+; CHECK-LABEL: smin_split_i8:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.b
+; CHECK-NEXT:smin z0.b, p0/m, z0.b, z2.b
+; CHECK-NEXT:smin z1.b, p0/m, z1.b, z3.b
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_split_i16( %a,  %b,  %c) {
+; CHECK-LABEL: smin_split_i16:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.h
+; CHECK-NEXT:smin z0.h, p0/m, z0.h, z4.h
+; CHECK-NEXT:smin z1.h, p0/m, z1.h, z5.h
+; CHECK-NEXT:smin z2.h, p0/m, z2.h, z6.h
+; CHECK-NEXT:smin z3.h, p0/m, z3.h, z7.h
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_split_i32( %a,  %b,  %c) {
+; CHECK-LABEL: smin_split_i32:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.s
+; CHECK-NEXT:smin z0.s, p0/m, z0.s, z2.s
+; CHECK-NEXT:smin z1.s, p0/m, z1.s, z3.s
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_split_i64( %a,  %b,  %c) {
+; CHECK-LABEL: smin_split_i64:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.d
+; CHECK-NEXT:smin z0.d, p0/m, z0.d, z2.d
+; CHECK-NEXT:smin z1.d, p0/m, z1.d, z3.d
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_promote_i8( %a,  %b,  %c) {
+; CHECK-LABEL: smin_promote_i8:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.h
+; CHECK-NEXT:sxtb z1.h, p0/m, z1.h
+; CHECK-NEXT:sxtb z0.h, p0/m, z0.h
+; CHECK-NEXT:smin z0.h, p0/m, z0.h, z1.h
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_promote_i16( %a,  %b,  %c) {
+; CHECK-LABEL: smin_promote_i16:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.s
+; CHECK-NEXT:sxth z1.s, p0/m, z1.s
+; CHECK-NEXT:sxth z0.s, p0/m, z0.s
+; CHECK-NEXT:smin z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_promote_i32( %a,  %b,  %c) {
+; CHECK-LABEL: smin_promote_i32:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.d
+; CHECK-NEXT:sxtw z1.d, p0/m, z1.d
+; CHECK-NEXT:sxtw z0.d, p0/m, z0.d
+; CHECK-NEXT:smin z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
 ;
 ; UMIN
 ;
@@ -213,6 +302,31 @@
   ret  %min
 }
 
+define  @umin_split_i64( %a,  %b,  %c) {
+; CHECK-LABEL: umin_split_i64:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.d
+; CHECK-NEXT:umin z0.d, p0/m, z0.d, z2.d
+; CHECK-NEXT:umin z1.d, p0/m, z1.d, z3.d
+; CHECK-NEXT:ret
+  %cmp = icmp ult  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @umin_promote_i8( %a,  %b,  %c) {
+; CHECK-LABEL: umin_promote_i8:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.h
+; CHECK-NEXT:and z1.h, z1.h, #0xff
+; CHECK-NEXT:and z0.h, z0.h, #0xff
+; CHECK-NEXT:umin z0.h, p0/m, z0.h, z1.h
+; CHECK-NEXT:ret
+  %cmp = icmp ult  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
 ;
 ; SMAX
 ;
@@ -224,8 +338,8 @@
 ; CHECK-NEXT:smax z0.b, p0/m, z0.b, z1.b
 ; CHECK-NEXT:ret
   %cmp = icmp sgt  %a, %b
-  %min = select  %cmp,  %a,  %b
-  ret  %min
+  %max = select  %cmp,  %a,  %b
+  ret  %max
 }
 
 define  @smax_i16( %a,  %b,  %c) {
@@ -235,8 +349,8 @@
 ; CHECK-NEXT:smax z0.h, p0/m, z0.h, z1.h
 ; CHECK-NEXT:ret
   %cmp = icmp sgt  %a, %b
-  %min = select  %cmp,  %a,  %b
-  ret  %min
+  %max = select  %cmp,  %a,  %b
+  ret  %max
 }
 
 define  @smax_i32( %a,  %b,  %c) {
@@ -246,8 +360,8 @@
 ; CHECK-NEXT:smax z0.s, p0/m, z0.s, z1.s
 ; CHECK-NEXT:ret
   %cmp = icmp sgt  %a, %b
-  %min = select  %cmp,  %a,  %b
-  ret  %min
+  %max = select  %cmp,  %a,  %b
+  ret  %max
 }
 
 define  @smax_i64( %a,  %b,  %c) {
@@ -257,8 +371,33 @@
 ; CHECK-NEXT:smax z0.d, p0/m, z0.d, z1.d
 ; CHECK-NEXT:ret
   %cmp = icmp sgt  %a, %b
-  %min = select  %cmp,  %a,  %b
-  ret  %min
+  %max = select  %cmp,  %a,  %b
+  ret  %max
+}
+
+define  @smax_split_i32( %a,  %b,  %c) {
+; CHECK-LABEL: smax_split_i32:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.s
+; CHECK-NEXT:smax z0.s, p0/m, z0.s, z2.s
+; CHECK-NEXT:  

[PATCH] D79167: [SVE][CodeGen] Legalisation of vsetcc with scalable types

2020-06-22 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin added a comment.

In D79167#2098774 , @efriedma wrote:

> Is it possible to write tests for this that don't result in a "max" or "min" 
> operation?  Or does that fail for some other reason?
>
> Otherwise LGTM.


Thanks for reviewing this, @efriedma. It was possible to add tests which don't 
result in a min/max by returning the result of the compare.
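
For illustration, a minimal sketch of such a test (the element counts here are
assumed purely to force splitting and are not taken from the committed patch):

  define <vscale x 8 x i1> @cmp_split_i32(<vscale x 8 x i32> %a, <vscale x 8 x i32> %b) {
    ; The nxv8i32 operands must be split, so the nxv8i1 setcc result is
    ; legalised directly instead of being matched into a min/max pattern.
    %cmp = icmp slt <vscale x 8 x i32> %a, %b
    ret <vscale x 8 x i1> %cmp
  }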


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79167/new/

https://reviews.llvm.org/D79167





[PATCH] D82298: [AArch64][SVE] Add bfloat16 support to load intrinsics

2020-06-22 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: sdesmalen, c-rhodes, efriedma, stuij, fpetrogalli, 
david-arm.
Herald added subscribers: llvm-commits, cfe-commits, danielkiss, psnobl, 
rkruppe, hiraditya, kristof.beyls, tschuett.
Herald added projects: clang, LLVM.

Bfloat16 support added for the following intrinsics:

- LD1
- LD1RQ
- LDNT1
- LDNF1
- LDFF1
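
At the C level these correspond to the ACLE bfloat16 load intrinsics. A
minimal usage sketch (assuming a target with the sve and bf16 features and
the declarations provided by arm_sve.h; the function name is illustrative):

  #include <arm_sve.h>

  // Loads active bfloat16 elements under the predicate pg; inactive
  // elements of the result are zero.
  svbfloat16_t load_bf16(svbool_t pg, const bfloat16_t *base) {
    return svld1_bf16(pg, base);
  }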


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D82298

Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1rq.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1.c
  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnt1.c
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-reg.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-loads-ff.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
  llvm/test/CodeGen/AArch64/sve-masked-ldst-nonext.ll

Index: llvm/test/CodeGen/AArch64/sve-masked-ldst-nonext.ll
===
--- llvm/test/CodeGen/AArch64/sve-masked-ldst-nonext.ll
+++ llvm/test/CodeGen/AArch64/sve-masked-ldst-nonext.ll
@@ -87,6 +87,14 @@
   ret  %load
 }
 
+define  @masked_load_nxv8bf16( *%a,  %mask) nounwind {
+; CHECK-LABEL: masked_load_nxv8bf16:
+; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %load = call  @llvm.masked.load.nxv8bf16( *%a, i32 2,  %mask,  undef)
+  ret  %load
+}
+
 ;
 ; Masked Stores
 ;
@@ -182,6 +190,7 @@
 declare  @llvm.masked.load.nxv4f32(*, i32, , )
 declare  @llvm.masked.load.nxv4f16(*, i32, , )
 declare  @llvm.masked.load.nxv8f16(*, i32, , )
+declare  @llvm.masked.load.nxv8bf16(*, i32, , )
 
 declare void @llvm.masked.store.nxv2i64(, *, i32, )
 declare void @llvm.masked.store.nxv4i32(, *, i32, )
Index: llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
@@ -97,6 +97,23 @@
   ret  %res
 }
 
+define  @ld1rqh_bf16( %pred, bfloat* %addr) {
+; CHECK-LABEL: ld1rqh_bf16:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8bf16( %pred, bfloat* %addr)
+  ret  %res
+}
+
+define  @ld1rqh_bf16_imm( %pred, bfloat* %addr) {
+; CHECK-LABEL: ld1rqh_bf16_imm:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0, #-16]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds bfloat, bfloat* %addr, i16 -8
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8bf16( %pred, bfloat* %ptr)
+  ret  %res
+}
+
 ;
 ; LD1RQW
 ;
@@ -208,6 +225,15 @@
   ret  %res
 }
 
+define  @ldnt1h_bf16( %pred, bfloat* %addr) {
+; CHECK-LABEL: ldnt1h_bf16:
+; CHECK: ldnt1h { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ldnt1.nxv8bf16( %pred,
+ bfloat* %addr)
+  ret  %res
+}
+
 ;
 ; LDNT1W
 ;
@@ -474,6 +500,7 @@
 declare  @llvm.aarch64.sve.ld1rq.nxv4i32(, i32*)
 declare  @llvm.aarch64.sve.ld1rq.nxv2i64(, i64*)
 declare  @llvm.aarch64.sve.ld1rq.nxv8f16(, half*)
+declare  @llvm.aarch64.sve.ld1rq.nxv8bf16(, bfloat*)
 declare  @llvm.aarch64.sve.ld1rq.nxv4f32(, float*)
 declare  @llvm.aarch64.sve.ld1rq.nxv2f64(, double*)
 
@@ -482,6 +509,7 @@
 declare  @llvm.aarch64.sve.ldnt1.nxv4i32(, i32*)
 declare  @llvm.aarch64.sve.ldnt1.nxv2i64(, i64*)
 declare  @llvm.aarch64.sve.ldnt1.nxv8f16(, half*)
+declare  @llvm.aarch64.sve.ldnt1.nxv8bf16(, bfloat*)
 declare  @llvm.aarch64.sve.ldnt1.nxv4f32(, float*)
 declare  @llvm.aarch64.sve.ldnt1.nxv2f64(, double*)
 
Index: llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll
@@ -140,6 +140,14 @@
   ret  %load
 }
 
+define  @ldnf1h_bf16( %pg, bfloat* %a) {
+; CHECK-LABEL: ldnf1h_bf16:
+; CHECK: ldnf1h { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %load = call  @llvm.aarch64.sve.ldnf1.nxv8bf16( %pg, bfloat* %a)
+  ret  %load
+}
+
 define  @ldnf1h_f16_inbound( %pg, half* %a) {
 ; CHECK-LABEL: ldnf1h_f16_inbound:
 ; CHECK: ldnf1h { z0.h }, p0/z, [x0, #1, mul vl]
@@ -151,6 +159,17 @@
   ret  %load
 }
 
+define  @ldnf1h_bf16_inbound( %pg, bfloat* %a) {
+; CHECK-LABEL: ldnf1h_bf16_inbound:
+; CHECK: ldnf1h { z0.h }, p0/z, [x0, #1, mul vl]
+; CHECK-NEXT: ret
+  %base_scalable = bitcast bfloat* %a to *
+  %base = getelementptr , * %base_scalable, i64 1
+  %base_scalar = bitcast * %base to bfloat*
+  %load = call  

[PATCH] D79167: [SVE][CodeGen] Legalisation of vsetcc with scalable types

2020-06-17 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 271391.
kmclaughlin retitled this revision from "[SVE][CodeGen] Legalise scalable 
vector types for vsetcc & vselect" to "[SVE][CodeGen] Legalisation of vsetcc 
with scalable types".
kmclaughlin edited the summary of this revision.
kmclaughlin added a comment.

- Rebased patch & updated checks in the tests with update_llc_test_checks.py


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79167/new/

https://reviews.llvm.org/D79167

Files:
  llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
  llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll

Index: llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
===
--- llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
+++ llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
@@ -165,6 +165,95 @@
   ret  %min
 }
 
+define  @smin_split_i8( %a,  %b,  %c) {
+; CHECK-LABEL: smin_split_i8:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.b
+; CHECK-NEXT:smin z0.b, p0/m, z0.b, z2.b
+; CHECK-NEXT:smin z1.b, p0/m, z1.b, z3.b
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_split_i16( %a,  %b,  %c) {
+; CHECK-LABEL: smin_split_i16:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.h
+; CHECK-NEXT:smin z0.h, p0/m, z0.h, z4.h
+; CHECK-NEXT:smin z1.h, p0/m, z1.h, z5.h
+; CHECK-NEXT:smin z2.h, p0/m, z2.h, z6.h
+; CHECK-NEXT:smin z3.h, p0/m, z3.h, z7.h
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_split_i32( %a,  %b,  %c) {
+; CHECK-LABEL: smin_split_i32:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.s
+; CHECK-NEXT:smin z0.s, p0/m, z0.s, z2.s
+; CHECK-NEXT:smin z1.s, p0/m, z1.s, z3.s
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_split_i64( %a,  %b,  %c) {
+; CHECK-LABEL: smin_split_i64:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.d
+; CHECK-NEXT:smin z0.d, p0/m, z0.d, z2.d
+; CHECK-NEXT:smin z1.d, p0/m, z1.d, z3.d
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_promote_i8( %a,  %b,  %c) {
+; CHECK-LABEL: smin_promote_i8:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.h
+; CHECK-NEXT:sxtb z1.h, p0/m, z1.h
+; CHECK-NEXT:sxtb z0.h, p0/m, z0.h
+; CHECK-NEXT:smin z0.h, p0/m, z0.h, z1.h
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_promote_i16( %a,  %b,  %c) {
+; CHECK-LABEL: smin_promote_i16:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.s
+; CHECK-NEXT:sxth z1.s, p0/m, z1.s
+; CHECK-NEXT:sxth z0.s, p0/m, z0.s
+; CHECK-NEXT:smin z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_promote_i32( %a,  %b,  %c) {
+; CHECK-LABEL: smin_promote_i32:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.d
+; CHECK-NEXT:sxtw z1.d, p0/m, z1.d
+; CHECK-NEXT:sxtw z0.d, p0/m, z0.d
+; CHECK-NEXT:smin z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT:ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
 ;
 ; UMIN
 ;
@@ -213,6 +302,31 @@
   ret  %min
 }
 
+define  @umin_split_i64( %a,  %b,  %c) {
+; CHECK-LABEL: umin_split_i64:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.d
+; CHECK-NEXT:umin z0.d, p0/m, z0.d, z2.d
+; CHECK-NEXT:umin z1.d, p0/m, z1.d, z3.d
+; CHECK-NEXT:ret
+  %cmp = icmp ult  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @umin_promote_i8( %a,  %b,  %c) {
+; CHECK-LABEL: umin_promote_i8:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.h
+; CHECK-NEXT:and z1.h, z1.h, #0xff
+; CHECK-NEXT:and z0.h, z0.h, #0xff
+; CHECK-NEXT:umin z0.h, p0/m, z0.h, z1.h
+; CHECK-NEXT:ret
+  %cmp = icmp ult  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
 ;
 ; SMAX
 ;
@@ -224,8 +338,8 @@
 ; CHECK-NEXT:smax z0.b, p0/m, z0.b, z1.b
 ; CHECK-NEXT:ret
   %cmp = icmp sgt  %a, %b
-  %min = select  %cmp,  %a,  %b
-  ret  %min
+  %max = select  %cmp,  %a,  %b
+  ret  %max
 }
 
 define  @smax_i16( %a,  %b,  %c) {
@@ -235,8 +349,8 @@
 ; CHECK-NEXT:smax z0.h, p0/m, z0.h, z1.h
 ; CHECK-NEXT:ret
   %cmp = icmp sgt  %a, %b
-  %min = select  %cmp,  %a,  %b
-  ret  %min
+  %max = select  %cmp,  %a,  %b
+  ret  %max
 }
 
 define  @smax_i32( %a,  %b,  %c) {
@@ -246,8 +360,8 @@
 ; CHECK-NEXT:smax z0.s, p0/m, z0.s, z1.s
 ; CHECK-NEXT:ret
   %cmp = icmp sgt  %a, %b
-  %min = select  %cmp,  %a,  %b
-  ret  %min
+  %max = select  %cmp,  %a,  %b
+  ret  %max
 }
 
 define  @smax_i64( %a,  %b,  %c) {
@@ -257,8 +371,33 @@
 ; CHECK-NEXT:smax z0.d, p0/m, z0.d, z1.d
 ; CHECK-NEXT:ret
   %cmp = icmp sgt  %a, %b
-  %min = select  %cmp,  %a,  %b
-  ret  %min
+  %max = select  %cmp,  %a, 

[PATCH] D79587: [CodeGen][SVE] Legalisation of extends with scalable types

2020-06-05 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG89fc0166f532: [CodeGen][SVE] Legalisation of extends with 
scalable types (authored by kmclaughlin).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79587/new/

https://reviews.llvm.org/D79587

Files:
  llvm/include/llvm/CodeGen/ValueTypes.h
  llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/test/CodeGen/AArch64/sve-sext-zext.ll

Index: llvm/test/CodeGen/AArch64/sve-sext-zext.ll
===
--- llvm/test/CodeGen/AArch64/sve-sext-zext.ll
+++ llvm/test/CodeGen/AArch64/sve-sext-zext.ll
@@ -186,3 +186,143 @@
   %r = zext  %a to 
   ret  %r
 }
+
+; Extending to illegal types
+
+define  @sext_b_to_h( %a) {
+; CHECK-LABEL: sext_b_to_h:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z2.h, z0.b
+; CHECK-NEXT:sunpkhi z1.h, z0.b
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_h_to_s( %a) {
+; CHECK-LABEL: sext_h_to_s:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z2.s, z0.h
+; CHECK-NEXT:sunpkhi z1.s, z0.h
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_s_to_d( %a) {
+; CHECK-LABEL: sext_s_to_d:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z2.d, z0.s
+; CHECK-NEXT:sunpkhi z1.d, z0.s
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_b_to_s( %a) {
+; CHECK-LABEL: sext_b_to_s:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z1.h, z0.b
+; CHECK-NEXT:sunpkhi z3.h, z0.b
+; CHECK-NEXT:sunpklo z0.s, z1.h
+; CHECK-NEXT:sunpkhi z1.s, z1.h
+; CHECK-NEXT:sunpklo z2.s, z3.h
+; CHECK-NEXT:sunpkhi z3.s, z3.h
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_b_to_d( %a) {
+; CHECK-LABEL: sext_b_to_d:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z1.h, z0.b
+; CHECK-NEXT:sunpkhi z0.h, z0.b
+; CHECK-NEXT:sunpklo z2.s, z1.h
+; CHECK-NEXT:sunpkhi z3.s, z1.h
+; CHECK-NEXT:sunpklo z5.s, z0.h
+; CHECK-NEXT:sunpkhi z7.s, z0.h
+; CHECK-NEXT:sunpklo z0.d, z2.s
+; CHECK-NEXT:sunpkhi z1.d, z2.s
+; CHECK-NEXT:sunpklo z2.d, z3.s
+; CHECK-NEXT:sunpkhi z3.d, z3.s
+; CHECK-NEXT:sunpklo z4.d, z5.s
+; CHECK-NEXT:sunpkhi z5.d, z5.s
+; CHECK-NEXT:sunpklo z6.d, z7.s
+; CHECK-NEXT:sunpkhi z7.d, z7.s
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @zext_b_to_h( %a) {
+; CHECK-LABEL: zext_b_to_h:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z2.h, z0.b
+; CHECK-NEXT:uunpkhi z1.h, z0.b
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_h_to_s( %a) {
+; CHECK-LABEL: zext_h_to_s:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z2.s, z0.h
+; CHECK-NEXT:uunpkhi z1.s, z0.h
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_s_to_d( %a) {
+; CHECK-LABEL: zext_s_to_d:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z2.d, z0.s
+; CHECK-NEXT:uunpkhi z1.d, z0.s
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_b_to_s( %a) {
+; CHECK-LABEL: zext_b_to_s:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z1.h, z0.b
+; CHECK-NEXT:uunpkhi z3.h, z0.b
+; CHECK-NEXT:uunpklo z0.s, z1.h
+; CHECK-NEXT:uunpkhi z1.s, z1.h
+; CHECK-NEXT:uunpklo z2.s, z3.h
+; CHECK-NEXT:uunpkhi z3.s, z3.h
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_b_to_d( %a) {
+; CHECK-LABEL: zext_b_to_d:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z1.h, z0.b
+; CHECK-NEXT:uunpkhi z0.h, z0.b
+; CHECK-NEXT:uunpklo z2.s, z1.h
+; CHECK-NEXT:uunpkhi z3.s, z1.h
+; CHECK-NEXT:uunpklo z5.s, z0.h
+; CHECK-NEXT:uunpkhi z7.s, z0.h
+; CHECK-NEXT:uunpklo z0.d, z2.s
+; CHECK-NEXT:uunpkhi z1.d, z2.s
+; CHECK-NEXT:uunpklo z2.d, z3.s
+; CHECK-NEXT:uunpkhi z3.d, z3.s
+; CHECK-NEXT:uunpklo z4.d, z5.s
+; CHECK-NEXT:uunpkhi z5.d, z5.s
+; CHECK-NEXT:uunpklo z6.d, z7.s
+; CHECK-NEXT:uunpkhi z7.d, z7.s
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
Index: llvm/lib/Target/AArch64/AArch64ISelLowering.h
===
--- llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -889,6 +889,9 @@
 
   void ReplaceNodeResults(SDNode *N, SmallVectorImpl ,
   SelectionDAG ) const override;
+  void ReplaceExtractSubVectorResults(SDNode *N,
+  SmallVectorImpl ,
+ 

[PATCH] D79587: [CodeGen][SVE] Legalisation of extends with scalable types

2020-06-03 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 268246.
kmclaughlin added a comment.

- Use APInt::trunc to truncate the constant in performSVEAndCombine


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79587/new/

https://reviews.llvm.org/D79587

Files:
  llvm/include/llvm/CodeGen/ValueTypes.h
  llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/test/CodeGen/AArch64/sve-sext-zext.ll

Index: llvm/test/CodeGen/AArch64/sve-sext-zext.ll
===
--- llvm/test/CodeGen/AArch64/sve-sext-zext.ll
+++ llvm/test/CodeGen/AArch64/sve-sext-zext.ll
@@ -186,3 +186,143 @@
   %r = zext  %a to 
   ret  %r
 }
+
+; Extending to illegal types
+
+define  @sext_b_to_h( %a) {
+; CHECK-LABEL: sext_b_to_h:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z2.h, z0.b
+; CHECK-NEXT:sunpkhi z1.h, z0.b
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_h_to_s( %a) {
+; CHECK-LABEL: sext_h_to_s:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z2.s, z0.h
+; CHECK-NEXT:sunpkhi z1.s, z0.h
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_s_to_d( %a) {
+; CHECK-LABEL: sext_s_to_d:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z2.d, z0.s
+; CHECK-NEXT:sunpkhi z1.d, z0.s
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_b_to_s( %a) {
+; CHECK-LABEL: sext_b_to_s:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z1.h, z0.b
+; CHECK-NEXT:sunpkhi z3.h, z0.b
+; CHECK-NEXT:sunpklo z0.s, z1.h
+; CHECK-NEXT:sunpkhi z1.s, z1.h
+; CHECK-NEXT:sunpklo z2.s, z3.h
+; CHECK-NEXT:sunpkhi z3.s, z3.h
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_b_to_d( %a) {
+; CHECK-LABEL: sext_b_to_d:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z1.h, z0.b
+; CHECK-NEXT:sunpkhi z0.h, z0.b
+; CHECK-NEXT:sunpklo z2.s, z1.h
+; CHECK-NEXT:sunpkhi z3.s, z1.h
+; CHECK-NEXT:sunpklo z5.s, z0.h
+; CHECK-NEXT:sunpkhi z7.s, z0.h
+; CHECK-NEXT:sunpklo z0.d, z2.s
+; CHECK-NEXT:sunpkhi z1.d, z2.s
+; CHECK-NEXT:sunpklo z2.d, z3.s
+; CHECK-NEXT:sunpkhi z3.d, z3.s
+; CHECK-NEXT:sunpklo z4.d, z5.s
+; CHECK-NEXT:sunpkhi z5.d, z5.s
+; CHECK-NEXT:sunpklo z6.d, z7.s
+; CHECK-NEXT:sunpkhi z7.d, z7.s
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @zext_b_to_h( %a) {
+; CHECK-LABEL: zext_b_to_h:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z2.h, z0.b
+; CHECK-NEXT:uunpkhi z1.h, z0.b
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_h_to_s( %a) {
+; CHECK-LABEL: zext_h_to_s:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z2.s, z0.h
+; CHECK-NEXT:uunpkhi z1.s, z0.h
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_s_to_d( %a) {
+; CHECK-LABEL: zext_s_to_d:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z2.d, z0.s
+; CHECK-NEXT:uunpkhi z1.d, z0.s
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_b_to_s( %a) {
+; CHECK-LABEL: zext_b_to_s:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z1.h, z0.b
+; CHECK-NEXT:uunpkhi z3.h, z0.b
+; CHECK-NEXT:uunpklo z0.s, z1.h
+; CHECK-NEXT:uunpkhi z1.s, z1.h
+; CHECK-NEXT:uunpklo z2.s, z3.h
+; CHECK-NEXT:uunpkhi z3.s, z3.h
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_b_to_d( %a) {
+; CHECK-LABEL: zext_b_to_d:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z1.h, z0.b
+; CHECK-NEXT:uunpkhi z0.h, z0.b
+; CHECK-NEXT:uunpklo z2.s, z1.h
+; CHECK-NEXT:uunpkhi z3.s, z1.h
+; CHECK-NEXT:uunpklo z5.s, z0.h
+; CHECK-NEXT:uunpkhi z7.s, z0.h
+; CHECK-NEXT:uunpklo z0.d, z2.s
+; CHECK-NEXT:uunpkhi z1.d, z2.s
+; CHECK-NEXT:uunpklo z2.d, z3.s
+; CHECK-NEXT:uunpkhi z3.d, z3.s
+; CHECK-NEXT:uunpklo z4.d, z5.s
+; CHECK-NEXT:uunpkhi z5.d, z5.s
+; CHECK-NEXT:uunpklo z6.d, z7.s
+; CHECK-NEXT:uunpkhi z7.d, z7.s
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
Index: llvm/lib/Target/AArch64/AArch64ISelLowering.h
===
--- llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -888,6 +888,9 @@
 
   void ReplaceNodeResults(SDNode *N, SmallVectorImpl ,
   SelectionDAG ) const override;
+  void ReplaceExtractSubVectorResults(SDNode *N,
+  SmallVectorImpl ,
+  SelectionDAG ) const;
 
   bool 

[PATCH] D79587: [CodeGen][SVE] Legalisation of extends with scalable types

2020-06-02 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 267919.
kmclaughlin marked 2 inline comments as done.
kmclaughlin added a comment.

- Added a truncate of ExtVal in performSVEAndCombine
- Changed the assert added to performSignExtendInRegCombine in the previous 
revision


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79587/new/

https://reviews.llvm.org/D79587

Files:
  llvm/include/llvm/CodeGen/ValueTypes.h
  llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/test/CodeGen/AArch64/sve-sext-zext.ll

Index: llvm/test/CodeGen/AArch64/sve-sext-zext.ll
===
--- llvm/test/CodeGen/AArch64/sve-sext-zext.ll
+++ llvm/test/CodeGen/AArch64/sve-sext-zext.ll
@@ -186,3 +186,143 @@
   %r = zext  %a to 
   ret  %r
 }
+
+; Extending to illegal types
+
+define  @sext_b_to_h( %a) {
+; CHECK-LABEL: sext_b_to_h:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z2.h, z0.b
+; CHECK-NEXT:sunpkhi z1.h, z0.b
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_h_to_s( %a) {
+; CHECK-LABEL: sext_h_to_s:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z2.s, z0.h
+; CHECK-NEXT:sunpkhi z1.s, z0.h
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_s_to_d( %a) {
+; CHECK-LABEL: sext_s_to_d:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z2.d, z0.s
+; CHECK-NEXT:sunpkhi z1.d, z0.s
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_b_to_s( %a) {
+; CHECK-LABEL: sext_b_to_s:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z1.h, z0.b
+; CHECK-NEXT:sunpkhi z3.h, z0.b
+; CHECK-NEXT:sunpklo z0.s, z1.h
+; CHECK-NEXT:sunpkhi z1.s, z1.h
+; CHECK-NEXT:sunpklo z2.s, z3.h
+; CHECK-NEXT:sunpkhi z3.s, z3.h
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_b_to_d( %a) {
+; CHECK-LABEL: sext_b_to_d:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z1.h, z0.b
+; CHECK-NEXT:sunpkhi z0.h, z0.b
+; CHECK-NEXT:sunpklo z2.s, z1.h
+; CHECK-NEXT:sunpkhi z3.s, z1.h
+; CHECK-NEXT:sunpklo z5.s, z0.h
+; CHECK-NEXT:sunpkhi z7.s, z0.h
+; CHECK-NEXT:sunpklo z0.d, z2.s
+; CHECK-NEXT:sunpkhi z1.d, z2.s
+; CHECK-NEXT:sunpklo z2.d, z3.s
+; CHECK-NEXT:sunpkhi z3.d, z3.s
+; CHECK-NEXT:sunpklo z4.d, z5.s
+; CHECK-NEXT:sunpkhi z5.d, z5.s
+; CHECK-NEXT:sunpklo z6.d, z7.s
+; CHECK-NEXT:sunpkhi z7.d, z7.s
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @zext_b_to_h( %a) {
+; CHECK-LABEL: zext_b_to_h:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z2.h, z0.b
+; CHECK-NEXT:uunpkhi z1.h, z0.b
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_h_to_s( %a) {
+; CHECK-LABEL: zext_h_to_s:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z2.s, z0.h
+; CHECK-NEXT:uunpkhi z1.s, z0.h
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_s_to_d( %a) {
+; CHECK-LABEL: zext_s_to_d:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z2.d, z0.s
+; CHECK-NEXT:uunpkhi z1.d, z0.s
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_b_to_s( %a) {
+; CHECK-LABEL: zext_b_to_s:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z1.h, z0.b
+; CHECK-NEXT:uunpkhi z3.h, z0.b
+; CHECK-NEXT:uunpklo z0.s, z1.h
+; CHECK-NEXT:uunpkhi z1.s, z1.h
+; CHECK-NEXT:uunpklo z2.s, z3.h
+; CHECK-NEXT:uunpkhi z3.s, z3.h
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_b_to_d( %a) {
+; CHECK-LABEL: zext_b_to_d:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z1.h, z0.b
+; CHECK-NEXT:uunpkhi z0.h, z0.b
+; CHECK-NEXT:uunpklo z2.s, z1.h
+; CHECK-NEXT:uunpkhi z3.s, z1.h
+; CHECK-NEXT:uunpklo z5.s, z0.h
+; CHECK-NEXT:uunpkhi z7.s, z0.h
+; CHECK-NEXT:uunpklo z0.d, z2.s
+; CHECK-NEXT:uunpkhi z1.d, z2.s
+; CHECK-NEXT:uunpklo z2.d, z3.s
+; CHECK-NEXT:uunpkhi z3.d, z3.s
+; CHECK-NEXT:uunpklo z4.d, z5.s
+; CHECK-NEXT:uunpkhi z5.d, z5.s
+; CHECK-NEXT:uunpklo z6.d, z7.s
+; CHECK-NEXT:uunpkhi z7.d, z7.s
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
Index: llvm/lib/Target/AArch64/AArch64ISelLowering.h
===
--- llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -888,6 +888,9 @@
 
   void ReplaceNodeResults(SDNode *N, SmallVectorImpl ,
   SelectionDAG ) const override;
+  void ReplaceExtractSubVectorResults(SDNode *N,
+  

[PATCH] D79587: [CodeGen][SVE] Legalisation of extends with scalable types

2020-06-02 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin marked an inline comment as not done.
kmclaughlin added inline comments.



Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10676
+ConstantSDNode *C = dyn_cast(Dup->getOperand(0));
+uint64_t ExtVal = C->getZExtValue();
+

efriedma wrote:
> kmclaughlin wrote:
> > efriedma wrote:
> > > Do you need to truncate ExtVal somewhere, so you don't end up with a DUP 
> > > with an over-wide constant?
> > I've changed the call to `getNode` below that creates the DUP to use 
> > `DAG.getAnyExtOrTrunc` (similar to what we do in LowerSPLAT_VECTOR)
> I'm specifically concerned that you could end up with something like 
> `(nxv16i8 (dup (i32 0x12345678)))`.
I see what you mean - I've added a truncate of `Dup->getOperand(0)` for this, 
which will truncate the constant to the type of `UnpkOp`
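
A rough sketch of the shape of that fix (UnpkOp, Dup, DL and DAG are names
taken from the surrounding combine and are assumed here; the committed code
may differ in detail):

  auto *C = dyn_cast<ConstantSDNode>(Dup->getOperand(0));
  if (!C)
    return SDValue();

  // Truncate the constant to the element width of the unpack source so the
  // rebuilt DUP cannot end up as an over-wide splat such as
  // (nxv16i8 (dup (i32 0x12345678))).
  unsigned EltBits = UnpkOp->getValueType(0).getScalarSizeInBits();
  APInt Trunc = C->getAPIntValue().trunc(EltBits);
  SDValue NewDup = DAG.getNode(AArch64ISD::DUP, DL, UnpkOp->getValueType(0),
                               DAG.getConstant(Trunc.getZExtValue(), DL, MVT::i32));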



Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13300
+// is another unpack:
+// 4i32 sign_extend_inreg (4i32 uunpklo(8i16 uunpklo (16i8 opnd)), from 4i8)
+// ->

efriedma wrote:
> kmclaughlin wrote:
> > efriedma wrote:
> > > It seems a little fragile to assume the inner VT of the SIGN_EXTEND_INREG 
> > > is exactly the type you're expecting here.  Probably worth at least 
> > > adding an assertion to encode the assumptions you're making.
> > I've added an assert above here to make sure the sign_extend_inreg and 
> > unpack types match, is this the assumption you were referring to?
> We assert that SIGN_EXTEND_INREG has valid operand/result types elsewhere.
> 
> I was more concerned about the inner VT 
> (`cast(N->getOperand(1))->getVT();`).  You could end up creating an 
> invalid SIGN_EXTEND_INREG if the type is something weird, like a 
> non-byte-size integer type.
Removed my previous check on the operand & result types and added an assert for 
the type of VT.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79587/new/

https://reviews.llvm.org/D79587





[PATCH] D79587: [CodeGen][SVE] Legalisation of extends with scalable types

2020-06-01 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin marked 2 inline comments as done.
kmclaughlin added a comment.

Thanks for taking another look at this, @efriedma!




Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10676
+ConstantSDNode *C = dyn_cast(Dup->getOperand(0));
+uint64_t ExtVal = C->getZExtValue();
+

efriedma wrote:
> Do you need to truncate ExtVal somewhere, so you don't end up with a DUP with 
> an over-wide constant?
I've changed the call to `getNode` below that creates the DUP to use 
`DAG.getAnyExtOrTrunc` (similar to what we do in LowerSPLAT_VECTOR)



Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13300
+// is another unpack:
+// 4i32 sign_extend_inreg (4i32 uunpklo(8i16 uunpklo (16i8 opnd)), from 4i8)
+// ->

efriedma wrote:
> It seems a little fragile to assume the inner VT of the SIGN_EXTEND_INREG is 
> exactly the type you're expecting here.  Probably worth at least adding an 
> assertion to encode the assumptions you're making.
I've added an assert above here to make sure the sign_extend_inreg and unpack 
types match, is this the assumption you were referring to?



Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13816
+  if (!isTypeLegal(InVT)) {
+// Bubble truncates to illegal types to the surface.
+if (In->getOpcode() == ISD::TRUNCATE) {

efriedma wrote:
> "Bubble truncates to illegal types to the surface" is an optimization?
Removed - this was not required for this patch.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79587/new/

https://reviews.llvm.org/D79587





[PATCH] D79587: [CodeGen][SVE] Legalisation of extends with scalable types

2020-06-01 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 267695.
kmclaughlin marked 7 inline comments as done.
kmclaughlin added a comment.

- Restricted the illegal types which should be lowered for EXTRACT_SUBVECTOR to 
those handled in this patch (nxv8i8, nxv4i16 & nxv2i32)
- Removed unnecessary changes in ReplaceExtractSubVectorResults
- Updated the tests with update_llc_test_checks.py
- Added a check on expected types in performSignExtendInRegCombine & 
PromoteIntRes_EXTRACT_SUBVECTOR


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79587/new/

https://reviews.llvm.org/D79587

Files:
  llvm/include/llvm/CodeGen/ValueTypes.h
  llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/test/CodeGen/AArch64/sve-sext-zext.ll

Index: llvm/test/CodeGen/AArch64/sve-sext-zext.ll
===
--- llvm/test/CodeGen/AArch64/sve-sext-zext.ll
+++ llvm/test/CodeGen/AArch64/sve-sext-zext.ll
@@ -186,3 +186,143 @@
   %r = zext  %a to 
   ret  %r
 }
+
+; Extending to illegal types
+
+define  @sext_b_to_h( %a) {
+; CHECK-LABEL: sext_b_to_h:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z2.h, z0.b
+; CHECK-NEXT:sunpkhi z1.h, z0.b
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_h_to_s( %a) {
+; CHECK-LABEL: sext_h_to_s:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z2.s, z0.h
+; CHECK-NEXT:sunpkhi z1.s, z0.h
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_s_to_d( %a) {
+; CHECK-LABEL: sext_s_to_d:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z2.d, z0.s
+; CHECK-NEXT:sunpkhi z1.d, z0.s
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_b_to_s( %a) {
+; CHECK-LABEL: sext_b_to_s:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z1.h, z0.b
+; CHECK-NEXT:sunpkhi z3.h, z0.b
+; CHECK-NEXT:sunpklo z0.s, z1.h
+; CHECK-NEXT:sunpkhi z1.s, z1.h
+; CHECK-NEXT:sunpklo z2.s, z3.h
+; CHECK-NEXT:sunpkhi z3.s, z3.h
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_b_to_d( %a) {
+; CHECK-LABEL: sext_b_to_d:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:sunpklo z1.h, z0.b
+; CHECK-NEXT:sunpkhi z0.h, z0.b
+; CHECK-NEXT:sunpklo z2.s, z1.h
+; CHECK-NEXT:sunpkhi z3.s, z1.h
+; CHECK-NEXT:sunpklo z5.s, z0.h
+; CHECK-NEXT:sunpkhi z7.s, z0.h
+; CHECK-NEXT:sunpklo z0.d, z2.s
+; CHECK-NEXT:sunpkhi z1.d, z2.s
+; CHECK-NEXT:sunpklo z2.d, z3.s
+; CHECK-NEXT:sunpkhi z3.d, z3.s
+; CHECK-NEXT:sunpklo z4.d, z5.s
+; CHECK-NEXT:sunpkhi z5.d, z5.s
+; CHECK-NEXT:sunpklo z6.d, z7.s
+; CHECK-NEXT:sunpkhi z7.d, z7.s
+; CHECK-NEXT:ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @zext_b_to_h( %a) {
+; CHECK-LABEL: zext_b_to_h:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z2.h, z0.b
+; CHECK-NEXT:uunpkhi z1.h, z0.b
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_h_to_s( %a) {
+; CHECK-LABEL: zext_h_to_s:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z2.s, z0.h
+; CHECK-NEXT:uunpkhi z1.s, z0.h
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_s_to_d( %a) {
+; CHECK-LABEL: zext_s_to_d:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z2.d, z0.s
+; CHECK-NEXT:uunpkhi z1.d, z0.s
+; CHECK-NEXT:mov z0.d, z2.d
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_b_to_s( %a) {
+; CHECK-LABEL: zext_b_to_s:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z1.h, z0.b
+; CHECK-NEXT:uunpkhi z3.h, z0.b
+; CHECK-NEXT:uunpklo z0.s, z1.h
+; CHECK-NEXT:uunpkhi z1.s, z1.h
+; CHECK-NEXT:uunpklo z2.s, z3.h
+; CHECK-NEXT:uunpkhi z3.s, z3.h
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_b_to_d( %a) {
+; CHECK-LABEL: zext_b_to_d:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:uunpklo z1.h, z0.b
+; CHECK-NEXT:uunpkhi z0.h, z0.b
+; CHECK-NEXT:uunpklo z2.s, z1.h
+; CHECK-NEXT:uunpkhi z3.s, z1.h
+; CHECK-NEXT:uunpklo z5.s, z0.h
+; CHECK-NEXT:uunpkhi z7.s, z0.h
+; CHECK-NEXT:uunpklo z0.d, z2.s
+; CHECK-NEXT:uunpkhi z1.d, z2.s
+; CHECK-NEXT:uunpklo z2.d, z3.s
+; CHECK-NEXT:uunpkhi z3.d, z3.s
+; CHECK-NEXT:uunpklo z4.d, z5.s
+; CHECK-NEXT:uunpkhi z5.d, z5.s
+; CHECK-NEXT:uunpklo z6.d, z7.s
+; CHECK-NEXT:uunpkhi z7.d, z7.s
+; CHECK-NEXT:ret
+  %ext = zext  %a to 
+  ret  %ext
+}
Index: llvm/lib/Target/AArch64/AArch64ISelLowering.h
===
--- llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -888,6 +888,9 @@
 

[PATCH] D79587: [CodeGen][SVE] Legalisation of extends with scalable types

2020-05-29 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 267280.
kmclaughlin added a comment.

- Replaced uses of getVectorNumElements() with getVectorElementCount()
- Moved the new tests into the existing sve-sext-zext.ll file
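
For context, a hedged sketch of the API difference (written against the
current ElementCount interface rather than this patch's exact code):
getVectorNumElements() only reports the known minimum lane count, while
getVectorElementCount() also carries the scalable flag, which the splitting
code needs when it rebuilds half-width types:

  ElementCount EC = VT.getVectorElementCount(); // e.g. {min 4, scalable} for nxv4i32
  EVT HalfVT = EVT::getVectorVT(Ctx, VT.getVectorElementType(),
                                ElementCount::get(EC.getKnownMinValue() / 2,
                                                  EC.isScalable()));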


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79587/new/

https://reviews.llvm.org/D79587

Files:
  llvm/include/llvm/CodeGen/ValueTypes.h
  llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/test/CodeGen/AArch64/sve-sext-zext.ll

Index: llvm/test/CodeGen/AArch64/sve-sext-zext.ll
===
--- llvm/test/CodeGen/AArch64/sve-sext-zext.ll
+++ llvm/test/CodeGen/AArch64/sve-sext-zext.ll
@@ -186,3 +186,135 @@
   %r = zext  %a to 
   ret  %r
 }
+
+; Extending to illegal types
+
+define  @sext_b_to_h( %a) {
+; CHECK-LABEL: sext_b_to_h:
+; CHECK-DAG: sunpklo z2.h, z0.b
+; CHECK-DAG: sunpkhi z1.h, z0.b
+; CHECK-DAG: mov z0.d, z2.d
+; CHECK-NEXT: ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_h_to_s( %a) {
+; CHECK-LABEL: sext_h_to_s:
+; CHECK-DAG: sunpklo z2.s, z0.h
+; CHECK-DAG: sunpkhi z1.s, z0.h
+; CHECK-DAG: mov z0.d, z2.d
+; CHECK-NEXT: ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_s_to_d( %a) {
+; CHECK-LABEL: sext_s_to_d:
+; CHECK-DAG: sunpklo z2.d, z0.s
+; CHECK-DAG: sunpkhi z1.d, z0.s
+; CHECK-DAG: mov z0.d, z2.d
+; CHECK-NEXT: ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_b_to_s( %a) {
+; CHECK-LABEL: sext_b_to_s:
+; CHECK-DAG: sunpklo [[LO:z[0-9]+]].h, z0.b
+; CHECK-DAG: sunpkhi [[HI:z[0-9]+]].h, z0.b
+; CHECK-DAG: sunpklo [[LOLO:z[0-9]+]].s, [[LO]].h
+; CHECK-DAG: sunpkhi {{z[0-9]+}}.s, [[LO]].h
+; CHECK-DAG: sunpklo {{z[0-9]+}}.s, [[HI]].h
+; CHECK-DAG: sunpkhi {{z[0-9]+}}.s, [[HI]].h
+; CHECK-NOT: sxt
+; CHECK: ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_b_to_d( %a) {
+; CHECK-DAG: sunpklo [[LO1:z[0-9]+]].h, z0.b
+; CHECK-DAG: sunpkhi [[HI1:z[0-9]+]].h, z0.b
+; CHECK-DAG: sunpklo [[LO2:z[0-9]+]].s, z1.h
+; CHECK-DAG: sunpkhi [[HI2:z[0-9]+]].s, z1.h
+; CHECK-DAG: sunpklo [[LO3:z[0-9]+]].s, z0.h
+; CHECK-DAG: sunpkhi [[HI3:z[0-9]+]].s, z0.h
+; CHECK-DAG: sunpklo z0.d, [[LO2]].s
+; CHECK-DAG: sunpkhi z1.d, [[LO2]].s
+; CHECK-DAG: sunpklo z2.d, [[HI2]].s
+; CHECK-DAG: sunpkhi z3.d, [[HI2]].s
+; CHECK-DAG: sunpklo z4.d, [[LO3]].s
+; CHECK-DAG: sunpkhi z5.d, [[LO3]].s
+; CHECK-DAG: sunpklo z6.d, [[HI3]].s
+; CHECK-DAG: sunpkhi z7.d, [[HI3]].s
+; CHECK-NOT: sxt
+; CHECK-NEXT: ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @zext_b_to_h( %a) {
+; CHECK-LABEL: zext_b_to_h:
+; CHECK-DAG: uunpklo z2.h, z0.b
+; CHECK-DAG: uunpkhi z1.h, z0.b
+; CHECK-DAG: mov z0.d, z2.d
+; CHECK-NEXT: ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_h_to_s( %a) {
+; CHECK-LABEL: zext_h_to_s:
+; CHECK-DAG: uunpklo z2.s, z0.h
+; CHECK-DAG: uunpkhi z1.s, z0.h
+; CHECK-DAG: mov z0.d, z2.d
+; CHECK-NEXT: ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_s_to_d( %a) {
+; CHECK-LABEL: zext_s_to_d:
+; CHECK-DAG: uunpklo z2.d, z0.s
+; CHECK-DAG: uunpkhi z1.d, z0.s
+; CHECK-DAG: mov z0.d, z2.d
+; CHECK-NEXT: ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_b_to_s( %a) {
+; CHECK-LABEL: zext_b_to_s:
+; CHECK-DAG: uunpklo [[LO:z[0-9]+]].h, z0.b
+; CHECK-DAG: uunpkhi [[HI:z[0-9]+]].h, z0.b
+; CHECK-DAG: uunpklo z0.s, [[LO]].h
+; CHECK-DAG: uunpkhi z1.s, [[LO]].h
+; CHECK-DAG: uunpklo z2.s, [[HI]].h
+; CHECK-DAG: uunpkhi z3.s, [[HI]].h
+; CHECK-NOT: and
+; CHECK: ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_b_to_d( %a) {
+; CHECK-DAG: uunpklo [[LO1:z[0-9]+]].h, z0.b
+; CHECK-DAG: uunpkhi [[HI1:z[0-9]+]].h, z0.b
+; CHECK-DAG: uunpklo [[LO2:z[0-9]+]].s, z1.h
+; CHECK-DAG: uunpkhi [[HI2:z[0-9]+]].s, z1.h
+; CHECK-DAG: uunpklo [[LO3:z[0-9]+]].s, z0.h
+; CHECK-DAG: uunpkhi [[HI3:z[0-9]+]].s, z0.h
+; CHECK-DAG: uunpklo z0.d, [[LO2]].s
+; CHECK-DAG: uunpkhi z1.d, [[LO2]].s
+; CHECK-DAG: uunpklo z2.d, [[HI2]].s
+; CHECK-DAG: uunpkhi z3.d, [[HI2]].s
+; CHECK-DAG: uunpklo z4.d, [[LO3]].s
+; CHECK-DAG: uunpkhi z5.d, [[LO3]].s
+; CHECK-DAG: uunpklo z6.d, [[HI3]].s
+; CHECK-DAG: uunpkhi z7.d, [[HI3]].s
+; CHECK-NOT: and
+; CHECK-NEXT: ret
+  %ext = zext  %a to 
+  ret  %ext
+}
Index: llvm/lib/Target/AArch64/AArch64ISelLowering.h
===
--- llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -888,6 +888,9 @@
 
   void ReplaceNodeResults(SDNode *N, SmallVectorImpl ,
   SelectionDAG ) const override;
+  void ReplaceExtractSubVectorResults(SDNode *N,
+  SmallVectorImpl ,
+  SelectionDAG ) const;
 
   bool shouldNormalizeToSelectSequence(LLVMContext &, EVT) const override;
 
Index: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[PATCH] D79587: [CodeGen][SVE] Legalisation of extends with scalable types

2020-05-29 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 267183.
kmclaughlin edited the summary of this revision.
kmclaughlin added a comment.

- Removed ReplaceExtensionResults and instead try to use extract_subvector as 
much as possible to legalise the result
- Added ReplaceExtractSubVectorResults, which replaces extract_subvector with 
unpack operations
- Changed performSignExtendInRegCombine to replace a sign extend of an unsigned 
unpack with a signed unpack
- Removed changes to unrelated test files


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79587/new/

https://reviews.llvm.org/D79587

Files:
  llvm/include/llvm/CodeGen/ValueTypes.h
  llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/test/CodeGen/AArch64/sve-ext.ll

Index: llvm/test/CodeGen/AArch64/sve-ext.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-ext.ll
@@ -0,0 +1,171 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+
+;
+; SEXT
+;
+
+define  @sext_b_to_h( %a) {
+; CHECK-LABEL: sext_b_to_h:
+; CHECK-DAG: sunpklo z2.h, z0.b
+; CHECK-DAG: sunpkhi z1.h, z0.b
+; CHECK-DAG: mov z0.d, z2.d
+; CHECK-NEXT: ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_h_to_s( %a) {
+; CHECK-LABEL: sext_h_to_s:
+; CHECK-DAG: sunpklo z2.s, z0.h
+; CHECK-DAG: sunpkhi z1.s, z0.h
+; CHECK-DAG: mov z0.d, z2.d
+; CHECK-NEXT: ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_s_to_d( %a) {
+; CHECK-LABEL: sext_s_to_d:
+; CHECK-DAG: sunpklo z2.d, z0.s
+; CHECK-DAG: sunpkhi z1.d, z0.s
+; CHECK-DAG: mov z0.d, z2.d
+; CHECK-NEXT: ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_b_to_s( %a) {
+; CHECK-LABEL: sext_b_to_s:
+; CHECK-DAG: sunpklo [[LO:z[0-9]+]].h, z0.b
+; CHECK-DAG: sunpkhi [[HI:z[0-9]+]].h, z0.b
+; CHECK-DAG: sunpklo [[LOLO:z[0-9]+]].s, [[LO]].h
+; CHECK-DAG: sunpkhi {{z[0-9]+}}.s, [[LO]].h
+; CHECK-DAG: sunpklo {{z[0-9]+}}.s, [[HI]].h
+; CHECK-DAG: sunpkhi {{z[0-9]+}}.s, [[HI]].h
+; CHECK-NOT: sxt
+; CHECK: ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_b_to_d( %a) {
+; CHECK-DAG: sunpklo [[LO1:z[0-9]+]].h, z0.b
+; CHECK-DAG: sunpkhi [[HI1:z[0-9]+]].h, z0.b
+; CHECK-DAG: sunpklo [[LO2:z[0-9]+]].s, z1.h
+; CHECK-DAG: sunpkhi [[HI2:z[0-9]+]].s, z1.h
+; CHECK-DAG: sunpklo [[LO3:z[0-9]+]].s, z0.h
+; CHECK-DAG: sunpkhi [[HI3:z[0-9]+]].s, z0.h
+; CHECK-DAG: sunpklo z0.d, [[LO2]].s
+; CHECK-DAG: sunpkhi z1.d, [[LO2]].s
+; CHECK-DAG: sunpklo z2.d, [[HI2]].s
+; CHECK-DAG: sunpkhi z3.d, [[HI2]].s
+; CHECK-DAG: sunpklo z4.d, [[LO3]].s
+; CHECK-DAG: sunpkhi z5.d, [[LO3]].s
+; CHECK-DAG: sunpklo z6.d, [[HI3]].s
+; CHECK-DAG: sunpkhi z7.d, [[HI3]].s
+; CHECK-NOT: sxt
+; CHECK-NEXT: ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_promote_b_to_s( %in) {
+; CHECK-LABEL: @sext_promote
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: sxtb z0.s, p0/m, z0.s
+; CHECK-NEXT: ret
+  %out = sext  %in to 
+  ret  %out
+}
+
+define  @sext_promote_h_to_d( %in) {
+; CHECK-LABEL: @sext_promote_h_to_d
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: sxth z0.d, p0/m, z0.d
+; CHECK-NEXT: ret
+  %out = sext  %in to 
+  ret  %out
+}
+
+; ZEXT
+
+define  @zext_b_to_h( %a) {
+; CHECK-LABEL: zext_b_to_h:
+; CHECK-DAG: uunpklo z2.h, z0.b
+; CHECK-DAG: uunpkhi z1.h, z0.b
+; CHECK-DAG: mov z0.d, z2.d
+; CHECK-NEXT: ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_h_to_s( %a) {
+; CHECK-LABEL: zext_h_to_s:
+; CHECK-DAG: uunpklo z2.s, z0.h
+; CHECK-DAG: uunpkhi z1.s, z0.h
+; CHECK-DAG: mov z0.d, z2.d
+; CHECK-NEXT: ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_s_to_d( %a) {
+; CHECK-LABEL: zext_s_to_d:
+; CHECK-DAG: uunpklo z2.d, z0.s
+; CHECK-DAG: uunpkhi z1.d, z0.s
+; CHECK-DAG: mov z0.d, z2.d
+; CHECK-NEXT: ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_b_to_s( %a) {
+; CHECK-LABEL: zext_b_to_s:
+; CHECK-DAG: uunpklo [[LO:z[0-9]+]].h, z0.b
+; CHECK-DAG: uunpkhi [[HI:z[0-9]+]].h, z0.b
+; CHECK-DAG: uunpklo z0.s, [[LO]].h
+; CHECK-DAG: uunpkhi z1.s, [[LO]].h
+; CHECK-DAG: uunpklo z2.s, [[HI]].h
+; CHECK-DAG: uunpkhi z3.s, [[HI]].h
+; CHECK-NOT: and
+; CHECK: ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_b_to_d( %a) {
+; CHECK-DAG: uunpklo [[LO1:z[0-9]+]].h, z0.b
+; CHECK-DAG: uunpkhi [[HI1:z[0-9]+]].h, z0.b
+; CHECK-DAG: uunpklo [[LO2:z[0-9]+]].s, z1.h
+; CHECK-DAG: uunpkhi [[HI2:z[0-9]+]].s, z1.h
+; CHECK-DAG: uunpklo [[LO3:z[0-9]+]].s, z0.h
+; CHECK-DAG: uunpkhi [[HI3:z[0-9]+]].s, z0.h
+; CHECK-DAG: uunpklo z0.d, [[LO2]].s
+; CHECK-DAG: uunpkhi z1.d, [[LO2]].s
+; CHECK-DAG: uunpklo z2.d, [[HI2]].s
+; CHECK-DAG: uunpkhi z3.d, [[HI2]].s
+; CHECK-DAG: uunpklo z4.d, [[LO3]].s
+; CHECK-DAG: uunpkhi z5.d, [[LO3]].s
+; CHECK-DAG: uunpklo z6.d, [[HI3]].s
+; CHECK-DAG: uunpkhi z7.d, [[HI3]].s
+; CHECK-NOT: and
+; CHECK-NEXT: ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  

[PATCH] D79587: [CodeGen][SVE] Legalisation of extends with scalable types

2020-05-07 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: sdesmalen, efriedma, david-arm.
Herald added subscribers: psnobl, rkruppe, hiraditya, tschuett.
Herald added a project: LLVM.

This patch adds legalisation of extensions where the operand
of the extend is a legal scalable type but the result is not.

In these cases we can try to use the [S|U]UNPK[HI|LO] operations
to extend each half individually and concatenate the result.

For example:

  zext  %a to 

should emit:

  uunpklo z2.h, z0.b
  uunpkhi z1.h, z0.b

Patch by Richard Sandiford
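
A fuller IR sketch of the example above (the scalable types are assumed here
for illustration and match the pattern used by the tests in this patch):

  define <vscale x 16 x i16> @zext_b_to_h(<vscale x 16 x i8> %a) {
    ; <vscale x 16 x i16> is not a legal SVE type, so the result is returned
    ; as two <vscale x 8 x i16> halves built with uunpklo/uunpkhi.
    %ext = zext <vscale x 16 x i8> %a to <vscale x 16 x i16>
    ret <vscale x 16 x i16> %ext
  }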


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D79587

Files:
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
  llvm/test/CodeGen/AArch64/sve-arith.ll
  llvm/test/CodeGen/AArch64/sve-ext.ll

Index: llvm/test/CodeGen/AArch64/sve-ext.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-ext.ll
@@ -0,0 +1,127 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+
+;
+; SEXT
+;
+
+define  @sext_b_to_h( %a) {
+; CHECK-LABEL: sext_b_to_h:
+; CHECK-DAG: sunpklo z2.h, z0.b
+; CHECK-DAG: sunpkhi z1.h, z0.b
+; CHECK-DAG: mov z0.d, z2.d
+; CHECK-NEXT: ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_h_to_s( %a) {
+; CHECK-LABEL: sext_h_to_s:
+; CHECK-DAG: sunpklo z2.s, z0.h
+; CHECK-DAG: sunpkhi z1.s, z0.h
+; CHECK-DAG: mov z0.d, z2.d
+; CHECK-NEXT: ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_s_to_d( %a) {
+; CHECK-LABEL: sext_s_to_d:
+; CHECK-DAG: sunpklo z2.d, z0.s
+; CHECK-DAG: sunpkhi z1.d, z0.s
+; CHECK-DAG: mov z0.d, z2.d
+; CHECK-NEXT: ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_b_to_s( %a) {
+; CHECK-LABEL: sext_b_to_s:
+; CHECK-DAG: sunpklo [[LO:z[0-9]+]].h, z0.b
+; CHECK-DAG: sunpkhi [[HI:z[0-9]+]].h, z0.b
+; CHECK-DAG: sunpklo [[LOLO:z[0-9]+]].s, [[LO]].h
+; CHECK-DAG: sunpkhi {{z[0-9]+}}.s, [[LO]].h
+; CHECK-DAG: sunpklo {{z[0-9]+}}.s, [[HI]].h
+; CHECK-DAG: sunpkhi {{z[0-9]+}}.s, [[HI]].h
+; CHECK: ret
+  %ext = sext  %a to 
+  ret  %ext
+}
+
+define  @sext_promote_b_to_s( %in) {
+; CHECK-LABEL: @sext_promote
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: sxtb z0.s, p0/m, z0.s
+; CHECK-NEXT: ret
+  %out = sext  %in to 
+  ret  %out
+}
+
+define  @sext_promote_h_to_d( %in) {
+; CHECK-LABEL: @sext_promote_h_to_d
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: sxth z0.d, p0/m, z0.d
+; CHECK-NEXT: ret
+  %out = sext  %in to 
+  ret  %out
+}
+
+; ZEXT
+
+define  @zext_b_to_h( %a) {
+; CHECK-LABEL: zext_b_to_h:
+; CHECK-DAG: uunpklo z2.h, z0.b
+; CHECK-DAG: uunpkhi z1.h, z0.b
+; CHECK-DAG: mov z0.d, z2.d
+; CHECK-NEXT: ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_h_to_s( %a) {
+; CHECK-LABEL: zext_h_to_s:
+; CHECK-DAG: uunpklo z2.s, z0.h
+; CHECK-DAG: uunpkhi z1.s, z0.h
+; CHECK-DAG: mov z0.d, z2.d
+; CHECK-NEXT: ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_s_to_d( %a) {
+; CHECK-LABEL: zext_s_to_d:
+; CHECK-DAG: uunpklo z2.d, z0.s
+; CHECK-DAG: uunpkhi z1.d, z0.s
+; CHECK-DAG: mov z0.d, z2.d
+; CHECK-NEXT: ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_b_to_s( %a) {
+; CHECK-LABEL: zext_b_to_s:
+; CHECK-DAG: uunpklo [[LO:z[0-9]+]].h, z0.b
+; CHECK-DAG: uunpkhi [[HI:z[0-9]+]].h, z0.b
+; CHECK-DAG: uunpklo z0.s, [[LO]].h
+; CHECK-DAG: uunpkhi z1.s, [[LO]].h
+; CHECK-DAG: uunpklo z2.s, [[HI]].h
+; CHECK-DAG: uunpkhi z3.s, [[HI]].h
+; CHECK: ret
+  %ext = zext  %a to 
+  ret  %ext
+}
+
+define  @zext_promote_b_to_s( %in) {
+; CHECK-LABEL: @zext_promote
+; CHECK-DAG: and z0.s, z0.s, #0xff
+; CHECK-NEXT: ret
+  %out = zext  %in to 
+  ret  %out
+}
+
+define  @zext_promote_h_to_d( %in) {
+; CHECK-LABEL: @zext_promote_h_to_d
+; CHECK-DAG: and z0.d, z0.d, #0x
+; CHECK-NEXT: ret
+  %out = zext  %in to 
+  ret  %out
+}
Index: llvm/test/CodeGen/AArch64/sve-arith.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-arith.ll
@@ -0,0 +1,608 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+
+;
+; SDIV
+;
+
+define  @sdiv_i32( %a,  %b) {
+; CHECK-LABEL: @sdiv_i32
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: sdiv z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
+define  @sdiv_i64( %a,  %b) {
+; CHECK-LABEL: @sdiv_i64
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: sdiv z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
+define  @sdiv_split_i32( %a,  %b) {
+; CHECK-LABEL: @sdiv_split_i32
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: sdiv z0.s, p0/m, z0.s, z2.s
+; CHECK-DAG: sdiv z1.s, p0/m, z1.s, z3.s
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
+define  @sdiv_promote_i32( %a,  %b) {
+; CHECK-LABEL: @sdiv_promote_i32
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: sxtw z1.d, p0/m, z1.d
+; CHECK-DAG: sxtw z0.d, p0/m, z0.d
+; CHECK-DAG: sdiv z0.d, p0/m, z0.d, z1.d
+; 

[PATCH] D79478: [CodeGen][SVE] Lowering of shift operations with scalable types

2020-05-07 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG3bcd3dd4734d: [CodeGen][SVE] Lowering of shift operations 
with scalable types (authored by kmclaughlin).

Changed prior to commit:
  https://reviews.llvm.org/D79478?vs=262333&id=262600#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79478/new/

https://reviews.llvm.org/D79478

Files:
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
  llvm/test/CodeGen/AArch64/sve-int-arith-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
@@ -533,6 +533,168 @@
   ret  %out
 }
 
+; ASR
+
+define  @asr_i8( %a) {
+; CHECK-LABEL: asr_i8:
+; CHECK: asr z0.b, z0.b, #8
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
+  %elt = insertelement  undef, i8 8, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.asr.nxv16i8( %pg,
+%a,
+%splat)
+  ret  %out
+}
+
+define  @asr_i16( %a) {
+; CHECK-LABEL: asr_i16:
+; CHECK: asr z0.h, z0.h, #16
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
+  %elt = insertelement  undef, i16 16, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.asr.nxv8i16( %pg,
+%a,
+%splat)
+  ret  %out
+}
+
+define  @asr_i32( %a) {
+; CHECK-LABEL: asr_i32:
+; CHECK: asr z0.s, z0.s, #32
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
+  %elt = insertelement  undef, i32 32, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.asr.nxv4i32( %pg,
+%a,
+%splat)
+  ret  %out
+}
+
+define  @asr_i64( %a) {
+; CHECK-LABEL: asr_i64:
+; CHECK: asr z0.d, z0.d, #64
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
+  %elt = insertelement  undef, i64 64, i64 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.asr.nxv2i64( %pg,
+%a,
+%splat)
+  ret  %out
+}
+
+; LSL
+
+define  @lsl_i8( %a) {
+; CHECK-LABEL: lsl_i8:
+; CHECK: lsl z0.b, z0.b, #7
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
+  %elt = insertelement  undef, i8 7, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.lsl.nxv16i8( %pg,
+%a,
+%splat)
+  ret  %out
+}
+
+define  @lsl_i16( %a) {
+; CHECK-LABEL: lsl_i16:
+; CHECK: lsl z0.h, z0.h, #15
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
+  %elt = insertelement  undef, i16 15, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.lsl.nxv8i16( %pg,
+%a,
+%splat)
+  ret  %out
+}
+
+define  @lsl_i32( %a) {
+; CHECK-LABEL: lsl_i32:
+; CHECK: lsl z0.s, z0.s, #31
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
+  %elt = insertelement  undef, i32 31, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.lsl.nxv4i32( %pg,
+%a,
+%splat)
+  ret  %out
+}
+
+define  @lsl_i64( %a) {
+; CHECK-LABEL: lsl_i64:
+; CHECK: lsl z0.d, z0.d, #63
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
+  %elt = insertelement  undef, i64 63, i64 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.lsl.nxv2i64( %pg,
+%a,
+%splat)
+  ret  %out
+}
+
+; LSR
+
+define  @lsr_i8( %a) {
+; CHECK-LABEL: lsr_i8:
+; CHECK: lsr z0.b, z0.b, #8
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)

[PATCH] D78812: [SVE][CodeGen] Fix legalisation for scalable types

2020-05-07 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGa31f4c52bf85: [SVE][CodeGen] Fix legalisation for scalable 
types (authored by kmclaughlin).

Changed prior to commit:
  https://reviews.llvm.org/D78812?vs=260603&id=262580#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78812/new/

https://reviews.llvm.org/D78812

Files:
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/lib/CodeGen/TargetLoweringBase.cpp
  llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll

Index: llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
===
--- llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
+++ llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
@@ -22,6 +22,37 @@
   ret  %div
 }
 
+define  @sdiv_split_i32( %a,  %b) {
+; CHECK-LABEL: @sdiv_split_i32
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: sdiv z0.s, p0/m, z0.s, z2.s
+; CHECK-DAG: sdiv z1.s, p0/m, z1.s, z3.s
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
+define  @sdiv_widen_i32( %a,  %b) {
+; CHECK-LABEL: @sdiv_widen_i32
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: sxtw z1.d, p0/m, z1.d
+; CHECK-DAG: sxtw z0.d, p0/m, z0.d
+; CHECK-DAG: sdiv z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
+define  @sdiv_split_i64( %a,  %b) {
+; CHECK-LABEL: @sdiv_split_i64
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: sdiv z0.d, p0/m, z0.d, z2.d
+; CHECK-DAG: sdiv z1.d, p0/m, z1.d, z3.d
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
 ;
 ; UDIV
 ;
@@ -44,6 +75,37 @@
   ret  %div
 }
 
+define  @udiv_split_i32( %a,  %b) {
+; CHECK-LABEL: @udiv_split_i32
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: udiv z0.s, p0/m, z0.s, z2.s
+; CHECK-DAG: udiv z1.s, p0/m, z1.s, z3.s
+; CHECK-NEXT: ret
+  %div = udiv  %a, %b
+  ret  %div
+}
+
+define  @udiv_widen_i32( %a,  %b) {
+; CHECK-LABEL: @udiv_widen_i32
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: and z1.d, z1.d, #0x
+; CHECK-DAG: and z0.d, z0.d, #0x
+; CHECK-DAG: udiv z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %div = udiv  %a, %b
+  ret  %div
+}
+
+define  @udiv_split_i64( %a,  %b) {
+; CHECK-LABEL: @udiv_split_i64
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: udiv z0.d, p0/m, z0.d, z2.d
+; CHECK-DAG: udiv z1.d, p0/m, z1.d, z3.d
+; CHECK-NEXT: ret
+  %div = udiv  %a, %b
+  ret  %div
+}
+
 ;
 ; SMIN
 ;
Index: llvm/lib/CodeGen/TargetLoweringBase.cpp
===
--- llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -1392,7 +1392,7 @@
 EVT ,
 unsigned ,
 MVT ) const {
-  unsigned NumElts = VT.getVectorNumElements();
+  ElementCount EltCnt = VT.getVectorElementCount();
 
   // If there is a wider vector type with the same element type as this one,
   // or a promoted vector type that has the same number of elements which
@@ -1400,7 +1400,7 @@
   // This handles things like <2 x float> -> <4 x float> and
   // <4 x i1> -> <4 x i32>.
   LegalizeTypeAction TA = getTypeAction(Context, VT);
-  if (NumElts != 1 && (TA == TypeWidenVector || TA == TypePromoteInteger)) {
+  if (EltCnt.Min != 1 && (TA == TypeWidenVector || TA == TypePromoteInteger)) {
 EVT RegisterEVT = getTypeToTransformTo(Context, VT);
 if (isTypeLegal(RegisterEVT)) {
   IntermediateVT = RegisterEVT;
@@ -1417,22 +1417,22 @@
 
   // FIXME: We don't support non-power-of-2-sized vectors for now.  Ideally we
   // could break down into LHS/RHS like LegalizeDAG does.
-  if (!isPowerOf2_32(NumElts)) {
-NumVectorRegs = NumElts;
-NumElts = 1;
+  if (!isPowerOf2_32(EltCnt.Min)) {
+NumVectorRegs = EltCnt.Min;
+EltCnt.Min = 1;
   }
 
   // Divide the input until we get to a supported size.  This will always
   // end with a scalar if the target doesn't support vectors.
-  while (NumElts > 1 && !isTypeLegal(
-   EVT::getVectorVT(Context, EltTy, NumElts))) {
-NumElts >>= 1;
+  while (EltCnt.Min > 1 &&
+ !isTypeLegal(EVT::getVectorVT(Context, EltTy, EltCnt))) {
+EltCnt.Min >>= 1;
 NumVectorRegs <<= 1;
   }
 
   NumIntermediates = NumVectorRegs;
 
-  EVT NewVT = EVT::getVectorVT(Context, EltTy, NumElts);
+  EVT NewVT = EVT::getVectorVT(Context, EltTy, EltCnt);
   if (!isTypeLegal(NewVT))
 NewVT = EltTy;
   IntermediateVT = NewVT;
Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
===
--- llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -731,10 +731,10 @@
 IntermediateVT.getVectorNumElements() : 1;
 
   // Convert the vector to the appropriate type if necessary.
-  unsigned DestVectorNoElts = NumIntermediates * IntermediateNumElts;
-
+  auto DestEltCnt = 

[PATCH] D79478: [CodeGen][SVE] Lowering of shift operations with scalable types

2020-05-06 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: sdesmalen, efriedma, ctetreau, huihuiz.
Herald added subscribers: psnobl, rkruppe, hiraditya, kristof.beyls, tschuett.
Herald added a reviewer: rengolin.
Herald added a project: LLVM.
kmclaughlin added a parent revision: D78812: [SVE][CodeGen] Fix legalisation 
for scalable types.

Adds AArch64ISD nodes for:

- SHL_PRED (logical shift left)
- SHR_PRED (logical shift right)
- SRA_PRED (arithmetic shift right)

Existing patterns for unpredicated left shift by immediate
have also been moved into the appropriate multiclasses
in SVEInstrFormats.td.
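
For illustration, a rough sketch of how the IR shift operations could map onto
the new predicated nodes during lowering; the getAllTruePredicate helper named
here is an assumption for the sketch, not an in-tree function:

  // Emit the predicated AArch64ISD node with an all-true governing predicate.
  SDValue Pg = getAllTruePredicate(DAG, DL, VT);   // conceptually a ptrue
  switch (Op.getOpcode()) {
  case ISD::SHL:
    return DAG.getNode(AArch64ISD::SHL_PRED, DL, VT, Pg,
                       Op.getOperand(0), Op.getOperand(1));
  case ISD::SRL:
    return DAG.getNode(AArch64ISD::SHR_PRED, DL, VT, Pg,
                       Op.getOperand(0), Op.getOperand(1));
  case ISD::SRA:
    return DAG.getNode(AArch64ISD::SRA_PRED, DL, VT, Pg,
                       Op.getOperand(0), Op.getOperand(1));
  }

The moved immediate patterns then match one of these nodes when the shift
amount is a splat constant, selecting e.g. "lsl z0.b, z0.b, #7" directly.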


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D79478

Files:
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
  llvm/test/CodeGen/AArch64/sve-int-arith-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
@@ -533,6 +533,168 @@
   ret  %out
 }
 
+; ASR
+
+define  @asr_i8( %a) {
+; CHECK-LABEL: asr_i8:
+; CHECK: asr z0.b, z0.b, #8
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
+  %elt = insertelement  undef, i8 8, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.asr.nxv16i8( %pg,
+%a,
+%splat)
+  ret  %out
+}
+
+define  @asr_i16( %a) {
+; CHECK-LABEL: asr_i16:
+; CHECK: asr z0.h, z0.h, #16
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
+  %elt = insertelement  undef, i16 16, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.asr.nxv8i16( %pg,
+%a,
+%splat)
+  ret  %out
+}
+
+define  @asr_i32( %a) {
+; CHECK-LABEL: asr_i32:
+; CHECK: asr z0.s, z0.s, #32
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
+  %elt = insertelement  undef, i32 32, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.asr.nxv4i32( %pg,
+%a,
+%splat)
+  ret  %out
+}
+
+define  @asr_i64( %a) {
+; CHECK-LABEL: asr_i64:
+; CHECK: asr z0.d, z0.d, #64
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
+  %elt = insertelement  undef, i64 64, i64 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.asr.nxv2i64( %pg,
+%a,
+%splat)
+  ret  %out
+}
+
+; LSL
+
+define  @lsl_i8( %a) {
+; CHECK-LABEL: lsl_i8:
+; CHECK: lsl z0.b, z0.b, #7
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
+  %elt = insertelement  undef, i8 7, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.lsl.nxv16i8( %pg,
+%a,
+%splat)
+  ret  %out
+}
+
+define  @lsl_i16( %a) {
+; CHECK-LABEL: lsl_i16:
+; CHECK: lsl z0.h, z0.h, #15
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
+  %elt = insertelement  undef, i16 15, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.lsl.nxv8i16( %pg,
+%a,
+%splat)
+  ret  %out
+}
+
+define  @lsl_i32( %a) {
+; CHECK-LABEL: lsl_i32:
+; CHECK: lsl z0.s, z0.s, #31
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
+  %elt = insertelement  undef, i32 31, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.lsl.nxv4i32( %pg,
+%a,
+%splat)
+  ret  %out
+}
+
+define  @lsl_i64( %a) {
+; CHECK-LABEL: lsl_i64:
+; CHECK: lsl z0.d, z0.d, #63
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
+  %elt = insertelement  undef, i64 63, i64 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.lsl.nxv2i64( %pg,
+  

[PATCH] D79087: [SVE][Codegen] Lower legal min & max operations

2020-05-04 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG19f5da9c1d69: [SVE][Codegen] Lower legal min & max 
operations (authored by kmclaughlin).

Changed prior to commit:
  https://reviews.llvm.org/D79087?vs=261462&id=261787#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79087/new/

https://reviews.llvm.org/D79087

Files:
  llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
@@ -1,5 +1,221 @@
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
 
+; SMAX
+
+define  @smax_i8( %a) {
+; CHECK-LABEL: smax_i8:
+; CHECK: smax z0.b, z0.b, #-128
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
+  %elt = insertelement  undef, i8 -128, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.smax.nxv16i8( %pg,
+ %a,
+ %splat)
+  ret  %out
+}
+
+define  @smax_i16( %a) {
+; CHECK-LABEL: smax_i16:
+; CHECK: smax z0.h, z0.h, #127
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
+  %elt = insertelement  undef, i16 127, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.smax.nxv8i16( %pg,
+ %a,
+ %splat)
+  ret  %out
+}
+
+define  @smax_i32( %a) {
+; CHECK-LABEL: smax_i32:
+; CHECK: smax z0.s, z0.s, #-128
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
+  %elt = insertelement  undef, i32 -128, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.smax.nxv4i32( %pg,
+ %a,
+ %splat)
+  ret  %out
+}
+
+define  @smax_i64( %a) {
+; CHECK-LABEL: smax_i64:
+; CHECK: smax z0.d, z0.d, #127
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
+  %elt = insertelement  undef, i64 127, i64 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.smax.nxv2i64( %pg,
+ %a,
+ %splat)
+  ret  %out
+}
+
+; SMIN
+
+define  @smin_i8( %a) {
+; CHECK-LABEL: smin_i8:
+; CHECK: smin z0.b, z0.b, #127
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
+  %elt = insertelement  undef, i8 127, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.smin.nxv16i8( %pg,
+ %a,
+ %splat)
+  ret  %out
+}
+
+define  @smin_i16( %a) {
+; CHECK-LABEL: smin_i16:
+; CHECK: smin z0.h, z0.h, #-128
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
+  %elt = insertelement  undef, i16 -128, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.smin.nxv8i16( %pg,
+ %a,
+ %splat)
+  ret  %out
+}
+
+define  @smin_i32( %a) {
+; CHECK-LABEL: smin_i32:
+; CHECK: smin z0.s, z0.s, #127
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
+  %elt = insertelement  undef, i32 127, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.smin.nxv4i32( %pg,
+ %a,
+ %splat)
+  ret  %out
+}
+
+define  @smin_i64( %a) {
+; CHECK-LABEL: smin_i64:
+; CHECK: smin z0.d, z0.d, #-128
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
+  %elt = insertelement  undef, i64 -128, i64 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.smin.nxv2i64( %pg,
+ %a,
+ %splat)
+  ret  %out
+}
+
+; UMAX
+
+define  @umax_i8( %a) {
+; 

[PATCH] D79087: [SVE][Codegen] Lower legal min & max operations

2020-05-01 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin marked an inline comment as done.
kmclaughlin added inline comments.



Comment at: llvm/lib/Target/AArch64/SVEInstrFormats.td:3851
+  def : SVE_1_Op_Imm_Arith_Pred_Pat(NAME # _S)>;
+  def : SVE_1_Op_Imm_Arith_Pred_Pat(NAME # _D)>;
 }

efriedma wrote:
> kmclaughlin wrote:
> > efriedma wrote:
> > > I don't see any test for this part of the patch?
> > The tests for these patterns are in `sve-int-arith-imm.ll`, which was added 
> > as part of D71779
> I'd also like to see tests for the intrinsic where the second operand is an 
> immediate, since we can pattern-match that to the immediate smax now.
I see what you mean now, I've added the intrinsic tests to 
sve-intrinsics-int-arith-imm.ll


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79087/new/

https://reviews.llvm.org/D79087



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D79087: [SVE][Codegen] Lower legal min & max operations

2020-05-01 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 261462.
kmclaughlin added a comment.

- Added tests for the intrinsics where the second operand is an immediate
- Changed the range SelectSVESignedArithImm checks for, as the range for the 
immediates of smin & smax is -128 to +127 (inclusive)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79087/new/

https://reviews.llvm.org/D79087

Files:
  llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
@@ -1,5 +1,221 @@
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
 
+; SMAX
+
+define  @smax_i8( %a) {
+; CHECK-LABEL: smax_i8:
+; CHECK: smax z0.b, z0.b, #-128
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
+  %elt = insertelement  undef, i8 -128, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.smax.nxv16i8( %pg,
+ %a,
+ %splat)
+  ret  %out
+}
+
+define  @smax_i16( %a) {
+; CHECK-LABEL: smax_i16:
+; CHECK: smax z0.h, z0.h, #127
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
+  %elt = insertelement  undef, i16 127, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.smax.nxv8i16( %pg,
+ %a,
+ %splat)
+  ret  %out
+}
+
+define  @smax_i32( %a) {
+; CHECK-LABEL: smax_i32:
+; CHECK: smax z0.s, z0.s, #-128
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
+  %elt = insertelement  undef, i32 -128, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.smax.nxv4i32( %pg,
+ %a,
+ %splat)
+  ret  %out
+}
+
+define  @smax_i64( %a) {
+; CHECK-LABEL: smax_i64:
+; CHECK: smax z0.d, z0.d, #127
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
+  %elt = insertelement  undef, i64 127, i64 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.smax.nxv2i64( %pg,
+ %a,
+ %splat)
+  ret  %out
+}
+
+; SMIN
+
+define  @smin_i8( %a) {
+; CHECK-LABEL: smin_i8:
+; CHECK: smin z0.b, z0.b, #127
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
+  %elt = insertelement  undef, i8 127, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.smin.nxv16i8( %pg,
+ %a,
+ %splat)
+  ret  %out
+}
+
+define  @smin_i16( %a) {
+; CHECK-LABEL: smin_i16:
+; CHECK: smin z0.h, z0.h, #-128
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
+  %elt = insertelement  undef, i16 -128, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.smin.nxv8i16( %pg,
+ %a,
+ %splat)
+  ret  %out
+}
+
+define  @smin_i32( %a) {
+; CHECK-LABEL: smin_i32:
+; CHECK: smin z0.s, z0.s, #127
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
+  %elt = insertelement  undef, i32 127, i32 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.smin.nxv4i32( %pg,
+ %a,
+ %splat)
+  ret  %out
+}
+
+define  @smin_i64( %a) {
+; CHECK-LABEL: smin_i64:
+; CHECK: smin z0.d, z0.d, #-128
+; CHECK-NEXT: ret
+  %pg = call  @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
+  %elt = insertelement  undef, i64 -128, i64 0
+  %splat = shufflevector  %elt,  undef,  zeroinitializer
+  %out = call  @llvm.aarch64.sve.smin.nxv2i64( %pg,
+ %a,
+ %splat)
+  ret  %out
+}
+
+; UMAX
+
+define  @umax_i8( %a) {
+; CHECK-LABEL: umax_i8:

[PATCH] D79087: [SVE][Codegen] Lower legal min & max operations

2020-04-30 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin marked an inline comment as done.
kmclaughlin added inline comments.



Comment at: llvm/lib/Target/AArch64/SVEInstrFormats.td:3851
+  def : SVE_1_Op_Imm_Arith_Pred_Pat(NAME # _S)>;
+  def : SVE_1_Op_Imm_Arith_Pred_Pat(NAME # _D)>;
 }

efriedma wrote:
> I don't see any test for this part of the patch?
The tests for these patterns are in `sve-int-arith-imm.ll`, which was added as 
part of D71779


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79087/new/

https://reviews.llvm.org/D79087



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D79167: [SVE][CodeGen] Legalise scalable vector types for vsetcc & vselect

2020-04-30 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: sdesmalen, efriedma, dancgr.
Herald added subscribers: psnobl, rkruppe, hiraditya, tschuett.
Herald added a project: LLVM.
kmclaughlin added a parent revision: D79087: [SVE][Codegen] Lower legal min & 
max operations.

The visitSelect function in SelectionDAGBuilder calls
getTypeConversion to get the type of the operation
after it has been legalised.

This patch changes getTypeConversion to use
ElementCount where necessary to ensure that the
Scalable flag is not dropped.
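
A minimal sketch of the kind of change involved (illustrative only, not the
exact diff):

  // Before: querying only the element count silently drops scalability.
  unsigned NumElts = VT.getVectorNumElements();
  EVT FixedVT = EVT::getVectorVT(Context, EltTy, NumElts);     // fixed-width

  // After: carry the full ElementCount through, so a <vscale x N x Ty>
  // remains a scalable vector type after legalisation.
  ElementCount EltCnt = VT.getVectorElementCount();
  EVT ScalableVT = EVT::getVectorVT(Context, EltTy, EltCnt);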


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D79167

Files:
  llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
  llvm/lib/CodeGen/TargetLoweringBase.cpp
  llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll

Index: llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
===
--- llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
+++ llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
@@ -150,6 +150,88 @@
   ret  %min
 }
 
+define  @smin_split_i8( %a,  %b,  %c) {
+; CHECK-LABEL: @smin_split_i8
+; CHECK-DAG: ptrue p0.b
+; CHECK-DAG: smin z0.b, p0/m, z0.b, z2.b
+; CHECK-DAG: smin z1.b, p0/m, z1.b, z3.b
+; CHECK-NEXT: ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_split_i16( %a,  %b,  %c) {
+; CHECK-LABEL: smin_split_i16:
+; CHECK-DAG: ptrue p0.h
+; CHECK-DAG: smin z0.h, p0/m, z0.h, z4.h
+; CHECK-DAG: smin z1.h, p0/m, z1.h, z5.h
+; CHECK-DAG: smin z2.h, p0/m, z2.h, z6.h
+; CHECK-DAG: smin z3.h, p0/m, z3.h, z7.h
+; CHECK-NEXT: ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_split_i32( %a,  %b,  %c) {
+; CHECK-LABEL: smin_split_i32:
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: smin z0.s, p0/m, z0.s, z2.s
+; CHECK-DAG: smin z1.s, p0/m, z1.s, z3.s
+; CHECK-NEXT: ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_split_i64( %a,  %b,  %c) {
+; CHECK-LABEL: smin_split_i64:
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: smin z0.d, p0/m, z0.d, z2.d
+; CHECK-DAG: smin z1.d, p0/m, z1.d, z3.d
+; CHECK-NEXT: ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_promote_i8( %a,  %b,  %c) {
+; CHECK-LABEL: @smin_promote_i8
+; CHECK-DAG: ptrue p0.h
+; CHECK-DAG: sxtb z1.h, p0/m, z1.h
+; CHECK-DAG: sxtb z0.h, p0/m, z0.h
+; CHECK-DAG: smin z0.h, p0/m, z0.h, z1.h
+; CHECK-NEXT: ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_promote_i16( %a,  %b,  %c) {
+; CHECK-LABEL: @smin_promote_i16
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: sxth z1.s, p0/m, z1.s
+; CHECK-DAG: sxth z0.s, p0/m, z0.s
+; CHECK-DAG: smin z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_promote_i32( %a,  %b,  %c) {
+; CHECK-LABEL: @smin_promote_i32
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: sxtw z1.d, p0/m, z1.d
+; CHECK-DAG: sxtw z0.d, p0/m, z0.d
+; CHECK-DAG: smin z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
 ;
 ; UMIN
 ;
@@ -194,6 +276,27 @@
   ret  %min
 }
 
+define  @umin_split_i64( %a,  %b,  %c) {
+; CHECK-LABEL: umin_split_i64:
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: umin z0.d, p0/m, z0.d, z2.d
+; CHECK-DAG: umin z1.d, p0/m, z1.d, z3.d
+; CHECK-NEXT: ret
+  %cmp = icmp ult  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @umin_promote_i8( %a,  %b,  %c) {
+; CHECK-LABEL: @umin_promote_i8
+; CHECK-DAG: ptrue p0.h
+; CHECK-DAG: umin z0.h, p0/m, z0.h, z1.h
+; CHECK-NEXT: ret
+  %cmp = icmp ult  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
 ;
 ; SMAX
 ;
@@ -204,8 +307,8 @@
 ; CHECK-DAG: smax z0.b, p0/m, z0.b, z1.b
 ; CHECK-NEXT: ret
   %cmp = icmp sgt  %a, %b
-  %min = select  %cmp,  %a,  %b
-  ret  %min
+  %max = select  %cmp,  %a,  %b
+  ret  %max
 }
 
 define  @smax_i16( %a,  %b,  %c) {
@@ -214,8 +317,8 @@
 ; CHECK-DAG: smax z0.h, p0/m, z0.h, z1.h
 ; CHECK-NEXT: ret
   %cmp = icmp sgt  %a, %b
-  %min = select  %cmp,  %a,  %b
-  ret  %min
+  %max = select  %cmp,  %a,  %b
+  ret  %max
 }
 
 define  @smax_i32( %a,  %b,  %c) {
@@ -224,8 +327,8 @@
 ; CHECK-DAG: smax z0.s, p0/m, z0.s, z1.s
 ; CHECK-NEXT: ret
   %cmp = icmp sgt  %a, %b
-  %min = select  %cmp,  %a,  %b
-  ret  %min
+  %max = select  %cmp,  %a,  %b
+  ret  %max
 }
 
 define  @smax_i64( %a,  %b,  %c) {
@@ -234,8 +337,29 @@
 ; CHECK-DAG: smax z0.d, p0/m, z0.d, z1.d
 ; CHECK-NEXT: ret
   %cmp = icmp sgt  %a, %b
-  %min = select  %cmp,  %a,  %b
-  ret  %min
+  %max = select  %cmp,  %a,  %b
+  ret  %max
+}
+
+define  @smax_split_i32( %a,  %b,  %c) {
+; CHECK-LABEL: smax_split_i32:
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: smax z0.s, p0/m, z0.s, z2.s
+; CHECK-DAG: smax z1.s, p0/m, z1.s, z3.s
+; CHECK-NEXT: ret
+  %cmp = icmp sgt  %a, %b
+  %max = select  %cmp,  %a,  %b
+  ret  %max
+}
+
+define  

[PATCH] D79087: [SVE][Codegen] Lower legal min & max operations

2020-04-29 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: sdesmalen, efriedma, dancgr.
Herald added subscribers: psnobl, rkruppe, hiraditya, kristof.beyls, tschuett.
Herald added a reviewer: rengolin.
Herald added a project: LLVM.

This patch adds AArch64ISD nodes for [S|U]MIN_PRED
and [S|U]MAX_PRED, and lowers both SVE intrinsics and
IR operations for min and max to these nodes.

There are two forms of these instructions for SVE: a predicated
form and an immediate (unpredicated) form. The patterns
which existed for the latter have been updated to match a
predicated node with an immediate and map this
to the immediate instruction.
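
Roughly, the lowering turns the generic compare-and-select idiom into a single
predicated node (an illustrative sketch; operand order follows the other
*_PRED nodes in this series):

  //   %cmp = icmp slt %a, %b
  //   %min = select %cmp, %a, %b
  // becomes, with an all-true governing predicate Pg:
  SDValue Min = DAG.getNode(AArch64ISD::SMIN_PRED, DL, VT, Pg, A, B);

When one operand is a splat of a small immediate, the updated patterns match
the predicated node and select the immediate instruction (e.g.
"smin z0.s, z0.s, #127") instead of the register form.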


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D79087

Files:
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll

Index: llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
===
--- llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
+++ llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
@@ -32,8 +32,8 @@
   ret  %div
 }
 
-define  @sdiv_widen_i32( %a,  %b) {
-; CHECK-LABEL: @sdiv_widen_i32
+define  @sdiv_promote_i32( %a,  %b) {
+; CHECK-LABEL: @sdiv_promote_i32
 ; CHECK-DAG: ptrue p0.d
 ; CHECK-DAG: sxtw z1.d, p0/m, z1.d
 ; CHECK-DAG: sxtw z0.d, p0/m, z0.d
@@ -85,8 +85,8 @@
   ret  %div
 }
 
-define  @udiv_widen_i32( %a,  %b) {
-; CHECK-LABEL: @udiv_widen_i32
+define  @udiv_promote_i32( %a,  %b) {
+; CHECK-LABEL: @udiv_promote_i32
 ; CHECK-DAG: ptrue p0.d
 ; CHECK-DAG: and z1.d, z1.d, #0x
 ; CHECK-DAG: and z0.d, z0.d, #0x
@@ -105,3 +105,179 @@
   %div = udiv  %a, %b
   ret  %div
 }
+
+;
+; SMIN
+;
+
+define  @smin_i8( %a,  %b,  %c) {
+; CHECK-LABEL: @smin_i8
+; CHECK-DAG: ptrue p0.b
+; CHECK-DAG: smin z0.b, p0/m, z0.b, z1.b
+; CHECK-NEXT: ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_i16( %a,  %b,  %c) {
+; CHECK-LABEL: @smin_i16
+; CHECK-DAG: ptrue p0.h
+; CHECK-DAG: smin z0.h, p0/m, z0.h, z1.h
+; CHECK-NEXT: ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_i32( %a,  %b,  %c) {
+; CHECK-LABEL: smin_i32:
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: smin z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smin_i64( %a,  %b,  %c) {
+; CHECK-LABEL: smin_i64:
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: smin z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %cmp = icmp slt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+;
+; UMIN
+;
+
+define  @umin_i8( %a,  %b,  %c) {
+; CHECK-LABEL: @umin_i8
+; CHECK-DAG: ptrue p0.b
+; CHECK-DAG: umin z0.b, p0/m, z0.b, z1.b
+; CHECK-NEXT: ret
+  %cmp = icmp ult  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @umin_i16( %a,  %b,  %c) {
+; CHECK-LABEL: @umin_i16
+; CHECK-DAG: ptrue p0.h
+; CHECK-DAG: umin z0.h, p0/m, z0.h, z1.h
+; CHECK-NEXT: ret
+  %cmp = icmp ult  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @umin_i32( %a,  %b,  %c) {
+; CHECK-LABEL: umin_i32:
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: umin z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: ret
+  %cmp = icmp ult  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @umin_i64( %a,  %b,  %c) {
+; CHECK-LABEL: umin_i64:
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: umin z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %cmp = icmp ult  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+;
+; SMAX
+;
+
+define  @smax_i8( %a,  %b,  %c) {
+; CHECK-LABEL: @smax_i8
+; CHECK-DAG: ptrue p0.b
+; CHECK-DAG: smax z0.b, p0/m, z0.b, z1.b
+; CHECK-NEXT: ret
+  %cmp = icmp sgt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smax_i16( %a,  %b,  %c) {
+; CHECK-LABEL: @smax_i16
+; CHECK-DAG: ptrue p0.h
+; CHECK-DAG: smax z0.h, p0/m, z0.h, z1.h
+; CHECK-NEXT: ret
+  %cmp = icmp sgt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smax_i32( %a,  %b,  %c) {
+; CHECK-LABEL: smax_i32:
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: smax z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: ret
+  %cmp = icmp sgt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @smax_i64( %a,  %b,  %c) {
+; CHECK-LABEL: smax_i64:
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: smax z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %cmp = icmp sgt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+;
+; UMAX
+;
+
+define  @umax_i8( %a,  %b,  %c) {
+; CHECK-LABEL: @umax_i8
+; CHECK-DAG: ptrue p0.b
+; CHECK-DAG: umax z0.b, p0/m, z0.b, z1.b
+; CHECK-NEXT: ret
+  %cmp = icmp ugt  %a, %b
+  %min = select  %cmp,  %a,  %b
+  ret  %min
+}
+
+define  @umax_i16( %a,  %b,  %c) {
+; CHECK-LABEL: @umax_i16
+; CHECK-DAG: ptrue p0.h
+; CHECK-DAG: umax z0.h, p0/m, z0.h, z1.h
+; CHECK-NEXT: ret
+  %cmp = icmp ugt  %a, %b

[PATCH] D78812: [SVE][CodeGen] Fix legalisation for scalable types

2020-04-28 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin marked an inline comment as done.
kmclaughlin added inline comments.



Comment at: llvm/lib/CodeGen/TargetLoweringBase.cpp:1429
+ !isTypeLegal(EVT::getVectorVT(Context, EltTy, EltCnt))) {
+EltCnt.Min >>= 1;
 NumVectorRegs <<= 1;

I will create a separate patch to clean this up a bit by adding an overloaded 
`operator>>`
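
One possible shape for that overload, assuming ElementCount keeps its public
Min/Scalable members (a sketch only; the committed form may well differ):

  // Shift the minimum element count while preserving the Scalable flag.
  ElementCount operator>>(unsigned RHS) const {
    return ElementCount(Min >> RHS, Scalable);
  }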


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78812/new/

https://reviews.llvm.org/D78812



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78812: [SVE][CodeGen] Fix legalisation for scalable types

2020-04-28 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin marked 2 inline comments as done.
kmclaughlin added inline comments.



Comment at: llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll:107
+  ret  %div
+}

efriedma wrote:
> Maybe also worth adding a testcase for `<vscale x 1 x i32>`, assuming that 
> doesn't expose anything really tricky.
Promotion is not possible for `<vscale x 1 x i32>` since this would result in a 
`<vscale x 1 x i64>`, which is also illegal. Instead we will need to widen a type 
such as `<vscale x 1 x i32>`, which needs some more work. For fixed-width 
vectors, the compiler will scalarise cases such as this (e.g. from 1 x i32 to 
just i32), which isn't something we can do for scalable vectors because of the 
runtime scaling.
This has never had much priority because in practice the vectoriser won't pick 
a VF of 1, so I think we can add support for this at a later point. Currently, 
tests which use types such as this will trigger the assert added to 
FoldBUILD_VECTOR in D78636.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78812/new/

https://reviews.llvm.org/D78812



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78812: [SVE][CodeGen] Fix legalisation for scalable types

2020-04-28 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 260603.
kmclaughlin added a comment.

- Use ElementCount with getVectorVT


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78812/new/

https://reviews.llvm.org/D78812

Files:
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/lib/CodeGen/TargetLoweringBase.cpp
  llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll

Index: llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
===
--- llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
+++ llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
@@ -22,6 +22,37 @@
   ret  %div
 }
 
+define  @sdiv_split_i32( %a,  %b) {
+; CHECK-LABEL: @sdiv_split_i32
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: sdiv z0.s, p0/m, z0.s, z2.s
+; CHECK-DAG: sdiv z1.s, p0/m, z1.s, z3.s
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
+define  @sdiv_widen_i32( %a,  %b) {
+; CHECK-LABEL: @sdiv_widen_i32
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: sxtw z1.d, p0/m, z1.d
+; CHECK-DAG: sxtw z0.d, p0/m, z0.d
+; CHECK-DAG: sdiv z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
+define  @sdiv_split_i64( %a,  %b) {
+; CHECK-LABEL: @sdiv_split_i64
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: sdiv z0.d, p0/m, z0.d, z2.d
+; CHECK-DAG: sdiv z1.d, p0/m, z1.d, z3.d
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
 ;
 ; UDIV
 ;
@@ -43,3 +74,34 @@
   %div = udiv  %a, %b
   ret  %div
 }
+
+define  @udiv_split_i32( %a,  %b) {
+; CHECK-LABEL: @udiv_split_i32
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: udiv z0.s, p0/m, z0.s, z2.s
+; CHECK-DAG: udiv z1.s, p0/m, z1.s, z3.s
+; CHECK-NEXT: ret
+  %div = udiv  %a, %b
+  ret  %div
+}
+
+define  @udiv_widen_i32( %a,  %b) {
+; CHECK-LABEL: @udiv_widen_i32
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: and z1.d, z1.d, #0x
+; CHECK-DAG: and z0.d, z0.d, #0x
+; CHECK-DAG: udiv z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %div = udiv  %a, %b
+  ret  %div
+}
+
+define  @udiv_split_i64( %a,  %b) {
+; CHECK-LABEL: @udiv_split_i64
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: udiv z0.d, p0/m, z0.d, z2.d
+; CHECK-DAG: udiv z1.d, p0/m, z1.d, z3.d
+; CHECK-NEXT: ret
+  %div = udiv  %a, %b
+  ret  %div
+}
Index: llvm/lib/CodeGen/TargetLoweringBase.cpp
===
--- llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -1392,7 +1392,7 @@
 EVT ,
 unsigned ,
 MVT ) const {
-  unsigned NumElts = VT.getVectorNumElements();
+  ElementCount EltCnt = VT.getVectorElementCount();
 
   // If there is a wider vector type with the same element type as this one,
   // or a promoted vector type that has the same number of elements which
@@ -1400,7 +1400,7 @@
   // This handles things like <2 x float> -> <4 x float> and
   // <4 x i1> -> <4 x i32>.
   LegalizeTypeAction TA = getTypeAction(Context, VT);
-  if (NumElts != 1 && (TA == TypeWidenVector || TA == TypePromoteInteger)) {
+  if (EltCnt.Min != 1 && (TA == TypeWidenVector || TA == TypePromoteInteger)) {
 EVT RegisterEVT = getTypeToTransformTo(Context, VT);
 if (isTypeLegal(RegisterEVT)) {
   IntermediateVT = RegisterEVT;
@@ -1417,22 +1417,22 @@
 
   // FIXME: We don't support non-power-of-2-sized vectors for now.  Ideally we
   // could break down into LHS/RHS like LegalizeDAG does.
-  if (!isPowerOf2_32(NumElts)) {
-NumVectorRegs = NumElts;
-NumElts = 1;
+  if (!isPowerOf2_32(EltCnt.Min)) {
+NumVectorRegs = EltCnt.Min;
+EltCnt.Min = 1;
   }
 
   // Divide the input until we get to a supported size.  This will always
   // end with a scalar if the target doesn't support vectors.
-  while (NumElts > 1 && !isTypeLegal(
-   EVT::getVectorVT(Context, EltTy, NumElts))) {
-NumElts >>= 1;
+  while (EltCnt.Min > 1 &&
+ !isTypeLegal(EVT::getVectorVT(Context, EltTy, EltCnt))) {
+EltCnt.Min >>= 1;
 NumVectorRegs <<= 1;
   }
 
   NumIntermediates = NumVectorRegs;
 
-  EVT NewVT = EVT::getVectorVT(Context, EltTy, NumElts);
+  EVT NewVT = EVT::getVectorVT(Context, EltTy, EltCnt);
   if (!isTypeLegal(NewVT))
 NewVT = EltTy;
   IntermediateVT = NewVT;
Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
===
--- llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -731,10 +731,10 @@
 IntermediateVT.getVectorNumElements() : 1;
 
   // Convert the vector to the appropriate type if necessary.
-  unsigned DestVectorNoElts = NumIntermediates * IntermediateNumElts;
-
+  auto DestEltCnt = ElementCount(NumIntermediates * IntermediateNumElts,
+ ValueVT.isScalableVector());
   EVT BuiltVectorTy = EVT::getVectorVT(
-  *DAG.getContext(), 

[PATCH] D78812: [SVE][CodeGen] Fix legalisation for scalable types

2020-04-24 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: sdesmalen, efriedma, huntergr.
Herald added subscribers: psnobl, rkruppe, hiraditya, tschuett.
Herald added a project: LLVM.

This patch handles illegal scalable types when lowering IR operations,
addressing several places where the value of isScalableVector() is
ignored.

For types such as <vscale x 8 x i32>, this means splitting the
operations. In this example, we would split it into two
operations of type <vscale x 4 x i32> for the low and high halves.

In cases such as <vscale x 2 x i32>, the elements in the vector
will be promoted. In this case they will be promoted to
i64 (with a vector of type <vscale x 2 x i64>).

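Summarising the legalisation cases handled here (illustrative; the exact
types follow the tests in the diff below):

  //   <vscale x 4 x i32>   legal            -> used directly
  //   <vscale x 8 x i32>   too wide         -> split into two <vscale x 4 x i32>
  //   <vscale x 2 x i32>   narrow elements  -> promoted to <vscale x 2 x i64>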

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D78812

Files:
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/lib/CodeGen/TargetLoweringBase.cpp
  llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll

Index: llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
===
--- llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
+++ llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
@@ -22,6 +22,37 @@
   ret  %div
 }
 
+define  @sdiv_split_i32( %a,  %b) {
+; CHECK-LABEL: @sdiv_split_i32
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: sdiv z0.s, p0/m, z0.s, z2.s
+; CHECK-DAG: sdiv z1.s, p0/m, z1.s, z3.s
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
+define  @sdiv_widen_i32( %a,  %b) {
+; CHECK-LABEL: @sdiv_widen_i32
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: sxtw z1.d, p0/m, z1.d
+; CHECK-DAG: sxtw z0.d, p0/m, z0.d
+; CHECK-DAG: sdiv z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
+define  @sdiv_split_i64( %a,  %b) {
+; CHECK-LABEL: @sdiv_split_i64
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: sdiv z0.d, p0/m, z0.d, z2.d
+; CHECK-DAG: sdiv z1.d, p0/m, z1.d, z3.d
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
 ;
 ; UDIV
 ;
@@ -43,3 +74,34 @@
   %div = udiv  %a, %b
   ret  %div
 }
+
+define  @udiv_split_i32( %a,  %b) {
+; CHECK-LABEL: @udiv_split_i32
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: udiv z0.s, p0/m, z0.s, z2.s
+; CHECK-DAG: udiv z1.s, p0/m, z1.s, z3.s
+; CHECK-NEXT: ret
+  %div = udiv  %a, %b
+  ret  %div
+}
+
+define  @udiv_widen_i32( %a,  %b) {
+; CHECK-LABEL: @udiv_widen_i32
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: and z1.d, z1.d, #0x
+; CHECK-DAG: and z0.d, z0.d, #0x
+; CHECK-DAG: udiv z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %div = udiv  %a, %b
+  ret  %div
+}
+
+define  @udiv_split_i64( %a,  %b) {
+; CHECK-LABEL: @udiv_split_i64
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: udiv z0.d, p0/m, z0.d, z2.d
+; CHECK-DAG: udiv z1.d, p0/m, z1.d, z3.d
+; CHECK-NEXT: ret
+  %div = udiv  %a, %b
+  ret  %div
+}
Index: llvm/lib/CodeGen/TargetLoweringBase.cpp
===
--- llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -1393,6 +1393,7 @@
 unsigned ,
 MVT ) const {
   unsigned NumElts = VT.getVectorNumElements();
+  bool IsScalable = VT.isScalableVector();
 
   // If there is a wider vector type with the same element type as this one,
   // or a promoted vector type that has the same number of elements which
@@ -1424,15 +1425,15 @@
 
   // Divide the input until we get to a supported size.  This will always
   // end with a scalar if the target doesn't support vectors.
-  while (NumElts > 1 && !isTypeLegal(
-   EVT::getVectorVT(Context, EltTy, NumElts))) {
+  while (NumElts > 1 &&
+ !isTypeLegal(EVT::getVectorVT(Context, EltTy, NumElts, IsScalable))) {
 NumElts >>= 1;
 NumVectorRegs <<= 1;
   }
 
   NumIntermediates = NumVectorRegs;
 
-  EVT NewVT = EVT::getVectorVT(Context, EltTy, NumElts);
+  EVT NewVT = EVT::getVectorVT(Context, EltTy, NumElts, IsScalable);
   if (!isTypeLegal(NewVT))
 NewVT = EltTy;
   IntermediateVT = NewVT;
Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
===
--- llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -734,7 +734,8 @@
   unsigned DestVectorNoElts = NumIntermediates * IntermediateNumElts;
 
   EVT BuiltVectorTy = EVT::getVectorVT(
-  *DAG.getContext(), IntermediateVT.getScalarType(), DestVectorNoElts);
+  *DAG.getContext(), IntermediateVT.getScalarType(), DestVectorNoElts,
+  ValueVT.isScalableVector());
   if (ValueVT != BuiltVectorTy) {
 if (SDValue Widened = widenVectorToPartType(DAG, Val, DL, BuiltVectorTy))
   Val = Widened;
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78569: [SVE][CodeGen] Lower SDIV & UDIV to SVE intrinsics

2020-04-24 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin added a comment.

Thank you both for your comments on this patch, @efriedma & @sdesmalen!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78569/new/

https://reviews.llvm.org/D78569



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78569: [SVE][CodeGen] Lower SDIV & UDIV to SVE intrinsics

2020-04-24 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
kmclaughlin marked an inline comment as done.
Closed by commit rG53dd72a87aeb: [SVE][CodeGen] Lower SDIV & UDIV to SVE 
intrinsics (authored by kmclaughlin).

Changed prior to commit:
  https://reviews.llvm.org/D78569?vs=259610&id=259852#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78569/new/

https://reviews.llvm.org/D78569

Files:
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll

Index: llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
@@ -0,0 +1,45 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+
+;
+; SDIV
+;
+
+define  @sdiv_i32( %a,  %b) {
+; CHECK-LABEL: @sdiv_i32
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: sdiv z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
+define  @sdiv_i64( %a,  %b) {
+; CHECK-LABEL: @sdiv_i64
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: sdiv z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
+;
+; UDIV
+;
+
+define  @udiv_i32( %a,  %b) {
+; CHECK-LABEL: @udiv_i32
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: udiv z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: ret
+  %div = udiv  %a, %b
+  ret  %div
+}
+
+define  @udiv_i64( %a,  %b) {
+; CHECK-LABEL: @udiv_i64
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: udiv z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %div = udiv  %a, %b
+  ret  %div
+}
Index: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
===
--- llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -145,6 +145,14 @@
 def AArch64lasta   :  SDNode<"AArch64ISD::LASTA", SDT_AArch64Reduce>;
 def AArch64lastb   :  SDNode<"AArch64ISD::LASTB", SDT_AArch64Reduce>;
 
+def SDT_AArch64DIV : SDTypeProfile<1, 3, [
+  SDTCisVec<0>, SDTCisVec<1>, SDTCisVec<2>, SDTCisVec<3>,
+  SDTCVecEltisVT<1,i1>, SDTCisSameAs<2,3>
+]>;
+
+def AArch64sdiv_pred  :  SDNode<"AArch64ISD::SDIV_PRED", SDT_AArch64DIV>;
+def AArch64udiv_pred  :  SDNode<"AArch64ISD::UDIV_PRED", SDT_AArch64DIV>;
+
 def SDT_AArch64ReduceWithInit : SDTypeProfile<1, 3, [SDTCisVec<1>, SDTCisVec<3>]>;
 def AArch64clasta_n   : SDNode<"AArch64ISD::CLASTA_N",   SDT_AArch64ReduceWithInit>;
 def AArch64clastb_n   : SDNode<"AArch64ISD::CLASTB_N",   SDT_AArch64ReduceWithInit>;
@@ -239,8 +247,8 @@
   def : Pat<(mul nxv2i64:$Op1, nxv2i64:$Op2),
 (MUL_ZPmZ_D (PTRUE_D 31), $Op1, $Op2)>;
 
-  defm SDIV_ZPmZ  : sve_int_bin_pred_arit_2_div<0b100, "sdiv", int_aarch64_sve_sdiv>;
-  defm UDIV_ZPmZ  : sve_int_bin_pred_arit_2_div<0b101, "udiv", int_aarch64_sve_udiv>;
+  defm SDIV_ZPmZ  : sve_int_bin_pred_arit_2_div<0b100, "sdiv",  AArch64sdiv_pred>;
+  defm UDIV_ZPmZ  : sve_int_bin_pred_arit_2_div<0b101, "udiv",  AArch64udiv_pred>;
   defm SDIVR_ZPmZ : sve_int_bin_pred_arit_2_div<0b110, "sdivr", int_aarch64_sve_sdivr>;
   defm UDIVR_ZPmZ : sve_int_bin_pred_arit_2_div<0b111, "udivr", int_aarch64_sve_udivr>;
 
Index: llvm/lib/Target/AArch64/AArch64ISelLowering.h
===
--- llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -52,6 +52,10 @@
   ADC,
   SBC, // adc, sbc instructions
 
+  // Arithmetic instructions
+  SDIV_PRED,
+  UDIV_PRED,
+
   // Arithmetic instructions which write flags.
   ADDS,
   SUBS,
@@ -781,6 +785,8 @@
   SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG ) const;
   SDValue LowerSPLAT_VECTOR(SDValue Op, SelectionDAG ) const;
   SDValue LowerDUPQLane(SDValue Op, SelectionDAG ) const;
+  SDValue LowerDIV(SDValue Op, SelectionDAG ,
+   unsigned NewOp) const;
   SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG ) const;
   SDValue LowerVectorSRA_SRL_SHL(SDValue Op, SelectionDAG ) const;
   SDValue LowerShiftLeftParts(SDValue Op, SelectionDAG ) const;
Index: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
===
--- llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -883,8 +883,11 @@
 // splat of 0 or undef) once vector selects supported in SVE codegen. See
 // D68877 for more details.
 for (MVT VT : MVT::integer_scalable_vector_valuetypes()) {
-  if (isTypeLegal(VT))
+  if (isTypeLegal(VT)) {
 setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
+setOperationAction(ISD::SDIV, VT, Custom);
+setOperationAction(ISD::UDIV, VT, Custom);
+  }
 }
 setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i8, Custom);
 

[PATCH] D78569: [SVE][CodeGen] Lower SDIV & UDIV to SVE intrinsics

2020-04-23 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 259610.
kmclaughlin added a comment.

- Removed changes to handle legalisation from this patch (this will be included 
in a follow-up)
- Added AArch64ISD nodes for SDIV_PRED & UDIV_PRED
- Changed LowerDIV to use the new ISD nodes rather than lowering to SVE 
intrinsics (see the rough sketch after this list)
- Update tests to use CHECK-DAG
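
For reference, a rough sketch of the LowerDIV approach described above
(simplified; the all-true predicate construction via getAllTruePredicate is an
assumption here, not the committed code):

  SDValue AArch64TargetLowering::LowerDIV(SDValue Op, SelectionDAG &DAG,
                                          unsigned NewOp) const {
    SDLoc DL(Op);
    EVT VT = Op.getValueType();
    // Governing predicate: all-true, with the element count matching VT.
    EVT PredVT = EVT::getVectorVT(*DAG.getContext(), MVT::i1,
                                  VT.getVectorElementCount());
    SDValue Pg = getAllTruePredicate(DAG, DL, PredVT);   // assumed helper (ptrue)
    return DAG.getNode(NewOp, DL, VT, Pg,
                       Op.getOperand(0), Op.getOperand(1));
  }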


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78569/new/

https://reviews.llvm.org/D78569

Files:
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll

Index: llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
@@ -0,0 +1,45 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+
+;
+; SDIV
+;
+
+define  @sdiv_i32( %a,  %b) {
+; CHECK-LABEL: @sdiv_i32
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: sdiv z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
+define  @sdiv_i64( %a,  %b) {
+; CHECK-LABEL: @sdiv_i64
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: sdiv z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
+;
+; UDIV
+;
+
+define  @udiv_i32( %a,  %b) {
+; CHECK-LABEL: @udiv_i32
+; CHECK-DAG: ptrue p0.s
+; CHECK-DAG: udiv z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: ret
+  %div = udiv  %a, %b
+  ret  %div
+}
+
+define  @udiv_i64( %a,  %b) {
+; CHECK-LABEL: @udiv_i64
+; CHECK-DAG: ptrue p0.d
+; CHECK-DAG: udiv z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %div = udiv  %a, %b
+  ret  %div
+}
Index: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
===
--- llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -145,6 +145,14 @@
 def AArch64lasta   :  SDNode<"AArch64ISD::LASTA", SDT_AArch64Reduce>;
 def AArch64lastb   :  SDNode<"AArch64ISD::LASTB", SDT_AArch64Reduce>;
 
+def SDT_AArch64DIV : SDTypeProfile<1, 3, [
+  SDTCisVec<0>, SDTCisVec<1>, SDTCisVec<2>, SDTCisVec<3>,
+  SDTCVecEltisVT<1,i1>, SDTCisSameAs<2,3>
+]>;
+
+def AArch64sdiv_pred  :  SDNode<"AArch64ISD::SDIV_PRED", SDT_AArch64DIV>;
+def AArch64udiv_pred  :  SDNode<"AArch64ISD::UDIV_PRED", SDT_AArch64DIV>;
+
 def SDT_AArch64ReduceWithInit : SDTypeProfile<1, 3, [SDTCisVec<1>, SDTCisVec<3>]>;
 def AArch64clasta_n   : SDNode<"AArch64ISD::CLASTA_N",   SDT_AArch64ReduceWithInit>;
 def AArch64clastb_n   : SDNode<"AArch64ISD::CLASTB_N",   SDT_AArch64ReduceWithInit>;
@@ -239,8 +247,8 @@
   def : Pat<(mul nxv2i64:$Op1, nxv2i64:$Op2),
 (MUL_ZPmZ_D (PTRUE_D 31), $Op1, $Op2)>;
 
-  defm SDIV_ZPmZ  : sve_int_bin_pred_arit_2_div<0b100, "sdiv", int_aarch64_sve_sdiv>;
-  defm UDIV_ZPmZ  : sve_int_bin_pred_arit_2_div<0b101, "udiv", int_aarch64_sve_udiv>;
+  defm SDIV_ZPmZ  : sve_int_bin_pred_arit_2_div<0b100, "sdiv",  AArch64sdiv_pred>;
+  defm UDIV_ZPmZ  : sve_int_bin_pred_arit_2_div<0b101, "udiv",  AArch64udiv_pred>;
   defm SDIVR_ZPmZ : sve_int_bin_pred_arit_2_div<0b110, "sdivr", int_aarch64_sve_sdivr>;
   defm UDIVR_ZPmZ : sve_int_bin_pred_arit_2_div<0b111, "udivr", int_aarch64_sve_udivr>;
 
Index: llvm/lib/Target/AArch64/AArch64ISelLowering.h
===
--- llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -52,6 +52,10 @@
   ADC,
   SBC, // adc, sbc instructions
 
+  // Arithmetic instructions
+  SDIV_PRED,
+  UDIV_PRED,
+
   // Arithmetic instructions which write flags.
   ADDS,
   SUBS,
@@ -781,6 +785,8 @@
   SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG ) const;
   SDValue LowerSPLAT_VECTOR(SDValue Op, SelectionDAG ) const;
   SDValue LowerDUPQLane(SDValue Op, SelectionDAG ) const;
+  SDValue LowerDIV(SDValue Op, SelectionDAG ,
+   unsigned NewOp) const;
   SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG ) const;
   SDValue LowerVectorSRA_SRL_SHL(SDValue Op, SelectionDAG ) const;
   SDValue LowerShiftLeftParts(SDValue Op, SelectionDAG ) const;
Index: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
===
--- llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -883,8 +883,11 @@
 // splat of 0 or undef) once vector selects supported in SVE codegen. See
 // D68877 for more details.
 for (MVT VT : MVT::integer_scalable_vector_valuetypes()) {
-  if (isTypeLegal(VT))
+  if (isTypeLegal(VT)) {
 setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
+setOperationAction(ISD::SDIV, VT, Custom);
+setOperationAction(ISD::UDIV, VT, Custom);
+  }
 }
 setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i8, Custom);
 

[PATCH] D77871: [AArch64] Armv8.6-a Matrix Mult Assembly + Intrinsics

2020-04-23 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin accepted this revision.
kmclaughlin added a comment.
This revision is now accepted and ready to land.

Thanks for the updates, @LukeGeeson, LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77871/new/

https://reviews.llvm.org/D77871



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D77871: [AArch64] Armv8.6-a Matrix Mult Assembly + Intrinsics

2020-04-22 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin added inline comments.



Comment at: clang/test/CodeGen/aarch64-v8.6a-neon-intrinsics.c:3
+// RUN: -fallow-half-arguments-and-returns -S -disable-O0-optnone -emit-llvm 
-o - %s \
+// RUN: | opt -S -mem2reg \
+// RUN: | FileCheck %s

Is it possible to use -sroa here as you did for the tests added in D77872? If 
so, I think this might make some of the `_lane` tests below a bit easier to 
follow.



Comment at: llvm/test/MC/AArch64/armv8.6a-simd-matmul-error.s:17
+// For USDOT and SUDOT (indexed), the index is in range [0,3] (regardless of 
data types)
+usdot v31.2s, v1.8b,  v2.4b[4]
+// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: vector lane must be an integer in 
range [0, 3].

The arrangement specifiers of the first two operands don't match for these 
tests, which is what the next set of tests below is checking for. It might be 
worth keeping these tests specific to just the index being out of range.



Comment at: llvm/test/MC/AArch64/armv8.6a-simd-matmul-error.s:26
+
+// The arrangement specifiers of the first two operands muct match.
+usdot v31.4s, v1.8b,  v2.4b[0]

muct -> must :)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77871/new/

https://reviews.llvm.org/D77871



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D76929: [AArch64][SVE] Add SVE intrinsic for LD1RQ

2020-04-22 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG17f6e18acf5b: [AArch64][SVE] Add SVE intrinsic for LD1RQ 
(authored by kmclaughlin).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76929/new/

https://reviews.llvm.org/D76929

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/lib/Target/AArch64/AArch64InstrFormats.td
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
@@ -1,6 +1,179 @@
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
 
 ;
+; LD1RQB
+;
+
+define  @ld1rqb_i8( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8:
+; CHECK: ld1rqb { z0.b }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %addr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm:
+; CHECK: ld1rqb { z0.b }, p0/z, [x0, #16]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i8 16
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm_lower_bound( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm_lower_bound:
+; CHECK: ld1rqb { z0.b }, p0/z, [x0, #-128]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i8 -128
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm_upper_bound( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm_upper_bound:
+; CHECK: ld1rqb { z0.b }, p0/z, [x0, #112]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i8 112
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm_out_of_lower_bound( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm_out_of_lower_bound:
+; CHECK: sub x8, x0, #129
+; CHECK-NEXT: ld1rqb { z0.b }, p0/z, [x8]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i64 -129
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm_out_of_upper_bound( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm_out_of_upper_bound:
+; CHECK: add x8, x0, #113
+; CHECK-NEXT: ld1rqb { z0.b }, p0/z, [x8]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i64 113
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+;
+; LD1RQH
+;
+
+define  @ld1rqh_i16( %pred, i16* %addr) {
+; CHECK-LABEL: ld1rqh_i16:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8i16( %pred, i16* %addr)
+  ret  %res
+}
+
+define  @ld1rqh_f16( %pred, half* %addr) {
+; CHECK-LABEL: ld1rqh_f16:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8f16( %pred, half* %addr)
+  ret  %res
+}
+
+define  @ld1rqh_i16_imm( %pred, i16* %addr) {
+; CHECK-LABEL: ld1rqh_i16_imm:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0, #-64]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i16, i16* %addr, i16 -32
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8i16( %pred, i16* %ptr)
+  ret  %res
+}
+
+define  @ld1rqh_f16_imm( %pred, half* %addr) {
+; CHECK-LABEL: ld1rqh_f16_imm:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0, #-16]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds half, half* %addr, i16 -8
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8f16( %pred, half* %ptr)
+  ret  %res
+}
+
+;
+; LD1RQW
+;
+
+define  @ld1rqw_i32( %pred, i32* %addr) {
+; CHECK-LABEL: ld1rqw_i32:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4i32( %pred, i32* %addr)
+  ret  %res
+}
+
+define  @ld1rqw_f32( %pred, float* %addr) {
+; CHECK-LABEL: ld1rqw_f32:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4f32( %pred, float* %addr)
+  ret  %res
+}
+
+define  @ld1rqw_i32_imm( %pred, i32* %addr) {
+; CHECK-LABEL: ld1rqw_i32_imm:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0, #112]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i32, i32* %addr, i32 28
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4i32( %pred, i32* %ptr)
+  ret  %res
+}
+
+define  @ld1rqw_f32_imm( %pred, float* %addr) {
+; CHECK-LABEL: ld1rqw_f32_imm:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0, #32]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds float, float* %addr, i32 8
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4f32( %pred, float* %ptr)
+  ret  %res
+}
+
+;
+; LD1RQD
+;
+
+define  @ld1rqd_i64( %pred, i64* %addr) {
+; CHECK-LABEL: ld1rqd_i64:
+; CHECK: ld1rqd { z0.d }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv2i64( %pred, i64* %addr)
+ 

[PATCH] D76929: [AArch64][SVE] Add SVE intrinsic for LD1RQ

2020-04-21 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 259035.
kmclaughlin marked an inline comment as done.
kmclaughlin added a comment.

- Use Load.getValue(0) when creating a bitcast in performLD1RQCombine


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76929/new/

https://reviews.llvm.org/D76929

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/lib/Target/AArch64/AArch64InstrFormats.td
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
@@ -1,6 +1,179 @@
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
 
 ;
+; LD1RQB
+;
+
+define  @ld1rqb_i8( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8:
+; CHECK: ld1rqb { z0.b }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %addr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm:
+; CHECK: ld1rqb { z0.b }, p0/z, [x0, #16]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i8 16
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm_lower_bound( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm_lower_bound:
+; CHECK: ld1rqb { z0.b }, p0/z, [x0, #-128]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i8 -128
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm_upper_bound( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm_upper_bound:
+; CHECK: ld1rqb { z0.b }, p0/z, [x0, #112]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i8 112
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm_out_of_lower_bound( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm_out_of_lower_bound:
+; CHECK: sub x8, x0, #129
+; CHECK-NEXT: ld1rqb { z0.b }, p0/z, [x8]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i64 -129
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm_out_of_upper_bound( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm_out_of_upper_bound:
+; CHECK: add x8, x0, #113
+; CHECK-NEXT: ld1rqb { z0.b }, p0/z, [x8]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i64 113
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+;
+; LD1RQH
+;
+
+define  @ld1rqh_i16( %pred, i16* %addr) {
+; CHECK-LABEL: ld1rqh_i16:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8i16( %pred, i16* %addr)
+  ret  %res
+}
+
+define  @ld1rqh_f16( %pred, half* %addr) {
+; CHECK-LABEL: ld1rqh_f16:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8f16( %pred, half* %addr)
+  ret  %res
+}
+
+define  @ld1rqh_i16_imm( %pred, i16* %addr) {
+; CHECK-LABEL: ld1rqh_i16_imm:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0, #-64]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i16, i16* %addr, i16 -32
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8i16( %pred, i16* %ptr)
+  ret  %res
+}
+
+define  @ld1rqh_f16_imm( %pred, half* %addr) {
+; CHECK-LABEL: ld1rqh_f16_imm:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0, #-16]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds half, half* %addr, i16 -8
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8f16( %pred, half* %ptr)
+  ret  %res
+}
+
+;
+; LD1RQW
+;
+
+define  @ld1rqw_i32( %pred, i32* %addr) {
+; CHECK-LABEL: ld1rqw_i32:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4i32( %pred, i32* %addr)
+  ret  %res
+}
+
+define  @ld1rqw_f32( %pred, float* %addr) {
+; CHECK-LABEL: ld1rqw_f32:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4f32( %pred, float* %addr)
+  ret  %res
+}
+
+define  @ld1rqw_i32_imm( %pred, i32* %addr) {
+; CHECK-LABEL: ld1rqw_i32_imm:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0, #112]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i32, i32* %addr, i32 28
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4i32( %pred, i32* %ptr)
+  ret  %res
+}
+
+define  @ld1rqw_f32_imm( %pred, float* %addr) {
+; CHECK-LABEL: ld1rqw_f32_imm:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0, #32]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds float, float* %addr, i32 8
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4f32( %pred, float* %ptr)
+  ret  %res
+}
+
+;
+; LD1RQD
+;
+
+define  @ld1rqd_i64( %pred, i64* %addr) {
+; CHECK-LABEL: ld1rqd_i64:
+; CHECK: ld1rqd { z0.d }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv2i64( %pred, i64* %addr)
+  ret  %res
+}
+

[PATCH] D76929: [AArch64][SVE] Add SVE intrinsic for LD1RQ

2020-04-21 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin added inline comments.



Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:11622
+  if (VT.isFloatingPoint())
+Load = DAG.getNode(ISD::BITCAST, DL, VT, Load);
+

sdesmalen wrote:
> kmclaughlin wrote:
> > sdesmalen wrote:
> > > I'd expect this to then use `Load.getValue(0)` ?
> > I think this will have the same effect, as `Load` just returns a single 
> > value
> `SDValue LoadChain = SDValue(Load.getNode(), 1);` suggests that `Load` has 
> two return values, the result of the load, and the Chain.
> 
I think you're right - I've changed this to use the result of the load as 
suggested
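
For readers following along, a minimal sketch of the two results such a node carries (mirroring the snippet discussed in this thread; the names are illustrative rather than the committed code):

```
SDValue Load = DAG.getNode(AArch64ISD::LD1RQ, DL, {LoadVT, MVT::Other}, Ops);
SDValue Value = Load.getValue(0); // result #0: the loaded data
SDValue Chain = Load.getValue(1); // result #1: the chain, i.e. SDValue(Load.getNode(), 1)
```

Bitcasting `Load.getValue(0)` makes it explicit that only the data result is cast, while the chain result is merged back unchanged.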


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76929/new/

https://reviews.llvm.org/D76929



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78569: [SVE][CodeGen] Lower SDIV & UDIV to SVE intrinsics

2020-04-21 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: sdesmalen, c-rhodes, efriedma, cameron.mcinally.
Herald added subscribers: psnobl, rkruppe, hiraditya, kristof.beyls, tschuett.
Herald added a reviewer: rengolin.
Herald added a project: LLVM.

This patch maps IR operations for sdiv & udiv to the
@llvm.aarch64.sve.[s|u]div intrinsics.

A ptrue must be created during lowering as the div instructions
have only a predicated form.

Patch contains changes by Andrzej Warzynski.
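
For illustration, the lowering can take roughly this shape (a sketch only: the AArch64ISD::PTRUE node name, the ptrue pattern encoding and the helper name are assumptions, not necessarily what the patch commits):

```
static SDValue lowerDivToSVEIntrinsic(SDValue Op, SelectionDAG &DAG,
                                      unsigned IntrID) {
  SDLoc DL(Op);
  EVT VT = Op.getValueType();                       // e.g. nxv4i32
  EVT PredVT = VT.changeVectorElementType(MVT::i1); // matching predicate type
  // Materialise an all-active predicate (ptrue with the "all" pattern).
  SDValue Pg = DAG.getNode(AArch64ISD::PTRUE, DL, PredVT,
                           DAG.getTargetConstant(31, DL, MVT::i32));
  // Re-express the unpredicated [SU]DIV as the predicated SVE intrinsic.
  SDValue Ops[] = {DAG.getConstant(IntrID, DL, MVT::i64), Pg,
                   Op.getOperand(0), Op.getOperand(1)};
  return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, VT, Ops);
}
```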


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D78569

Files:
  llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/lib/CodeGen/TargetLoweringBase.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll

Index: llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
@@ -0,0 +1,87 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+
+;
+; SDIV
+;
+
+define  @sdiv_i32( %a,  %b) {
+; CHECK-LABEL: @sdiv_i32
+; CHECK: ptrue p0.s
+; CHECK-NEXT: sdiv z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
+define  @sdiv_i64( %a,  %b) {
+; CHECK-LABEL: @sdiv_i64
+; CHECK: ptrue p0.d
+; CHECK-NEXT: sdiv z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
+define  @sdiv_narrow( %a,  %b) {
+; CHECK-LABEL: @sdiv_narrow
+; CHECK: ptrue p0.s
+; CHECK-NEXT: sdiv z0.s, p0/m, z0.s, z2.s
+; CHECK-NEXT: sdiv z1.s, p0/m, z1.s, z3.s
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
+define  @sdiv_widen( %a,  %b) {
+; CHECK-LABEL: @sdiv_widen
+; CHECK: ptrue p0.d
+; CHECK-NEXT: sxtw z1.d, p0/m, z1.d
+; CHECK-NEXT: sxtw z0.d, p0/m, z0.d
+; CHECK-NEXT: sdiv z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %div = sdiv  %a, %b
+  ret  %div
+}
+
+;
+; UDIV
+;
+
+define  @udiv_i32( %a,  %b) {
+; CHECK-LABEL: @udiv_i32
+; CHECK: ptrue p0.s
+; CHECK-NEXT: udiv z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: ret
+  %div = udiv  %a, %b
+  ret  %div
+}
+
+define  @udiv_i64( %a,  %b) {
+; CHECK-LABEL: @udiv_i64
+; CHECK: ptrue p0.d
+; CHECK-NEXT: udiv z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %div = udiv  %a, %b
+  ret  %div
+}
+
+define  @udiv_narrow( %a,  %b) {
+; CHECK-LABEL: @udiv_narrow
+; CHECK: ptrue p0.s
+; CHECK-NEXT: udiv z0.s, p0/m, z0.s, z2.s
+; CHECK-NEXT: udiv z1.s, p0/m, z1.s, z3.s
+; CHECK-NEXT: ret
+  %div = udiv  %a, %b
+  ret  %div
+}
+
+define  @udiv_widen( %a,  %b) {
+; CHECK-LABEL: @udiv_widen
+; CHECK: ptrue p0.d
+; CHECK-NEXT: and z1.d, z1.d, #0x
+; CHECK-NEXT: and z0.d, z0.d, #0x
+; CHECK-NEXT: udiv z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+  %div = udiv  %a, %b
+  ret  %div
+}
Index: llvm/lib/Target/AArch64/AArch64ISelLowering.h
===
--- llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -776,6 +776,8 @@
   SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG ) const;
   SDValue LowerSPLAT_VECTOR(SDValue Op, SelectionDAG ) const;
   SDValue LowerDUPQLane(SDValue Op, SelectionDAG ) const;
+  SDValue LowerDIV(SDValue Op, SelectionDAG ,
+   unsigned IntrID) const;
   SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG ) const;
   SDValue LowerVectorSRA_SRL_SHL(SDValue Op, SelectionDAG ) const;
   SDValue LowerShiftLeftParts(SDValue Op, SelectionDAG ) const;
Index: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
===
--- llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -883,8 +883,11 @@
 // splat of 0 or undef) once vector selects supported in SVE codegen. See
 // D68877 for more details.
 for (MVT VT : MVT::integer_scalable_vector_valuetypes()) {
-  if (isTypeLegal(VT))
+  if (isTypeLegal(VT)) {
 setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
+setOperationAction(ISD::SDIV, VT, Custom);
+setOperationAction(ISD::UDIV, VT, Custom);
+  }
 }
 setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i8, Custom);
 setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i16, Custom);
@@ -3337,6 +3340,10 @@
 return LowerSPLAT_VECTOR(Op, DAG);
   case ISD::EXTRACT_SUBVECTOR:
 return LowerEXTRACT_SUBVECTOR(Op, DAG);
+  case ISD::SDIV:
+return LowerDIV(Op, DAG, Intrinsic::aarch64_sve_sdiv);
+  case ISD::UDIV:
+return LowerDIV(Op, DAG, Intrinsic::aarch64_sve_udiv);
   case ISD::SRA:
   case ISD::SRL:
   case ISD::SHL:
@@ -7643,6 +7650,25 @@
   return DAG.getNode(ISD::BITCAST, DL, VT, TBL);
 }
 
+SDValue AArch64TargetLowering::LowerDIV(SDValue Op,
+SelectionDAG ,
+   

[PATCH] D78509: [AArch64][SVE] Add addressing mode for contiguous loads & stores

2020-04-21 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG0df40d6ef8b8: [AArch64][SVE] Add addressing mode for
contiguous loads & stores (authored by kmclaughlin).

Changed prior to commit:
  https://reviews.llvm.org/D78509?vs=258954&id=258965#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78509/new/

https://reviews.llvm.org/D78509

Files:
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-reg.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-reg.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-reg.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-reg.ll
@@ -0,0 +1,184 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+
+;
+; ST1B
+;
+
+define void @st1b_i8( %data,  %pred, i8* %a, i64 %index) {
+; CHECK-LABEL: st1b_i8:
+; CHECK: st1b { z0.b }, p0, [x0, x1]
+; CHECK-NEXT: ret
+  %base = getelementptr i8, i8* %a, i64 %index
+  call void @llvm.aarch64.sve.st1.nxv16i8( %data,
+   %pred,
+  i8* %base)
+  ret void
+}
+
+
+
+define void @st1b_h( %data,  %pred, i8* %a, i64 %index) {
+; CHECK-LABEL: st1b_h:
+; CHECK: st1b { z0.h }, p0, [x0, x1]
+; CHECK-NEXT: ret
+  %base = getelementptr i8, i8* %a, i64 %index
+  %trunc = trunc  %data to 
+  call void @llvm.aarch64.sve.st1.nxv8i8( %trunc,
+  %pred,
+ i8* %base)
+  ret void
+}
+
+define void @st1b_s( %data,  %pred, i8* %a, i64 %index) {
+; CHECK-LABEL: st1b_s:
+; CHECK: st1b { z0.s }, p0, [x0, x1]
+; CHECK-NEXT: ret
+  %base = getelementptr i8, i8* %a, i64 %index
+  %trunc = trunc  %data to 
+  call void @llvm.aarch64.sve.st1.nxv4i8( %trunc,
+  %pred,
+ i8* %base)
+  ret void
+}
+
+define void @st1b_d( %data,  %pred, i8* %a, i64 %index) {
+; CHECK-LABEL: st1b_d:
+; CHECK: st1b { z0.d }, p0, [x0, x1]
+; CHECK-NEXT: ret
+  %base = getelementptr i8, i8* %a, i64 %index
+  %trunc = trunc  %data to 
+  call void @llvm.aarch64.sve.st1.nxv2i8( %trunc,
+  %pred,
+ i8* %base)
+  ret void
+}
+
+;
+; ST1H
+;
+
+define void @st1h_i16( %data,  %pred, i16* %a, i64 %index) {
+; CHECK-LABEL: st1h_i16:
+; CHECK: st1h { z0.h }, p0, [x0, x1, lsl #1]
+; CHECK-NEXT: ret
+  %base = getelementptr i16, i16* %a, i64 %index
+  call void @llvm.aarch64.sve.st1.nxv8i16( %data,
+   %pred,
+  i16* %base)
+  ret void
+}
+
+define void @st1h_f16( %data,  %pred, half* %a, i64 %index) {
+; CHECK-LABEL: st1h_f16:
+; CHECK: st1h { z0.h }, p0, [x0, x1, lsl #1]
+; CHECK-NEXT: ret
+  %base = getelementptr half, half* %a, i64 %index
+  call void @llvm.aarch64.sve.st1.nxv8f16( %data,
+   %pred,
+  half* %base)
+  ret void
+}
+
+define void @st1h_s( %data,  %pred, i16* %addr) {
+; CHECK-LABEL: st1h_s:
+; CHECK: st1h { z0.s }, p0, [x0]
+; CHECK-NEXT: ret
+  %trunc = trunc  %data to 
+  call void @llvm.aarch64.sve.st1.nxv4i16( %trunc,
+  %pred,
+ i16* %addr)
+  ret void
+}
+
+define void @st1h_d( %data,  %pred, i16* %a, i64 %index) {
+; CHECK-LABEL: st1h_d:
+; CHECK: st1h { z0.d }, p0, [x0, x1, lsl #1]
+; CHECK-NEXT: ret
+  %base = getelementptr i16, i16* %a, i64 %index
+  %trunc = trunc  %data to 
+  call void @llvm.aarch64.sve.st1.nxv2i16( %trunc,
+  %pred,
+ i16* %base)
+  ret void
+}
+
+;
+; ST1W
+;
+
+define void @st1w_i32( %data,  %pred, i32* %a, i64 %index) {
+; CHECK-LABEL: st1w_i32:
+; CHECK: st1w { z0.s }, p0, [x0, x1, lsl #2]
+; CHECK-NEXT: ret
+  %base = getelementptr i32, i32* %a, i64 %index
+  call void @llvm.aarch64.sve.st1.nxv4i32( %data,
+   %pred,
+  i32* %base)
+  ret void
+}
+
+define void @st1w_f32( %data,  %pred, float* %a, i64 %index) {
+; CHECK-LABEL: st1w_f32:
+; CHECK: st1w { z0.s }, p0, [x0, x1, lsl #2]
+; CHECK-NEXT: ret
+  %base = getelementptr float, float* %a, i64 %index
+  call void @llvm.aarch64.sve.st1.nxv4f32( %data,
+   %pred,
+  

[PATCH] D78509: [AArch64][SVE] Add addressing mode for contiguous loads & stores

2020-04-21 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin added a comment.

Thanks for taking a look at this, @fpetrogalli!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78509/new/

https://reviews.llvm.org/D78509



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78509: [AArch64][SVE] Add addressing mode for contiguous loads & stores

2020-04-21 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 258954.
kmclaughlin marked 5 inline comments as done.
kmclaughlin added a comment.

- Renamed ld1nf multiclass to ldnf1
- Split out existing reg+imm tests into their own files
- Renamed 'offset' to 'index' in reg+reg tests


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78509/new/

https://reviews.llvm.org/D78509

Files:
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-reg.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-reg.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll
@@ -47,101 +47,6 @@
   ret void
 }
 
-define void @st1b_upper_bound( %data,  %pg, i8* %a) {
-; CHECK-LABEL: st1b_upper_bound:
-; CHECK: st1b { z0.b }, p0, [x0, #7, mul vl]
-; CHECK-NEXT: ret
-  %base_scalable = bitcast i8* %a to *
-  %base = getelementptr , * %base_scalable, i64 7
-  %base_scalar = bitcast * %base to i8*
-  call void @llvm.aarch64.sve.st1.nxv16i8( %data,  %pg, i8* %base_scalar)
-  ret void
-}
-
-define void @st1b_inbound( %data,  %pg, i8* %a) {
-; CHECK-LABEL: st1b_inbound:
-; CHECK: st1b { z0.b }, p0, [x0, #1, mul vl]
-; CHECK-NEXT: ret
-  %base_scalable = bitcast i8* %a to *
-  %base = getelementptr , * %base_scalable, i64 1
-  %base_scalar = bitcast * %base to i8*
-  call void @llvm.aarch64.sve.st1.nxv16i8( %data,  %pg, i8* %base_scalar)
-  ret void
-}
-
-define void @st1b_lower_bound( %data,  %pg, i8* %a) {
-; CHECK-LABEL: st1b_lower_bound:
-; CHECK: st1b { z0.b }, p0, [x0, #-8, mul vl]
-; CHECK-NEXT: ret
-  %base_scalable = bitcast i8* %a to *
-  %base = getelementptr , * %base_scalable, i64 -8
-  %base_scalar = bitcast * %base to i8*
-  call void @llvm.aarch64.sve.st1.nxv16i8( %data,  %pg, i8* %base_scalar)
-  ret void
-}
-
-define void @st1b_out_of_upper_bound( %data,  %pg, i8* %a) {
-; CHECK-LABEL: st1b_out_of_upper_bound:
-; CHECK: rdvl x[[OFFSET:[0-9]+]], #8
-; CHECK: add x[[BASE:[0-9]+]], x0, x[[OFFSET]]
-; CHECK: st1b { z0.b }, p0, [x[[BASE]]]
-; CHECK-NEXT: ret
-  %base_scalable = bitcast i8* %a to *
-  %base = getelementptr , * %base_scalable, i64 8
-  %base_scalar = bitcast * %base to i8*
-  call void @llvm.aarch64.sve.st1.nxv16i8( %data,  %pg, i8* %base_scalar)
-  ret void
-}
-
-define void @st1b_out_of_lower_bound( %data,  %pg, i8* %a) {
-; CHECK-LABEL: st1b_out_of_lower_bound:
-; CHECK: rdvl x[[OFFSET:[0-9]+]], #-9
-; CHECK: add x[[BASE:[0-9]+]], x0, x[[OFFSET]]
-; CHECK: st1b { z0.b }, p0, [x[[BASE]]]
-; CHECK-NEXT: ret
-  %base_scalable = bitcast i8* %a to *
-  %base = getelementptr , * %base_scalable, i64 -9
-  %base_scalar = bitcast * %base to i8*
-  call void @llvm.aarch64.sve.st1.nxv16i8( %data,  %pg, i8* %base_scalar)
-  ret void
-}
-
-define void @st1b_s_inbound( %data,  %pg, i8* %a) {
-; CHECK-LABEL: st1b_s_inbound:
-; CHECK: st1b { z0.s }, p0, [x0, #7, mul vl]
-; CHECK-NEXT: ret
-  %base_scalable = bitcast i8* %a to *
-  %base = getelementptr , * %base_scalable, i64 7
-  %base_scalar = bitcast * %base to i8*
-  %trunc = trunc  %data to 
-  call void @llvm.aarch64.sve.st1.nxv4i8( %trunc,  %pg, i8* %base_scalar)
-  ret void
-}
-
-define void @st1b_h_inbound( %data,  %pg, i8* %a) {
-; CHECK-LABEL: st1b_h_inbound:
-; CHECK: st1b { z0.h }, p0, [x0, #1, mul vl]
-; CHECK-NEXT: ret
-  %base_scalable = bitcast i8* %a to *
-  %base = getelementptr , * %base_scalable, i64 1
-  %base_scalar = bitcast * %base to i8*
-  %trunc = trunc  %data to 
-  call void @llvm.aarch64.sve.st1.nxv8i8( %trunc,  %pg, i8* %base_scalar)
-  ret void
-}
-
-define void @st1b_d_inbound( %data,  %pg, i8* %a) {
-; CHECK-LABEL: st1b_d_inbound:
-; CHECK: st1b { z0.d }, p0, [x0, #-7, mul vl]
-; CHECK-NEXT: ret
-  %base_scalable = bitcast i8* %a to *
-  %base = getelementptr , * %base_scalable, i64 -7
-  %base_scalar = bitcast * %base to i8*
-  %trunc = trunc  %data to 
-  call void @llvm.aarch64.sve.st1.nxv2i8( %trunc,  %pg, i8* %base_scalar)
-  ret void
-}
-
 ;
 ; ST1H
 ;
@@ -188,52 +93,6 @@
   ret void
 }
 
-define void @st1h_inbound( %data,  %pg, i16* %a) {
-; CHECK-LABEL: st1h_inbound:
-; CHECK: st1h { z0.h }, p0, [x0, #-1, mul vl]
-; CHECK-NEXT: ret
-  %base_scalable = bitcast i16* %a to *
-  %base = getelementptr , * %base_scalable, i64 -1
-  %base_scalar = bitcast * %base to i16*
-  call void @llvm.aarch64.sve.st1.nxv8i16( %data,  %pg, i16* %base_scalar)
-  ret void
-}
-
-define void @st1h_f16_inbound( %data,  %pg, half* %a) {
-; CHECK-LABEL: st1h_f16_inbound:
-; CHECK: st1h { z0.h }, p0, [x0, #-5, mul vl]
-; CHECK-NEXT: ret
-  %base_scalable = bitcast half* %a to *
-  

[PATCH] D78509: [AArch64][SVE] Add addressing mode for contiguous loads & stores

2020-04-20 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: sdesmalen, fpetrogalli, efriedma.
Herald added subscribers: danielkiss, psnobl, rkruppe, hiraditya, 
kristof.beyls, tschuett.
Herald added a reviewer: rengolin.
Herald added a project: LLVM.

This patch adds the register + register addressing mode for
SVE contiguous load and store intrinsics (LD1 & ST1)


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D78509

Files:
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1-addressing-mode-reg-reg.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-reg.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll
@@ -83,8 +83,7 @@
 define void @st1b_out_of_upper_bound( %data,  %pg, i8* %a) {
 ; CHECK-LABEL: st1b_out_of_upper_bound:
 ; CHECK: rdvl x[[OFFSET:[0-9]+]], #8
-; CHECK: add x[[BASE:[0-9]+]], x0, x[[OFFSET]]
-; CHECK: st1b { z0.b }, p0, [x[[BASE]]]
+; CHECK: st1b { z0.b }, p0, [x0, x[[OFFSET]]]
 ; CHECK-NEXT: ret
   %base_scalable = bitcast i8* %a to *
   %base = getelementptr , * %base_scalable, i64 8
@@ -96,8 +95,7 @@
 define void @st1b_out_of_lower_bound( %data,  %pg, i8* %a) {
 ; CHECK-LABEL: st1b_out_of_lower_bound:
 ; CHECK: rdvl x[[OFFSET:[0-9]+]], #-9
-; CHECK: add x[[BASE:[0-9]+]], x0, x[[OFFSET]]
-; CHECK: st1b { z0.b }, p0, [x[[BASE]]]
+; CHECK: st1b { z0.b }, p0, [x0, x[[OFFSET]]]
 ; CHECK-NEXT: ret
   %base_scalable = bitcast i8* %a to *
   %base = getelementptr , * %base_scalable, i64 -9
Index: llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-reg.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-st1-addressing-mode-reg-reg.ll
@@ -0,0 +1,184 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+
+;
+; ST1B
+;
+
+define void @st1b_i8( %data,  %pred, i8* %a, i64 %offset) {
+; CHECK-LABEL: st1b_i8:
+; CHECK: st1b { z0.b }, p0, [x0, x1]
+; CHECK-NEXT: ret
+  %base = getelementptr i8, i8* %a, i64 %offset
+  call void @llvm.aarch64.sve.st1.nxv16i8( %data,
+   %pred,
+  i8* %base)
+  ret void
+}
+
+
+
+define void @st1b_h( %data,  %pred, i8* %a, i64 %offset) {
+; CHECK-LABEL: st1b_h:
+; CHECK: st1b { z0.h }, p0, [x0, x1]
+; CHECK-NEXT: ret
+  %base = getelementptr i8, i8* %a, i64 %offset
+  %trunc = trunc  %data to 
+  call void @llvm.aarch64.sve.st1.nxv8i8( %trunc,
+  %pred,
+ i8* %base)
+  ret void
+}
+
+define void @st1b_s( %data,  %pred, i8* %a, i64 %offset) {
+; CHECK-LABEL: st1b_s:
+; CHECK: st1b { z0.s }, p0, [x0, x1]
+; CHECK-NEXT: ret
+  %base = getelementptr i8, i8* %a, i64 %offset
+  %trunc = trunc  %data to 
+  call void @llvm.aarch64.sve.st1.nxv4i8( %trunc,
+  %pred,
+ i8* %base)
+  ret void
+}
+
+define void @st1b_d( %data,  %pred, i8* %a, i64 %offset) {
+; CHECK-LABEL: st1b_d:
+; CHECK: st1b { z0.d }, p0, [x0, x1]
+; CHECK-NEXT: ret
+  %base = getelementptr i8, i8* %a, i64 %offset
+  %trunc = trunc  %data to 
+  call void @llvm.aarch64.sve.st1.nxv2i8( %trunc,
+  %pred,
+ i8* %base)
+  ret void
+}
+
+;
+; ST1H
+;
+
+define void @st1h_i16( %data,  %pred, i16* %a, i64 %offset) {
+; CHECK-LABEL: st1h_i16:
+; CHECK: st1h { z0.h }, p0, [x0, x1, lsl #1]
+; CHECK-NEXT: ret
+  %base = getelementptr i16, i16* %a, i64 %offset
+  call void @llvm.aarch64.sve.st1.nxv8i16( %data,
+   %pred,
+  i16* %base)
+  ret void
+}
+
+define void @st1h_f16( %data,  %pred, half* %a, i64 %offset) {
+; CHECK-LABEL: st1h_f16:
+; CHECK: st1h { z0.h }, p0, [x0, x1, lsl #1]
+; CHECK-NEXT: ret
+  %base = getelementptr half, half* %a, i64 %offset
+  call void @llvm.aarch64.sve.st1.nxv8f16( %data,
+   %pred,
+  half* %base)
+  ret void
+}
+
+define void @st1h_s( %data,  %pred, i16* %addr) {
+; CHECK-LABEL: st1h_s:
+; CHECK: st1h { z0.s }, p0, [x0]
+; CHECK-NEXT: ret
+  %trunc = trunc  %data to 
+  call void @llvm.aarch64.sve.st1.nxv4i16( %trunc,
+  %pred,
+ i16* %addr)
+  ret void
+}
+
+define void @st1h_d( %data,  %pred, i16* %a, i64 %offset) {
+; CHECK-LABEL: st1h_d:
+; CHECK: st1h { z0.d }, p0, [x0, x1, lsl #1]
+; CHECK-NEXT: ret
+  %base = getelementptr i16, i16* %a, i64 %offset
+  %trunc = trunc  

[PATCH] D78204: [AArch64][SVE] Remove LD1/ST1 dependency on llvm.masked.load/store

2020-04-20 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG33ffce5414ec: [AArch64][SVE] Remove LD1/ST1 dependency on 
llvm.masked.load/store (authored by kmclaughlin).

Changed prior to commit:
  https://reviews.llvm.org/D78204?vs=257702&id=258694#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78204/new/

https://reviews.llvm.org/D78204

Files:
  llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ldst1.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll
@@ -0,0 +1,367 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+
+;
+; ST1B
+;
+
+define void @st1b_i8( %data,  %pred, i8* %addr) {
+; CHECK-LABEL: st1b_i8:
+; CHECK: st1b { z0.b }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.aarch64.sve.st1.nxv16i8( %data,
+   %pred,
+  i8* %addr)
+  ret void
+}
+
+define void @st1b_h( %data,  %pred, i8* %addr) {
+; CHECK-LABEL: st1b_h:
+; CHECK: st1b { z0.h }, p0, [x0]
+; CHECK-NEXT: ret
+  %trunc = trunc  %data to 
+  call void @llvm.aarch64.sve.st1.nxv8i8( %trunc,
+  %pred,
+ i8* %addr)
+  ret void
+}
+
+define void @st1b_s( %data,  %pred, i8* %addr) {
+; CHECK-LABEL: st1b_s:
+; CHECK: st1b { z0.s }, p0, [x0]
+; CHECK-NEXT: ret
+  %trunc = trunc  %data to 
+  call void @llvm.aarch64.sve.st1.nxv4i8( %trunc,
+  %pred,
+ i8* %addr)
+  ret void
+}
+
+define void @st1b_d( %data,  %pred, i8* %addr) {
+; CHECK-LABEL: st1b_d:
+; CHECK: st1b { z0.d }, p0, [x0]
+; CHECK-NEXT: ret
+  %trunc = trunc  %data to 
+  call void @llvm.aarch64.sve.st1.nxv2i8( %trunc,
+  %pred,
+ i8* %addr)
+  ret void
+}
+
+define void @st1b_upper_bound( %data,  %pg, i8* %a) {
+; CHECK-LABEL: st1b_upper_bound:
+; CHECK: st1b { z0.b }, p0, [x0, #7, mul vl]
+; CHECK-NEXT: ret
+  %base_scalable = bitcast i8* %a to *
+  %base = getelementptr , * %base_scalable, i64 7
+  %base_scalar = bitcast * %base to i8*
+  call void @llvm.aarch64.sve.st1.nxv16i8( %data,  %pg, i8* %base_scalar)
+  ret void
+}
+
+define void @st1b_inbound( %data,  %pg, i8* %a) {
+; CHECK-LABEL: st1b_inbound:
+; CHECK: st1b { z0.b }, p0, [x0, #1, mul vl]
+; CHECK-NEXT: ret
+  %base_scalable = bitcast i8* %a to *
+  %base = getelementptr , * %base_scalable, i64 1
+  %base_scalar = bitcast * %base to i8*
+  call void @llvm.aarch64.sve.st1.nxv16i8( %data,  %pg, i8* %base_scalar)
+  ret void
+}
+
+define void @st1b_lower_bound( %data,  %pg, i8* %a) {
+; CHECK-LABEL: st1b_lower_bound:
+; CHECK: st1b { z0.b }, p0, [x0, #-8, mul vl]
+; CHECK-NEXT: ret
+  %base_scalable = bitcast i8* %a to *
+  %base = getelementptr , * %base_scalable, i64 -8
+  %base_scalar = bitcast * %base to i8*
+  call void @llvm.aarch64.sve.st1.nxv16i8( %data,  %pg, i8* %base_scalar)
+  ret void
+}
+
+define void @st1b_out_of_upper_bound( %data,  %pg, i8* %a) {
+; CHECK-LABEL: st1b_out_of_upper_bound:
+; CHECK: rdvl x[[OFFSET:[0-9]+]], #8
+; CHECK: add x[[BASE:[0-9]+]], x0, x[[OFFSET]]
+; CHECK: st1b { z0.b }, p0, [x[[BASE]]]
+; CHECK-NEXT: ret
+  %base_scalable = bitcast i8* %a to *
+  %base = getelementptr , * %base_scalable, i64 8
+  %base_scalar = bitcast * %base to i8*
+  call void @llvm.aarch64.sve.st1.nxv16i8( %data,  %pg, i8* %base_scalar)
+  ret void
+}
+
+define void @st1b_out_of_lower_bound( %data,  %pg, i8* %a) {
+; CHECK-LABEL: st1b_out_of_lower_bound:
+; CHECK: rdvl x[[OFFSET:[0-9]+]], #-9
+; CHECK: add x[[BASE:[0-9]+]], x0, x[[OFFSET]]
+; CHECK: st1b { z0.b }, p0, [x[[BASE]]]
+; CHECK-NEXT: ret
+  %base_scalable = bitcast i8* %a to *
+  %base = getelementptr , * %base_scalable, i64 -9
+  %base_scalar = bitcast * %base to i8*
+  call void @llvm.aarch64.sve.st1.nxv16i8( %data,  %pg, i8* %base_scalar)
+  ret void
+}
+
+define void @st1b_s_inbound( %data,  %pg, i8* %a) {
+; CHECK-LABEL: st1b_s_inbound:
+; CHECK: st1b { z0.s }, p0, [x0, #7, mul vl]
+; CHECK-NEXT: ret
+  %base_scalable = bitcast i8* %a to *
+  %base = getelementptr , * %base_scalable, i64 7
+  %base_scalar = bitcast * %base to i8*
+  %trunc = trunc  %data to 
+  call void @llvm.aarch64.sve.st1.nxv4i8( %trunc,  %pg, i8* %base_scalar)
+  ret void
+}
+
+define void @st1b_h_inbound( %data,  %pg, i8* %a) {
+; CHECK-LABEL: st1b_h_inbound:
+; CHECK: st1b { z0.h }, p0, [x0, #1, mul vl]
+; CHECK-NEXT: ret
+  

[PATCH] D76929: [AArch64][SVE] Add SVE intrinsic for LD1RQ

2020-04-15 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin added inline comments.



Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:11622
+  if (VT.isFloatingPoint())
+Load = DAG.getNode(ISD::BITCAST, DL, VT, Load);
+

sdesmalen wrote:
> I'd expect this to then use `Load.getValue(0)` ?
I think this will have the same effect, as `Load` just returns a single value


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76929/new/

https://reviews.llvm.org/D76929



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78204: [AArch64][SVE] Remove LD1/ST1 dependency on llvm.masked.load/store

2020-04-15 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: sdesmalen, efriedma, cameron.mcinally, c-rhodes.
Herald added subscribers: danielkiss, psnobl, rkruppe, hiraditya, 
kristof.beyls, tschuett.
Herald added a reviewer: rengolin.
Herald added a project: LLVM.

The SVE masked load and store intrinsics introduced in D76688 rely on
common llvm.masked.load/store nodes. This patch creates new ISD nodes
for LD1(S) & ST1 to remove this dependency.

Additionally, this adds support for sign & zero extending
loads and truncating stores.
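
A rough sketch of what replacing the masked-load path with a dedicated node looks like (the AArch64ISD::LD1 name and operand positions are assumptions for illustration, not the committed code):

```
static SDValue lowerSveLd1ToNode(SDNode *N, SelectionDAG &DAG) {
  SDLoc DL(N);
  EVT VT = N->getValueType(0);
  // For ISD::INTRINSIC_W_CHAIN, operand 0 is the chain, operand 1 the
  // intrinsic ID, and operands 2.. are the intrinsic's own arguments.
  SDValue Ops[] = {N->getOperand(0),  // chain
                   N->getOperand(2),  // governing predicate
                   N->getOperand(3)}; // base pointer
  // The dedicated node yields the loaded value plus the updated chain, so
  // no llvm.masked.load node is ever created for this intrinsic.
  return DAG.getNode(AArch64ISD::LD1, DL, {VT, MVT::Other}, Ops);
}
```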


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D78204

Files:
  llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-ld1.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-ldst1.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-st1.ll
@@ -0,0 +1,367 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+
+;
+; ST1B
+;
+
+define void @st1b_i8( %data,  %pred, i8* %addr) {
+; CHECK-LABEL: st1b_i8:
+; CHECK: st1b { z0.b }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.aarch64.sve.st1.nxv16i8( %data,
+   %pred,
+  i8* %addr)
+  ret void
+}
+
+define void @st1b_h( %data,  %pred, i8* %addr) {
+; CHECK-LABEL: st1b_h:
+; CHECK: st1b { z0.h }, p0, [x0]
+; CHECK-NEXT: ret
+  %trunc = trunc  %data to 
+  call void @llvm.aarch64.sve.st1.nxv8i8( %trunc,
+  %pred,
+ i8* %addr)
+  ret void
+}
+
+define void @st1b_s( %data,  %pred, i8* %addr) {
+; CHECK-LABEL: st1b_s:
+; CHECK: st1b { z0.s }, p0, [x0]
+; CHECK-NEXT: ret
+  %trunc = trunc  %data to 
+  call void @llvm.aarch64.sve.st1.nxv4i8( %trunc,
+  %pred,
+ i8* %addr)
+  ret void
+}
+
+define void @st1b_d( %data,  %pred, i8* %addr) {
+; CHECK-LABEL: st1b_d:
+; CHECK: st1b { z0.d }, p0, [x0]
+; CHECK-NEXT: ret
+  %trunc = trunc  %data to 
+  call void @llvm.aarch64.sve.st1.nxv2i8( %trunc,
+  %pred,
+ i8* %addr)
+  ret void
+}
+
+define void @st1b_upper_bound( %data,  %pg, i8* %a) {
+; CHECK-LABEL: st1b_upper_bound:
+; CHECK: st1b { z0.b }, p0, [x0, #7, mul vl]
+; CHECK-NEXT: ret
+  %base_scalable = bitcast i8* %a to *
+  %base = getelementptr , * %base_scalable, i64 7
+  %base_scalar = bitcast * %base to i8*
+  call void @llvm.aarch64.sve.st1.nxv16i8( %data,  %pg, i8* %base_scalar)
+  ret void
+}
+
+define void @st1b_inbound( %data,  %pg, i8* %a) {
+; CHECK-LABEL: st1b_inbound:
+; CHECK: st1b { z0.b }, p0, [x0, #1, mul vl]
+; CHECK-NEXT: ret
+  %base_scalable = bitcast i8* %a to *
+  %base = getelementptr , * %base_scalable, i64 1
+  %base_scalar = bitcast * %base to i8*
+  call void @llvm.aarch64.sve.st1.nxv16i8( %data,  %pg, i8* %base_scalar)
+  ret void
+}
+
+define void @st1b_lower_bound( %data,  %pg, i8* %a) {
+; CHECK-LABEL: st1b_lower_bound:
+; CHECK: st1b { z0.b }, p0, [x0, #-8, mul vl]
+; CHECK-NEXT: ret
+  %base_scalable = bitcast i8* %a to *
+  %base = getelementptr , * %base_scalable, i64 -8
+  %base_scalar = bitcast * %base to i8*
+  call void @llvm.aarch64.sve.st1.nxv16i8( %data,  %pg, i8* %base_scalar)
+  ret void
+}
+
+define void @st1b_out_of_upper_bound( %data,  %pg, i8* %a) {
+; CHECK-LABEL: st1b_out_of_upper_bound:
+; CHECK: rdvl x[[OFFSET:[0-9]+]], #8
+; CHECK: add x[[BASE:[0-9]+]], x0, x[[OFFSET]]
+; CHECK: st1b { z0.b }, p0, [x[[BASE]]]
+; CHECK-NEXT: ret
+  %base_scalable = bitcast i8* %a to *
+  %base = getelementptr , * %base_scalable, i64 8
+  %base_scalar = bitcast * %base to i8*
+  call void @llvm.aarch64.sve.st1.nxv16i8( %data,  %pg, i8* %base_scalar)
+  ret void
+}
+
+define void @st1b_out_of_lower_bound( %data,  %pg, i8* %a) {
+; CHECK-LABEL: st1b_out_of_lower_bound:
+; CHECK: rdvl x[[OFFSET:[0-9]+]], #-9
+; CHECK: add x[[BASE:[0-9]+]], x0, x[[OFFSET]]
+; CHECK: st1b { z0.b }, p0, [x[[BASE]]]
+; CHECK-NEXT: ret
+  %base_scalable = bitcast i8* %a to *
+  %base = getelementptr , * %base_scalable, i64 -9
+  %base_scalar = bitcast * %base to i8*
+  call void @llvm.aarch64.sve.st1.nxv16i8( %data,  %pg, i8* %base_scalar)
+  ret void
+}
+
+define void @st1b_s_inbound( %data,  %pg, i8* %a) {
+; CHECK-LABEL: st1b_s_inbound:
+; CHECK: st1b { z0.s }, p0, [x0, #7, mul vl]
+; CHECK-NEXT: ret
+  %base_scalable = bitcast i8* %a to *
+  %base = getelementptr , * %base_scalable, i64 7
+  %base_scalar = bitcast * %base to i8*
+  %trunc = trunc  %data to 
+  call void 

[PATCH] D76929: [AArch64][SVE] Add SVE intrinsic for LD1RQ

2020-04-15 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 257657.
kmclaughlin marked an inline comment as done.
kmclaughlin added a comment.

Ensure LoadChain is always preserved in performLD1RQCombine


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76929/new/

https://reviews.llvm.org/D76929

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/lib/Target/AArch64/AArch64InstrFormats.td
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
@@ -1,6 +1,179 @@
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
 
 ;
+; LD1RQB
+;
+
+define  @ld1rqb_i8( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8:
+; CHECK: ld1rqb { z0.b }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %addr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm:
+; CHECK: ld1rqb { z0.b }, p0/z, [x0, #16]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i8 16
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm_lower_bound( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm_lower_bound:
+; CHECK: ld1rqb { z0.b }, p0/z, [x0, #-128]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i8 -128
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm_upper_bound( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm_upper_bound:
+; CHECK: ld1rqb { z0.b }, p0/z, [x0, #112]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i8 112
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm_out_of_lower_bound( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm_out_of_lower_bound:
+; CHECK: sub x8, x0, #129
+; CHECK-NEXT: ld1rqb { z0.b }, p0/z, [x8]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i64 -129
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm_out_of_upper_bound( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm_out_of_upper_bound:
+; CHECK: add x8, x0, #113
+; CHECK-NEXT: ld1rqb { z0.b }, p0/z, [x8]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i64 113
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+;
+; LD1RQH
+;
+
+define  @ld1rqh_i16( %pred, i16* %addr) {
+; CHECK-LABEL: ld1rqh_i16:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8i16( %pred, i16* %addr)
+  ret  %res
+}
+
+define  @ld1rqh_f16( %pred, half* %addr) {
+; CHECK-LABEL: ld1rqh_f16:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8f16( %pred, half* %addr)
+  ret  %res
+}
+
+define  @ld1rqh_i16_imm( %pred, i16* %addr) {
+; CHECK-LABEL: ld1rqh_i16_imm:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0, #-64]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i16, i16* %addr, i16 -32
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8i16( %pred, i16* %ptr)
+  ret  %res
+}
+
+define  @ld1rqh_f16_imm( %pred, half* %addr) {
+; CHECK-LABEL: ld1rqh_f16_imm:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0, #-16]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds half, half* %addr, i16 -8
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8f16( %pred, half* %ptr)
+  ret  %res
+}
+
+;
+; LD1RQW
+;
+
+define  @ld1rqw_i32( %pred, i32* %addr) {
+; CHECK-LABEL: ld1rqw_i32:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4i32( %pred, i32* %addr)
+  ret  %res
+}
+
+define  @ld1rqw_f32( %pred, float* %addr) {
+; CHECK-LABEL: ld1rqw_f32:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4f32( %pred, float* %addr)
+  ret  %res
+}
+
+define  @ld1rqw_i32_imm( %pred, i32* %addr) {
+; CHECK-LABEL: ld1rqw_i32_imm:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0, #112]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i32, i32* %addr, i32 28
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4i32( %pred, i32* %ptr)
+  ret  %res
+}
+
+define  @ld1rqw_f32_imm( %pred, float* %addr) {
+; CHECK-LABEL: ld1rqw_f32_imm:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0, #32]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds float, float* %addr, i32 8
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4f32( %pred, float* %ptr)
+  ret  %res
+}
+
+;
+; LD1RQD
+;
+
+define  @ld1rqd_i64( %pred, i64* %addr) {
+; CHECK-LABEL: ld1rqd_i64:
+; CHECK: ld1rqd { z0.d }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv2i64( %pred, i64* %addr)
+  ret  %res
+}
+
+define  

[PATCH] D76929: [AArch64][SVE] Add SVE intrinsic for LD1RQ

2020-04-14 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 257349.
kmclaughlin marked 4 inline comments as done.
kmclaughlin edited the summary of this revision.
kmclaughlin added a comment.

Simplified performLD1RQCombine method & added negative tests where the 
immediate is out of range.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76929/new/

https://reviews.llvm.org/D76929

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/lib/Target/AArch64/AArch64InstrFormats.td
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
@@ -1,6 +1,179 @@
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
 
 ;
+; LD1RQB
+;
+
+define  @ld1rqb_i8( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8:
+; CHECK: ld1rqb { z0.b }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %addr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm:
+; CHECK: ld1rqb { z0.b }, p0/z, [x0, #16]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i8 16
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm_lower_bound( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm_lower_bound:
+; CHECK: ld1rqb { z0.b }, p0/z, [x0, #-128]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i8 -128
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm_upper_bound( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm_upper_bound:
+; CHECK: ld1rqb { z0.b }, p0/z, [x0, #112]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i8 112
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm_out_of_lower_bound( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm_out_of_lower_bound:
+; CHECK: sub x8, x0, #129
+; CHECK-NEXT: ld1rqb { z0.b }, p0/z, [x8]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i64 -129
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm_out_of_upper_bound( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm_out_of_upper_bound:
+; CHECK: add x8, x0, #113
+; CHECK-NEXT: ld1rqb { z0.b }, p0/z, [x8]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i64 113
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+;
+; LD1RQH
+;
+
+define  @ld1rqh_i16( %pred, i16* %addr) {
+; CHECK-LABEL: ld1rqh_i16:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8i16( %pred, i16* %addr)
+  ret  %res
+}
+
+define  @ld1rqh_f16( %pred, half* %addr) {
+; CHECK-LABEL: ld1rqh_f16:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8f16( %pred, half* %addr)
+  ret  %res
+}
+
+define  @ld1rqh_i16_imm( %pred, i16* %addr) {
+; CHECK-LABEL: ld1rqh_i16_imm:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0, #-64]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i16, i16* %addr, i16 -32
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8i16( %pred, i16* %ptr)
+  ret  %res
+}
+
+define  @ld1rqh_f16_imm( %pred, half* %addr) {
+; CHECK-LABEL: ld1rqh_f16_imm:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0, #-16]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds half, half* %addr, i16 -8
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8f16( %pred, half* %ptr)
+  ret  %res
+}
+
+;
+; LD1RQW
+;
+
+define  @ld1rqw_i32( %pred, i32* %addr) {
+; CHECK-LABEL: ld1rqw_i32:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4i32( %pred, i32* %addr)
+  ret  %res
+}
+
+define  @ld1rqw_f32( %pred, float* %addr) {
+; CHECK-LABEL: ld1rqw_f32:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4f32( %pred, float* %addr)
+  ret  %res
+}
+
+define  @ld1rqw_i32_imm( %pred, i32* %addr) {
+; CHECK-LABEL: ld1rqw_i32_imm:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0, #112]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i32, i32* %addr, i32 28
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4i32( %pred, i32* %ptr)
+  ret  %res
+}
+
+define  @ld1rqw_f32_imm( %pred, float* %addr) {
+; CHECK-LABEL: ld1rqw_f32_imm:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0, #32]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds float, float* %addr, i32 8
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4f32( %pred, float* %ptr)
+  ret  %res
+}
+
+;
+; LD1RQD
+;
+
+define  @ld1rqd_i64( %pred, i64* %addr) {
+; CHECK-LABEL: ld1rqd_i64:
+; CHECK: ld1rqd { z0.d }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = 

[PATCH] D76929: [AArch64][SVE] Add SVE intrinsic for LD1RQ

2020-04-14 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin marked 2 inline comments as done.
kmclaughlin added inline comments.



Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:11592
 
+static SDValue performLD1RQCombine(SDNode *N, SelectionDAG ) {
+  SDLoc DL(N);

andwar wrote:
> [Nit] I think that this method could be simplified and made more explicit:
> 
> ```
> static SDValue performLD1RQCombine(SDNode *N, SelectionDAG ) {
>   SDLoc DL(N);
>   EVT VT = N->getValueType(0);
> 
>   EVT LoadVT = VT;
>   if (VT.isFloatingPoint())
> LoadVT = VT.changeTypeToInteger();
> 
>   SDValue Ops[] = {N->getOperand(0), N->getOperand(2), N->getOperand(3)};
>   SDValue Load = DAG.getNode(AArch64ISD::LD1RQ, DL, {LoadVT, MVT::Other}, 
> Ops);
> 
>   if (VT.isFloatingPoint()) {
> SDValue LoadChain = SDValue(Load.getNode(), 1);
> Load = DAG.getMergeValues(
> {DAG.getNode(ISD::BITCAST, DL, VT, Load), LoadChain}, DL);
>   }
> 
>   return Load;
> }
> ```
> 
> This way:
>  * there's only 1 `return` statement
>  * you are being explicit about preserving the chain when generating the bit 
> cast
>  * `Load` replaces a bit non-descriptive `L`
> 
> This is a matter of style, so please use what feels best. 
Done, these changes do make it neater :)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76929/new/

https://reviews.llvm.org/D76929



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D76078: [AArch64][SVE] Add a pass for SVE intrinsic optimisations

2020-04-14 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG36c76de6789c: [AArch64][SVE] Add a pass for SVE intrinsic 
optimisations (authored by kmclaughlin).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76078/new/

https://reviews.llvm.org/D76078

Files:
  llvm/lib/Target/AArch64/AArch64.h
  llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
  llvm/lib/Target/AArch64/CMakeLists.txt
  llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp
  llvm/test/CodeGen/AArch64/O3-pipeline.ll
  llvm/test/CodeGen/AArch64/sve-intrinsic-opts-ptest.ll
  llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll
@@ -0,0 +1,203 @@
+; RUN: opt -S -sve-intrinsic-opts -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck --check-prefix OPT %s
+
+define  @reinterpret_test_h( %a) {
+; OPT-LABEL: @reinterpret_test_h(
+; OPT-NOT: convert
+; OPT: ret  %a
+  %1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv8i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv8i1( %1)
+  ret  %2
+}
+
+; Reinterprets are not redundant because the second reinterpret zeros the
+; lanes that don't exist within its input.
+define  @reinterpret_test_h_rev( %a) {
+; OPT-LABEL: @reinterpret_test_h_rev(
+; OPT: %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv8i1( %a)
+; OPT-NEXT: %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv8i1( %1)
+; OPT-NEXT: ret  %2
+  %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv8i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv8i1( %1)
+  ret  %2
+}
+
+define  @reinterpret_test_w( %a) {
+; OPT-LABEL: @reinterpret_test_w(
+; OPT-NOT: convert
+; OPT: ret  %a
+  %1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv4i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv4i1( %1)
+  ret  %2
+}
+
+; Reinterprets are not redundant because the second reinterpret zeros the
+; lanes that don't exist within its input.
+define  @reinterpret_test_w_rev( %a) {
+; OPT-LABEL: @reinterpret_test_w_rev(
+; OPT: %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv4i1( %a)
+; OPT-NEXT: %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv4i1( %1)
+; OPT-NEXT: ret  %2
+  %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv4i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv4i1( %1)
+  ret  %2
+}
+
+define  @reinterpret_test_d( %a) {
+; OPT-LABEL: @reinterpret_test_d(
+; OPT-NOT: convert
+; OPT: ret  %a
+  %1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %1)
+  ret  %2
+}
+
+; Reinterprets are not redundant because the second reinterpret zeros the
+; lanes that don't exist within its input.
+define  @reinterpret_test_d_rev( %a) {
+; OPT-LABEL: @reinterpret_test_d_rev(
+; OPT: %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %a)
+; OPT-NEXT: %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %1)
+; OPT-NEXT: ret  %2
+  %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %1)
+  ret  %2
+}
+
+define  @reinterpret_reductions(i32 %cond,  %a,  %b,  %c) {
+; OPT-LABEL: reinterpret_reductions
+; OPT-NOT: convert
+; OPT-NOT: phi 
+; OPT: phi  [ %a, %br_phi_a ], [ %b, %br_phi_b ], [ %c, %br_phi_c ]
+; OPT-NOT: convert
+; OPT: ret
+
+entry:
+  switch i32 %cond, label %br_phi_c [
+ i32 43, label %br_phi_a
+ i32 45, label %br_phi_b
+  ]
+
+br_phi_a:
+  %a1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %a)
+  br label %join
+
+br_phi_b:
+  %b1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %b)
+  br label %join
+
+br_phi_c:
+  %c1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %c)
+  br label %join
+
+join:
+  %pg = phi  [ %a1, %br_phi_a ], [ %b1, %br_phi_b ], [ %c1, %br_phi_c ]
+  %pg1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %pg)
+  ret  %pg1
+}
+
+; No transform as the reinterprets are converting from different types (nxv2i1 & nxv4i1)
+; As the incoming values to the phi must all be the same type, we cannot remove the reinterprets.
+define  @reinterpret_reductions_1(i32 %cond,  %a,  %b,  %c) {
+; OPT-LABEL: reinterpret_reductions_1
+; OPT: convert
+; OPT: phi  [ %a1, %br_phi_a ], [ %b1, %br_phi_b ], [ %c1, %br_phi_c ]
+; OPT-NOT: phi 
+; OPT: tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %pg)
+; OPT: ret
+
+entry:
+  switch i32 %cond, label %br_phi_c [
+ i32 43, label %br_phi_a
+ i32 45, label %br_phi_b
+  ]
+
+br_phi_a:
+  %a1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %a)
+  br label %join
+
+br_phi_b:
+  %b1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv4i1( %b)
+  br label %join
+

[PATCH] D76078: [AArch64][SVE] Add a pass for SVE intrinsic optimisations

2020-04-09 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin added a comment.

Ping :)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76078/new/

https://reviews.llvm.org/D76078



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D77054: [AArch64][SVE] Add SVE intrinsics for saturating add & subtract

2020-04-06 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG944e322f8897: [AArch64][SVE] Add SVE intrinsics for
saturating add & subtract (authored by kmclaughlin).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77054/new/

https://reviews.llvm.org/D77054

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll
@@ -134,6 +134,82 @@
   ret  %out
 }
 
+; SQADD
+
+define  @sqadd_i8( %a,  %b) {
+; CHECK-LABEL: sqadd_i8:
+; CHECK: sqadd z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqadd.x.nxv16i8( %a,
+%b)
+  ret  %out
+}
+
+define  @sqadd_i16( %a,  %b) {
+; CHECK-LABEL: sqadd_i16:
+; CHECK: sqadd z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqadd.x.nxv8i16( %a,
+%b)
+  ret  %out
+}
+
+define  @sqadd_i32( %a,  %b) {
+; CHECK-LABEL: sqadd_i32:
+; CHECK: sqadd z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqadd.x.nxv4i32( %a,
+%b)
+  ret  %out
+}
+
+define  @sqadd_i64( %a,  %b) {
+; CHECK-LABEL: sqadd_i64:
+; CHECK: sqadd z0.d, z0.d, z1.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqadd.x.nxv2i64( %a,
+%b)
+  ret  %out
+}
+
+; SQSUB
+
+define  @sqsub_i8( %a,  %b) {
+; CHECK-LABEL: sqsub_i8:
+; CHECK: sqsub z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqsub.x.nxv16i8( %a,
+%b)
+  ret  %out
+}
+
+define  @sqsub_i16( %a,  %b) {
+; CHECK-LABEL: sqsub_i16:
+; CHECK: sqsub z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqsub.x.nxv8i16( %a,
+%b)
+  ret  %out
+}
+
+define  @sqsub_i32( %a,  %b) {
+; CHECK-LABEL: sqsub_i32:
+; CHECK: sqsub z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqsub.x.nxv4i32( %a,
+%b)
+  ret  %out
+}
+
+define  @sqsub_i64( %a,  %b) {
+; CHECK-LABEL: sqsub_i64:
+; CHECK: sqsub z0.d, z0.d, z1.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqsub.x.nxv2i64( %a,
+%b)
+  ret  %out
+}
+
 ; UDOT
 
 define  @udot_i32( %a,  %b,  %c) {
@@ -169,6 +245,82 @@
   ret  %out
 }
 
+; UQADD
+
+define  @uqadd_i8( %a,  %b) {
+; CHECK-LABEL: uqadd_i8:
+; CHECK: uqadd z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqadd.x.nxv16i8( %a,
+%b)
+  ret  %out
+}
+
+define  @uqadd_i16( %a,  %b) {
+; CHECK-LABEL: uqadd_i16:
+; CHECK: uqadd z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqadd.x.nxv8i16( %a,
+%b)
+  ret  %out
+}
+
+define  @uqadd_i32( %a,  %b) {
+; CHECK-LABEL: uqadd_i32:
+; CHECK: uqadd z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqadd.x.nxv4i32( %a,
+%b)
+  ret  %out
+}
+
+define  @uqadd_i64( %a,  %b) {
+; CHECK-LABEL: uqadd_i64:
+; CHECK: uqadd z0.d, z0.d, z1.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqadd.x.nxv2i64( %a,
+%b)
+  ret  %out
+}
+
+; UQSUB
+
+define  @uqsub_i8( %a,  %b) {
+; CHECK-LABEL: uqsub_i8:
+; CHECK: uqsub z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqsub.x.nxv16i8( %a,
+%b)
+  ret  %out
+}
+
+define  @uqsub_i16( %a,  %b) {
+; CHECK-LABEL: uqsub_i16:
+; CHECK: uqsub z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqsub.x.nxv8i16( %a,
+%b)
+  ret  %out
+}
+
+define  @uqsub_i32( %a,  %b) {
+; CHECK-LABEL: uqsub_i32:
+; CHECK: uqsub z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqsub.x.nxv4i32( %a,
+%b)
+  ret  %out
+}
+
+define  @uqsub_i64( %a,  %b) {
+; CHECK-LABEL: uqsub_i64:
+; CHECK: uqsub z0.d, z0.d, z1.d
+; CHECK-NEXT: ret
+  %out = call  

[PATCH] D77054: [AArch64][SVE] Add SVE intrinsics for saturating add & subtract

2020-04-03 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 254742.
kmclaughlin added a comment.

Moved patterns for the new intrinsics into the //sve_int_bin_cons_arit_0// and
//sve_int_arith_imm0// multiclasses.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77054/new/

https://reviews.llvm.org/D77054

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll
@@ -134,6 +134,82 @@
   ret  %out
 }
 
+; SQADD
+
+define  @sqadd_i8( %a,  %b) {
+; CHECK-LABEL: sqadd_i8:
+; CHECK: sqadd z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqadd.x.nxv16i8( %a,
+%b)
+  ret  %out
+}
+
+define  @sqadd_i16( %a,  %b) {
+; CHECK-LABEL: sqadd_i16:
+; CHECK: sqadd z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqadd.x.nxv8i16( %a,
+%b)
+  ret  %out
+}
+
+define  @sqadd_i32( %a,  %b) {
+; CHECK-LABEL: sqadd_i32:
+; CHECK: sqadd z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqadd.x.nxv4i32( %a,
+%b)
+  ret  %out
+}
+
+define  @sqadd_i64( %a,  %b) {
+; CHECK-LABEL: sqadd_i64:
+; CHECK: sqadd z0.d, z0.d, z1.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqadd.x.nxv2i64( %a,
+%b)
+  ret  %out
+}
+
+; SQSUB
+
+define  @sqsub_i8( %a,  %b) {
+; CHECK-LABEL: sqsub_i8:
+; CHECK: sqsub z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqsub.x.nxv16i8( %a,
+%b)
+  ret  %out
+}
+
+define  @sqsub_i16( %a,  %b) {
+; CHECK-LABEL: sqsub_i16:
+; CHECK: sqsub z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqsub.x.nxv8i16( %a,
+%b)
+  ret  %out
+}
+
+define  @sqsub_i32( %a,  %b) {
+; CHECK-LABEL: sqsub_i32:
+; CHECK: sqsub z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqsub.x.nxv4i32( %a,
+%b)
+  ret  %out
+}
+
+define  @sqsub_i64( %a,  %b) {
+; CHECK-LABEL: sqsub_i64:
+; CHECK: sqsub z0.d, z0.d, z1.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqsub.x.nxv2i64( %a,
+%b)
+  ret  %out
+}
+
 ; UDOT
 
 define  @udot_i32( %a,  %b,  %c) {
@@ -169,6 +245,82 @@
   ret  %out
 }
 
+; UQADD
+
+define  @uqadd_i8( %a,  %b) {
+; CHECK-LABEL: uqadd_i8:
+; CHECK: uqadd z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqadd.x.nxv16i8( %a,
+%b)
+  ret  %out
+}
+
+define  @uqadd_i16( %a,  %b) {
+; CHECK-LABEL: uqadd_i16:
+; CHECK: uqadd z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqadd.x.nxv8i16( %a,
+%b)
+  ret  %out
+}
+
+define  @uqadd_i32( %a,  %b) {
+; CHECK-LABEL: uqadd_i32:
+; CHECK: uqadd z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqadd.x.nxv4i32( %a,
+%b)
+  ret  %out
+}
+
+define  @uqadd_i64( %a,  %b) {
+; CHECK-LABEL: uqadd_i64:
+; CHECK: uqadd z0.d, z0.d, z1.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqadd.x.nxv2i64( %a,
+%b)
+  ret  %out
+}
+
+; UQSUB
+
+define  @uqsub_i8( %a,  %b) {
+; CHECK-LABEL: uqsub_i8:
+; CHECK: uqsub z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqsub.x.nxv16i8( %a,
+%b)
+  ret  %out
+}
+
+define  @uqsub_i16( %a,  %b) {
+; CHECK-LABEL: uqsub_i16:
+; CHECK: uqsub z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqsub.x.nxv8i16( %a,
+%b)
+  ret  %out
+}
+
+define  @uqsub_i32( %a,  %b) {
+; CHECK-LABEL: uqsub_i32:
+; CHECK: uqsub z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqsub.x.nxv4i32( %a,
+%b)
+  ret  %out
+}
+
+define  @uqsub_i64( %a,  %b) {
+; CHECK-LABEL: uqsub_i64:
+; CHECK: uqsub z0.d, z0.d, z1.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqsub.x.nxv2i64( %a,
+ 

[PATCH] D77054: [AArch64][SVE] Add SVE intrinsics for saturating add & subtract

2020-04-02 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 254558.
kmclaughlin added a comment.

Added patterns to AArch64SVEInstrInfo.td to again support llvm.[s|u]add &
llvm.[s|u]sub; these patterns were removed by my previous patch.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77054/new/

https://reviews.llvm.org/D77054

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll
@@ -134,6 +134,82 @@
   ret  %out
 }
 
+; SQADD
+
+define  @sqadd_i8( %a,  %b) {
+; CHECK-LABEL: sqadd_i8:
+; CHECK: sqadd z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqadd.x.nxv16i8( %a,
+%b)
+  ret  %out
+}
+
+define  @sqadd_i16( %a,  %b) {
+; CHECK-LABEL: sqadd_i16:
+; CHECK: sqadd z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqadd.x.nxv8i16( %a,
+%b)
+  ret  %out
+}
+
+define  @sqadd_i32( %a,  %b) {
+; CHECK-LABEL: sqadd_i32:
+; CHECK: sqadd z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqadd.x.nxv4i32( %a,
+%b)
+  ret  %out
+}
+
+define  @sqadd_i64( %a,  %b) {
+; CHECK-LABEL: sqadd_i64:
+; CHECK: sqadd z0.d, z0.d, z1.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqadd.x.nxv2i64( %a,
+%b)
+  ret  %out
+}
+
+; SQSUB
+
+define  @sqsub_i8( %a,  %b) {
+; CHECK-LABEL: sqsub_i8:
+; CHECK: sqsub z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqsub.x.nxv16i8( %a,
+%b)
+  ret  %out
+}
+
+define  @sqsub_i16( %a,  %b) {
+; CHECK-LABEL: sqsub_i16:
+; CHECK: sqsub z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqsub.x.nxv8i16( %a,
+%b)
+  ret  %out
+}
+
+define  @sqsub_i32( %a,  %b) {
+; CHECK-LABEL: sqsub_i32:
+; CHECK: sqsub z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqsub.x.nxv4i32( %a,
+%b)
+  ret  %out
+}
+
+define  @sqsub_i64( %a,  %b) {
+; CHECK-LABEL: sqsub_i64:
+; CHECK: sqsub z0.d, z0.d, z1.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqsub.x.nxv2i64( %a,
+%b)
+  ret  %out
+}
+
 ; UDOT
 
 define  @udot_i32( %a,  %b,  %c) {
@@ -169,6 +245,82 @@
   ret  %out
 }
 
+; UQADD
+
+define  @uqadd_i8( %a,  %b) {
+; CHECK-LABEL: uqadd_i8:
+; CHECK: uqadd z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqadd.x.nxv16i8( %a,
+%b)
+  ret  %out
+}
+
+define  @uqadd_i16( %a,  %b) {
+; CHECK-LABEL: uqadd_i16:
+; CHECK: uqadd z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqadd.x.nxv8i16( %a,
+%b)
+  ret  %out
+}
+
+define  @uqadd_i32( %a,  %b) {
+; CHECK-LABEL: uqadd_i32:
+; CHECK: uqadd z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqadd.x.nxv4i32( %a,
+%b)
+  ret  %out
+}
+
+define  @uqadd_i64( %a,  %b) {
+; CHECK-LABEL: uqadd_i64:
+; CHECK: uqadd z0.d, z0.d, z1.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqadd.x.nxv2i64( %a,
+%b)
+  ret  %out
+}
+
+; UQSUB
+
+define  @uqsub_i8( %a,  %b) {
+; CHECK-LABEL: uqsub_i8:
+; CHECK: uqsub z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqsub.x.nxv16i8( %a,
+%b)
+  ret  %out
+}
+
+define  @uqsub_i16( %a,  %b) {
+; CHECK-LABEL: uqsub_i16:
+; CHECK: uqsub z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqsub.x.nxv8i16( %a,
+%b)
+  ret  %out
+}
+
+define  @uqsub_i32( %a,  %b) {
+; CHECK-LABEL: uqsub_i32:
+; CHECK: uqsub z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqsub.x.nxv4i32( %a,
+%b)
+  ret  %out
+}
+
+define  @uqsub_i64( %a,  %b) {
+; CHECK-LABEL: uqsub_i64:
+; CHECK: uqsub z0.d, z0.d, z1.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqsub.x.nxv2i64( %a,
+ 

[PATCH] D77054: [AArch64][SVE] Add SVE intrinsics for saturating add & subtract

2020-03-30 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: sdesmalen, c-rhodes, dancgr, efriedma, 
cameron.mcinally.
Herald added subscribers: danielkiss, psnobl, rkruppe, hiraditya, 
kristof.beyls, tschuett.
Herald added a reviewer: rengolin.
Herald added a project: LLVM.

Adds the following intrinsics:

- @llvm.aarch64.sve.[s|u]qadd.x
- @llvm.aarch64.sve.[s|u]qsub.x
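
For illustration, a minimal IR sketch of how one of these intrinsics is called
(hand-written example; the nxv4i32 instantiation and the function name are
arbitrary choices, the tests below cover the full set of types):

  declare <vscale x 4 x i32> @llvm.aarch64.sve.sqadd.x.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>)

  ; Saturating signed add of two scalable i32 vectors; expected to select "sqadd z0.s, z0.s, z1.s".
  define <vscale x 4 x i32> @sqadd_example(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
    %out = call <vscale x 4 x i32> @llvm.aarch64.sve.sqadd.x.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
    ret <vscale x 4 x i32> %out
  }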


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D77054

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/sve-int-arith.ll
  llvm/test/CodeGen/AArch64/sve-int-imm.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll
@@ -134,6 +134,82 @@
   ret  %out
 }
 
+; SQADD
+
+define  @sqadd_i8( %a,  %b) {
+; CHECK-LABEL: sqadd_i8:
+; CHECK: sqadd z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqadd.x.nxv16i8( %a,
+%b)
+  ret  %out
+}
+
+define  @sqadd_i16( %a,  %b) {
+; CHECK-LABEL: sqadd_i16:
+; CHECK: sqadd z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqadd.x.nxv8i16( %a,
+%b)
+  ret  %out
+}
+
+define  @sqadd_i32( %a,  %b) {
+; CHECK-LABEL: sqadd_i32:
+; CHECK: sqadd z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqadd.x.nxv4i32( %a,
+%b)
+  ret  %out
+}
+
+define  @sqadd_i64( %a,  %b) {
+; CHECK-LABEL: sqadd_i64:
+; CHECK: sqadd z0.d, z0.d, z1.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqadd.x.nxv2i64( %a,
+%b)
+  ret  %out
+}
+
+; SQSUB
+
+define  @sqsub_i8( %a,  %b) {
+; CHECK-LABEL: sqsub_i8:
+; CHECK: sqsub z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqsub.x.nxv16i8( %a,
+%b)
+  ret  %out
+}
+
+define  @sqsub_i16( %a,  %b) {
+; CHECK-LABEL: sqsub_i16:
+; CHECK: sqsub z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqsub.x.nxv8i16( %a,
+%b)
+  ret  %out
+}
+
+define  @sqsub_i32( %a,  %b) {
+; CHECK-LABEL: sqsub_i32:
+; CHECK: sqsub z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqsub.x.nxv4i32( %a,
+%b)
+  ret  %out
+}
+
+define  @sqsub_i64( %a,  %b) {
+; CHECK-LABEL: sqsub_i64:
+; CHECK: sqsub z0.d, z0.d, z1.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sqsub.x.nxv2i64( %a,
+%b)
+  ret  %out
+}
+
 ; UDOT
 
 define  @udot_i32( %a,  %b,  %c) {
@@ -169,6 +245,82 @@
   ret  %out
 }
 
+; UQADD
+
+define  @uqadd_i8( %a,  %b) {
+; CHECK-LABEL: uqadd_i8:
+; CHECK: uqadd z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqadd.x.nxv16i8( %a,
+%b)
+  ret  %out
+}
+
+define  @uqadd_i16( %a,  %b) {
+; CHECK-LABEL: uqadd_i16:
+; CHECK: uqadd z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqadd.x.nxv8i16( %a,
+%b)
+  ret  %out
+}
+
+define  @uqadd_i32( %a,  %b) {
+; CHECK-LABEL: uqadd_i32:
+; CHECK: uqadd z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqadd.x.nxv4i32( %a,
+%b)
+  ret  %out
+}
+
+define  @uqadd_i64( %a,  %b) {
+; CHECK-LABEL: uqadd_i64:
+; CHECK: uqadd z0.d, z0.d, z1.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqadd.x.nxv2i64( %a,
+%b)
+  ret  %out
+}
+
+; UQSUB
+
+define  @uqsub_i8( %a,  %b) {
+; CHECK-LABEL: uqsub_i8:
+; CHECK: uqsub z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqsub.x.nxv16i8( %a,
+%b)
+  ret  %out
+}
+
+define  @uqsub_i16( %a,  %b) {
+; CHECK-LABEL: uqsub_i16:
+; CHECK: uqsub z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqsub.x.nxv8i16( %a,
+%b)
+  ret  %out
+}
+
+define  @uqsub_i32( %a,  %b) {
+; CHECK-LABEL: uqsub_i32:
+; CHECK: uqsub z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.uqsub.x.nxv4i32( %a,
+%b)
+  ret  

[PATCH] D76929: [AArch64][SVE] Add SVE intrinsic for LD1RQ

2020-03-27 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: andwar, sdesmalen, efriedma, cameron.mcinally, 
dancgr.
Herald added subscribers: danielkiss, psnobl, rkruppe, hiraditya, 
kristof.beyls, tschuett.
Herald added a reviewer: rengolin.
Herald added a project: LLVM.

Adds the following intrinsic for contiguous load & replicate:

- @llvm.aarch64.sve.ld1rq
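
A minimal IR sketch of the new intrinsic (hand-written example; the nxv4i32
instantiation, function name and typed-pointer signature are assumptions based
on the tests below):

  declare <vscale x 4 x i32> @llvm.aarch64.sve.ld1rq.nxv4i32(<vscale x 4 x i1>, i32*)

  ; Loads 128 bits from %addr under %pred and replicates them across the
  ; scalable result vector; expected to select "ld1rqw".
  define <vscale x 4 x i32> @ld1rq_example(<vscale x 4 x i1> %pred, i32* %addr) {
    %res = call <vscale x 4 x i32> @llvm.aarch64.sve.ld1rq.nxv4i32(<vscale x 4 x i1> %pred, i32* %addr)
    ret <vscale x 4 x i32> %res
  }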


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D76929

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/lib/Target/AArch64/AArch64InstrFormats.td
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
@@ -1,6 +1,141 @@
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
 
 ;
+; LD1RQB
+;
+
+define  @ld1rqb_i8( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8:
+; CHECK: ld1rqb { z0.b }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %addr)
+  ret  %res
+}
+
+define  @ld1rqb_i8_imm( %pred, i8* %addr) {
+; CHECK-LABEL: ld1rqb_i8_imm:
+; CHECK: ld1rqb { z0.b }, p0/z, [x0, #16]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i8 16
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv16i8( %pred, i8* %ptr)
+  ret  %res
+}
+
+;
+; LD1RQH
+;
+
+define  @ld1rqh_i16( %pred, i16* %addr) {
+; CHECK-LABEL: ld1rqh_i16:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8i16( %pred, i16* %addr)
+  ret  %res
+}
+
+define  @ld1rqh_f16( %pred, half* %addr) {
+; CHECK-LABEL: ld1rqh_f16:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8f16( %pred, half* %addr)
+  ret  %res
+}
+
+define  @ld1rqh_i16_imm( %pred, i16* %addr) {
+; CHECK-LABEL: ld1rqh_i16_imm:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0, #-64]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i16, i16* %addr, i16 -32
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8i16( %pred, i16* %ptr)
+  ret  %res
+}
+
+define  @ld1rqh_f16_imm( %pred, half* %addr) {
+; CHECK-LABEL: ld1rqh_f16_imm:
+; CHECK: ld1rqh { z0.h }, p0/z, [x0, #-16]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds half, half* %addr, i16 -8
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv8f16( %pred, half* %ptr)
+  ret  %res
+}
+
+;
+; LD1RQW
+;
+
+define  @ld1rqw_i32( %pred, i32* %addr) {
+; CHECK-LABEL: ld1rqw_i32:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4i32( %pred, i32* %addr)
+  ret  %res
+}
+
+define  @ld1rqw_f32( %pred, float* %addr) {
+; CHECK-LABEL: ld1rqw_f32:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4f32( %pred, float* %addr)
+  ret  %res
+}
+
+define  @ld1rqw_i32_imm( %pred, i32* %addr) {
+; CHECK-LABEL: ld1rqw_i32_imm:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0, #112]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i32, i32* %addr, i32 28
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4i32( %pred, i32* %ptr)
+  ret  %res
+}
+
+define  @ld1rqw_f32_imm( %pred, float* %addr) {
+; CHECK-LABEL: ld1rqw_f32_imm:
+; CHECK: ld1rqw { z0.s }, p0/z, [x0, #32]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds float, float* %addr, i32 8
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv4f32( %pred, float* %ptr)
+  ret  %res
+}
+
+;
+; LD1RQD
+;
+
+define  @ld1rqd_i64( %pred, i64* %addr) {
+; CHECK-LABEL: ld1rqd_i64:
+; CHECK: ld1rqd { z0.d }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv2i64( %pred, i64* %addr)
+  ret  %res
+}
+
+define  @ld1rqd_f64( %pred, double* %addr) {
+; CHECK-LABEL: ld1rqd_f64:
+; CHECK: ld1rqd { z0.d }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv2f64( %pred, double* %addr)
+  ret  %res
+}
+
+define  @ld1rqd_i64_imm( %pred, i64* %addr) {
+; CHECK-LABEL: ld1rqd_i64_imm:
+; CHECK: ld1rqd { z0.d }, p0/z, [x0, #64]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i64, i64* %addr, i64 8
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv2i64( %pred, i64* %ptr)
+  ret  %res
+}
+
+define  @ld1rqd_f64_imm( %pred, double* %addr) {
+; CHECK-LABEL: ld1rqd_f64_imm:
+; CHECK: ld1rqd { z0.d }, p0/z, [x0, #-128]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds double, double* %addr, i64 -16
+  %res = call  @llvm.aarch64.sve.ld1rq.nxv2f64( %pred, double* %ptr)
+  ret  %res
+}
+
+;
 ; LDNT1B
 ;
 
@@ -79,6 +214,14 @@
   ret  %res
 }
 
+declare  @llvm.aarch64.sve.ld1rq.nxv16i8(, i8*)
+declare  @llvm.aarch64.sve.ld1rq.nxv8i16(, i16*)
+declare  @llvm.aarch64.sve.ld1rq.nxv4i32(, i32*)
+declare  @llvm.aarch64.sve.ld1rq.nxv2i64(, i64*)
+declare  @llvm.aarch64.sve.ld1rq.nxv8f16(, half*)
+declare  @llvm.aarch64.sve.ld1rq.nxv4f32(, 

[PATCH] D76078: [AArch64][SVE] Add a pass for SVE intrinsic optimisations

2020-03-25 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 252575.
kmclaughlin marked an inline comment as done.
kmclaughlin added a comment.

Use SmallSetVector for the list of functions gathered by runOnModule to 
preserve the order of iteration


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76078/new/

https://reviews.llvm.org/D76078

Files:
  llvm/lib/Target/AArch64/AArch64.h
  llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
  llvm/lib/Target/AArch64/CMakeLists.txt
  llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp
  llvm/test/CodeGen/AArch64/O3-pipeline.ll
  llvm/test/CodeGen/AArch64/sve-intrinsic-opts-ptest.ll
  llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll
@@ -0,0 +1,203 @@
+; RUN: opt -S -sve-intrinsic-opts -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck --check-prefix OPT %s
+
+define  @reinterpret_test_h( %a) {
+; OPT-LABEL: @reinterpret_test_h(
+; OPT-NOT: convert
+; OPT: ret  %a
+  %1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv8i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv8i1( %1)
+  ret  %2
+}
+
+; Reinterprets are not redundant because the second reinterpret zeros the
+; lanes that don't exist within its input.
+define  @reinterpret_test_h_rev( %a) {
+; OPT-LABEL: @reinterpret_test_h_rev(
+; OPT: %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv8i1( %a)
+; OPT-NEXT: %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv8i1( %1)
+; OPT-NEXT: ret  %2
+  %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv8i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv8i1( %1)
+  ret  %2
+}
+
+define  @reinterpret_test_w( %a) {
+; OPT-LABEL: @reinterpret_test_w(
+; OPT-NOT: convert
+; OPT: ret  %a
+  %1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv4i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv4i1( %1)
+  ret  %2
+}
+
+; Reinterprets are not redundant because the second reinterpret zeros the
+; lanes that don't exist within its input.
+define  @reinterpret_test_w_rev( %a) {
+; OPT-LABEL: @reinterpret_test_w_rev(
+; OPT: %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv4i1( %a)
+; OPT-NEXT: %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv4i1( %1)
+; OPT-NEXT: ret  %2
+  %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv4i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv4i1( %1)
+  ret  %2
+}
+
+define  @reinterpret_test_d( %a) {
+; OPT-LABEL: @reinterpret_test_d(
+; OPT-NOT: convert
+; OPT: ret  %a
+  %1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %1)
+  ret  %2
+}
+
+; Reinterprets are not redundant because the second reinterpret zeros the
+; lanes that don't exist within its input.
+define  @reinterpret_test_d_rev( %a) {
+; OPT-LABEL: @reinterpret_test_d_rev(
+; OPT: %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %a)
+; OPT-NEXT: %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %1)
+; OPT-NEXT: ret  %2
+  %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %1)
+  ret  %2
+}
+
+define  @reinterpret_reductions(i32 %cond,  %a,  %b,  %c) {
+; OPT-LABEL: reinterpret_reductions
+; OPT-NOT: convert
+; OPT-NOT: phi 
+; OPT: phi  [ %a, %br_phi_a ], [ %b, %br_phi_b ], [ %c, %br_phi_c ]
+; OPT-NOT: convert
+; OPT: ret
+
+entry:
+  switch i32 %cond, label %br_phi_c [
+ i32 43, label %br_phi_a
+ i32 45, label %br_phi_b
+  ]
+
+br_phi_a:
+  %a1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %a)
+  br label %join
+
+br_phi_b:
+  %b1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %b)
+  br label %join
+
+br_phi_c:
+  %c1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %c)
+  br label %join
+
+join:
+  %pg = phi  [ %a1, %br_phi_a ], [ %b1, %br_phi_b ], [ %c1, %br_phi_c ]
+  %pg1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %pg)
+  ret  %pg1
+}
+
+; No transform as the reinterprets are converting from different types (nxv2i1 & nxv4i1)
+; As the incoming values to the phi must all be the same type, we cannot remove the reinterprets.
+define  @reinterpret_reductions_1(i32 %cond,  %a,  %b,  %c) {
+; OPT-LABEL: reinterpret_reductions_1
+; OPT: convert
+; OPT: phi  [ %a1, %br_phi_a ], [ %b1, %br_phi_b ], [ %c1, %br_phi_c ]
+; OPT-NOT: phi 
+; OPT: tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %pg)
+; OPT: ret
+
+entry:
+  switch i32 %cond, label %br_phi_c [
+ i32 43, label %br_phi_a
+ i32 45, label %br_phi_b
+  ]
+
+br_phi_a:
+  %a1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %a)
+  br label %join
+
+br_phi_b:
+  %b1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv4i1( %b)
+  br label %join
+

[PATCH] D76688: [AArch64][SVE] Add SVE intrinsics for masked loads & stores

2020-03-25 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG05606329e235: [AArch64][SVE] Add SVE intrinsics for masked
loads & stores (authored by kmclaughlin).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76688/new/

https://reviews.llvm.org/D76688

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/test/CodeGen/AArch64/sve-intrinsics-ldst1.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-ldst1.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-ldst1.ll
@@ -0,0 +1,182 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+
+;
+; LD1B
+;
+
+define  @ld1b_i8( %pred, i8* %addr) {
+; CHECK-LABEL: ld1b_i8:
+; CHECK: ld1b { z0.b }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1.nxv16i8( %pred,
+   i8* %addr)
+  ret  %res
+}
+
+;
+; LD1H
+;
+
+define  @ld1h_i16( %pred, i16* %addr) {
+; CHECK-LABEL: ld1h_i16:
+; CHECK: ld1h { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1.nxv8i16( %pred,
+   i16* %addr)
+  ret  %res
+}
+
+define  @ld1h_f16( %pred, half* %addr) {
+; CHECK-LABEL: ld1h_f16:
+; CHECK: ld1h { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1.nxv8f16( %pred,
+half* %addr)
+  ret  %res
+}
+
+;
+; LD1W
+;
+
+define  @ld1w_i32( %pred, i32* %addr) {
+; CHECK-LABEL: ld1w_i32:
+; CHECK: ld1w { z0.s }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1.nxv4i32( %pred,
+   i32* %addr)
+  ret  %res
+}
+
+define  @ld1w_f32( %pred, float* %addr) {
+; CHECK-LABEL: ld1w_f32:
+; CHECK: ld1w { z0.s }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1.nxv4f32( %pred,
+ float* %addr)
+  ret  %res
+}
+
+;
+; LD1D
+;
+
+define  @ld1d_i64( %pred, i64* %addr) {
+; CHECK-LABEL: ld1d_i64:
+; CHECK: ld1d { z0.d }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1.nxv2i64( %pred,
+   i64* %addr)
+  ret  %res
+}
+
+define  @ld1d_f64( %pred, double* %addr) {
+; CHECK-LABEL: ld1d_f64:
+; CHECK: ld1d { z0.d }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1.nxv2f64( %pred,
+  double* %addr)
+  ret  %res
+}
+
+;
+; ST1B
+;
+
+define void @st1b_i8( %data,  %pred, i8* %addr) {
+; CHECK-LABEL: st1b_i8:
+; CHECK: st1b { z0.b }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.aarch64.sve.st1.nxv16i8( %data,
+   %pred,
+  i8* %addr)
+  ret void
+}
+
+;
+; ST1H
+;
+
+define void @st1h_i16( %data,  %pred, i16* %addr) {
+; CHECK-LABEL: st1h_i16:
+; CHECK: st1h { z0.h }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.aarch64.sve.st1.nxv8i16( %data,
+   %pred,
+  i16* %addr)
+  ret void
+}
+
+define void @st1h_f16( %data,  %pred, half* %addr) {
+; CHECK-LABEL: st1h_f16:
+; CHECK: st1h { z0.h }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.aarch64.sve.st1.nxv8f16( %data,
+   %pred,
+  half* %addr)
+  ret void
+}
+
+;
+; ST1W
+;
+
+define void @st1w_i32( %data,  %pred, i32* %addr) {
+; CHECK-LABEL: st1w_i32:
+; CHECK: st1w { z0.s }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.aarch64.sve.st1.nxv4i32( %data,
+   %pred,
+  i32* %addr)
+  ret void
+}
+
+define void @st1w_f32( %data,  %pred, float* %addr) {
+; CHECK-LABEL: st1w_f32:
+; CHECK: st1w { z0.s }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.aarch64.sve.st1.nxv4f32( %data,
+   %pred,
+  float* %addr)
+  ret void
+}
+
+;
+; ST1D
+;
+
+define void @st1d_i64( %data,  %pred, i64* %addr) {
+; CHECK-LABEL: st1d_i64:
+; CHECK: st1d { z0.d }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.aarch64.sve.st1.nxv2i64( %data,
+   %pred,
+  i64* %addr)
+  ret void
+}
+
+define void @st1d_f64( %data,  %pred, double* %addr) {
+; CHECK-LABEL: st1d_f64:
+; CHECK: st1d { z0.d }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.aarch64.sve.st1.nxv2f64( %data,
+   %pred,
+  double* %addr)
+  ret void
+}
+
+declare  @llvm.aarch64.sve.ld1.nxv16i8(, 

[PATCH] D76078: [AArch64][SVE] Add a pass for SVE intrinsic optimisations

2020-03-24 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 252368.
kmclaughlin marked 3 inline comments as done.
kmclaughlin added a comment.

Use SmallPtrSet instead of SmallVector for storing functions found by 
runOnModule
Add more comments to clarify the purpose of the pass and some of the negative 
reinterpret tests


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76078/new/

https://reviews.llvm.org/D76078

Files:
  llvm/lib/Target/AArch64/AArch64.h
  llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
  llvm/lib/Target/AArch64/CMakeLists.txt
  llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp
  llvm/test/CodeGen/AArch64/O3-pipeline.ll
  llvm/test/CodeGen/AArch64/sve-intrinsic-opts-ptest.ll
  llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll
@@ -0,0 +1,203 @@
+; RUN: opt -S -sve-intrinsic-opts -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck --check-prefix OPT %s
+
+define  @reinterpret_test_h( %a) {
+; OPT-LABEL: @reinterpret_test_h(
+; OPT-NOT: convert
+; OPT: ret  %a
+  %1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv8i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv8i1( %1)
+  ret  %2
+}
+
+; Reinterprets are not redundant because the second reinterpret zeros the
+; lanes that don't exist within its input.
+define  @reinterpret_test_h_rev( %a) {
+; OPT-LABEL: @reinterpret_test_h_rev(
+; OPT: %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv8i1( %a)
+; OPT-NEXT: %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv8i1( %1)
+; OPT-NEXT: ret  %2
+  %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv8i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv8i1( %1)
+  ret  %2
+}
+
+define  @reinterpret_test_w( %a) {
+; OPT-LABEL: @reinterpret_test_w(
+; OPT-NOT: convert
+; OPT: ret  %a
+  %1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv4i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv4i1( %1)
+  ret  %2
+}
+
+; Reinterprets are not redundant because the second reinterpret zeros the
+; lanes that don't exist within its input.
+define  @reinterpret_test_w_rev( %a) {
+; OPT-LABEL: @reinterpret_test_w_rev(
+; OPT: %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv4i1( %a)
+; OPT-NEXT: %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv4i1( %1)
+; OPT-NEXT: ret  %2
+  %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv4i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv4i1( %1)
+  ret  %2
+}
+
+define  @reinterpret_test_d( %a) {
+; OPT-LABEL: @reinterpret_test_d(
+; OPT-NOT: convert
+; OPT: ret  %a
+  %1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %1)
+  ret  %2
+}
+
+; Reinterprets are not redundant because the second reinterpret zeros the
+; lanes that don't exist within its input.
+define  @reinterpret_test_d_rev( %a) {
+; OPT-LABEL: @reinterpret_test_d_rev(
+; OPT: %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %a)
+; OPT-NEXT: %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %1)
+; OPT-NEXT: ret  %2
+  %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %1)
+  ret  %2
+}
+
+define  @reinterpret_reductions(i32 %cond,  %a,  %b,  %c) {
+; OPT-LABEL: reinterpret_reductions
+; OPT-NOT: convert
+; OPT-NOT: phi 
+; OPT: phi  [ %a, %br_phi_a ], [ %b, %br_phi_b ], [ %c, %br_phi_c ]
+; OPT-NOT: convert
+; OPT: ret
+
+entry:
+  switch i32 %cond, label %br_phi_c [
+ i32 43, label %br_phi_a
+ i32 45, label %br_phi_b
+  ]
+
+br_phi_a:
+  %a1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %a)
+  br label %join
+
+br_phi_b:
+  %b1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %b)
+  br label %join
+
+br_phi_c:
+  %c1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %c)
+  br label %join
+
+join:
+  %pg = phi  [ %a1, %br_phi_a ], [ %b1, %br_phi_b ], [ %c1, %br_phi_c ]
+  %pg1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %pg)
+  ret  %pg1
+}
+
+; No transform as the reinterprets are converting from different types (nxv2i1 & nxv4i1)
+; As the incoming values to the phi must all be the same type, we cannot remove the reinterprets.
+define  @reinterpret_reductions_1(i32 %cond,  %a,  %b,  %c) {
+; OPT-LABEL: reinterpret_reductions_1
+; OPT: convert
+; OPT: phi  [ %a1, %br_phi_a ], [ %b1, %br_phi_b ], [ %c1, %br_phi_c ]
+; OPT-NOT: phi 
+; OPT: tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %pg)
+; OPT: ret
+
+entry:
+  switch i32 %cond, label %br_phi_c [
+ i32 43, label %br_phi_a
+ i32 45, label %br_phi_b
+  ]
+
+br_phi_a:
+  %a1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %a)
+  br label %join
+
+br_phi_b:
+  %b1 = tail call  

[PATCH] D76688: [AArch64][SVE] Add SVE intrinsics for masked loads & stores

2020-03-24 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: sdesmalen, andwar, efriedma, cameron.mcinally, 
dancgr.
Herald added subscribers: danielkiss, psnobl, rkruppe, hiraditya, 
kristof.beyls, tschuett.
Herald added a reviewer: rengolin.
Herald added a project: LLVM.

Implements the following intrinsics for contiguous loads & stores:

- @llvm.aarch64.sve.ld1
- @llvm.aarch64.sve.st1
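
A minimal IR sketch combining the two (hand-written example; the nxv4i32
instantiation, function name and pointer types are illustrative only):

  declare <vscale x 4 x i32> @llvm.aarch64.sve.ld1.nxv4i32(<vscale x 4 x i1>, i32*)
  declare void @llvm.aarch64.sve.st1.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i1>, i32*)

  ; Predicated contiguous load from %src followed by a predicated store to %dst.
  define void @copy_example(<vscale x 4 x i1> %pred, i32* %src, i32* %dst) {
    %data = call <vscale x 4 x i32> @llvm.aarch64.sve.ld1.nxv4i32(<vscale x 4 x i1> %pred, i32* %src)
    call void @llvm.aarch64.sve.st1.nxv4i32(<vscale x 4 x i32> %data, <vscale x 4 x i1> %pred, i32* %dst)
    ret void
  }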


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D76688

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/test/CodeGen/AArch64/sve-intrinsics-ldst1.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-ldst1.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-ldst1.ll
@@ -0,0 +1,182 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+
+;
+; LD1B
+;
+
+define  @ld1b_i8( %pred, i8* %addr) {
+; CHECK-LABEL: ld1b_i8:
+; CHECK: ld1b { z0.b }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1.nxv16i8( %pred,
+   i8* %addr)
+  ret  %res
+}
+
+;
+; LD1H
+;
+
+define  @ld1h_i16( %pred, i16* %addr) {
+; CHECK-LABEL: ld1h_i16:
+; CHECK: ld1h { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1.nxv8i16( %pred,
+   i16* %addr)
+  ret  %res
+}
+
+define  @ld1h_f16( %pred, half* %addr) {
+; CHECK-LABEL: ld1h_f16:
+; CHECK: ld1h { z0.h }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1.nxv8f16( %pred,
+half* %addr)
+  ret  %res
+}
+
+;
+; LD1W
+;
+
+define  @ld1w_i32( %pred, i32* %addr) {
+; CHECK-LABEL: ld1w_i32:
+; CHECK: ld1w { z0.s }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1.nxv4i32( %pred,
+   i32* %addr)
+  ret  %res
+}
+
+define  @ld1w_f32( %pred, float* %addr) {
+; CHECK-LABEL: ld1w_f32:
+; CHECK: ld1w { z0.s }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1.nxv4f32( %pred,
+ float* %addr)
+  ret  %res
+}
+
+;
+; LD1D
+;
+
+define  @ld1d_i64( %pred, i64* %addr) {
+; CHECK-LABEL: ld1d_i64:
+; CHECK: ld1d { z0.d }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1.nxv2i64( %pred,
+   i64* %addr)
+  ret  %res
+}
+
+define  @ld1d_f64( %pred, double* %addr) {
+; CHECK-LABEL: ld1d_f64:
+; CHECK: ld1d { z0.d }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %res = call  @llvm.aarch64.sve.ld1.nxv2f64( %pred,
+  double* %addr)
+  ret  %res
+}
+
+;
+; ST1B
+;
+
+define void @st1b_i8( %data,  %pred, i8* %addr) {
+; CHECK-LABEL: st1b_i8:
+; CHECK: st1b { z0.b }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.aarch64.sve.st1.nxv16i8( %data,
+   %pred,
+  i8* %addr)
+  ret void
+}
+
+;
+; ST1H
+;
+
+define void @st1h_i16( %data,  %pred, i16* %addr) {
+; CHECK-LABEL: st1h_i16:
+; CHECK: st1h { z0.h }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.aarch64.sve.st1.nxv8i16( %data,
+   %pred,
+  i16* %addr)
+  ret void
+}
+
+define void @st1h_f16( %data,  %pred, half* %addr) {
+; CHECK-LABEL: st1h_f16:
+; CHECK: st1h { z0.h }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.aarch64.sve.st1.nxv8f16( %data,
+   %pred,
+  half* %addr)
+  ret void
+}
+
+;
+; ST1W
+;
+
+define void @st1w_i32( %data,  %pred, i32* %addr) {
+; CHECK-LABEL: st1w_i32:
+; CHECK: st1w { z0.s }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.aarch64.sve.st1.nxv4i32( %data,
+   %pred,
+  i32* %addr)
+  ret void
+}
+
+define void @st1w_f32( %data,  %pred, float* %addr) {
+; CHECK-LABEL: st1w_f32:
+; CHECK: st1w { z0.s }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.aarch64.sve.st1.nxv4f32( %data,
+   %pred,
+  float* %addr)
+  ret void
+}
+
+;
+; ST1D
+;
+
+define void @st1d_i64( %data,  %pred, i64* %addr) {
+; CHECK-LABEL: st1d_i64:
+; CHECK: st1d { z0.d }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.aarch64.sve.st1.nxv2i64( %data,
+   %pred,
+  i64* %addr)
+  ret void
+}
+
+define void @st1d_f64( %data,  %pred, double* %addr) {
+; CHECK-LABEL: st1d_f64:
+; CHECK: st1d { z0.d }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.aarch64.sve.st1.nxv2f64( %data,
+

[PATCH] D76078: [AArch64][SVE] Add a pass for SVE intrinsic optimisations

2020-03-20 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin added a comment.

Thanks for reviewing this, @efriedma & @andwar!




Comment at: llvm/lib/Target/AArch64/AArch64TargetMachine.cpp:441
+  // Expand any SVE vector library calls that we can't code generate directly.
+  bool ExpandToOptimize = (TM->getOptLevel() != CodeGenOpt::None);
+  if (EnableSVEIntrinsicOpts && TM->getOptLevel() == CodeGenOpt::Aggressive)

efriedma wrote:
> unused bool?
Removed



Comment at: llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp:120
+if (!Reinterpret ||
+RequiredType != Reinterpret->getArgOperand(0)->getType())
+  return false;

andwar wrote:
> Isn't it guaranteed that `RequiredType == 
> Reinterpret->getArgOperand(0)->getType()` is always true? I.e., `PN` and the 
> incoming values have identical type.
The incoming values to `PN` will all have the same type, but this is making 
sure that the reinterprets are all converting from the same type (there is a 
test for this in sve-intrinsic-opts-reinterpret.ll called 
`reinterpret_reductions_1`, where the arguments to convert.to.svbool are a mix 
of nxv2i1 and nxv4i1).
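
Roughly, the blocked shape is the following (a hand-written sketch with the
types filled in, not copied verbatim from the test file):

  declare <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv2i1(<vscale x 2 x i1>)
  declare <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1>)
  declare <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1>)

  define <vscale x 2 x i1> @mixed_source_types(i1 %cond, <vscale x 2 x i1> %a, <vscale x 4 x i1> %b) {
  entry:
    br i1 %cond, label %br_a, label %br_b
  br_a:
    %a1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv2i1(<vscale x 2 x i1> %a)
    br label %join
  br_b:
    %b1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1> %b)
    br label %join
  join:
    ; %a1 and %b1 are widened from different predicate types, so the phi cannot
    ; be rewritten in terms of the original (pre-svbool) values.
    %pg = phi <vscale x 16 x i1> [ %a1, %br_a ], [ %b1, %br_b ]
    %pg1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %pg)
    ret <vscale x 2 x i1> %pg1
  }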



Comment at: llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp:224
+  bool Changed = false;
+  for (auto II = BB->begin(), IE = BB->end(); II != IE;) {
+Instruction *I = &(*II);

andwar wrote:
> 1. Could this be a for-range loop instead?
> 
> 2. This loop seems to be a perfect candidate for `make_early_inc_range` 
> (https://github.com/llvm/llvm-project/blob/172f1460ae05ab5c33c757142c8bdb10acfbdbe1/llvm/include/llvm/ADT/STLExtras.h#L499),
>  e.g.
> ```
>  for (Instruction &I : make_early_inc_range(BB))
>   Changed |= optimizeIntrinsic(&I);
> ```
Changed this to use `make_early_inc_range` as suggested



Comment at: llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp:234
+  DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
+  bool Changed = false;
+

efriedma wrote:
> You might want to check whether the module actually declares any of the SVE 
> intrinsics before you iterate over the whole function.
Thanks for the suggestion - I changed this to a module pass so that we can 
check if any of the SVE intrinsics we are interested in are declared first.



Comment at: llvm/test/CodeGen/AArch64/sve-intrinsic-opts-ptest.ll:18
+; OPT-LABEL: ptest_any2
+; OPT: %out = call i1 @llvm.aarch64.sve.ptest.any.nxv16i1(<vscale x 16 x i1> %1, <vscale x 16 x i1> %2)
+  %mask = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)

andwar wrote:
> What's `%1` and `%2`? Is it worth adding the calls that generated them in the 
> expected output?
I think that would make sense. I've added `%1` and `%2` to the expected output 
and added more checks to the other tests here.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76078/new/

https://reviews.llvm.org/D76078





[PATCH] D76078: [AArch64][SVE] Add a pass for SVE intrinsic optimisations

2020-03-20 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 251643.
kmclaughlin marked 13 inline comments as done.
kmclaughlin added a comment.

- Changed this from a function pass to a module pass & now check if any of the 
relevant SVE intrinsics are declared first before iterating over functions
- Added more checks on expected output in the tests


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76078/new/

https://reviews.llvm.org/D76078

Files:
  llvm/lib/Target/AArch64/AArch64.h
  llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
  llvm/lib/Target/AArch64/CMakeLists.txt
  llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp
  llvm/test/CodeGen/AArch64/O3-pipeline.ll
  llvm/test/CodeGen/AArch64/sve-intrinsic-opts-ptest.ll
  llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll
@@ -0,0 +1,199 @@
+; RUN: opt -S -sve-intrinsic-opts -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck --check-prefix OPT %s
+
+define  @reinterpret_test_h( %a) {
+; OPT-LABEL: @reinterpret_test_h(
+; OPT-NOT: convert
+; OPT: ret  %a
+  %1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv8i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv8i1( %1)
+  ret  %2
+}
+
+; Reinterprets are not redundant because the second reinterpret zeros the
+; lanes that don't exist within its input.
+define  @reinterpret_test_h_rev( %a) {
+; OPT-LABEL: @reinterpret_test_h_rev(
+; OPT: %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv8i1( %a)
+; OPT-NEXT: %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv8i1( %1)
+; OPT-NEXT: ret  %2
+  %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv8i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv8i1( %1)
+  ret  %2
+}
+
+define  @reinterpret_test_w( %a) {
+; OPT-LABEL: @reinterpret_test_w(
+; OPT-NOT: convert
+; OPT: ret  %a
+  %1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv4i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv4i1( %1)
+  ret  %2
+}
+
+; Reinterprets are not redundant because the second reinterpret zeros the
+; lanes that don't exist within its input.
+define  @reinterpret_test_w_rev( %a) {
+; OPT-LABEL: @reinterpret_test_w_rev(
+; OPT: %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv4i1( %a)
+; OPT-NEXT: %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv4i1( %1)
+; OPT-NEXT: ret  %2
+  %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv4i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv4i1( %1)
+  ret  %2
+}
+
+define  @reinterpret_test_d( %a) {
+; OPT-LABEL: @reinterpret_test_d(
+; OPT-NOT: convert
+; OPT: ret  %a
+  %1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %1)
+  ret  %2
+}
+
+; Reinterprets are not redundant because the second reinterpret zeros the
+; lanes that don't exist within its input.
+define  @reinterpret_test_d_rev( %a) {
+; OPT-LABEL: @reinterpret_test_d_rev(
+; OPT: %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %a)
+; OPT-NEXT: %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %1)
+; OPT-NEXT: ret  %2
+  %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %1)
+  ret  %2
+}
+
+define  @reinterpret_reductions(i32 %cond,  %a,  %b,  %c) {
+; OPT-LABEL: reinterpret_reductions
+; OPT-NOT: convert
+; OPT-NOT: phi 
+; OPT: phi  [ %a, %br_phi_a ], [ %b, %br_phi_b ], [ %c, %br_phi_c ]
+; OPT-NOT: convert
+; OPT: ret
+
+entry:
+  switch i32 %cond, label %br_phi_c [
+ i32 43, label %br_phi_a
+ i32 45, label %br_phi_b
+  ]
+
+br_phi_a:
+  %a1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %a)
+  br label %join
+
+br_phi_b:
+  %b1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %b)
+  br label %join
+
+br_phi_c:
+  %c1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %c)
+  br label %join
+
+join:
+  %pg = phi  [ %a1, %br_phi_a ], [ %b1, %br_phi_b ], [ %c1, %br_phi_c ]
+  %pg1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %pg)
+  ret  %pg1
+}
+
+define  @reinterpret_reductions_1(i32 %cond,  %a,  %b,  %c) {
+; OPT-LABEL: reinterpret_reductions_1
+; OPT: convert
+; OPT: phi  [ %a1, %br_phi_a ], [ %b1, %br_phi_b ], [ %c1, %br_phi_c ]
+; OPT-NOT: phi 
+; OPT: tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %pg)
+; OPT: ret
+
+entry:
+  switch i32 %cond, label %br_phi_c [
+ i32 43, label %br_phi_a
+ i32 45, label %br_phi_b
+  ]
+
+br_phi_a:
+  %a1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %a)
+  br label %join
+
+br_phi_b:
+  %b1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv4i1( %b)
+  br label %join
+
+br_phi_c:
+  %c1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %c)
+  br label 

[PATCH] D75690: [SVE][Inline-Asm] Add constraints for SVE ACLE types

2020-03-17 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGaf64948e2a05: [SVE][Inline-Asm] Add constraints for SVE ACLE 
types (authored by kmclaughlin).

Changed prior to commit:
  https://reviews.llvm.org/D75690?vs=249092&id=250720#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75690/new/

https://reviews.llvm.org/D75690

Files:
  clang/lib/Basic/Targets/AArch64.cpp
  clang/lib/Basic/Targets/AArch64.h
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CGStmt.cpp
  clang/test/CodeGen/aarch64-sve-inline-asm-crash.c
  clang/test/CodeGen/aarch64-sve-inline-asm-datatypes.c
  clang/test/CodeGen/aarch64-sve-inline-asm-negative-test.c

Index: clang/test/CodeGen/aarch64-sve-inline-asm-negative-test.c
===
--- /dev/null
+++ clang/test/CodeGen/aarch64-sve-inline-asm-negative-test.c
@@ -0,0 +1,21 @@
+// REQUIRES: aarch64-registered-target
+
+// RUN: not %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns \
+// RUN:   -target-feature +neon -S -O1 -o - %s | FileCheck %s
+
+// Assembler error
+// Output constraint : Set a vector constraint on an integer
+__SVFloat32_t funcB2()
+{
+  __SVFloat32_t ret ;
+  asm volatile (
+"fmov %[ret], wzr \n"
+: [ret] "=w" (ret)
+:
+:);
+
+  return ret ;
+}
+
+// CHECK: funcB2
+// CHECK-ERROR: error: invalid operand for instruction
Index: clang/test/CodeGen/aarch64-sve-inline-asm-datatypes.c
===
--- /dev/null
+++ clang/test/CodeGen/aarch64-sve-inline-asm-datatypes.c
@@ -0,0 +1,252 @@
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns \
+// RUN:   -target-feature +neon -S -O1 -o - -emit-llvm %s | FileCheck %s
+
+// Tests to check that all sve datatypes can be passed in as input operands
+// and passed out as output operands.
+
+#define SVINT_TEST(DT, KIND)\
+DT func_int_##DT##KIND(DT in)\
+{\
+  DT out;\
+  asm volatile (\
+"ptrue p0.b\n"\
+"mov %[out]." #KIND ", p0/m, %[in]." #KIND "\n"\
+: [out] "=w" (out)\
+: [in] "w" (in)\
+: "p0"\
+);\
+  return out;\
+}
+
+SVINT_TEST(__SVUint8_t,b);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.b, p0/m, $1.b\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint8_t,h);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.h, p0/m, $1.h\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint8_t,s);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.s, p0/m, $1.s\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint8_t,d);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.d, p0/m, $1.d\0A", "=w,w,~{p0}"( %in)
+
+SVINT_TEST(__SVUint16_t,b);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.b, p0/m, $1.b\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint16_t,h);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.h, p0/m, $1.h\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint16_t,s);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.s, p0/m, $1.s\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint16_t,d);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.d, p0/m, $1.d\0A", "=w,w,~{p0}"( %in)
+
+SVINT_TEST(__SVUint32_t,b);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.b, p0/m, $1.b\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint32_t,h);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.h, p0/m, $1.h\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint32_t,s);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.s, p0/m, $1.s\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint32_t,d);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.d, p0/m, $1.d\0A", "=w,w,~{p0}"( %in)
+
+SVINT_TEST(__SVUint64_t,b);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.b, p0/m, $1.b\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint64_t,h);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.h, p0/m, $1.h\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint64_t,s);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.s, p0/m, $1.s\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint64_t,d);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.d, p0/m, $1.d\0A", "=w,w,~{p0}"( %in)
+
+SVINT_TEST(__SVInt8_t,b);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.b, p0/m, $1.b\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVInt8_t,h);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.h, p0/m, $1.h\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVInt8_t,s);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.s, p0/m, $1.s\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVInt8_t,d);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.d, p0/m, $1.d\0A", "=w,w,~{p0}"( %in)
+
+SVINT_TEST(__SVInt16_t,b);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.b, p0/m, $1.b\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVInt16_t,h);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.h, p0/m, $1.h\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVInt16_t,s);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.s, p0/m, 

[PATCH] D76078: [AArch64][SVE] Add a pass for SVE intrinsic optimisations

2020-03-12 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: sdesmalen, andwar, efriedma, cameron.mcinally, 
c-rhodes.
Herald added subscribers: danielkiss, psnobl, rkruppe, hiraditya, 
kristof.beyls, tschuett, mgorny.
Herald added a reviewer: rengolin.
Herald added a project: LLVM.

Creates the SVEIntrinsicOpts pass. In this patch, the pass tries
to remove unnecessary reinterpret intrinsics which convert to
and from svbool_t (llvm.aarch64.sve.convert.[to|from].svbool).

For example, the reinterprets below are redundant:

  %1 = call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1> %a)
  %2 = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %1)

The pass also looks for ptest intrinsics and phi instructions where
the operands are being needlessly converted to and from svbool_t.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D76078

Files:
  llvm/lib/Target/AArch64/AArch64.h
  llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
  llvm/lib/Target/AArch64/CMakeLists.txt
  llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp
  llvm/test/CodeGen/AArch64/O3-pipeline.ll
  llvm/test/CodeGen/AArch64/sve-intrinsic-opts-ptest.ll
  llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll
@@ -0,0 +1,196 @@
+; RUN: opt -S -sve-intrinsicopts -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck --check-prefix OPT %s
+
+define  @reinterpret_test_h( %a) {
+; OPT-LABEL: @reinterpret_test_h(
+; OPT: ret  %a
+  %1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv8i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv8i1( %1)
+  ret  %2
+}
+
+; Reinterprets are not redundant because the second reinterpret zeros the
+; lanes that don't exist within its input.
+define  @reinterpret_test_h_rev( %a) {
+; OPT-LABEL: @reinterpret_test_h_rev(
+; OPT: %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv8i1( %a)
+; OPT-NEXT: %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv8i1( %1)
+; OPT-NEXT: ret  %2
+  %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv8i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv8i1( %1)
+  ret  %2
+}
+
+define  @reinterpret_test_w( %a) {
+; OPT-LABEL: @reinterpret_test_w(
+; OPT: ret  %a
+  %1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv4i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv4i1( %1)
+  ret  %2
+}
+
+; Reinterprets are not redundant because the second reinterpret zeros the
+; lanes that don't exist within its input.
+define  @reinterpret_test_w_rev( %a) {
+; OPT-LABEL: @reinterpret_test_w_rev(
+; OPT: %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv4i1( %a)
+; OPT-NEXT: %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv4i1( %1)
+; OPT-NEXT: ret  %2
+  %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv4i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv4i1( %1)
+  ret  %2
+}
+
+define  @reinterpret_test_d( %a) {
+; OPT-LABEL: @reinterpret_test_d(
+; OPT: ret  %a
+  %1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %1)
+  ret  %2
+}
+
+; Reinterprets are not redundant because the second reinterpret zeros the
+; lanes that don't exist within its input.
+define  @reinterpret_test_d_rev( %a) {
+; OPT-LABEL: @reinterpret_test_d_rev(
+; OPT: %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %a)
+; OPT-NEXT: %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %1)
+; OPT-NEXT: ret  %2
+  %1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %a)
+  %2 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %1)
+  ret  %2
+}
+
+define  @reinterpret_reductions(i32 %cond,  %a,  %b,  %c) {
+; OPT-LABEL: reinterpret_reductions
+; OPT-NOT: convert
+; OPT-NOT: phi 
+; OPT: phi  [ %a, %br_phi_a ], [ %b, %br_phi_b ], [ %c, %br_phi_c ]
+; OPT-NOT: convert
+; OPT: ret
+
+entry:
+  switch i32 %cond, label %br_phi_c [
+ i32 43, label %br_phi_a
+ i32 45, label %br_phi_b
+  ]
+
+br_phi_a:
+  %a1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %a)
+  br label %join
+
+br_phi_b:
+  %b1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %b)
+  br label %join
+
+br_phi_c:
+  %c1 = tail call  @llvm.aarch64.sve.convert.to.svbool.nxv2i1( %c)
+  br label %join
+
+join:
+  %pg = phi  [ %a1, %br_phi_a ], [ %b1, %br_phi_b ], [ %c1, %br_phi_c ]
+  %pg1 = tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %pg)
+  ret  %pg1
+}
+
+define  @reinterpret_reductions_1(i32 %cond,  %a,  %b,  %c) {
+; OPT-LABEL: reinterpret_reductions_1
+; OPT: convert
+; OPT: phi  [ %a1, %br_phi_a ], [ %b1, %br_phi_b ], [ %c1, %br_phi_c ]
+; OPT-NOT: phi 
+; OPT: tail call  @llvm.aarch64.sve.convert.from.svbool.nxv2i1( %pg)
+; OPT: ret
+
+entry:
+  switch i32 %cond, label 

[PATCH] D75858: [AArch64][SVE] Add SVE intrinsics for address calculations

2020-03-10 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG0bba37a32024: [AArch64][SVE] Add SVE intrinsics for address 
calculations (authored by kmclaughlin).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75858/new/

https://reviews.llvm.org/D75858

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-adr.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-adr.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-adr.ll
@@ -0,0 +1,101 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -verify-machineinstrs < %s | FileCheck %s
+
+;
+; ADRB
+;
+
+define  @adrb_i32( %a,  %b) {
+; CHECK-LABEL: adrb_i32:
+; CHECK: adr z0.s, [z0.s, z1.s]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.adrb.nxv4i32( %a,
+ %b)
+  ret  %out
+}
+
+define  @adrb_i64( %a,  %b) {
+; CHECK-LABEL: adrb_i64:
+; CHECK: adr z0.d, [z0.d, z1.d]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.adrb.nxv2i64( %a,
+ %b)
+  ret  %out
+}
+
+;
+; ADRH
+;
+
+define  @adrh_i32( %a,  %b) {
+; CHECK-LABEL: adrh_i32:
+; CHECK: adr z0.s, [z0.s, z1.s, lsl #1]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.adrh.nxv4i32( %a,
+ %b)
+  ret  %out
+}
+
+define  @adrh_i64( %a,  %b) {
+; CHECK-LABEL: adrh_i64:
+; CHECK: adr z0.d, [z0.d, z1.d, lsl #1]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.adrh.nxv2i64( %a,
+ %b)
+  ret  %out
+}
+
+;
+; ADRW
+;
+
+define  @adrw_i32( %a,  %b) {
+; CHECK-LABEL: adrw_i32:
+; CHECK: adr z0.s, [z0.s, z1.s, lsl #2]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.adrw.nxv4i32( %a,
+ %b)
+  ret  %out
+}
+
+define  @adrw_i64( %a,  %b) {
+; CHECK-LABEL: adrw_i64:
+; CHECK: adr z0.d, [z0.d, z1.d, lsl #2]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.adrw.nxv2i64( %a,
+ %b)
+  ret  %out
+}
+
+;
+; ADRD
+;
+
+define  @adrd_i32( %a,  %b) {
+; CHECK-LABEL: adrd_i32:
+; CHECK: adr z0.s, [z0.s, z1.s, lsl #3]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.adrd.nxv4i32( %a,
+ %b)
+  ret  %out
+}
+
+define  @adrd_i64( %a,  %b) {
+; CHECK-LABEL: adrd_i64:
+; CHECK: adr z0.d, [z0.d, z1.d, lsl #3]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.adrd.nxv2i64( %a,
+ %b)
+  ret  %out
+}
+
+declare  @llvm.aarch64.sve.adrb.nxv4i32(, )
+declare  @llvm.aarch64.sve.adrb.nxv2i64(, )
+
+declare  @llvm.aarch64.sve.adrh.nxv4i32(, )
+declare  @llvm.aarch64.sve.adrh.nxv2i64(, )
+
+declare  @llvm.aarch64.sve.adrw.nxv4i32(, )
+declare  @llvm.aarch64.sve.adrw.nxv2i64(, )
+
+declare  @llvm.aarch64.sve.adrd.nxv4i32(, )
+declare  @llvm.aarch64.sve.adrd.nxv2i64(, )
Index: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
===
--- llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -917,6 +917,24 @@
   defm ADR_LSL_ZZZ_S  : sve_int_bin_cons_misc_0_a_32_lsl<0b10, "adr">;
   defm ADR_LSL_ZZZ_D  : sve_int_bin_cons_misc_0_a_64_lsl<0b11, "adr">;
 
+  def : Pat<(nxv4i32 (int_aarch64_sve_adrb nxv4i32:$Op1, nxv4i32:$Op2)),
+(ADR_LSL_ZZZ_S_0 $Op1, $Op2)>;
+  def : Pat<(nxv4i32 (int_aarch64_sve_adrh nxv4i32:$Op1, nxv4i32:$Op2)),
+(ADR_LSL_ZZZ_S_1 $Op1, $Op2)>;
+  def : Pat<(nxv4i32 (int_aarch64_sve_adrw nxv4i32:$Op1, nxv4i32:$Op2)),
+(ADR_LSL_ZZZ_S_2 $Op1, $Op2)>;
+  def : Pat<(nxv4i32 (int_aarch64_sve_adrd nxv4i32:$Op1, nxv4i32:$Op2)),
+(ADR_LSL_ZZZ_S_3 $Op1, $Op2)>;
+
+  def : Pat<(nxv2i64 (int_aarch64_sve_adrb nxv2i64:$Op1, nxv2i64:$Op2)),
+(ADR_LSL_ZZZ_D_0 $Op1, $Op2)>;
+  def : Pat<(nxv2i64 (int_aarch64_sve_adrh nxv2i64:$Op1, nxv2i64:$Op2)),
+(ADR_LSL_ZZZ_D_1 $Op1, $Op2)>;
+  def : Pat<(nxv2i64 (int_aarch64_sve_adrw nxv2i64:$Op1, nxv2i64:$Op2)),
+(ADR_LSL_ZZZ_D_2 $Op1, $Op2)>;
+  def : Pat<(nxv2i64 (int_aarch64_sve_adrd nxv2i64:$Op1, nxv2i64:$Op2)),
+(ADR_LSL_ZZZ_D_3 $Op1, $Op2)>;
+
   defm TBL_ZZZ  : sve_int_perm_tbl<"tbl", AArch64tbl>;
 
   defm ZIP1_ZZZ : sve_int_perm_bin_perm_zz<0b000, "zip1", AArch64zip1>;
Index: llvm/include/llvm/IR/IntrinsicsAArch64.td
===
--- llvm/include/llvm/IR/IntrinsicsAArch64.td
+++ llvm/include/llvm/IR/IntrinsicsAArch64.td
@@ -1286,6 +1286,15 @@
 def int_aarch64_sve_index : 

[PATCH] D75858: [AArch64][SVE] Add SVE intrinsics for address calculations

2020-03-09 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: sdesmalen, andwar, efriedma, dancgr, 
cameron.mcinally.
Herald added subscribers: danielkiss, psnobl, rkruppe, hiraditya, 
kristof.beyls, tschuett.
Herald added a reviewer: rengolin.
Herald added a project: LLVM.

Adds the @llvm.aarch64.sve.adr[b|h|w|d] intrinsics
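
For context, a minimal scalar sketch of the per-element address calculation each variant performs (the intrinsics themselves operate lane-wise on whole Z vectors, as the tests below show); the helper name and values here are illustrative only and not part of the patch:

#include <stdint.h>
#include <stdio.h>

/* adr[b|h|w|d] computes base + (offset << shift), where the shift reflects the
   element size: 0 for bytes, 1 for halfwords, 2 for words, 3 for doublewords. */
static uint64_t adr_lane(uint64_t base, uint64_t offset, unsigned shift) {
  return base + (offset << shift);
}

int main(void) {
  uint64_t base = 0x1000, offset = 4;
  printf("adrb: 0x%llx\n", (unsigned long long)adr_lane(base, offset, 0)); /* 0x1004 */
  printf("adrh: 0x%llx\n", (unsigned long long)adr_lane(base, offset, 1)); /* 0x1008 */
  printf("adrw: 0x%llx\n", (unsigned long long)adr_lane(base, offset, 2)); /* 0x1010 */
  printf("adrd: 0x%llx\n", (unsigned long long)adr_lane(base, offset, 3)); /* 0x1020 */
  return 0;
}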


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D75858

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/test/CodeGen/AArch64/sve-intrinsics-adr.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-adr.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-adr.ll
@@ -0,0 +1,101 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -verify-machineinstrs < %s | FileCheck %s
+
+;
+; ADRB
+;
+
+define  @adrb_i32( %a,  %b) {
+; CHECK-LABEL: adrb_i32:
+; CHECK: adr z0.s, [z0.s, z1.s]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.adrb.nxv4i32( %a,
+ %b)
+  ret  %out
+}
+
+define  @adrb_i64( %a,  %b) {
+; CHECK-LABEL: adrb_i64:
+; CHECK: adr z0.d, [z0.d, z1.d]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.adrb.nxv2i64( %a,
+ %b)
+  ret  %out
+}
+
+;
+; ADRH
+;
+
+define  @adrh_i32( %a,  %b) {
+; CHECK-LABEL: adrh_i32:
+; CHECK: adr z0.s, [z0.s, z1.s, lsl #1]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.adrh.nxv4i32( %a,
+ %b)
+  ret  %out
+}
+
+define  @adrh_i64( %a,  %b) {
+; CHECK-LABEL: adrh_i64:
+; CHECK: adr z0.d, [z0.d, z1.d, lsl #1]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.adrh.nxv2i64( %a,
+ %b)
+  ret  %out
+}
+
+;
+; ADRW
+;
+
+define  @adrw_i32( %a,  %b) {
+; CHECK-LABEL: adrw_i32:
+; CHECK: adr z0.s, [z0.s, z1.s, lsl #2]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.adrw.nxv4i32( %a,
+ %b)
+  ret  %out
+}
+
+define  @adrw_i64( %a,  %b) {
+; CHECK-LABEL: adrw_i64:
+; CHECK: adr z0.d, [z0.d, z1.d, lsl #2]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.adrw.nxv2i64( %a,
+ %b)
+  ret  %out
+}
+
+;
+; ADRD
+;
+
+define  @adrd_i32( %a,  %b) {
+; CHECK-LABEL: adrd_i32:
+; CHECK: adr z0.s, [z0.s, z1.s, lsl #3]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.adrd.nxv4i32( %a,
+ %b)
+  ret  %out
+}
+
+define  @adrd_i64( %a,  %b) {
+; CHECK-LABEL: adrd_i64:
+; CHECK: adr z0.d, [z0.d, z1.d, lsl #3]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.adrd.nxv2i64( %a,
+ %b)
+  ret  %out
+}
+
+declare  @llvm.aarch64.sve.adrb.nxv4i32(, )
+declare  @llvm.aarch64.sve.adrb.nxv2i64(, )
+
+declare  @llvm.aarch64.sve.adrh.nxv4i32(, )
+declare  @llvm.aarch64.sve.adrh.nxv2i64(, )
+
+declare  @llvm.aarch64.sve.adrw.nxv4i32(, )
+declare  @llvm.aarch64.sve.adrw.nxv2i64(, )
+
+declare  @llvm.aarch64.sve.adrd.nxv4i32(, )
+declare  @llvm.aarch64.sve.adrd.nxv2i64(, )
Index: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
===
--- llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -894,6 +894,24 @@
   defm ADR_LSL_ZZZ_S  : sve_int_bin_cons_misc_0_a_32_lsl<0b10, "adr">;
   defm ADR_LSL_ZZZ_D  : sve_int_bin_cons_misc_0_a_64_lsl<0b11, "adr">;
 
+  def : Pat<(nxv4i32 (int_aarch64_sve_adrb nxv4i32:$Op1, nxv4i32:$Op2)),
+(ADR_LSL_ZZZ_S_0 $Op1, $Op2)>;
+  def : Pat<(nxv4i32 (int_aarch64_sve_adrh nxv4i32:$Op1, nxv4i32:$Op2)),
+(ADR_LSL_ZZZ_S_1 $Op1, $Op2)>;
+  def : Pat<(nxv4i32 (int_aarch64_sve_adrw nxv4i32:$Op1, nxv4i32:$Op2)),
+(ADR_LSL_ZZZ_S_2 $Op1, $Op2)>;
+  def : Pat<(nxv4i32 (int_aarch64_sve_adrd nxv4i32:$Op1, nxv4i32:$Op2)),
+(ADR_LSL_ZZZ_S_3 $Op1, $Op2)>;
+
+  def : Pat<(nxv2i64 (int_aarch64_sve_adrb nxv2i64:$Op1, nxv2i64:$Op2)),
+(ADR_LSL_ZZZ_D_0 $Op1, $Op2)>;
+  def : Pat<(nxv2i64 (int_aarch64_sve_adrh nxv2i64:$Op1, nxv2i64:$Op2)),
+(ADR_LSL_ZZZ_D_1 $Op1, $Op2)>;
+  def : Pat<(nxv2i64 (int_aarch64_sve_adrw nxv2i64:$Op1, nxv2i64:$Op2)),
+(ADR_LSL_ZZZ_D_2 $Op1, $Op2)>;
+  def : Pat<(nxv2i64 (int_aarch64_sve_adrd nxv2i64:$Op1, nxv2i64:$Op2)),
+(ADR_LSL_ZZZ_D_3 $Op1, $Op2)>;
+
   defm TBL_ZZZ  : sve_int_perm_tbl<"tbl", AArch64tbl>;
 
   defm ZIP1_ZZZ : sve_int_perm_bin_perm_zz<0b000, "zip1", AArch64zip1>;
Index: llvm/include/llvm/IR/IntrinsicsAArch64.td
===
--- llvm/include/llvm/IR/IntrinsicsAArch64.td
+++ 

[PATCH] D75690: [SVE][Inline-Asm] Add constraints for SVE ACLE types

2020-03-09 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 249092.
kmclaughlin added a comment.

- Added -emit-llvm to the RUN line of aarch64-sve-inline-asm-datatypes.c test. 
Added some more tests here for the Upl & y constraints and removed 
aarch64-sve-inline-asm-vec-low.c.
- Addressed formatting suggestions on previous patch.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75690/new/

https://reviews.llvm.org/D75690

Files:
  clang/lib/Basic/Targets/AArch64.cpp
  clang/lib/Basic/Targets/AArch64.h
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CGStmt.cpp
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/test/CodeGen/aarch64-sve-inline-asm-crash.c
  clang/test/CodeGen/aarch64-sve-inline-asm-datatypes.c
  clang/test/CodeGen/aarch64-sve-inline-asm-negative-test.c

Index: clang/test/CodeGen/aarch64-sve-inline-asm-negative-test.c
===
--- /dev/null
+++ clang/test/CodeGen/aarch64-sve-inline-asm-negative-test.c
@@ -0,0 +1,21 @@
+// REQUIRES: aarch64-registered-target
+
+// RUN: not %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns \
+// RUN:   -target-feature +neon -S -O1 -o - %s | FileCheck %s
+
+// Assembler error
+// Output constraint : Set a vector constraint on an integer
+__SVFloat32_t funcB2()
+{
+  __SVFloat32_t ret ;
+  asm volatile (
+"fmov %[ret], wzr \n"
+: [ret] "=w" (ret)
+:
+:);
+
+  return ret ;
+}
+
+// CHECK: funcB2
+// CHECK-ERROR: error: invalid operand for instruction
Index: clang/test/CodeGen/aarch64-sve-inline-asm-datatypes.c
===
--- /dev/null
+++ clang/test/CodeGen/aarch64-sve-inline-asm-datatypes.c
@@ -0,0 +1,252 @@
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns \
+// RUN:   -target-feature +neon -S -O1 -o - -emit-llvm %s | FileCheck %s
+
+// Tests to check that all sve datatypes can be passed in as input operands
+// and passed out as output operands.
+
+#define SVINT_TEST(DT, KIND)\
+DT func_int_##DT##KIND(DT in)\
+{\
+  DT out;\
+  asm volatile (\
+"ptrue p0.b\n"\
+"mov %[out]." #KIND ", p0/m, %[in]." #KIND "\n"\
+: [out] "=w" (out)\
+: [in] "w" (in)\
+: "p0"\
+);\
+  return out;\
+}
+
+SVINT_TEST(__SVUint8_t,b);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.b, p0/m, $1.b\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint8_t,h);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.h, p0/m, $1.h\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint8_t,s);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.s, p0/m, $1.s\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint8_t,d);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.d, p0/m, $1.d\0A", "=w,w,~{p0}"( %in)
+
+SVINT_TEST(__SVUint16_t,b);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.b, p0/m, $1.b\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint16_t,h);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.h, p0/m, $1.h\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint16_t,s);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.s, p0/m, $1.s\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint16_t,d);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.d, p0/m, $1.d\0A", "=w,w,~{p0}"( %in)
+
+SVINT_TEST(__SVUint32_t,b);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.b, p0/m, $1.b\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint32_t,h);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.h, p0/m, $1.h\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint32_t,s);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.s, p0/m, $1.s\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint32_t,d);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.d, p0/m, $1.d\0A", "=w,w,~{p0}"( %in)
+
+SVINT_TEST(__SVUint64_t,b);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.b, p0/m, $1.b\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint64_t,h);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.h, p0/m, $1.h\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint64_t,s);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.s, p0/m, $1.s\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVUint64_t,d);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.d, p0/m, $1.d\0A", "=w,w,~{p0}"( %in)
+
+SVINT_TEST(__SVInt8_t,b);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.b, p0/m, $1.b\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVInt8_t,h);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.h, p0/m, $1.h\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVInt8_t,s);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.s, p0/m, $1.s\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVInt8_t,d);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.d, p0/m, $1.d\0A", "=w,w,~{p0}"( %in)
+
+SVINT_TEST(__SVInt16_t,b);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.b, p0/m, $1.b\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVInt16_t,h);
+// CHECK: call  asm sideeffect "ptrue p0.b\0Amov $0.h, p0/m, $1.h\0A", "=w,w,~{p0}"( %in)
+SVINT_TEST(__SVInt16_t,s);
+// CHECK: call  asm 

[PATCH] D75690: [SVE][Inline-Asm] Add constraints for SVE ACLE types

2020-03-09 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin marked 2 inline comments as done.
kmclaughlin added a comment.

Thanks for reviewing this, @efriedma!




Comment at: clang/lib/Basic/Targets/AArch64.h:95
+case 'U':   // Three-character constraint; add "@3" hint for later parsing.
+  R = std::string("@3") + std::string(Constraint, 3);
+  Constraint += 2;

efriedma wrote:
> Is "@3" some LLVM thing?  I don't think I've seen it before.
The "@" is used to indicate that this is a multi-letter constraint of a given 
length. This is parsed later by InlineAsm.cpp, see 
https://reviews.llvm.org/D66524
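
For illustration, a small sketch of how this looks from the source level; the function below and the exact rendered constraint string are assumptions made for the example, not taken from the patch:

// Hypothetical use of the three-character "Upa" constraint on an SVE predicate;
// assumes a compiler built with this patch and -target-feature +sve.
__SVBool_t all_true_predicate(void) {
  __SVBool_t pg;
  asm("ptrue %[pg].b" : [pg] "=Upa"(pg));
  // convertConstraint prepends the length hint, so the IR-level constraint
  // string is expected to read "=@3Upa" rather than "=Upa"; InlineAsm.cpp then
  // parses it back as a single three-letter constraint.
  return pg;
}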



Comment at: clang/test/CodeGen/aarch64-sve-inline-asm-datatypes.c:2
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve 
-fallow-half-arguments-and-returns \
+// RUN:   -target-feature +neon -S -O1 -o - %s | FileCheck %s
+

efriedma wrote:
> `REQUIRES: aarch64-registered-target` is necessary for any test that isn't 
> using -emit-llvm.
> 
> Generally, I'd prefer tests using -emit-llvm so we can independently verify 
> that the IR is what we expect, vs. the backend processing the IR the way we 
> expect.
I've added -emit-llvm here and updated the CHECK lines accordingly


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75690/new/

https://reviews.llvm.org/D75690





[PATCH] D75690: [SVE][Inline-Asm] Add constraints for SVE ACLE types

2020-03-05 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: sdesmalen, huntergr, rovka, cameron.mcinally, 
efriedma.
Herald added subscribers: cfe-commits, psnobl, rkruppe, tschuett.
Herald added a reviewer: rengolin.
Herald added a project: clang.
kmclaughlin added a parent revision: D75297: [TypeSize] Allow returning 
scalable size in implicit conversion to uint64_t.

Adds the constraints described below to ensure that we
can tie variables of SVE ACLE types to operands in inline-asm:

- y: SVE registers Z0-Z7
- Upl: One of the low eight SVE predicate registers (P0-P7)
- Upa: Full range of SVE predicate registers (P0-P15)


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D75690

Files:
  clang/lib/Basic/Targets/AArch64.cpp
  clang/lib/Basic/Targets/AArch64.h
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CGStmt.cpp
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/test/CodeGen/aarch64-sve-inline-asm-crash.c
  clang/test/CodeGen/aarch64-sve-inline-asm-datatypes.c
  clang/test/CodeGen/aarch64-sve-inline-asm-negative-test.c
  clang/test/CodeGen/aarch64-sve-inline-asm-vec-low.c

Index: clang/test/CodeGen/aarch64-sve-inline-asm-vec-low.c
===
--- /dev/null
+++ clang/test/CodeGen/aarch64-sve-inline-asm-vec-low.c
@@ -0,0 +1,39 @@
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -W -Wall -o - %s | FileCheck %s
+
+#define svfcmla_test_f16(Zda, Zn, Zm, rot, imm3) \
+({ \
+__SVFloat16_t _Zn = Zn;\
+__SVFloat16_t _Zm = Zm;\
+__SVFloat16_t _res = Zda;\
+__asm__("fcmla %[__res].h, %[__Zn].h, %[__Zm].h[" #imm3 "], %[__rot]"\
+: [__res] "+w" (_res) \
+: [__Zn] "w" (_Zn), [__Zm] "y" (_Zm), [__rot] "i" (rot) \
+:  \
+);\
+_res; \
+})
+
+
+// CHECK: fcmla {{z[0-9]+\.h}}, {{z[0-9]+\.h}}, {{z[0-7]\.h}}{{\[[0-9]+\]}}, #270
+__SVFloat16_t test_svfcmla_lane_f16(__SVFloat16_t aZda, __SVFloat16_t aZn, __SVFloat16_t aZm) {
+return svfcmla_test_f16(aZda, aZn, aZm, 270, 0);
+}
+
+#define svfcmla_test_f32(Zda, Zn, Zm, rot, imm3) \
+({ \
+__SVFloat32_t _Zn = Zn;\
+__SVFloat32_t _Zm = Zm;\
+__SVFloat32_t _res = Zda;\
+__asm__("fcmla %[__res].s, %[__Zn].s, %[__Zm].s[" #imm3 "], %[__rot]"\
+: [__res] "+w" (_res) \
+: [__Zn] "w" (_Zn), [__Zm] "x" (_Zm), [__rot] "i" (rot) \
+:  \
+);\
+_res; \
+})
+
+
+// CHECK: fcmla {{z[0-9]+\.s}}, {{z[0-9]+\.s}}, {{z[0-9][0-5]?\.s}}{{\[[0-9]+\]}}, #270
+__SVFloat32_t test_svfcmla_lane_f(__SVFloat32_t aZda, __SVFloat32_t aZn, __SVFloat32_t aZm) {
+return svfcmla_test_f32(aZda, aZn, aZm, 270, 0);
+}
Index: clang/test/CodeGen/aarch64-sve-inline-asm-negative-test.c
===
--- /dev/null
+++ clang/test/CodeGen/aarch64-sve-inline-asm-negative-test.c
@@ -0,0 +1,19 @@
+// RUN: not %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns \
+// RUN:   -target-feature +neon -S -O1 -o - %s | FileCheck %s
+
+// Assembler error
+// Output constraint : Set a vector constraint on an integer
+__SVFloat32_t funcB2()
+{
+  __SVFloat32_t ret ;
+  asm volatile (
+"fmov %[ret], wzr \n"
+: [ret] "=w" (ret)
+:
+:);
+
+  return ret ;
+}
+
+// CHECK: funcB2
+// CHECK-ERROR: error: invalid operand for instruction
Index: clang/test/CodeGen/aarch64-sve-inline-asm-datatypes.c
===
--- /dev/null
+++ clang/test/CodeGen/aarch64-sve-inline-asm-datatypes.c
@@ -0,0 +1,209 @@
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns \
+// RUN:   -target-feature +neon -S -O1 -o - %s | FileCheck %s
+
+// Tests to check that all sve datatypes can be passed in as input operands
+// and passed out as output operands.
+
+#define SVINT_TEST(DT, KIND)\
+DT func_int_##DT##KIND(DT in)\
+{\
+  DT out;\
+  asm volatile (\
+"ptrue p0.b\n"\
+"mov %[out]." #KIND ", p0/m, %[in]." #KIND "\n"\
+: [out] "=w" (out)\
+: [in] "w" (in)\
+: "p0"\
+);\
+  return out;\
+}
+
+SVINT_TEST(__SVUint8_t,b);
+// CHECK: mov {{z[0-9]+}}.b, p0/m, {{z[0-9]+}}.b
+SVINT_TEST(__SVUint8_t,h);
+// CHECK: mov {{z[0-9]+}}.h, p0/m, {{z[0-9]+}}.h
+SVINT_TEST(__SVUint8_t,s);
+// CHECK: mov {{z[0-9]+}}.s, p0/m, {{z[0-9]+}}.s
+SVINT_TEST(__SVUint8_t,d);
+// CHECK: mov {{z[0-9]+}}.d, p0/m, {{z[0-9]+}}.d
+
+SVINT_TEST(__SVUint16_t,b);
+// CHECK: mov {{z[0-9]+}}.b, p0/m, {{z[0-9]+}}.b
+SVINT_TEST(__SVUint16_t,h);
+// CHECK: mov {{z[0-9]+}}.h, p0/m, {{z[0-9]+}}.h
+SVINT_TEST(__SVUint16_t,s);
+// CHECK: mov {{z[0-9]+}}.s, p0/m, {{z[0-9]+}}.s

[PATCH] D75160: [AArch64][SVE] Add SVE2 intrinsic for xar

2020-03-04 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGf5502c7035a9: [AArch64][SVE] Add SVE2 intrinsic for xar 
(authored by kmclaughlin).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75160/new/

https://reviews.llvm.org/D75160

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/sve2-bitwise-ternary.ll

Index: llvm/test/CodeGen/AArch64/sve2-bitwise-ternary.ll
===
--- llvm/test/CodeGen/AArch64/sve2-bitwise-ternary.ll
+++ llvm/test/CodeGen/AArch64/sve2-bitwise-ternary.ll
@@ -258,6 +258,50 @@
   ret  %res
 }
 
+;
+; XAR (vector, bitwise, unpredicated)
+;
+
+define  @xar_b( %a,  %b) {
+; CHECK-LABEL: xar_b:
+; CHECK: xar z0.b, z0.b, z1.b, #1
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.xar.nxv16i8( %a,
+%b,
+   i32 1)
+  ret  %out
+}
+
+define  @xar_h( %a,  %b) {
+; CHECK-LABEL: xar_h:
+; CHECK: xar z0.h, z0.h, z1.h, #2
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.xar.nxv8i16( %a,
+%b,
+   i32 2)
+  ret  %out
+}
+
+define  @xar_s( %a,  %b) {
+; CHECK-LABEL: xar_s:
+; CHECK: xar z0.s, z0.s, z1.s, #3
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.xar.nxv4i32( %a,
+%b,
+   i32 3)
+  ret  %out
+}
+
+define  @xar_d( %a,  %b) {
+; CHECK-LABEL: xar_d:
+; CHECK: xar z0.d, z0.d, z1.d, #4
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.xar.nxv2i64( %a,
+%b,
+   i32 4)
+  ret  %out
+}
+
 declare  @llvm.aarch64.sve.eor3.nxv16i8(,,)
 declare  @llvm.aarch64.sve.eor3.nxv8i16(,,)
 declare  @llvm.aarch64.sve.eor3.nxv4i32(,,)
@@ -282,3 +326,7 @@
 declare  @llvm.aarch64.sve.nbsl.nxv8i16(,,)
 declare  @llvm.aarch64.sve.nbsl.nxv4i32(,,)
 declare  @llvm.aarch64.sve.nbsl.nxv2i64(,,)
+declare  @llvm.aarch64.sve.xar.nxv16i8(, , i32)
+declare  @llvm.aarch64.sve.xar.nxv8i16(, , i32)
+declare  @llvm.aarch64.sve.xar.nxv4i32(, , i32)
+declare  @llvm.aarch64.sve.xar.nxv2i64(, , i32)
Index: llvm/lib/Target/AArch64/SVEInstrFormats.td
===
--- llvm/lib/Target/AArch64/SVEInstrFormats.td
+++ llvm/lib/Target/AArch64/SVEInstrFormats.td
@@ -3927,7 +3927,7 @@
   let ElementSize = ElementSizeNone;
 }
 
-multiclass sve2_int_rotate_right_imm {
+multiclass sve2_int_rotate_right_imm {
   def _B : sve2_int_rotate_right_imm<{0,0,0,1}, asm, ZPR8, vecshiftR8>;
   def _H : sve2_int_rotate_right_imm<{0,0,1,?}, asm, ZPR16, vecshiftR16> {
 let Inst{19} = imm{3};
@@ -3939,6 +3939,10 @@
 let Inst{22}= imm{5};
 let Inst{20-19} = imm{4-3};
   }
+  def : SVE_3_Op_Imm_Pat(NAME # _B)>;
+  def : SVE_3_Op_Imm_Pat(NAME # _H)>;
+  def : SVE_3_Op_Imm_Pat(NAME # _S)>;
+  def : SVE_3_Op_Imm_Pat(NAME # _D)>;
 }
 
 //===--===//
Index: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
===
--- llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -1908,7 +1908,7 @@
   defm NBSL_  : sve2_int_bitwise_ternary_op<0b111, "nbsl",  int_aarch64_sve_nbsl>;
 
   // SVE2 bitwise xor and rotate right by immediate
-  defm XAR_ZZZI : sve2_int_rotate_right_imm<"xar">;
+  defm XAR_ZZZI : sve2_int_rotate_right_imm<"xar", int_aarch64_sve_xar>;
 
   // SVE2 extract vector (immediate offset, constructive)
   def EXT_ZZI_B : sve2_int_perm_extract_i_cons<"ext">;
Index: llvm/include/llvm/IR/IntrinsicsAArch64.td
===
--- llvm/include/llvm/IR/IntrinsicsAArch64.td
+++ llvm/include/llvm/IR/IntrinsicsAArch64.td
@@ -2098,13 +2098,16 @@
 def int_aarch64_sve_pmullb_pair : AdvSIMD_2VectorArg_Intrinsic;
 def int_aarch64_sve_pmullt_pair : AdvSIMD_2VectorArg_Intrinsic;
 
+//
 // SVE2 bitwise ternary operations.
+//
 def int_aarch64_sve_eor3   : AdvSIMD_3VectorArg_Intrinsic;
 def int_aarch64_sve_bcax   : AdvSIMD_3VectorArg_Intrinsic;
 def int_aarch64_sve_bsl: AdvSIMD_3VectorArg_Intrinsic;
 def int_aarch64_sve_bsl1n  : AdvSIMD_3VectorArg_Intrinsic;
 def int_aarch64_sve_bsl2n  : AdvSIMD_3VectorArg_Intrinsic;
 def int_aarch64_sve_nbsl   : AdvSIMD_3VectorArg_Intrinsic;
+def int_aarch64_sve_xar: AdvSIMD_2VectorArgIndexed_Intrinsic;
 
 //
 // SVE2 - Optional AES, SHA-3 and SM4

[PATCH] D74912: [AArch64][SVE] Add SVE2 intrinsics for bit permutation & table lookup

2020-02-26 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin added a comment.

Thanks for reviewing this, @sdesmalen & @efriedma!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74912/new/

https://reviews.llvm.org/D74912





[PATCH] D74912: [AArch64][SVE] Add SVE2 intrinsics for bit permutation & table lookup

2020-02-26 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG9c859fc54d92: [AArch64][SVE] Add SVE2 intrinsics for bit 
permutation & table lookup (authored by kmclaughlin).

Changed prior to commit:
  https://reviews.llvm.org/D74912?vs=246487&id=246661#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74912/new/

https://reviews.llvm.org/D74912

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/sve2-intrinsics-bit-permutation.ll
  llvm/test/CodeGen/AArch64/sve2-intrinsics-perm-tb.ll

Index: llvm/test/CodeGen/AArch64/sve2-intrinsics-perm-tb.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve2-intrinsics-perm-tb.ll
@@ -0,0 +1,181 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+
+;
+; TBL2
+;
+
+define  @tbl2_b( %a,  %unused,
+   %b,  %c) {
+; CHECK-LABEL: tbl2_b:
+; CHECK: mov z1.d, z0.d
+; CHECK-NEXT: tbl z0.b, { z1.b, z2.b }, z3.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv16i8( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbl2_h( %a,  %unused,
+   %b,  %c) {
+; CHECK-LABEL: tbl2_h:
+; CHECK: mov z1.d, z0.d
+; CHECK-NEXT: tbl z0.h, { z1.h, z2.h }, z3.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv8i16( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbl2_s( %a,  %unused,
+   %b,  %c) {
+; CHECK-LABEL: tbl2_s:
+; CHECK: mov z1.d, z0.d
+; CHECK-NEXT: tbl z0.s, { z1.s, z2.s }, z3.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv4i32( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbl2_d( %a,  %unused,
+   %b,  %c) {
+; CHECK-LABEL: tbl2_d:
+; CHECK: mov z1.d, z0.d
+; CHECK-NEXT: tbl z0.d, { z1.d, z2.d }, z3.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv2i64( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbl2_fh( %a,  %unused,
+ %b,  %c) {
+; CHECK-LABEL: tbl2_fh:
+; CHECK: mov z1.d, z0.d
+; CHECK-NEXT: tbl z0.h, { z1.h, z2.h }, z3.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv8f16( %a,
+  %b,
+  %c)
+  ret  %out
+}
+
+define  @tbl2_fs( %a,  %unused,
+  %b,  %c) {
+; CHECK-LABEL: tbl2_fs:
+; CHECK: mov z1.d, z0.d
+; CHECK-NEXT: tbl z0.s, { z1.s, z2.s }, z3.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv4f32( %a,
+   %b,
+   %c)
+  ret  %out
+}
+
+define  @tbl2_fd( %a,  %unused,
+   %b,  %c) {
+; CHECK-LABEL: tbl2_fd:
+; CHECK: mov z1.d, z0.d
+; CHECK-NEXT: tbl z0.d, { z1.d, z2.d }, z3.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv2f64( %a,
+%b,
+%c)
+  ret  %out
+}
+
+;
+; TBX
+;
+
+define  @tbx_b( %a,  %b,  %c) {
+; CHECK-LABEL: tbx_b:
+; CHECK: tbx z0.b, z1.b, z2.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbx.nxv16i8( %a,
+%b,
+%c)
+  ret  %out
+}
+
+define  @tbx_h( %a,  %b,  %c) {
+; CHECK-LABEL: tbx_h:
+; CHECK: tbx z0.h, z1.h, z2.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbx.nxv8i16( %a,
+%b,
+%c)
+  ret  %out
+}
+
+define  @ftbx_h( %a,  %b,  %c) {
+; CHECK-LABEL: ftbx_h:
+; CHECK: tbx z0.h, z1.h, z2.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbx.nxv8f16( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbx_s( %a,  %b,  %c) {
+; CHECK-LABEL: tbx_s:
+; CHECK: tbx z0.s, z1.s, z2.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbx.nxv4i32( %a,
+

[PATCH] D75160: [AArch64][SVE] Add SVE2 intrinsic for xar

2020-02-26 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: andwar, c-rhodes, dancgr, efriedma.
Herald added subscribers: psnobl, rkruppe, hiraditya, kristof.beyls, tschuett.
Herald added a reviewer: rengolin.
Herald added a project: LLVM.

Implements the @llvm.aarch64.sve.xar intrinsic
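
As a rough scalar model of the operation (XAR XORs the two inputs and rotates the result right by the immediate, per element); this sketch only illustrates the semantics and is not code from the patch:

#include <stdint.h>
#include <stdio.h>

/* One 64-bit element of XAR: rotate (a ^ b) right by imm. */
static uint64_t xar64(uint64_t a, uint64_t b, unsigned imm) {
  uint64_t x = a ^ b;
  imm &= 63;                                   /* keep the rotate in range */
  return imm ? (x >> imm) | (x << (64 - imm)) : x;
}

int main(void) {
  /* 0xF0F0 ^ 0x0FF0 = 0xFF00; rotated right by 4 gives 0x0FF0. */
  printf("0x%llx\n", (unsigned long long)xar64(0xF0F0, 0x0FF0, 4));
  return 0;
}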


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D75160

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/sve2-bitwise-ternary.ll

Index: llvm/test/CodeGen/AArch64/sve2-bitwise-ternary.ll
===
--- llvm/test/CodeGen/AArch64/sve2-bitwise-ternary.ll
+++ llvm/test/CodeGen/AArch64/sve2-bitwise-ternary.ll
@@ -258,6 +258,50 @@
   ret  %res
 }
 
+;
+; XAR (vector, bitwise, unpredicated)
+;
+
+define  @xar_b( %a,  %b) {
+; CHECK-LABEL: xar_b:
+; CHECK: xar z0.b, z0.b, z1.b, #1
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.xar.nxv16i8( %a,
+%b,
+   i32 1)
+  ret  %out
+}
+
+define  @xar_h( %a,  %b) {
+; CHECK-LABEL: xar_h:
+; CHECK: xar z0.h, z0.h, z1.h, #2
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.xar.nxv8i16( %a,
+%b,
+   i32 2)
+  ret  %out
+}
+
+define  @xar_s( %a,  %b) {
+; CHECK-LABEL: xar_s:
+; CHECK: xar z0.s, z0.s, z1.s, #3
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.xar.nxv4i32( %a,
+%b,
+   i32 3)
+  ret  %out
+}
+
+define  @xar_d( %a,  %b) {
+; CHECK-LABEL: xar_d:
+; CHECK: xar z0.d, z0.d, z1.d, #4
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.xar.nxv2i64( %a,
+%b,
+   i32 4)
+  ret  %out
+}
+
 declare  @llvm.aarch64.sve.eor3.nxv16i8(,,)
 declare  @llvm.aarch64.sve.eor3.nxv8i16(,,)
 declare  @llvm.aarch64.sve.eor3.nxv4i32(,,)
@@ -282,3 +326,7 @@
 declare  @llvm.aarch64.sve.nbsl.nxv8i16(,,)
 declare  @llvm.aarch64.sve.nbsl.nxv4i32(,,)
 declare  @llvm.aarch64.sve.nbsl.nxv2i64(,,)
+declare  @llvm.aarch64.sve.xar.nxv16i8(, , i32)
+declare  @llvm.aarch64.sve.xar.nxv8i16(, , i32)
+declare  @llvm.aarch64.sve.xar.nxv4i32(, , i32)
+declare  @llvm.aarch64.sve.xar.nxv2i64(, , i32)
Index: llvm/lib/Target/AArch64/SVEInstrFormats.td
===
--- llvm/lib/Target/AArch64/SVEInstrFormats.td
+++ llvm/lib/Target/AArch64/SVEInstrFormats.td
@@ -3878,7 +3878,7 @@
   let ElementSize = ElementSizeNone;
 }
 
-multiclass sve2_int_rotate_right_imm {
+multiclass sve2_int_rotate_right_imm {
   def _B : sve2_int_rotate_right_imm<{0,0,0,1}, asm, ZPR8, vecshiftR8>;
   def _H : sve2_int_rotate_right_imm<{0,0,1,?}, asm, ZPR16, vecshiftR16> {
 let Inst{19} = imm{3};
@@ -3890,6 +3890,10 @@
 let Inst{22}= imm{5};
 let Inst{20-19} = imm{4-3};
   }
+  def : SVE_3_Op_Imm_Pat(NAME # _B)>;
+  def : SVE_3_Op_Imm_Pat(NAME # _H)>;
+  def : SVE_3_Op_Imm_Pat(NAME # _S)>;
+  def : SVE_3_Op_Imm_Pat(NAME # _D)>;
 }
 
 //===--===//
Index: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
===
--- llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -1862,7 +1862,7 @@
   defm NBSL_  : sve2_int_bitwise_ternary_op<0b111, "nbsl",  int_aarch64_sve_nbsl>;
 
   // SVE2 bitwise xor and rotate right by immediate
-  defm XAR_ZZZI : sve2_int_rotate_right_imm<"xar">;
+  defm XAR_ZZZI : sve2_int_rotate_right_imm<"xar", int_aarch64_sve_xar>;
 
   // SVE2 extract vector (immediate offset, constructive)
   def EXT_ZZI_B : sve2_int_perm_extract_i_cons<"ext">;
Index: llvm/include/llvm/IR/IntrinsicsAArch64.td
===
--- llvm/include/llvm/IR/IntrinsicsAArch64.td
+++ llvm/include/llvm/IR/IntrinsicsAArch64.td
@@ -2021,13 +2021,16 @@
 def int_aarch64_sve_pmullb_pair : AdvSIMD_2VectorArg_Intrinsic;
 def int_aarch64_sve_pmullt_pair : AdvSIMD_2VectorArg_Intrinsic;
 
+//
 // SVE2 bitwise ternary operations.
+//
 def int_aarch64_sve_eor3   : AdvSIMD_3VectorArg_Intrinsic;
 def int_aarch64_sve_bcax   : AdvSIMD_3VectorArg_Intrinsic;
 def int_aarch64_sve_bsl: AdvSIMD_3VectorArg_Intrinsic;
 def int_aarch64_sve_bsl1n  : AdvSIMD_3VectorArg_Intrinsic;
 def int_aarch64_sve_bsl2n  : AdvSIMD_3VectorArg_Intrinsic;
 def int_aarch64_sve_nbsl   : AdvSIMD_3VectorArg_Intrinsic;
+def int_aarch64_sve_xar: AdvSIMD_2VectorArgIndexed_Intrinsic;
 
 //
 // SVE2 - Optional AES, 

[PATCH] D74912: [AArch64][SVE] Add SVE2 intrinsics for bit permutation & table lookup

2020-02-25 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 246487.
kmclaughlin added a comment.

Addressed review comments:

- Removed SelectTableSVE2 from AArch64ISelDAGToDAG.cpp and added tablegen 
patterns for the tbl2 intrinsic
- Updated tests to use operands that are not consecutive to ensure that the 
result is still two consecutive registers


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74912/new/

https://reviews.llvm.org/D74912

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/sve2-intrinsics-bit-permutation.ll
  llvm/test/CodeGen/AArch64/sve2-intrinsics-perm-tb.ll

Index: llvm/test/CodeGen/AArch64/sve2-intrinsics-perm-tb.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve2-intrinsics-perm-tb.ll
@@ -0,0 +1,181 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+
+;
+; TBL2
+;
+
+define  @tbl2_b( %a,  %unused,
+   %b,  %c) {
+; CHECK-LABEL: tbl2_b:
+; CHECK: mov z1.d, z0.d
+; CHECK-NEXT: tbl z0.b, { z1.b, z2.b }, z3.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv16i8( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbl2_h( %a,  %unused,
+   %b,  %c) {
+; CHECK-LABEL: tbl2_h:
+; CHECK: mov z1.d, z0.d
+; CHECK-NEXT: tbl z0.h, { z1.h, z2.h }, z3.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv8i16( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbl2_s( %a,  %unused,
+   %b,  %c) {
+; CHECK-LABEL: tbl2_s:
+; CHECK: mov z1.d, z0.d
+; CHECK-NEXT: tbl z0.s, { z1.s, z2.s }, z3.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv4i32( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbl2_d( %a,  %unused,
+   %b,  %c) {
+; CHECK-LABEL: tbl2_d:
+; CHECK: mov z1.d, z0.d
+; CHECK-NEXT: tbl z0.d, { z1.d, z2.d }, z3.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv2i64( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbl2_fh( %a,  %unused,
+ %b,  %c) {
+; CHECK-LABEL: tbl2_fh:
+; CHECK: mov z1.d, z0.d
+; CHECK-NEXT: tbl z0.h, { z1.h, z2.h }, z3.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv8f16( %a,
+  %b,
+  %c)
+  ret  %out
+}
+
+define  @tbl2_fs( %a,  %unused,
+  %b,  %c) {
+; CHECK-LABEL: tbl2_fs:
+; CHECK: z1.d, z0.d
+; CHECK-NEXT: tbl z0.s, { z1.s, z2.s }, z3.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv4f32( %a,
+   %b,
+   %c)
+  ret  %out
+}
+
+define  @tbl2_fd( %a,  %unused,
+   %b,  %c) {
+; CHECK-LABEL: tbl2_fd:
+; CHECK: mov z1.d, z0.d
+; CHECK-NEXT: tbl z0.d, { z1.d, z2.d }, z3.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv2f64( %a,
+%b,
+%c)
+  ret  %out
+}
+
+;
+; TBX
+;
+
+define  @tbx_b( %a,  %b,  %c) {
+; CHECK-LABEL: tbx_b:
+; CHECK: tbx z0.b, z1.b, z2.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbx.nxv16i8( %a,
+%b,
+%c)
+  ret  %out
+}
+
+define  @tbx_h( %a,  %b,  %c) {
+; CHECK-LABEL: tbx_h:
+; CHECK: tbx z0.h, z1.h, z2.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbx.nxv8i16( %a,
+%b,
+%c)
+  ret  %out
+}
+
+define  @ftbx_h( %a,  %b,  %c) {
+; CHECK-LABEL: ftbx_h:
+; CHECK: tbx z0.h, z1.h, z2.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbx.nxv8f16( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbx_s( %a,  %b,  %c) {
+; CHECK-LABEL: tbx_s:
+; CHECK: tbx z0.s, z1.s, z2.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbx.nxv4i32( %a,
+ 

[PATCH] D74734: [AArch64][SVE] Add the SVE dupq_lane intrinsic

2020-02-24 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
kmclaughlin marked an inline comment as done.
Closed by commit rGf87f23c81cae: [AArch64][SVE] Add the SVE dupq_lane intrinsic 
(authored by kmclaughlin).

Changed prior to commit:
  https://reviews.llvm.org/D74734?vs=245012&id=246194#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74734/new/

https://reviews.llvm.org/D74734

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll

Index: llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
===
--- llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
+++ llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
@@ -297,6 +297,179 @@
 }
 
 ;
+; DUPQ
+;
+
+define  @dupq_i8( %a) {
+; CHECK-LABEL: dupq_i8:
+; CHECK: mov z0.q, q0
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.dupq.lane.nxv16i8( %a, i64 0)
+  ret  %out
+}
+
+define  @dupq_i16( %a) {
+; CHECK-LABEL: dupq_i16:
+; CHECK: mov z0.q, z0.q[1]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.dupq.lane.nxv8i16( %a, i64 1)
+  ret  %out
+}
+
+define  @dupq_i32( %a) {
+; CHECK-LABEL: dupq_i32:
+; CHECK: mov z0.q, z0.q[2]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.dupq.lane.nxv4i32( %a, i64 2)
+  ret  %out
+}
+
+define  @dupq_i64( %a) {
+; CHECK-LABEL: dupq_i64:
+; CHECK: mov z0.q, z0.q[3]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.dupq.lane.nxv2i64( %a, i64 3)
+  ret  %out
+}
+
+define  @dupq_f16( %a) {
+; CHECK-LABEL: dupq_f16:
+; CHECK: mov z0.q, q0
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.dupq.lane.nxv8f16( %a, i64 0)
+  ret  %out
+}
+
+define  @dupq_f32( %a) {
+; CHECK-LABEL: dupq_f32:
+; CHECK: mov z0.q, z0.q[1]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.dupq.lane.nxv4f32( %a, i64 1)
+  ret  %out
+}
+
+define  @dupq_f64( %a) {
+; CHECK-LABEL: dupq_f64:
+; CHECK: mov z0.q, z0.q[2]
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.dupq.lane.nxv2f64( %a, i64 2)
+  ret  %out
+}
+
+;
+; DUPQ_LANE
+;
+
+define  @dupq_lane_i8( %a, i64 %idx) {
+; CHECK-LABEL: dupq_lane_i8:
+; CHECK-DAG:  index [[Z1:z[0-9]+]].d, #0, #1
+; CHECK-DAG:  and   [[Z2:z[0-9]+]].d, [[Z1]].d, #0x1
+; CHECK-DAG:  add   [[X1:x[0-9]+]], x0, x0
+; CHECK-DAG:  mov   [[Z3:z[0-9]+]].d, [[X1]]
+; CHECK:  add   [[Z4:z[0-9]+]].d, [[Z2]].d, [[Z3]].d
+; CHECK-NEXT: tbl   z0.d, { z0.d }, [[Z4]].d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.dupq.lane.nxv16i8( %a, i64 %idx)
+  ret  %out
+}
+
+; NOTE: Identical operation to dupq_lane_i8 (i.e. element type is irrelevant).
+define  @dupq_lane_i16( %a, i64 %idx) {
+; CHECK-LABEL: dupq_lane_i16:
+; CHECK-DAG:  index [[Z1:z[0-9]+]].d, #0, #1
+; CHECK-DAG:  and   [[Z2:z[0-9]+]].d, [[Z1]].d, #0x1
+; CHECK-DAG:  add   [[X1:x[0-9]+]], x0, x0
+; CHECK-DAG:  mov   [[Z3:z[0-9]+]].d, [[X1]]
+; CHECK:  add   [[Z4:z[0-9]+]].d, [[Z2]].d, [[Z3]].d
+; CHECK: tbl z0.d, { z0.d }, [[Z4]].d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.dupq.lane.nxv8i16( %a, i64 %idx)
+  ret  %out
+}
+
+; NOTE: Identical operation to dupq_lane_i8 (i.e. element type is irrelevant).
+define  @dupq_lane_i32( %a, i64 %idx) {
+; CHECK-LABEL: dupq_lane_i32:
+; CHECK-DAG:  index [[Z1:z[0-9]+]].d, #0, #1
+; CHECK-DAG:  and   [[Z2:z[0-9]+]].d, [[Z1]].d, #0x1
+; CHECK-DAG:  add   [[X1:x[0-9]+]], x0, x0
+; CHECK-DAG:  mov   [[Z3:z[0-9]+]].d, [[X1]]
+; CHECK:  add   [[Z4:z[0-9]+]].d, [[Z2]].d, [[Z3]].d
+; CHECK: tbl z0.d, { z0.d }, [[Z4]].d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.dupq.lane.nxv4i32( %a, i64 %idx)
+  ret  %out
+}
+
+; NOTE: Identical operation to dupq_lane_i8 (i.e. element type is irrelevant).
+define  @dupq_lane_i64( %a, i64 %idx) {
+; CHECK-LABEL: dupq_lane_i64:
+; CHECK-DAG:  index [[Z1:z[0-9]+]].d, #0, #1
+; CHECK-DAG:  and   [[Z2:z[0-9]+]].d, [[Z1]].d, #0x1
+; CHECK-DAG:  add   [[X1:x[0-9]+]], x0, x0
+; CHECK-DAG:  mov   [[Z3:z[0-9]+]].d, [[X1]]
+; CHECK:  add   [[Z4:z[0-9]+]].d, [[Z2]].d, [[Z3]].d
+; CHECK: tbl z0.d, { z0.d }, [[Z4]].d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.dupq.lane.nxv2i64( %a, i64 %idx)
+  ret  %out
+}
+
+; NOTE: Identical operation to dupq_lane_i8 (i.e. element type is irrelevant).
+define  @dupq_lane_f16( %a, i64 %idx) {
+; CHECK-LABEL: dupq_lane_f16:
+; CHECK-DAG:  index [[Z1:z[0-9]+]].d, #0, #1
+; CHECK-DAG:  and   [[Z2:z[0-9]+]].d, [[Z1]].d, #0x1
+; CHECK-DAG:  add   [[X1:x[0-9]+]], x0, x0
+; CHECK-DAG:  mov   [[Z3:z[0-9]+]].d, [[X1]]
+; CHECK:  add   [[Z4:z[0-9]+]].d, [[Z2]].d, [[Z3]].d
+; CHECK: tbl z0.d, { z0.d }, [[Z4]].d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.dupq.lane.nxv8f16( %a, i64 %idx)
+  ret  %out
+}
+
+; NOTE: Identical operation to dupq_lane_i8 (i.e. element type is irrelevant).
+define  @dupq_lane_f32( %a, i64 %idx) {
+; 

[PATCH] D74734: [AArch64][SVE] Add the SVE dupq_lane intrinsic

2020-02-24 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin marked 4 inline comments as done.
kmclaughlin added a comment.

Thanks for taking a look at this, @sdesmalen!




Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:7496
+  auto CIdx = dyn_cast(Idx128);
+  if (CIdx && (CIdx->getZExtValue() <= 3)) {
+auto CI = DAG.getTargetConstant(CIdx->getZExtValue(), DL, MVT::i64);

sdesmalen wrote:
> nit: can you replace `auto` in these cases with SDValue? (which I think this 
> is?)
Replaced other cases of auto here with SDValue or SDNode


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74734/new/

https://reviews.llvm.org/D74734





[PATCH] D74833: [AArch64][SVE] Add intrinsics for SVE2 cryptographic instructions

2020-02-24 Thread Kerry McLaughlin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGf2ff153401fa: [AArch64][SVE] Add intrinsics for SVE2 
cryptographic instructions (authored by kmclaughlin).

Changed prior to commit:
  https://reviews.llvm.org/D74833?vs=245393&id=246171#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74833/new/

https://reviews.llvm.org/D74833

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/sve2-intrinsics-crypto.ll

Index: llvm/test/CodeGen/AArch64/sve2-intrinsics-crypto.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve2-intrinsics-crypto.ll
@@ -0,0 +1,99 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2-aes,+sve2-sha3,+sve2-sm4 -asm-verbose=0 < %s | FileCheck %s
+
+;
+; AESD
+;
+
+define  @aesd_i8( %a,  %b) {
+; CHECK-LABEL: aesd_i8:
+; CHECK: aesd z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.aesd( %a,
+ %b)
+  ret  %out
+}
+
+;
+; AESIMC
+;
+
+define  @aesimc_i8( %a) {
+; CHECK-LABEL: aesimc_i8:
+; CHECK: aesimc z0.b, z0.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.aesimc( %a)
+  ret  %out
+}
+
+;
+; AESE
+;
+
+define  @aese_i8( %a,  %b) {
+; CHECK-LABEL: aese_i8:
+; CHECK: aese z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.aese( %a,
+ %b)
+  ret  %out
+}
+
+;
+; AESMC
+;
+
+define  @aesmc_i8( %a) {
+; CHECK-LABEL: aesmc_i8:
+; CHECK: aesmc z0.b, z0.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.aesmc( %a)
+  ret  %out
+}
+
+;
+; RAX1
+;
+
+define  @rax1_i64( %a,  %b) {
+; CHECK-LABEL: rax1_i64:
+; CHECK: rax1 z0.d, z0.d, z1.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.rax1( %a,
+ %b)
+  ret  %out
+}
+
+;
+; SM4E
+;
+
+define  @sm4e_i32( %a,  %b) {
+; CHECK-LABEL: sm4e_i32:
+; CHECK: sm4e z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sm4e( %a,
+ %b)
+  ret  %out
+}
+
+;
+; SM4EKEY
+;
+
+define  @sm4ekey_i32( %a,  %b) {
+; CHECK-LABEL: sm4ekey_i32:
+; CHECK: sm4ekey z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.sm4ekey( %a,
+%b)
+  ret  %out
+}
+
+
+declare  @llvm.aarch64.sve.aesd(, )
+declare  @llvm.aarch64.sve.aesimc()
+declare  @llvm.aarch64.sve.aese(, )
+declare  @llvm.aarch64.sve.aesmc()
+declare  @llvm.aarch64.sve.rax1(, )
+declare  @llvm.aarch64.sve.sm4e(, )
+declare  @llvm.aarch64.sve.sm4ekey(, )
Index: llvm/lib/Target/AArch64/SVEInstrFormats.td
===
--- llvm/lib/Target/AArch64/SVEInstrFormats.td
+++ llvm/lib/Target/AArch64/SVEInstrFormats.td
@@ -7101,6 +7101,12 @@
   let Inst{4-0}   = Zd;
 }
 
+multiclass sve2_crypto_cons_bin_op {
+  def NAME : sve2_crypto_cons_bin_op;
+  def : SVE_2_Op_Pat(NAME)>;
+}
+
 class sve2_crypto_des_bin_op opc, string asm, ZPRRegOp zprty>
 : I<(outs zprty:$Zdn), (ins zprty:$_Zdn, zprty:$Zm),
   asm, "\t$Zdn, $_Zdn, $Zm",
@@ -7118,8 +7124,14 @@
   let Constraints = "$Zdn = $_Zdn";
 }
 
-class sve2_crypto_unary_op
-: I<(outs ZPR8:$Zdn), (ins ZPR8:$_Zdn),
+multiclass sve2_crypto_des_bin_op opc, string asm, ZPRRegOp zprty,
+  SDPatternOperator op, ValueType vt> {
+  def NAME : sve2_crypto_des_bin_op;
+  def : SVE_2_Op_Pat(NAME)>;
+}
+
+class sve2_crypto_unary_op
+: I<(outs zprty:$Zdn), (ins zprty:$_Zdn),
   asm, "\t$Zdn, $_Zdn",
   "",
   []>, Sched<[]> {
@@ -7132,6 +7144,11 @@
   let Constraints = "$Zdn = $_Zdn";
 }
 
+multiclass sve2_crypto_unary_op {
+  def NAME : sve2_crypto_unary_op;
+  def : SVE_1_Op_Pat(NAME)>;
+}
+
 /// Addressing modes
 def am_sve_indexed_s4 :ComplexPattern", [], [SDNPWantRoot]>;
 def am_sve_indexed_s6 :ComplexPattern", [], [SDNPWantRoot]>;
Index: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
===
--- llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -1917,12 +1917,12 @@
 
 let Predicates = [HasSVE2AES] in {
   // SVE2 crypto destructive binary operations
-  def AESE_ZZZ_B : sve2_crypto_des_bin_op<0b00, "aese", ZPR8>;
-  def AESD_ZZZ_B : sve2_crypto_des_bin_op<0b01, "aesd", ZPR8>;
+  defm AESE_ZZZ_B : sve2_crypto_des_bin_op<0b00, "aese", ZPR8, int_aarch64_sve_aese, nxv16i8>;
+  defm AESD_ZZZ_B : sve2_crypto_des_bin_op<0b01, "aesd", ZPR8, int_aarch64_sve_aesd, nxv16i8>;
 
   // SVE2 crypto unary operations
-  def AESMC_ZZ_B  : sve2_crypto_unary_op<0b0, "aesmc">;
-  def AESIMC_ZZ_B : sve2_crypto_unary_op<0b1, "aesimc">;
+  defm AESMC_ZZ_B  : sve2_crypto_unary_op<0b0, 

[PATCH] D74912: [AArch64][SVE] Add SVE2 intrinsics for bit permutation & table lookup

2020-02-21 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin marked 4 inline comments as done.
kmclaughlin added a comment.

Thanks for reviewing this, @andwar!




Comment at: llvm/include/llvm/IR/IntrinsicsAArch64.td:2035
+
+def int_aarch64_sve_bdep_x : AdvSIMD_2VectorArg_Intrinsic;
+def int_aarch64_sve_bext_x : AdvSIMD_2VectorArg_Intrinsic;

andwar wrote:
> What does `_x` mean here?
_x indicates that this is an unpredicated intrinsic.



Comment at: llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:3592
+  if (VT == MVT::nxv16i8) {
+SelectTableSVE2(Node, 2, AArch64::TBL__B);
+return;

andwar wrote:
> `NumVecs` seems to be always 2 in this patch. Will we need this to work for
> other values in the future too?
> 
> [Nit] `2` is a bit of a magic number here. What about `2` -> `/*NumVecs=*/2`
I agree that it's not very clear what 2 is used for here. As NumVecs will 
always be the same value for the tbl2 intrinsic and SelectTableSVE2 is unlikely 
to be used for anything else, I've removed it from the list of parameters & 
added a comment there to explain the value used.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74912/new/

https://reviews.llvm.org/D74912





[PATCH] D74912: [AArch64][SVE] Add SVE2 intrinsics for bit permutation & table lookup

2020-02-21 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin updated this revision to Diff 245835.
kmclaughlin added a comment.

- Removed NumVecs parameter from SelectTableSVE2 as the value is always the 
same (2)
- Removed unnecessary -asm-verbose=0 from the RUN line of 
sve2-intrinsics-bit-permutation.ll


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74912/new/

https://reviews.llvm.org/D74912

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/sve2-intrinsics-bit-permutation.ll
  llvm/test/CodeGen/AArch64/sve2-intrinsics-perm-tb.ll

Index: llvm/test/CodeGen/AArch64/sve2-intrinsics-perm-tb.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve2-intrinsics-perm-tb.ll
@@ -0,0 +1,167 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+
+;
+; TBL2
+;
+
+define  @tbl2_b( %a,  %b,  %c) {
+; CHECK-LABEL: tbl2_b:
+; CHECK: tbl z0.b, { z0.b, z1.b }, z2.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv16i8( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbl2_h( %a,  %b,  %c) {
+; CHECK-LABEL: tbl2_h:
+; CHECK: tbl z0.h, { z0.h, z1.h }, z2.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv8i16( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbl2_s( %a,  %b,  %c) {
+; CHECK-LABEL: tbl2_s:
+; CHECK: tbl z0.s, { z0.s, z1.s }, z2.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv4i32( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbl2_d( %a,  %b,  %c) {
+; CHECK-LABEL: tbl2_d:
+; CHECK: tbl z0.d, { z0.d, z1.d }, z2.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv2i64( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbl2_fh( %a,  %b,  %c) {
+; CHECK-LABEL: tbl2_fh:
+; CHECK: tbl z0.h, { z0.h, z1.h }, z2.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv8f16( %a,
+  %b,
+  %c)
+  ret  %out
+}
+
+define  @tbl2_fs( %a,  %b,  %c) {
+; CHECK-LABEL: tbl2_fs:
+; CHECK: tbl z0.s, { z0.s, z1.s }, z2.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv4f32( %a,
+   %b,
+   %c)
+  ret  %out
+}
+
+define  @tbl2_fd( %a,  %b,  %c) {
+; CHECK-LABEL: tbl2_fd:
+; CHECK: tbl z0.d, { z0.d, z1.d }, z2.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv2f64( %a,
+%b,
+%c)
+  ret  %out
+}
+
+;
+; TBX
+;
+
+define  @tbx_b( %a,  %b,  %c) {
+; CHECK-LABEL: tbx_b:
+; CHECK: tbx z0.b, z1.b, z2.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbx.nxv16i8( %a,
+%b,
+%c)
+  ret  %out
+}
+
+define  @tbx_h( %a,  %b,  %c) {
+; CHECK-LABEL: tbx_h:
+; CHECK: tbx z0.h, z1.h, z2.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbx.nxv8i16( %a,
+%b,
+%c)
+  ret  %out
+}
+
+define  @ftbx_h( %a,  %b,  %c) {
+; CHECK-LABEL: ftbx_h:
+; CHECK: tbx z0.h, z1.h, z2.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbx.nxv8f16( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbx_s( %a,  %b,  %c) {
+; CHECK-LABEL: tbx_s:
+; CHECK: tbx z0.s, z1.s, z2.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbx.nxv4i32( %a,
+%b,
+%c)
+  ret  %out
+}
+
+define  @ftbx_s( %a,  %b,  %c) {
+; CHECK-LABEL: ftbx_s:
+; CHECK: tbx z0.s, z1.s, z2.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbx.nxv4f32( %a,
+  %b,
+  %c)
+  ret  %out
+}
+
+define  @tbx_d( %a,  %b,  %c) {
+; CHECK-LABEL: tbx_d:
+; CHECK: tbx z0.d, z1.d, z2.d
+; CHECK-NEXT: ret
+  

[PATCH] D74912: [AArch64][SVE] Add SVE2 intrinsics for bit permutation & table lookup

2020-02-20 Thread Kerry McLaughlin via Phabricator via cfe-commits
kmclaughlin created this revision.
kmclaughlin added reviewers: sdesmalen, andwar, dancgr, cameron.mcinally, 
efriedma.
Herald added subscribers: psnobl, rkruppe, hiraditya, kristof.beyls, tschuett.
Herald added a reviewer: rengolin.
Herald added a project: LLVM.

Implements the following intrinsics:

- @llvm.aarch64.sve.bdep.x
- @llvm.aarch64.sve.bext.x
- @llvm.aarch64.sve.bgrp.x
- @llvm.aarch64.sve.tbl2
- @llvm.aarch64.sve.tbx

The SelectTableSVE2 function in this patch is used to select the TBL2
intrinsic & ensures that the vector registers allocated are consecutive.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D74912

Files:
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
  llvm/lib/Target/AArch64/SVEInstrFormats.td
  llvm/test/CodeGen/AArch64/sve2-intrinsics-bit-permutation.ll
  llvm/test/CodeGen/AArch64/sve2-intrinsics-perm-tb.ll

Index: llvm/test/CodeGen/AArch64/sve2-intrinsics-perm-tb.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve2-intrinsics-perm-tb.ll
@@ -0,0 +1,167 @@
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+
+;
+; TBL2
+;
+
+define  @tbl2_b( %a,  %b,  %c) {
+; CHECK-LABEL: tbl2_b:
+; CHECK: tbl z0.b, { z0.b, z1.b }, z2.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv16i8( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbl2_h( %a,  %b,  %c) {
+; CHECK-LABEL: tbl2_h:
+; CHECK: tbl z0.h, { z0.h, z1.h }, z2.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv8i16( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbl2_s( %a,  %b,  %c) {
+; CHECK-LABEL: tbl2_s:
+; CHECK: tbl z0.s, { z0.s, z1.s }, z2.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv4i32( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbl2_d( %a,  %b,  %c) {
+; CHECK-LABEL: tbl2_d:
+; CHECK: tbl z0.d, { z0.d, z1.d }, z2.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv2i64( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbl2_fh( %a,  %b,  %c) {
+; CHECK-LABEL: tbl2_fh:
+; CHECK: tbl z0.h, { z0.h, z1.h }, z2.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv8f16( %a,
+  %b,
+  %c)
+  ret  %out
+}
+
+define  @tbl2_fs( %a,  %b,  %c) {
+; CHECK-LABEL: tbl2_fs:
+; CHECK: tbl z0.s, { z0.s, z1.s }, z2.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv4f32( %a,
+   %b,
+   %c)
+  ret  %out
+}
+
+define  @tbl2_fd( %a,  %b,  %c) {
+; CHECK-LABEL: tbl2_fd:
+; CHECK: tbl z0.d, { z0.d, z1.d }, z2.d
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbl2.nxv2f64( %a,
+%b,
+%c)
+  ret  %out
+}
+
+;
+; TBX
+;
+
+define  @tbx_b( %a,  %b,  %c) {
+; CHECK-LABEL: tbx_b:
+; CHECK: tbx z0.b, z1.b, z2.b
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbx.nxv16i8( %a,
+%b,
+%c)
+  ret  %out
+}
+
+define  @tbx_h( %a,  %b,  %c) {
+; CHECK-LABEL: tbx_h:
+; CHECK: tbx z0.h, z1.h, z2.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbx.nxv8i16( %a,
+%b,
+%c)
+  ret  %out
+}
+
+define  @ftbx_h( %a,  %b,  %c) {
+; CHECK-LABEL: ftbx_h:
+; CHECK: tbx z0.h, z1.h, z2.h
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbx.nxv8f16( %a,
+ %b,
+ %c)
+  ret  %out
+}
+
+define  @tbx_s( %a,  %b,  %c) {
+; CHECK-LABEL: tbx_s:
+; CHECK: tbx z0.s, z1.s, z2.s
+; CHECK-NEXT: ret
+  %out = call  @llvm.aarch64.sve.tbx.nxv4i32( %a,
+%b,
+%c)
+  ret  %out
+}
+
+define  @ftbx_s( %a,  %b,  %c) {
+; CHECK-LABEL: ftbx_s:
+; CHECK: tbx z0.s, z1.s, z2.s
+; CHECK-NEXT: ret
+  %out = call  
