[PATCH] D53633: [AArch64] Implement FP16FML intrinsics

2019-04-02 Thread Sjoerd Meijer via Phabricator via cfe-commits
SjoerdMeijer added a comment.

FYI: a new ACLE version has been published, please find it here:   
https://developer.arm.com/architectures/system-architectures/software-standards/acle

The "Neon Intrinsics" section contains these new intrinsics.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D53633/new/

https://reviews.llvm.org/D53633



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53633: [AArch64] Implement FP16FML intrinsics

2019-02-15 Thread Ahmed Bougacha via Phabricator via cfe-commits
ab added inline comments.



Comment at: cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c:12
+
+float32x2_t test_vfmlal_low_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
+// CHECK-LABEL: define <2 x float> @test_vfmlal_low_u32(<2 x float> %a, <4 x 
half> %b, <4 x half> %c)

SjoerdMeijer wrote:
> SjoerdMeijer wrote:
> > ab wrote:
> > > Hey folks, I'm curious: where does the "_u32" suffix come from? Should it 
> > > be _f16?
> > > 
> > > Also, are there any new ACLE/intrinsic list documents? As far as I can 
> > > tell there hasn't been any release since IHI0073B/IHI0053D.
> > > Also, are there any new ACLE/intrinsic list documents? As far as I can 
> > > tell there hasn't been any release since IHI0073B/IHI0053D.
> > 
> > I've checked, and an updated ACLE that includes these FP16FML intrinsics is 
> > coming soon.
> > 
> > > where does the "_u32" suffix come from? Should it be _f16?
> > 
> > Good question. It could probably be _f32 or _f16, but _u32 doesn't seem to 
> > make much sense. Looks like the spec says _u32, and that's also what GCC 
> > has implemented. I think we want to update the spec and fix the name before 
> > the updated spec is available. Will chase this, and let you know once I 
> > know more.
> An update on this: we should change this to _f32 (because the first suffixes 
> were refering to the ouput type). The ACLE will be updated accordingly, and 
> also GCC will change its current implementation (from _u32 to _f32).  Many 
> thanks for raising this issue.
> Is there a volunteer to prepare a patch? Or do you have one already? :-) I 
> could look at it, but that will be towards the end of next week.
> I've checked, and an updated ACLE that includes these FP16FML intrinsics is 
> coming soon.

Great, thanks!

> An update on this: we should change this to _f32 (because the first suffixes 
> were refering to the ouput type).

Hmm, I was thinking _f16 based on the vmlal intrinsics: they seem to be named 
after the multiplication type rather than that of the accumulator/output.

Either way seems fine to me though, I'll defer to you folks.

> The ACLE will be updated accordingly, and also GCC will change its current 
> implementation (from _u32 to _f32). Many thanks for raising this issue.
Is there a volunteer to prepare a patch? Or do you have one already? :-) I 
could look at it, but that will be towards the end of next week.

Sure: D58306 (with _f16 though, let me know what you think of vmlal)

Thanks for checking!


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D53633/new/

https://reviews.llvm.org/D53633



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53633: [AArch64] Implement FP16FML intrinsics

2019-02-15 Thread Sjoerd Meijer via Phabricator via cfe-commits
SjoerdMeijer added inline comments.



Comment at: cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c:12
+
+float32x2_t test_vfmlal_low_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
+// CHECK-LABEL: define <2 x float> @test_vfmlal_low_u32(<2 x float> %a, <4 x 
half> %b, <4 x half> %c)

SjoerdMeijer wrote:
> ab wrote:
> > Hey folks, I'm curious: where does the "_u32" suffix come from? Should it 
> > be _f16?
> > 
> > Also, are there any new ACLE/intrinsic list documents? As far as I can tell 
> > there hasn't been any release since IHI0073B/IHI0053D.
> > Also, are there any new ACLE/intrinsic list documents? As far as I can tell 
> > there hasn't been any release since IHI0073B/IHI0053D.
> 
> I've checked, and an updated ACLE that includes these FP16FML intrinsics is 
> coming soon.
> 
> > where does the "_u32" suffix come from? Should it be _f16?
> 
> Good question. It could probably be _f32 or _f16, but _u32 doesn't seem to 
> make much sense. Looks like the spec says _u32, and that's also what GCC has 
> implemented. I think we want to update the spec and fix the name before the 
> updated spec is available. Will chase this, and let you know once I know more.
An update on this: we should change this to _f32 (because the first suffixes 
were refering to the ouput type). The ACLE will be updated accordingly, and 
also GCC will change its current implementation (from _u32 to _f32).  Many 
thanks for raising this issue.
Is there a volunteer to prepare a patch? Or do you have one already? :-) I 
could look at it, but that will be towards the end of next week.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D53633/new/

https://reviews.llvm.org/D53633



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53633: [AArch64] Implement FP16FML intrinsics

2019-02-15 Thread Sjoerd Meijer via Phabricator via cfe-commits
SjoerdMeijer added inline comments.



Comment at: cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c:12
+
+float32x2_t test_vfmlal_low_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
+// CHECK-LABEL: define <2 x float> @test_vfmlal_low_u32(<2 x float> %a, <4 x 
half> %b, <4 x half> %c)

ab wrote:
> Hey folks, I'm curious: where does the "_u32" suffix come from? Should it be 
> _f16?
> 
> Also, are there any new ACLE/intrinsic list documents? As far as I can tell 
> there hasn't been any release since IHI0073B/IHI0053D.
> Also, are there any new ACLE/intrinsic list documents? As far as I can tell 
> there hasn't been any release since IHI0073B/IHI0053D.

I've checked, and an updated ACLE that includes these FP16FML intrinsics is 
coming soon.

> where does the "_u32" suffix come from? Should it be _f16?

Good question. It could probably be _f32 or _f16, but _u32 doesn't seem to make 
much sense. Looks like the spec says _u32, and that's also what GCC has 
implemented. I think we want to update the spec and fix the name before the 
updated spec is available. Will chase this, and let you know once I know more.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D53633/new/

https://reviews.llvm.org/D53633



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53633: [AArch64] Implement FP16FML intrinsics

2019-02-14 Thread Ahmed Bougacha via Phabricator via cfe-commits
ab added inline comments.
Herald added a subscriber: jdoerfert.
Herald added a project: LLVM.



Comment at: cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c:12
+
+float32x2_t test_vfmlal_low_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
+// CHECK-LABEL: define <2 x float> @test_vfmlal_low_u32(<2 x float> %a, <4 x 
half> %b, <4 x half> %c)

Hey folks, I'm curious: where does the "_u32" suffix come from? Should it be 
_f16?

Also, are there any new ACLE/intrinsic list documents? As far as I can tell 
there hasn't been any release since IHI0073B/IHI0053D.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D53633/new/

https://reviews.llvm.org/D53633



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53633: [AArch64] Implement FP16FML intrinsics

2018-10-25 Thread Bryan Chan via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL345344: [AArch64] Implement FP16FML intrinsics (authored by 
bryanpkc, committed by ).
Herald added a subscriber: llvm-commits.

Changed prior to commit:
  https://reviews.llvm.org/D53633?vs=170811=171230#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D53633

Files:
  cfe/trunk/include/clang/Basic/arm_neon.td
  cfe/trunk/include/clang/Basic/arm_neon_incl.td
  cfe/trunk/lib/Basic/Targets/AArch64.cpp
  cfe/trunk/lib/Basic/Targets/AArch64.h
  cfe/trunk/lib/CodeGen/CGBuiltin.cpp
  cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c
  cfe/trunk/test/Preprocessor/aarch64-target-features.c
  cfe/trunk/utils/TableGen/NeonEmitter.cpp

Index: cfe/trunk/utils/TableGen/NeonEmitter.cpp
===
--- cfe/trunk/utils/TableGen/NeonEmitter.cpp
+++ cfe/trunk/utils/TableGen/NeonEmitter.cpp
@@ -494,6 +494,7 @@
 std::pair emitDagSaveTemp(DagInit *DI);
 std::pair emitDagSplat(DagInit *DI);
 std::pair emitDagDup(DagInit *DI);
+std::pair emitDagDupTyped(DagInit *DI);
 std::pair emitDagShuffle(DagInit *DI);
 std::pair emitDagCast(DagInit *DI, bool IsBitCast);
 std::pair emitDagCall(DagInit *DI);
@@ -897,6 +898,18 @@
 Float = true;
 ElementBitwidth = 16;
 break;
+  case '0':
+Float = true;
+if (AppliedQuad)
+  Bitwidth /= 2;
+ElementBitwidth = 16;
+break;
+  case '1':
+Float = true;
+if (!AppliedQuad)
+  Bitwidth *= 2;
+ElementBitwidth = 16;
+break;
   case 'g':
 if (AppliedQuad)
   Bitwidth /= 2;
@@ -1507,6 +1520,8 @@
 return emitDagShuffle(DI);
   if (Op == "dup")
 return emitDagDup(DI);
+  if (Op == "dup_typed")
+return emitDagDupTyped(DI);
   if (Op == "splat")
 return emitDagSplat(DI);
   if (Op == "save_temp")
@@ -1771,6 +1786,28 @@
   return std::make_pair(T, S);
 }
 
+std::pair Intrinsic::DagEmitter::emitDagDupTyped(DagInit *DI) {
+  assert_with_loc(DI->getNumArgs() == 2, "dup_typed() expects two arguments");
+  std::pair A = emitDagArg(DI->getArg(0),
+  DI->getArgNameStr(0));
+  std::pair B = emitDagArg(DI->getArg(1),
+  DI->getArgNameStr(1));
+  assert_with_loc(B.first.isScalar(),
+  "dup_typed() requires a scalar as the second argument");
+
+  Type T = A.first;
+  assert_with_loc(T.isVector(), "dup_typed() used but target type is scalar!");
+  std::string S = "(" + T.str() + ") {";
+  for (unsigned I = 0; I < T.getNumElements(); ++I) {
+if (I != 0)
+  S += ", ";
+S += B.second;
+  }
+  S += "}";
+
+  return std::make_pair(T, S);
+}
+
 std::pair Intrinsic::DagEmitter::emitDagSplat(DagInit *DI) {
   assert_with_loc(DI->getNumArgs() == 2, "splat() expects two arguments");
   std::pair A = emitDagArg(DI->getArg(0),
Index: cfe/trunk/include/clang/Basic/arm_neon.td
===
--- cfe/trunk/include/clang/Basic/arm_neon.td
+++ cfe/trunk/include/clang/Basic/arm_neon.td
@@ -206,6 +206,15 @@
 : Op<(call "vdot", $p0, $p1,
   (bitcast $p1, (splat(bitcast "uint32x4_t", $p2), $p3)))>;
 
+def OP_FMLAL_LN : Op<(call "vfmlal_low", $p0, $p1,
+   (dup_typed $p1, (call "vget_lane", $p2, $p3)))>;
+def OP_FMLSL_LN : Op<(call "vfmlsl_low", $p0, $p1,
+   (dup_typed $p1, (call "vget_lane", $p2, $p3)))>;
+def OP_FMLAL_LN_Hi  : Op<(call "vfmlal_high", $p0, $p1,
+   (dup_typed $p1, (call "vget_lane", $p2, $p3)))>;
+def OP_FMLSL_LN_Hi  : Op<(call "vfmlsl_high", $p0, $p1,
+   (dup_typed $p1, (call "vget_lane", $p2, $p3)))>;
+
 //===--===//
 // Instructions
 //===--===//
@@ -1640,3 +1649,21 @@
   // Variants indexing into a 128-bit vector are A64 only.
   def UDOT_LANEQ : SOpInst<"vdot_laneq", "dd89i", "iUiQiQUi", OP_DOT_LNQ>;
 }
+
+// v8.2-A FP16 fused multiply-add long instructions.
+let ArchGuard = "defined(__ARM_FEATURE_FP16FML) && defined(__aarch64__)" in {
+  def VFMLAL_LOW  : SInst<"vfmlal_low", "ffHH", "UiQUi">;
+  def VFMLSL_LOW  : SInst<"vfmlsl_low", "ffHH", "UiQUi">;
+  def VFMLAL_HIGH : SInst<"vfmlal_high", "ffHH", "UiQUi">;
+  def VFMLSL_HIGH : SInst<"vfmlsl_high", "ffHH", "UiQUi">;
+
+  def VFMLAL_LANE_LOW  : SOpInst<"vfmlal_lane_low", "ffH0i", "UiQUi", OP_FMLAL_LN>;
+  def VFMLSL_LANE_LOW  : SOpInst<"vfmlsl_lane_low", "ffH0i", "UiQUi", OP_FMLSL_LN>;
+  def VFMLAL_LANE_HIGH : SOpInst<"vfmlal_lane_high", "ffH0i", "UiQUi", OP_FMLAL_LN_Hi>;
+  def VFMLSL_LANE_HIGH : SOpInst<"vfmlsl_lane_high", "ffH0i", "UiQUi", OP_FMLSL_LN_Hi>;
+
+  def VFMLAL_LANEQ_LOW  : SOpInst<"vfmlal_laneq_low", "ffH1i", "UiQUi", OP_FMLAL_LN>;
+  def VFMLSL_LANEQ_LOW  : SOpInst<"vfmlsl_laneq_low", 

[PATCH] D53633: [AArch64] Implement FP16FML intrinsics

2018-10-24 Thread Bryan Chan via Phabricator via cfe-commits
bryanpkc added a comment.

In https://reviews.llvm.org/D53633#1274621, @t.p.northover wrote:

> I think this is reasonable.


Thanks Tim. Could you also review https://reviews.llvm.org/D53632, which is the 
LLVM part of this implementation?


Repository:
  rC Clang

https://reviews.llvm.org/D53633



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53633: [AArch64] Implement FP16FML intrinsics

2018-10-24 Thread Tim Northover via Phabricator via cfe-commits
t.p.northover accepted this revision.
t.p.northover added a comment.
This revision is now accepted and ready to land.

I think this is reasonable.


Repository:
  rC Clang

https://reviews.llvm.org/D53633



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53633: [AArch64] Implement FP16FML intrinsics

2018-10-23 Thread Bryan Chan via Phabricator via cfe-commits
bryanpkc created this revision.
bryanpkc added reviewers: SjoerdMeijer, bogden, efriedma, t.p.northover.
Herald added subscribers: cfe-commits, kristof.beyls, javed.absar.

Generate the FP16FML intrinsics into arm_neon.h (AArch64 only for now).
Add two new type modifiers to NeonEmitter to handle the new prototypes.
Define __ARM_FEATURE_FP16FML when +fp16fml is enabled and guard the
intrinsics with the macro in arm_neon.h.

Based on a patch by Gao Yiling.


Repository:
  rC Clang

https://reviews.llvm.org/D53633

Files:
  include/clang/Basic/arm_neon.td
  include/clang/Basic/arm_neon_incl.td
  lib/Basic/Targets/AArch64.cpp
  lib/Basic/Targets/AArch64.h
  lib/CodeGen/CGBuiltin.cpp
  test/CodeGen/aarch64-neon-fp16fml.c
  test/Preprocessor/aarch64-target-features.c
  utils/TableGen/NeonEmitter.cpp

Index: utils/TableGen/NeonEmitter.cpp
===
--- utils/TableGen/NeonEmitter.cpp
+++ utils/TableGen/NeonEmitter.cpp
@@ -494,6 +494,7 @@
 std::pair emitDagSaveTemp(DagInit *DI);
 std::pair emitDagSplat(DagInit *DI);
 std::pair emitDagDup(DagInit *DI);
+std::pair emitDagDupTyped(DagInit *DI);
 std::pair emitDagShuffle(DagInit *DI);
 std::pair emitDagCast(DagInit *DI, bool IsBitCast);
 std::pair emitDagCall(DagInit *DI);
@@ -897,6 +898,18 @@
 Float = true;
 ElementBitwidth = 16;
 break;
+  case '0':
+Float = true;
+if (AppliedQuad)
+  Bitwidth /= 2;
+ElementBitwidth = 16;
+break;
+  case '1':
+Float = true;
+if (!AppliedQuad)
+  Bitwidth *= 2;
+ElementBitwidth = 16;
+break;
   case 'g':
 if (AppliedQuad)
   Bitwidth /= 2;
@@ -1507,6 +1520,8 @@
 return emitDagShuffle(DI);
   if (Op == "dup")
 return emitDagDup(DI);
+  if (Op == "dup_typed")
+return emitDagDupTyped(DI);
   if (Op == "splat")
 return emitDagSplat(DI);
   if (Op == "save_temp")
@@ -1771,6 +1786,28 @@
   return std::make_pair(T, S);
 }
 
+std::pair Intrinsic::DagEmitter::emitDagDupTyped(DagInit *DI) {
+  assert_with_loc(DI->getNumArgs() == 2, "dup_typed() expects two arguments");
+  std::pair A = emitDagArg(DI->getArg(0),
+  DI->getArgNameStr(0));
+  std::pair B = emitDagArg(DI->getArg(1),
+  DI->getArgNameStr(1));
+  assert_with_loc(B.first.isScalar(),
+  "dup_typed() requires a scalar as the second argument");
+
+  Type T = A.first;
+  assert_with_loc(T.isVector(), "dup_typed() used but target type is scalar!");
+  std::string S = "(" + T.str() + ") {";
+  for (unsigned I = 0; I < T.getNumElements(); ++I) {
+if (I != 0)
+  S += ", ";
+S += B.second;
+  }
+  S += "}";
+
+  return std::make_pair(T, S);
+}
+
 std::pair Intrinsic::DagEmitter::emitDagSplat(DagInit *DI) {
   assert_with_loc(DI->getNumArgs() == 2, "splat() expects two arguments");
   std::pair A = emitDagArg(DI->getArg(0),
Index: test/Preprocessor/aarch64-target-features.c
===
--- test/Preprocessor/aarch64-target-features.c
+++ test/Preprocessor/aarch64-target-features.c
@@ -93,16 +93,20 @@
 // RUN: %clang -target aarch64-none-linux-gnu -march=armv8.2a+dotprod -x c -E -dM %s -o - | FileCheck --check-prefix=CHECK-DOTPROD %s
 // CHECK-DOTPROD: __ARM_FEATURE_DOTPROD 1
 
-// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16fml+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
-// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
-// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+fp16+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
-// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
-// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
-// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16fml+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
-// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
-// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
-// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
-// RUN: %clang -target