[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-05-15 Thread via cfe-commits

https://github.com/Lukacma closed 
https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-05-09 Thread via cfe-commits

https://github.com/Lukacma updated 
https://github.com/llvm/llvm-project/pull/128019



  







[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-05-09 Thread Paul Walker via cfe-commits


@@ -5464,6 +5464,15 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
   Builder.CreateStore(errorValue, swiftErrorTemp);
 }
 
+// The Mfloat8 type is loaded as a scalar type but treated as a single
+// vector type for other operations. We need to bitcast it to the
+// vector type here.
+if (auto *EltTy =

paulwalker-arm wrote:

The underlying storage for `__mfp8` is an FPR, and until we either decide to 
use a dedicated target type or LLVM gains an opaque 8-bit floating-point type, 
our only option is to represent it as an i8 vector type.

The reason for using `i8` was some specific code reuse, but as this PR showed, 
that reuse is not total, so I'd rather we just be honest and insert the 
relevant bitcasts where necessary. This will put us in good stead if we decide 
to go the target-type route.
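
As a rough illustration of what "insert the relevant bitcasts" could look like 
(the helper name and placement are assumptions, not code from this patch):

```cpp
#include "llvm/IR/IRBuilder.h"

// Sketch: after an __mfp8 value has been loaded as a plain i8, cast it back
// to the <1 x i8> form the rest of CodeGen treats as canonical. The bitcast
// is free at runtime: both types are 8 bits wide and live in an FPR.
static llvm::Value *coerceMFloat8ToVector(llvm::IRBuilderBase &B,
                                          llvm::Value *Scalar) {
  auto *VecTy = llvm::FixedVectorType::get(B.getInt8Ty(), 1);
  return B.CreateBitCast(Scalar, VecTy); // i8 -> <1 x i8>
}
```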

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-05-09 Thread via cfe-commits


@@ -5464,6 +5464,15 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
   Builder.CreateStore(errorValue, swiftErrorTemp);
 }
 
+// The Mfloat8 type is loaded as a scalar type but treated as a single
+// vector type for other operations. We need to bitcast it to the
+// vector type here.
+if (auto *EltTy =

Lukacma wrote:

I’m not yet confident in my understanding of the trade-offs between the two 
approaches, beyond the fact that one impacts target-specific code while the 
other affects target-independent code. As such, I don’t feel well-positioned 
to contribute meaningfully to this discussion. That said, I’d appreciate it if 
we could reach alignment here, as I’d like to merge this patch soon.

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-05-09 Thread Paul Walker via cfe-commits


@@ -4179,9 +4183,19 @@ Value *CodeGenFunction::EmitSVEMaskedLoad(const CallExpr *E,
   unsigned IntrinsicID,
   bool IsZExtReturn) {
   QualType LangPTy = E->getArg(1)->getType();
-  llvm::Type *MemEltTy = CGM.getTypes().ConvertTypeForMem(
+  llvm::Type *MemEltTy = CGM.getTypes().ConvertType(
   LangPTy->castAs<PointerType>()->getPointeeType());
 
+  // The Mfloat8 type is stored as a vector, so extra work
+  // to extract the scalar element type is necessary.
+  if (MemEltTy->isVectorTy()) {
+#ifndef NDEBUG

paulwalker-arm wrote:

With the asserts simplified you should no longer require `NDEBUG`.

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-05-08 Thread Momchil Velikov via cfe-commits


@@ -5464,6 +5464,15 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
   Builder.CreateStore(errorValue, swiftErrorTemp);
 }
 
+// The Mfloat8 type is loaded as a scalar type but treated as a single
+// vector type for other operations. We need to bitcast it to the
+// vector type here.
+if (auto *EltTy =

momchil-velikov wrote:

> Does the ABI say this?

It doesn't. Unfortunately this discussion was split and I didn't replicate all 
my comments here.

> Momchil Velikov, 15 Apr at 16:11
> The ABI spec (naturally) does not say anything about <1 x i8>. It says (in a 
> somewhat obscure way) that the value is passed in an FPR.
> And then clang/llvm decide to implement the ABI by mapping to <1 x T>.

I consider the "natural" mapping of `__mfp8` to LLVM types to be `i8`, and `<1 x 
i8>` to be merely a hack coming from the peculiar way ABIs are implemented in 
clang/llvm (by implicit contracts and "mutual understanding"). As such, `<1 x 
i8>` ought to be applicable only for values that are arguments passed in 
registers.

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-05-07 Thread Paul Walker via cfe-commits

https://github.com/paulwalker-arm edited 
https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-05-07 Thread Paul Walker via cfe-commits


@@ -4226,9 +4242,21 @@ Value *CodeGenFunction::EmitSVEMaskedStore(const CallExpr *E,
SmallVectorImpl<Value *> &Ops,
unsigned IntrinsicID) {
   QualType LangPTy = E->getArg(1)->getType();
-  llvm::Type *MemEltTy = CGM.getTypes().ConvertTypeForMem(
+  llvm::Type *MemEltTy = CGM.getTypes().ConvertType(
   LangPTy->castAs<PointerType>()->getPointeeType());
 
+  // The Mfloat8 type is stored as a vector, so extra work
+  // to extract the scalar element type is necessary.
+  if (MemEltTy->isVectorTy()) {
+#ifndef NDEBUG
+auto *VecTy = cast<FixedVectorType>(MemEltTy);
+ElementCount EC = VecTy->getElementCount();
+assert(EC.isScalar() && VecTy->getElementType() == Int8Ty &&
+  "Only <1 x i8> expected");
+#endif

paulwalker-arm wrote:

As above.

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-05-07 Thread Paul Walker via cfe-commits


@@ -4179,9 +4183,21 @@ Value *CodeGenFunction::EmitSVEMaskedLoad(const CallExpr *E,
   unsigned IntrinsicID,
   bool IsZExtReturn) {
   QualType LangPTy = E->getArg(1)->getType();
-  llvm::Type *MemEltTy = CGM.getTypes().ConvertTypeForMem(
+  llvm::Type *MemEltTy = CGM.getTypes().ConvertType(
   LangPTy->castAs<PointerType>()->getPointeeType());
 
+  // The Mfloat8 type is stored as a vector, so extra work
+  // to extract the scalar element type is necessary.
+  if (MemEltTy->isVectorTy()) {
+#ifndef NDEBUG
+auto *VecTy = cast<FixedVectorType>(MemEltTy);
+ElementCount EC = VecTy->getElementCount();
+assert(EC.isScalar() && VecTy->getElementType() == Int8Ty &&
+  "Only <1 x i8> expected");
+#endif

paulwalker-arm wrote:

I think `assert(MemEltTy == FixedVectorType::get(Int8Ty, 1) && ...)` should 
work here?
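
A sketch of how the simplified check could read (the assert message and the 
follow-up element-type extraction are assumed for illustration):

```cpp
// Collapses the #ifndef NDEBUG block into a single assert; asserts already
// compile away in release builds, so no explicit guard is needed.
assert(MemEltTy == llvm::FixedVectorType::get(Int8Ty, 1) &&
       "Only <1 x i8> expected");
MemEltTy = Int8Ty; // continue with the scalar element type
```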

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-05-07 Thread Paul Walker via cfe-commits

https://github.com/paulwalker-arm approved this pull request.

I've not verified every line of the test files but what I've seen looks good, 
as do the code changes.  Other than a few stylistic suggestions this looks good 
to me.

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-05-07 Thread Paul Walker via cfe-commits


@@ -4179,9 +4183,21 @@ Value *CodeGenFunction::EmitSVEMaskedLoad(const CallExpr *E,
   unsigned IntrinsicID,
   bool IsZExtReturn) {
   QualType LangPTy = E->getArg(1)->getType();
-  llvm::Type *MemEltTy = CGM.getTypes().ConvertTypeForMem(
+  llvm::Type *MemEltTy = CGM.getTypes().ConvertType(

paulwalker-arm wrote:

Is this change necessary? ConvertTypeForMem should now return the same vector 
type for mfloat8? 

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-05-06 Thread via cfe-commits


@@ -2056,9 +2056,21 @@ void NeonEmitter::createIntrinsic(const Record *R,
   auto &Entry = IntrinsicMap[Name];
 
   for (auto &I : NewTypeSpecs) {
+
+// MFloat8 type is only available on AArch64. If encountered, set ArchGuard
+// correctly.
+std::string savedArchGuard = ArchGuard;
+if (Type(I.first, ".").isMFloat8()) {
+  if (ArchGuard.empty()) {
+ArchGuard = "defined(__aarch64__)";
+  } else if (ArchGuard.find("defined(__aarch64__)") == std::string::npos) {
+ArchGuard = "defined(__aarch64__) && (" + savedArchGuard + ")";
+  }
+}
 Entry.emplace_back(R, Name, Proto, I.first, I.second, CK, Body, *this,
ArchGuard, TargetGuard, IsUnavailable, BigEndianSafe);
 Out.push_back(&Entry.back());
+ArchGuard = savedArchGuard;

Lukacma wrote:

Sorry! Fixed now.

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-05-06 Thread via cfe-commits


@@ -5464,6 +5464,15 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
   Builder.CreateStore(errorValue, swiftErrorTemp);
 }
 
+// The Mfloat8 type is loaded as a scalar type but treated as a single
+// vector type for other operations. We need to bitcast it to the
+// vector type here.
+if (auto *EltTy =

Lukacma wrote:

Done

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-05-02 Thread Paul Walker via cfe-commits


@@ -5464,6 +5464,15 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
   Builder.CreateStore(errorValue, swiftErrorTemp);
 }
 
+// The Mfloat8 type is loaded as a scalar type but treated as a single
+// vector type for other operations. We need to bitcast it to the
+// vector type here.
+if (auto *EltTy =

paulwalker-arm wrote:

Not sure what the fallout will be from this, but I think the problem here is 
that we should not have loaded a scalar in the first place. Looking at 
`CodeGenTypes::ConvertTypeForMem()` I can see that we're using a different type 
for the memory representation than the normal one, which I think is a mistake.

Changing this so the types are consistent will remove the need for this code, 
but I suspect it'll prompt further work elsewhere. My hope is that that work 
sits in target-specific areas relating to modelling the builtin, which seems 
reasonable. Please shout, though, if it starts to get out of control.
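
A minimal sketch of the consistency property being asked for, assuming 
`MFloat8Ty` is the ASTContext handle for `__mfp8` (illustrative only, not code 
from the patch):

```cpp
// Once ConvertTypeForMem() agrees with ConvertType() for __mfp8, both yield
// <1 x i8>, loads produce the vector type directly, and the post-load
// bitcast above becomes unnecessary.
void checkMFloat8Consistency(clang::CodeGen::CodeGenFunction &CGF) {
  clang::QualType MF8 = CGF.getContext().MFloat8Ty; // assumed handle
  llvm::Type *ValTy = CGF.ConvertType(MF8);
  llvm::Type *MemTy = CGF.ConvertTypeForMem(MF8);
  assert(ValTy == MemTy && "__mfp8 memory/value representations diverged");
}
```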

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-05-02 Thread Paul Walker via cfe-commits


@@ -2056,9 +2056,21 @@ void NeonEmitter::createIntrinsic(const Record *R,
   auto &Entry = IntrinsicMap[Name];
 
   for (auto &I : NewTypeSpecs) {
+
+// MFloat8 type is only available on AArch64. If encountered, set ArchGuard
+// correctly.
+std::string savedArchGuard = ArchGuard;
+if (Type(I.first, ".").isMFloat8()) {
+  if (ArchGuard.empty()) {
+ArchGuard = "defined(__aarch64__)";
+  } else if (ArchGuard.find("defined(__aarch64__)") == std::string::npos) {
+ArchGuard = "defined(__aarch64__) && (" + savedArchGuard + ")";
+  }
+}
 Entry.emplace_back(R, Name, Proto, I.first, I.second, CK, Body, *this,
ArchGuard, TargetGuard, IsUnavailable, BigEndianSafe);
 Out.push_back(&Entry.back());
+ArchGuard = savedArchGuard;

paulwalker-arm wrote:

If you use the new variable (whose current name breaks the coding standard) in 
the call to `Entry.emplace_back`, you won't need to restore ArchGuard.
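
Concretely, the suggestion might look like this (a sketch based on the diff 
above, with the variable renamed to follow the LLVM naming convention):

```cpp
for (auto &I : NewTypeSpecs) {
  // MFloat8 is AArch64-only: fold the guard into a local instead of
  // mutating and restoring the ArchGuard member.
  std::string EntryGuard = ArchGuard;
  if (Type(I.first, ".").isMFloat8()) {
    if (EntryGuard.empty())
      EntryGuard = "defined(__aarch64__)";
    else if (EntryGuard.find("defined(__aarch64__)") == std::string::npos)
      EntryGuard = "defined(__aarch64__) && (" + EntryGuard + ")";
  }
  Entry.emplace_back(R, Name, Proto, I.first, I.second, CK, Body, *this,
                     EntryGuard, TargetGuard, IsUnavailable, BigEndianSafe);
  Out.push_back(&Entry.back());
}
```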

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-04-29 Thread via cfe-commits

Lukacma wrote:

I have adjusted NeonEmitter to automatically emit the correct attribute for 
mfloat8 intrinsics and merged them into the original records.

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-04-29 Thread via cfe-commits

https://github.com/Lukacma updated 
https://github.com/llvm/llvm-project/pull/128019

From c331c4c260b6432b6ae96723f78c16b189e9297a Mon Sep 17 00:00:00 2001
From: Marian Lukac 
Date: Thu, 20 Feb 2025 15:35:45 +
Subject: [PATCH 1/2] [Clang][AArch64] Add fp8 variants for untyped NEON
 intrinsics

This patch adds fp8 variants to existing intrinsics whose operation
doesn't depend on the arguments being a specific type.
---
 clang/include/clang/Basic/arm_neon.td |   74 +-
 clang/lib/AST/Type.cpp|5 +
 clang/lib/CodeGen/CGCall.cpp  |9 +
 clang/lib/CodeGen/TargetBuiltins/ARM.cpp  |   20 +
 clang/lib/Sema/SemaInit.cpp   |2 +
 .../fp8-intrinsics/acle_neon_fp8_untyped.c| 1114 +
 6 files changed, 1220 insertions(+), 4 deletions(-)
 create mode 100644 
clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_untyped.c

diff --git a/clang/include/clang/Basic/arm_neon.td 
b/clang/include/clang/Basic/arm_neon.td
index ab0051efe5159..90f0e90e4a7f8 100644
--- a/clang/include/clang/Basic/arm_neon.td
+++ b/clang/include/clang/Basic/arm_neon.td
@@ -2090,17 +2090,17 @@ let ArchGuard = "defined(__aarch64__) || 
defined(__arm64ec__)", TargetGuard = "r
 
 // Lookup table read with 2-bit/4-bit indices
 let ArchGuard = "defined(__aarch64__)", TargetGuard = "lut" in {
-  def VLUTI2_B: SInst<"vluti2_lane", "Q.(qU)I", "cUcPcQcQUcQPc",
+  def VLUTI2_B: SInst<"vluti2_lane", "Q.(qU)I", "cUcPcmQcQUcQPcQm",
  [ImmCheck<2, ImmCheck0_1>]>;
-  def VLUTI2_B_Q  : SInst<"vluti2_laneq", "Q.(QU)I", "cUcPcQcQUcQPc",
+  def VLUTI2_B_Q  : SInst<"vluti2_laneq", "Q.(QU)I", "cUcPcmQcQUcQPcQm",
  [ImmCheck<2, ImmCheck0_3>]>;
   def VLUTI2_H: SInst<"vluti2_lane", "Q.(]>;
   def VLUTI2_H_Q  : SInst<"vluti2_laneq", "Q.(]>;
-  def VLUTI4_B: SInst<"vluti4_lane", "..(qU)I", "QcQUcQPc",
+  def VLUTI4_B: SInst<"vluti4_lane", "..(qU)I", "QcQUcQPcQm",
  [ImmCheck<2, ImmCheck0_0>]>;
-  def VLUTI4_B_Q  : SInst<"vluti4_laneq", "..UI", "QcQUcQPc",
+  def VLUTI4_B_Q  : SInst<"vluti4_laneq", "..UI", "QcQUcQPcQm",
  [ImmCheck<2, ImmCheck0_1>]>;
   def VLUTI4_H_X2 : SInst<"vluti4_lane_x2", ".2(]>;
@@ -2194,4 +2194,70 @@ let ArchGuard = "defined(__aarch64__)", TargetGuard = 
"fp8,neon" in {
   // fscale
   def FSCALE_V128 : WInst<"vscale", "..(.S)", "QdQfQh">;
   def FSCALE_V64 : WInst<"vscale", "(.q)(.q)(.qS)", "fh">;
+}
+
+//FP8 versions of untyped intrinsics
+let ArchGuard = "defined(__aarch64__)" in {
+  def VGET_LANE_MF8 : IInst<"vget_lane", "1.I", "mQm", [ImmCheck<1, 
ImmCheckLaneIndex, 0>]>;
+  def SPLAT_MF8 : WInst<"splat_lane", ".(!q)I", "mQm", [ImmCheck<1, 
ImmCheckLaneIndex, 0>]>;
+  def SPLATQ_MF8 : WInst<"splat_laneq", ".(!Q)I", "mQm", [ImmCheck<1, 
ImmCheckLaneIndex, 0>]>;
+  def VSET_LANE_MF8 : IInst<"vset_lane", ".1.I", "mQm", [ImmCheck<2, 
ImmCheckLaneIndex, 1>]>;
+  def VCREATE_MF8 : NoTestOpInst<"vcreate", ".(IU>)", "m", OP_CAST> { let 
BigEndianSafe = 1; }
+  let InstName = "vmov" in {
+def VDUP_N_MF8 : WOpInst<"vdup_n", ".1", "mQm", OP_DUP>;
+def VMOV_N_MF8 : WOpInst<"vmov_n", ".1", "mQm", OP_DUP>;
+  }
+  let InstName = "" in
+def VDUP_LANE_MF8: WOpInst<"vdup_lane", ".qI", "mQm", OP_DUP_LN>;
+  def VCOMBINE_MF8 : NoTestOpInst<"vcombine", "Q..", "m", OP_CONC>;
+  let InstName = "vmov" in {
+def VGET_HIGH_MF8 : NoTestOpInst<"vget_high", ".Q", "m", OP_HI>;
+def VGET_LOW_MF8 : NoTestOpInst<"vget_low", ".Q", "m", OP_LO>;
+  }
+  let InstName = "vtbl" in {
+def VTBL1_MF8 : WInst<"vtbl1", "..p", "m">;
+def VTBL2_MF8 : WInst<"vtbl2", ".2p", "m">;
+def VTBL3_MF8 : WInst<"vtbl3", ".3p", "m">;
+def VTBL4_MF8 : WInst<"vtbl4", ".4p", "m">;
+  }
+  let InstName = "vtbx" in {
+def VTBX1_MF8 : WInst<"vtbx1", "...p", "m">;
+def VTBX2_MF8 : WInst<"vtbx2", "..2p", "m">;
+def VTBX3_MF8 : WInst<"vtbx3", "..3p", "m">;
+def VTBX4_MF8 : WInst<"vtbx4", "..4p", "m">;
+  }
+  def VEXT_MF8 : WInst<"vext", "...I", "mQm", [ImmCheck<2, ImmCheckLaneIndex, 
0>]>;
+  def VREV64_MF8 : WOpInst<"vrev64", "..", "mQm", OP_REV64>;
+  def VREV32_MF8 : WOpInst<"vrev32", "..", "mQm", OP_REV32>;
+  def VREV16_MF8 : WOpInst<"vrev16", "..", "mQm", OP_REV16>;
+  let isHiddenLInst = 1 in 
+  def VBSL_MF8 : SInst<"vbsl", ".U..", "mQm">;
+  def VTRN_MF8 : WInst<"vtrn", "2..", "mQm">;
+  def VZIP_MF8 : WInst<"vzip", "2..", "mQm">;
+  def VUZP_MF8 : WInst<"vuzp", "2..", "mQm">;
+  def COPY_LANE_MF8 : IOpInst<"vcopy_lane", "..I.I", "m", OP_COPY_LN>;
+  def COPYQ_LANE_MF8 : IOpInst<"vcopy_lane", "..IqI", "Qm", OP_COPY_LN>;
+  def COPY_LANEQ_MF8 : IOpInst<"vcopy_laneq", "..IQI", "m", OP_COPY_LN>;
+  def COPYQ_LANEQ_MF8 : IOpInst<"vcopy_laneq", "..I.I", "Qm", OP_COPY_LN>;
+  def VDUP_LANE2_MF8 : WOpInst<"vdup_laneq", ".QI", "mQm", OP_DUP_LN>;
+  def VTRN1_MF8 : SOpInst<"vtrn1", "...", "mQm", OP_TRN1>;
+  def VZIP1_MF8 : SOpInst<"vzip1", "..."

[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-04-29 Thread via cfe-commits

https://github.com/Lukacma updated 
https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-04-29 Thread Paul Walker via cfe-commits

paulwalker-arm wrote:

> @paulwalker-arm the reasoning behind creating separate records, is that 
> mfloat type is not available for aarch32 architectures and therefore all 
> intrinsics using it need to be gated behind `ArchGuard = 
> "defined(__aarch64__)"` .

I see.  How practical would it be for NEONEmitter to infer the ArchGuard based 
on the type? I'm assuming ArchGuard is either unset or set to what we need for 
all the cases we care about.  This is not a firm ask, but it would be nice to 
reuse the existing definitions if possible.

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-04-28 Thread via cfe-commits

Lukacma wrote:

@paulwalker-arm the reasoning behind creating separate records is that the 
mfloat type is not available on AArch32 architectures, and therefore all 
intrinsics using it need to be gated behind `ArchGuard = "defined(__aarch64__)"`.

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-04-23 Thread via cfe-commits

Lukacma wrote:

> For my education can you explain why the fp8 variants are broken out into 
> their own definitions. Taking VREV64_MF8 as an example, it looks like you 
> should be able to add the new type strings to the current definition?

That's a good question. It's been a while since I implemented this patch, so I 
have forgotten my reasoning, but I think it might be because I was originally 
unsure whether we wanted to guard these behind the fp8 feature flag. Since it 
looks like we are not doing that, I can merge them back into their original 
intrinsics.

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-04-17 Thread Paul Walker via cfe-commits


@@ -5464,6 +5464,15 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
   Builder.CreateStore(errorValue, swiftErrorTemp);
 }
 
+// The Mfloat8 type is loaded as a scalar type but treated as a single
+// vector type for other operations. We need to bitcast it to the
+// vector type here.
+if (auto *EltTy =

paulwalker-arm wrote:

Does the ABI say this?  My understanding is that values of type `__mfp8` are 
floating-point 8-bit values that are passed as `__mfp8`. Pretending it's an 
`i8` in some cases and `<1 x i8>` in others is purely an implementation detail 
within clang.

This is not to say the code is invalid, but we should be cautious with how far 
down the rabbit hole we go.

FYI: As part of @MacDue's work to improve streaming-mode code generation I 
asked him to add the MVT `aarch64mfp8` along with support to load and store it. 
 I expect over time we'll migrate away from using `i8` as our scalar type.

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-04-17 Thread Paul Walker via cfe-commits

paulwalker-arm wrote:

For my education, can you explain why the fp8 variants are broken out into 
their own definitions? Taking `VREV64_MF8` as an example, it looks like you 
should be able to add the new type strings to the current definition.

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-04-14 Thread Momchil Velikov via cfe-commits


@@ -5464,6 +5464,15 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
   Builder.CreateStore(errorValue, swiftErrorTemp);
 }
 
+// The Mfloat8 type is loaded as a scalar type but treated as a single
+// vector type for other operations. We need to bitcast it to the
+// vector type here.
+if (auto *EltTy =

momchil-velikov wrote:

I don't see an issue here. That is exactly what should happen, regardless of 
the target architecture, any time the ABI for that architecture says values of 
type `T` are passed as `<1 x T>`.

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-04-04 Thread via cfe-commits

Lukacma wrote:

Failures should be fixed now.

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-04-04 Thread via cfe-commits


@@ -5464,6 +5464,15 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
   Builder.CreateStore(errorValue, swiftErrorTemp);
 }
 
+// The Mfloat8 type is loaded as a scalar type but treated as a single
+// vector type for other operations. We need to bitcast it to the
+// vector type here.
+if (auto *EltTy =

Lukacma wrote:

I am not sure if this is the best way to solve this issue, so I would 
appreciate your feedback on this.

https://github.com/llvm/llvm-project/pull/128019


[clang] [Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (PR #128019)

2025-04-04 Thread via cfe-commits

https://github.com/Lukacma updated 
https://github.com/llvm/llvm-project/pull/128019
