[PATCH] D100499: [AArch64] Neon Polynomial vadd Intrinsic Fix

2021-04-26 Thread Ryan Santhirarajan via Phabricator via cfe-commits
rsanthir.quic abandoned this revision.
rsanthir.quic added a comment.

Merging with https://reviews.llvm.org/D100772


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100499/new/

https://reviews.llvm.org/D100499

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100499: [AArch64] Neon Polynomial vadd Intrinsic Fix

2021-04-19 Thread Ryan Santhirarajan via Phabricator via cfe-commits
rsanthir.quic added a comment.

Here's the fix for enabling these on ARM:
https://reviews.llvm.org/D100772


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100499/new/

https://reviews.llvm.org/D100499

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100499: [AArch64] Neon Polynomial vadd Intrinsic Fix

2021-04-19 Thread Ryan Santhirarajan via Phabricator via cfe-commits
rsanthir.quic updated this revision to Diff 338539.
rsanthir.quic added a comment.

only the pol128 intrinsic is incompatible with ARM, the rest should be 
supported.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100499/new/

https://reviews.llvm.org/D100499

Files:
  clang/lib/CodeGen/CGBuiltin.cpp


Index: clang/lib/CodeGen/CGBuiltin.cpp
===
--- clang/lib/CodeGen/CGBuiltin.cpp
+++ clang/lib/CodeGen/CGBuiltin.cpp
@@ -5460,8 +5460,7 @@
   NEONMAP1(vabsq_v, arm_neon_vabs, 0),
   NEONMAP0(vadd_v),
   NEONMAP0(vaddhn_v),
-  NEONMAP0(vaddq_p128),
-  NEONMAP0(vaddq_v),
+  NEONMAP0(vaddq_v),
   NEONMAP1(vaesdq_v, arm_neon_aesd, 0),
   NEONMAP1(vaeseq_v, arm_neon_aese, 0),
   NEONMAP1(vaesimcq_v, arm_neon_aesimc, 0),


Index: clang/lib/CodeGen/CGBuiltin.cpp
===
--- clang/lib/CodeGen/CGBuiltin.cpp
+++ clang/lib/CodeGen/CGBuiltin.cpp
@@ -5460,8 +5460,7 @@
   NEONMAP1(vabsq_v, arm_neon_vabs, 0),
   NEONMAP0(vadd_v),
   NEONMAP0(vaddhn_v),
-  NEONMAP0(vaddq_p128),
-  NEONMAP0(vaddq_v),
+  NEONMAP0(vaddq_v),
   NEONMAP1(vaesdq_v, arm_neon_aesd, 0),
   NEONMAP1(vaeseq_v, arm_neon_aese, 0),
   NEONMAP1(vaesimcq_v, arm_neon_aesimc, 0),
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100499: [AArch64] Neon Polynomial vadd Intrinsic Fix

2021-04-19 Thread Ryan Santhirarajan via Phabricator via cfe-commits
rsanthir.quic added a comment.

Thanks for looking into this @DavidSpickett ! What you found makes sense. I'll 
update this patch to remove only the poly128 vadd from the mapping. I'll also 
add another patch that will correctly enable the remaining vadd intrinsics for 
ARM.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100499/new/

https://reviews.llvm.org/D100499

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100499: [AArch64] Neon Polynomial vadd Intrinsic Fix

2021-04-16 Thread David Spickett via Phabricator via cfe-commits
DavidSpickett added a comment.

Ok, it's behind the aarch64 guard because there's a giant #ifdef around it that 
I didn't see. If you move it in `arm_neon.td` down to outside that #ifdef you 
get the definitions. Next problem is that the `poly128_t` type isn't 
implemented for AArch32. Not sure why but for example in the bf16 tests:

  clang/test/CodeGen/arm-bf16-reinterpret-intrinsics.c:
  // TODO: poly128_t not implemented on aarch32
  // CHCK-LABEL: @test_vreinterpretq_p128_bf16

If you split the def in two, one for the non Q, one for the Q parts that can 
work and you get the non poly128_t definitions for Arm and then I can compile 
my test program without modifications to the header.

  --- a/clang/include/clang/Basic/arm_neon.td
  +++ b/clang/include/clang/Basic/arm_neon.td
  @@ -1160,7 +1160,8 @@ def SM4E : SInst<"vsm4e", "...", "QUi">;
   def SM4EKEY : SInst<"vsm4ekey", "...", "QUi">;
   }
  
  -def VADDP   : WInst<"vadd", "...", "PcPsPlQPcQPsQPlQPk">;
  +// TODO: poly128_t isn't implemented for AArch32
  +def VADDP   : WInst<"vadd", "...", "QPcQPsQPlQPk">;
  
   

   // Float -> Int conversions with explicit rounding mode
  @@ -1630,6 +1631,9 @@ def SCALAR_VDUP_LANEQ : IInst<"vdup_laneq", "1QI", 
"ScSsSiSlSfSdSUcSUsSUiSUlSPcS
   }
   }
  
  +// Everything that doesn't use poly128_t
  +def VADDP_Q   : WInst<"vadd", "...", "PcPsPl">;
  +
   // ARMv8.2-A FP16 vector intrinsics for A32/A64.
   let ArchGuard = "defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)" in {

However I ran into issues trying to get the relevant bits of aarc64-poly-add.c 
to compile for Arm and don't have time time to pursue it myself. If you can get 
those to run (or make an arm specific test file, some others do already) then 
all you'd need to do is remove the `vaddq_p128`  in `CGBuiltin.cpp` to prevent 
anyone thinking it is implemented for AArch32. (no one would be able to hit it 
without a modified header anyway)

My conclusion being that this group of intrinsics should be for A32 and A64 as 
the ACLE says. However we can't do all of them on A32 without poly128_t.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100499/new/

https://reviews.llvm.org/D100499

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100499: [AArch64] Neon Polynomial vadd Intrinsic Fix

2021-04-16 Thread David Spickett via Phabricator via cfe-commits
DavidSpickett added a comment.

Both clang and GCC have their issues when it comes to matching the ACLE, so I 
wouldn't take the header guard as fact. It could be that we never implemented 
the A32 path for these functions/when they were added the document was in 
flux/no on ever tried this on A32.

I think you could implement `vadd_p8` on A32 with:

  veor.u8 d0, d0, d0

I think we just show A64 versions in the documentation. I think. Possible that 
what I've got above is a simd instruction but not an "advanced simd" 
instruction and that somehow doesn't count?

(caveat: I've mostly been making sure the function prototypes match the ACLE, 
not actually using these to do real work)

If I bodge the header to have vadd_p8 on Arm I get:

  $ cat /tmp/test.c
  #include 
  
  poly8x8_t test_vadd_p8(poly8x8_t a, poly8x8_t b) {
  return vadd_p8 (a, b);
  }
  $ ./bin/clang -target arm-arm-none-eabi -mcpu=cortex-a57 -S -o - /tmp/test.c 
-O3
  <...>
  test_vadd_p8:
  .fnstart
  vmovd16, r0, r1
  vmovd17, r2, r3
  veord16, d17, d16
  vmovr0, r1, d16
  bx  lr

Which seems to confirm but I don't know why it's put behind the `__aarch64__` 
guard.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100499/new/

https://reviews.llvm.org/D100499

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100499: [AArch64] Neon Polynomial vadd Intrinsic Fix

2021-04-15 Thread Ryan Santhirarajan via Phabricator via cfe-commits
rsanthir.quic added a comment.

As you mentioned, I thought it was only supported due to 
`CheckFPAdvSIMDEnabled64`. If the header is also guarding for AArch64 does that 
not support the idea that it is AArch64 specific?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100499/new/

https://reviews.llvm.org/D100499

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100499: [AArch64] Neon Polynomial vadd Intrinsic Fix

2021-04-15 Thread David Spickett via Phabricator via cfe-commits
DavidSpickett added a comment.

What's your logic for these being Arm only?

I looked up the ones that were added:

  vadd_p8
  vadd_p16
  vadd_p64
  vaddq_p8
  vaddq_p16
  vaddq_p64
  vaddq_p128

E.g. 
https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics?search=vadd_p8

It says that this is enabled for `v7/A32/A64`. However the pseudocode does use 
`CheckFPAdvSIMDEnabled64` which might imply AArch64 only. There is a 
`AArch32.CheckAdvSIMDOrFPEnabled` for AArch32 but looking at `vabdq_u32` which 
is Arm and AArch64, it also uses `CheckFPAdvSIMDEnabled64` so clearly that 
doesn't mean much.

The weird thing is that the header already guards this with `__aarch64__` so 
that must be based on some other property than simply being in these tables. 
(GCC agrees)

How did you find this? Presumably you couldn't use them from C, even without 
this patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100499/new/

https://reviews.llvm.org/D100499

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100499: [AArch64] Neon Polynomial vadd Intrinsic Fix

2021-04-14 Thread Ryan Santhirarajan via Phabricator via cfe-commits
rsanthir.quic created this revision.
rsanthir.quic added reviewers: t.p.northover, DavidSpickett, labrinea, apazos.
Herald added subscribers: danielkiss, kristof.beyls.
rsanthir.quic requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

The Neon vadd intrinsics were added to both the ARMSIMD and
AArch64SIMD Intrinsic maps. The intrinsics should only be supported
for AArch64, this patch corrects this by removing the mapping
from ARMSIMDIntrinsicMap that was added in https://reviews.llvm.org/D96825


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D100499

Files:
  clang/lib/CodeGen/CGBuiltin.cpp


Index: clang/lib/CodeGen/CGBuiltin.cpp
===
--- clang/lib/CodeGen/CGBuiltin.cpp
+++ clang/lib/CodeGen/CGBuiltin.cpp
@@ -5458,10 +5458,7 @@
   NEONMAP2(vabdq_v, arm_neon_vabdu, arm_neon_vabds, Add1ArgType | 
UnsignedAlts),
   NEONMAP1(vabs_v, arm_neon_vabs, 0),
   NEONMAP1(vabsq_v, arm_neon_vabs, 0),
-  NEONMAP0(vadd_v),
   NEONMAP0(vaddhn_v),
-  NEONMAP0(vaddq_p128),
-  NEONMAP0(vaddq_v),
   NEONMAP1(vaesdq_v, arm_neon_aesd, 0),
   NEONMAP1(vaeseq_v, arm_neon_aese, 0),
   NEONMAP1(vaesimcq_v, arm_neon_aesimc, 0),


Index: clang/lib/CodeGen/CGBuiltin.cpp
===
--- clang/lib/CodeGen/CGBuiltin.cpp
+++ clang/lib/CodeGen/CGBuiltin.cpp
@@ -5458,10 +5458,7 @@
   NEONMAP2(vabdq_v, arm_neon_vabdu, arm_neon_vabds, Add1ArgType | UnsignedAlts),
   NEONMAP1(vabs_v, arm_neon_vabs, 0),
   NEONMAP1(vabsq_v, arm_neon_vabs, 0),
-  NEONMAP0(vadd_v),
   NEONMAP0(vaddhn_v),
-  NEONMAP0(vaddq_p128),
-  NEONMAP0(vaddq_v),
   NEONMAP1(vaesdq_v, arm_neon_aesd, 0),
   NEONMAP1(vaeseq_v, arm_neon_aese, 0),
   NEONMAP1(vaesimcq_v, arm_neon_aesimc, 0),
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits