[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2022-04-13 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added a comment.

Thanks @vtjnash for the information! Comments on 
https://github.com/JuliaLang/julia/issues/44829


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2022-04-12 Thread Jameson Nash via Phabricator via cfe-commits
vtjnash added a comment.
Herald added a subscriber: StephenFan.
Herald added a project: All.

I was tracking back a recent ABI break (also failing now in gcc 12, so maybe 
this irregularity is intentional), and was concerned that this commit is 
observed to cause the platform ABI to change depending on the feature flags of 
the current compilation unit. Prior to this change, f16 was always treated as 
i16 for the purpose of the calling-convention (e.g. returned in %ax). But after 
this change, the ABI of the value is now inconsistent between compile units. I 
made a small change to one of the existing tests to show this. Note how the 
callq result was in %ax without this mattr flag, and in %xmm0 with this mattr 
flag added. But the function known as "identity.half" is external, and did not 
change between those two calls to the llvm.

  diff --git a/llvm/test/CodeGen/X86/half.ll b/llvm/test/CodeGen/X86/half.ll
  index 46179e7d9113..8c1b8c4b76ff 100644
  --- a/llvm/test/CodeGen/X86/half.ll
  +++ b/llvm/test/CodeGen/X86/half.ll
  @@ -5,6 +5,8 @@
   ; RUN:   | FileCheck %s -check-prefixes=CHECK,CHECK-LIBCALL,BWOFF
   ; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+f16c 
-fixup-byte-word-insts=1 \
   ; RUN:| FileCheck %s -check-prefixes=CHECK,BWON,BWON-F16C
  +; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512fp16 
-fixup-byte-word-insts=0 \
  +; RUN:| FileCheck %s -check-prefixes=CHECK-CC
   ; RUN: llc < %s -mtriple=i686-unknown-linux-gnu -mattr +sse2 
-fixup-byte-word-insts=0  \
   ; RUN:| FileCheck %s -check-prefixes=CHECK-I686
  
  @@ -163,16 +199,31 @@ define void @test_trunc32(float %in, half* %addr) #0 {
 ret void
   }
   
  +declare half @identity.half(half)
  +
   define void @test_trunc64(double %in, half* %addr) #0 {
   ; CHECK-LABEL: test_trunc64:
   ; CHECK:   # %bb.0:
   ; CHECK-NEXT:pushq %rbx
   ; CHECK-NEXT:movq %rdi, %rbx
   ; CHECK-NEXT:callq __truncdfhf2@PLT
  +; CHECK-NEXT:# kill: def $ax killed $ax def $eax
  +; CHECK-NEXT:movl %eax, %edi
  +; CHECK-NEXT:callq identity.half@PLT
   ; CHECK-NEXT:movw %ax, (%rbx)
   ; CHECK-NEXT:popq %rbx
   ; CHECK-NEXT:retq
   ;
  +; CHECK-CC-LABEL: test_trunc64:
  +; CHECK-CC:   # %bb.0:
  +; CHECK-CC-NEXT:pushq %rbx
  +; CHECK-CC-NEXT:movq %rdi, %rbx
  +; CHECK-CC-NEXT:vcvtsd2sh %xmm0, %xmm0, %xmm0
  +; CHECK-CC-NEXT:callq identity.half@PLT
  +; CHECK-CC-NEXT:vmovsh %xmm0, (%rbx)
  +; CHECK-CC-NEXT:popq %rbx
  +; CHECK-CC-NEXT:retq
  +;
   ; CHECK-I686-LABEL: test_trunc64:
   ; CHECK-I686:   # %bb.0:
   ; CHECK-I686-NEXT:pushl %esi
  @@ -181,12 +232,16 @@ define void @test_trunc64(double %in, half* %addr) #0 {
   ; CHECK-I686-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
   ; CHECK-I686-NEXT:movsd %xmm0, (%esp)
   ; CHECK-I686-NEXT:calll __truncdfhf2
  +; CHECK-I686-NEXT:# kill: def $ax killed $ax def $eax
  +; CHECK-I686-NEXT:movl %eax, (%esp)
  +; CHECK-I686-NEXT:calll identity.half@PLT
   ; CHECK-I686-NEXT:movw %ax, (%esi)
   ; CHECK-I686-NEXT:addl $8, %esp
   ; CHECK-I686-NEXT:popl %esi
   ; CHECK-I686-NEXT:retl
 %val16 = fptrunc double %in to half
  -  store half %val16, half* %addr
  +  %val16b = call half @identity.half(half %val16)
  +  store half %val16b, half* %addr
 ret void
   }

Is this intentional? We do already have code to handle the ABI dependency on 
vector-sizes, and could add this to the list of flags that change the ABI (i.e. 
we disable it if it will break the ABI), but wanted to confirm first if that 
was the intent here.

discovered from https://github.com/JuliaLang/julia/issues/44829


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-11 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added inline comments.



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4159
+defm VMOVSHZ : avx512_move_scalar<"vmovsh", X86Movsh, X86vzload16, f16x_info,
+  [HasFP16]>,
+  VEX_LIG, T_MAP5XS, EVEX_CD8<16, CD8VT1>;

pengfei wrote:
> LuoYuanke wrote:
> > Why there is no OptForSize for vmovsh?
> Good catch. I think we should add it here.
Sorry, I think we should not add `OptForSize` here.
This predicate is used to force to select blend instead of mov due to 
performance consideration.
E.g.: https://godbolt.org/z/W4v38K6va

Since we don't have a blendph instruction, I think we can always select it to 
movsh. Not sure if using pblendw is beneficial.
I'll change it back in next patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-09 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-09 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: clang/docs/LanguageExtensions.rst:599
 * SPIR
+* X86
 

Might be worth mentioning that it requires AVX512FP16 here



Comment at: clang/lib/CodeGen/TargetInfo.cpp:2817
   Current = SSE;
+} else if (k == BuiltinType::Float16) {
+  Current = SSE;

Merge with the previous if?



Comment at: clang/lib/CodeGen/TargetInfo.cpp:2948
 Lo = Hi = Integer;
+} else if (ET->isFloat16Type()) {
+  Current = SSE;

Merge with the FloatTy if?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-09 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, but may wait 1 or 2 days for the comments from others.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-08 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:3471
+  ContainsFloatAtOffset(IRType, IROffset + 4, getDataLayout()))
+return llvm::FixedVectorType::get(llvm::Type::getHalfTy(getVMContext()), 
4);
+

LuoYuanke wrote:
> For 2 float, return <2xfloat> to be compatible to previous ABI?
It is already handled in line 3456.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-08 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:3471
+  ContainsFloatAtOffset(IRType, IROffset + 4, getDataLayout()))
+return llvm::FixedVectorType::get(llvm::Type::getHalfTy(getVMContext()), 
4);
+

For 2 float, return <2xfloat> to be compatible to previous ABI?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-06 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4478
+  let Predicates = [HasFP16] in {
+def VMOVSHZrr_REV: AVX512<0x11, MRMDestReg, (outs VR128X:$dst),
+(ins VR128X:$src1, VR128X:$src2),

pengfei wrote:
> craig.topper wrote:
> > pengfei wrote:
> > > LuoYuanke wrote:
> > > > Sorry, I forgot what REV stand for. Do you know it?
> > > > Is this just encoding difference for register operand compared with 
> > > > VMOVSHZrr? What is it used for?
> > > I think REV is short for revert. Which allows a different encoding when 
> > > operands order are reverted.
> > > Yes. It's used for a different encoding.
> > It is short for "reverse". Meaing the operands are in the reversed order. 
> > There are two valid encodings moving from one register to another. This 
> > happens because there are separate opcodes for moving register to 
> > memory(Store) and moving memory to register(load). The memory operand for 
> > both of those opcodes can be a register as well. The assembler and isel 
> > always uses the register to register version of the load opcode. The 
> > reversed version is only used by the disassembler
> > 
> > There is an exception to that. For VEX encoded AVX/AVX2 instructions, 
> > X86MCInstLowering will use an _REV move if it allows a 2 byte VEX prefix 
> > instead of a 3 byte VEX prefix. This doesn't apply to any AVX512 
> > instructions though. 
> Thanks Craig for the information.
> It is short for "reverse". Meaing the operands are in the reversed order. 
> There are two valid encodings moving from one register to another. This 
> happens because there are separate opcodes for moving register to 
> memory(Store) and moving memory to register(load). The memory operand for 
> both of those opcodes can be a register as well. The assembler and isel 
> always uses the register to register version of the load opcode. The reversed 
> version is only used by the disassembler
> 
> There is an exception to that. For VEX encoded AVX/AVX2 instructions, 
> X86MCInstLowering will use an _REV move if it allows a 2 byte VEX prefix 
> instead of a 3 byte VEX prefix. This doesn't apply to any AVX512 instructions 
> though. 

I understand now. Thanks, Craig and Pengfei.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-06 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added inline comments.



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4478
+  let Predicates = [HasFP16] in {
+def VMOVSHZrr_REV: AVX512<0x11, MRMDestReg, (outs VR128X:$dst),
+(ins VR128X:$src1, VR128X:$src2),

craig.topper wrote:
> pengfei wrote:
> > LuoYuanke wrote:
> > > Sorry, I forgot what REV stand for. Do you know it?
> > > Is this just encoding difference for register operand compared with 
> > > VMOVSHZrr? What is it used for?
> > I think REV is short for revert. Which allows a different encoding when 
> > operands order are reverted.
> > Yes. It's used for a different encoding.
> It is short for "reverse". Meaing the operands are in the reversed order. 
> There are two valid encodings moving from one register to another. This 
> happens because there are separate opcodes for moving register to 
> memory(Store) and moving memory to register(load). The memory operand for 
> both of those opcodes can be a register as well. The assembler and isel 
> always uses the register to register version of the load opcode. The reversed 
> version is only used by the disassembler
> 
> There is an exception to that. For VEX encoded AVX/AVX2 instructions, 
> X86MCInstLowering will use an _REV move if it allows a 2 byte VEX prefix 
> instead of a 3 byte VEX prefix. This doesn't apply to any AVX512 instructions 
> though. 
Thanks Craig for the information.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-06 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4478
+  let Predicates = [HasFP16] in {
+def VMOVSHZrr_REV: AVX512<0x11, MRMDestReg, (outs VR128X:$dst),
+(ins VR128X:$src1, VR128X:$src2),

pengfei wrote:
> LuoYuanke wrote:
> > Sorry, I forgot what REV stand for. Do you know it?
> > Is this just encoding difference for register operand compared with 
> > VMOVSHZrr? What is it used for?
> I think REV is short for revert. Which allows a different encoding when 
> operands order are reverted.
> Yes. It's used for a different encoding.
It is short for "reverse". Meaing the operands are in the reversed order. There 
are two valid encodings moving from one register to another. This happens 
because there are separate opcodes for moving register to memory(Store) and 
moving memory to register(load). The memory operand for both of those opcodes 
can be a register as well. The assembler and isel always uses the register to 
register version of the load opcode. The reversed version is only used by the 
disassembler

There is an exception to that. For VEX encoded AVX/AVX2 instructions, 
X86MCInstLowering will use an _REV move if it allows a 2 byte VEX prefix 
instead of a 3 byte VEX prefix. This doesn't apply to any AVX512 instructions 
though. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-06 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei marked 7 inline comments as done.
pengfei added a comment.

Thanks Yuanke.




Comment at: clang/lib/Headers/avx512fp16intrin.h:292
+
+  return (__m128h)__builtin_ia32_loadsh128_mask((__v8hf *)__A, src, __U & 1);
+}

LuoYuanke wrote:
> Just be curious, why not directly use __W?
First, this is a simple mimic of `_mm_mask_load_ss`.
I think the reason is the intrinsic requests `dst[MAX:16] := 0`, while the 
builtin returns with `src[MAX:16]`.
So we need to explicitly clear the upper bits.



Comment at: clang/lib/Headers/avx512fp16intrin.h:319
+__m512h_u __v;
+  } __attribute__((__packed__, __may_alias__));
+  return ((const struct __loadu_ph *)__p)->__v;

LuoYuanke wrote:
> What is __may_alias__ used for?
This is used for preventing type-based alias analysis.
https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#Common-Type-Attributes

"In the context of section 6.5 paragraph 7 of the C99 standard, an lvalue 
expression dereferencing such a pointer is treated like having a character 
type."
"This extension exists to support some vector APIs, in which pointers to one 
vector type are permitted to alias pointers to a different vector type."



Comment at: clang/lib/Headers/avx512fp16intrin.h:350
+   __m128h __A) {
+  __builtin_ia32_storesh128_mask((__v8hf *)__W, __A, __U & 1);
+}

LuoYuanke wrote:
> I see in _mm_mask_load_sh(), we create a __m128h with upper bits zero, not 
> sure we also need it in store intrinsic.
Both load and store intrinsics only access 16bit memory, the different is the 
load intrinsic needs to set up the high bits of the XMM register (because we do 
return a 128 bits result). We don't need to do that for a store.



Comment at: clang/lib/Headers/avx512fp16intrin.h:419
+static __inline__ short __DEFAULT_FN_ATTRS128 _mm_cvtsi128_si16(__m128i __a) {
+  __v8hi __b = (__v8hi)__a;
+  return __b[0];

LuoYuanke wrote:
> Why not return __a[0] directly?
Because `__m128i` is defined as <2 x i64>. __a[0] is correct only for i64 type.



Comment at: clang/test/CodeGen/X86/avx512fp16-abi.c:89
+  _Float16 a;
+  float b;
+};

LuoYuanke wrote:
> Any false test case that have padding between a and b?
This is the one with padding, since _Float16 aligns to 2 bytes while float 
aligns to 4.



Comment at: llvm/include/llvm/IR/Intrinsics.td:315
 def llvm_v8f16_ty  : LLVMType;//  8 x half (__fp16)
+def llvm_v16f16_ty : LLVMType;   // 16 x half (__fp16)
+def llvm_v32f16_ty : LLVMType;   // 32 x half (__fp16)

LuoYuanke wrote:
> Not sure about the legacy comments, should it be _Float16 now?
LLVM IR serves for not only one type. `__fp16` is still usable in Clang. 
Besides, OpenCL half type also use half in IR. And maybe we have other FE types 
too. So I'd like to keep it as is unless we have a better way to cover all 
other FE types.



Comment at: llvm/include/llvm/Target/TargetSelectionDAG.td:1054
+def extloadvf16 : PatFrag<(ops node:$ptr), (extload node:$ptr)> {
+  let IsLoad = 1;
+  let ScalarMemoryVT = f16;

LuoYuanke wrote:
> I notice it is true for other extload. Is it same to "true"?
Good catch. I noticed it too, but forgot to change it.



Comment at: llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp:341
 if ((insn->mode == MODE_64BIT || (byte1 & 0xc0) == 0xc0) &&
-((~byte1 & 0xc) == 0xc) && ((byte2 & 0x4) == 0x4)) {
+((~byte1 & 0x8) == 0x8) && ((byte2 & 0x4) == 0x4)) {
   insn->vectorExtensionType = TYPE_EVEX;

LuoYuanke wrote:
> This is the same to ((byte1 & 0x8) == 0x0)?
Yes, but I'm not sure if this is intentional. Maybe it keeps the shape in ` & X 
== X`?



Comment at: llvm/lib/Target/X86/X86.td:189
+// guarded under condition hasVLX. So we imply it in FeatureFP16 currently.
+// FIXME: FP16 conversion between f16 and i64 customise type v8i64, which is
+// supposed to be guarded under condition hasDQI. So we imply it in FeatureFP16

LuoYuanke wrote:
> customize?
customise seems correct too. Anyway, I can change it.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:10549
 
-  if (EltVT == MVT::i32 || EltVT == MVT::f32 || EltVT == MVT::f64 ||
-  (EltVT == MVT::i64 && Subtarget.is64Bit())) {
+  if (EltVT == MVT::i32 || EltVT == MVT::f16 || EltVT == MVT::f32 ||
+  EltVT == MVT::f64 || (EltVT == MVT::i64 && Subtarget.is64Bit()) ||

LuoYuanke wrote:
> Need check Subtarget.hasFP16()?
No, f16 is legal here, so it implies the feature.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:10551
+  EltVT == MVT::f64 || (EltVT == MVT::i64 && Subtarget.is64Bit()) ||
+  (EltVT == MVT::i16 && Subtarget.hasFP16())) {

[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-06 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke added inline comments.



Comment at: llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll:374
+; SSE-NEXT:movl %edi, %ebp
+; SSE-NEXT:movzwl %bx, %edi
 ; SSE-NEXT:callq __gnu_h2f_ieee@PLT

Why this test case changes? Shall we add -mattr=+avx512fp16 to run?



Comment at: llvm/test/CodeGen/X86/vector-reduce-fmin-nnan.ll:373
+; SSE-NEXT:movl %edi, %ebp
+; SSE-NEXT:movzwl %bx, %edi
 ; SSE-NEXT:callq __gnu_h2f_ieee@PLT

Ditto.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-06 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:82
+  PatFrags ScalarIntMemFrags = !if (!eq (EltTypeName, "f16"),
+   !cast("sse_load_f16"),
+   !if (!eq (EltTypeName, "f32"),

indent



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:3878
+}
+let Predicates = [HasFP16, HasVLX] in {
+  def : Pat<(v16f16 (vselect VK16WM:$mask, (v16f16 VR256X:$src1), (v16f16 
VR256X:$src0))),

Not sure this can be merged to 512 version load/store pattern with muticlass by 
abstract type info.



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4159
+defm VMOVSHZ : avx512_move_scalar<"vmovsh", X86Movsh, X86vzload16, f16x_info,
+  [HasFP16]>,
+  VEX_LIG, T_MAP5XS, EVEX_CD8<16, CD8VT1>;

Why there is no OptForSize for vmovsh?



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4478
+  let Predicates = [HasFP16] in {
+def VMOVSHZrr_REV: AVX512<0x11, MRMDestReg, (outs VR128X:$dst),
+(ins VR128X:$src1, VR128X:$src2),

Sorry, I forgot what REV stand for. Do you know it?
Is this just encoding difference for register operand compared with VMOVSHZrr? 
What is it used for?



Comment at: llvm/lib/Target/X86/X86RegisterInfo.td:570
 def VR64: RegisterClass<"X86", [x86mmx], 64, (sequence "MM%u", 0, 7)>;
-def VR128 : RegisterClass<"X86", [v4f32, v2f64, v16i8, v8i16, v4i32, v2i64, 
f128],
+def VR128 : RegisterClass<"X86", [v4f32, v2f64, v8f16, v16i8, v8i16, v4i32, 
v2i64, f128],
   128, (add FR32)>;

Given there is only EVEX instructions for fp16, is it necessary to add f16 type 
to it?



Comment at: llvm/lib/Target/X86/X86RegisterInfo.td:572
   128, (add FR32)>;
-def VR256 : RegisterClass<"X86", [v8f32, v4f64, v32i8, v16i16, v8i32, v4i64],
+def VR256 : RegisterClass<"X86", [v8f32, v4f64, v16f16, v32i8, v16i16, v8i32, 
v4i64],
   256, (sequence "YMM%u", 0, 15)>;

Ditto.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-05 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp:801
   //  0b00010: implied 0F 38 leading opcode bytes
   //  0b00011: implied 0F 3A leading opcode bytes
   //  0b00100-0b1: Reserved for future use

Add comments for map5 and map6?



Comment at: llvm/lib/Target/X86/X86.td:189
+// guarded under condition hasVLX. So we imply it in FeatureFP16 currently.
+// FIXME: FP16 conversion between f16 and i64 customise type v8i64, which is
+// supposed to be guarded under condition hasDQI. So we imply it in FeatureFP16

customize?



Comment at: llvm/lib/Target/X86/X86FastISel.cpp:2291
+  case MVT::i16: Opc = X86::CMOV_GR16;  break;
+  case MVT::f16: Opc = X86::CMOV_FR16X; break;
+  case MVT::i32: Opc = X86::CMOV_GR32;  break;

Also add it in isCMOVPseudo()?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:1946
+setGroup(VT);
+  }
+  setOperationAction(ISD::SCALAR_TO_VECTOR,   MVT::v8f16,  Legal);

Drop the brace.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:10549
 
-  if (EltVT == MVT::i32 || EltVT == MVT::f32 || EltVT == MVT::f64 ||
-  (EltVT == MVT::i64 && Subtarget.is64Bit())) {
+  if (EltVT == MVT::i32 || EltVT == MVT::f16 || EltVT == MVT::f32 ||
+  EltVT == MVT::f64 || (EltVT == MVT::i64 && Subtarget.is64Bit()) ||

Need check Subtarget.hasFP16()?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:10551
+  EltVT == MVT::f64 || (EltVT == MVT::i64 && Subtarget.is64Bit()) ||
+  (EltVT == MVT::i16 && Subtarget.hasFP16())) {
 assert((VT.is128BitVector() || VT.is256BitVector() ||

Why handle i16? Isn't it handled by movw?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:10744
   // For SSE 4.1, use insertps to put the high elements into the low element.
-  if (Subtarget.hasSSE41()) {
+  if (Subtarget.hasSSE41() && EltVT != MVT::f16) {
 SDValue Result;

Why exclude f16? Is there better choice for fp16?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:19023
 
-// SHUFPS the element to the lowest double word, then movss.
-int Mask[4] = { static_cast(IdxVal), -1, -1, -1 };
+// SHUFPS the element to the lowest double word, then movsh.
+SmallVector Mask(VecVT.getVectorNumElements(), -1);

movss/movsh


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-04 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:3405
+/// half member at the specified offset.  For example, {int,{half}} has a
+/// float at offset 4.  It is conservatively correct for this routine to return
+/// false.

float -> half?



Comment at: clang/lib/Headers/avx512fp16intrin.h:292
+
+  return (__m128h)__builtin_ia32_loadsh128_mask((__v8hf *)__A, src, __U & 1);
+}

Just be curious, why not directly use __W?



Comment at: clang/lib/Headers/avx512fp16intrin.h:319
+__m512h_u __v;
+  } __attribute__((__packed__, __may_alias__));
+  return ((const struct __loadu_ph *)__p)->__v;

What is __may_alias__ used for?



Comment at: clang/lib/Headers/avx512fp16intrin.h:350
+   __m128h __A) {
+  __builtin_ia32_storesh128_mask((__v8hf *)__W, __A, __U & 1);
+}

I see in _mm_mask_load_sh(), we create a __m128h with upper bits zero, not sure 
we also need it in store intrinsic.



Comment at: clang/lib/Headers/avx512fp16intrin.h:419
+static __inline__ short __DEFAULT_FN_ATTRS128 _mm_cvtsi128_si16(__m128i __a) {
+  __v8hi __b = (__v8hi)__a;
+  return __b[0];

Why not return __a[0] directly?



Comment at: clang/test/CodeGen/X86/avx512fp16-abi.c:89
+  _Float16 a;
+  float b;
+};

Any false test case that have padding between a and b?



Comment at: llvm/include/llvm/IR/Intrinsics.td:315
 def llvm_v8f16_ty  : LLVMType;//  8 x half (__fp16)
+def llvm_v16f16_ty : LLVMType;   // 16 x half (__fp16)
+def llvm_v32f16_ty : LLVMType;   // 32 x half (__fp16)

Not sure about the legacy comments, should it be _Float16 now?



Comment at: llvm/include/llvm/Target/TargetSelectionDAG.td:1054
+def extloadvf16 : PatFrag<(ops node:$ptr), (extload node:$ptr)> {
+  let IsLoad = 1;
+  let ScalarMemoryVT = f16;

I notice it is true for other extload. Is it same to "true"?



Comment at: llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp:341
 if ((insn->mode == MODE_64BIT || (byte1 & 0xc0) == 0xc0) &&
-((~byte1 & 0xc) == 0xc) && ((byte2 & 0x4) == 0x4)) {
+((~byte1 & 0x8) == 0x8) && ((byte2 & 0x4) == 0x4)) {
   insn->vectorExtensionType = TYPE_EVEX;

This is the same to ((byte1 & 0x8) == 0x0)?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-07-09 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added inline comments.



Comment at: clang/lib/Headers/avx512vlfp16intrin.h:74
+static __inline__ __m256h __DEFAULT_FN_ATTRS256 _mm256_abs_ph(__m256h __A) {
+  return (__m256h)_mm256_and_epi32(_mm256_set1_epi32(0x7FFF7FFF), 
(__m256i)__A);
+}

craig.topper wrote:
> Why do we use _mm256_set1_epi32 instead of _mm256_set1_epi16?
There's no difference in assembly for immediate value. 
https://godbolt.org/z/sMbrM611d. But the latency of vpbroadcastd is better than 
vpbroadcastw in Skylake according to intrinsic guide. Here the only effect is 
consist with _mm256_and_epi32. Do you think it's better to use 
_mm256_set1_epi16?



Comment at: llvm/include/llvm/IR/RuntimeLibcalls.def:290
 HANDLE_LIBCALL(FPEXT_F16_F128, "__extendhftf2")
+HANDLE_LIBCALL(FPEXT_F16_F80, "__extendhfxf2")
 HANDLE_LIBCALL(FPEXT_F32_F64, "__extendsfdf2")

craig.topper wrote:
> Is this tested in this patch?
No. I'll move it to the 3rd patch and test it there.



Comment at: llvm/lib/Target/X86/X86FastISel.cpp:58
   bool X86ScalarSSEf32;
+  bool X86ScalarAVXf16;
 

craig.topper wrote:
> AVX here should maybe be AVX512, but maybe this is pointing out that this 
> name is bad. Would X86ScalarXMMf* be better?
Maybe we can use X86ScalarSSEf16, here SSE means SSE registers? Especially GCC 
community proposing to support FP16 since SSE2.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-07-06 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: clang/lib/Headers/avx512fp16intrin.h:254
+/// Constructs a 512-bit floating-point vector of [32 x half] from a
+///128-bit floating-point vector of [16 x half]. The lower 256 bits
+///contain the value of the source vector. The upper 256 bits are set

256-bit



Comment at: clang/lib/Headers/avx512vlfp16intrin.h:74
+static __inline__ __m256h __DEFAULT_FN_ATTRS256 _mm256_abs_ph(__m256h __A) {
+  return (__m256h)_mm256_and_epi32(_mm256_set1_epi32(0x7FFF7FFF), 
(__m256i)__A);
+}

Why do we use _mm256_set1_epi32 instead of _mm256_set1_epi16?



Comment at: clang/lib/Headers/avx512vlfp16intrin.h:78
+static __inline__ __m128h __DEFAULT_FN_ATTRS128 _mm_abs_ph(__m128h __A) {
+  return (__m128h)_mm_and_epi32(_mm_set1_epi32(0x7FFF7FFF), (__m128i)__A);
+}

Same question



Comment at: llvm/include/llvm/IR/RuntimeLibcalls.def:290
 HANDLE_LIBCALL(FPEXT_F16_F128, "__extendhftf2")
+HANDLE_LIBCALL(FPEXT_F16_F80, "__extendhfxf2")
 HANDLE_LIBCALL(FPEXT_F32_F64, "__extendsfdf2")

Is this tested in this patch?



Comment at: llvm/lib/Target/X86/X86FastISel.cpp:58
   bool X86ScalarSSEf32;
+  bool X86ScalarAVXf16;
 

AVX here should maybe be AVX512, but maybe this is pointing out that this name 
is bad. Would X86ScalarXMMf* be better?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:13537
+unsigned MovOpc = 0;
+if (EltVT == MVT::f16) {
+  MovOpc = X86ISD::MOVSH;

Drop curly braces on these.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-07-04 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added inline comments.



Comment at: clang/lib/Headers/avx512fp16intrin.h:38
+
+static __inline__ _Float16 __DEFAULT_FN_ATTRS512 _mm512_cvtsh_h(__m512h __a) {
+  return __a[0];

RKSimon wrote:
> pengfei wrote:
> > RKSimon wrote:
> > > I realize its a lot of work, but is there any chance that we could get 
> > > doxygen comments to document these intrinsics?
> > I'm hesitating not only for the work but also the effect. We have about 1K 
> > new intrinsics and more than 5K LOC in total in the two header files. 
> > Adding the doxygen comments will make the readability worse and increase 
> > the difficulty in review. It's also a burden in maintaining the correctness.
> > Do you think it's feasible to only add a link to intrinsic guide? We have 
> > decided to only using link that points intrinsic guide in our product 
> > compiler. Using one source is friendly to maintainess. And I think 
> > intrinsic guide is also easy to use that doxygen.
> I completely understand where you're coming from. What we do lose is the 
> ability for code editors to display the doxygen when using the intrinsic (or 
> mouseover the code). Are there any particular intrinsics that we could do 
> with having comments closer at hand - ones that take rounding modes that its 
> tricky to remember the enum/defines for or implicit load/store alignments 
> come to mind?
> 
> I'm not sure about the idea of linking to external docs for specs - do we 
> have a style guide policy on this?
> Are there any particular intrinsics that we could do with having comments 
> closer at hand
I only found 3 ones from avx512fintrin.h, anyway, I copied here.

> ones that take rounding modes that its tricky to remember the enum/defines 
> for or implicit load/store alignments come to mind
Unfortunately, we didn't add doc for them when enabling avx512 intrinsics.

> I'm not sure about the idea of linking to external docs for specs - do we 
> have a style guide policy on this?
I was thinking some thing like "See https://llvm.org/LICENSE.txt for license 
information." in most source files. But I agree doxygen helps for code editors. 
I didn't think of them simply because I never used them :)
I had some thought about writing a tool to help transporting intrinsic guide 
info to doxygen, but haven't yet found time to do it.

Anyway, I guess this is not the block issue for this series patches, right?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-07-02 Thread Simon Pilgrim via Phabricator via cfe-commits
RKSimon added inline comments.



Comment at: clang/lib/Headers/avx512fp16intrin.h:38
+
+static __inline__ _Float16 __DEFAULT_FN_ATTRS512 _mm512_cvtsh_h(__m512h __a) {
+  return __a[0];

pengfei wrote:
> RKSimon wrote:
> > I realize its a lot of work, but is there any chance that we could get 
> > doxygen comments to document these intrinsics?
> I'm hesitating not only for the work but also the effect. We have about 1K 
> new intrinsics and more than 5K LOC in total in the two header files. Adding 
> the doxygen comments will make the readability worse and increase the 
> difficulty in review. It's also a burden in maintaining the correctness.
> Do you think it's feasible to only add a link to intrinsic guide? We have 
> decided to only using link that points intrinsic guide in our product 
> compiler. Using one source is friendly to maintainess. And I think intrinsic 
> guide is also easy to use that doxygen.
I completely understand where you're coming from. What we do lose is the 
ability for code editors to display the doxygen when using the intrinsic (or 
mouseover the code). Are there any particular intrinsics that we could do with 
having comments closer at hand - ones that take rounding modes that its tricky 
to remember the enum/defines for or implicit load/store alignments come to mind?

I'm not sure about the idea of linking to external docs for specs - do we have 
a style guide policy on this?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-07-02 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: llvm/lib/Target/X86/X86Subtarget.h:748
   bool hasVLX() const { return HasVLX; }
+  bool hasFP16() const { return HasFP16; }
   bool hasPKU() const { return HasPKU; }

pengfei wrote:
> RKSimon wrote:
> > I'm a little worried this might get confused with hasF16C - am I just being 
> > over cautious?
> Make sense. How about `hasAVX512FP16`? I can update the name as a followup 
> patch once these patches merged.
That sounds good to me. We should maybe go back and update some of the others. 
Especially VNNI since we also have AVXVNNI.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-07-02 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added inline comments.



Comment at: clang/lib/Headers/avx512fp16intrin.h:38
+
+static __inline__ _Float16 __DEFAULT_FN_ATTRS512 _mm512_cvtsh_h(__m512h __a) {
+  return __a[0];

RKSimon wrote:
> I realize its a lot of work, but is there any chance that we could get 
> doxygen comments to document these intrinsics?
I'm hesitating not only for the work but also the effect. We have about 1K new 
intrinsics and more than 5K LOC in total in the two header files. Adding the 
doxygen comments will make the readability worse and increase the difficulty in 
review. It's also a burden in maintaining the correctness.
Do you think it's feasible to only add a link to intrinsic guide? We have 
decided to only using link that points intrinsic guide in our product compiler. 
Using one source is friendly to maintainess. And I think intrinsic guide is 
also easy to use that doxygen.



Comment at: llvm/lib/Target/X86/X86Subtarget.h:748
   bool hasVLX() const { return HasVLX; }
+  bool hasFP16() const { return HasFP16; }
   bool hasPKU() const { return HasPKU; }

RKSimon wrote:
> I'm a little worried this might get confused with hasF16C - am I just being 
> over cautious?
Make sense. How about `hasAVX512FP16`? I can update the name as a followup 
patch once these patches merged.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-07-02 Thread Simon Pilgrim via Phabricator via cfe-commits
RKSimon added inline comments.



Comment at: clang/lib/Headers/avx512fp16intrin.h:38
+
+static __inline__ _Float16 __DEFAULT_FN_ATTRS512 _mm512_cvtsh_h(__m512h __a) {
+  return __a[0];

I realize its a lot of work, but is there any chance that we could get doxygen 
comments to document these intrinsics?



Comment at: llvm/lib/Target/X86/X86Subtarget.h:748
   bool hasVLX() const { return HasVLX; }
+  bool hasFP16() const { return HasFP16; }
   bool hasPKU() const { return HasPKU; }

I'm a little worried this might get confused with hasF16C - am I just being 
over cautious?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-07-01 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added inline comments.



Comment at: clang/test/CodeGen/X86/avx512fp16-complex.c:1
+// RUN: %clang_cc1 %s -O0 -fno-experimental-new-pass-manager -emit-llvm 
-triple x86_64-unknown-unknown -target-feature +avx512fp16 -o - | FileCheck %s 
--check-prefix=X86
+

craig.topper wrote:
> Can we split _Complex out of this patch? This affects other targets that have 
> _Float16 right? So probably needs a different set of reviewers.
Sure. Split to D105331. Do you know someone who is familiar with or may be 
interested in it?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:23042
  Op0.getSimpleValueType().is512BitVector())) {
-  assert(VT.getVectorNumElements() <= 16);
+  assert(VT.getVectorNumElements() <= 16 || Subtarget.hasFP16());
   Opc = IsStrict ? X86ISD::STRICT_CMPM : X86ISD::CMPM;

craig.topper wrote:
> This should probably include EltVT==MVT::f16 for the FP16 override?
Maybe we can only check `EltVT == MVT::f16` like this?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-07-01 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: clang/lib/Headers/avx512fp16intrin.h:51
+
+static __inline__ __m256h __DEFAULT_FN_ATTRS256 _mm256_undefined_ph() {
+  return (__m256h)__builtin_ia32_undef256();

I think this should be `_mm256_undefined_ph(void)`




Comment at: clang/lib/Headers/avx512fp16intrin.h:61
+
+static __inline__ __m128h __DEFAULT_FN_ATTRS128 _mm_undefined_ph() {
+  return (__m128h)__builtin_ia32_undef128();

I think this should be `_mm_undefined_ph(void)`



Comment at: clang/lib/Headers/avx512fp16intrin.h:65
+
+static __inline__ __m512h __DEFAULT_FN_ATTRS512 _mm512_undefined_ph() {
+  return (__m512h)__builtin_ia32_undef512();

I think this should be `_mm512_undefined_ph(void)`



Comment at: clang/test/CodeGen/X86/avx512fp16-complex.c:1
+// RUN: %clang_cc1 %s -O0 -fno-experimental-new-pass-manager -emit-llvm 
-triple x86_64-unknown-unknown -target-feature +avx512fp16 -o - | FileCheck %s 
--check-prefix=X86
+

Can we split _Complex out of this patch? This affects other targets that have 
_Float16 right? So probably needs a different set of reviewers.



Comment at: clang/test/Sema/Float16.c:13
 #ifdef HAVE
-// FIXME: Should this be valid?
-_Complex _Float16 a; // expected-error {{'_Complex _Float16' is invalid}}
+// FIXME: Should this be invalid?
+_Complex _Float16 a;

It's odd to change behavior and then have a FIXME asking if the old behavior 
was correct.



Comment at: llvm/lib/Support/X86TargetParser.cpp:204
 constexpr FeatureBitset FeaturesSapphireRapids =
-FeaturesICLServer | FeatureAMX_TILE | FeatureAMX_INT8 | FeatureAMX_BF16 |
-FeatureAVX512BF16 | FeatureAVX512VP2INTERSECT | FeatureCLDEMOTE |
-FeatureENQCMD | FeatureMOVDIR64B | FeatureMOVDIRI | FeaturePTWRITE |
-FeatureSERIALIZE | FeatureSHSTK | FeatureTSXLDTRK | FeatureUINTR |
-FeatureWAITPKG | FeatureAVXVNNI;
+FeatureAVX512FP16 | FeaturesICLServer | FeatureAMX_TILE | FeatureAMX_INT8 |
+FeatureAMX_BF16 | FeatureAVX512BF16 | FeatureAVX512VP2INTERSECT |

I think FeaturesICLServer should still be at the beginning of the list. 
FeatureAVX512FP16 should be alphabetized with the other AVX512 features. Looks 
like FeatureAVXNNI was already incorrectly alphabetized.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:18961
 
 // SHUFPS the element to the lowest double word, then movss.
+SmallVector Mask(VecVT.getVectorNumElements(), -1);

I think this comment should mention movsh now.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:19110
 
   // This will be just movd/movq/movss/movsd.
   if (IdxVal == 0 && ISD::isBuildVectorAllZeros(N0.getNode())) {

movsh



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:23042
  Op0.getSimpleValueType().is512BitVector())) {
-  assert(VT.getVectorNumElements() <= 16);
+  assert(VT.getVectorNumElements() <= 16 || Subtarget.hasFP16());
   Opc = IsStrict ? X86ISD::STRICT_CMPM : X86ISD::CMPM;

This should probably include EltVT==MVT::f16 for the FP16 override?



Comment at: llvm/lib/Target/X86/X86InstrFragmentsSIMD.td:410
 def X86Movsldup : SDNode<"X86ISD::MOVSLDUP", SDTShuff1Op>;
+def X86Movsh : SDNode<"X86ISD::MOVSH",
+  SDTypeProfile<1, 2, [SDTCisVT<0, v8f16>,

Add a blank line above this to match the original formatting



Comment at: llvm/lib/Target/X86/X86InstrFragmentsSIMD.td:996
 
+def fp16imm0 : PatLeaf<(f16 fpimm), [{
+  return N->isExactlyValue(+0.0);

This should be with fp32imm0 and friends.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-07-01 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added a comment.

In D105263#2852295 , @tschuett wrote:

> Could you add a link to a reference?

Done. Thanks for reminding.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-07-01 Thread Thorsten via Phabricator via cfe-commits
tschuett added a comment.

Could you add a link to a reference?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits