
[PATCH] D151537: [NFC] Update cpu_specific test to use a newer CPU

2023-05-30 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

In D151537#4380763, @erichkeane wrote:

> I don't really see the justification here?  Why do this change?  If the 
> intent is to just test a newer architecture, we can add tests for that, not 
> change existing ones.

KNL is deprecated, so it is better to remove KNL-related code from clang/llvm.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151537/new/

https://reviews.llvm.org/D151537

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D151537: [NFC] Update cpu_specific test to use a newer CPU

2023-05-29 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151537/new/

https://reviews.llvm.org/D151537


[PATCH] D147165: [Windows SEH] Fix catch+return crash for Windows -EHa

2023-03-31 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, pls wait for 1 or 2 days in case there are comments from others


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147165/new/

https://reviews.llvm.org/D147165


[PATCH] D141899: [IR][X86] Remove X86AMX type in LLVM IR instead of target extension

2023-02-26 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

> Deal. Because I have been busy for a long time, and it is also better to let 
> someone from Intel handle x86-related features, I am happy with the patch 
> being commandeered.

@nikic's proposal looks like a promising solution; we can investigate it further.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141899/new/

https://reviews.llvm.org/D141899


[PATCH] D143094: [clang] Change AMX macros to match names from GCC

2023-02-02 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

@pengfei, would you take a look to double-check?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143094/new/

https://reviews.llvm.org/D143094


[PATCH] D143094: [clang] Change AMX macros to match names from GCC

2023-02-01 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, thanks


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143094/new/

https://reviews.llvm.org/D143094


[PATCH] D141899: [IR][X86] Remove X86AMX type in LLVM IR instead of target extension

2023-01-17 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

In D141899#4061237, @zixuan-wu wrote:

> Considering 
> https://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility, I think 
> we need to reach consensus and choose one of the following 2 options.
>
> 1. Remove the X86amx type from IR entirely. (what I am doing now)
> 2. Keep the X86amx type in IR, but upgrade the x86amx type to a target 
> extension and also upgrade the bitcast LLVM instruction to an intrinsic 
> (required). This also includes changing the test cases to the target 
> extension type.
>
> BTW, do we need to reach consensus on whether to continue with this purify 
> patch at all?

For option 2, is it possible to make x86_amx and the target extension type 
co-exist and have a knob to control it? New features can be based on the target 
extension type, and once the target extension type has been validated on more 
and more applications, we need to figure out a way to deprecate the old type 
someday.

I don't think we need to purify the patch anymore. I prefer to have the 2 types 
co-exist for some time and deprecate 1 in a proper and safe way.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141899/new/

https://reviews.llvm.org/D141899


[PATCH] D141899: [IR][X86] Remove X86AMX type in LLVM IR instead of target extension

2023-01-17 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

In D141899#4061150, @zixuan-wu wrote:

> In D141899#4058173, @LuoYuanke wrote:
>
>> @zixuan-wu, changing x86_amx would break our internal code. May I know the 
>> motivation to change the type?
>
> The background is at https://reviews.llvm.org/D135202. No more motivation, 
> just to purify LLVM IR and demonstrate and validate target extension type.

I think TLX may be a better case to demonstrate target extension type as TLX is 
pretty new.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141899/new/

https://reviews.llvm.org/D141899


[PATCH] D141899: [IR][X86] Remove X86AMX type in LLVM IR instead of target extension

2023-01-17 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

In D141899#4061150, @zixuan-wu wrote:

> In D141899#4058173, @LuoYuanke wrote:
>
>> @zixuan-wu, changing x86_amx would break our internal code. May I know the 
>> motivation to change the type?
>
> The background is at https://reviews.llvm.org/D135202. No more motivation, 
> just to purify LLVM IR and demonstrate target extension. I think putting 
> target-specific type into LLVM IR was thoughtless at that moment. Considering 
> there was no better solution at that time such as target extension, it's a 
> workable workaround. But it should not keep going anymore if there is better 
> way.

I think the target extension type is nice; if it had been introduced 2 years 
ago I would have voted for it. However, my concern is the compatibility issue, 
as I explained. We need to be compatible with IR built by previous compilers, 
and with 3rd-party software that is based on the x86_amx type. I can't predict 
more risks for now if we replace an LLVM IR type, but I believe there is a big 
hidden risk.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141899/new/

https://reviews.llvm.org/D141899


[PATCH] D141899: [IR][X86] Remove X86AMX type in LLVM IR instead of target extension

2023-01-17 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

We need to consider how to stay compatible with existing software if we want 
to change the IR type. Some existing software is based on the existing type. 
For example, the AMX dialect of MLIR and the TLX code are based on x86_amx, so 
changing the type would break them. Some existing bitcode and text-format IR 
is already based on x86_amx. It would also be broken if we change the IR type. 
I am also concerned about the follow-up patch that removes the amx cast 
intrinsics. We've done many optimizations based on those intrinsics. Removing 
them would break the optimizations and all the code in the existing software.

Given that the current patch is intrusive to the current AMX solution, I would 
suggest that the author describe the background and roadmap of the work and 
prototype the target extension type and its usage without changing x86_amx. If 
the prototype is good, we can discuss how to switch AMX to the new 
infrastructure without breaking existing software.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141899/new/

https://reviews.llvm.org/D141899


[PATCH] D141899: [IR][X86] Remove X86AMX type in LLVM IR instead of target extension

2023-01-17 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

In D141899#4058683, @lebedev.ri wrote:

> In D141899#4058173, @LuoYuanke wrote:
>
>> @zixuan-wu, changing x86_amx would break our internal code. May I know the 
>> motivation to change the type?
>
> I just want to point out that generally "causes [too much] churn downstream"
> is not relevant concern upstream, as downstreams are considered to be on 
> their own.

But we previously designed the amx type, and we follow that design to develop 
and maintain both the upstream code and the downstream code. I think we should 
respect the original author's effort in developing and maintaining the code.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141899/new/

https://reviews.llvm.org/D141899


[PATCH] D141899: [IR][X86] Remove X86AMX type in LLVM IR instead of target extension

2023-01-17 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

@zixuan-wu, changing x86_amx would break our internal code. May I know the 
motivation to change the type?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141899/new/

https://reviews.llvm.org/D141899


[PATCH] D99565: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled.

2023-01-12 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke commandeered this revision.
LuoYuanke edited reviewers, added: LiuChen3; removed: LuoYuanke.
LuoYuanke added a subscriber: lebedev.ri.
LuoYuanke added a comment.

In D99565#4049330, @lebedev.ri wrote:

> This review seems to be stuck/dead, consider abandoning if no longer relevant.

@LiuChen3 has been inactive for a long time. How can I help to abandon the patch?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99565/new/

https://reviews.llvm.org/D99565


[PATCH] D140281: [X86] Rename CMPCCXADD intrinsics.

2022-12-18 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140281/new/

https://reviews.llvm.org/D140281


[PATCH] D138547: [X86][AMX] Fix typo of the headerfile.

2022-11-23 Thread LuoYuanke via Phabricator via cfe-commits

This revision was automatically updated to reflect the committed changes.
Closed by commit rG55fceef61e0d: [X86][AMX] Fix typo of the headerfile. 
(authored by LuoYuanke).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138547/new/

https://reviews.llvm.org/D138547

Files:
  clang/lib/Headers/amxfp16intrin.h


Index: clang/lib/Headers/amxfp16intrin.h
===
--- clang/lib/Headers/amxfp16intrin.h
+++ clang/lib/Headers/amxfp16intrin.h
@@ -20,7 +20,7 @@
 ///floating-point elements with elements in \a dst, and store the 32-bit
 ///result back to tile \a dst.
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// \code
 /// void _tile_dpfp16ps (__tile dst, __tile a, __tile b)



[PATCH] D138547: [X86][AMX] Fix typo of the headerfile.

2022-11-23 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke created this revision.
Herald added a project: All.
LuoYuanke requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D138547

Files:
  clang/lib/Headers/amxfp16intrin.h


Index: clang/lib/Headers/amxfp16intrin.h
===
--- clang/lib/Headers/amxfp16intrin.h
+++ clang/lib/Headers/amxfp16intrin.h
@@ -20,7 +20,7 @@
 ///floating-point elements with elements in \a dst, and store the 32-bit
 ///result back to tile \a dst.
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// \code
 /// void _tile_dpfp16ps (__tile dst, __tile a, __tile b)



[PATCH] D111778: [WIP][X86] Update CPU_SPECIFIC list.

2022-10-24 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

It seems @craig.topper supported __cpu_features2 in compiler-rt revision 
94ccb2acbf2c5. Anything else that we need to address before landing the patch?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111778/new/

https://reviews.llvm.org/D111778


[PATCH] D132329: [X86][RFC] Using `__bf16` for AVX512_BF16 intrinsics

2022-10-19 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132329/new/

https://reviews.llvm.org/D132329


[PATCH] D132329: [X86][RFC] Using `__bf16` for AVX512_BF16 intrinsics

2022-10-19 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/lib/Headers/avx512bf16intrin.h:13
 
+#ifdef __SSE2__
+

What is this macro check used for?



Comment at: clang/test/CodeGen/X86/avx512bf16-error.c:14
+__bfloat16 bar(__bfloat16 a, __bfloat16 b) {
+  return a + b;
+}

Need test for other operations (-, *, /) as well?



Comment at: llvm/include/llvm/IR/IntrinsicsX86.td:4928
   Intrinsic<[llvm_v4f32_ty],
   [llvm_v4f32_ty, llvm_v4i32_ty, llvm_v4i32_ty], [IntrNoMem]>;
   def int_x86_avx512bf16_dpbf16ps_256:

It seems we still use i32 to represent <2 x bf16>, but we don't have a better 
way since 1 mask bit covers a pair of bf16 elements.



Comment at: llvm/lib/IR/AutoUpgrade.cpp:4095
+Intrinsic::x86_avx512bf16_mask_cvtneps2bf16_128)
+  Args[1] = Builder.CreateBitCast(
+  Args[1], FixedVectorType::get(Builder.getBFloatTy(), NumElts));

Why is there no bitcast for the inputs of the other intrinsics? I expected to 
see a bitcast from vXi16 to vXbf16.



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:3916
+multiclass mask_move_lowering_f16_bf16 {
 let Predicates = [HasBWI] in {
+  def : Pat<(_.info512.VT (vselect VK32WM:$mask, (_.info512.VT VR512:$src1), 
(_.info512.VT VR512:$src0))),

Not sure whether the indentation is correct.



Comment at: llvm/test/CodeGen/X86/bfloat.ll:32
+; BF16-NEXT:shll $16, %eax
+; BF16-NEXT:vmovd %eax, %xmm1
+; BF16-NEXT:vaddss %xmm0, %xmm1, %xmm0

It seems the difference between SSE2 and BF16 is whether SSE or AVX 
instructions are used. What do we expect to test for BF16?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132329/new/

https://reviews.llvm.org/D132329


[PATCH] D136040: [X86] Support PREFETCHI instructions

2022-10-17 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/include/clang/Driver/Options.td:4651
 def mno_popcnt : Flag<["-"], "mno-popcnt">, Group;
+def mprefetchi : Flag<["-"], "mprefetchi">, Group;
+def mno_prefetchi : Flag<["-"], "mno-prefetchi">, Group;

I notice in line 4655 the option name is "prfch", do we need to follow the 
naming convention?



Comment at: llvm/lib/Target/X86/X86InstrInfo.td:3007
+  def PREFETCHIT0 : I<0x18, MRM7m, (outs), (ins i8mem:$src),
+"prefetchit0\t$src", [(prefetch addr:$src, (i32 1), (i32 3), (i32 0))]>, 
TB;
+  def PREFETCHIT1 : I<0x18, MRM6m, (outs), (ins i8mem:$src),

Could you add comments to explain what the constants (1, 3, 0) mean? I guess 
they are the same as the arguments that llvm.prefetch defines.
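
If that guess holds, the three immediates would follow the llvm.prefetch argument order from the LLVM Language Reference: rw, then locality, then cache type. The sketch below is only an illustration of that reading; `describePrefetchArgs` is a hypothetical helper, not something in LLVM or in this patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of LLVM. It spells out the three immediate
// arguments of llvm.prefetch as the LLVM Language Reference describes them:
//   rw:        0 = read, 1 = write
//   locality:  0 (no temporal locality) .. 3 (extremely local, keep in cache)
//   cacheType: 0 = instruction cache, 1 = data cache
std::string describePrefetchArgs(int rw, int locality, int cacheType) {
  assert(rw == 0 || rw == 1);
  assert(locality >= 0 && locality <= 3);
  assert(cacheType == 0 || cacheType == 1);
  std::string out = rw ? "write" : "read";
  out += ", locality " + std::to_string(locality);
  out += cacheType ? ", data cache" : ", instruction cache";
  return out;
}
```

Under this reading, `describePrefetchArgs(1, 3, 0)` yields "write, locality 3, instruction cache", which is the kind of one-line comment the pattern could carry.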


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136040/new/

https://reviews.llvm.org/D136040


[PATCH] D135930: [X86] Add AVX-NE-CONVERT instructions.

2022-10-13 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/lib/Basic/Targets/X86.cpp:781
+Builder.defineMacro("__AVXNECONVERT__");
+  Builder.defineMacro("__AVXNECONVERT_SUPPORTED__");
   if (HasAVXVNNI)

Do we need it here?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135930/new/

https://reviews.llvm.org/D135930


[PATCH] D132329: [X86][RFC] Using `__bf16` for AVX512_BF16 intrinsics

2022-08-22 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/lib/AST/MicrosoftMangle.cpp:2472
 
+  case BuiltinType::BFloat16:
+mangleArtificialTagType(TTK_Struct, "__bf16", {"__clang"});

This looks unrelated to the patch.



Comment at: clang/test/CodeGen/X86/avx512bf16-builtins.c:7
 
-float test_mm_cvtsbh_ss(__bfloat16 A) {
-  // CHECK-LABEL: @test_mm_cvtsbh_ss
-  // CHECK: zext i16 %{{.*}} to i32
-  // CHECK: shl i32 %{{.*}}, 16
-  // CHECK: bitcast i32 %{{.*}} to float
+float test_mm_cvtsbh_ss(__bf16 A) {
+  // CHECK: fpext bfloat %{{.*}} to float

Add a test case for `__bfloat16` to test compatibility?
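
For reference, the removed CHECK lines (zext to i32, shl by 16, bitcast to float) compute the usual widening of a bf16 bit pattern. A minimal stand-alone sketch of that sequence, assuming the 16-bit input holds raw bf16 bits; `bf16BitsToFloat` is an illustrative name, not an intrinsic:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Illustrative only: mirrors the old IR sequence checked by the test.
float bf16BitsToFloat(std::uint16_t Bits) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
  // zext i16 to i32, then shl by 16: bf16 is the upper half of an IEEE float.
  std::uint32_t Wide = static_cast<std::uint32_t>(Bits) << 16;
  float Result;
  // bitcast i32 to float
  std::memcpy(&Result, &Wide, sizeof(Result));
  return Result;
}
```

A compatibility test for `__bfloat16` would presumably expect the same values as this widening, whichever IR form is emitted.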



Comment at: llvm/include/llvm/IR/IntrinsicsX86.td:4904
   ClangBuiltin<"__builtin_ia32_cvtne2ps2bf16_128">,
-  Intrinsic<[llvm_v8i16_ty], [llvm_v4f32_ty, llvm_v4f32_ty],
+  Intrinsic<[llvm_v8bf16_ty], [llvm_v4f32_ty, llvm_v4f32_ty],
   [IntrNoMem]>;

We probably need to upgrade the old intrinsics to the new version for IR 
compatibility, or we can keep the IR unchanged and just generate a bitcast from 
bfloat16 to i16.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:2180
+  if (!Subtarget.useSoftFloat() && Subtarget.hasBF16()) {
+addRegisterClass(MVT::bf16, &X86::FR16XRegClass);
+addRegisterClass(MVT::v8bf16, &X86::VR128XRegClass);

Not sure about this. Does it make bf16 a legal type?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132329/new/

https://reviews.llvm.org/D132329


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-10 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, thanks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107082/new/

https://reviews.llvm.org/D107082


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-09 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/test/CodeGen/X86/fpclamptosat.ll:569
 ; CHECK-NEXT:cvttss2si %xmm0, %rax
 ; CHECK-NEXT:ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:movabsq $-9223372036854775808, %rcx # imm = 0x8000

I'm curious why there is 1 more compare in this patch.



Comment at: llvm/test/CodeGen/X86/fpclamptosat.ll:776
+; CHECK-NEXT:cmovael %eax, %ecx
+; CHECK-NEXT:ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:movl $2147483647, %edx # imm = 0x7FFF

Ditto.



Comment at: llvm/test/CodeGen/X86/fpclamptosat_vec.ll:605
+; CHECK-NEXT:.cfi_def_cfa_offset 80
+; CHECK-NEXT:movss %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT:movss %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill

Is the <4 x half> vector split into 4 scalars and passed in xmm registers? 
What's the ABI for vector half? Is there any test that covers the scenario 
where we run out of registers and pass parameters on the stack?



Comment at: llvm/test/CodeGen/X86/fptosi-sat-scalar.ll:2138
+; X64-NEXT:ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; X64-NEXT:movl $255, %eax
+; X64-NEXT:cmovael %ecx, %eax

It seems less efficient than the previous code for NaN and zero handling, but 
we can improve it later.



Comment at: llvm/test/CodeGen/X86/half.ll:946
+; CHECK-I686-NEXT:calll __extendhfsf2
+; CHECK-I686-NEXT:fstps {{[0-9]+}}(%esp)
+; CHECK-I686-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero

Why is the x87 instruction generated?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107082/new/

https://reviews.llvm.org/D107082


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-08 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/test/Analysis/CostModel/X86/fptoi_sat.ll:852
+; SSE2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %f16u1 
= call i1 @llvm.fptoui.sat.i1.f16(half undef)
+; SSE2-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %f16s8 
= call i8 @llvm.fptosi.sat.i8.f16(half undef)
+; SSE2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %f16u8 
= call i8 @llvm.fptoui.sat.i8.f16(half undef)

It seems the cost is reduced in general. Is it because we pass/return f16 by 
xmm register?



Comment at: llvm/test/CodeGen/MIR/X86/inline-asm-registers.mir:31
   ; CHECK-LABEL: name: test
-  ; CHECK: INLINEASM , 0 /* attdialect */, 4390922 /* regdef:GR64 */, def 
$rsi, 4390922 /* regdef:GR64 */, def dead $rdi,
-INLINEASM , 0, 4390922, def $rsi, 4390922, def dead $rdi, 2147549193, 
killed $rdi, 2147483657, killed $rsi, 12, implicit-def dead early-clobber 
$eflags
+  ; CHECK: INLINEASM , 0 /* attdialect */, 4456458 /* regdef:GR64 */, def 
$rsi, 4456458 /* regdef:GR64 */, def dead $rdi,
+INLINEASM , 0, 4456458, def $rsi, 4456458, def dead $rdi, 2147549193, 
killed $rdi, 2147483657, killed $rsi, 12, implicit-def dead early-clobber 
$eflags

Why does the f16 patch affect this test case? There is no fp instruction in 
this test case.



Comment at: llvm/test/CodeGen/X86/atomic-non-integer.ll:253
+; X64-SSE-NEXT:movzwl (%rdi), %eax
+; X64-SSE-NEXT:pinsrw $0, %eax, %xmm0
+; X64-SSE-NEXT:retq

I notice X86-SSE1 return by GPR. Should we also return by GPR for X64-SSE?



Comment at: llvm/test/CodeGen/X86/avx512-insert-extract.ll:2307
+; SKX-NEXT:vmovd %ecx, %xmm0
+; SKX-NEXT:vcvtph2ps %xmm0, %xmm0
+; SKX-NEXT:vmovss %xmm0, %xmm0, %xmm0 {%k2} {z}

Is this code less efficient than the previous code? Why did the previous code 
work without converting half to float?



Comment at: llvm/test/CodeGen/X86/avx512-masked_memop-16-8.ll:156
 ; Make sure we scalarize masked loads of f16.
 define <16 x half> @test_mask_load_16xf16(<16 x i1> %mask, <16 x half>* %addr, 
<16 x half> %val) {
 ; CHECK-LABEL: test_mask_load_16xf16:

It seems the parameter %val is useless.



Comment at: llvm/test/CodeGen/X86/callbr-asm-bb-exports.ll:20
 ; CHECK-NEXT: t22: ch,glue = CopyToReg t17, Register:i32 %5, t8
-; CHECK-NEXT: t30: ch,glue = inlineasm_br t22, TargetExternalSymbol:i64'xorl 
$0, $0; jmp ${1:l}', MDNode:ch, TargetConstant:i64<8>, 
TargetConstant:i32<2293769>, Register:i32 %5, TargetConstant:i64<13>, 
TargetBlockAddress:i64<@test, %fail> 0, TargetConstant:i32<12>, Register:i32 
$df, TargetConstant:i32<12>, Register:i16 $fpsw, TargetConstant:i32<12>, 
Register:i32 $eflags, t22:1
+; CHECK-NEXT: t30: ch,glue = inlineasm_br t22, TargetExternalSymbol:i64'xorl 
$0, $0; jmp ${1:l}', MDNode:ch, TargetConstant:i64<8>, 
TargetConstant:i32<2359305>, Register:i32 %5, TargetConstant:i64<13>, 
TargetBlockAddress:i64<@test, %fail> 0, TargetConstant:i32<12>, Register:i32 
$df, TargetConstant:i32<12>, Register:i16 $fpsw, TargetConstant:i32<12>, 
Register:i32 $eflags, t22:1
 

Why is this test affected? Is it caused by the calling convention change?



Comment at: llvm/test/CodeGen/X86/fmf-flags.ll:115
-; X64-NEXT:movzwl %di, %edi
-; X64-NEXT:callq __gnu_h2f_ieee@PLT
 ; X64-NEXT:mulss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0

Does __gnu_h2f_ieee return in xmm?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107082/new/

https://reviews.llvm.org/D107082


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-08 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:616
+setOperationAction(ISD::FROUNDEVEN, MVT::f16, Promote);
+setOperationAction(ISD::FP_ROUND, MVT::f16, Expand);
+setOperationAction(ISD::FP_EXTEND, MVT::f32, Expand);

Just confused about how this is expanded. Will the expansion fail and finally 
turn into a libcall?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:763
+if (isTypeLegal(MVT::f16)) {
+  setOperationAction(ISD::FP_EXTEND, MVT::f80, Custom);
+  setOperationAction(ISD::STRICT_FP_EXTEND, MVT::f80, Custom);

Why does f16 emulation affect the f80 type? Are we checking isTypeLegal(MVT::f80)?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22100
   SDValue Res;
+  if (SrcVT == MVT::f16 && !Subtarget.hasFP16()) {
+if (IsStrict)

Not sure if it would be better to wrap it in a readable function (e.g., 
isSoftF16).
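
A rough sketch of what such a wrapper could look like. To keep it self-contained it uses stand-in types (`ScalarVT`, `SubtargetStub`) rather than the real `MVT` and `X86Subtarget`, so everything except the condition copied from the diff is an assumption:

```cpp
// Stand-in types for illustration only; the real code would use llvm::MVT
// and X86Subtarget.
enum class ScalarVT { f16, f32, f64, f80 };

struct SubtargetStub {
  bool HasFP16;
  constexpr bool hasFP16() const { return HasFP16; }
};

// Hypothetical helper named after the suggestion above. It captures the
// repeated condition "VT is f16 and the target has no native FP16 support".
constexpr bool isSoftF16(ScalarVT VT, const SubtargetStub &Subtarget) {
  return VT == ScalarVT::f16 && !Subtarget.hasFP16();
}
```

With a helper like this, the check in the diff would read `if (isSoftF16(SrcVT, Subtarget))`, and the same call could replace the negated form used elsewhere in the file.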



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22448
+  if (SrcVT == MVT::f16)
+return SDValue();
+

Why don't we extend to f32 here?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22522
+  if (!isScalarFPTypeInSSEReg(SrcVT) ||
+  (SrcVT == MVT::f16 && !Subtarget.hasFP16()))
 return SDValue();

Why don't we extend to f32 here? Will it be promoted finally?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22765
+DAG.getIntPtrConstant(0, DL));
+  Res = DAG.getNode(X86ISD::STRICT_CVTPS2PH, DL, {MVT::v8i16, MVT::Other},
+{Chain, Res, DAG.getTargetConstant(4, DL, MVT::i32)});

Should MVT::v8i16 be MVT::v8f16?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22766
+  Res = DAG.getNode(X86ISD::STRICT_CVTPS2PH, DL, {MVT::v8i16, MVT::Other},
+{Chain, Res, DAG.getTargetConstant(4, DL, MVT::i32)});
+  Chain = Res.getValue(1);

Is it the rounding control? Can we use a macro or add a comment explaining what 
the rounding control is?
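
For context, and assuming the constant is passed through unchanged as the VCVTPS2PH imm8, the Intel SDM defines bits 1:0 as an explicit rounding mode and bit 2 as "use MXCSR.RC instead". The names below are illustrative, not existing LLVM identifiers:

```cpp
#include <cstdint>

// Illustrative names only. Layout of the VCVTPS2PH imm8 per the Intel SDM:
// bits 1:0 select a rounding mode, and bit 2 (MS), when set, selects the
// rounding mode from MXCSR.RC instead.
constexpr std::uint8_t kRoundNearestEven = 0x0;
constexpr std::uint8_t kRoundDown = 0x1;
constexpr std::uint8_t kRoundUp = 0x2;
constexpr std::uint8_t kRoundTowardZero = 0x3;
constexpr std::uint8_t kUseMXCSRRounding = 0x4;

// True when the immediate defers to the current MXCSR rounding mode.
constexpr bool usesMXCSRRounding(std::uint8_t Imm) {
  return (Imm & kUseMXCSRRounding) != 0;
}

// The explicit rounding mode field; only meaningful when bit 2 is clear.
constexpr std::uint8_t explicitRoundingMode(std::uint8_t Imm) {
  return Imm & 0x3;
}
```

Under that assumption the literal 4 means "round according to MXCSR", so a named constant or a one-line comment at the call site would answer the question.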



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22775
+
+Res = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i16, Res,
+  DAG.getIntPtrConstant(0, DL));

MVT::f16 and delete the bitcast?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:44211
   VT != MVT::f80 && VT != MVT::f128 &&
+  !(VT.getScalarType() == MVT::f16 && !Subtarget.hasFP16()) &&
   (TLI.isTypeLegal(VT) || VT == MVT::v2f32) &&

Not sure if it would be better to wrap it in a readable function (e.g., 
isSoftF16).



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:1476
 }
-let Predicates = [HasFP16] in {
+let Predicates = [HasBWI] in {
   def : Pat<(v32f16 (X86VBroadcastld16 addr:$src)),

If the target doesn't have the avx512bw feature, is there some other pattern to 
lower the node, or is the fp16 broadcast node invalid?



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4107
  _.ExeDomain>, EVEX_4V, Sched<[SchedWriteFShuffle.XMM]>;
+  let Predicates = [prd] in {
   def rrkz : AVX512PI<0x10, MRMSrcReg, (outs _.RC:$dst),

Did the previous prd only apply to "def rr"? Is that a bug in the previous code?



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4352
+defm : avx512_move_scalar_lowering<"VMOVSHZ", X86Movsh, fp16imm0, v8f16x_info>;
+defm : avx512_store_scalar_lowering<"VMOVSHZ", avx512vl_f16_info,
+   (v32i1 (bitconvert (and GR32:$mask, (i32 1, GR32>;

Why didn't the previous code have predicates?



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:11657
 
+let Predicates = [HasBWI], AddedComplexity = -10 in {
+  def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (VPINSRWZrm (v8i16 
(IMPLICIT_DEF)), addr:$src, 0), FR16X)>;

Why set AddedComplexity to -10? There is no such additional complexity in the 
previous code. Could you add a comment for it?



Comment at: llvm/lib/Target/X86/X86InstrSSE.td:3970
 
+let Predicates = [UseSSE2], AddedComplexity = -10 in {
+  def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (PINSRWrm (v8i16 
(IMPLICIT_DEF)), addr:$src, 0), FR16)>;

Why AddedComplexity = -10? Could you add a comment for it?



Comment at: llvm/lib/Target/X86/X86InstrSSE.td:3978
+let Predicates = [HasAVX, NoBWI], AddedComplexity = -10 in {
+  def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (VPINSRWrm (v8i16 
(IMPLICIT_DEF)), addr:$src, 0), FR16)>;
+  def : Pat<(i16 (bitconvert f16:$src)), (EXTRACT_SUBREG (VPEXTRWrr (v8i16 
(COPY_TO_REGCLASS FR16:$src, VR128)), 0), sub_16bit)>;

Missing a pattern for store?



Comment at: llvm/lib/Target/X86/X86InstrSSE.td:5214
+
+let Predicates = [HasAVX, NoBWI] in
+  def : Pat<(store f16:$src,

[PATCH] D127050: [Clang][FP16] Add 4 builtins for _Float16

2022-06-04 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/test/CodeGen/builtin_Float16.c:7
+void test_float16_builtins(void) {
+  volatile _Float16 res;
+

pengfei wrote:
> LuoYuanke wrote:
> > Is _Float16 a legal type for target armv7a and aarch64?
> Yes, see 
> https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point
Maybe use `__fp16` because it is supported on every target. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127050/new/

https://reviews.llvm.org/D127050


[PATCH] D127050: [Clang][FP16] Add 4 builtins for _Float16

2022-06-04 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/test/CodeGen/builtin_Float16.c:7
+void test_float16_builtins(void) {
+  volatile _Float16 res;
+

Is _Float16 a legal type for target armv7a and aarch64?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127050/new/

https://reviews.llvm.org/D127050


[PATCH] D127050: [Clang][FP16] Add 4 builtins for _Float16

2022-06-04 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/include/clang/Basic/Builtins.def:145
 BUILTIN(__builtin_huge_vall, "Ld", "nc")
+BUILTIN(__builtin_huge_valf16, "x", "nc")
 BUILTIN(__builtin_huge_valf128, "LLd", "nc")

Is the builtin sorted in alphabet order?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127050/new/

https://reviews.llvm.org/D127050


[PATCH] D122567: [X86][AMX] enable amx cast intrinsics in FE.

2022-04-02 Thread LuoYuanke via Phabricator via cfe-commits

This revision was automatically updated to reflect the committed changes.
Closed by commit rG979d876bb4e9: [X86][AMX] enable amx cast intrinsics in FE. (authored by LuoYuanke).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122567/new/

https://reviews.llvm.org/D122567

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGen/X86/amx_api.c

Index: clang/test/CodeGen/X86/amx_api.c
===
--- clang/test/CodeGen/X86/amx_api.c
+++ clang/test/CodeGen/X86/amx_api.c
@@ -11,9 +11,11 @@
 // This is an example code and integration test.
 void test_api(int cond, short row, short col) {
   //CHECK-LABEL: @test_api
-  //CHECK: call x86_amx @llvm.x86.tileloadd64.internal
-  //CHECK: call x86_amx @llvm.x86.tdpbssd.internal
-  //CHECK: call void @llvm.x86.tilestored64.internal
+  //CHECK-DAG: call x86_amx @llvm.x86.tileloadd64.internal
+  //CHECK-DAG: call <256 x i32> @llvm.x86.cast.tile.to.vector.v256i32(x86_amx {{%.*}})
+  //CHECK-DAG: call x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32> {{%.*}})
+  //CHECK-DAG: call x86_amx @llvm.x86.tdpbssd.internal
+  //CHECK-DAG: call void @llvm.x86.tilestored64.internal
   __tile1024i a = {row, 8};
   __tile1024i b = {8, col};
   __tile1024i c = {row, col};
@@ -33,65 +35,70 @@
 
 void test_tile_loadd(short row, short col) {
   //CHECK-LABEL: @test_tile_loadd
-  //CHECK: call x86_amx @llvm.x86.tileloadd64.internal
-  //CHECK-NEXT: {{%.*}} = bitcast x86_amx {{%.*}} to <256 x i32>
+  //CHECK-DAG: call x86_amx @llvm.x86.tileloadd64.internal
+  //CHECK-DAG: call <256 x i32> @llvm.x86.cast.tile.to.vector.v256i32(x86_amx {{%.*}})
   __tile1024i a = {row, col};
   __tile_loadd(&a, buf, STRIDE);
 }
 
 void test_tile_stream_loadd(short row, short col) {
   //CHECK-LABEL: @test_tile_stream_loadd
-  //CHECK: call x86_amx @llvm.x86.tileloaddt164.internal
-  //CHECK-NEXT: {{%.*}} = bitcast x86_amx {{%.*}} to <256 x i32>
+  //CHECK-DAG: call x86_amx @llvm.x86.tileloaddt164.internal
+  //CHECK-DAG: call <256 x i32> @llvm.x86.cast.tile.to.vector.v256i32(x86_amx {{%.*}})
   __tile1024i a = {row, col};
   __tile_stream_loadd(&a, buf, STRIDE);
 }
 
 void test_tile_dpbssd(__tile1024i a, __tile1024i b, __tile1024i c) {
   //CHECK-LABEL: @test_tile_dpbssd
-  //CHECK: call x86_amx @llvm.x86.tdpbssd.internal
-  //CHECK-NEXT: {{%.*}} = bitcast x86_amx {{%.*}} to <256 x i32>
+  //CHECK-DAG: call x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32> {{%.*}})
+  //CHECK-DAG: call x86_amx @llvm.x86.tdpbssd.internal
+  //CHECK-DAG: call <256 x i32> @llvm.x86.cast.tile.to.vector.v256i32(x86_amx {{%.*}})
   __tile_dpbssd(&c, a, b);
 }
 
 void test_tile_dpbsud(__tile1024i a, __tile1024i b, __tile1024i c) {
   //CHECK-LABEL: @test_tile_dpbsud
-  //CHECK: call x86_amx @llvm.x86.tdpbsud.internal
-  //CHECK-NEXT: {{%.*}} = bitcast x86_amx {{%.*}} to <256 x i32>
+  //CHECK-DAG: call x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32> {{%.*}})
+  //CHECK-DAG: call x86_amx @llvm.x86.tdpbsud.internal
+  //CHECK-DAG: call <256 x i32> @llvm.x86.cast.tile.to.vector.v256i32(x86_amx {{%.*}})
   __tile_dpbsud(&c, a, b);
 }
 
 void test_tile_dpbusd(__tile1024i a, __tile1024i b, __tile1024i c) {
   //CHECK-LABEL: @test_tile_dpbusd
-  //CHECK: call x86_amx @llvm.x86.tdpbusd.internal
-  //CHECK-NEXT: {{%.*}} = bitcast x86_amx {{%.*}} to <256 x i32>
+  //CHECK-DAG: call x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32> {{%.*}})
+  //CHECK-DAG: call x86_amx @llvm.x86.tdpbusd.internal
+  //CHECK-DAG: call <256 x i32> @llvm.x86.cast.tile.to.vector.v256i32(x86_amx {{%.*}})
   __tile_dpbusd(&c, a, b);
 }
 
 void test_tile_dpbuud(__tile1024i a, __tile1024i b, __tile1024i c) {
   //CHECK-LABEL: @test_tile_dpbuud
-  //CHECK: call x86_amx @llvm.x86.tdpbuud.internal
-  //CHECK-NEXT: {{%.*}} = bitcast x86_amx {{%.*}} to <256 x i32>
+  //CHECK-DAG: call x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32> {{%.*}})
+  //CHECK-DAG: call x86_amx @llvm.x86.tdpbuud.internal
+  //CHECK-DAG: call <256 x i32> @llvm.x86.cast.tile.to.vector.v256i32(x86_amx {{%.*}})
   __tile_dpbuud(&c, a, b);
 }
 
 void test_tile_stored(__tile1024i c) {
   //CHECK-LABEL: @test_tile_stored
-  //CHECK: {{%.*}} = bitcast <256 x i32> {{%.*}} to x86_amx
-  //CHECK-NEXT: call void @llvm.x86.tilestored64.internal
+  //CHECK-DAG: call x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32> {{%.*}})
+  //CHECK-DAG: call void @llvm.x86.tilestored64.internal
   __tile_stored(buf, STRIDE, c);
 }
 
 void test_tile_zero(__tile1024i c) {
   //CHECK-LABEL: @test_tile_zero
-  //CHECK: call x86_amx @llvm.x86.tilezero.internal
-  //CHECK-NEXT bitcast x86_amx {{%.*}} to <256 x i32>
+  //CHECK-DAG: call x86_amx @llvm.x86.tilezero.internal
+  //CHECK-DAG: call <256 x i32> @llvm.x86.cast.tile.to.vector.v256i32(x86_amx {{%.*}})
   __tile_zero(&c);
 }
 
 void test_tile_dpbf16ps(__tile1024i a, __tile1024i b, __tile1024i c) {
   //CHECK-LABEL: @test_tile_dpbf16ps
-

[PATCH] D122567: [X86][AMX] enable amx cast intrinsics in FE.

2022-03-29 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:5413-5415
+if (PTy->isX86_AMXTy())
+  ArgValue = Builder.CreateIntrinsic(Intrinsic::x86_cast_vector_to_tile,
+ {ArgValue->getType()}, {ArgValue});

xiangzhangllvm wrote:
> Can we fold it in CreateBitCast(ArgValue, PTy) function ?
I don't think so. We have the AMX-specific cast precisely to avoid some unexpected optimizations that are applied to bitcast.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122567/new/

https://reviews.llvm.org/D122567


[PATCH] D99152: [AMX] Prototype for vector and amx bitcast.

2022-03-28 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.
Herald added a project: All.

In D99152#3235546, @lebedev.ri wrote:

> What's the status here?

Here is the patch: D122567.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99152/new/

https://reviews.llvm.org/D99152


[PATCH] D122567: [X86][AMX] enable amx cast intrinsics in FE.

2022-03-28 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke created this revision.
Herald added a subscriber: pengfei.
Herald added a project: All.
LuoYuanke requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

We had some discussion in D99152 and on llvm-dev, and finally came up with
a solution: add AMX-specific cast intrinsics. The intrinsics are already
supported in LLVM IR. This patch replaces bitcast with the AMX cast
intrinsics when emitting code in the FE.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D122567

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGen/X86/amx_api.c

Index: clang/test/CodeGen/X86/amx_api.c
===
--- clang/test/CodeGen/X86/amx_api.c
+++ clang/test/CodeGen/X86/amx_api.c
@@ -11,9 +11,11 @@
 // This is an example code and integration test.
 void test_api(int cond, short row, short col) {
   //CHECK-LABEL: @test_api
-  //CHECK: call x86_amx @llvm.x86.tileloadd64.internal
-  //CHECK: call x86_amx @llvm.x86.tdpbssd.internal
-  //CHECK: call void @llvm.x86.tilestored64.internal
+  //CHECK-DAG: call x86_amx @llvm.x86.tileloadd64.internal
+  //CHECK-DAG: call <256 x i32> @llvm.x86.cast.tile.to.vector.v256i32(x86_amx {{%.*}})
+  //CHECK-DAG: call x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32> {{%.*}})
+  //CHECK-DAG: call x86_amx @llvm.x86.tdpbssd.internal
+  //CHECK-DAG: call void @llvm.x86.tilestored64.internal
   __tile1024i a = {row, 8};
   __tile1024i b = {8, col};
   __tile1024i c = {row, col};
@@ -33,65 +35,70 @@
 
 void test_tile_loadd(short row, short col) {
   //CHECK-LABEL: @test_tile_loadd
-  //CHECK: call x86_amx @llvm.x86.tileloadd64.internal
-  //CHECK-NEXT: {{%.*}} = bitcast x86_amx {{%.*}} to <256 x i32>
+  //CHECK-DAG: call x86_amx @llvm.x86.tileloadd64.internal
+  //CHECK-DAG: call <256 x i32> @llvm.x86.cast.tile.to.vector.v256i32(x86_amx {{%.*}})
   __tile1024i a = {row, col};
   __tile_loadd(&a, buf, STRIDE);
 }
 
 void test_tile_stream_loadd(short row, short col) {
   //CHECK-LABEL: @test_tile_stream_loadd
-  //CHECK: call x86_amx @llvm.x86.tileloaddt164.internal
-  //CHECK-NEXT: {{%.*}} = bitcast x86_amx {{%.*}} to <256 x i32>
+  //CHECK-DAG: call x86_amx @llvm.x86.tileloaddt164.internal
+  //CHECK-DAG: call <256 x i32> @llvm.x86.cast.tile.to.vector.v256i32(x86_amx {{%.*}})
   __tile1024i a = {row, col};
   __tile_stream_loadd(&a, buf, STRIDE);
 }
 
 void test_tile_dpbssd(__tile1024i a, __tile1024i b, __tile1024i c) {
   //CHECK-LABEL: @test_tile_dpbssd
-  //CHECK: call x86_amx @llvm.x86.tdpbssd.internal
-  //CHECK-NEXT: {{%.*}} = bitcast x86_amx {{%.*}} to <256 x i32>
+  //CHECK-DAG: call x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32> {{%.*}})
+  //CHECK-DAG: call x86_amx @llvm.x86.tdpbssd.internal
+  //CHECK-DAG: call <256 x i32> @llvm.x86.cast.tile.to.vector.v256i32(x86_amx {{%.*}})
   __tile_dpbssd(&c, a, b);
 }
 
 void test_tile_dpbsud(__tile1024i a, __tile1024i b, __tile1024i c) {
   //CHECK-LABEL: @test_tile_dpbsud
-  //CHECK: call x86_amx @llvm.x86.tdpbsud.internal
-  //CHECK-NEXT: {{%.*}} = bitcast x86_amx {{%.*}} to <256 x i32>
+  //CHECK-DAG: call x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32> {{%.*}})
+  //CHECK-DAG: call x86_amx @llvm.x86.tdpbsud.internal
+  //CHECK-DAG: call <256 x i32> @llvm.x86.cast.tile.to.vector.v256i32(x86_amx {{%.*}})
   __tile_dpbsud(&c, a, b);
 }
 
 void test_tile_dpbusd(__tile1024i a, __tile1024i b, __tile1024i c) {
   //CHECK-LABEL: @test_tile_dpbusd
-  //CHECK: call x86_amx @llvm.x86.tdpbusd.internal
-  //CHECK-NEXT: {{%.*}} = bitcast x86_amx {{%.*}} to <256 x i32>
+  //CHECK-DAG: call x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32> {{%.*}})
+  //CHECK-DAG: call x86_amx @llvm.x86.tdpbusd.internal
+  //CHECK-DAG: call <256 x i32> @llvm.x86.cast.tile.to.vector.v256i32(x86_amx {{%.*}})
   __tile_dpbusd(&c, a, b);
 }
 
 void test_tile_dpbuud(__tile1024i a, __tile1024i b, __tile1024i c) {
   //CHECK-LABEL: @test_tile_dpbuud
-  //CHECK: call x86_amx @llvm.x86.tdpbuud.internal
-  //CHECK-NEXT: {{%.*}} = bitcast x86_amx {{%.*}} to <256 x i32>
+  //CHECK-DAG: call x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32> {{%.*}})
+  //CHECK-DAG: call x86_amx @llvm.x86.tdpbuud.internal
+  //CHECK-DAG: call <256 x i32> @llvm.x86.cast.tile.to.vector.v256i32(x86_amx {{%.*}})
   __tile_dpbuud(&c, a, b);
 }
 
 void test_tile_stored(__tile1024i c) {
   //CHECK-LABEL: @test_tile_stored
-  //CHECK: {{%.*}} = bitcast <256 x i32> {{%.*}} to x86_amx
-  //CHECK-NEXT: call void @llvm.x86.tilestored64.internal
+  //CHECK-DAG: call x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32> {{%.*}})
+  //CHECK-DAG: call void @llvm.x86.tilestored64.internal
   __tile_stored(buf, STRIDE, c);
 }
 
 void test_tile_zero(__tile1024i c) {
   //CHECK-LABEL: @test_tile_zero
-  //CHECK: call x86_amx @llvm.x86.tilezero.internal
-  //CHECK-NEXT bitcast x86_amx {{%.*}} to <256 x i32>
+  //CHECK-DAG: call x86_amx

[PATCH] D122104: [X86][regcall] Support passing / returning structures

2022-03-27 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, thanks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122104/new/

https://reviews.llvm.org/D122104


[PATCH] D122104: [X86][regcall] Support passing / returning structures

2022-03-26 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/include/clang/CodeGen/CGFunctionInfo.h:590
+  /// Log 2 of the maximum vector width.
+  unsigned MaxVectorWidth : 4;
+

I notice that some code marks a log2 value with a Log2 suffix in the 
variable name. Do you think adding a Log2 suffix here would be more readable?



Comment at: clang/lib/CodeGen/CGCall.cpp:5238
+  for (unsigned i = 0; i < IRCallArgs.size(); ++i)
+LargestVectorWidth = std::max(LargestVectorWidth,
+  getMaxVectorWidth(IRCallArgs[i]->getType()));

Does this also affect other calling conventions besides fastcall?



Comment at: clang/lib/CodeGen/TargetInfo.cpp:2303
   void classify(QualType T, uint64_t OffsetBase, Class &Lo, Class &Hi,
-bool isNamedArg) const;
+bool isNamedArg, bool IsRegCall = false) const;
 

Update the comments for the new parameter?



Comment at: clang/lib/CodeGen/TargetInfo.cpp:3031
 // than eight eightbytes, ..., it has class MEMORY.
-if (Size > 512)
+if (!IsRegCall && Size > 512)
   return;

Would you add a test for the non-regcall case? Pass a 1024-bit vector parameter 
and check that it is handled correctly both with and without regcall.
Would you also add a comment explaining why regcall accepts sizes larger than 
512?
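
To make the requested test matrix concrete, here is a minimal sketch of the size check as it reads in the hunk above. It is not the real `classify()` code: `SizeCheck` and `checkAggregateSize` are hypothetical names, and the sketch only models the early return on `Size > 512`, assuming that an early return leaves the argument in class MEMORY.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: either the argument is forced to MEMORY by the
// size check, or classification carries on with the remaining rules.
enum class SizeCheck { Memory, KeepClassifying };

// Hypothetical stand-in for the changed condition in the hunk:
//   if (!IsRegCall && Size > 512) return;
// SizeInBits is the size of the argument type in bits.
static SizeCheck checkAggregateSize(uint64_t SizeInBits, bool IsRegCall) {
  if (!IsRegCall && SizeInBits > 512)
    return SizeCheck::Memory;
  return SizeCheck::KeepClassifying;
}
```

Under these assumptions, a 1024-bit argument is forced to MEMORY without regcall and keeps being classified with regcall, which are the two paths the test should cover.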



Comment at: clang/test/CodeGen/aarch64-neon-tbl.c:45
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl2_s8([2 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl2_s8([2 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x2_t, align 16

I'm curious why aarch64 test cases are affected by the patch.



Comment at: clang/test/CodeGen/regcall2.c:2
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -emit-llvm %s -o - -ffreestanding -target-feature +avx512vl -triple=x86_64-pc-win32 | FileCheck %s --check-prefix=Win
+// RUN: %clang_cc1 -emit-llvm %s -o - -ffreestanding -target-feature +avx512vl -triple=x86_64-pc-linux-gnu | FileCheck %s --check-prefix=Lin

Add a test case for a target that has no avx512 feature?



Comment at: clang/test/CodeGen/regcall2.c:9
+  __m512d r1[4];
+  __m512 r2[4];
+} __sVector;

Maybe add a test case showing the maximum number of registers that can be passed with regcall.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122104/new/

https://reviews.llvm.org/D122104


[PATCH] D122104: [X86][regcall] Support passing / returning structures

2022-03-20 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/include/clang/CodeGen/CGFunctionInfo.h:744
+  void setMaxVectorWidth(unsigned Width) {
+ MaxVectorWidth = Width ? llvm::countTrailingZeros(Width) + 1 : 0;
+  }

Use "Log2_32()"?
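
For context, a small sketch of the two encodings being compared. The helpers below are hypothetical stand-ins written out by hand so the snippet compiles without LLVM headers; they are not `llvm::countTrailingZeros` or `llvm::Log2_32` themselves, and the "log2 + 1, with 0 meaning no vector" scheme is taken from the setter in the hunk above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a count-trailing-zeros helper on 32-bit values.
static unsigned countTrailingZeros32(uint32_t V) {
  unsigned N = 0;
  while (V != 0 && (V & 1u) == 0) {
    V >>= 1;
    ++N;
  }
  return N;
}

// Hypothetical stand-in for a floor(log2) helper on 32-bit values.
// Only meaningful for V != 0; callers below guard the zero case.
static unsigned floorLog2_32(uint32_t V) {
  unsigned N = 0;
  while (V >>= 1)
    ++N;
  return N;
}

// Encoding as written in the patch: 0 means "no vector", otherwise the
// stored value is log2(Width) + 1, which fits a 4-bit field for
// power-of-two widths up to 16384.
static unsigned encodeWithCTZ(uint32_t Width) {
  return Width ? countTrailingZeros32(Width) + 1 : 0;
}

// The same encoding expressed with a log2 helper, as the comment suggests.
static unsigned encodeWithLog2(uint32_t Width) {
  return Width ? floorLog2_32(Width) + 1 : 0;
}

// Inverse mapping back to a width in bits.
static uint32_t decodeWidth(unsigned Encoded) {
  return Encoded ? (1u << (Encoded - 1)) : 0;
}
```

For power-of-two widths the two encoders agree, so the suggestion is mainly about readability. They differ only for widths that are not powers of two, where the trailing-zero count underestimates the log2.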


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122104/new/

https://reviews.llvm.org/D122104


[PATCH] D120307: [X86] Add helper enum for ternary intrinsics

2022-03-06 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, but pls wait for 1 or 2 days to see if there are any comments from others.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120307/new/

https://reviews.llvm.org/D120307


[PATCH] D115199: [WIP][X86][AMX] Support amxpreserve attribute in clang.

2021-12-08 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke updated this revision to Diff 393012.
LuoYuanke added a comment.
Herald added a subscriber: martong.

Updating D115199: [WIP][X86][AMX] Support amxpreserve attribute in clang.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115199/new/

https://reviews.llvm.org/D115199

Files:
  clang/include/clang/AST/Type.h
  clang/include/clang/AST/TypeProperties.td
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/AttrDocs.td
  clang/include/clang/CodeGen/CGFunctionInfo.h
  clang/lib/AST/ASTContext.cpp
  clang/lib/AST/ASTStructuralEquivalence.cpp
  clang/lib/AST/TypePrinter.cpp
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Sema/SemaDecl.cpp
  clang/lib/Sema/SemaType.cpp
  clang/lib/Serialization/ASTWriter.cpp
  clang/test/Sema/attr-target-mv.c
  clang/test/SemaCXX/attr-non-x86-amx-preserve.cpp
  clang/test/SemaCXX/attr-x86-amx-preserve.cpp
  clang/unittests/AST/StructuralEquivalenceTest.cpp

Index: clang/unittests/AST/StructuralEquivalenceTest.cpp
===
--- clang/unittests/AST/StructuralEquivalenceTest.cpp
+++ clang/unittests/AST/StructuralEquivalenceTest.cpp
@@ -476,6 +476,16 @@
   EXPECT_FALSE(testStructuralMatch(t));
 }
 
+TEST_F(StructuralEquivalenceFunctionTest,
+   FunctionsWithDifferentAMXSavedRegsAttr) {
+  if (llvm::Triple(llvm::sys::getDefaultTargetTriple()).getArch() !=
+  llvm::Triple::x86_64)
+return;
+  auto t = makeNamedDecls("__attribute__((amxpreserve)) void foo();",
+  " void foo();", Lang_C99);
+  EXPECT_FALSE(testStructuralMatch(t));
+}
+
 struct StructuralEquivalenceCXXMethodTest : StructuralEquivalenceTest {
 };
 
Index: clang/test/SemaCXX/attr-x86-amx-preserve.cpp
===
--- /dev/null
+++ clang/test/SemaCXX/attr-x86-amx-preserve.cpp
@@ -0,0 +1,33 @@
+// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -fsyntax-only -verify %s
+
+struct a {
+  int b __attribute__((amxpreserve)); // expected-warning {{'amxpreserve' only applies to function types; type here is 'int'}}
+  static void foo(int *a) __attribute__((amxpreserve)) {}
+};
+
+struct a test __attribute__((amxpreserve)); // expected-warning {{'amxpreserve' only applies to function types; type here is 'struct a'}}
+
+__attribute__((amxpreserve(999))) void bar(int *) {} // expected-error {{'amxpreserve' attribute takes no arguments}}
+
+void __attribute__((amxpreserve)) foo(int *){}
+
+__attribute__((amxpreserve)) void foo2(int *) {}
+
+typedef __attribute__((amxpreserve)) void (*foo3)(int *);
+
+int (*foo4)(double a, __attribute__((amxpreserve)) float b); // expected-warning {{'amxpreserve' only applies to function types; type here is 'float'}}
+
+typedef void (*foo5)(int *);
+
+void foo6(){} // expected-note {{previous declaration is here}}
+
+void __attribute__((amxpreserve)) foo6(); // expected-error {{function declared with 'amxpreserve' attribute was previously declared without the 'amxpreserve' attribute}} 
+
+int main(int argc, char **argv) {
+  void (*fp)(int *) = foo; // expected-error {{cannot initialize a variable of type 'void (*)(int *)' with an lvalue of type 'void (int *) __attribute__((amxpreserve))'}} 
+  a::foo();
+  foo3 func = foo2;
+  func();
+  foo5 __attribute__((amxpreserve)) func2 = foo2;
+  return 0;
+}
Index: clang/test/SemaCXX/attr-non-x86-amx-preserve.cpp
===
--- /dev/null
+++ clang/test/SemaCXX/attr-non-x86-amx-preserve.cpp
@@ -0,0 +1,29 @@
+// RUN: %clang_cc1 -std=c++11 -triple armv7-unknown-linux-gnueabi -fsyntax-only -verify %s
+
+struct a {
+  int __attribute__((amxpreserve)) b; // expected-warning {{unknown attribute 'amxpreserve' ignored}}
+  static void foo(int *a) __attribute__((amxpreserve)) {} // expected-warning {{unknown attribute 'amxpreserve' ignored}}
+};
+
+struct a test __attribute__((amxpreserve)); // expected-warning {{unknown attribute 'amxpreserve' ignored}}
+
+__attribute__((amxpreserve(999))) void bar(int *) {} // expected-warning {{unknown attribute 'amxpreserve' ignored}}
+
+__attribute__((amxpreserve)) void foo(int *){} // expected-warning {{unknown attribute 'amxpreserve' ignored}}
+
+[[clang::amxpreserve]] void foo2(int *) {} // expected-warning {{unknown attribute 'amxpreserve' ignored}}
+
+typedef __attribute__((amxpreserve)) void (*foo3)(int *); // expected-warning {{unknown attribute 'amxpreserve' ignored}}
+
+typedef void (*foo5)(int *);
+
+int (*foo4)(double a, __attribute__((amxpreserve)) float b); // expected-warning {{unknown attribute 'amxpreserve' ignored}}
+
+int main(int argc, char **argv) {
+  void (*fp)(int *) = foo;
+  a::foo();
+  foo3 func = foo2;
+  func();
+  foo5 __attribute__((amxpreserve)) func2 = foo2; // expected-warning {{unknown attribute

[PATCH] D115199: [X86][AMX] Support amxpreserve attribute in clang.

2021-12-06 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke created this revision.
Herald added a reviewer: aaron.ballman.
LuoYuanke requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D115199

Files:
  clang/include/clang/Basic/Attr.td
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Sema/SemaType.cpp


Index: clang/lib/Sema/SemaType.cpp
===
--- clang/lib/Sema/SemaType.cpp
+++ clang/lib/Sema/SemaType.cpp
@@ -7514,6 +7514,17 @@
 return true;
   }
 
+  if (attr.getKind() == ParsedAttr::AT_AMXPreserve) {
+if (S.CheckAttrTarget(attr))
+  return true;
+
+// Delay if this is not a function type.
+if (!unwrapped.isFunctionType())
+  return false;
+
+return true;
+  }
+
   if (attr.getKind() == ParsedAttr::AT_AnyX86NoCallerSavedRegisters) {
 if (S.CheckAttrTarget(attr) || S.CheckAttrNoArgs(attr))
   return true;
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -1858,6 +1858,8 @@
 // carry an explicit noinline attribute.
 if (!F->hasFnAttribute(llvm::Attribute::AlwaysInline))
   B.addAttribute(llvm::Attribute::NoInline);
+  } else if (D->hasAttr<AMXPreserveAttr>()) {
+B.addAttribute(llvm::Attribute::AMXPreserve);
   } else {
 // Otherwise, propagate the inline hint attribute and potentially use its
 // absence to mark things as noinline.
Index: clang/lib/CodeGen/CGCall.cpp
===
--- clang/lib/CodeGen/CGCall.cpp
+++ clang/lib/CodeGen/CGCall.cpp
@@ -2116,6 +2116,8 @@
   FuncAttrs.addAttribute(llvm::Attribute::NoCfCheck);
 if (TargetDecl->hasAttr())
   FuncAttrs.addAttribute(llvm::Attribute::NoCallback);
+if (TargetDecl->hasAttr<AMXPreserveAttr>())
+  FuncAttrs.addAttribute(llvm::Attribute::AMXPreserve);
 
 HasOptnone = TargetDecl->hasAttr();
 if (auto *AllocSize = TargetDecl->getAttr()) {
Index: clang/include/clang/Basic/Attr.td
===
--- clang/include/clang/Basic/Attr.td
+++ clang/include/clang/Basic/Attr.td
@@ -2892,6 +2892,12 @@
   let SimpleHandler = 1;
 }
 
+def AMXPreserve : InheritableAttr, TargetSpecificAttr {
+  let Spellings = [GCC<"amxpreserve">, Declspec<"amxpreserve">];
+  let Documentation = [Undocumented];
+  let SimpleHandler = 1;
+}
+
 def AnyX86Interrupt : InheritableAttr, TargetSpecificAttr {
   // NOTE: If you add any additional spellings, ARMInterrupt's,
   // M68kInterrupt's, MSP430Interrupt's and MipsInterrupt's spellings must match.


[PATCH] D111037: [X86] Check if struct is blank before getting the inner types

2021-10-08 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, thanks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111037/new/

https://reviews.llvm.org/D111037


[PATCH] D111037: [X86] Check if struct is blank before getting the inner types

2021-10-07 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/test/CodeGen/X86/avx512fp16-abi.c:207
+} pr52011() {
+  // CHECK-C: define{{.*}} { float, double } @pr52011
+}

Why not test CPP as well?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111037/new/

https://reviews.llvm.org/D111037


[PATCH] D111037: [X86] Check if struct is blank before getting the inner types

2021-10-07 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/test/CodeGen/X86/avx512fp16-abi.c:203
+struct {
+  float a;
+  struct {};

Add more cases for the struct composed of _Float16, float, double, struct {}?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111037/new/

https://reviews.llvm.org/D111037


[PATCH] D109607: [X86] Refactor GetSSETypeAtOffset to fix pr51813

2021-09-15 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, but pls wait 1 or 2 days for the comments from others.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109607/new/

https://reviews.llvm.org/D109607


[PATCH] D109607: [X86] Refactor GetSSETypeAtOffset to fix pr51813

2021-09-15 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/test/CodeGen/X86/avx512fp16-abi.c:153
+struct float2 {
+  struct {} s;
+  float a;

Add a test case for "{ struct {}; half; struct {}; half; }"?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109607/new/

https://reviews.llvm.org/D109607


[PATCH] D109607: [X86] Refactor GetSSETypeAtOffset to fix pr51813

2021-09-15 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:3421
+if (T0->isHalfTy())
+  T1 = getFPTypeAtOffset(IRType, IROffset + 4, TD);
+// If we can't get a second FP type, return a simple half or float.

I don't quite understand why it is "+4". Would you add a comment on it?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109607/new/

https://reviews.llvm.org/D109607


[PATCH] D109658: [X86][FP16] Change the order of the operands in complex FMA intrinsics to allow swap between the mul operands.

2021-09-13 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

It seems that this patch aligns the builtin interface with the intrinsic 
interface. Since AVX512FP16 is pretty new, I assume nobody is using the GCC 
builtins yet. Can we ask the GCC folks to change their builtin interface?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109658/new/

https://reviews.llvm.org/D109658


[PATCH] D109607: [X86] Refactor GetSSETypeAtOffset to fix pr51813

2021-09-12 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:3417
+  llvm::Type *T1 = getFPTypeAtOffset(IRType, IROffset + NextFP, TD);
+  if (T1 == nullptr) {
+if (NextFP == 2)

Would you add comments on each case, like the previous code had?



Comment at: clang/lib/CodeGen/TargetInfo.cpp:3454
-
-return llvm::Type::getHalfTy(getVMContext());
-  }

Is this the major change?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109607/new/

https://reviews.llvm.org/D109607


[PATCH] D109487: [X86] Support *_set1_pch(Float16 _Complex h)

2021-09-11 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, thanks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109487/new/

https://reviews.llvm.org/D109487


[PATCH] D105269: [X86] AVX512FP16 instructions enabling 6/6

2021-08-27 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, but may wait 1 or 2 days for the comments from others.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105269/new/

https://reviews.llvm.org/D105269

[PATCH] D105269: [X86] AVX512FP16 instructions enabling 6/6

2021-08-27 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:47419
+   : X86ISD::VFCMADDC;
+  // FIXME: How we handle when FMF of FADD is different from CFMUL's?
+  CFmul = DAG.getNode(newOp, SDLoc(N), CVT, FAddOp1, CFmul.getOperand(0),

RKSimon wrote:
> LuoYuanke wrote:
> > Sorry, I don't understand the comments. What does FMF mean?
> fast math flags?
I understand now. Thanks, Simon. :)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105269/new/

https://reviews.llvm.org/D105269

[PATCH] D105269: [X86] AVX512FP16 instructions enabling 6/6

2021-08-27 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:47419
+   : X86ISD::VFCMADDC;
+  // FIXME: How we handle when FMF of FADD is different from CFMUL's?
+  CFmul = DAG.getNode(newOp, SDLoc(N), CVT, FAddOp1, CFmul.getOperand(0),

Sorry, I don't understand the comments. What does FMF mean?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105269/new/

https://reviews.llvm.org/D105269

[PATCH] D105269: [X86] AVX512FP16 instructions enabling 6/6

2021-08-26 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:13640
+(v4f32 (OpNode VR128X:$src1, VR128X:$src2)),
+0, 0, 0, X86selects, "@earlyclobber $dst">, Sched<[sched.XMM]>;
+defm rm : AVX512_maskable

> LuoYuanke wrote:
> > I didn't see this flag for other scalar instructions. Why do we need it for
> > complex instructions?
> Because all complex instructions have the constraint "dst != src1 && dst !=
> src2". We use earlyclobber to avoid the dst being assigned to src1 or src2.
Got it. Thanks!
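
The constraint quoted above ("dst != src1 && dst != src2") can be pictured as a plain aliasing check. This is a minimal sketch only: registers are modeled as bare numbers, and `violatesComplexDstConstraint` is a hypothetical name chosen for illustration, not an LLVM API.

```cpp
#include <vector>

// Hypothetical model: a register is just an unsigned number.
// Returns true when the destination aliases any source operand, which is
// the situation the earlyclobber marking is meant to rule out.
bool violatesComplexDstConstraint(unsigned Dst,
                                  const std::vector<unsigned> &Srcs) {
  for (unsigned Src : Srcs)
    if (Src == Dst)
      return true;
  return false;
}
```

Under this model, an allocation that puts dst and src1 in the same register is rejected, while one that keeps all three registers distinct passes.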


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105269/new/

https://reviews.llvm.org/D105269

[PATCH] D105269: [X86] AVX512FP16 instructions enabling 6/6

2021-08-26 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86InstrFoldTables.cpp:1852
+  { X86::VFCMULCPHZrr, X86::VFCMULCPHZrm, 0 },
+  { X86::VFCMULCSHZrr, X86::VFCMULCSHZrm, 
TB_NO_REVERSE },
   { X86::VFMADDPD4Yrr, X86::VFMADDPD4Ymr, 0 },

craig.topper wrote:
> LuoYuanke wrote:
> > pengfei wrote:
> > > LuoYuanke wrote:
> > > > Why is the FR32X version not needed for complex scalar instructions?
> > > Do you mean complex ss/sd? We don't have these instructions.
> > No, I mean we have both X86::XXX and X86::XXX_Int for other instructions.
> > One is FR16X which can be unfolded, one is VR128X which can't. For example,
> > VFNMADD213SHZm and VFNMADD213SHZm_Int.
> The VFCMULCSHZrr instructions produce two 16-bit values packed into the lower
> 32 bits. That would mean we would need a FR32X result, but it couldn't
> interact meaningfully with any other FR32X instruction since it's really two
> values.
> 
> I think we only have FR32/FR64 instructions for things that have generic IR 
> equivalents or that we create from other generic IR operations. Like I think 
> we have an FR32 RCP and RSQRT because we can convert float div or 1/sqrt to 
> them.
Thanks, Craig. I understand now. :)
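
Craig's point about "two 16-bit values packed into the lower 32 bits" can be sketched as follows. This is an illustrative model, assuming the real part sits in the low half and the imaginary part in the high half of the lane; the type and function names are hypothetical, and the values are raw IEEE half bit patterns rather than a real `_Float16` type.

```cpp
#include <cstdint>

// One FP16 complex element viewed as raw bits.
struct ComplexHalfBits {
  std::uint16_t Real; // assumed to occupy bits [15:0] of the lane
  std::uint16_t Imag; // assumed to occupy bits [31:16] of the lane
};

// Pack the pair into a single 32-bit lane.
std::uint32_t packComplexHalf(ComplexHalfBits C) {
  return static_cast<std::uint32_t>(C.Real) |
         (static_cast<std::uint32_t>(C.Imag) << 16);
}

// Split a 32-bit lane back into its two halves.
ComplexHalfBits unpackComplexHalf(std::uint32_t Lane) {
  return {static_cast<std::uint16_t>(Lane & 0xFFFFu),
          static_cast<std::uint16_t>(Lane >> 16)};
}
```

The packed lane has the size of a float, but reading it as one 32-bit floating-point number would be meaningless, which is why an FR32X result could not usefully feed other FR32X instructions.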


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105269/new/

https://reviews.llvm.org/D105269

[PATCH] D105269: [X86] AVX512FP16 instructions enabling 6/6

2021-08-26 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86InstrFoldTables.cpp:1852
+  { X86::VFCMULCPHZrr, X86::VFCMULCPHZrm, 0 },
+  { X86::VFCMULCSHZrr, X86::VFCMULCSHZrm, 
TB_NO_REVERSE },
   { X86::VFMADDPD4Yrr, X86::VFMADDPD4Ymr, 0 },

pengfei wrote:
> LuoYuanke wrote:
> > Why FR32X version is not needed for complex scalar instructions?
> Do you mean complex ss/sd? We don't have these instructions.
No, I mean we have both X86::XXX and X86::XXX_Int for other instructions. One 
is FR16X which can be unfolded, one is VR128X which can't. For example, 
VFNMADD213SHZm and VFNMADD213SHZm_Int. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105269/new/

https://reviews.llvm.org/D105269

[PATCH] D105269: [X86] AVX512FP16 instructions enabling 6/6

2021-08-26 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/test/CodeGen/X86/avx512fp16-builtins.c:4223
+
+// CFC ADD PH
+

MADD?



Comment at: clang/test/CodeGen/X86/avx512fp16-builtins.c:4315
+
+// CF ADD PH
+

MADD?



Comment at: llvm/include/llvm/IR/IntrinsicsX86.td:5732
+
+  def int_x86_avx512fp16_mask_vfcmaddc_ph_128
+  : GCCBuiltin<"__builtin_ia32_vfcmaddcph128_mask">,

_cph?



Comment at: llvm/include/llvm/IR/IntrinsicsX86.td:5796
+  [ IntrNoMem, ImmArg> ]>;
+  def int_x86_avx512fp16_mask_vfmaddc_sh
+  : GCCBuiltin<"__builtin_ia32_vfmaddcsh_mask">,

_csh?



Comment at: llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp:3902
+  case X86::VFCMADDCSHZr:
+  case X86::VFCMADDCSHZrb:
+  case X86::VFCMADDCSHZrbk:

"b" means rounding. Right?



Comment at: llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp:3948
+for (unsigned i = 2; i < Inst.getNumOperands(); i++)
+  if (Inst.getOperand(i).isReg() && Dest == Inst.getOperand(i).getReg())
+return Warning(Ops[0]->getStartLoc(), "Destination register should be "

Sorry, I didn't find the constraint in the spec.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:47289
+return 0;
+  if (RHS->getOpcode() == ISD::BITCAST && RHS.hasOneUse() &&
+  (RHS->getOperand(0)->getOpcode() == X86ISD::VFMULC ||

Can swapping LHS and RHS reduce some of the redundant code?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:47296
+};
+int MulId = getMulId();
+const TargetOptions &Options = DAG.getTarget().Options;

The lambda seems to be called only once.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:47298
+const TargetOptions &Options = DAG.getTarget().Options;
+if ((Options.AllowFPOpFusion == FPOpFusion::Fast || Options.UnsafeFPMath) &&
+MulId < 2 && Subtarget.hasFP16() && IsAdd &&

Is it possible that fast and non-fast instructions are mixed due to inlining?
Should we check the instruction's AllowContract flag?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:47359
+//  t23: v16f32 = X86ISD::VFCMULC[X86ISD::VFMULC]
+//  t8, t22
+//  t24: v32f16 = bitcast t23

Merge it to previous line.



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:5781
+ (MaskOpNode  _.RC:$src1, (_.VT (_.BroadcastLdFrag 
addr:$src2))),
+ 0, 0, 0, ClobberConstraint>,
  EVEX_4V, EVEX_B,

Would moving ClobberConstraint before IsCommutable save the code for the default values?



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:13604
+
+  defm VFMULCPH  : avx512_cfmbinop_common<0xD6, "vfmulcph", x86vfmulc, 
x86vfmulc,
+x86vfmulcRnd>, T_MAP6XS, EVEX_CD8<32, 
CD8VF>;

The name does not seem accurate. Is it cfmop for mul and cfmaop for fma?



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:13640
+(v4f32 (OpNode VR128X:$src1, VR128X:$src2)),
+0, 0, 0, X86selects, "@earlyclobber $dst">, 
Sched<[sched.XMM]>;
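+defm rm : AVX512_maskable

I didn't see this flag for other scalar instructions. Why do we need it for
complex instructions?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105269/new/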

https://reviews.llvm.org/D105269

[PATCH] D105268: [X86] AVX512FP16 instructions enabling 5/6

2021-08-21 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, but please wait 1 or 2 days for comments from others.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105268/new/

https://reviews.llvm.org/D105268

[PATCH] D108509: [X86][AMX] Add missing inline attributes in AMX intrinsics. NFCI

2021-08-21 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108509/new/

https://reviews.llvm.org/D108509

[PATCH] D108422: [NFC][clang] Move remaining part of X86Target.def to llvm/Support/X86TargetParser.def

2021-08-21 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

>> Thanks for reminding. We've supported -march=${CPU}, but forgot to update 
>> this table. We will update it.
>
> Shall we get this patch committed first before making any changes?

Yes, committing the patch first looks good to me.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108422/new/

https://reviews.llvm.org/D108422

[PATCH] D108422: [NFC][clang] Move remaining part of X86Target.def to llvm/Support/X86TargetParser.def

2021-08-20 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

In D108422#2957541, @erichkeane wrote:

> In D108422#2957528, @RKSimon wrote:
>
>> There's nothing later than CannonLake here - does Intel need to at least 
>> reference up to Tiger/Rocketlake?
>
> @LuoYuanke ^^

Thanks for the reminder. We already support -march=${CPU}, but forgot to update
this table. We will update it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108422/new/

https://reviews.llvm.org/D108422

[PATCH] D105267: [X86] AVX512FP16 instructions enabling 4/6

2021-08-19 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, thanks. Please wait 1 or 2 days for comments from others.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105267/new/

https://reviews.llvm.org/D105267

[PATCH] D105268: [X86] AVX512FP16 instructions enabling 5/6

2021-08-19 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

I understand now. Thanks, Craig.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105268/new/

https://reviews.llvm.org/D105268

[PATCH] D105268: [X86] AVX512FP16 instructions enabling 5/6

2021-08-19 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/include/clang/Basic/BuiltinsX86.def:2010
+TARGET_BUILTIN(__builtin_ia32_vfmaddph, "V8xV8xV8xV8x", "ncV:128:", 
"avx512fp16,avx512vl")
+TARGET_BUILTIN(__builtin_ia32_vfmaddph256, "V16xV16xV16xV16x", "ncV:256:", 
"avx512fp16,avx512vl")
+

Can we keep the vfmaddph variants together? Move it to line 1997?
Why is there no mask version for 128 and 256?



Comment at: clang/include/clang/Basic/BuiltinsX86.def:2014
+
+TARGET_BUILTIN(__builtin_ia32_vfmaddsh3_mask, "V8xV8xV8xV8xUcIi", "ncV:128:", 
"avx512fp16")
+TARGET_BUILTIN(__builtin_ia32_vfmaddsh3_maskz, "V8xV8xV8xV8xUcIi", "ncV:128:", 
"avx512fp16")

What does "3" stand for?



Comment at: clang/lib/Headers/avx512vlfp16intrin.h:1385
+  __m128h __C) 
{
+  return (__m128h)__builtin_ia32_selectph_128(
+  (__mmask8)__U,

Sorry, I'm confused: sometimes we use the mask builtin, and sometimes we use the
select builtin. Is there any guideline for this?



Comment at: llvm/include/llvm/IR/IntrinsicsX86.td:5709
+
+  def int_x86_avx512fp16_vfmadd_ph_512
+  : Intrinsic<[ llvm_v32f16_ty ],

I notice there is no builtin bound to this intrinsic. What is it used for?



Comment at: llvm/include/llvm/IR/IntrinsicsX86.td:5727
+  [ IntrNoMem, ImmArg> ]>;
+  def int_x86_avx512fp16_vfmadd_f16
+  : Intrinsic<[ llvm_half_ty ],

ph?



Comment at: llvm/lib/Target/X86/X86InstrFMA3Info.cpp:148
+
+#define FP16_FMA3GROUP_PACKED_AVX512(Name, Suf, Attrs) 
\
+  FMA3GROUP_PACKED_AVX512_WIDTHS(Name, PH, Suf, Attrs)

Can we integrate it to FMA3GROUP_PACKED_AVX512() with PH extended?



Comment at: llvm/lib/Target/X86/X86InstrFMA3Info.cpp:151
+
+#define FP16_FMA3GROUP_PACKED_AVX512_ROUND(Name, Suf, Attrs)   
\
+  FMA3GROUP_MASKED(Name, PHZ##Suf, Attrs)

Ditto.



Comment at: llvm/lib/Target/X86/X86InstrFMA3Info.cpp:154
+
+#define FP16_FMA3GROUP_SCALAR_AVX512_ROUND(Name, Suf, Attrs)   
\
+  FMA3GROUP(Name, SHZ##Suf, Attrs) 
\

Ditto.



Comment at: llvm/lib/Target/X86/X86InstrFMA3Info.cpp:158
+
+static const X86InstrFMA3Group FP16BroadcastGroups[] = {
+  FP16_FMA3GROUP_PACKED_AVX512(VFMADD, mb, 0)

Ditto.



Comment at: llvm/lib/Target/X86/X86InstrFMA3Info.cpp:167
+
+static const X86InstrFMA3Group FP16RoundGroups[] = {
+  FP16_FMA3GROUP_PACKED_AVX512_ROUND(VFMADD, rb, 0)

Ditto.



Comment at: llvm/lib/Target/X86/X86InstrFMA3Info.cpp:208
  (BaseOpcode >= 0xB6 && BaseOpcode <= 0xBF));
+  bool IsFMA3H = (TSFlags & X86II::EncodingMask) == X86II::EVEX &&
+ (TSFlags & X86II::OpMapMask) == X86II::T_MAP6 &&

This looks like redundant logic. Are only X86II::EVEX and X86II::T_MAP6 special
for FP16?



Comment at: llvm/lib/Target/X86/X86InstrFMA3Info.cpp:235
+else
+  Table = makeArrayRef(FP16Groups);
+  }

It seems we only need FP16Groups to be a separate table.



Comment at: llvm/test/CodeGen/X86/avx512fp16-fma-commute.ll:9
+
+define half @fma_123_f16(half %x, half %y, half %z) {
+; CHECK-LABEL: fma_123_f16:

The name 123 does not match the generated instruction (213sh). Is that
expected?
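
For reference, the three FMA3 operand orders can be modeled with scalar helpers. This is a sketch under the usual reading of the mnemonic digits (the first two digits name the multiplied operands, the third names the addend, and operand 1 is the destination); the function names are hypothetical.

```cpp
#include <cmath>

// Op1 is the destination operand (also read as a source);
// Op2 and Op3 are the remaining sources.
double fmadd132(double Op1, double Op2, double Op3) {
  return std::fma(Op1, Op3, Op2); // Op1 * Op3 + Op2
}

double fmadd213(double Op1, double Op2, double Op3) {
  return std::fma(Op2, Op1, Op3); // Op2 * Op1 + Op3
}

double fmadd231(double Op1, double Op2, double Op3) {
  return std::fma(Op2, Op3, Op1); // Op2 * Op3 + Op1
}
```

If the IR computes x * y + z with x already in the destination register, the 213 form yields the same value because the multiplication commutes, so a test named after the IR operand order can legitimately select a 213 instruction.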



Comment at: llvm/test/CodeGen/X86/vec-strict-128-fp16.ll:105
 
+define <8 x half> @f13(<8 x half> %a, <8 x half> %b, <8 x half> %c) #0 {
+; CHECK-LABEL: f13:

Is it necessary to test 132, 231 version?



Comment at: llvm/test/CodeGen/X86/vec-strict-256-fp16.ll:105
+; CHECK:   # %bb.0:
+; CHECK-NEXT:vfmadd213ph %ymm2, %ymm1, %ymm0
+; CHECK-NEXT:ret{{[l|q]}}

Ditto.



Comment at: llvm/test/CodeGen/X86/vec-strict-512-fp16.ll:104
+; CHECK:   # %bb.0:
+; CHECK-NEXT:vfmadd213ph %zmm2, %zmm1, %zmm0
+; CHECK-NEXT:ret{{[l|q]}}

Ditto.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105268/new/

https://reviews.llvm.org/D105268

[PATCH] D105267: [X86] AVX512FP16 instructions enabling 4/6

2021-08-18 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:1920
+  setOperationAction(ISD::STRICT_FTRUNC,  VT, Legal);
+  setOperationAction(ISD::FRINT,  VT, Legal);
+  setOperationAction(ISD::STRICT_FRINT,   VT, Legal);

craig.topper wrote:
> LuoYuanke wrote:
> > Does this node mean "round to int"? How does it differ from "FNEARBYINT"?
> rint and nearbyint are both C math library functions. rint raises an 
> exception if the rounding isn't exact, nearbyint doesn't.
Got it. Thanks. :)
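
The difference Craig describes is observable through the floating-point status flags: the "exception" is the inexact flag. A minimal sketch, assuming a C++11 library with `<cfenv>` support; the helper names are made up for this example, and `volatile` is used only to keep the compiler from folding the calls away.

```cpp
#include <cfenv>
#include <cmath>

// Returns true if rounding X with std::rint sets FE_INEXACT.
bool rintRaisesInexact(double X) {
  volatile double In = X;
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double Out = std::rint(In);
  (void)Out;
  return std::fetestexcept(FE_INEXACT) != 0;
}

// Returns true if rounding X with std::nearbyint sets FE_INEXACT.
bool nearbyintRaisesInexact(double X) {
  volatile double In = X;
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double Out = std::nearbyint(In);
  (void)Out;
  return std::fetestexcept(FE_INEXACT) != 0;
}
```

Both functions return the same rounded value for a given rounding mode; only the side effect on the flag differs, and only when the input is not already an integer.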



Comment at: llvm/lib/Target/X86/X86InstrFoldTables.cpp:3037
   { X86::VSQRTSDr_Int, X86::VSQRTSDm_Int, 
TB_NO_REVERSE },
+  { X86::VSQRTSHZr,X86::VSQRTSHZm,0 },
+  { X86::VSQRTSHZr_Int,X86::VSQRTSHZm_Int,
TB_NO_REVERSE },

craig.topper wrote:
> LuoYuanke wrote:
> > Why no TB_NO_REVERSE for it?
> Only the _Int need TB_NO_REVERSE because the memory type is 16 bits but the
> register class is VR128X so the sizes are different. The unfolding code would
> use the size of the register class to do the unfold and create a
> vmovaps/vmovups which would increase the size of the load.
>
> For VSQRTSHZr, the register class is FR16X and the memory size is 16 bits so
> they match.
Got it. Thanks. :)
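
Craig's rule can be summarized as a size comparison. The sketch below is a deliberately simplified model, not LLVM's actual fold-table layout: `FoldRow` and `needsNoReverse` are hypothetical names, and sizes are given in bits.

```cpp
// Hypothetical, simplified fold-table row.
struct FoldRow {
  unsigned RegClassBits; // width of the register class of the folded operand
  unsigned MemBits;      // width of the memory access in the folded form
};

// Unfolding rebuilds a load sized after the register class. That is only
// safe when the rebuilt load does not read more than the folded form did.
bool needsNoReverse(FoldRow Row) { return Row.RegClassBits > Row.MemBits; }
```

With these assumptions, a VR128X operand backed by a 16-bit memory access (the _Int form) needs the flag, while an FR16X operand backed by a 16-bit access does not.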


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105267/new/

https://reviews.llvm.org/D105267

[PATCH] D105267: [X86] AVX512FP16 instructions enabling 4/6

2021-08-18 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:1920
+  setOperationAction(ISD::STRICT_FTRUNC,  VT, Legal);
+  setOperationAction(ISD::FRINT,  VT, Legal);
+  setOperationAction(ISD::STRICT_FRINT,   VT, Legal);

Does this node mean "round to int"? How does it differ from "FNEARBYINT"?



Comment at: llvm/lib/Target/X86/X86ISelLowering.h:290
 
+// AVX-512-FP16 scalar reciprocal approximations
+FRSQRTS,

Move the code to line 283, so that it is adjacent to FRSQRT and FRCP?



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:9279
 /// avx512_fp14_s rcp14ss, rcp14sd, rsqrt14ss, rsqrt14sd
 multiclass avx512_fp14_s<bits<8> opc, string OpcodeStr, SDNode OpNode,
+ X86FoldableSchedWrite sched, X86VectorVTInfo _,

The name is no longer precise, since we now support non-fp14 nodes. Please also
update the comments.



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:9484
+  defm PHZ : avx512_fp28_p,
+  avx512_fp28_p_sae,
+  T_MAP6PD, EVEX_V512, EVEX_CD8<16, CD8VF>;

indent.



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:13476
+
+multiclass avx512_fp16_p_vl_all<bits<8> opc, string OpcodeStr, SDNode OpNode,
+   X86SchedWriteWidths sched> {

Why not merge this class into avx512_fp14_p_vl_all? Is it because it doesn't use
MXCSR?



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:13477
+multiclass avx512_fp16_p_vl_all<bits<8> opc, string OpcodeStr, SDNode OpNode,
+   X86SchedWriteWidths sched> {
+  let Predicates = [HasFP16] in

indent.



Comment at: llvm/lib/Target/X86/X86InstrFoldTables.cpp:3037
   { X86::VSQRTSDr_Int, X86::VSQRTSDm_Int, 
TB_NO_REVERSE },
+  { X86::VSQRTSHZr,X86::VSQRTSHZm,0 },
+  { X86::VSQRTSHZr_Int,X86::VSQRTSHZm_Int,
TB_NO_REVERSE },

Why no TB_NO_REVERSE for it?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105267/new/

https://reviews.llvm.org/D105267

[PATCH] D105267: [X86] AVX512FP16 instructions enabling 4/6

2021-08-17 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/include/clang/Basic/BuiltinsX86.def:1897
+
+TARGET_BUILTIN(__builtin_ia32_rndscaleph_128_mask, "V8xV8xIiV8xUc", 
"ncV:128:", "avx512fp16,avx512vl")
+TARGET_BUILTIN(__builtin_ia32_rndscaleph_256_mask, "V16xV16xIiV16xUs", 
"ncV:256:", "avx512fp16,avx512vl")

The naming convention is not consistent. Rename it to rndscaleph128?



Comment at: clang/include/clang/Basic/BuiltinsX86.def:1898
+TARGET_BUILTIN(__builtin_ia32_rndscaleph_128_mask, "V8xV8xIiV8xUc", 
"ncV:128:", "avx512fp16,avx512vl")
+TARGET_BUILTIN(__builtin_ia32_rndscaleph_256_mask, "V16xV16xIiV16xUs", 
"ncV:256:", "avx512fp16,avx512vl")
+TARGET_BUILTIN(__builtin_ia32_rndscaleph_mask, "V32xV32xIiV32xUiIi", 
"ncV:512:", "avx512fp16")

Ditto.



Comment at: clang/include/clang/Basic/BuiltinsX86.def:1899
+TARGET_BUILTIN(__builtin_ia32_rndscaleph_256_mask, "V16xV16xIiV16xUs", 
"ncV:256:", "avx512fp16,avx512vl")
+TARGET_BUILTIN(__builtin_ia32_rndscaleph_mask, "V32xV32xIiV32xUiIi", 
"ncV:512:", "avx512fp16")
+TARGET_BUILTIN(__builtin_ia32_reduceph128_mask, "V8xV8xIiV8xUc", "ncV:128:", 
"avx512fp16,avx512vl")

rndscaleph512?



Comment at: clang/include/clang/Basic/BuiltinsX86.def:1906
+TARGET_BUILTIN(__builtin_ia32_getmantsh_round_mask, "V8xV8xV8xIiV8xUcIi", 
"ncV:128:", "avx512fp16")
+TARGET_BUILTIN(__builtin_ia32_getexpsh128_round_mask, "V8xV8xV8xV8xUcIi", 
"ncV:128:", "avx512fp16")
+TARGET_BUILTIN(__builtin_ia32_scalefsh_round_mask, "V8xV8xV8xV8xUcIi", 
"ncV:128:", "avx512fp16")

The naming convention is not consistent for the scalar versions. getexpsh?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105267/new/

https://reviews.llvm.org/D105267

[PATCH] D105265: [X86] AVX512FP16 instructions enabling 3/6

2021-08-16 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, but please wait 1 or 2 days for comments from others.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105265/new/

https://reviews.llvm.org/D105265

[PATCH] D105331: [CFE][X86] Enable complex _Float16 support

2021-08-16 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, but please wait 1 or 2 days to see if there are any comments from others.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105331/new/

https://reviews.llvm.org/D105331

[PATCH] D105331: [CFE][X86] Enable complex _Float16.

2021-08-16 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

Could you add a link to the ABI in the commit message?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105331/new/

https://reviews.llvm.org/D105331

[PATCH] D105265: [X86] AVX512FP16 instructions enabling 3/6

2021-08-16 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

Thanks, Craig, for the clarification!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105265/new/

https://reviews.llvm.org/D105265

[PATCH] D105265: [X86] AVX512FP16 instructions enabling 3/6

2021-08-15 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:1955
   setOperationAction(ISD::SCALAR_TO_VECTOR,   MVT::v32f16, Custom);
+  setOperationAction(ISD::SINT_TO_FP, MVT::v32i16, Legal);
+  setOperationAction(ISD::STRICT_SINT_TO_FP,  MVT::v32i16, Legal);

Sorry, I'm confused about why the type is the same for ISD::SINT_TO_FP and
ISD::FP_TO_SINT. Does legalization use the source type for ISD::SINT_TO_FP and
the destination type for ISD::FP_TO_SINT? Why not unify on the destination type?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:1996
   setOperationAction(ISD::SCALAR_TO_VECTOR,   MVT::v16f16, Custom);
+  setOperationAction(ISD::SINT_TO_FP, MVT::v16i16, Legal);
+  setOperationAction(ISD::STRICT_SINT_TO_FP,  MVT::v16i16, Legal);

How do we know it converts to v16f16? Could it convert to v16f32?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:2054
+  // vcvttph2[u]dq v4f16 -> v4i32/64, v2f16 -> v2i32/64
+  setOperationAction(ISD::FP_TO_SINT,MVT::v2f16, Custom);
+  setOperationAction(ISD::STRICT_FP_TO_SINT, MVT::v2f16, Custom);

Why is it not v2i16?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:19998
+  SDLoc dl(Op);
+  SDValue InVec = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, MVT::v2i64, Src);
+  if (IsStrict) {

Should this node be chained to Op.getOperand(0) for strict FP, and the convert
operation be chained to this node?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22003
+  MakeLibCallOptions CallOptions;
+  return makeLibCall(DAG, LC, VT, In, CallOptions, SDLoc(Op)).first;
+}

InChain for strict FP?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22014
+SDValue Res = DAG.getNode(ISD::CONCAT_VECTORS, DL, MVT::v8f16, In,
+  DAG.getUNDEF(MVT::v4f16));
+if (IsStrict)

Is there any case for v3f16?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:31087
+// Now widen to 128 bits.
+unsigned NumConcats = 128 / TmpVT.getSizeInBits();
+MVT ConcatVT = MVT::getVectorVT(EleVT.getSimpleVT(), 8 * NumConcats);

Is it possible the type is i3/i5/i7?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:31265
+  if (Src.getValueType().getVectorElementType() == MVT::i16)
+return;
+

Where is vXi16 handled? Is it eventually promoted to vXi32?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:31280
+unsigned Opc = IsSigned ? X86ISD::CVTSI2P : X86ISD::CVTUI2P;
+Results.push_back(DAG.getNode(Opc, dl, MVT::v8f16, Src));
+  }

Isn't the result type changed to v8f16? Why don't we extract the sub-vector here?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:49327
+  // UINT_TO_FP(vXi33~63) -> UINT_TO_FP(ZEXT(vXi33~63 to vXi64))
+  if (InVT.isVector() && VT.getVectorElementType() == MVT::f16) {
+unsigned ScalarSize = InVT.getScalarSizeInBits();

Need to check Subtarget.hasFP16() ?



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:8194
+  }
+  def : InstAlias(NAME # "Z128rr") VR128X:$dst,

What is the alias used for? Can't it be distinguished from operand?



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:8193
+defm Z128 : avx512_vcvt_fp, EVEX_V128;

Why null_frag instead of X86vfpround? 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105265/new/

https://reviews.llvm.org/D105265

[PATCH] D105264: [X86] AVX512FP16 instructions enabling 2/6

2021-08-12 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, but please wait 1 or 2 days for comments from others.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105264/new/

https://reviews.llvm.org/D105264

[PATCH] D105264: [X86] AVX512FP16 instructions enabling 2/6

2021-08-12 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/lib/Headers/avx512vlfp16intrin.h:368
+_mm256_reduce_add_ph(__m256h __W) {
+  return __builtin_ia32_reduce_fadd_ph256(0.0f16, __W);
+}

According to https://llvm.org/docs/LangRef.html#llvm-vector-reduce-add-intrinsic,
would -0.0f16 be a better start value?
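
The reason -0.0 is the neutral start value for a floating-point add reduction is that +0.0 changes the sign of an all-negative-zero input. A minimal sketch using `float` in place of `_Float16`, assuming the default round-to-nearest mode and no fast-math flags; the helper names are hypothetical.

```cpp
#include <cmath>

// Sequential floating-point add reduction with an explicit start value.
float reduceFAdd(float Start, const float *Vals, int N) {
  float Acc = Start;
  for (int I = 0; I < N; ++I)
    Acc += Vals[I];
  return Acc;
}

// Reduces {-0.0, -0.0} and reports whether the result is still -0.0.
bool resultIsNegativeZero(float Start) {
  const float Vals[2] = {-0.0f, -0.0f};
  float R = reduceFAdd(Start, Vals, 2);
  return R == 0.0f && std::signbit(R);
}
```

Starting from -0.0 preserves the sign of the input, whereas starting from +0.0 produces +0.0 because (+0.0) + (-0.0) rounds to +0.0.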


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105264/new/

https://reviews.llvm.org/D105264

[PATCH] D105265: [X86] AVX512FP16 instructions enabling 3/6

2021-08-12 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/lib/Headers/avx512fp16intrin.h:1748
+
+#define _mm_cvt_roundsh_i32(A, R)  
\
+  (int)__builtin_ia32_vcvtsh2si32((__v8hf)(A), (int)(R))

Does it also return i32 on the x86_64 platform? We may unify the intrinsic for
both x86 and x86_64 to return i32.



Comment at: clang/lib/Headers/avx512fp16intrin.h:1874
+
+static __inline__ __m512 __DEFAULT_FN_ATTRS512 _mm512_cvtxph_ps(__m256h __A) {
+  return (__m512)__builtin_ia32_vcvtph2psx512_mask(

VCVTPH2PSX supports broadcast, unlike VCVTPH2PS, but for intrinsics there is
no difference. Do we need to add the new intrinsics? Ditto for its variants.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105265/new/

https://reviews.llvm.org/D105265

[PATCH] D107946: [X86] Reverse _set_ph and _setr_ph 's set order.

2021-08-12 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

Any test case update?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107946/new/

https://reviews.llvm.org/D107946

[PATCH] D105264: [X86] AVX512FP16 instructions enabling 2/6

2021-08-11 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86InstrFoldTables.cpp:4838
   { X86::VMULSDZrr_Intk,X86::VMULSDZrm_Intk,
TB_NO_REVERSE },
+  { X86::VMULSHZrr_Intk,X86::VMULSHZrm_Intk,
TB_NO_REVERSE },
   { X86::VMULSSZrr_Intk,X86::VMULSSZrm_Intk,
TB_NO_REVERSE },

Is this because intrinsics always assume the arguments are passed in register?



Comment at: llvm/test/CodeGen/X86/avx512fp16-fmaxnum.ll:26
+; CHECK:   # %bb.0:
+; CHECK-NEXT:vmaxph %xmm0, %xmm1, %xmm2 # encoding: 
[0x62,0xf5,0x74,0x08,0x5f,0xd0]
+; CHECK-NEXT:vcmpunordph %xmm0, %xmm0, %k1 # encoding: 
[0x62,0xf3,0x7c,0x08,0xc2,0xc8,0x03]

Is it legal without avx512vl?



Comment at: llvm/test/CodeGen/X86/avx512fp16-fold-load-binops.ll:7
+;  _mm_add_ss(a, _mm_load_ss(b));
+
+define <8 x half> @addsh(<8 x half> %va, half* %pb) {

Any case for max/min?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105264/new/

https://reviews.llvm.org/D105264

[PATCH] D105264: [X86] AVX512FP16 instructions enabling 2/6

2021-08-11 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp:3197
+  else if (PatchedName.endswith("sh"))
+PatchedName = IsVCMP ? "vcmpsh" : "cmpsh";
+  else if (PatchedName.endswith("ph"))

There is no cmpsh?



Comment at: llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp:3199
+  else if (PatchedName.endswith("ph"))
+PatchedName = IsVCMP ? "vcmpph" : "cmpph";
   else

We only have vcmpph?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:1873
   setOperationAction(ISD::SETCC,  VT, Custom);
+  setOperationAction(ISD::STRICT_FSETCC,  VT, Custom);
+  setOperationAction(ISD::STRICT_FSETCCS, VT, Custom);

Is this related to FP16?



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:2674
+let Predicates = [HasFP16] in {
+  def : Pat<(v1i1 (X86cmpms(loadf16 addr:$src2), FR16X:$src1,
+  CommutableCMPCC : $cc)),

X86cmpms (



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:2675
+  def : Pat<(v1i1 (X86cmpms(loadf16 addr:$src2), FR16X:$src1,
+  CommutableCMPCC : $cc)),
+   (VCMPSHZrm FR16X:$src1, addr:$src2, imm:$cc)>;

CommutableCMPCC:$cc


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105264/new/

https://reviews.llvm.org/D105264

[PATCH] D105264: [X86] AVX512FP16 instructions enabling 2/6

2021-08-10 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/include/clang/Basic/BuiltinsX86.def:1860
+TARGET_BUILTIN(__builtin_ia32_minph512,  "V32xV32xV32xIi", "ncV:512:", "avx512fp16")
+
+TARGET_BUILTIN(__builtin_ia32_minph256,  "V16xV16xV16x", "ncV:256:", "avx512fp16,avx512vl")

Why are there no 256-bit and 128-bit versions of addph, subph, mulph, divph?



Comment at: clang/lib/Headers/avx512fp16intrin.h:312
+   __m128h B) {
+  return __builtin_ia32_vcomish((__v8hf)A, (__v8hf)B, _CMP_NEQ_US,
+_MM_FROUND_CUR_DIRECTION);

_CMP_NEQ_OS?



Comment at: clang/lib/Headers/avx512fp16intrin.h:318
+   __m128h B) {
+  return __builtin_ia32_vcomish((__v8hf)A, (__v8hf)B, _CMP_EQ_OQ,
+_MM_FROUND_CUR_DIRECTION);

Why is it OQ and not UQ? Ditto for all the other ucomi intrinsics.
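
To make the O/U question concrete, here is a minimal scalar sketch in C of what the two predicate families mean for an equality compare. The helper names (is_nan, eq_ordered, eq_unordered, check_eq_predicates) are hypothetical and not part of the patch, and the sketch models only the ordered/unordered half of the predicate, not the quiet/signaling (Q/S) half.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* NaN is the only value that compares unequal to itself. */
static bool is_nan(float x) { return x != x; }

/* Ordered equality (the "O" in _CMP_EQ_OQ): false if either operand is NaN. */
static bool eq_ordered(float a, float b) {
  return !is_nan(a) && !is_nan(b) && a == b;
}

/* Unordered equality (the "U" in _CMP_EQ_UQ): true if either operand is NaN. */
static bool eq_unordered(float a, float b) {
  return is_nan(a) || is_nan(b) || a == b;
}

/* Small usage check: the two forms differ only when a NaN is involved. */
static void check_eq_predicates(void) {
  assert(eq_ordered(1.0f, 1.0f) && eq_unordered(1.0f, 1.0f));
  assert(!eq_ordered(NAN, 1.0f));
  assert(eq_unordered(NAN, 1.0f));
}
```

The two forms agree on every non-NaN input, so the choice only matters for what the intrinsic returns when an operand is NaN.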



Comment at: clang/lib/Headers/avx512fp16intrin.h:516
+  return (__m512h)__builtin_ia32_maxph512((__v32hf)__A, (__v32hf)__B,
+  _MM_FROUND_CUR_DIRECTION);
+}

Why is there rounding control for the max/min operations?



Comment at: clang/lib/Headers/avx512fp16intrin.h:669
+  __A = _mm_div_sh(__A, __B);
+  return __builtin_ia32_selectsh_128(__U, __A, __W);
+}

Will it be combined into one instruction? If __B[0] is 0 and mask[0] is 0, is 
there no exception?



Comment at: clang/lib/Headers/avx512fp16intrin.h:698
+  (__v8hf)__A, (__v8hf)__B, (__v8hf)_mm_setzero_ph(), (__mmask8)-1,
+  _MM_FROUND_CUR_DIRECTION);
+}

Do we have rounding control for min?



Comment at: clang/lib/Headers/avx512fp16intrin.h:757
+
+#define _mm_max_round_sh(A, B, R)  \
+  (__m128h) __builtin_ia32_maxsh_round_mask(   \

This name may be misleading; it means "suppress exceptions", right?



Comment at: clang/lib/Headers/avx512fp16intrin.h:952
 
+#define _mm512_mask_reduce_operator(op)\
+  __m256h __t1 = (__m256h)_mm512_extractf64x4_pd((__m512d)__W, 0); \

It seems there is no mask for the reduce operation.



Comment at: clang/lib/Headers/avx512fp16intrin.h:963
+  __m128h __t9 = (__m128h)__builtin_shufflevector((__m128)__t8, (__m128)__t8,  \
+  1, 0, 2, 3); \
+  __m128h __t10 = __t8 op __t9;\

Not sure whether there is any room to optimize here. The operation on elements 
2 and 3 is unnecessary.



Comment at: clang/lib/Headers/avx512vlfp16intrin.h:366
 
+#define _mm256_mask_reduce_operator(op)\
+  __m128h __t1 = (__m128h)_mm256_extracti128_si256((__m256i)__W, 0);   \

Ditto



Comment at: clang/lib/Headers/avx512vlfp16intrin.h:394
+
+#define _mm256_mask_reduce_operator(op)\
+  __m128h __t1 = (__m128h)_mm256_extracti128_si256((__m256i)__V, 0);   \

Ditto.



Comment at: clang/test/CodeGen/X86/avx512fp16-builtins.c:639
+  return _mm512_max_round_ph(__A, __B, _MM_FROUND_NO_EXC);
+}
+__m512h test_mm512_mask_max_round_ph(__m512h __W, __mmask32 __U, __m512h __A, __m512h __B) {

Need a blank line?



Comment at: clang/test/CodeGen/X86/avx512fp16-builtins.c:645
+  return _mm512_mask_max_round_ph(__W, __U, __A, __B, _MM_FROUND_NO_EXC);
+}
+__m512h test_mm512_maskz_max_round_ph(__mmask32 __U, __m512h __A, __m512h __B) {

Ditto.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105264/new/

https://reviews.llvm.org/D105264

[PATCH] D105331: [CFE][X86] Enable complex _Float16.

2021-08-09 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

Could you check the failing test cases?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105331/new/

https://reviews.llvm.org/D105331

[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-09 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, but please wait 1 or 2 days for comments from others.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-08 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:3471
+  ContainsFloatAtOffset(IRType, IROffset + 4, getDataLayout()))
+return llvm::FixedVectorType::get(llvm::Type::getHalfTy(getVMContext()), 4);
+

For 2 floats, return <2 x float> to stay compatible with the previous ABI?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-06 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4478
+  let Predicates = [HasFP16] in {
+def VMOVSHZrr_REV: AVX512<0x11, MRMDestReg, (outs VR128X:$dst),
+(ins VR128X:$src1, VR128X:$src2),

pengfei wrote:
> craig.topper wrote:
> > pengfei wrote:
> > > LuoYuanke wrote:
> > > > Sorry, I forgot what REV stand for. Do you know it?
> > > > Is this just encoding difference for register operand compared with 
> > > > VMOVSHZrr? What is it used for?
> > > I think REV is short for revert. Which allows a different encoding when 
> > > operands order are reverted.
> > > Yes. It's used for a different encoding.
> > It is short for "reverse", meaning the operands are in the reversed order. 
> > There are two valid encodings moving from one register to another. This 
> > happens because there are separate opcodes for moving register to 
> > memory(Store) and moving memory to register(load). The memory operand for 
> > both of those opcodes can be a register as well. The assembler and isel 
> > always uses the register to register version of the load opcode. The 
> > reversed version is only used by the disassembler
> > 
> > There is an exception to that. For VEX encoded AVX/AVX2 instructions, 
> > X86MCInstLowering will use an _REV move if it allows a 2 byte VEX prefix 
> > instead of a 3 byte VEX prefix. This doesn't apply to any AVX512 
> > instructions though. 
> Thanks Craig for the information.

I understand now. Thanks, Craig and Pengfei.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-06 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll:374
+; SSE-NEXT:movl %edi, %ebp
+; SSE-NEXT:movzwl %bx, %edi
 ; SSE-NEXT:callq __gnu_h2f_ieee@PLT

Why does this test case change? Should we add -mattr=+avx512fp16 to the RUN line?



Comment at: llvm/test/CodeGen/X86/vector-reduce-fmin-nnan.ll:373
+; SSE-NEXT:movl %edi, %ebp
+; SSE-NEXT:movzwl %bx, %edi
 ; SSE-NEXT:callq __gnu_h2f_ieee@PLT

Ditto.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-06 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:82
+  PatFrags ScalarIntMemFrags = !if (!eq (EltTypeName, "f16"),
+   !cast<PatFrags>("sse_load_f16"),
+   !if (!eq (EltTypeName, "f32"),

indent



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:3878
+}
+let Predicates = [HasFP16, HasVLX] in {
+  def : Pat<(v16f16 (vselect VK16WM:$mask, (v16f16 VR256X:$src1), (v16f16 VR256X:$src0))),

Not sure whether this can be merged with the 512-bit load/store patterns by 
using a multiclass that abstracts the type info.



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4159
+defm VMOVSHZ : avx512_move_scalar<"vmovsh", X86Movsh, X86vzload16, f16x_info,
+  [HasFP16]>,
+  VEX_LIG, T_MAP5XS, EVEX_CD8<16, CD8VT1>;

Why is there no OptForSize for vmovsh?



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4478
+  let Predicates = [HasFP16] in {
+def VMOVSHZrr_REV: AVX512<0x11, MRMDestReg, (outs VR128X:$dst),
+(ins VR128X:$src1, VR128X:$src2),

Sorry, I forgot what REV stands for. Do you know?
Is this just an encoding difference for the register operand compared with 
VMOVSHZrr? What is it used for?



Comment at: llvm/lib/Target/X86/X86RegisterInfo.td:570
 def VR64: RegisterClass<"X86", [x86mmx], 64, (sequence "MM%u", 0, 7)>;
-def VR128 : RegisterClass<"X86", [v4f32, v2f64, v16i8, v8i16, v4i32, v2i64, f128],
+def VR128 : RegisterClass<"X86", [v4f32, v2f64, v8f16, v16i8, v8i16, v4i32, v2i64, f128],
   128, (add FR32)>;

Given there are only EVEX instructions for fp16, is it necessary to add the f16 
type to it?



Comment at: llvm/lib/Target/X86/X86RegisterInfo.td:572
   128, (add FR32)>;
-def VR256 : RegisterClass<"X86", [v8f32, v4f64, v32i8, v16i16, v8i32, v4i64],
+def VR256 : RegisterClass<"X86", [v8f32, v4f64, v16f16, v32i8, v16i16, v8i32, v4i64],
   256, (sequence "YMM%u", 0, 15)>;

Ditto.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-05 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp:801
   //  0b00010: implied 0F 38 leading opcode bytes
   //  0b00011: implied 0F 3A leading opcode bytes
   //  0b00100-0b11111: Reserved for future use

Add comments for map5 and map6?



Comment at: llvm/lib/Target/X86/X86.td:189
+// guarded under condition hasVLX. So we imply it in FeatureFP16 currently.
+// FIXME: FP16 conversion between f16 and i64 customise type v8i64, which is
+// supposed to be guarded under condition hasDQI. So we imply it in FeatureFP16

customize?



Comment at: llvm/lib/Target/X86/X86FastISel.cpp:2291
+  case MVT::i16: Opc = X86::CMOV_GR16;  break;
+  case MVT::f16: Opc = X86::CMOV_FR16X; break;
+  case MVT::i32: Opc = X86::CMOV_GR32;  break;

Also add it in isCMOVPseudo()?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:1946
+setGroup(VT);
+  }
+  setOperationAction(ISD::SCALAR_TO_VECTOR,   MVT::v8f16,  Legal);

Drop the brace.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:10549
 
-  if (EltVT == MVT::i32 || EltVT == MVT::f32 || EltVT == MVT::f64 ||
-  (EltVT == MVT::i64 && Subtarget.is64Bit())) {
+  if (EltVT == MVT::i32 || EltVT == MVT::f16 || EltVT == MVT::f32 ||
+  EltVT == MVT::f64 || (EltVT == MVT::i64 && Subtarget.is64Bit()) ||

Need to check Subtarget.hasFP16()?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:10551
+  EltVT == MVT::f64 || (EltVT == MVT::i64 && Subtarget.is64Bit()) ||
+  (EltVT == MVT::i16 && Subtarget.hasFP16())) {
 assert((VT.is128BitVector() || VT.is256BitVector() ||

Why handle i16? Isn't it handled by movw?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:10744
   // For SSE 4.1, use insertps to put the high elements into the low element.
-  if (Subtarget.hasSSE41()) {
+  if (Subtarget.hasSSE41() && EltVT != MVT::f16) {
 SDValue Result;

Why exclude f16? Is there a better choice for fp16?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:19023
 
-// SHUFPS the element to the lowest double word, then movss.
-int Mask[4] = { static_cast<int>(IdxVal), -1, -1, -1 };
+// SHUFPS the element to the lowest double word, then movsh.
+SmallVector Mask(VecVT.getVectorNumElements(), -1);

movss/movsh


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2021-08-04 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:3405
+/// half member at the specified offset.  For example, {int,{half}} has a
+/// float at offset 4.  It is conservatively correct for this routine to return
+/// false.

float -> half?



Comment at: clang/lib/Headers/avx512fp16intrin.h:292
+
+  return (__m128h)__builtin_ia32_loadsh128_mask((__v8hf *)__A, src, __U & 1);
+}

Just curious: why not use __W directly?



Comment at: clang/lib/Headers/avx512fp16intrin.h:319
+__m512h_u __v;
+  } __attribute__((__packed__, __may_alias__));
+  return ((const struct __loadu_ph *)__p)->__v;

What is __may_alias__ used for?
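
For reference, a minimal sketch in C of the idiom in question, using uint32_t in place of __m512h_u so it builds without AVX512-FP16 support. The names loadu_u32, loadu_u32_memcpy, and check_loadu are hypothetical and not from the patch. As far as I understand the GCC/Clang attributes, __packed__ drops the struct's alignment requirement to 1 so the access may be unaligned, and __may_alias__ exempts the access from type-based (strict) aliasing assumptions, so the pointer may legally point at storage of any type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unaligned, aliasing-safe load through a packed, may_alias wrapper struct. */
static uint32_t loadu_u32(const void *p) {
  struct packed_u32 {
    uint32_t v;
  } __attribute__((__packed__, __may_alias__));
  return ((const struct packed_u32 *)p)->v;
}

/* Portable reference implementation of the same load, for comparison. */
static uint32_t loadu_u32_memcpy(const void *p) {
  uint32_t v;
  memcpy(&v, p, sizeof v);
  return v;
}

/* Usage check: load from an odd (misaligned) offset inside a byte buffer. */
static void check_loadu(void) {
  unsigned char buf[8] = {0, 1, 2, 3, 4, 5, 6, 7};
  assert(loadu_u32(buf + 1) == loadu_u32_memcpy(buf + 1));
}
```

Without __may_alias__, reading a byte buffer through a uint32_t-typed lvalue would be open to strict-aliasing miscompilation at higher optimization levels.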



Comment at: clang/lib/Headers/avx512fp16intrin.h:350
+   __m128h __A) {
+  __builtin_ia32_storesh128_mask((__v8hf *)__W, __A, __U & 1);
+}

I see that in _mm_mask_load_sh() we create a __m128h with the upper bits zeroed; 
not sure whether we also need that in the store intrinsic.



Comment at: clang/lib/Headers/avx512fp16intrin.h:419
+static __inline__ short __DEFAULT_FN_ATTRS128 _mm_cvtsi128_si16(__m128i __a) {
+  __v8hi __b = (__v8hi)__a;
+  return __b[0];

Why not return __a[0] directly?



Comment at: clang/test/CodeGen/X86/avx512fp16-abi.c:89
+  _Float16 a;
+  float b;
+};

Is there a negative test case that has padding between a and b?
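
A minimal sketch in C of the layout being asked about. It uses uint16_t as a stand-in for _Float16 (an assumption: both are 2 bytes with 2-byte alignment) and assumes float is 4 bytes with 4-byte alignment, as on x86-64. The struct and function names are hypothetical and not from the test file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t half_bits; /* stand-in for _Float16 (assumed 2 bytes) */

/* Same shape as the struct in the test: a half followed by a float. */
struct half_then_float {
  half_bits a;
  float b;
};

/* Packed variant with no padding, for contrast. */
struct __attribute__((__packed__)) packed_half_then_float {
  half_bits a;
  float b;
};

/* Bytes of padding the compiler inserts between a and b. */
static size_t padding_between_a_and_b(void) {
  return offsetof(struct half_then_float, b) - sizeof(half_bits);
}

static void check_layout(void) {
  assert(offsetof(struct half_then_float, b) == 4);
  assert(padding_between_a_and_b() == 2);
  assert(offsetof(struct packed_half_then_float, b) == 2);
}
```

Under these assumptions the unpacked struct has 2 padding bytes at offsets 2 and 3, so there is no half or float at offset 2.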



Comment at: llvm/include/llvm/IR/Intrinsics.td:315
 def llvm_v8f16_ty  : LLVMType<v8f16>;    //  8 x half (__fp16)
+def llvm_v16f16_ty : LLVMType<v16f16>;   // 16 x half (__fp16)
+def llvm_v32f16_ty : LLVMType<v32f16>;   // 32 x half (__fp16)

Not sure about the legacy comments, should it be _Float16 now?



Comment at: llvm/include/llvm/Target/TargetSelectionDAG.td:1054
+def extloadvf16 : PatFrag<(ops node:$ptr), (extload node:$ptr)> {
+  let IsLoad = 1;
+  let ScalarMemoryVT = f16;

I notice the other extloads use "true". Is 1 the same as "true"?



Comment at: llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp:341
 if ((insn->mode == MODE_64BIT || (byte1 & 0xc0) == 0xc0) &&
-((~byte1 & 0xc) == 0xc) && ((byte2 & 0x4) == 0x4)) {
+((~byte1 & 0x8) == 0x8) && ((byte2 & 0x4) == 0x4)) {
   insn->vectorExtensionType = TYPE_EVEX;

Is this the same as ((byte1 & 0x8) == 0x0)?
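
A quick way to convince oneself, sketched in C: test both forms exhaustively over all 256 byte values. The function names are hypothetical and not from the disassembler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Form used in the patch: bit 3 of the inverted byte is set. */
static bool bit3_clear_inverted(uint8_t byte1) {
  return (~byte1 & 0x8) == 0x8;
}

/* Suggested form: bit 3 of the original byte is clear. */
static bool bit3_clear_direct(uint8_t byte1) {
  return (byte1 & 0x8) == 0x0;
}

/* Returns true if the two forms agree for every possible byte value. */
static bool forms_agree_for_all_bytes(void) {
  for (unsigned v = 0; v <= 0xff; ++v)
    if (bit3_clear_inverted((uint8_t)v) != bit3_clear_direct((uint8_t)v))
      return false;
  return true;
}

static void check_bit3_forms(void) {
  assert(forms_agree_for_all_bytes());
}
```

Both forms test a single bit, so they are equivalent; the inverted form only differs in meaning when the mask has more than one bit set, as in the old 0xc check.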


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

[PATCH] D105269: [X86] AVX512FP16 instructions enabling 6/6

2021-07-14 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/test/CodeGen/X86/avx512cfma-intrinsics.ll:3
+; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl -mattr=+avx512bw -mattr=+avx512fp16 -mattr=+avx512vl | FileCheck %s
+
+declare <4 x float> @llvm.x86.avx512fp16.mask.vfmaddc.ph.128(<4 x float>, <4 x float>, <4 x float>, i8)

Are we missing a broadcast test case?



Comment at: llvm/test/CodeGen/X86/avx512cfmul-intrinsics.ll:3
+; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl -mattr=+avx512bw -mattr=+avx512fp16 -mattr=+avx512vl | FileCheck %s
+
+declare <4 x float> @llvm.x86.avx512fp16.mask.vfmulc.ph.128(<4 x float>, <4 x float>, <4 x float>, i8)

Are we missing a broadcast test case?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105269/new/

https://reviews.llvm.org/D105269

[PATCH] D99675: [llvm][clang] Create new intrinsic llvm.arithmetic.fence to control FP optimization at expression level

2021-06-23 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, but please wait 1 or 2 days to see if there are any more comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99675/new/

https://reviews.llvm.org/D99675

[PATCH] D103784: [X86] Support __tile_stream_loadd intrinsic for new AMX interface

2021-06-09 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM. Thank you!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103784/new/

https://reviews.llvm.org/D103784

[PATCH] D103784: [X86] Support __tile_stream_loadd intrinsic for new AMX interface

2021-06-08 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:124
 
 bool X86FastTileConfig::isTileLoad(MachineInstr &MI) {
+  return MI.getOpcode() == X86::PTILELOADDV ||

Also add the stream load for X86PreAMXConfig.cpp: isTileLoad().



Comment at: llvm/lib/Target/X86/X86InstrAMX.td:57
+def PTILELOADDT1V : PseudoI<(outs TILE:$dst), (ins GR16:$src1,
+ GR16:$src2,
+ opaquemem:$src3), []>;

indent.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103784/new/

https://reviews.llvm.org/D103784

[PATCH] D99675: [llvm][clang] Create new intrinsic llvm.arith.fence to control FP optimization at expression level

2021-06-03 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

We should add a description of the intrinsic to docs/LangRef.rst.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99675/new/

https://reviews.llvm.org/D99675

[PATCH] D101059: [X86][AMX] Add description for AMX new interface.

2021-04-27 Thread LuoYuanke via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGd6c6db2feaab: [X86][AMX] Add description for AMX new interface. (authored by LuoYuanke).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101059/new/

https://reviews.llvm.org/D101059

Files:
  clang/lib/Headers/amxintrin.h

Index: clang/lib/Headers/amxintrin.h
===
--- clang/lib/Headers/amxintrin.h
+++ clang/lib/Headers/amxintrin.h
@@ -30,7 +30,7 @@
 /// config and the tile data, and the tiles are zeroed. Any invalid
 /// configurations will result in #GP fault.
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  LDTILECFG  instruction.
 ///
@@ -46,7 +46,7 @@
 /// palette, the number of bytes per row, and the number of rows. If tiles
 /// are not configured, all zeroes will be stored to memory.
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  STTILECFG  instruction.
 ///
@@ -60,7 +60,7 @@
 /// Release the tile configuration to return to the init state, which
 /// releases all storage it currently holds.
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TILERELEASE  instruction.
 static __inline__ void __DEFAULT_FN_ATTRS_TILE _tile_release(void) {
@@ -71,7 +71,7 @@
 /// destination tile "dst" using the tile configuration previously configured
 /// via "_tile_loadconfig".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TILELOADD  instruction.
 ///
@@ -91,7 +91,7 @@
 /// that the data will likely not be reused in the near future and the data
 /// caching can be optimized accordingly.
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TILELOADDT1  instruction.
 ///
@@ -109,7 +109,7 @@
 /// "stride" using the tile configuration previously configured via
 /// "_tile_loadconfig".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TILESTORED  instruction.
 ///
@@ -124,7 +124,7 @@
 
 /// Zero the tile specified by "tdest".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TILEZERO  instruction.
 ///
@@ -138,7 +138,7 @@
 /// results. Sum these 4 results with the corresponding 32-bit integer in "dst",
 /// and store the 32-bit result back to tile "dst".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TDPBSSD  instruction.
 ///
@@ -157,7 +157,7 @@
 /// 32-bit results. Sum these 4 results with the corresponding 32-bit integer
 /// in "dst", and store the 32-bit result back to tile "dst".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TDPBSUD  instruction.
 ///
@@ -176,7 +176,7 @@
 /// results. Sum these 4 results with the corresponding 32-bit integer in "dst",
 /// and store the 32-bit result back to tile "dst".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TDPBUSD  instruction.
 ///
@@ -195,7 +195,7 @@
 /// 32-bit results. Sum these 4 results with the corresponding 32-bit integer in
 /// "dst", and store the 32-bit result back to tile "dst".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TDPBUUD  instruction.
 ///
@@ -213,7 +213,7 @@
 /// elements with elements in "dst", and store the 32-bit result back to tile
 /// "dst".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TDPBF16PS  instruction.
 ///
@@ -226,8 +226,12 @@
 #define _tile_dpbf16ps(dst, src0, src1)\
   __builtin_ia32_tdpbf16ps((dst), (src0), (src1))
 
+/// AMX tile register size can be configured, the maximum size is 16x64=1024
+/// bytes. Since there is no 2D type in llvm IR, we use vector type to
+/// represent 2D tile and the fixed size is maximum amx tile register size.
 typedef int _tile1024i __attribute__((__vector_size__(1024), __aligned__(64)));
 
+/// This is internal intrinsic. C/C++ user should avoid calling it directly.
 static __inline__ _tile1024i __DEFAULT_FN_ATTRS_INT8
 _tile_loadd_internal(unsigned short m, unsigned short n, const void *base,
  __SIZE_TYPE__ stride) {
@@ -235,30 +239,35 @@
  (__SIZE_TYPE__)(stride));
 }
 
+/// This is internal intrinsic. C/C++ user should avoid calling it directly.
 static __inline__ _tile1024i __DEFAULT_FN_ATTRS_INT8
 _tile_dpbssd_internal(unsigned short m, unsigned short n, unsigned short k,
   _tile1024i dst, _tile1024i src1, _tile1024i src2) {
   return __builtin_ia32_tdpbssd_internal(m, n, k, dst, src1, src2);
 }
 
+/// This is internal intrinsic. C/C++ user should avoid calling it directly.
 static __inline__ _tile1024i __DEFAULT_FN_ATTRS_INT8
 _tile_dpbsud_internal(unsigned short m,

[PATCH] D101059: [X86][AMX] Add description for AMX new interface.

2021-04-22 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke updated this revision to Diff 339597.
LuoYuanke added a comment.

Fix some descriptions.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101059/new/

https://reviews.llvm.org/D101059

Files:
  clang/lib/Headers/amxintrin.h

Index: clang/lib/Headers/amxintrin.h
===
--- clang/lib/Headers/amxintrin.h
+++ clang/lib/Headers/amxintrin.h
@@ -30,7 +30,7 @@
 /// config and the tile data, and the tiles are zeroed. Any invalid
 /// configurations will result in #GP fault.
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  LDTILECFG  instruction.
 ///
@@ -46,7 +46,7 @@
 /// palette, the number of bytes per row, and the number of rows. If tiles
 /// are not configured, all zeroes will be stored to memory.
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  STTILECFG  instruction.
 ///
@@ -60,7 +60,7 @@
 /// Release the tile configuration to return to the init state, which
 /// releases all storage it currently holds.
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TILERELEASE  instruction.
 static __inline__ void __DEFAULT_FN_ATTRS_TILE _tile_release(void) {
@@ -71,7 +71,7 @@
 /// destination tile "dst" using the tile configuration previously configured
 /// via "_tile_loadconfig".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TILELOADD  instruction.
 ///
@@ -91,7 +91,7 @@
 /// that the data will likely not be reused in the near future and the data
 /// caching can be optimized accordingly.
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TILELOADDT1  instruction.
 ///
@@ -109,7 +109,7 @@
 /// "stride" using the tile configuration previously configured via
 /// "_tile_loadconfig".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TILESTORED  instruction.
 ///
@@ -124,7 +124,7 @@
 
 /// Zero the tile specified by "tdest".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TILEZERO  instruction.
 ///
@@ -138,7 +138,7 @@
 /// results. Sum these 4 results with the corresponding 32-bit integer in "dst",
 /// and store the 32-bit result back to tile "dst".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TDPBSSD  instruction.
 ///
@@ -157,7 +157,7 @@
 /// 32-bit results. Sum these 4 results with the corresponding 32-bit integer
 /// in "dst", and store the 32-bit result back to tile "dst".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TDPBSUD  instruction.
 ///
@@ -176,7 +176,7 @@
 /// results. Sum these 4 results with the corresponding 32-bit integer in "dst",
 /// and store the 32-bit result back to tile "dst".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TDPBUSD  instruction.
 ///
@@ -195,7 +195,7 @@
 /// 32-bit results. Sum these 4 results with the corresponding 32-bit integer in
 /// "dst", and store the 32-bit result back to tile "dst".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TDPBUUD  instruction.
 ///
@@ -213,7 +213,7 @@
 /// elements with elements in "dst", and store the 32-bit result back to tile
 /// "dst".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TDPBF16PS  instruction.
 ///
@@ -226,8 +226,12 @@
 #define _tile_dpbf16ps(dst, src0, src1)\
   __builtin_ia32_tdpbf16ps((dst), (src0), (src1))
 
+/// AMX tile register size can be configured, the maximum size is 16x64=1024
+/// bytes. Since there is no 2D type in llvm IR, we use vector type to
+/// represent 2D tile and the fixed size is maximum amx tile register size.
 typedef int _tile1024i __attribute__((__vector_size__(1024), __aligned__(64)));
 
+/// This is internal intrinsic. C/C++ user should avoid calling it directly.
 static __inline__ _tile1024i __DEFAULT_FN_ATTRS_INT8
 _tile_loadd_internal(unsigned short m, unsigned short n, const void *base,
  __SIZE_TYPE__ stride) {
@@ -235,30 +239,35 @@
  (__SIZE_TYPE__)(stride));
 }
 
+/// This is internal intrinsic. C/C++ user should avoid calling it directly.
 static __inline__ _tile1024i __DEFAULT_FN_ATTRS_INT8
 _tile_dpbssd_internal(unsigned short m, unsigned short n, unsigned short k,
   _tile1024i dst, _tile1024i src1, _tile1024i src2) {
   return __builtin_ia32_tdpbssd_internal(m, n, k, dst, src1, src2);
 }
 
+/// This is internal intrinsic. C/C++ user should avoid calling it directly.
 static __inline__ _tile1024i __DEFAULT_FN_ATTRS_INT8
 _tile_dpbsud_internal(unsigned short m, unsigned short n, unsigned short k,
   _tile1024i dst, _tile1024i src1, _tile1024i src2) {
   return

[PATCH] D101059: [X86][AMX] Add description for AMX new interface.

2021-04-22 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke created this revision.
LuoYuanke requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D101059

Files:
  clang/lib/Headers/amxintrin.h

Index: clang/lib/Headers/amxintrin.h
===
--- clang/lib/Headers/amxintrin.h
+++ clang/lib/Headers/amxintrin.h
@@ -30,7 +30,7 @@
 /// config and the tile data, and the tiles are zeroed. Any invalid
 /// configurations will result in #GP fault.
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  LDTILECFG  instruction.
 ///
@@ -46,7 +46,7 @@
 /// palette, the number of bytes per row, and the number of rows. If tiles
 /// are not configured, all zeroes will be stored to memory.
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  STTILECFG  instruction.
 ///
@@ -60,7 +60,7 @@
 /// Release the tile configuration to return to the init state, which
 /// releases all storage it currently holds.
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TILERELEASE  instruction.
 static __inline__ void __DEFAULT_FN_ATTRS_TILE _tile_release(void) {
@@ -71,7 +71,7 @@
 /// destination tile "dst" using the tile configuration previously configured
 /// via "_tile_loadconfig".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TILELOADD  instruction.
 ///
@@ -91,7 +91,7 @@
 /// that the data will likely not be reused in the near future and the data
 /// caching can be optimized accordingly.
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TILELOADDT1  instruction.
 ///
@@ -109,7 +109,7 @@
 /// "stride" using the tile configuration previously configured via
 /// "_tile_loadconfig".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TILESTORED  instruction.
 ///
@@ -124,7 +124,7 @@
 
 /// Zero the tile specified by "tdest".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TILEZERO  instruction.
 ///
@@ -138,7 +138,7 @@
 /// results. Sum these 4 results with the corresponding 32-bit integer in "dst",
 /// and store the 32-bit result back to tile "dst".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TDPBSSD  instruction.
 ///
@@ -157,7 +157,7 @@
 /// 32-bit results. Sum these 4 results with the corresponding 32-bit integer
 /// in "dst", and store the 32-bit result back to tile "dst".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TDPBSUD  instruction.
 ///
@@ -176,7 +176,7 @@
 /// results. Sum these 4 results with the corresponding 32-bit integer in "dst",
 /// and store the 32-bit result back to tile "dst".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TDPBUSD  instruction.
 ///
@@ -195,7 +195,7 @@
 /// 32-bit results. Sum these 4 results with the corresponding 32-bit integer in
 /// "dst", and store the 32-bit result back to tile "dst".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TDPBUUD  instruction.
 ///
@@ -213,7 +213,7 @@
 /// elements with elements in "dst", and store the 32-bit result back to tile
 /// "dst".
 ///
-/// \headerfile 
+/// \headerfile 
 ///
 /// This intrinsic corresponds to the  TDPBF16PS  instruction.
 ///
@@ -226,8 +226,12 @@
 #define _tile_dpbf16ps(dst, src0, src1)\
   __builtin_ia32_tdpbf16ps((dst), (src0), (src1))
 
+/// AMX tile register size can be configured, the maximum size is 16x64=1024
+/// bytes. Since there is no 2D type in llvm IR, we use vector type to
+/// represent 2D tile and the fixed size is maximum amx tile register size.
 typedef int _tile1024i __attribute__((__vector_size__(1024), __aligned__(64)));
 
+/// This is internal intrinsic. C/C++ user should avoid calling it directly.
 static __inline__ _tile1024i __DEFAULT_FN_ATTRS_INT8
 _tile_loadd_internal(unsigned short m, unsigned short n, const void *base,
  __SIZE_TYPE__ stride) {
@@ -235,30 +239,35 @@
  (__SIZE_TYPE__)(stride));
 }
 
+/// This is internal intrinsic. C/C++ user should avoid calling it directly.
 static __inline__ _tile1024i __DEFAULT_FN_ATTRS_INT8
 _tile_dpbssd_internal(unsigned short m, unsigned short n, unsigned short k,
   _tile1024i dst, _tile1024i src1, _tile1024i src2) {
   return __builtin_ia32_tdpbssd_internal(m, n, k, dst, src1, src2);
 }
 
+/// This is internal intrinsic. C/C++ user should avoid calling it directly.
 static __inline__ _tile1024i __DEFAULT_FN_ATTRS_INT8
 _tile_dpbsud_internal(unsigned short m, unsigned short n, unsigned short k,
   _tile1024i dst, _tile1024i src1, _tile1024i src2) {
   return __builtin_ia32_tdpbsud_internal(m, n, k,

[PATCH] D99708: [X86] Enable compilation of user interrupt handlers.

2021-04-12 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, but please wait a day or two to see whether there are more comments from
Craig and HJ.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99708/new/

https://reviews.llvm.org/D99708


[PATCH] D99708: [X86] Enable compilation of user interrupt handlers.

2021-04-06 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

LGTM. Thank you!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99708/new/

https://reviews.llvm.org/D99708


[PATCH] D99708: [X86] Enable compilation of user interrupt handlers.

2021-04-01 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

In D99708#2663989, @craig.topper wrote:

> A user interrupt is different than a regular interrupt right? It doesn't make 
> sense that we would change the behavior of the interrupt calling convention 
> just because the user interrupt instructions are enabled. That would 
> occur just from passing a -march for a newer CPU wouldn't it?

Maybe we need to support another attribute, "__attribute__ ((user_interrupt))",
for functions?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99708/new/

https://reviews.llvm.org/D99708


[PATCH] D99152: [AMX] Prototype for vector and amx bitcast.

2021-03-31 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

> Unfortunately it is not possible to use an opaque type with the AMX 
> intrinsics at the moment, because of the way they are defined. It is possible 
> to use opaque types with intrinsics in general though, e.g. see 
> https://llvm.godbolt.org/z/Ezhf6535c
>
> My point is, you should be able to adjust the definitions of the AMX 
> intrinsics and then just replace all occurrences of `x86_amx` in your 
> examples with an opaque type you define in the module. But as I said 
> initially, you don't need to do everything at once (and you probably 
> shouldn't). I'd start with addressing the bitcast issue and tackle the 
> `x86_amx` type itself once that is done.
>
> (And I am also not saying that it definitely needs to be removed, only that 
> if it should be kept in the long run, it would be good to specify it in the 
> LangRef and should have a good justification, especially if there are no 
> instructions that do anything meaningful with values of the type other than 
> take it as arguments and return values. Opaque types are a suggestion for an 
> alternative that *may* be viable without a dedicated first-class type)

Thank you for the suggestion. Here is my plan:

1. Specify x86_amx in LangRef.
2. Add the llvm.x86.tile.cast intrinsic.
3. Optimize llvm.x86.tile.cast where possible, as is done for bitcast, and
transform llvm.x86.tile.cast to an AMX intrinsic if it can't be eliminated.
4. After the above three items are finished, replace bitcast with
llvm.x86.tile.cast in the front end when generating IR for AMX builtins.
5. After some time for stabilization, remove the bitcast transform code from
LLVM.
6. After all of the llvm.x86.tile.cast work is finished, discuss the opaque
type.

Does that look good to you?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99152/new/

https://reviews.llvm.org/D99152


[PATCH] D99152: [AMX] Prototype for vector and amx bitcast.

2021-03-29 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

> Whether further optimizations are correct is a different problem, but we 
> need a specification for the builtins, intrinsics and the type before going 
> any further in that direction.
>
> I think you need to set the input to `LLVM IR`: 
> https://gcc.godbolt.org/z/WexMjsas9
>
> You should be able to use opaque types with overloaded intrinsics. I don't 
> think you define an intrinsic to take a specific opaque type (because it's 
> not known up front).

The opaque type (https://llvm.org/docs/LangRef.html#opaque-structure-types) is
pretty new to me. I didn't find any example of an opaque type used in builtins
or intrinsics. I would appreciate it if you could write an example (maybe
tilezero) covering the builtins, the intrinsics and the type, so that I can
understand it well. If we use <256 x i32> in builtins and x86_amx in
intrinsics, and have a specific intrinsic to convert x86_amx to a flat vector
<256 x i32>, I am able to do that myself.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99152/new/

https://reviews.llvm.org/D99152


[PATCH] D99152: [AMX] Prototype for vector and amx bitcast.

2021-03-29 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

> I think that point was not really clear during the discussion. Using `load 
> <256 x i32>` to lower `__tile_loadd() ` would indeed be incorrect. But I 
> don't think that's happening at the moment, at least going from a simple 
> example https://gcc.godbolt.org/z/KT5rczn8j

The `load/store <256 x i32>` is generated by the front end, because in C a
tile is a vector <256 x i32>. The `load/store <256 x i32>` is transformed to
`llvm.x86.tileloadd64.internal/llvm.x86.tilestored64.internal` in
`lib/Target/X86/X86LowerAMXType.cpp` if the loaded value is an operand of an
AMX intrinsic or the stored value is returned from an AMX intrinsic.
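
To make the `<256 x i32>` figure concrete, here is a minimal sketch of the size
arithmetic. It assumes a GCC- or Clang-compatible compiler with the
`vector_size` extension and a 4-byte `int`. The names `tile1024i_sketch`,
`TILE_MAX_ROWS`, `TILE_MAX_ROW_BYTES`, `tile_max_bytes` and `tile_i32_lanes`
are made up for illustration and are not part of the header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the header's _tile1024i typedef: a 1024-byte
   vector of int, aligned to 64 bytes. */
typedef int tile1024i_sketch
    __attribute__((__vector_size__(1024), __aligned__(64)));

/* Maximum tile shape mentioned in the patch comment: 16 rows of 64 bytes. */
enum { TILE_MAX_ROWS = 16, TILE_MAX_ROW_BYTES = 64 };

/* Bytes in the largest tile: 16 * 64. */
static size_t tile_max_bytes(void) {
  return (size_t)TILE_MAX_ROWS * TILE_MAX_ROW_BYTES;
}

/* Number of int lanes in the vector type, which is the N in <N x i32>
   when int is 32 bits wide. */
static size_t tile_i32_lanes(void) {
  return sizeof(tile1024i_sketch) / sizeof(int);
}
```

The vector type is sized for the largest tile, so a smaller configured tile
still travels through C code as the same fixed-size vector.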

>   void foo() {
> tilea = __builtin_ia32_tileloadd64_internal(16, 64, buf, 64);
>   }
>
> is lowered to
>
>   define dso_local void @foo() #0 {
> %1 = call x86_amx @llvm.x86.tileloadd64.internal(i16 16, i16 64, i8* 
> getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 64)
> %2 = bitcast x86_amx %1 to <256 x i32>
> store <256 x i32> %2, <256 x i32>* @tilea, align 64
> ret void
>   }
>
> So we emit an intrinsic to do the strided load and the result is stored to 
> contiguous memory, which is what the type `_tile1024i` requires. What's not 
> modeled correctly is the conversion between the result of 
> `@llvm.x86.tileloadd64.internal` and the `store`. It needs to be transferred 
> in a flat vector.

Yes.  I agree that it needs to be transferred in a flat vector.
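
The difference between a strided tile load and a flat vector load can be
modeled in plain C. This is only a sketch of the memory-access pattern under
discussion, not the hardware semantics of TILELOADD or the actual LLVM
lowering. All names here (`model_tile_load`, `model_flat_load`, `loads_agree`)
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { ROWS = 16, ROW_BYTES = 64, TILE_BYTES = ROWS * ROW_BYTES };

/* Model of a strided tile load: copy ROWS rows of ROW_BYTES bytes each,
   where consecutive rows start `stride` bytes apart in memory. The result
   is packed contiguously into `tile`. */
static void model_tile_load(uint8_t *tile, const uint8_t *base,
                            size_t stride) {
  for (int r = 0; r < ROWS; ++r)
    memcpy(tile + (size_t)r * ROW_BYTES, base + (size_t)r * stride, ROW_BYTES);
}

/* Model of a flat vector load: 1024 contiguous bytes starting at base. */
static void model_flat_load(uint8_t *vec, const uint8_t *base) {
  memcpy(vec, base, TILE_BYTES);
}

/* Returns 1 if both loads yield the same bytes for the given stride.
   The backing buffer covers strides up to 128 bytes. */
static int loads_agree(size_t stride) {
  static uint8_t mem[ROWS * 128];
  uint8_t tile[TILE_BYTES], flat[TILE_BYTES];
  for (size_t i = 0; i < sizeof mem; ++i)
    mem[i] = (uint8_t)(i * 7 + 3); /* arbitrary fill pattern */
  model_tile_load(tile, mem, stride);
  model_flat_load(flat, mem);
  return memcmp(tile, flat, TILE_BYTES) == 0;
}
```

In this model the two loads coincide only when the stride equals the row
width (64 bytes). With any larger stride the flat load reads different bytes,
which is why a plain `load <256 x i32>` cannot stand in for the tile load.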

> Whether we should have `x86_amx` in the first place is a separate question I 
> think. Having a builtin type that does not work properly with fundamental 
> instructions like `load` or `store` seems prone for errors (what instructions 
> actually work with `x86_amx`? Do binary operators work?). Perhaps it would be 
> possible and sufficient to have the intrinsics use an opaque type instead of 
> a builtin type, like

We only support tileload, tilestore, tilezero, tiletdp (dot product)
instructions/intrinsics for `x86_amx`. Is there any example LLVM source code
that uses an opaque type for a builtin? This example produces an error:
https://gcc.godbolt.org/z/ar6WhjTMz.

>   %my_x86_amx = type opaque
>   
>   define %my_x86_amx @foo(%my_x86_amx %x) {
> ret %my_x86_amx %x
>   }
>
> But I think we should address those 2 issues separately and fix the biggest 
> problem (mis-use of `bitcast`) first, perhaps followed up by verifier rules 
> rejecting `x86_amx` from un-suitable instructions and go from there.

I may extend this patch to transform/eliminate
@llvm.x86.vector.amx.cast in `lib/Target/X86/X86LowerAMXType.cpp`, which runs
before codegen. It will take some effort to implement, but I'd like to give it
a try.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99152/new/

https://reviews.llvm.org/D99152


[PATCH] D99152: [AMX] Prototype for vector and amx bitcast.

2021-03-24 Thread LuoYuanke via Phabricator via cfe-commits

LuoYuanke added a comment.

In D99152#2647681, @fhahn wrote:

> I can't see any `load <256 x i32>` in the linked example, just a store. Could 
> you check the example?

I created another example at https://gcc.godbolt.org/z/v6od5ceEz. In the bar()
function, you can see the `load <256 x i32>*` in the IR.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99152/new/

https://reviews.llvm.org/D99152

