[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-07-07 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added a comment.

Thanks @clementval for reporting it and for the reproducer. I put up patch
D129294 to address it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107082/new/

https://reviews.llvm.org/D107082

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-07-07 Thread Valentin Clement via Phabricator via cfe-commits
clementval added a comment.

@pengfei We are also hitting the following assertion with this patch. Do you 
have any idea why?

  /llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:4333: void {anonymous}::SelectionDAGLegalize::ConvertNodeToLibcall(llvm::SDNode*): Assertion `cast<ConstantSDNode>(Node->getOperand(IsStrict ? 2 : 1))->isZero() && "Unable to expand as libcall if it is not normal rounding"' failed.

LLVM IR triggering the assertion.

  ; ModuleID = 'FIRModule'
  source_filename = "FIRModule"
  target triple = "x86_64-unknown-linux-gnu"
  
  @_QMhp237Ea11 = global half 0xH3D00
  @_QMhp237Eb1a = global half 0xH5640
  @_QMf90_kindECascii = external constant i32
  @_QMf90_kindECbyte = external constant i32
  @_QMf90_kindECdouble = external constant i32
  @_QMiso_fortran_envECint16 = external constant i32
  @_QMiso_fortran_envECint32 = external constant i32
  @_QMiso_fortran_envECint64 = external constant i32
  @_QMiso_fortran_envECint8 = external constant i32
  @_QMf90_kindECjis = external constant i32
  @_QMiso_fortran_envEClogical16 = external constant i32
  @_QMiso_fortran_envEClogical32 = external constant i32
  @_QMiso_fortran_envEClogical64 = external constant i32
  @_QMiso_fortran_envEClogical8 = external constant i32
  @_QMf90_kindECnot_available = external constant i32
  @_QMf90_kindECquad = external constant i32
  @_QMiso_fortran_envECreal128 = external constant i32
  @_QMf90_kindECreal16 = external constant i32
  @_QMiso_fortran_envECreal32 = external constant i32
  @_QMiso_fortran_envECreal64 = external constant i32
  @_QMf90_kindECreal64x2 = external constant i32
  @_QMf90_kindECsingle = external constant i32
  @_QMf90_kindECtwobyte = external constant i32
  @_QMf90_kindECucs2 = external constant i32
  @_QMf90_kindECucs4 = external constant i32
  @_QMf90_kindECword = external constant i32
  @_QQcl.2E2F627567312E66393000 = linkonce constant [11 x i8] c"./bug1.f90\00"
  @_QQcl.2831362C313629 = linkonce constant [7 x i8] c"(16,16)"
  @_QQcl.28346631302E3329 = linkonce constant [8 x i8] c"(4f10.3)"
  
  declare ptr @malloc(i64)
  
  declare void @free(ptr)
  
  define void @_QQmain() !dbg !3 {
    %1 = alloca { ptr, i64, i32, i8, i8, i8, i8 }, align 8, !dbg !7
    %2 = alloca half, i64 1, align 2, !dbg !9
    %3 = call ptr @_FortranAioBeginExternalListOutput(i32 -1, ptr @_QQcl.2E2F627567312E66393000, i32 9), !dbg !10
    %4 = call i1 @_FortranAioOutputAscii(ptr %3, ptr @_QQcl.2831362C313629, i64 7), !dbg !11
    %5 = call i32 @_FortranAioEndIoStatement(ptr %3), !dbg !12
    %6 = call ptr @_FortranAioBeginExternalFormattedOutput(ptr @_QQcl.28346631302E3329, i64 8, i32 -1, ptr @_QQcl.2E2F627567312E66393000, i32 10), !dbg !13
    %7 = load half, ptr @_QMhp237Ea11, align 2, !dbg !14
    %8 = load half, ptr @_QMhp237Eb1a, align 2, !dbg !15
    %9 = fpext half %7 to float, !dbg !16
    %10 = fpext half %8 to float, !dbg !17
    %11 = call float @llvm.copysign.f32(float %9, float %10), !dbg !18
    %12 = fptrunc float %11 to half, !dbg !19
    store half %12, ptr %2, align 2, !dbg !20
    %13 = insertvalue { ptr, i64, i32, i8, i8, i8, i8 } { ptr undef, i64 2, i32 20180515, i8 0, i8 25, i8 0, i8 0 }, ptr %2, 0, !dbg !7
    store { ptr, i64, i32, i8, i8, i8, i8 } %13, ptr %1, align 8, !dbg !7
    %14 = call i1 @_FortranAioOutputDescriptor(ptr %6, ptr %1), !dbg !21
    %15 = call i32 @_FortranAioEndIoStatement(ptr %6), !dbg !22
    ret void, !dbg !23
  }
  
  declare ptr @_FortranAioBeginExternalListOutput(i32, ptr, i32)
  
  declare i1 @_FortranAioOutputAscii(ptr, ptr, i64)
  
  declare i32 @_FortranAioEndIoStatement(ptr)
  
  declare ptr @_FortranAioBeginExternalFormattedOutput(ptr, i64, i32, ptr, i32)
  
  declare i1 @_FortranAioOutputDescriptor(ptr, ptr)
  
  ; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
  declare float @llvm.copysign.f32(float, float) #0
  
  attributes #0 = { nocallback nofree nosync nounwind readnone speculatable willreturn }
  
  !llvm.dbg.cu = !{!0}
  !llvm.module.flags = !{!2}
  
  !0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "mlir", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
  !1 = !DIFile(filename: "FIRModule", directory: "/")
  !2 = !{i32 2, !"Debug Info Version", i32 3}
  !3 = distinct !DISubprogram(name: "_QQmain", linkageName: "_QQmain", scope: null, file: !4, line: 9, type: !5, scopeLine: 9, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !6)
  !4 = !DIFile(filename: "", directory: "/local/home/vclement/llvm-project/build")
  !5 = !DISubroutineType(types: !6)
  !6 = !{}
  !7 = !DILocation(line: 39, column: 9, scope: !8)
  !8 = !DILexicalBlockFile(scope: !3, file: !4, discriminator: 0)
  !9 = !DILocation(line: 10, column: 8, scope: !8)
  !10 = !DILocation(line: 17, column: 8, scope: !8)
  !11 = !DILocation(line: 22, column: 8, scope: !8)
  !12 = !DILocation(line: 23, column: 9, scope: !8)
  !13 = !DILocation(line: 31, column: 9, scope: !8)
  !14 = 
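
For readers following the reproducer, the floating-point core of `_QQmain` is:
load the two `half` globals, `fpext` both to `float`, apply
`llvm.copysign.f32`, and `fptrunc` the result back to `half`. The Python sketch
below decodes the two constants and mirrors that sequence on the host. It
relies only on the `struct` module's IEEE 754 binary16 format code (`e`) and
says nothing about the X86 lowering that trips the assertion; the helper names
are made up for illustration.

```python
import math
import struct


def half_from_bits(bits: int) -> float:
    """Interpret a 16-bit pattern (the digits after 0xH in LLVM IR) as binary16."""
    return struct.unpack("<e", bits.to_bytes(2, "little"))[0]


def half_to_bits(value: float) -> int:
    """Round a Python float to binary16 and return its 16-bit pattern."""
    return int.from_bytes(struct.pack("<e", value), "little")


a = half_from_bits(0x3D00)  # @_QMhp237Ea11
b = half_from_bits(0x5640)  # @_QMhp237Eb1a

# fpext half -> float is exact, so applying copysign to Python floats
# yields the same value the IR computes before the fptrunc.
result = math.copysign(a, b)

print(a, b, result, hex(half_to_bits(result)))
```

Both inputs are positive (1.25 and 100.0), so the value handed to `fptrunc` is
1.25 again, which is exactly representable in `half`.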

[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-07-06 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added a comment.

Thanks for confirming it! I don't have much experience with compiler-rt, but I
think the version of clang matters a lot to compiler-rt, particularly in
ABI-changing cases like this :)


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-07-06 Thread Jean Perier via Phabricator via cfe-commits
jeanPerier added a comment.

In D107082#3632301, @pengfei wrote:

> Hi @jeanPerier, yes, you are right. This patch changes the calling
> convention of fp16 from GPRs to XMMs, so you need to update the runtime. If
> you are using compiler-rt, you can simply rebuild it with trunk code, or
> at least with code after rGabeeae57. If you are using your own runtime, you
> can solve the problem as described in
> https://github.com/llvm/llvm-project/issues/56156

Thanks for the quick reply. I was using compiler-rt from the trunk source,
but was not building it with a clang compiled from trunk. I did not know that
the version of clang used to compile compiler-rt mattered that much.
Using clang from trunk (or at least after the commit you mentioned)
solved my problem. Thanks!


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-07-06 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added a comment.

Hi @jeanPerier, yes, you are right. This patch changes the calling convention
of fp16 from GPRs to XMMs, so you need to update the runtime. If you are using
compiler-rt, you can simply rebuild it with trunk code, or at least with code
after rGabeeae57. If you are using your own runtime, you can solve the problem
as described in
https://github.com/llvm/llvm-project/issues/56156


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-07-06 Thread Jean Perier via Phabricator via cfe-commits
jeanPerier added a comment.

Hi @pengfei, I am working on flang, and after this patch we started to see
some bugs in Fortran programs using REAL(2) (which is fp16 in flang). I am not
an expert in LLVM codegen and the builtins, but I am wondering whether there is
an issue with how LLVM codegen thinks `__truncsfhf2` returns its value and how
the runtime actually returns it.

Here is an llvm IR reproducer for a bug we saw:

  define void @bug(ptr %addr, i32 %i) {
    %1 = sitofp i32 %i to half
    store half %1, ptr %addr, align 2
    ret void
  }

After this patch the generated assembly on X86 is:

  bug:                          # @bug
    push    rbx
    mov     rbx, rdi
    cvtsi2ss xmm0, esi
    call    __truncsfhf2@PLT
    pextrw  eax, xmm0, 0
    mov     word ptr [rbx], ax
    pop     rbx
    ret

When calling this from a C program to test that integers are cast to floats, I
only see the bytes at the passed address being set to zero (regardless of
the input). It seems to me that there is an issue around the `__truncsfhf2`
interface. The `pextrw  eax, xmm0, 0` after the call suggests that LLVM
codegen is looking for the result in the xmm0 register, but it seems that
`__truncsfhf2` only returns it in eax.

Do you have any idea what could be the issue ?
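
One way to check a driver like this is to compare the two bytes stored by
`bug` against an independently computed binary16 encoding of the input. A
minimal Python sketch, assuming a little-endian host and inputs small enough to
be representable in `half`; `expected_half_bytes` is a hypothetical helper, not
part of flang or compiler-rt:

```python
import struct


def expected_half_bytes(i: int) -> bytes:
    """Bytes that `store half (sitofp i32 i)` should write on little-endian x86."""
    # struct's "e" format is IEEE 754 binary16; it raises OverflowError
    # for magnitudes too large for half precision.
    return struct.pack("<e", float(i))


for i in (1, 2, -3, 1024):
    print(i, expected_half_bytes(i).hex())
```

Every nonzero input here encodes to nonzero bytes, so seeing zeros for all
inputs is consistent with the value being read from the wrong register rather
than with a rounding problem.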


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-07-04 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added a comment.

In D107082#3628120, @sylvestre.ledru wrote:

> @pengfei I am not convinced it is an issue on my side. I don't have anything
> particular in this area and am using a stage2 build system.
>
> Anyway, this patch fixes the issue on my side:
> https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/-/blob/snapshot/debian/patches/force-sse2-compiler-rt.diff

I don't have much experience with compiler-rt and multi-stage builds, so I may
be wrong. It looks to me like an existing problem that was just exposed by this
patch. The diff is further evidence.
The build command tells us it's a 32-bit build, but the change for `x86_64`
solves it, which confirms my previous guess: you are using one CMake
configuration (probably 64-bit) but building for a 32-bit target.
Although the diff works, it doesn't look like a clean solution to me. But I
don't have a better suggestion either.


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-07-04 Thread Sylvestre Ledru via Phabricator via cfe-commits
sylvestre.ledru added a comment.

@pengfei I am not convinced it is an issue on my side. I don't have anything
particular in this area and am using a stage2 build system.

Anyway, this patch fixes the issue on my side:
https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/-/blob/snapshot/debian/patches/force-sse2-compiler-rt.diff


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-07-02 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added a comment.

FYI, `COMPILER_RT_HAS_FLOAT16` is set according to 
https://github.com/llvm/llvm-project/blob/main/compiler-rt/cmake/builtin-config-ix.cmake#L25-L31
 and 
https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/CMakeLists.txt#L699


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-07-02 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added a comment.

In D107082#3626632, @sylvestre.ledru wrote:

> Same as in https://reviews.llvm.org/D114099
> It breaks the build on ubuntu bionic, Hirsute, etc on amd64:
>
>   
> "/build/llvm-toolchain-snapshot-15~++20220702091600+23ee84f43201/build-llvm/./bin/clang"
>  --target=x86_64-pc-linux-gnu -DVISIBILITY_HIDDEN  -fstack-protector-strong 
> -Wformat -Werror=format-security -Wno-unused-command-line-argument 
> -Wdate-time -D_FORTIFY_SOURCE=2 -O3 -DNDEBUG -m32 -DCOMPILER_RT_HAS_FLOAT16 
> -std=c11 -fPIC -fno-builtin -fvisibility=hidden -fomit-frame-pointer -MD -MT 
> CMakeFiles/clang_rt.builtins-i386.dir/extendhfsf2.c.o -MF 
> CMakeFiles/clang_rt.builtins-i386.dir/extendhfsf2.c.o.d -o 
> CMakeFiles/clang_rt.builtins-i386.dir/extendhfsf2.c.o -c 
> '/build/llvm-toolchain-snapshot-15~++20220702091600+23ee84f43201/compiler-rt/lib/builtins/extendhfsf2.c'
>   In file included from 
> /build/llvm-toolchain-snapshot-15~++20220702091600+23ee84f43201/compiler-rt/lib/builtins/extendhfsf2.c:11:
>   In file included from 
> /build/llvm-toolchain-snapshot-15~++20220702091600+23ee84f43201/compiler-rt/lib/builtins/fp_extend_impl.inc:38:
>   
> /build/llvm-toolchain-snapshot-15~++20220702091600+23ee84f43201/compiler-rt/lib/builtins/fp_extend.h:44:9:
>  error: _Float16 is not supported on this target
>   typedef _Float16 src_t;
>   ^
>   1 error generated.

Hi @sylvestre.ledru, thanks for reporting this issue.

It looks to me like a configuration (or option mismatch) problem in
compiler-rt. We support the `_Float16` type on targets that have SSE2 or later
features. A 32-bit target doesn't enable SSE2 by default. This should be fine,
because compiler-rt's CMake first checks whether `_Float16` can be built and
sets `COMPILER_RT_HAS_FLOAT16` accordingly. So it looks to me like the
`_Float16` detection passed with an SSE2-enabled option, but compiler-rt was
then built with a different option (SSE2 disabled).

I'd suggest adding an extra `-msse2` when building it, if possible. Otherwise,
don't let `-DCOMPILER_RT_HAS_FLOAT16` be passed here.


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-07-02 Thread Sylvestre Ledru via Phabricator via cfe-commits
sylvestre.ledru added a comment.

Same as in https://reviews.llvm.org/D114099
It breaks the build on ubuntu bionic, Hirsute, etc on amd64:

  
"/build/llvm-toolchain-snapshot-15~++20220702091600+23ee84f43201/build-llvm/./bin/clang"
 --target=x86_64-pc-linux-gnu -DVISIBILITY_HIDDEN  -fstack-protector-strong 
-Wformat -Werror=format-security -Wno-unused-command-line-argument -Wdate-time 
-D_FORTIFY_SOURCE=2 -O3 -DNDEBUG -m32 -DCOMPILER_RT_HAS_FLOAT16 -std=c11 -fPIC 
-fno-builtin -fvisibility=hidden -fomit-frame-pointer -MD -MT 
CMakeFiles/clang_rt.builtins-i386.dir/extendhfsf2.c.o -MF 
CMakeFiles/clang_rt.builtins-i386.dir/extendhfsf2.c.o.d -o 
CMakeFiles/clang_rt.builtins-i386.dir/extendhfsf2.c.o -c 
'/build/llvm-toolchain-snapshot-15~++20220702091600+23ee84f43201/compiler-rt/lib/builtins/extendhfsf2.c'
  In file included from 
/build/llvm-toolchain-snapshot-15~++20220702091600+23ee84f43201/compiler-rt/lib/builtins/extendhfsf2.c:11:
  In file included from 
/build/llvm-toolchain-snapshot-15~++20220702091600+23ee84f43201/compiler-rt/lib/builtins/fp_extend_impl.inc:38:
  
/build/llvm-toolchain-snapshot-15~++20220702091600+23ee84f43201/compiler-rt/lib/builtins/fp_extend.h:44:9:
 error: _Float16 is not supported on this target
  typedef _Float16 src_t;
  ^
  1 error generated.


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-24 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added a comment.

I'll take care next time. Thanks @MaskRay !


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-24 Thread Fangrui Song via Phabricator via cfe-commits
MaskRay added a comment.

In addition, don't use `Reland "Reland "Reland "Reland ...` One `Reland` is 
sufficient.


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-24 Thread Fangrui Song via Phabricator via cfe-commits
MaskRay added a comment.

Please include `Differential Revision: ` line for reland commits as well so 
that people know that this patch has a reland.
https://github.com/llvm/llvm-project/issues/56204 is related to 
655ba9c8a1d22075443711cc749f0b032e07adee 



[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-16 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added inline comments.



Comment at: llvm/test/CodeGen/X86/fpclamptosat_vec.ll:605
+; CHECK-NEXT:    .cfi_def_cfa_offset 80
+; CHECK-NEXT:    movss %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT:    movss %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill

pengfei wrote:
> pengfei wrote:
> > LuoYuanke wrote:
> > > Is the vector <4 x half> split into 4 scalars and passed in xmm
> > > registers? What's the ABI for vector half? Is there any test for the
> > > scenario where we run out of registers and pass parameters on the stack?
> > Good question! Previously, I discussed with the GCC folks that we won't
> > support vectors in emulation. I expected the FE to pass the whole vector
> > through the stack, so a vector in IR is illegal for the ABI and can be split.
> > But it seems GCC passes it in a vector register. https://godbolt.org/z/a67rMhTW6
> > I'll double-check with the GCC folks.
> Discussed with the GCC folks today. We should support the vector ABI, but we
> have to add more patterns to support load/store etc. operations for vector
> types. I'd like to address this as a follow-up.
Addressed by D127982.


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-12 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added a comment.

In D107082#3576355, @mehdi_amini wrote:

> This broke the bot here: 
> https://lab.llvm.org/buildbot/#/builders/61/builds/27616
>
> The cmake invocation includes some GPU specific options that you can omit 
> (`-DMLIR_ENABLE_CUDA_RUNNER=1` , 
> `-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc`, 
> `-DMLIR_ENABLE_VULKAN_RUNNER=1`, `-DMLIR_RUN_CUDA_TENSOR_CORE_TESTS=ON`), 
> which should leave out:
>
>   cmake ../llvm.src/llvm -DLLVM_BUILD_EXAMPLES=ON 
> '-DLLVM_TARGETS_TO_BUILD=host;NVPTX' -DLLVM_ENABLE_PROJECTS=mlir  
> -DMLIR_INCLUDE_INTEGRATION_TESTS=ON  -DBUILD_SHARED_LIBS=ON 
> -DLLVM_CCACHE_BUILD=ON -DMLIR_ENABLE_BINDINGS_PYTHON=ON  
> -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON '-DLLVM_LIT_ARGS=-v 
> -vv' -GNinja
>
> You can probably leave out other options too:
>
>   cmake ../llvm.src/llvm '-DLLVM_TARGETS_TO_BUILD=host' 
> -DLLVM_ENABLE_PROJECTS=mlir  -DMLIR_INCLUDE_INTEGRATION_TESTS=ON 
> -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON '-DLLVM_LIT_ARGS=-v 
> -vv' -GNinja

@mehdi_amini Thanks for the commands. I can reproduce it locally now and will
look into it.


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-12 Thread Mehdi AMINI via Phabricator via cfe-commits
mehdi_amini added a comment.

This broke the bot here: 
https://lab.llvm.org/buildbot/#/builders/61/builds/27616

The cmake invocation includes some GPU specific options that you can omit 
(`-DMLIR_ENABLE_CUDA_RUNNER=1` , 
`-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc`, 
`-DMLIR_ENABLE_VULKAN_RUNNER=1`, `-DMLIR_RUN_CUDA_TENSOR_CORE_TESTS=ON`), which 
should leave out:

  cmake ../llvm.src/llvm -DLLVM_BUILD_EXAMPLES=ON 
'-DLLVM_TARGETS_TO_BUILD=host;NVPTX' -DLLVM_ENABLE_PROJECTS=mlir  
-DMLIR_INCLUDE_INTEGRATION_TESTS=ON  -DBUILD_SHARED_LIBS=ON 
-DLLVM_CCACHE_BUILD=ON -DMLIR_ENABLE_BINDINGS_PYTHON=ON  
-DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON '-DLLVM_LIT_ARGS=-v -vv' 
-GNinja

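You can probably leave out other options too:

  cmake ../llvm.src/llvm '-DLLVM_TARGETS_TO_BUILD=host' -DLLVM_ENABLE_PROJECTS=mlir -DMLIR_INCLUDE_INTEGRATION_TESTS=ON -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON '-DLLVM_LIT_ARGS=-v -vv' -GNinja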


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-11 Thread Phoebe Wang via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG2d2da259c872: [X86][RFC] Enable `_Float16` type support on 
X86 following the psABI (authored by pengfei).

Changed prior to commit:
  https://reviews.llvm.org/D107082?vs=435583&id=436190#toc

Files:
  llvm/docs/ReleaseNotes.rst
  llvm/lib/Target/X86/X86FastISel.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86ISelLowering.h
  llvm/lib/Target/X86/X86InstrAVX512.td
  llvm/lib/Target/X86/X86InstrCompiler.td
  llvm/lib/Target/X86/X86InstrInfo.cpp
  llvm/lib/Target/X86/X86InstrSSE.td
  llvm/lib/Target/X86/X86InstrVecCompiler.td
  llvm/lib/Target/X86/X86InstructionSelector.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/test/Analysis/CostModel/X86/fptoi_sat.ll
  llvm/test/CodeGen/MIR/X86/inline-asm-registers.mir
  llvm/test/CodeGen/X86/atomic-non-integer.ll
  llvm/test/CodeGen/X86/avx512-insert-extract.ll
  llvm/test/CodeGen/X86/avx512-masked_memop-16-8.ll
  llvm/test/CodeGen/X86/avx512fp16-fp-logic.ll
  llvm/test/CodeGen/X86/callbr-asm-bb-exports.ll
  llvm/test/CodeGen/X86/cvt16-2.ll
  llvm/test/CodeGen/X86/cvt16.ll
  llvm/test/CodeGen/X86/fastmath-float-half-conversion.ll
  llvm/test/CodeGen/X86/fmf-flags.ll
  llvm/test/CodeGen/X86/fp-round.ll
  llvm/test/CodeGen/X86/fp-roundeven.ll
  llvm/test/CodeGen/X86/fp128-cast-strict.ll
  llvm/test/CodeGen/X86/fpclamptosat.ll
  llvm/test/CodeGen/X86/fpclamptosat_vec.ll
  llvm/test/CodeGen/X86/fptosi-sat-scalar.ll
  llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
  llvm/test/CodeGen/X86/fptoui-sat-scalar.ll
  llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
  llvm/test/CodeGen/X86/freeze.ll
  llvm/test/CodeGen/X86/frem.ll
  llvm/test/CodeGen/X86/half-constrained.ll
  llvm/test/CodeGen/X86/half.ll
  llvm/test/CodeGen/X86/pr31088.ll
  llvm/test/CodeGen/X86/pr38533.ll
  llvm/test/CodeGen/X86/pr47000.ll
  llvm/test/CodeGen/X86/scheduler-asm-moves.mir
  llvm/test/CodeGen/X86/shuffle-extract-subvector.ll
  llvm/test/CodeGen/X86/stack-folding-fp-avx512fp16-fma.ll
  llvm/test/CodeGen/X86/stack-folding-fp-avx512fp16.ll
  llvm/test/CodeGen/X86/statepoint-invoke-ra-enter-at-end.mir
  llvm/test/CodeGen/X86/vec_fp_to_int.ll
  llvm/test/CodeGen/X86/vector-half-conversions.ll
  llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll
  llvm/test/CodeGen/X86/vector-reduce-fmin-nnan.ll
  llvm/test/MC/X86/x86_64-asm-match.s


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-10 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke accepted this revision.
LuoYuanke added a comment.
This revision is now accepted and ready to land.

LGTM, thanks.


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-10 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added inline comments.



Comment at: llvm/test/CodeGen/X86/fpclamptosat_vec.ll:605
+; CHECK-NEXT:    .cfi_def_cfa_offset 80
+; CHECK-NEXT:    movss %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT:    movss %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill

pengfei wrote:
> LuoYuanke wrote:
> > Is the vector <4 x half> split into 4 scalars and passed in xmm registers?
> > What's the ABI for vector half? Is there any test for the scenario where
> > we run out of registers and pass parameters on the stack?
> Good question! Previously, I discussed with the GCC folks that we won't
> support vectors in emulation. I expected the FE to pass the whole vector
> through the stack, so a vector in IR is illegal for the ABI and can be split.
> But it seems GCC passes it in a vector register. https://godbolt.org/z/a67rMhTW6
> I'll double-check with the GCC folks.
Discussed with the GCC folks today. We should support the vector ABI, but we
have to add more patterns to support load/store etc. operations for vector
types. I'd like to address this as a follow-up.


[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-09 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added inline comments.



Comment at: llvm/test/Analysis/CostModel/X86/fptoi_sat.ll:852
+; SSE2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %f16u1 = call i1 @llvm.fptoui.sat.i1.f16(half undef)
+; SSE2-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %f16s8 = call i8 @llvm.fptosi.sat.i8.f16(half undef)
+; SSE2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %f16u8 = call i8 @llvm.fptoui.sat.i8.f16(half undef)

LuoYuanke wrote:
> It seems the cost is reduced in general. Is it because we pass/return f16 in
> xmm registers?
No. It's because we don't have a cost model for `f16`. I added some in D127386
to address this.



Comment at: llvm/test/CodeGen/MIR/X86/inline-asm-registers.mir:31
   ; CHECK-LABEL: name: test
-  ; CHECK: INLINEASM , 0 /* attdialect */, 4390922 /* regdef:GR64 */, def 
$rsi, 4390922 /* regdef:GR64 */, def dead $rdi,
-INLINEASM , 0, 4390922, def $rsi, 4390922, def dead $rdi, 2147549193, 
killed $rdi, 2147483657, killed $rsi, 12, implicit-def dead early-clobber 
$eflags
+  ; CHECK: INLINEASM , 0 /* attdialect */, 4456458 /* regdef:GR64 */, def 
$rsi, 4456458 /* regdef:GR64 */, def dead $rdi,
+INLINEASM , 0, 4456458, def $rsi, 4456458, def dead $rdi, 2147549193, 
killed $rdi, 2147483657, killed $rsi, 12, implicit-def dead early-clobber 
$eflags

LuoYuanke wrote:
> Why does the f16 patch affect this test case? There is no fp instruction in
> this test case.
I think it's the newly added `FR16` register class, which shifts the numbering
of the other register classes. We hit the same problem when enabling FP16 too.
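
The numbers that changed in this test (and in the `TargetConstant:i32<...>`
operand of callbr-asm-bb-exports.ll) are inline-asm operand flag words, which
embed a register class ID, so adding a class such as `FR16` can shift them.
The Python sketch below decodes the old and new values. It assumes the flag
layout used by `llvm/IR/InlineAsm.h` at the time (low 3 bits: operand kind,
bits 3-15: register count, upper 16 bits: register class ID plus one); that
layout is not stated anywhere in this review, and `decode_flag` is a
hypothetical helper that ignores matching-operand flag words.

```python
# Operand kinds as assumed from InlineAsm.h; "regdef-ec" is early-clobber.
KINDS = {1: "reguse", 2: "regdef", 3: "regdef-ec", 4: "clobber", 5: "imm", 6: "mem"}


def decode_flag(flag: int):
    """Split a flag word into (kind, register count, register class ID or None)."""
    kind = KINDS.get(flag & 0x7, "unknown")
    num_regs = (flag & 0xFFFF) >> 3
    high = flag >> 16
    rc_id = high - 1 if high else None  # 0 in the upper half means "no class"
    return kind, num_regs, rc_id


for old, new in ((4390922, 4456458), (2293769, 2359305)):
    print(old, decode_flag(old), "->", new, decode_flag(new))
```

In both pairs only the class ID moves by one while the kind and register count
stay the same, which is what one would expect from a newly inserted register
class.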



Comment at: llvm/test/CodeGen/X86/atomic-non-integer.ll:253
+; X64-SSE-NEXT:    movzwl (%rdi), %eax
+; X64-SSE-NEXT:    pinsrw $0, %eax, %xmm0
+; X64-SSE-NEXT:    retq

LuoYuanke wrote:
> I notice X86-SSE1 returns in a GPR. Should we also return in a GPR for
> X64-SSE?
No. The result in X86-SSE is UB. We support the emulation on SSE2 and later.


Comment at: llvm/test/CodeGen/X86/avx512-insert-extract.ll:2307
+; SKX-NEXT:    vmovd %ecx, %xmm0
+; SKX-NEXT:    vcvtph2ps %xmm0, %xmm0
+; SKX-NEXT:    vmovss %xmm0, %xmm0, %xmm0 {%k2} {z}

LuoYuanke wrote:
> Is this code less efficient than the previous code? Why did the previous code
> work without converting half to float?
Yes. The previous code was using `i16` for FP16. Improved, thanks!



Comment at: llvm/test/CodeGen/X86/callbr-asm-bb-exports.ll:20
 ; CHECK-NEXT: t22: ch,glue = CopyToReg t17, Register:i32 %5, t8
-; CHECK-NEXT: t30: ch,glue = inlineasm_br t22, TargetExternalSymbol:i64'xorl 
$0, $0; jmp ${1:l}', MDNode:ch, TargetConstant:i64<8>, 
TargetConstant:i32<2293769>, Register:i32 %5, TargetConstant:i64<13>, 
TargetBlockAddress:i64<@test, %fail> 0, TargetConstant:i32<12>, Register:i32 
$df, TargetConstant:i32<12>, Register:i16 $fpsw, TargetConstant:i32<12>, 
Register:i32 $eflags, t22:1
+; CHECK-NEXT: t30: ch,glue = inlineasm_br t22, TargetExternalSymbol:i64'xorl 
$0, $0; jmp ${1:l}', MDNode:ch, TargetConstant:i64<8>, 
TargetConstant:i32<2359305>, Register:i32 %5, TargetConstant:i64<13>, 
TargetBlockAddress:i64<@test, %fail> 0, TargetConstant:i32<12>, Register:i32 
$df, TargetConstant:i32<12>, Register:i16 $fpsw, TargetConstant:i32<12>, 
Register:i32 $eflags, t22:1
 

LuoYuanke wrote:
> Why is this test affected? Is it caused by the calling convention change?
No. It's caused by the newly added `FR16` register class.



Comment at: llvm/test/CodeGen/X86/fmf-flags.ll:115
-; X64-NEXT:    movzwl %di, %edi
-; X64-NEXT:    callq __gnu_h2f_ieee@PLT
 ; X64-NEXT:    mulss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0

LuoYuanke wrote:
> Does __gnu_h2f_ieee return in xmm?
There was no `__gnu_h2f_ieee` on X86 before. It's ARM/AArch64 specific.



Comment at: llvm/test/CodeGen/X86/fpclamptosat.ll:569
 ; CHECK-NEXT:    cvttss2si %xmm0, %rax
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000

LuoYuanke wrote:
> I'm curious why there is 1 more compare in this patch.
It's an optimization implemented by D111976. We don't meet its
`isOperationLegalOrCustom` requirement. It's not easy to solve because we need
to check the promoted type instead. I'll leave it as is.



Comment at: llvm/test/CodeGen/X86/fpclamptosat.ll:776
+; CHECK-NEXT:    cmovael %eax, %ecx
+; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movl $2147483647, %edx # imm = 0x7FFFFFFF

LuoYuanke wrote:
> Ditto.
The same as above.



Comment at: llvm/test/CodeGen/X86/fpclamptosat_vec.ll:605
+; CHECK-NEXT:.cfi_def_cfa_offset 80
+; CHECK-NEXT:movss %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT:movss %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill

[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-09 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei updated this revision to Diff 435583.
pengfei marked an inline comment as done.
pengfei added a comment.

Address Yuanke's comments.



Files:
  llvm/docs/ReleaseNotes.rst
  llvm/lib/Target/X86/X86FastISel.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86ISelLowering.h
  llvm/lib/Target/X86/X86InstrAVX512.td
  llvm/lib/Target/X86/X86InstrCompiler.td
  llvm/lib/Target/X86/X86InstrInfo.cpp
  llvm/lib/Target/X86/X86InstrSSE.td
  llvm/lib/Target/X86/X86InstrVecCompiler.td
  llvm/lib/Target/X86/X86InstructionSelector.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/test/Analysis/CostModel/X86/fptoi_sat.ll
  llvm/test/CodeGen/MIR/X86/inline-asm-registers.mir
  llvm/test/CodeGen/X86/atomic-non-integer.ll
  llvm/test/CodeGen/X86/avx512-insert-extract.ll
  llvm/test/CodeGen/X86/avx512-masked_memop-16-8.ll
  llvm/test/CodeGen/X86/avx512fp16-fp-logic.ll
  llvm/test/CodeGen/X86/callbr-asm-bb-exports.ll
  llvm/test/CodeGen/X86/cvt16-2.ll
  llvm/test/CodeGen/X86/cvt16.ll
  llvm/test/CodeGen/X86/fastmath-float-half-conversion.ll
  llvm/test/CodeGen/X86/fmf-flags.ll
  llvm/test/CodeGen/X86/fp-round.ll
  llvm/test/CodeGen/X86/fp-roundeven.ll
  llvm/test/CodeGen/X86/fp128-cast-strict.ll
  llvm/test/CodeGen/X86/fpclamptosat.ll
  llvm/test/CodeGen/X86/fpclamptosat_vec.ll
  llvm/test/CodeGen/X86/fptosi-sat-scalar.ll
  llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
  llvm/test/CodeGen/X86/fptoui-sat-scalar.ll
  llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
  llvm/test/CodeGen/X86/freeze.ll
  llvm/test/CodeGen/X86/half-constrained.ll
  llvm/test/CodeGen/X86/half.ll
  llvm/test/CodeGen/X86/pr31088.ll
  llvm/test/CodeGen/X86/pr38533.ll
  llvm/test/CodeGen/X86/pr47000.ll
  llvm/test/CodeGen/X86/scheduler-asm-moves.mir
  llvm/test/CodeGen/X86/shuffle-extract-subvector.ll
  llvm/test/CodeGen/X86/stack-folding-fp-avx512fp16-fma.ll
  llvm/test/CodeGen/X86/stack-folding-fp-avx512fp16.ll
  llvm/test/CodeGen/X86/statepoint-invoke-ra-enter-at-end.mir
  llvm/test/CodeGen/X86/vec_fp_to_int.ll
  llvm/test/CodeGen/X86/vector-half-conversions.ll
  llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll
  llvm/test/CodeGen/X86/vector-reduce-fmin-nnan.ll
  llvm/test/MC/X86/x86_64-asm-match.s



[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-09 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke added inline comments.



Comment at: llvm/test/CodeGen/X86/fpclamptosat.ll:569
 ; CHECK-NEXT:cvttss2si %xmm0, %rax
 ; CHECK-NEXT:ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000

I'm curious why there is 1 more compare in this patch. 



Comment at: llvm/test/CodeGen/X86/fpclamptosat.ll:776
+; CHECK-NEXT:cmovael %eax, %ecx
+; CHECK-NEXT:ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:movl $2147483647, %edx # imm = 0x7FFFFFFF

Ditto.



Comment at: llvm/test/CodeGen/X86/fpclamptosat_vec.ll:605
+; CHECK-NEXT:.cfi_def_cfa_offset 80
+; CHECK-NEXT:movss %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT:movss %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill

Is the vector <4 x half> split into 4 scalars and passed in xmm registers? 
What's the ABI for vector half? Is there any test case that covers running out 
of registers and passing parameters on the stack?



Comment at: llvm/test/CodeGen/X86/fptosi-sat-scalar.ll:2138
+; X64-NEXT:ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; X64-NEXT:movl $255, %eax
+; X64-NEXT:cmovael %ecx, %eax

It seems less efficient than the previous code at NaN and zero handling, but we 
can improve that later.
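
For context on the NaN and range handling these tests exercise, here is a
minimal reference sketch of a saturating float to i32 conversion, following the
documented behavior of the `llvm.fptosi.sat` intrinsic (NaN maps to zero,
out-of-range inputs clamp). `fptosiSat32` is a hypothetical name, and this
shows only the semantics, not the code the backend emits.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical reference for the semantics of a saturating float -> i32
// conversion; it does not model the instruction sequence the backend emits.
inline int32_t fptosiSat32(float X) {
  if (std::isnan(X))
    return 0; // NaN maps to zero
  // 2^31 is exactly representable as a float; INT32_MAX is not.
  const float Upper = 2147483648.0f;
  if (X >= Upper)
    return std::numeric_limits<int32_t>::max();
  if (X <= -Upper)
    return std::numeric_limits<int32_t>::min();
  int32_t R = static_cast<int32_t>(X); // in range: truncates toward zero
  assert(static_cast<float>(R) == std::trunc(X));
  return R;
}
```

The NaN test and the two bound comparisons are the checks that show up as
`ucomiss` and `cmov` instructions in the CHECK lines under discussion.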



Comment at: llvm/test/CodeGen/X86/half.ll:946
+; CHECK-I686-NEXT:calll __extendhfsf2
+; CHECK-I686-NEXT:fstps {{[0-9]+}}(%esp)
+; CHECK-I686-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero

Why is the x87 instruction generated?




[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-08 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke added inline comments.



Comment at: llvm/test/Analysis/CostModel/X86/fptoi_sat.ll:852
+; SSE2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %f16u1 
= call i1 @llvm.fptoui.sat.i1.f16(half undef)
+; SSE2-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %f16s8 
= call i8 @llvm.fptosi.sat.i8.f16(half undef)
+; SSE2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %f16u8 
= call i8 @llvm.fptoui.sat.i8.f16(half undef)

It seems the cost is reduced in general. Is it because we pass/return f16 in 
an xmm register?



Comment at: llvm/test/CodeGen/MIR/X86/inline-asm-registers.mir:31
   ; CHECK-LABEL: name: test
-  ; CHECK: INLINEASM , 0 /* attdialect */, 4390922 /* regdef:GR64 */, def 
$rsi, 4390922 /* regdef:GR64 */, def dead $rdi,
-INLINEASM , 0, 4390922, def $rsi, 4390922, def dead $rdi, 2147549193, 
killed $rdi, 2147483657, killed $rsi, 12, implicit-def dead early-clobber 
$eflags
+  ; CHECK: INLINEASM , 0 /* attdialect */, 4456458 /* regdef:GR64 */, def 
$rsi, 4456458 /* regdef:GR64 */, def dead $rdi,
+INLINEASM , 0, 4456458, def $rsi, 4456458, def dead $rdi, 2147549193, 
killed $rdi, 2147483657, killed $rsi, 12, implicit-def dead early-clobber 
$eflags

Why does the f16 patch affect this test case? There is no FP instruction in 
this test case.



Comment at: llvm/test/CodeGen/X86/atomic-non-integer.ll:253
+; X64-SSE-NEXT:movzwl (%rdi), %eax
+; X64-SSE-NEXT:pinsrw $0, %eax, %xmm0
+; X64-SSE-NEXT:retq

I notice X86-SSE1 returns in a GPR. Should we also return in a GPR for X64-SSE?



Comment at: llvm/test/CodeGen/X86/avx512-insert-extract.ll:2307
+; SKX-NEXT:vmovd %ecx, %xmm0
+; SKX-NEXT:vcvtph2ps %xmm0, %xmm0
+; SKX-NEXT:vmovss %xmm0, %xmm0, %xmm0 {%k2} {z}

Is this code less efficient than the previous code? Why did the previous code 
work without converting half to float?



Comment at: llvm/test/CodeGen/X86/avx512-masked_memop-16-8.ll:156
 ; Make sure we scalarize masked loads of f16.
 define <16 x half> @test_mask_load_16xf16(<16 x i1> %mask, <16 x half>* %addr, 
<16 x half> %val) {
 ; CHECK-LABEL: test_mask_load_16xf16:

It seems parameter %val is useless.



Comment at: llvm/test/CodeGen/X86/callbr-asm-bb-exports.ll:20
 ; CHECK-NEXT: t22: ch,glue = CopyToReg t17, Register:i32 %5, t8
-; CHECK-NEXT: t30: ch,glue = inlineasm_br t22, TargetExternalSymbol:i64'xorl 
$0, $0; jmp ${1:l}', MDNode:ch, TargetConstant:i64<8>, 
TargetConstant:i32<2293769>, Register:i32 %5, TargetConstant:i64<13>, 
TargetBlockAddress:i64<@test, %fail> 0, TargetConstant:i32<12>, Register:i32 
$df, TargetConstant:i32<12>, Register:i16 $fpsw, TargetConstant:i32<12>, 
Register:i32 $eflags, t22:1
+; CHECK-NEXT: t30: ch,glue = inlineasm_br t22, TargetExternalSymbol:i64'xorl 
$0, $0; jmp ${1:l}', MDNode:ch, TargetConstant:i64<8>, 
TargetConstant:i32<2359305>, Register:i32 %5, TargetConstant:i64<13>, 
TargetBlockAddress:i64<@test, %fail> 0, TargetConstant:i32<12>, Register:i32 
$df, TargetConstant:i32<12>, Register:i16 $fpsw, TargetConstant:i32<12>, 
Register:i32 $eflags, t22:1
 

Why is this test affected? Is it caused by the calling convention change?



Comment at: llvm/test/CodeGen/X86/fmf-flags.ll:115
-; X64-NEXT:movzwl %di, %edi
-; X64-NEXT:callq __gnu_h2f_ieee@PLT
 ; X64-NEXT:mulss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0

Does __gnu_h2f_ieee return in xmm?




[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-08 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei updated this revision to Diff 435151.
pengfei marked 3 inline comments as done.
pengfei added a comment.

Address Yuanke's comments. Thanks!



Files:
  llvm/docs/ReleaseNotes.rst
  llvm/lib/Target/X86/X86FastISel.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86ISelLowering.h
  llvm/lib/Target/X86/X86InstrAVX512.td
  llvm/lib/Target/X86/X86InstrCompiler.td
  llvm/lib/Target/X86/X86InstrInfo.cpp
  llvm/lib/Target/X86/X86InstrSSE.td
  llvm/lib/Target/X86/X86InstrVecCompiler.td
  llvm/lib/Target/X86/X86InstructionSelector.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/test/Analysis/CostModel/X86/fptoi_sat.ll
  llvm/test/CodeGen/MIR/X86/inline-asm-registers.mir
  llvm/test/CodeGen/X86/atomic-non-integer.ll
  llvm/test/CodeGen/X86/avx512-insert-extract.ll
  llvm/test/CodeGen/X86/avx512-masked_memop-16-8.ll
  llvm/test/CodeGen/X86/avx512fp16-fp-logic.ll
  llvm/test/CodeGen/X86/callbr-asm-bb-exports.ll
  llvm/test/CodeGen/X86/cvt16-2.ll
  llvm/test/CodeGen/X86/cvt16.ll
  llvm/test/CodeGen/X86/fastmath-float-half-conversion.ll
  llvm/test/CodeGen/X86/fmf-flags.ll
  llvm/test/CodeGen/X86/fp-round.ll
  llvm/test/CodeGen/X86/fp-roundeven.ll
  llvm/test/CodeGen/X86/fp128-cast-strict.ll
  llvm/test/CodeGen/X86/fpclamptosat.ll
  llvm/test/CodeGen/X86/fpclamptosat_vec.ll
  llvm/test/CodeGen/X86/fptosi-sat-scalar.ll
  llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
  llvm/test/CodeGen/X86/fptoui-sat-scalar.ll
  llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
  llvm/test/CodeGen/X86/freeze.ll
  llvm/test/CodeGen/X86/half-constrained.ll
  llvm/test/CodeGen/X86/half.ll
  llvm/test/CodeGen/X86/pr31088.ll
  llvm/test/CodeGen/X86/pr38533.ll
  llvm/test/CodeGen/X86/pr47000.ll
  llvm/test/CodeGen/X86/scheduler-asm-moves.mir
  llvm/test/CodeGen/X86/shuffle-extract-subvector.ll
  llvm/test/CodeGen/X86/stack-folding-fp-avx512fp16-fma.ll
  llvm/test/CodeGen/X86/stack-folding-fp-avx512fp16.ll
  llvm/test/CodeGen/X86/statepoint-invoke-ra-enter-at-end.mir
  llvm/test/CodeGen/X86/vec_fp_to_int.ll
  llvm/test/CodeGen/X86/vector-half-conversions.ll
  llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll
  llvm/test/CodeGen/X86/vector-reduce-fmin-nnan.ll
  llvm/test/MC/X86/x86_64-asm-match.s



[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-08 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added inline comments.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:616
+setOperationAction(ISD::FROUNDEVEN, MVT::f16, Promote);
+setOperationAction(ISD::FP_ROUND, MVT::f16, Expand);
+setOperationAction(ISD::FP_EXTEND, MVT::f32, Expand);

LuoYuanke wrote:
> I'm just confused about how to expand it. Will the expansion fail and finally 
> turn into a libcall?
Yeah, we can use `LibCall` instead.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:763
+if (isTypeLegal(MVT::f16)) {
+  setOperationAction(ISD::FP_EXTEND, MVT::f80, Custom);
+  setOperationAction(ISD::STRICT_FP_EXTEND, MVT::f80, Custom);

LuoYuanke wrote:
> Why does f16 emulation affect the f80 type? Are we checking 
> isTypeLegal(MVT::f80)?
It's in the scope of `if (UseX87)`, and we need to lower `fpext half %0 to 
x86_fp80`.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22448
+  if (SrcVT == MVT::f16)
+return SDValue();
+

LuoYuanke wrote:
> Why don't we extend to f32 here?
Returning `SDValue()` lets it be extended later, which saves code.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22522
+  if (!isScalarFPTypeInSSEReg(SrcVT) ||
+  (SrcVT == MVT::f16 && !Subtarget.hasFP16()))
 return SDValue();

LuoYuanke wrote:
> Why don't we extend to f32 here? Will it be promoted finally?
Yes.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22765
+DAG.getIntPtrConstant(0, DL));
+  Res = DAG.getNode(X86ISD::STRICT_CVTPS2PH, DL, {MVT::v8i16, MVT::Other},
+{Chain, Res, DAG.getTargetConstant(4, DL, MVT::i32)});

LuoYuanke wrote:
> Should MVT::v8i16 be MVT::v8f16?
No. We have used `MVT::v8i16` for these nodes since we enabled F16C instructions.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22775
+
+Res = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i16, Res,
+  DAG.getIntPtrConstant(0, DL));

LuoYuanke wrote:
> MVT::f16 and delete the bitcast?
I don't think we have a pattern to extract `f16` from `v8i16`. Besides, I think 
keeping the bitcast makes the flow clear.



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:1476
 }
-let Predicates = [HasFP16] in {
+let Predicates = [HasBWI] in {
   def : Pat<(v32f16 (X86VBroadcastld16 addr:$src)),

LuoYuanke wrote:
> If the target doesn't have the avx512bw feature, is there some other pattern 
> to lower the node, or is the fp16 broadcast node invalid?
Good catch. Added in X86InstrSSE.td.



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4107
  _.ExeDomain>, EVEX_4V, Sched<[SchedWriteFShuffle.XMM]>;
+  let Predicates = [prd] in {
   def rrkz : AVX512PI<0x10, MRMSrcReg, (outs _.RC:$dst),

LuoYuanke wrote:
> Did the previous prd only apply to "def rr"? Is it a bug in the previous code?
No. The previous code works well because there were no mask variants before 
AVX512 and no f16 before FP16. The latter is not true now.



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4352
+defm : avx512_move_scalar_lowering<"VMOVSHZ", X86Movsh, fp16imm0, v8f16x_info>;
+defm : avx512_store_scalar_lowering<"VMOVSHZ", avx512vl_f16_info,
+   (v32i1 (bitconvert (and GR32:$mask, (i32 1, GR32>;

LuoYuanke wrote:
> Why didn't the previous code have predicates?
Because there was no legal `f16` previously.



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:11657
 
+let Predicates = [HasBWI], AddedComplexity = -10 in {
+  def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (VPINSRWZrm (v8i16 
(IMPLICIT_DEF)), addr:$src, 0), FR16X)>;

LuoYuanke wrote:
> Why set AddedComplexity to -10? There is no such additional complexity in the 
> previous code. Add comments for it? 
We have used it before, though rarely. We need to make sure FP16 instructions 
are selected first if available.
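
The effect of a negative `AddedComplexity` can be pictured with a small model:
candidate patterns are tried from highest to lowest complexity, so lowering
the complexity of the generic load pattern lets a more specific one win
whenever both apply. This is a simplified sketch under that assumption. The
`Pattern` struct, `selectPattern`, and the pattern names and numbers in the
usage note are hypothetical and do not reflect TableGen's actual data
structures or computed complexities.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of priority-based pattern selection; not TableGen's
// actual data structures.
struct Pattern {
  std::string Name;
  int Complexity; // base complexity plus any AddedComplexity adjustment
  bool Available; // whether the pattern's predicates hold on this target
};

// Return the name of the first available pattern when candidates are tried
// from highest to lowest complexity, or an empty string if none applies.
inline std::string selectPattern(std::vector<Pattern> Candidates) {
  assert(!Candidates.empty() && "expected at least one candidate");
  std::stable_sort(Candidates.begin(), Candidates.end(),
                   [](const Pattern &A, const Pattern &B) {
                     return A.Complexity > B.Complexity;
                   });
  for (const Pattern &P : Candidates)
    if (P.Available)
      return P.Name;
  return std::string();
}
```

With made-up values, `selectPattern({{"pinsrw_load", 12, true}, {"fp16_load",
22, true}})` picks `fp16_load`, while marking `fp16_load` unavailable falls
back to `pinsrw_load`.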



Comment at: llvm/lib/Target/X86/X86InstrSSE.td:3970
 
+let Predicates = [UseSSE2], AddedComplexity = -10 in {
+  def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (PINSRWrm (v8i16 
(IMPLICIT_DEF)), addr:$src, 0), FR16)>;

LuoYuanke wrote:
> Why AddedComplexity = -10? Add comments for it?
This is to avoid FP16 instructions being overridden.



Comment at: llvm/lib/Target/X86/X86InstrSSE.td:3978
+let Predicates = [HasAVX, NoBWI], AddedComplexity = -10 in {
+  def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (VPINSRWrm (v8i16 
(IMPLICIT_DEF)), addr:$src, 0), FR16)>;
+  def : Pat<(i16 (bitconvert f16:$src)), (EXTRACT_SUBREG (VPEXTRWrr (v8i16 
(COPY_TO_REGCLASS FR16:$src, VR128)), 0), sub_16bit)>;

LuoYuanke wrote:
> Is the pattern for store missing?
It's in line 5214.



Comment at: llvm/lib/Target/X86/X86InstrSSE.td:5214
+
+let Predicates = [HasAVX, NoBWI] in
+  def : Pat<(store f16:$src, addr:$dst), (VPEXTRWmr 

[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-06-08 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:616
+setOperationAction(ISD::FROUNDEVEN, MVT::f16, Promote);
+setOperationAction(ISD::FP_ROUND, MVT::f16, Expand);
+setOperationAction(ISD::FP_EXTEND, MVT::f32, Expand);

I'm just confused about how to expand it. Will the expansion fail and finally 
turn into a libcall?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:763
+if (isTypeLegal(MVT::f16)) {
+  setOperationAction(ISD::FP_EXTEND, MVT::f80, Custom);
+  setOperationAction(ISD::STRICT_FP_EXTEND, MVT::f80, Custom);

Why does f16 emulation affect the f80 type? Are we checking isTypeLegal(MVT::f80)?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22100
   SDValue Res;
+  if (SrcVT == MVT::f16 && !Subtarget.hasFP16()) {
+if (IsStrict)

Not sure if it is better to wrap it into a readable function (e.g., 
isSoftF16).



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22448
+  if (SrcVT == MVT::f16)
+return SDValue();
+

Why don't we extend to f32 here?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22522
+  if (!isScalarFPTypeInSSEReg(SrcVT) ||
+  (SrcVT == MVT::f16 && !Subtarget.hasFP16()))
 return SDValue();

Why don't we extend to f32 here? Will it be promoted finally?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22765
+DAG.getIntPtrConstant(0, DL));
+  Res = DAG.getNode(X86ISD::STRICT_CVTPS2PH, DL, {MVT::v8i16, MVT::Other},
+{Chain, Res, DAG.getTargetConstant(4, DL, MVT::i32)});

Should MVT::v8i16 be MVT::v8f16?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22766
+  Res = DAG.getNode(X86ISD::STRICT_CVTPS2PH, DL, {MVT::v8i16, MVT::Other},
+{Chain, Res, DAG.getTargetConstant(4, DL, MVT::i32)});
+  Chain = Res.getValue(1);

Is it the rounding control? Can we use a macro or add a comment explaining what 
the rounding control is?
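
On the constant 4 in question: in the `VCVTPS2PH` instruction encoding, bits
1:0 of the 8-bit immediate give an explicit rounding mode and bit 2 selects the
current MXCSR rounding mode instead. The following is a minimal decoding
sketch. The `Rounding` enum and `decodeCvtPs2PhImm` are hypothetical names for
illustration, not identifiers from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; the bit layout follows the VCVTPS2PH imm8 description.
enum class Rounding { NearestEven, Down, Up, TowardZero, FromMXCSR };

inline Rounding decodeCvtPs2PhImm(uint8_t Imm) {
  // This sketch only interprets the low three bits of the immediate.
  assert((Imm & ~0x7u) == 0 && "upper bits are not interpreted here");
  if (Imm & 0x4)
    return Rounding::FromMXCSR; // bit 2 set: use the current MXCSR mode
  switch (Imm & 0x3) {
  case 0:
    return Rounding::NearestEven;
  case 1:
    return Rounding::Down;
  case 2:
    return Rounding::Up;
  default:
    return Rounding::TowardZero;
  }
}
```

Read this way, the immediate 4 passed to `X86ISD::STRICT_CVTPS2PH` means "round
according to the current MXCSR setting", the same value the
`_MM_FROUND_CUR_DIRECTION` intrinsic macro carries.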



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22775
+
+Res = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i16, Res,
+  DAG.getIntPtrConstant(0, DL));

MVT::f16 and delete the bitcast?



Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:44211
   VT != MVT::f80 && VT != MVT::f128 &&
+  !(VT.getScalarType() == MVT::f16 && !Subtarget.hasFP16()) &&
   (TLI.isTypeLegal(VT) || VT == MVT::v2f32) &&

Not sure if it is better to wrap it into a readable function (e.g., 
isSoftF16).



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:1476
 }
-let Predicates = [HasFP16] in {
+let Predicates = [HasBWI] in {
   def : Pat<(v32f16 (X86VBroadcastld16 addr:$src)),

If the target doesn't have the avx512bw feature, is there some other pattern to 
lower the node, or is the fp16 broadcast node invalid?



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4107
  _.ExeDomain>, EVEX_4V, Sched<[SchedWriteFShuffle.XMM]>;
+  let Predicates = [prd] in {
   def rrkz : AVX512PI<0x10, MRMSrcReg, (outs _.RC:$dst),

Did the previous prd only apply to "def rr"? Is it a bug in the previous code?



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4352
+defm : avx512_move_scalar_lowering<"VMOVSHZ", X86Movsh, fp16imm0, v8f16x_info>;
+defm : avx512_store_scalar_lowering<"VMOVSHZ", avx512vl_f16_info,
+   (v32i1 (bitconvert (and GR32:$mask, (i32 1, GR32>;

Why didn't the previous code have predicates?



Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:11657
 
+let Predicates = [HasBWI], AddedComplexity = -10 in {
+  def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (VPINSRWZrm (v8i16 
(IMPLICIT_DEF)), addr:$src, 0), FR16X)>;

Why set AddedComplexity to -10? There is no such additional complexity in the 
previous code. Add comments for it? 



Comment at: llvm/lib/Target/X86/X86InstrSSE.td:3970
 
+let Predicates = [UseSSE2], AddedComplexity = -10 in {
+  def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (PINSRWrm (v8i16 
(IMPLICIT_DEF)), addr:$src, 0), FR16)>;

Why AddedComplexity = -10? Add comments for it?



Comment at: llvm/lib/Target/X86/X86InstrSSE.td:3978
+let Predicates = [HasAVX, NoBWI], AddedComplexity = -10 in {
+  def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (VPINSRWrm (v8i16 
(IMPLICIT_DEF)), addr:$src, 0), FR16)>;
+  def : Pat<(i16 (bitconvert f16:$src)), (EXTRACT_SUBREG (VPEXTRWrr (v8i16 
(COPY_TO_REGCLASS FR16:$src, VR128)), 0), sub_16bit)>;

Is the pattern for store missing?



Comment at: llvm/lib/Target/X86/X86InstrSSE.td:5214
+
+let Predicates = [HasAVX, NoBWI] in
+  def : Pat<(store f16:$src, 

[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-05-18 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei updated this revision to Diff 430349.
pengfei added a comment.

Replace __gnu_f2h_ieee/__gnu_h2f_ieee with __truncsfhf2/__extendhfsf2 to match 
GCC.
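
As a rough illustration of what a half-to-float extension helper such as
`__extendhfsf2` has to compute, here is a minimal sketch that widens an IEEE
binary16 bit pattern to a `float`. It is not the compiler-rt or libgcc
implementation. `halfBitsToFloat` is a hypothetical name, and it takes a
`uint16_t` rather than a `_Float16`, so it does not model the calling
convention discussed in this review.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen an IEEE binary16 bit pattern to a float.
inline float halfBitsToFloat(uint16_t H) {
  uint32_t Sign = static_cast<uint32_t>(H & 0x8000u) << 16;
  uint32_t Exp = (H >> 10) & 0x1Fu;
  uint32_t Mant = H & 0x3FFu;
  uint32_t Bits;
  if (Exp == 0x1F) {
    // Infinity or NaN: all-ones exponent, payload moved to the top bits.
    Bits = Sign | 0x7F800000u | (Mant << 13);
  } else if (Exp != 0) {
    // Normal number: rebias the exponent from 15 to 127.
    Bits = Sign | ((Exp + 112u) << 23) | (Mant << 13);
  } else if (Mant == 0) {
    Bits = Sign; // signed zero
  } else {
    // Subnormal half: shift left until the implicit bit (0x400) is set.
    int Shift = 0;
    while ((Mant & 0x400u) == 0) {
      Mant <<= 1;
      ++Shift;
    }
    assert(Shift >= 1 && Shift <= 10);
    Mant &= 0x3FFu;
    Bits = Sign | (static_cast<uint32_t>(113 - Shift) << 23) | (Mant << 13);
  }
  float F;
  static_assert(sizeof(F) == sizeof(Bits), "float must be 32 bits");
  std::memcpy(&F, &Bits, sizeof(F));
  return F;
}
```

Every binary16 value is exactly representable as a `float`, so the extension
needs no rounding. Only the opposite direction (`__truncsfhf2`) has to round.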



Files:
  llvm/docs/ReleaseNotes.rst
  llvm/lib/Target/X86/X86FastISel.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86InstrAVX512.td
  llvm/lib/Target/X86/X86InstrCompiler.td
  llvm/lib/Target/X86/X86InstrInfo.cpp
  llvm/lib/Target/X86/X86InstrSSE.td
  llvm/lib/Target/X86/X86InstrVecCompiler.td
  llvm/lib/Target/X86/X86InstructionSelector.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/test/Analysis/CostModel/X86/fptoi_sat.ll
  llvm/test/CodeGen/MIR/X86/inline-asm-registers.mir
  llvm/test/CodeGen/X86/atomic-non-integer.ll
  llvm/test/CodeGen/X86/avx512-insert-extract.ll
  llvm/test/CodeGen/X86/avx512-masked_memop-16-8.ll
  llvm/test/CodeGen/X86/callbr-asm-bb-exports.ll
  llvm/test/CodeGen/X86/cvt16-2.ll
  llvm/test/CodeGen/X86/cvt16.ll
  llvm/test/CodeGen/X86/fastmath-float-half-conversion.ll
  llvm/test/CodeGen/X86/fmf-flags.ll
  llvm/test/CodeGen/X86/fp-round.ll
  llvm/test/CodeGen/X86/fp-roundeven.ll
  llvm/test/CodeGen/X86/fp128-cast-strict.ll
  llvm/test/CodeGen/X86/fpclamptosat.ll
  llvm/test/CodeGen/X86/fpclamptosat_vec.ll
  llvm/test/CodeGen/X86/fptosi-sat-scalar.ll
  llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
  llvm/test/CodeGen/X86/fptoui-sat-scalar.ll
  llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
  llvm/test/CodeGen/X86/freeze.ll
  llvm/test/CodeGen/X86/half-constrained.ll
  llvm/test/CodeGen/X86/half.ll
  llvm/test/CodeGen/X86/pr31088.ll
  llvm/test/CodeGen/X86/pr38533.ll
  llvm/test/CodeGen/X86/pr47000.ll
  llvm/test/CodeGen/X86/scheduler-asm-moves.mir
  llvm/test/CodeGen/X86/shuffle-extract-subvector.ll
  llvm/test/CodeGen/X86/stack-folding-fp-avx512fp16-fma.ll
  llvm/test/CodeGen/X86/stack-folding-fp-avx512fp16.ll
  llvm/test/CodeGen/X86/statepoint-invoke-ra-enter-at-end.mir
  llvm/test/CodeGen/X86/vec_fp_to_int.ll
  llvm/test/CodeGen/X86/vector-half-conversions.ll
  llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll
  llvm/test/CodeGen/X86/vector-reduce-fmin-nnan.ll
  llvm/test/MC/X86/x86_64-asm-match.s



[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-05-18 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei updated this revision to Diff 430314.
pengfei added a comment.

Use a 32-bit spill slot for the half type. Other items are still ongoing.



Files:
  llvm/docs/ReleaseNotes.rst
  llvm/lib/Target/X86/X86FastISel.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86InstrAVX512.td
  llvm/lib/Target/X86/X86InstrCompiler.td
  llvm/lib/Target/X86/X86InstrInfo.cpp
  llvm/lib/Target/X86/X86InstrSSE.td
  llvm/lib/Target/X86/X86InstrVecCompiler.td
  llvm/lib/Target/X86/X86InstructionSelector.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/test/Analysis/CostModel/X86/fptoi_sat.ll
  llvm/test/CodeGen/MIR/X86/inline-asm-registers.mir
  llvm/test/CodeGen/X86/atomic-non-integer.ll
  llvm/test/CodeGen/X86/avx512-insert-extract.ll
  llvm/test/CodeGen/X86/avx512-masked_memop-16-8.ll
  llvm/test/CodeGen/X86/callbr-asm-bb-exports.ll
  llvm/test/CodeGen/X86/cvt16-2.ll
  llvm/test/CodeGen/X86/cvt16.ll
  llvm/test/CodeGen/X86/fastmath-float-half-conversion.ll
  llvm/test/CodeGen/X86/fmf-flags.ll
  llvm/test/CodeGen/X86/fp-round.ll
  llvm/test/CodeGen/X86/fp-roundeven.ll
  llvm/test/CodeGen/X86/fp128-cast-strict.ll
  llvm/test/CodeGen/X86/fpclamptosat.ll
  llvm/test/CodeGen/X86/fpclamptosat_vec.ll
  llvm/test/CodeGen/X86/fptosi-sat-scalar.ll
  llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
  llvm/test/CodeGen/X86/fptoui-sat-scalar.ll
  llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
  llvm/test/CodeGen/X86/freeze.ll
  llvm/test/CodeGen/X86/half-constrained.ll
  llvm/test/CodeGen/X86/half.ll
  llvm/test/CodeGen/X86/pr31088.ll
  llvm/test/CodeGen/X86/pr38533.ll
  llvm/test/CodeGen/X86/pr47000.ll
  llvm/test/CodeGen/X86/scheduler-asm-moves.mir
  llvm/test/CodeGen/X86/shuffle-extract-subvector.ll
  llvm/test/CodeGen/X86/stack-folding-fp-avx512fp16-fma.ll
  llvm/test/CodeGen/X86/stack-folding-fp-avx512fp16.ll
  llvm/test/CodeGen/X86/statepoint-invoke-ra-enter-at-end.mir
  llvm/test/CodeGen/X86/vec_fp_to_int.ll
  llvm/test/CodeGen/X86/vector-half-conversions.ll
  llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll
  llvm/test/CodeGen/X86/vector-reduce-fmin-nnan.ll
  llvm/test/MC/X86/x86_64-asm-match.s



[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-05-17 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei updated this revision to Diff 430019.
pengfei added a comment.
Herald added subscribers: armkevincheng, eric-k256, javed.absar.
Herald added a reviewer: sjarus.

Rebased on the avx512fp16 implementation. Still WIP for optimizations and a 
fast RA issue.



Files:
  llvm/docs/ReleaseNotes.rst
  llvm/lib/Target/X86/X86FastISel.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86InstrAVX512.td
  llvm/lib/Target/X86/X86InstrCompiler.td
  llvm/lib/Target/X86/X86InstrInfo.cpp
  llvm/lib/Target/X86/X86InstrSSE.td
  llvm/lib/Target/X86/X86InstrVecCompiler.td
  llvm/lib/Target/X86/X86InstructionSelector.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/test/Analysis/CostModel/X86/fptoi_sat.ll
  llvm/test/CodeGen/MIR/X86/inline-asm-registers.mir
  llvm/test/CodeGen/X86/atomic-non-integer.ll
  llvm/test/CodeGen/X86/avx512-insert-extract.ll
  llvm/test/CodeGen/X86/avx512-masked_memop-16-8.ll
  llvm/test/CodeGen/X86/callbr-asm-bb-exports.ll
  llvm/test/CodeGen/X86/cvt16-2.ll
  llvm/test/CodeGen/X86/cvt16.ll
  llvm/test/CodeGen/X86/fastmath-float-half-conversion.ll
  llvm/test/CodeGen/X86/fmf-flags.ll
  llvm/test/CodeGen/X86/fp-round.ll
  llvm/test/CodeGen/X86/fp-roundeven.ll
  llvm/test/CodeGen/X86/fp128-cast-strict.ll
  llvm/test/CodeGen/X86/fpclamptosat.ll
  llvm/test/CodeGen/X86/fpclamptosat_vec.ll
  llvm/test/CodeGen/X86/fptosi-sat-scalar.ll
  llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
  llvm/test/CodeGen/X86/fptoui-sat-scalar.ll
  llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
  llvm/test/CodeGen/X86/freeze.ll
  llvm/test/CodeGen/X86/half-constrained.ll
  llvm/test/CodeGen/X86/half.ll
  llvm/test/CodeGen/X86/pr31088.ll
  llvm/test/CodeGen/X86/pr38533.ll
  llvm/test/CodeGen/X86/pr47000.ll
  llvm/test/CodeGen/X86/scheduler-asm-moves.mir
  llvm/test/CodeGen/X86/shuffle-extract-subvector.ll
  llvm/test/CodeGen/X86/statepoint-invoke-ra-enter-at-end.mir
  llvm/test/CodeGen/X86/vec_fp_to_int.ll
  llvm/test/CodeGen/X86/vector-half-conversions.ll
  llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll
  llvm/test/CodeGen/X86/vector-reduce-fmin-nnan.ll
  llvm/test/MC/X86/x86_64-asm-match.s



[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2021-08-02 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added a comment.

After the last refactor, I think this patch is mostly ready.
This patch strips most of the ABI and _Float16 type related code from D105263, 
leaving that patch with only the AVX512-FP16 ISA enabling code.
I think this should be friendlier for review. The drawback is that all FP16 
enabling patches now depend on and are blocked by this one. So I hope we can 
have a quick review and land it early.




[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2021-07-29 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added a comment.

In D107082#2913881, @craig.topper wrote:

> I haven't had a chance to look at this patch in detail, but I wanted to ask 
> if you considered doing what ARM and RISCV do for this. They pass the f16 in 
> the lower bits on an f32 by only changing the ABI handling code in the 
> backend. The type legalizer takes care of the rest. That seems simpler than 
> this patch. See for example https://reviews.llvm.org/D98670

Thanks Craig for the information. I referenced the implementation in AArch64. I 
think we have to add a legal f16 type this way because:

1. We will support the `_Float16` type in Clang on SSE2 and above to keep the 
same behavior as GCC. So a legal type is a must.
2. Using the lower 16 bits of an f32 may not satisfy the calling convention 
requirements for aggregate and complex types defined by the psABI.
3. We have some optimizations that leverage the F16C or AVX512 ps2ph/ph2ps 
instructions. A legal type is easy to customize.

Besides, we have full arithmetic f16 support in AVX512FP16. Most of the code 
here is shared and serves both scenarios. We just need to promote most FP 
operations and expand or customize `FP_ROUND` and `FP_EXTEND` here.
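
The "promote" strategy mentioned above can be sketched on raw bit patterns:
extend both half operands to `float`, do the arithmetic there, and round the
result back to half. This is only a model of the idea under stated assumptions
(IEEE binary16, round to nearest even, NaN payloads ignored). The names
`extendHalf`, `roundToHalf`, and `addHalfPromoted` are hypothetical and do not
come from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// All names here are hypothetical; this models "promote" on raw IEEE
// binary16 bit patterns and ignores NaN payloads.

// Widen half bits to float (always exact).
inline float extendHalf(uint16_t H) {
  uint32_t Sign = static_cast<uint32_t>(H & 0x8000u) << 16;
  uint32_t Exp = (H >> 10) & 0x1Fu, Mant = H & 0x3FFu, Bits;
  if (Exp == 0x1F) {
    Bits = Sign | 0x7F800000u | (Mant << 13); // infinity or NaN
  } else if (Exp != 0) {
    Bits = Sign | ((Exp + 112u) << 23) | (Mant << 13); // normal
  } else {
    // Zero or subnormal: the value is Mant * 2^-24, exact in float.
    float Mag = static_cast<float>(Mant) * 5.9604644775390625e-08f;
    std::memcpy(&Bits, &Mag, sizeof(Bits));
    Bits |= Sign;
  }
  float F;
  std::memcpy(&F, &Bits, sizeof(F));
  return F;
}

// Narrow float to half bits, rounding to nearest with ties to even.
inline uint16_t roundToHalf(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  uint32_t Sign = (Bits >> 16) & 0x8000u;
  uint32_t Abs = Bits & 0x7FFFFFFFu;
  if (Abs > 0x7F800000u)
    return static_cast<uint16_t>(Sign | 0x7E00u); // NaN -> quiet NaN
  if (Abs >= 0x47800000u)
    return static_cast<uint16_t>(Sign | 0x7C00u); // |F| >= 65536 -> infinity
  if (Abs < 0x33000000u)
    return static_cast<uint16_t>(Sign); // |F| < 2^-25 -> zero
  uint32_t Exp = Abs >> 23;
  uint32_t Mant = (Abs & 0x7FFFFFu) | 0x800000u; // add the implicit bit
  // Normal halves drop 13 mantissa bits; subnormal halves drop more.
  uint32_t Shift = Exp >= 113 ? 13 : 126 - Exp;
  assert(Shift >= 13 && Shift <= 24);
  uint32_t Half = Mant >> Shift;
  uint32_t Rem = Mant & ((1u << Shift) - 1), Tie = 1u << (Shift - 1);
  if (Rem > Tie || (Rem == Tie && (Half & 1)))
    ++Half;
  // For normals, Half still carries the implicit bit (0x400), which acts as
  // the +1 of the biased exponent when the remaining exponent is added.
  if (Exp >= 113)
    Half += (Exp - 113) << 10;
  return static_cast<uint16_t>(Sign | Half);
}

// "Promote": do the arithmetic in float, then round the result back to half.
inline uint16_t addHalfPromoted(uint16_t A, uint16_t B) {
  return roundToHalf(extendHalf(A) + extendHalf(B));
}
```

In this model `extendHalf` and `roundToHalf` play the roles of `FP_EXTEND` and
`FP_ROUND`. On a target without AVX512FP16 those two steps become F16C
instructions or library calls, while the addition itself is an ordinary `float`
operation.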




[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2021-07-29 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

I haven't had a chance to look at this patch in detail, but I wanted to ask if 
you considered doing what ARM and RISCV do for this. They pass the f16 in the 
lower bits on an f32 by only changing the ABI handling code in the backend. The 
type legalizer takes care of the rest. That seems simpler than this patch. See 
for example https://reviews.llvm.org/D98670




[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2021-07-29 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added inline comments.



Comment at: llvm/include/llvm/IR/RuntimeLibcalls.def:293-294
 HANDLE_LIBCALL(FPEXT_F16_F64, "__extendhfdf2")
 HANDLE_LIBCALL(FPEXT_F16_F32, "__gnu_h2f_ieee")
 HANDLE_LIBCALL(FPROUND_F32_F16, "__gnu_f2h_ieee")
 HANDLE_LIBCALL(FPROUND_F64_F16, "__truncdfhf2")

GCC 12 will provide the functions `__extendhfsf2` and `__truncsfhf2`. I wonder 
whether I can change the names directly here, or whether ARM/AArch64 (and other 
targets) need extra customization.




[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2021-07-29 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added a comment.

I sent out this patch mainly as a PoC of the ABI changes. I'll fix the 
performance regressions in the next phase.
LLVM was using a different calling convention on x86 when passing and returning 
the half type. It conflicts with the current X86 psABI.
I have evaluated the risk internally and think the ABI change has low risk 
because Clang doesn't use that calling convention. But I may not have been 
thorough enough. Questions and comments are appreciated.

