[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
JonPsson1 wrote: > Both clear_cache and enable_execute_stack should just be empty no-op > functions on our platform. ah - ok. I fixed clear_cache.c, and this also took care of the other test case which was calling it. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -179,20 +179,31 @@ bool SystemZABIInfo::isVectorArgumentType(QualType Ty) const { getContext().getTypeSize(Ty) <= 128); } -bool SystemZABIInfo::isFPArgumentType(QualType Ty) const { +// The Size argument will in case of af an overaligned single element struct +// reflect the overalignment value. In such a case the argument will be +// passed using the type matching Size. +llvm::Type *SystemZABIInfo::getFPArgumentType(QualType Ty, + uint64_t Size) const { if (IsSoftFloatABI) -return false; +return nullptr; if (const BuiltinType *BT = Ty->getAs()) switch (BT->getKind()) { +case BuiltinType::Float16: + if (Size == 16) JonPsson1 wrote: As I wrote earlier, in cases of a single element struct containing e.g. a _Float16 aligned to 8 bytes, the argument will be passed as a Double. At this point, the Size reflects this situation. I guess that's why the code looked like it did before. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
uweigand wrote: > I see a lot of target specific pre-processing in clear_cache.c that either > disables this to a no-op, or does something target specific. The test itself > this is disabled for some targets. Seems reasonable to leave this as > unsupported and disable the test (for now?). Also disabling > enable_execute_stack_test.c. Both clear_cache and enable_execute_stack should just be empty no-op functions on our platform. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -179,20 +179,31 @@ bool SystemZABIInfo::isVectorArgumentType(QualType Ty) const { getContext().getTypeSize(Ty) <= 128); } -bool SystemZABIInfo::isFPArgumentType(QualType Ty) const { +// The Size argument will in case of af an overaligned single element struct +// reflect the overalignment value. In such a case the argument will be +// passed using the type matching Size. +llvm::Type *SystemZABIInfo::getFPArgumentType(QualType Ty, + uint64_t Size) const { if (IsSoftFloatABI) -return false; +return nullptr; if (const BuiltinType *BT = Ty->getAs()) switch (BT->getKind()) { +case BuiltinType::Float16: + if (Size == 16) arsenm wrote: I don't understand why a size check is necessary; Float16 implies the size? https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -179,20 +179,31 @@ bool SystemZABIInfo::isVectorArgumentType(QualType Ty) const { getContext().getTypeSize(Ty) <= 128); } -bool SystemZABIInfo::isFPArgumentType(QualType Ty) const { +// The Size argument will in case of af an overaligned single element struct +// reflect the overalignment value. In such a case the argument will be +// passed using the type matching Size. +llvm::Type *SystemZABIInfo::getFPArgumentType(QualType Ty, + uint64_t Size) const { if (IsSoftFloatABI) -return false; +return nullptr; if (const BuiltinType *BT = Ty->getAs()) switch (BT->getKind()) { +case BuiltinType::Float16: + if (Size == 16) +return llvm::Type::getHalfTy(getVMContext()); + LLVM_FALLTHROUGH; case BuiltinType::Float: + if (Size == 32) arsenm wrote: Same here. The important part was explicit handling of the types, and that bfloat is separate from half https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -446,10 +448,11 @@ ABIArgInfo SystemZABIInfo::classifyArgumentType(QualType Ty) const { // The structure is passed as an unextended integer, a float, or a double. if (isFPArgumentType(SingleElementTy)) { - assert(Size == 32 || Size == 64); + assert(Size == 16 || Size == 32 || Size == 64); return ABIArgInfo::getDirect( - Size == 32 ? llvm::Type::getFloatTy(getVMContext()) - : llvm::Type::getDoubleTy(getVMContext())); + Size == 16 ? llvm::Type::getHalfTy(getVMContext()) JonPsson1 wrote: ok - patch updated to do this mapping in a direct way. It was actually more beneficial perhaps than merely guarding against bfloat, as the Size can reflect an overalignment in which case the (for historical reasons I beleive) type actually changes to match the alignment. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
JonPsson1 wrote: Patch rebased and updated. -- Saw two failing compiler-rt tests now: ``` Failed Tests (2): Builtins-s390x-linux :: clear_cache_test.c Builtins-s390x-linux :: enable_execute_stack_test.c ``` clear_cache_test.c contains a call to the compiler-rt clear_cache() function. From README.txt: ``` // __clear_cache() is used to tell process that new instructions have been // written to an address range. Necessary on processors that do not have // a unified instruction and data cache. void __clear_cache(void* start, void* end); // __enable_execute_stack() is used with nested functions when a trampoline // function is written onto the stack and that page range needs to be made // executable. void __enable_execute_stack(void* addr); ``` I see a lot of target specific pre-processing in clear_cache.c that either disables this to a no-op, or does something target specific. The test itself this is disabled for some targets. Seems reasonable to leave this as unsupported and disable the test (for now?). Also disabling enable_execute_stack_test.c. -- The 'bool HalfArgsAndReturns' Clang flag seems to desipite the name only be relevant for __fp16, so it is kept as the default 'false'. Added a comment next to it. -- Handling of FP-args types / sizes updated per review. This was a bit more involved than I first thought as over-alignments actually imply a greater size and type. -- Added tests for overaligned (a4 / a16) _Float16 in systemz-abi.c. @uweigand: patch ready for a review now. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -446,10 +448,11 @@ ABIArgInfo SystemZABIInfo::classifyArgumentType(QualType Ty) const { // The structure is passed as an unextended integer, a float, or a double. if (isFPArgumentType(SingleElementTy)) { - assert(Size == 32 || Size == 64); + assert(Size == 16 || Size == 32 || Size == 64); return ABIArgInfo::getDirect( - Size == 32 ? llvm::Type::getFloatTy(getVMContext()) - : llvm::Type::getDoubleTy(getVMContext())); + Size == 16 ? llvm::Type::getHalfTy(getVMContext()) arsenm wrote: I don't think it matters if it's reachable or not, should write future proof code and avoid repeating this pattern https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
JonPsson1 wrote: Thanks for help - I think I found the way to enable the building of these functions - patch updated. I could now (for the first time? :D ) compile and run a program on SystemZ with _Float16 variables, by using `--rtlib=compiler-rt `with clang. As I am not the expert on FP semantics, I wonder if anyone could confirm that these routines are safe and correct to use as far as FP exceptions, rounding modes, (..?) goes? https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -446,10 +448,11 @@ ABIArgInfo SystemZABIInfo::classifyArgumentType(QualType Ty) const { // The structure is passed as an unextended integer, a float, or a double. if (isFPArgumentType(SingleElementTy)) { - assert(Size == 32 || Size == 64); + assert(Size == 16 || Size == 32 || Size == 64); return ABIArgInfo::getDirect( - Size == 32 ? llvm::Type::getFloatTy(getVMContext()) - : llvm::Type::getDoubleTy(getVMContext())); + Size == 16 ? llvm::Type::getHalfTy(getVMContext()) JonPsson1 wrote: bfloat is not enabled on SystemZ so shouldn't this be safe? https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -446,10 +448,11 @@ ABIArgInfo SystemZABIInfo::classifyArgumentType(QualType Ty) const { // The structure is passed as an unextended integer, a float, or a double. if (isFPArgumentType(SingleElementTy)) { - assert(Size == 32 || Size == 64); + assert(Size == 16 || Size == 32 || Size == 64); return ABIArgInfo::getDirect( - Size == 32 ? llvm::Type::getFloatTy(getVMContext()) - : llvm::Type::getDoubleTy(getVMContext())); + Size == 16 ? llvm::Type::getHalfTy(getVMContext()) arsenm wrote: This will break on bfloat, don't assume type size -> fp type https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
efriedma-quic wrote: I think you need to do some target-specific stuff to get the builtins library to build... see, for example, https://reviews.llvm.org/D42958 . https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
JonPsson1 wrote: > -DLLVM_RUNTIME_TARGETS Thanks. If I use cmake with -DLLVM_RUNTIME_TARGETS=systemz, and then rebuild, I see one more file being built: ``` ninja [6/6] Linking CXX shared library lib/clang/20/lib/s390x-unknown-linux-gnu/libclang_rt.asan.so ``` However it does not seem to be quite the file that I need to run my program: ``` ./bin/clang -target s390x-linux-gnu -march=z16 ./test4.c -O3 -o ./a.out --rtlib=compiler-rt /usr/bin/ld: cannot find /home/ijonpan/llvm-project/build/lib/clang/20/lib/s390x-unknown-linux-gnu/libclang_rt.builtins.a: No such file or directory clang: error: linker command failed with exit code 1 (use -v to see invocation) ``` Even so, I don't understand why this wouldn't follow if I enable compiler-rt as a project... https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
efriedma-quic wrote: I think you're looking for -DLLVM_RUNTIME_TARGETS ? https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
JonPsson1 wrote: I was hoping these added lines would enable the conversion functions, but it doesn't seem to work: ``` + if (TT.isSystemZ()) { +setLibcallName(RTLIB::FPROUND_F32_F16, "__truncsfhf2"); +setLibcallName(RTLIB::FPEXT_F16_F32, "__extendhfsf2"); + } clang -target s390x-linux-gnu -march=z16 ./test.c -O3 -o ./a.out --rtlib=compiler-rt /usr/bin/ld: cannot find /home/ijonpan/llvm-project/build/lib/clang/20/lib/s390x-unknown-linux-gnu/libclang_rt.builtins.a: No such file or directory clang: error: linker command failed with exit code 1 (use -v to see invocation) ``` There is something like COMPILER_RT_HAS_${arch}_FLOAT16 in compiler-rt/lib/builtins/CMakeLists.txt, but I can't find anyplace to add s390x to the targets that will build libclang_rt.builtins.a. I have configured with -DLLVM_ENABLE_PROJECTS="clang;compiler-rt" Would anyone know off-hand how to make this work? Thanks. @nikic @phoebewang @kito-cheng @tgross35 https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
JonPsson1 wrote: Spill f16 using float instructions into 4-byte stack slots: - Seems to work to use a RegInfoByHwMode to reset the SpillSize for FP16 to 32 bits. By using two HwMode:s, the spill size can still be 16 bits with vector support. - Using new LE16/STE16 opcodes seems easier than extracting/inserting subregs in storeRegToStackSlot() / loadRegFromStackSlot() via FP32 regs, although that could also work. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
JonPsson1 wrote: >> Without vector support, it might make sense to have load/store pseudos with >> an extra GPR def operand, so that these loads/stores can be expanded as a >> PostRA pseudo. Then it would only need handling in one place, but OTOH >> having a second explicit def operand is also undesired, maybe. > I don't think we need to spend much effort optimizing for pre-z13 machines at > this point. This wouldn't be an optimization, rather just handling the shift + insert sequences needed for older machines in one place, instead of as of now in both lowerLoadF16(), and in loadRegFromStackSlot() (and similarly for stores). > Also, looks like the clang-format check is complaining a bit ... Fixed. Inline-assembly support for half with tests added. Implemented bitcast i16<->f16 with vector support but letting it fall back to store+load bitcasting for older archs. This type of bitcasting is used with inline-assembly. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -1883,6 +1931,10 @@ void SystemZInstrInfo::getLoadStoreOpcodes(const TargetRegisterClass *RC, } else if (RC == &SystemZ::FP128BitRegClass) { LoadOpcode = SystemZ::LX; StoreOpcode = SystemZ::STX; + } else if (RC == &SystemZ::FP16BitRegClass || + RC == &SystemZ::VR16BitRegClass) { +LoadOpcode = SystemZ::VL16; uweigand wrote: Ah OK. That's probably fine then. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -1883,6 +1931,10 @@ void SystemZInstrInfo::getLoadStoreOpcodes(const TargetRegisterClass *RC, } else if (RC == &SystemZ::FP128BitRegClass) { LoadOpcode = SystemZ::LX; StoreOpcode = SystemZ::STX; + } else if (RC == &SystemZ::FP16BitRegClass || + RC == &SystemZ::VR16BitRegClass) { +LoadOpcode = SystemZ::VL16; JonPsson1 wrote: I was assuming this would work, as FP16 should be a sub-registerclass of VR16 - all FP16 regs are also contained in VR16. I see in the generated file: ``` static const TargetRegisterClass *const FP16BitSuperclasses[] = { &SystemZ::VR16BitRegClass, nullptr }; ``` Given that, and that -verify-machineinstrs is used in spill-half-01.mir, it seems ok to me. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -1883,6 +1931,10 @@ void SystemZInstrInfo::getLoadStoreOpcodes(const TargetRegisterClass *RC, } else if (RC == &SystemZ::FP128BitRegClass) { LoadOpcode = SystemZ::LX; StoreOpcode = SystemZ::STX; + } else if (RC == &SystemZ::FP16BitRegClass || + RC == &SystemZ::VR16BitRegClass) { +LoadOpcode = SystemZ::VL16; uweigand wrote: What I meant is that `VL16` is defined as having a `v16hb` return value ``` def VL16 : UnaryAliasVRX; ``` and `v16hb` is defined to live in `VR16` - *not* `FP16`: ``` def v16hb : TypedReg; ``` So it seems to me forcing a `FP16` register into the instruction, even if it might seem to do what we want, is not really allowed and might e.g. trigger strict MI checking failures ... https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -1883,6 +1931,10 @@ void SystemZInstrInfo::getLoadStoreOpcodes(const TargetRegisterClass *RC, } else if (RC == &SystemZ::FP128BitRegClass) { LoadOpcode = SystemZ::LX; StoreOpcode = SystemZ::STX; + } else if (RC == &SystemZ::FP16BitRegClass || + RC == &SystemZ::VR16BitRegClass) { +LoadOpcode = SystemZ::VL16; JonPsson1 wrote: Yes, for instance in spill-half-01.mir. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -47,8 +49,11 @@ def LDR : UnaryRR <"ldr", 0x28, null_frag, FP64, FP64>; def LXR : UnaryRRE<"lxr", 0xB365, null_frag, FP128, FP128>; // For z13 we prefer LDR over LER to avoid partial register dependencies. -let isCodeGenOnly = 1 in - def LDR32 : UnaryRR<"ldr", 0x28, null_frag, FP32, FP32>; +let isCodeGenOnly = 1 in { + def LER16 : UnaryRR <"ler", 0x38, null_frag, FP16, FP16>; uweigand wrote: Formatting is a bit odd here. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -1883,6 +1931,10 @@ void SystemZInstrInfo::getLoadStoreOpcodes(const TargetRegisterClass *RC, } else if (RC == &SystemZ::FP128BitRegClass) { LoadOpcode = SystemZ::LX; StoreOpcode = SystemZ::STX; + } else if (RC == &SystemZ::FP16BitRegClass || + RC == &SystemZ::VR16BitRegClass) { +LoadOpcode = SystemZ::VL16; uweigand wrote: Hmm. Do these even work on FP16? https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -513,11 +514,26 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM, } // Handle floating-point types. + // Promote all f16 operations to float, with some exceptions below. + for (unsigned Opc = 0; Opc < ISD::BUILTIN_OP_END; ++Opc) +setOperationAction(Opc, MVT::f16, Promote); + setOperationAction(ISD::ConstantFP, MVT::f16, Expand); + for (MVT VT : {MVT::f32, MVT::f64, MVT::f128}) { +setLoadExtAction(ISD::EXTLOAD, VT, MVT::f16, Expand); +setTruncStoreAction(VT, MVT::f16, Expand); +setOperationAction(ISD::FP_EXTEND, VT, Custom); +setOperationAction(ISD::STRICT_FP_EXTEND, VT, Custom); uweigand wrote: I don't have a strong opinion on that. How you do it currently looks fine to me. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
uweigand wrote: > Improved handling to utilize vector instructions when present. Thanks! >New VR16 regclass, but v8f16 _not_ legal. It might make sense to have it as a >legal type and e.g. do VL;VST when moving vectors in memory, and also set all >vector ops to "Expand". Not sure how trivial that change would be, given some >special handlings of vector nodes, so not done as of now: only the scalar f16 >is legal. Agreed. I don't think we need vector f16 at this point. > Without vector support, it might make sense to have load/store pseudos with > an extra GPR def operand, so that these loads/stores can be expanded as a > PostRA pseudo. Then it would only need handling in one place, but OTOH having > a second explicit def operand is also undesired, maybe. I don't think we need to spend much effort optimizing for pre-z13 machines at this point. > f16 immediates handled like f32: > > * Basic support added for fp 0.0/-0.0 and generation of vector constants > (which should always work btw given their size with vrepih). > > * Single-lane vector instructions like WFLCSB not used for fp16 (yet), > even though it should be possible to add _16 variants. Doesn't seem > important, so skipping. The sign-operations are the only instructions we even could semantically use with f16, right? We certainly could do so, but I agree it's probably not important. > Should fp16 inline asm operands also be supported at this point? Good point. I think so, yes. Also, looks like the clang-format check is complaining a bit ... https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
JonPsson1 wrote: Improved handling to utilize vector instructions when present. New VR16 regclass, but v8f16 *not* legal. It might make sense to have it as a legal type and e.g. do VL;VST when moving vectors in memory, and also set all vector ops to "Expand". Not sure how trivial that change would be, given some special handlings of vector nodes, so not done as of now: only the scalar f16 is legal. Seems to work fine to add "16" versions for loads, stores and lzer/lcdfr in case of vector support. Without vector support, it might make sense to have load/store pseudos with an extra GPR def operand, so that these loads/stores can be expanded as a PostRA pseudo. Then it would only need handling in one place, but OTOH having a second explicit def operand is also undesired, maybe. f16 immediates handled like f32: - Basic support added for fp 0.0/-0.0 and generation of vector constants (which should always work btw given their size with vrepih). - Single-lane vector instructions like WFLCSB not used for fp16 (yet), even though it should be possible to add _16 variants. Doesn't seem important, so skipping. Should fp16 inline asm operands also be supported at this point? https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -513,11 +514,26 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM, } // Handle floating-point types. + // Promote all f16 operations to float, with some exceptions below. + for (unsigned Opc = 0; Opc < ISD::BUILTIN_OP_END; ++Opc) +setOperationAction(Opc, MVT::f16, Promote); + setOperationAction(ISD::ConstantFP, MVT::f16, Expand); + for (MVT VT : {MVT::f32, MVT::f64, MVT::f128}) { +setLoadExtAction(ISD::EXTLOAD, VT, MVT::f16, Expand); +setTruncStoreAction(VT, MVT::f16, Expand); +setOperationAction(ISD::FP_EXTEND, VT, Custom); +setOperationAction(ISD::STRICT_FP_EXTEND, VT, Custom); JonPsson1 wrote: Moved them down but not sure if they should be together (as they are treated the same), or if we should maintain the separation of the STRICT operations (-> move FP_EXTEND up) https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -255,4 +255,9 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) { } setLibcallName(RTLIB::MULO_I128, nullptr); } + + if (TT.isSystemZ()) { +setLibcallName(RTLIB::FPROUND_F32_F16, "__truncsfhf2"); +setLibcallName(RTLIB::FPEXT_F16_F32, "__extendhfsf2"); + } tgross35 wrote: Regarding how to build and link, they are in compiler-rt if that can be built https://github.com/llvm/llvm-project/blob/fa22100d57631bbb0a507dd27e3ebb24b1354623/compiler-rt/lib/builtins/truncsfhf2.c#L15. `__trunc` and `__extend` are what you want to emit here, I'm just not sure what exactly this file needs to do because it seems like `HasLegalHalfType` controls `__extend`/`__trunc` vs. `__gnu_` lowering somehow https://github.com/llvm/llvm-project/pull/109164#issuecomment-2433525551. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -102,6 +102,7 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM, addRegisterClass(MVT::i32, &SystemZ::GR32BitRegClass); addRegisterClass(MVT::i64, &SystemZ::GR64BitRegClass); if (!useSoftFloat()) { +addRegisterClass(MVT::f16, &SystemZ::FP16BitRegClass); uweigand wrote: If it's not too hard, I think it might be good. Not so much because of performance, but also because it makes the code look more similar to what we do for the other types. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -513,11 +514,26 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM, } // Handle floating-point types. + // Promote all f16 operations to float, with some exceptions below. + for (unsigned Opc = 0; Opc < ISD::BUILTIN_OP_END; ++Opc) +setOperationAction(Opc, MVT::f16, Promote); + setOperationAction(ISD::ConstantFP, MVT::f16, Expand); + for (MVT VT : {MVT::f32, MVT::f64, MVT::f128}) { +setLoadExtAction(ISD::EXTLOAD, VT, MVT::f16, Expand); +setTruncStoreAction(VT, MVT::f16, Expand); +setOperationAction(ISD::FP_EXTEND, VT, Custom); +setOperationAction(ISD::STRICT_FP_EXTEND, VT, Custom); uweigand wrote: Minor nit, but those two should probably be done in the loop below. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -513,11 +514,37 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM, } // Handle floating-point types. + // Promote all f16 operations to float, with some exceptions below. + for (unsigned Opc = 0; Opc < ISD::BUILTIN_OP_END; ++Opc) +setOperationAction(Opc, MVT::f16, Promote); + setOperationAction(ISD::ConstantFP, MVT::f16, Expand); + for (MVT VT : {MVT::f32, MVT::f64, MVT::f128}) { +setLoadExtAction(ISD::EXTLOAD, VT, MVT::f16, Expand); +setTruncStoreAction(VT, MVT::f16, Expand); + } + setOperationAction(ISD::LOAD, MVT::f16, Custom); + setOperationAction(ISD::ATOMIC_LOAD, MVT::f16, Custom); + setOperationAction(ISD::STORE, MVT::f16, Custom); + setOperationAction(ISD::ATOMIC_STORE, MVT::f16, Custom); + setOperationAction(ISD::FP_ROUND, MVT::f16, Custom); + setOperationAction(ISD::FP_EXTEND, MVT::f32, Custom); + setOperationAction(ISD::FP_EXTEND, MVT::f64, Custom); + setOperationAction(ISD::FP_EXTEND, MVT::f128, Custom); + for (unsigned I = MVT::FIRST_FP_VALUETYPE; I <= MVT::LAST_FP_VALUETYPE; ++I) { MVT VT = MVT::SimpleValueType(I); if (isTypeLegal(VT)) { + // No special instructions for these. + setOperationAction(ISD::FSIN, VT, Expand); + setOperationAction(ISD::FCOS, VT, Expand); + setOperationAction(ISD::FSINCOS, VT, Expand); + setOperationAction(ISD::FREM, VT, Expand); + setOperationAction(ISD::FPOW, VT, Expand); uweigand wrote: Shouldn't these be Promote just like all the other f16 operations? Expand triggers a libcall, which doesn't match the excess-precision setting - also, we actually don't have f16 libcalls in libm ... https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -102,6 +102,7 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM, addRegisterClass(MVT::i32, &SystemZ::GR32BitRegClass); addRegisterClass(MVT::i64, &SystemZ::GR64BitRegClass); if (!useSoftFloat()) { +addRegisterClass(MVT::f16, &SystemZ::FP16BitRegClass); uweigand wrote: Do we want to add a VR16BitRegClass if we have vector support? https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -255,4 +255,9 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) { } setLibcallName(RTLIB::MULO_I128, nullptr); } + + if (TT.isSystemZ()) { +setLibcallName(RTLIB::FPROUND_F32_F16, "__truncsfhf2"); +setLibcallName(RTLIB::FPEXT_F16_F32, "__extendhfsf2"); + } JonPsson1 wrote: you may be right - as I wrote above I have not really tried this before. Would you happen to know how to build and link these? https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -513,11 +514,37 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM, } // Handle floating-point types. + // Promote all f16 operations to float, with some exceptions below. + for (unsigned Opc = 0; Opc < ISD::BUILTIN_OP_END; ++Opc) +setOperationAction(Opc, MVT::f16, Promote); + setOperationAction(ISD::ConstantFP, MVT::f16, Expand); + for (MVT VT : {MVT::f32, MVT::f64, MVT::f128}) { +setLoadExtAction(ISD::EXTLOAD, VT, MVT::f16, Expand); +setTruncStoreAction(VT, MVT::f16, Expand); + } + setOperationAction(ISD::LOAD, MVT::f16, Custom); + setOperationAction(ISD::ATOMIC_LOAD, MVT::f16, Custom); + setOperationAction(ISD::STORE, MVT::f16, Custom); + setOperationAction(ISD::ATOMIC_STORE, MVT::f16, Custom); + setOperationAction(ISD::FP_ROUND, MVT::f16, Custom); + setOperationAction(ISD::FP_EXTEND, MVT::f32, Custom); + setOperationAction(ISD::FP_EXTEND, MVT::f64, Custom); + setOperationAction(ISD::FP_EXTEND, MVT::f128, Custom); + for (unsigned I = MVT::FIRST_FP_VALUETYPE; I <= MVT::LAST_FP_VALUETYPE; ++I) { MVT VT = MVT::SimpleValueType(I); if (isTypeLegal(VT)) { + // No special instructions for these. + setOperationAction(ISD::FSIN, VT, Expand); + setOperationAction(ISD::FCOS, VT, Expand); + setOperationAction(ISD::FSINCOS, VT, Expand); + setOperationAction(ISD::FREM, VT, Expand); + setOperationAction(ISD::FPOW, VT, Expand); tgross35 wrote: Just crosslinking that there is an effort to add f16 libcalls https://github.com/llvm/llvm-project/issues/95250 but I have no clue what the plan is as far as lowering to them. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -255,4 +255,9 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) { } setLibcallName(RTLIB::MULO_I128, nullptr); } + + if (TT.isSystemZ()) { +setLibcallName(RTLIB::FPROUND_F32_F16, "__truncsfhf2"); +setLibcallName(RTLIB::FPEXT_F16_F32, "__extendhfsf2"); + } tgross35 wrote: Hm, I see they default to the `__gnu_` functions in this file. Some targets (wasm, hexagon) manually set it to `__extendhfsf2` and `__truncsfhf2` in `*SelfLowering.cpp` but why do targets like x86 correctly lower to these as well without an override either in this file or in selflowering? https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -255,4 +255,9 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) { } setLibcallName(RTLIB::MULO_I128, nullptr); } + + if (TT.isSystemZ()) { +setLibcallName(RTLIB::FPROUND_F32_F16, "__truncsfhf2"); +setLibcallName(RTLIB::FPEXT_F16_F32, "__extendhfsf2"); + } tgross35 wrote: Why do these names need to be set, aren't these the default? https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
JonPsson1 wrote: Updated per review. - strict fp-round / fp-extend added with tests. - math functions like fsin promoted instead of expanded to non-existing fsinh. - conversion functions from e.g. f16 -> f64 used instead of separate steps. - __fp16 argument/return values removed (and tests in systemz-abi.c removed). - docs/LanguageExtensions: SystemZ added as supporting _Float16. Note on compiler-rt: not sure how to build llvm conversion functions and link them (have not tried this yet), but added the mapping in RuntimeLibcalls.cpp. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -91,11 +91,28 @@ class LLVM_LIBRARY_VISIBILITY SystemZTargetInfo : public TargetInfo { "-v128:64-a:8:16-n32:64"); } MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 128; + +// True if the backend supports operations on the half LLVM IR type. +// By setting this to false, conversions will happen for _Float16 around +// a statement by default, with operations done in float. However, if +// -ffloat16-excess-precision=none is given, no conversions will be made +// and instead the backend will promote each half operation to float +// individually. +HasLegalHalfType = false; +// Allow half arguments and return values (__fp16). +HalfArgsAndReturns = true; JonPsson1 wrote: ok removed. Interesting that this is left out on purpose even though it would be simple to allow... https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
https://github.com/JonPsson1 edited https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -6285,6 +6465,16 @@ SDValue SystemZTargetLowering::LowerOperation(SDValue Op, return lowerAddrSpaceCast(Op, DAG); case ISD::ROTL: return lowerShift(Op, DAG, SystemZISD::VROTL_BY_SCALAR); + case ISD::FP_EXTEND: +//case ISD::STRICT_FP_EXTEND: JonPsson1 wrote: fixed https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -513,11 +514,37 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM, } // Handle floating-point types. + // Promote all f16 operations to float, with some exceptions below. + for (unsigned Opc = 0; Opc < ISD::BUILTIN_OP_END; ++Opc) +setOperationAction(Opc, MVT::f16, Promote); + setOperationAction(ISD::ConstantFP, MVT::f16, Expand); + for (MVT VT : {MVT::f32, MVT::f64, MVT::f128}) { +setLoadExtAction(ISD::EXTLOAD, VT, MVT::f16, Expand); +setTruncStoreAction(VT, MVT::f16, Expand); + } + setOperationAction(ISD::LOAD, MVT::f16, Custom); + setOperationAction(ISD::ATOMIC_LOAD, MVT::f16, Custom); + setOperationAction(ISD::STORE, MVT::f16, Custom); + setOperationAction(ISD::ATOMIC_STORE, MVT::f16, Custom); + setOperationAction(ISD::FP_ROUND, MVT::f16, Custom); + setOperationAction(ISD::FP_EXTEND, MVT::f32, Custom); + setOperationAction(ISD::FP_EXTEND, MVT::f64, Custom); + setOperationAction(ISD::FP_EXTEND, MVT::f128, Custom); + for (unsigned I = MVT::FIRST_FP_VALUETYPE; I <= MVT::LAST_FP_VALUETYPE; ++I) { MVT VT = MVT::SimpleValueType(I); if (isTypeLegal(VT)) { + // No special instructions for these. + setOperationAction(ISD::FSIN, VT, Expand); + setOperationAction(ISD::FCOS, VT, Expand); + setOperationAction(ISD::FSINCOS, VT, Expand); + setOperationAction(ISD::FREM, VT, Expand); + setOperationAction(ISD::FPOW, VT, Expand); JonPsson1 wrote: I tried Promote but ended up with an f32 operation that was not handled further. Did not find anything like "promote and expand/re-visit", unfortunately, so I did a manual "custom-promotion" and then common-code revisited the new f32 node. As there is no support for these libcalls for f16, it seems only the corresponding llvm intrinsics need to be handled, I would assume. I started with this, but not quite sure if this is the right way or how many of these (all?) should be handled here. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -6108,6 +6160,133 @@ static SDValue lowerAddrSpaceCast(SDValue Op, SelectionDAG &DAG) { return Op; } +SDValue SystemZTargetLowering::LowerFP_EXTEND(SDValue Op, + SelectionDAG &DAG) const { + bool IsStrict = Op->isStrictFPOpcode(); + SDValue In = Op.getOperand(IsStrict ? 1 : 0); + MVT VT = Op.getSimpleValueType(); + MVT SVT = In.getSimpleValueType(); + if (SVT != MVT::f16) +return Op; + + SDLoc DL(Op); + SDValue Chain = IsStrict ? Op.getOperand(0) : SDValue(); + + // Need a libcall. XXX factor out (below) + TargetLowering::CallLoweringInfo CLI(DAG); + Chain = IsStrict ? Op.getOperand(0) : DAG.getEntryNode(); + TargetLowering::ArgListTy Args; + TargetLowering::ArgListEntry Entry; + Entry.Node = In; + Entry.Ty = EVT(SVT).getTypeForEVT(*DAG.getContext()); + Args.push_back(Entry); + SDValue Callee = DAG.getExternalSymbol( +getLibcallName(RTLIB::FPEXT_F16_F32), getPointerTy(DAG.getDataLayout())); + CLI.setDebugLoc(DL).setChain(Chain).setLibCallee( +CallingConv::C, EVT(MVT::f32).getTypeForEVT(*DAG.getContext()), Callee, +std::move(Args)); + SDValue Res; + std::tie(Res,Chain) = LowerCallTo(CLI); + if (IsStrict) +Res = DAG.getMergeValues({Res, Chain}, DL); + + return DAG.getNode(ISD::FP_EXTEND, DL, VT, Res); +} + +SDValue SystemZTargetLowering::LowerFP_ROUND(SDValue Op, + SelectionDAG &DAG) const { + bool IsStrict = Op->isStrictFPOpcode(); + SDValue In = Op.getOperand(IsStrict ? 1 : 0); + MVT VT = Op.getSimpleValueType(); + MVT SVT = In.getSimpleValueType(); + if (VT != MVT::f16) +return SDValue(); // XXX? + + SDLoc DL(Op); + SDValue Chain = IsStrict ? Op.getOperand(0) : SDValue(); + + if (SVT != MVT::f32) { +SDValue Rnd = DAG.getIntPtrConstant(0, DL, /*isTarget=*/true); +In = DAG.getNode(ISD::FP_ROUND, DL, MVT::f32, In, Rnd); + } JonPsson1 wrote: ok - seems to work fine to let LegalizeDAG expand these conversions also given that those different versions are available. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -6108,6 +6160,133 @@ static SDValue lowerAddrSpaceCast(SDValue Op, SelectionDAG &DAG) { return Op; } +SDValue SystemZTargetLowering::LowerFP_EXTEND(SDValue Op, + SelectionDAG &DAG) const { + bool IsStrict = Op->isStrictFPOpcode(); + SDValue In = Op.getOperand(IsStrict ? 1 : 0); + MVT VT = Op.getSimpleValueType(); + MVT SVT = In.getSimpleValueType(); + if (SVT != MVT::f16) +return Op; + + SDLoc DL(Op); + SDValue Chain = IsStrict ? Op.getOperand(0) : SDValue(); + + // Need a libcall. XXX factor out (below) JonPsson1 wrote: huh - you're right. ExpandNode() fails but it fall-through to ConvertNodeToLibcall(). https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -102,6 +102,7 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM, addRegisterClass(MVT::i32, &SystemZ::GR32BitRegClass); addRegisterClass(MVT::i64, &SystemZ::GR64BitRegClass); if (!useSoftFloat()) { +addRegisterClass(MVT::f16, &SystemZ::FP16BitRegClass); JonPsson1 wrote: So far I have a few points as "todo" for vector support: - add a VR16 reg class - use VLE/VSTE for reload/spill - Use generated vector constants. It wouldn't be too hard to add this, but not sure if it's relevant given all the slow conversion function calls. On the other hand, even if it's super-slow, maybe a noticeable speedup might have some value..? https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
https://github.com/JonPsson1 edited https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
https://github.com/uweigand commented: Not a full review, but some general comments inline. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
https://github.com/JonPsson1 edited https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -6285,6 +6465,16 @@ SDValue SystemZTargetLowering::LowerOperation(SDValue Op, return lowerAddrSpaceCast(Op, DAG); case ISD::ROTL: return lowerShift(Op, DAG, SystemZISD::VROTL_BY_SCALAR); + case ISD::FP_EXTEND: +//case ISD::STRICT_FP_EXTEND: uweigand wrote: We should support the strict variants here as well. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
https://github.com/uweigand edited https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -91,11 +91,28 @@ class LLVM_LIBRARY_VISIBILITY SystemZTargetInfo : public TargetInfo { "-v128:64-a:8:16-n32:64"); } MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 128; + +// True if the backend supports operations on the half LLVM IR type. +// By setting this to false, conversions will happen for _Float16 around +// a statement by default, with operations done in float. However, if +// -ffloat16-excess-precision=none is given, no conversions will be made +// and instead the backend will promote each half operation to float +// individually. +HasLegalHalfType = false; +// Allow half arguments and return values (__fp16). +HalfArgsAndReturns = true; uweigand wrote: I don't think we should support this, this looks like a ARM-only feature. It's false on Intel as well. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -6108,6 +6160,133 @@ static SDValue lowerAddrSpaceCast(SDValue Op, SelectionDAG &DAG) { return Op; } +SDValue SystemZTargetLowering::LowerFP_EXTEND(SDValue Op, + SelectionDAG &DAG) const { + bool IsStrict = Op->isStrictFPOpcode(); + SDValue In = Op.getOperand(IsStrict ? 1 : 0); + MVT VT = Op.getSimpleValueType(); + MVT SVT = In.getSimpleValueType(); + if (SVT != MVT::f16) +return Op; + + SDLoc DL(Op); + SDValue Chain = IsStrict ? Op.getOperand(0) : SDValue(); + + // Need a libcall. XXX factor out (below) uweigand wrote: Can't we just return SDValue() here so that common call falls back to Expand (which in this case generates a libcall)? https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -6108,6 +6160,133 @@ static SDValue lowerAddrSpaceCast(SDValue Op, SelectionDAG &DAG) { return Op; } +SDValue SystemZTargetLowering::LowerFP_EXTEND(SDValue Op, + SelectionDAG &DAG) const { + bool IsStrict = Op->isStrictFPOpcode(); + SDValue In = Op.getOperand(IsStrict ? 1 : 0); + MVT VT = Op.getSimpleValueType(); + MVT SVT = In.getSimpleValueType(); + if (SVT != MVT::f16) +return Op; + + SDLoc DL(Op); + SDValue Chain = IsStrict ? Op.getOperand(0) : SDValue(); + + // Need a libcall. XXX factor out (below) + TargetLowering::CallLoweringInfo CLI(DAG); + Chain = IsStrict ? Op.getOperand(0) : DAG.getEntryNode(); + TargetLowering::ArgListTy Args; + TargetLowering::ArgListEntry Entry; + Entry.Node = In; + Entry.Ty = EVT(SVT).getTypeForEVT(*DAG.getContext()); + Args.push_back(Entry); + SDValue Callee = DAG.getExternalSymbol( +getLibcallName(RTLIB::FPEXT_F16_F32), getPointerTy(DAG.getDataLayout())); + CLI.setDebugLoc(DL).setChain(Chain).setLibCallee( +CallingConv::C, EVT(MVT::f32).getTypeForEVT(*DAG.getContext()), Callee, +std::move(Args)); + SDValue Res; + std::tie(Res,Chain) = LowerCallTo(CLI); + if (IsStrict) +Res = DAG.getMergeValues({Res, Chain}, DL); + + return DAG.getNode(ISD::FP_EXTEND, DL, VT, Res); +} + +SDValue SystemZTargetLowering::LowerFP_ROUND(SDValue Op, + SelectionDAG &DAG) const { + bool IsStrict = Op->isStrictFPOpcode(); + SDValue In = Op.getOperand(IsStrict ? 1 : 0); + MVT VT = Op.getSimpleValueType(); + MVT SVT = In.getSimpleValueType(); + if (VT != MVT::f16) +return SDValue(); // XXX? + + SDLoc DL(Op); + SDValue Chain = IsStrict ? Op.getOperand(0) : SDValue(); + + if (SVT != MVT::f32) { +SDValue Rnd = DAG.getIntPtrConstant(0, DL, /*isTarget=*/true); +In = DAG.getNode(ISD::FP_ROUND, DL, MVT::f32, In, Rnd); + } uweigand wrote: The double rounding may yield an incorrect result. I think GCC just uses different libcalls for all source types. Another alternative might be to use the "round to prepare for shorter precision" mode for the first operation. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
@@ -185,6 +185,8 @@ bool SystemZABIInfo::isFPArgumentType(QualType Ty) const { if (const BuiltinType *BT = Ty->getAs()) switch (BT->getKind()) { +case BuiltinType::Half: // __fp16 uweigand wrote: Then I think we shouldn't have this either. https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
JonPsson1 wrote: Patch improved further: - Atomic memops handled. - Spill/reload Handled in loadRegFromStackSlot() and storeRegToStackSlot(). VRegs can be used here which makes it straightforward, but special sequences needed (without using VSTE/VLE). - __fp16: HalfArgsAndReturns=true => __fp16 arguments allowed. Tests added. - f16 vectors: Tests added. All seems to work. - strict fp: Again the question of conversion functions: IDS::STRICT_FP_ROUND/STRICT_FP_EXTEND needs to be lowered to something, but not sure if that requires special treatment, or if the same conversion functions can be used. Maybe wait with strict fp16? https://github.com/llvm/llvm-project/pull/109164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits