[Bug libgcc/66382] POWER8 Vector optimized implementation of __float128 (IEEE754 128-bit Binary Floating Point)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66382

--- Comment #7 from Michael Meissner ---
I do not think it is worthwhile to expand the IEEE 128-bit software emulation routines at the point of call in ISA 2.07 (power8). This is due to the fact that a lot of processing goes on in the emulation library.

Now if people were motivated, we could replace the soft-fp functions with tuned versions written for the Power8 ISA. However, I don't think it is going to be an easy task. I'll list the things that come to mind first. They might be able to be done, but it is a time/effort calculation of whether the return on investment is worth it.

The first issue is the next ISA (3.0) has support in it already for doing IEEE 128-bit floating point in hardware, including supporting the various rounding modes, etc. If you configure and build gcc with an assembler that supports the ISA 3.0 instruction set, it will add in IFUNC support so that when a program is run on ISA 3.0 hardware, it will automatically use a version of __addkf3 that uses the xsaddqp instruction instead of doing the emulation. So this effort would only be for the current generation of hardware.

The next issue is right now you cannot do 64-bit scalar int arithmetic in the VSX unit. At present, the compiler does not allow DImode into the Altivec registers, but the only support for 64-bit integer arithmetic uses an Altivec encoding and only uses Altivec registers (vaddudm, etc.). I am working on patches to allow DImode variables in Altivec registers (and later SImode, HImode, QImode). My first run with the patch shows 1 benchmark 2% faster (perlbench) and one 2% slower (omnetpp), but I feel it needs a lot more tuning. At the moment, that work is lower priority, as I am trying to make __float128 _Complex work as my highest priority (obviously other people could take up this work).
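The IFUNC selection described above can be pictured with a small, portable sketch. This is not the libgcc code: the names add_emulated, add_hardware, cpu_has_ieee128_hw, resolve_add and dispatch_add are all hypothetical, the two variants just add doubles so the example runs anywhere, and an ordinary cached function pointer stands in for the resolution that the dynamic linker performs for a real IFUNC symbol.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: in libgcc the two variants would be the soft-fp
   emulation and a version built around the ISA 3.0 xsaddqp instruction.
   Here both simply add doubles so the sketch runs on any machine. */
typedef double (*add_fn)(double, double);

static double add_emulated(double a, double b) { return a + b; }
static double add_hardware(double a, double b) { return a + b; }

/* Hypothetical capability flag; a real resolver would look at the
   hardware capability bits provided by the system. */
static bool cpu_has_ieee128_hw = false;

/* Resolver: decides which implementation this machine should use. */
static add_fn resolve_add(void)
{
    return cpu_has_ieee128_hw ? add_hardware : add_emulated;
}

/* Callers use one entry point; the choice is made once and cached. */
double dispatch_add(double a, double b)
{
    static add_fn impl = NULL;
    if (impl == NULL)
        impl = resolve_add();
    return impl(a, b);
}
```

The point of the design is that callers keep calling a single symbol, so a binary built today picks up the hardware path on newer machines without being recompiled.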
After allowing DImode into Altivec registers, and perhaps doing 64-bit arithmetic via the 64-bit integer vector instructions, another issue is that the cycle time of the vector unit is 1/2 that of the GPR unit, so it will need a lot of tuning.

I don't think the ISA 2.07 instruction set is general enough to do the inserts and extracts of 128-bit values that you would need for packing and unpacking the IEEE 128-bit floating point values. ISA 3.0 has all of this support, including specialized instructions to extract/set the exponent or mantissa, but then it also has the hardware support for IEEE 128-bit floating point.

Loads and stores are also problematic in ISA 2.07, given there is no d-form addressing for Altivec registers. So if you spill a DImode value in an Altivec register, you need to load up the offset in a GPR to do the memory operation.
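To make the packing and unpacking step concrete, here is a minimal sketch in plain C of what has to be extracted from an IEEE 754 binary128 value: 1 sign bit, a 15-bit biased exponent and a 112-bit fraction. It assumes the value has already been split into two 64-bit halves (hi and lo); the struct and function names are hypothetical and this is not the soft-fp implementation.

```c
#include <stdint.h>

/* Fields of an IEEE 754 binary128 value (hypothetical helper type). */
struct ieee128_fields {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 15 bits, biased by 16383 */
    uint64_t frac_hi;   /* upper 48 bits of the 112-bit fraction */
    uint64_t frac_lo;   /* lower 64 bits of the 112-bit fraction */
};

#define FRAC_HI_MASK 0x0000FFFFFFFFFFFFULL

/* Split the two 64-bit halves into sign, exponent and fraction. */
static struct ieee128_fields unpack128(uint64_t hi, uint64_t lo)
{
    struct ieee128_fields f;
    f.sign = (unsigned)(hi >> 63);
    f.exponent = (unsigned)((hi >> 48) & 0x7FFF);
    f.frac_hi = hi & FRAC_HI_MASK;
    f.frac_lo = lo;
    return f;
}

/* Rebuild the high half; the low half is frac_lo unchanged. */
static uint64_t pack128_hi(struct ieee128_fields f)
{
    return ((uint64_t)f.sign << 63)
         | ((uint64_t)(f.exponent & 0x7FFF) << 48)
         | (f.frac_hi & FRAC_HI_MASK);
}
```

On GPRs these are a few shifts and masks; the difficulty raised above is doing the equivalent on a value that lives in a vector register.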
[Bug libgcc/66382] POWER8 Vector optimized implementation of __float128 (IEEE754 128-bit Binary Floating Point)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66382

Segher Boessenkool changed:

  What |Removed |Added
  CC   |        |segher at gcc dot gnu.org

--- Comment #6 from Segher Boessenkool ---
Given the following program:

  __float128 f(__float128 a, __float128 b) { return a*b; }

When compiled with:

  gcc -Wall -W -O2 mulf.c -mfloat128 -mcpu=power8

you currently get (boring stuff cut out):

        mflr 0
        std 0,16(1)
        stdu 1,-32(1)
        bl __mulkf3
        nop
        addi 1,1,32
        ld 0,16(1)
        mtlr 0
        blr

The task is to either optimise __mulkf3 to use vector math, or even to expand it inline where that makes sense (this may more often make sense with -ffast-math).
[Bug libgcc/66382] POWER8 Vector optimized implementation of __float128 (IEEE754 128-bit Binary Floating Point)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66382

Steven Munroe changed:

  What |Removed |Added
  CC   |        |meissner at gcc dot gnu.org

--- Comment #5 from Steven Munroe ---
With GCC 6.0 we have basic support for the __float128 type, and libgcc has a soft-fp implementation for __float128. This uses vector registers for __float128 parameter passing, but is implemented with 64-bit integers internally. This is functional but not optimized.

Mike Meissner can help you with the gcc configurations.
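As a rough illustration of what "64-bit integers internally" means for wide significand arithmetic, the sketch below adds two 128-bit quantities held as pairs of 64-bit words, propagating the carry by hand. The struct u128 type and add_u128 function are hypothetical names chosen for this example; the actual soft-fp macros are more general and are not reproduced here.

```c
#include <stdint.h>

/* A 128-bit unsigned value as two 64-bit words (hypothetical type). */
struct u128 {
    uint64_t hi;
    uint64_t lo;
};

/* Add two 128-bit values using only 64-bit integer operations. */
static struct u128 add_u128(struct u128 a, struct u128 b)
{
    struct u128 r;
    r.lo = a.lo + b.lo;
    /* Unsigned wraparound means a carry occurred iff the sum is
       smaller than either operand. */
    r.hi = a.hi + b.hi + (r.lo < a.lo);
    return r;
}
```

Every significand add, subtract, shift and multiply in the emulation is built from multi-word steps of this kind, which is why the routines are correct but comparatively slow.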
[Bug libgcc/66382] POWER8 Vector optimized implementation of __float128 (IEEE754 128-bit Binary Floating Point)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66382

--- Comment #4 from joseph at codesourcery dot com ---
On Mon, 14 Mar 2016, dan.parrot at mail dot com wrote:

> However, I am still unable to get gcc to compile a very simple program
> when passed the -msoft-float option. Here is the program (test.c)

For 32-bit, -msoft-float is a distinct ABI from -mhard-float, and requires its own copies of libgcc, libc, etc.; you can't just build and run a soft-float program on a hard-float system without building all those separate libraries and arranging for the program to find them at link time and run time.

For 64-bit, -msoft-float is not supported, and hard-float is always available. (__float128 might use software or hardware floating point - software before POWER9, hardware with POWER9 - but that's controlled by other options.)
[Bug libgcc/66382] POWER8 Vector optimized implementation of __float128 (IEEE754 128-bit Binary Floating Point)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66382

--- Comment #3 from dan.parrot at mail dot com ---
I am trying to configure gcc so that instead of generating instructions that use the hardware floating point unit, it will generate instructions that utilize integer operations to emulate floating point operations.

The description of the bug by David Edelsohn on Bugzilla implies that such emulation is currently available and would be made more efficient if integer operations are replaced by vector operations. It is this replacement that I'm trying to effect.

However, I am still unable to get gcc to compile a very simple program when passed the -msoft-float option. Here is the program (test.c)

==
#include
#include
#include

int main()
{
    printf("\n No. of bytes in a long double is %d.\n", sizeof(long double));

    long double x = 5.55L;
    long double y = -5.56L;
    long double z = x + y;
    long double w = x - y;

    printf("\n Sum : \n x = %Lf \n y = %Lf \n z = %Lf \n", x, y, z);
    printf("\n Diff. : \n x = %Lf \n y = %Lf \n w = %llf \n", x, y, w);

    return (int)(z + w);
}
===

Without the -msoft-float flag to gcc, it prints the expected result. With the -msoft-float flag, linking fails.

My question is this: Can you successfully compile the simple program above in any version of gcc? If yes, what is the output of "gcc -v"? If not, does it mean there is in fact no software emulation available?

Thanks.
Dan.

On 03/14/2016 03:43 PM, munroesj at us dot ibm.com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66382
>
> --- Comment #2 from Steven Munroe ---
> What is the issue? You want to configure __float128 without also configuring
> altivec/VMX/VSX?
>
> The PowerPC 64-bit ABI is defined to pass __float128 values in 128-bit vector
> registers and return _float128 values in VR2.
>
> How parameters are passed is independent of the computation.
>
[Bug libgcc/66382] POWER8 Vector optimized implementation of __float128 (IEEE754 128-bit Binary Floating Point)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66382

--- Comment #2 from Steven Munroe ---
What is the issue? You want to configure __float128 without also configuring altivec/VMX/VSX?

The PowerPC 64-bit ABI is defined to pass __float128 values in 128-bit vector registers and return __float128 values in VR2.

How parameters are passed is independent of the computation.
[Bug libgcc/66382] POWER8 Vector optimized implementation of __float128 (IEEE754 128-bit Binary Floating Point)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66382

dan.parrot at mail dot com changed:

  What |Removed |Added
  CC   |        |dan.parrot at mail dot com

--- Comment #1 from dan.parrot at mail dot com ---
Could you provide the options to ./configure which build gcc while allowing flag -msoft-float and type __float128 to co-exist? Or is changing the code to allow that combination part of the task here?

I have been unable to configure a build that accepts them (-msoft-float and __float128) together.