Re: [Qemu-devel] [PATCH v4 13/22] fpu/softfloat: re-factor mul

2018-02-19 Thread Alex Bennée

Peter Maydell  writes:

> On 6 February 2018 at 16:48, Alex Bennée  wrote:
>> We can now add float16_mul and use the common decompose and
>> canonicalize functions to have a single implementation for
>> float16/32/64 versions.
>>
>> Signed-off-by: Alex Bennée 
>> Signed-off-by: Richard Henderson 
>>
>> ---
>> v3
>
>> +/*
>> + * Returns the result of multiplying the floating-point values `a' and
>> + * `b'. The operation is performed according to the IEC/IEEE Standard
>> + * for Binary Floating-Point Arithmetic.
>> + */
>> +
>> +static FloatParts mul_floats(FloatParts a, FloatParts b, float_status *s)
>> +{
>> +    bool sign = a.sign ^ b.sign;
>>
>> +    if (a.cls == float_class_normal && b.cls == float_class_normal) {
>> +        uint64_t hi, lo;
>> +        int exp = a.exp + b.exp;
>>
>> +        mul64To128(a.frac, b.frac, &hi, &lo);
>
> It seems a shame that we previously were able to use a
> 32x32->64 multiply for the float32 case, and now we have to
> do an expensive 64x64->128 multiply regardless...

Actually for mul the hit isn't too bad. When we do a div, however, you
do notice a bit of a gulf:

 https://i.imgur.com/KMWceo8.png

We could start passing &floatN_params to the functions, much like the
sqrt function does, be a bit smarter about how we do our multiply, and
let the compiler figure it out as we go.
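
As a rough sketch of that idea (the parameter structure and names here
are invented for illustration, not the real softfloat API), the fraction
multiply could pick its width from the per-format parameters and only
fall back to the wide helper when it has to:

  /* Hypothetical per-format parameters, passed in as with the sqrt helper. */
  typedef struct {
      int frac_bits;    /* significant fraction bits for the format */
  } fmt_params;

  static void frac_mul(uint64_t a, uint64_t b, const fmt_params *p,
                       uint64_t *hi, uint64_t *lo)
  {
      if (p->frac_bits <= 32 && !(a >> 32) && !(b >> 32)) {
          /* float16/float32: both fractions fit in 32 bits, so a single
           * 32x32->64 multiply is enough and the high word stays zero. */
          *lo = (uint64_t)(uint32_t)a * (uint32_t)b;
          *hi = 0;
      } else {
          /* float64 and wider: the existing 64x64->128 helper from
           * fpu/softfloat-macros.h. */
          mul64To128(a, b, hi, lo);
      }
  }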

Another avenue worth exploring is ensuring we use native Int128 support
where we can, so these wide operations can use wide registers where
available.
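
A minimal sketch of what that looks like, using the compiler's native
128-bit type directly (plain __uint128_t here rather than any particular
QEMU wrapper) with a portable fallback:

  #include <stdint.h>

  /* 64x64->128 multiply: use a native 128-bit type when the compiler
   * provides one, otherwise fall back to a split schoolbook multiply. */
  static inline void mul64to128_sketch(uint64_t a, uint64_t b,
                                       uint64_t *hi, uint64_t *lo)
  {
  #ifdef __SIZEOF_INT128__
      __uint128_t r = (__uint128_t)a * b;
      *hi = (uint64_t)(r >> 64);
      *lo = (uint64_t)r;
  #else
      uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
      uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
      uint64_t p0 = a_lo * b_lo;
      uint64_t mid = a_lo * b_hi + (p0 >> 32);   /* cannot overflow */
      uint64_t p2 = a_hi * b_lo;
      uint64_t carry;

      mid += p2;
      carry = mid < p2;                          /* single possible carry */
      *lo = (mid << 32) | (uint32_t)p0;
      *hi = a_hi * b_hi + (mid >> 32) + (carry << 32);
  #endif
  }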

However, both of these are future optimisations, given the cost doesn't
show up in dbt-bench timings.

>
> Regardless
> Reviewed-by: Peter Maydell 
>
> thanks
> -- PMM


--
Alex Bennée



Re: [Qemu-devel] [PATCH v4 13/22] fpu/softfloat: re-factor mul

2018-02-13 Thread Richard Henderson
On 02/13/2018 07:20 AM, Peter Maydell wrote:
>> +static FloatParts mul_floats(FloatParts a, FloatParts b, float_status *s)
>> +{
>> +    bool sign = a.sign ^ b.sign;
>>
>> +    if (a.cls == float_class_normal && b.cls == float_class_normal) {
>> +        uint64_t hi, lo;
>> +        int exp = a.exp + b.exp;
>>
>> +        mul64To128(a.frac, b.frac, &hi, &lo);
> 
> It seems a shame that we previously were able to use a
> 32x32->64 multiply for the float32 case, and now we have to
> do an expensive 64x64->128 multiply regardless...

To be fair, I've proposed two different solutions addressing that -- C++
templates and glibc macros -- and you like neither.  Is there a third
alternative that does not involve code duplication?
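
For reference, the macro approach amounts to stamping out one helper per
width from a single body, roughly along these lines (a sketch with
invented names, not the actual proposal; the wide type for the 64-bit
case assumes a compiler that provides __uint128_t):

  /* Generate a width-specific fraction multiply from one template body. */
  #define GEN_FRAC_MUL(NAME, FRAC_T, WIDE_T)                          \
      static inline void NAME(FRAC_T a, FRAC_T b,                     \
                              FRAC_T *hi, FRAC_T *lo)                 \
      {                                                               \
          WIDE_T r = (WIDE_T)a * b;                                   \
          *hi = (FRAC_T)(r >> (8 * sizeof(FRAC_T)));                  \
          *lo = (FRAC_T)r;                                            \
      }

  GEN_FRAC_MUL(frac_mul32, uint32_t, uint64_t)     /* 32x32->64  */
  GEN_FRAC_MUL(frac_mul64, uint64_t, __uint128_t)  /* 64x64->128 */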


r~



Re: [Qemu-devel] [PATCH v4 13/22] fpu/softfloat: re-factor mul

2018-02-13 Thread Peter Maydell
On 6 February 2018 at 16:48, Alex Bennée  wrote:
> We can now add float16_mul and use the common decompose and
> canonicalize functions to have a single implementation for
> float16/32/64 versions.
>
> Signed-off-by: Alex Bennée 
> Signed-off-by: Richard Henderson 
>
> ---
> v3

> +/*
> + * Returns the result of multiplying the floating-point values `a' and
> + * `b'. The operation is performed according to the IEC/IEEE Standard
> + * for Binary Floating-Point Arithmetic.
> + */
> +
> +static FloatParts mul_floats(FloatParts a, FloatParts b, float_status *s)
> +{
> +    bool sign = a.sign ^ b.sign;
> +
> +    if (a.cls == float_class_normal && b.cls == float_class_normal) {
> +        uint64_t hi, lo;
> +        int exp = a.exp + b.exp;
> +
> +        mul64To128(a.frac, b.frac, &hi, &lo);

It seems a shame that we previously were able to use a
32x32->64 multiply for the float32 case, and now we have to
do an expensive 64x64->128 multiply regardless...
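
For context: once the implicit bit is included a float32 significand is
at most 24 bits, so the product of two of them is at most 48 bits and a
single narrow multiply covers it. A toy sketch of that path (not the
previous code):

  /* Both float32 significands fit in 32 bits: one 32x32->64 multiply
   * yields the full, at most 48-bit, product. */
  static inline uint64_t mul_sig32(uint32_t a_sig, uint32_t b_sig)
  {
      return (uint64_t)a_sig * b_sig;
  }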

Regardless
Reviewed-by: Peter Maydell 

thanks
-- PMM