Peter Maydell <peter.mayd...@linaro.org> writes:

> On 6 February 2018 at 16:48, Alex Bennée <alex.ben...@linaro.org> wrote:
>> We can now add float16_mul and use the common decompose and
>> canonicalize functions to have a single implementation for
>> float16/32/64 versions.
>>
>> Signed-off-by: Alex Bennée <alex.ben...@linaro.org>
>> Signed-off-by: Richard Henderson <richard.hender...@linaro.org>
>>
>> ---
>> v3
>
>> +/*
>> + * Returns the result of multiplying the floating-point values `a' and
>> + * `b'. The operation is performed according to the IEC/IEEE Standard
>> + * for Binary Floating-Point Arithmetic.
>> + */
>> +
>> +static FloatParts mul_floats(FloatParts a, FloatParts b, float_status *s)
>> +{
>> +    bool sign = a.sign ^ b.sign;
>> +
>> +    if (a.cls == float_class_normal && b.cls == float_class_normal) {
>> +        uint64_t hi, lo;
>> +        int exp = a.exp + b.exp;
>> +
>> +        mul64To128(a.frac, b.frac, &hi, &lo);
>
> It seems a shame that we previously were able to use a
> 32x32->64 multiply for the float32 case, and now we have to
> do an expensive 64x64->128 multiply regardless...

Actually for mul the hit isn't too bad. When we do a div, however, you
do notice a bit of a gulf: https://i.imgur.com/KMWceo8.png

We could start passing &floatN_params to the functions, much like the
sqrt function does, and be a bit smarter about how we do our multiply,
letting the compiler figure it out as we go. Another avenue worth
exploring is ensuring we use native Int128 support where we can, so
these wide operations can use wide registers where available. However,
both of these are left as future optimisations, given the cost doesn't
show up in dbt-bench timings.

>
> Regardless
> Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>
>
> thanks
> -- PMM

--
Alex Bennée