On Tue, 23 May 2023, Richard Henderson wrote:
On 5/23/23 06:57, BALATON Zoltan wrote:
This solves the softfloat-related usages. The rest probably have lower overhead, as I could not measure any further improvement from removing asserts on top of this patch. I still have these functions high in my profiling results:

children  self    command          symbol
11.40%    10.86%  qemu-system-ppc  helper_compute_fprf_float64

You might need to dig in with perf here, but my first guess is:

#define COMPUTE_CLASS(tp)                                      \
static int tp##_classify(tp arg)                               \
{                                                              \
   int ret = tp##_is_neg(arg) * is_neg;                       \
   if (unlikely(tp##_is_any_nan(arg))) {                      \
       float_status dummy = { };  /* snan_bit_is_one = 0 */   \
       ret |= (tp##_is_signaling_nan(arg, &dummy)             \
               ? is_snan : is_qnan);                          \
   } else if (unlikely(tp##_is_infinity(arg))) {              \
       ret |= is_inf;                                         \
   } else if (tp##_is_zero(arg)) {                            \
       ret |= is_zero;                                        \
   } else if (tp##_is_zero_or_denormal(arg)) {                \
       ret |= is_denormal;                                    \
   } else {                                                   \
       ret |= is_normal;                                      \
   }                                                          \
   return ret;                                                \
}

The tests are poorly ordered, testing many unlikely things before the most likely thing (normal). A better ordering would be

   if (likely(tp##_is_normal(arg))) {
   } else if (tp##_is_zero(arg)) {
   } else if (tp##_is_zero_or_denormal(arg)) {
   } else if (tp##_is_infinity(arg)) {
   } else {
       // nan case
   }

Secondly, we compute the classify bitmask, and then deconstruct the mask again in set_fprf_from_class. Since we don't use the classify bitmask for anything else, better would be to compute the fprf value directly in the if-ladder.
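
To make the two suggestions above concrete, here is a minimal standalone sketch, not the actual QEMU patch. It tests the normal case first and returns the FPRF value straight from the if-ladder, without an intermediate class bitmask. The names sketch_likely, sketch_fprf_float64 and sketch_fprf_from_double are hypothetical and exist only for this illustration. The FPRF encodings (C, FL, FG, FE, FU) are the ones I recall from the Power ISA result-flag table and should be checked against it. Returning 0x00 for a signaling NaN is an assumption, since the flags are undefined in that case. The NaN test assumes snan_bit_is_one = 0, as in the macro quoted above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's likely(); falls back to a no-op. */
#if defined(__GNUC__)
#define sketch_likely(x) __builtin_expect(!!(x), 1)
#else
#define sketch_likely(x) (x)
#endif

/*
 * Compute the 5-bit FPRF value (C, FL, FG, FE, FU) for the raw bits of
 * an IEEE 754 binary64 value. The most likely case (normal) is tested
 * first, and each branch returns the final FPRF encoding directly.
 */
static inline uint8_t sketch_fprf_float64(uint64_t bits)
{
    unsigned neg = (unsigned)(bits >> 63);
    unsigned exp = (unsigned)((bits >> 52) & 0x7ff);
    uint64_t frac = bits & ((UINT64_C(1) << 52) - 1);

    if (sketch_likely(exp != 0 && exp != 0x7ff)) {
        return neg ? 0x08 : 0x04;       /* normalized */
    } else if (exp == 0) {
        if (frac == 0) {
            return neg ? 0x12 : 0x02;   /* zero */
        }
        return neg ? 0x18 : 0x14;       /* denormalized */
    } else if (frac == 0) {
        return neg ? 0x09 : 0x05;       /* infinity */
    } else {
        /*
         * NaN: with snan_bit_is_one = 0, a set top fraction bit means
         * quiet. 0x00 for a signaling NaN is an assumption (undefined).
         */
        return (frac >> 51) ? 0x11 : 0x00;
    }
}

/* Convenience wrapper for host doubles; memcpy avoids aliasing issues. */
static inline uint8_t sketch_fprf_from_double(double d)
{
    uint64_t bits;

    memcpy(&bits, &d, sizeof(bits));
    return sketch_fprf_float64(bits);
}
```

In the real helper the result would be written into the FPRF field of FPSCR rather than returned, and the same ladder would be generated per float type by the existing macro.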

Thanks for the guidance. Alex, will you make a patch of this too, or should I try to figure out how to do that? I'm not sure I understood everything above, but I have only read it once.

Regards,
BALATON Zoltan

11.25%     0.61%  qemu-system-ppc  helper_fmadds

This is unsurprising, and there is not much that can be done.
All of the work is in muladd doing the arithmetic.

Unrelated to this patch, I have also started to see random crashes with a DSI on a dcbz instruction, which did not happen before (or not frequently enough for me to notice). I did not bisect this because it happens randomly, but I wonder whether it could be related to the recent unaligned access changes or some other TCG change. Any idea what to check?

No idea.


r~
