Hi,

While converting a single-precision float value to half precision for ARM, the code seems to generate incorrect values in some scenarios:
"inline uint32_t perform_round16(iss_info *iss, uint32_t sign, int16_t exp, uint32_t frac, FPRounding rounding)" [Case 1] 1. From ARM specs overflow_to_inf is true and result is an overflow condition. if N != 16 || fpcr.AHP == '0' then // Single, double or IEEE half precision if biased_exp >= 2^E - 1 then result = if overflow_to_inf then FPInfinity(sign) else FPMaxNormal(sign); FPProcessException(FPExc_Overflow, fpcr); error = 1.0; // Ensure that an Inexact exception occurs In qemu, we always return the value as : >> return packFloat16(zSign, 0x1f, 0); In case overflow_to_inf is false we need to return FPMaxNormal which is : >> return float_num16(sign, 0x1e, 0x3ff); [Case 2] 1. From ARM specs : if round_up then int_mant = int_mant + 1; if int_mant == 2^F then // Rounded up from denormalized to normalized biased_exp = 1; if int_mant == 2^(F+1) then // Rounded up to next exponent biased_exp = biased_exp + 1; int_mant = int_mant DIV 2; result = sign : biased_exp<N-F-2:0> : int_mant<F-1:0>; [QEMU] if (exp < -10) { return float_num16(sign, 0, 0); } The incremented round up value seems to be lost in this scenario. Kindly, let me know in case more data points are required. Thanks, Gaurav