Hi,

On 11/26/25 11:50, gaosong wrote:
[snip]
I ran this test under qemu on x86 and on a LoongArch machine,
but the results are not the same.
on x86
gaosong@fedora:/home1/gaosong/work/clean/qemu$ ./build/qemu-loongarch64 -cpu max test
  frecip: 0.333333
frecipe: 0.333333
frsqrt: 0.577350
frsqrte: 0.577350
SC.Q passed

on Loongson-3C6000/D
[root@localhost gs]# ./test
frecip: 0.333333
frecipe: 0.333332
frsqrt: 0.577350
frsqrte: 0.577345
test: test.c:49: test_sc_q: Assertion `res == 0' failed.
Aborted (core dumped)

1. The results from frecipe/frsqrte differ from those on the physical machine. Is this due to precision issues? Should we align with the precision of the physical hardware, or can we disregard this discrepancy?

The problem is that Loongson never published the exact algorithm that the LA664 micro-architecture uses for frecipe/frsqrte. Of course it's plainly impossible to match the hardware behavior without that information. I remember trying the famous fast inverse square root algorithm from Quake III, but the results didn't match the frsqrte behavior, and I didn't investigate further.
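
For reference, the Quake III trick I mean can be sketched roughly as below. This only illustrates that classic bit-hack estimate; it is not the LA664 algorithm, which remains unknown. The function name q3_rsqrt and the newton_steps parameter are made up for this sketch.

```c
#include <stdint.h>
#include <string.h>

/*
 * Classic "fast inverse square root" estimate in the style of Quake III.
 * NOT the algorithm used by LA664 frsqrte (that one is unpublished);
 * q3_rsqrt and newton_steps are hypothetical names for illustration.
 * Only meaningful for positive, normal, finite x.
 */
float q3_rsqrt(float x, int newton_steps)
{
    uint32_t bits;
    float y;

    /* Reinterpret the float as an integer without violating aliasing rules. */
    memcpy(&bits, &x, sizeof bits);

    /* Magic-constant initial guess for 1/sqrt(x). */
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);

    /* Each Newton-Raphson step refines the estimate. */
    for (int i = 0; i < newton_steps; i++)
        y = y * (1.5f - 0.5f * x * y * y);

    return y;
}
```

Calling q3_rsqrt(3.0f, 1) gives a rough approximation of 1/sqrt(3), which can then be compared against what frsqrte returns on real hardware for the same input.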

I just googled again and found [1], though, where someone has figured out the operations of x86 RSQRTSS. Unfortunately I don't have time to test it against LoongArch myself, but anyone interested can give it a try...

[1]: https://stackoverflow.com/questions/58614226/is-there-a-c-function-that-returns-exactly-the-value-of-the-built-in-cpu-opera
