Hi,
On 11/26/25 11:50, gaosong wrote:
[snip]
I ran this test with QEMU on an x86 machine and on a LoongArch machine,
but the results are not the same.
on x86
gaosong@fedora:/home1/gaosong/work/clean/qemu$ ./build/qemu-loongarch64 -cpu max test
frecip: 0.333333
frecipe: 0.333333
frsqrt: 0.577350
frsqrte: 0.577350
SC.Q passed
on Loongson-3C6000/D
[root@localhost gs]# ./test
frecip: 0.333333
frecipe: 0.333332
frsqrt: 0.577350
frsqrte: 0.577345
test: test.c:49: test_sc_q: Assertion `res == 0' failed.
Aborted (core dumped)
1. The results from frecipe/frsqrte differ from those on the physical
machine. Is this due to precision issues?
Should we match the physical machine's precision? Or can we disregard
this discrepancy?
The problem is that Loongson never published the exact algorithm that the
LA664 micro-architecture uses for frecipe/frsqrte. Of course it's simply
impossible to match hardware behavior without that info. I remember
trying the famous fast inverse square root algorithm from Quake III, but
the results didn't match frsqrte behavior, and I didn't investigate further.
I just googled again and found [1] though, where someone has figured out
the operations of x86 RSQRTSS; I don't have time to test it against
LoongArch myself, unfortunately, but anyone interested can have a try...
[1]:
https://stackoverflow.com/questions/58614226/is-there-a-c-function-that-returns-exactly-the-value-of-the-built-in-cpu-opera