Re: [PATCH v2 0/7] Add LoongArch v1.1 instructions

WANG Xuerui Sat, 06 Dec 2025 10:41:17 -0800

Hi,

On 11/26/25 11:50, gaosong wrote:

[snip]
I ran this test with qemu on an x86 machine and on a LoongArch machine,
but the results are not the same.
On x86:
gaosong@fedora:/home1/gaosong/work/clean/qemu$ ./build/qemu-loongarch64 -cpu max test
  frecip: 0.333333
frecipe: 0.333333
frsqrt: 0.577350
frsqrte: 0.577350
SC.Q passed
On Loongson-3C6000/D:
[root@localhost gs]# ./test
frecip: 0.333333
frecipe: 0.333332
frsqrt: 0.577350
frsqrte: 0.577345
test: test.c:49: test_sc_q: Assertion `res == 0' failed.
Aborted (core dumped)
1. The results from frecipe/frsqrte differ from those on the physical machine. Is this due to precision issues? Should we align with the physical precision? Or can we disregard this discrepancy?
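
One way to quantify the discrepancy above is to compare each printed value against the exact result and check the relative error against a tolerance. The sketch below assumes a relative-error bound of 2^-14 for frecipe/frsqrte; that bound is not stated in this thread and should be checked against the LoongArch v1.1 manual. The helper names rel_err and within_tolerance are hypothetical.

```c
#include <math.h>

/* Relative error of an approximation against a reference value. */
static double rel_err(double approx, double exact)
{
    return fabs(approx - exact) / fabs(exact);
}

/*
 * Hypothetical helper: returns nonzero when the approximation is within
 * a relative error of 2^-14 of the exact value. The 2^-14 bound is an
 * assumption made for this sketch, not something taken from the thread.
 */
static int within_tolerance(double approx, double exact)
{
    return rel_err(approx, exact) < ldexp(1.0, -14);
}
```

Calling within_tolerance(0.333332, 1.0 / 3.0) and within_tolerance(0.577345, 1.0 / sqrt(3.0)) checks the two values printed on the Loongson-3C6000/D. Since the test prints only six decimal places, this is a coarse check on the printed output and not a bit-exact comparison.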

The problem is that Loongson never published the exact algorithm used in LA664 micro-architecture regarding frecipe/frsqrte. Of course it's plain impossible to match hardware behavior without the info. I remember trying the famous fast inverse square root algorithm in Quake III but the results didn't match frsqrte behavior, and I didn't investigate further.
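
For reference, the Quake III style fast inverse square root mentioned above can be sketched as follows. This is the widely circulated single-precision variant with one Newton-Raphson step, not the code that was actually tried against hardware and not the LA664 algorithm; the function name q3_rsqrt is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/*
 * Quake III style approximation of 1/sqrt(x) for positive, normal
 * single-precision x. Illustrative only: this is not the algorithm
 * used by LA664 hardware, which is unpublished.
 */
static float q3_rsqrt(float x)
{
    uint32_t bits;
    float y;

    /* Reinterpret the float as an integer without violating aliasing rules. */
    memcpy(&bits, &x, sizeof bits);

    /* Initial guess from the well-known magic constant. */
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);

    /* One Newton-Raphson refinement step for f(y) = 1/y^2 - x. */
    y = y * (1.5f - 0.5f * x * y * y);
    return y;
}
```

Printing q3_rsqrt(3.0f) with "%f" next to the frsqrte output from real hardware gives a quick way to see how far the two diverge. With a single refinement step the result is only accurate to roughly three significant digits, which is noticeably coarser than the hardware value shown in the quoted test output.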

I just googled again and found [1] though, where someone has figured out the operations of x86 RSQRTSS; I don't have time to test it against LoongArch myself, unfortunately, but anyone interested can have a try...

[1]: https://stackoverflow.com/questions/58614226/is-there-a-c-function-that-returns-exactly-the-value-of-the-built-in-cpu-opera
