On 2025/11/19 at 8:24 PM, Jiajie Chen wrote:
Latest revision of LoongArch ISA is out at
https://www.loongson.cn/uploads/images/2023102309132647981.%E9%BE%99%E8%8A%AF%E6%9E%B6%E6%9E%84%E5%8F%82%E8%80%83%E6%89%8B%E5%86%8C%E5%8D%B7%E4%B8%80_r1p10.pdf
(Chinese only). The revision includes the following updates:
- estimated fp reciprocal instructions: frecip -> frecipe, frsqrt -> frsqrte
- 128-bit width store-conditional instruction: sc.q
- ll.w/d with acquire semantics: llacq.w/d, sc.w/d with release semantics:
screl.w/d
- compare and swap instructions: amcas[_db].b/w/h/d
- byte- and halfword-wide amswap/add instructions: am{swap/add}[_db].{b/h}
- new definition for dbar hints
- clarify 32-bit division instruction behavior
- clarify load ordering when accessing the same address
- introduce message signaled interrupt
- introduce hardware page table walker
The new revision is implemented in the Loongson 3A6000 processor.
This patch series implements all the new instructions. The v1 version
can be found at
https://patchew.org/QEMU/[email protected]/.
A simple testcase to test the new fp and sc.q instructions:
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
void test_fp() {
  float a = 3.0;
  float b;
  asm volatile("frecip.s %0, %1" : "=f"(b) : "f"(a));
  printf("frecip: %f\n", b);
  asm volatile("frecipe.s %0, %1" : "=f"(b) : "f"(a));
  printf("frecipe: %f\n", b);
  asm volatile("frsqrt.s %0, %1" : "=f"(b) : "f"(a));
  printf("frsqrt: %f\n", b);
  asm volatile("frsqrte.s %0, %1" : "=f"(b) : "f"(a));
  printf("frsqrte: %f\n", b);
}
uint64_t rand64() { return ((uint64_t)rand() << 32) | rand(); }
void test_sc_q() {
  __int128 val = rand64();
  val = (val << 64) | rand64();
  __int128 *ptr = &val;
  uint64_t add_lo = rand64();
  uint64_t add_hi = rand64();
  __int128 add = add_hi;
  add = (add << 64) | add_lo;
  __int128 expect = val + add;
  int res = 0;
  asm volatile("ll.d $t1, %1, 0\nld.d $t2, %1, 8\nadd.d $t1, $t1, %2\nadd.d "
               "$t2, $t2, %3\nsc.q $t1, $t2, %1\nmove %0, $t1"
               : "=r"(res), "+r"(ptr)
               : "r"(add_lo), "r"(add_hi)
               : "$t1", "$t2", "memory");
  assert(res == 1);
  assert(val == expect);
  // change memory content to make sc fail
  res = 1;
  asm volatile("ll.d $t1, %1, 0\nld.d $t2, %1, 8\naddi.d $t1, $t1, 1\nst.d "
               "$t1, %1, 0\nsc.q $t1, $t2, %1\nmove %0, $t1"
               : "=r"(res), "+r"(ptr)
               :
               : "$t1", "$t2", "memory");
  assert(res == 0);
  res = 1;
  asm volatile("ll.d $t1, %1, 0\nld.d $t2, %1, 8\naddi.d $t2, $t2, 1\nst.d "
               "$t2, %1, 8\nsc.q $t1, $t2, %1\nmove %0, $t1"
               : "=r"(res), "+r"(ptr)
               :
               : "$t1", "$t2", "memory");
  assert(res == 0);
  printf("SC.Q passed\n");
}
int main(int argc, char *argv[]) {
  test_fp();
  test_sc_q();
  return 0;
}
Compile and test by:
loongarch64-linux-gnu-gcc test.c -o test -static && ./qemu-loongarch64 -cpu max test
Hi,
I ran this test with QEMU on an x86 host and natively on a LoongArch machine,
but the results are not the same.
On x86:
gaosong@fedora:/home1/gaosong/work/clean/qemu$ ./build/qemu-loongarch64 -cpu max test
frecip: 0.333333
frecipe: 0.333333
frsqrt: 0.577350
frsqrte: 0.577350
SC.Q passed
On Loongson-3C6000/D:
[root@localhost gs]# ./test
frecip: 0.333333
frecipe: 0.333332
frsqrt: 0.577350
frsqrte: 0.577345
test: test.c:49: test_sc_q: Assertion `res == 0' failed.
Aborted (core dumped)
1. The frecipe/frsqrte results under QEMU differ from those on the physical
machine. Is this due to precision issues? Should QEMU match the precision of
the hardware, or can we disregard this discrepancy?
2. sc.q: I haven't identified the issue yet.
Thanks.
Song Gao
Jiajie Chen (7):
  target/loongarch: Require atomics to be aligned
  target/loongarch: Add am{swap/add}[_db].{b/h}
  target/loongarch: Add amcas[_db].{b/h/w/d}
  target/loongarch: Add estimated reciprocal instructions
  target/loongarch: Add llacq/screl instructions
  target/loongarch: Add sc.q instructions
  target/loongarch: Add LA v1.1 instructions to max cpu
 target/loongarch/cpu.c                        |  11 +-
 target/loongarch/cpu.h                        |   7 +
 target/loongarch/disas.c                      |  33 ++++
 target/loongarch/insns.decode                 |  34 ++++
 .../tcg/insn_trans/trans_atomic.c.inc         | 145 ++++++++++++++++--
 .../tcg/insn_trans/trans_farith.c.inc         |   4 +
 .../tcg/insn_trans/trans_memory.c.inc         |  22 +++
 .../loongarch/tcg/insn_trans/trans_vec.c.inc  |   8 +
 target/loongarch/tcg/translate.c              |   6 +-
 target/loongarch/translate.h                  |  30 ++--
 10 files changed, 280 insertions(+), 20 deletions(-)