Examining the current implementation for getting/setting SIMD and SVE
registers via remote GDB revealed a problem with mixed-endian support.
This patch series addresses that problem, allowing the values of NEON
and SVE registers to be read and written via remote GDB regardless of
the target endianness.
Consider the following snippet from a GDB session in which a SIMD
register's value is set via remote GDB, where the QEMU host is little
endian and the target is big endian:

(gdb) p/x $v0
$1 = {d = {f = {0x0, 0x0}, u = {0x0, 0x0}, s = {0x0, 0x0}},
  s = {f = {0x0, 0x0, 0x0, 0x0}, u = {0x0, 0x0, 0x0, 0x0},
    s = {0x0, 0x0, 0x0, 0x0}},
  h = {bf = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
    f = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
    u = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
    s = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}},
  b = {u = {0x0 <repeats 16 times>}, s = {0x0 <repeats 16 times>}},
  q = {u = {0x0}, s = {0x0}}}
(gdb) set $v0.d.u[0] = 0x010203
(gdb) p/x $v0
$2 = {d = {f = {0x302010000000000, 0x0}, u = {0x302010000000000, 0x0},
    s = {0x302010000000000, 0x0}},
  s = {f = {0x3020100, 0x0, 0x0, 0x0}, u = {0x3020100, 0x0, 0x0, 0x0},
    s = {0x3020100, 0x0, 0x0, 0x0}},
  h = {bf = {0x302, 0x100, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
    f = {0x302, 0x100, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
    u = {0x302, 0x100, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
    s = {0x302, 0x100, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}},
  b = {u = {0x3, 0x2, 0x1, 0x0 <repeats 13 times>},
    s = {0x3, 0x2, 0x1, 0x0 <repeats 13 times>}},
  q = {u = {0x3020100000000000000000000000000},
    s = {0x3020100000000000000000000000000}}}

The above snippet exemplifies an issue with how the SIMD register value
is set when the target endianness differs from the host endianness.
A similar issue is evident when setting SVE registers, as shown by the
snippet below, where the QEMU host is little endian and the target is
big endian:

(gdb) p/x $z0
$1 = {q = {u = {0x0 <repeats 16 times>}, s = {0x0 <repeats 16 times>}},
  d = {f = {0x0 <repeats 32 times>}, u = {0x0 <repeats 32 times>},
    s = {0x0 <repeats 32 times>}},
  s = {f = {0x0 <repeats 64 times>}, u = {0x0 <repeats 64 times>},
    s = {0x0 <repeats 64 times>}},
  h = {f = {0x0 <repeats 128 times>}, u = {0x0 <repeats 128 times>},
    s = {0x0 <repeats 128 times>}},
  b = {u = {0x0 <repeats 256 times>}, s = {0x0 <repeats 256 times>}}}
(gdb) set $z0.q.u[0] = 0x010203
(gdb) p/x $z0
$2 = {q = {u = {0x302010000000000, 0x0 <repeats 15 times>},
    s = {0x302010000000000, 0x0 <repeats 15 times>}},
  d = {f = {0x0, 0x302010000000000, 0x0 <repeats 30 times>},
    u = {0x0, 0x302010000000000, 0x0 <repeats 30 times>},
    s = {0x0, 0x302010000000000, 0x0 <repeats 30 times>}},
  s = {f = {0x0, 0x0, 0x3020100, 0x0 <repeats 61 times>},
    u = {0x0, 0x0, 0x3020100, 0x0 <repeats 61 times>},
    s = {0x0, 0x0, 0x3020100, 0x0 <repeats 61 times>}},
  h = {f = {0x0, 0x0, 0x0, 0x0, 0x302, 0x100, 0x0 <repeats 122 times>},
    u = {0x0, 0x0, 0x0, 0x0, 0x302, 0x100, 0x0 <repeats 122 times>},
    s = {0x0, 0x0, 0x0, 0x0, 0x302, 0x100, 0x0 <repeats 122 times>}},
  b = {u = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3, 0x2, 0x1,
      0x0 <repeats 245 times>},
    s = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3, 0x2, 0x1,
      0x0 <repeats 245 times>}}}

Note that in the case of SVE, this issue is also present when the host
and target are both little endian.
Consider the GDB remote session snippet below showcasing this:

(gdb) p/x $z0
$6 = {q = {u = {0x0 <repeats 16 times>}, s = {0x0 <repeats 16 times>}},
  d = {f = {0x0 <repeats 32 times>}, u = {0x0 <repeats 32 times>},
    s = {0x0 <repeats 32 times>}},
  s = {f = {0x0 <repeats 64 times>}, u = {0x0 <repeats 64 times>},
    s = {0x0 <repeats 64 times>}},
  h = {f = {0x0 <repeats 128 times>}, u = {0x0 <repeats 128 times>},
    s = {0x0 <repeats 128 times>}},
  b = {u = {0x0 <repeats 256 times>}, s = {0x0 <repeats 256 times>}}}
(gdb) set $z0.q.u[0] = 0x010203
(gdb) p/x $z0
$7 = {q = {u = {0x102030000000000000000, 0x0 <repeats 15 times>},
    s = {0x102030000000000000000, 0x0 <repeats 15 times>}},
  d = {f = {0x0, 0x10203, 0x0 <repeats 30 times>},
    u = {0x0, 0x10203, 0x0 <repeats 30 times>},
    s = {0x0, 0x10203, 0x0 <repeats 30 times>}},
  s = {f = {0x0, 0x0, 0x10203, 0x0 <repeats 61 times>},
    u = {0x0, 0x0, 0x10203, 0x0 <repeats 61 times>},
    s = {0x0, 0x0, 0x10203, 0x0 <repeats 61 times>}},
  h = {f = {0x0, 0x0, 0x0, 0x0, 0x203, 0x1, 0x0 <repeats 122 times>},
    u = {0x0, 0x0, 0x0, 0x0, 0x203, 0x1, 0x0 <repeats 122 times>},
    s = {0x0, 0x0, 0x0, 0x0, 0x203, 0x1, 0x0 <repeats 122 times>}},
  b = {u = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3, 0x2, 0x1,
      0x0 <repeats 245 times>},
    s = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3, 0x2, 0x1,
      0x0 <repeats 245 times>}}}

In all of these scenarios, the value read back after setting the
register to 0x010203 is not preserved in the appropriate byte order,
and hence does not print back as 0x010203 as expected.

The current implementation for getting and setting SIMD registers via
the gdbstub is as follows:

aarch64_gdb_set_fpu_reg:
    <omitted code>
    uint64_t *q = aa64_vfp_qreg(env, reg);
    q[0] = ldq_le_p(buf);
    q[1] = ldq_le_p(buf + 8);
    return 16;
    <omitted code>

The following code is a suggested fix for the current implementation
that should allow for mixed-endian support when getting/setting SIMD
registers via the remote GDB protocol.
aarch64_gdb_set_fpu_reg:
    <omitted code>
    /* case 0 ... 31 */
    uint64_t *q = aa64_vfp_qreg(env, reg);
    if (target_big_endian()) {
        q[1] = ldq_p(buf);
        q[0] = ldq_p(buf + 8);
    } else {
        q[0] = ldq_p(buf);
        q[1] = ldq_p(buf + 8);
    }
    return 16;
    <omitted code>

This use of ldq_p() rather than ldq_le_p() (which the current
implementation uses) to load the bytes into the host-endian register
structure is inspired by the current implementation for getting/setting
general purpose registers via remote GDB (which works correctly
regardless of target endianness), as well as by the implementation for
getting/setting GPRs via GDB with ppc as a target (see
ppc_cpu_gdb_write_register() for an example). Note that the order of
setting q[0] and q[1] is swapped for big-endian targets to ensure that
q[1] always holds the most significant half and q[0] always holds the
least significant half (refer to the comment in target/arm/cpu.h, line
155).

For SVE, the current implementation for the zregs is as follows:

aarch64_gdb_set_sve_reg:
    <omitted code>
    /* case 0 ... 31 */
    int vq, len = 0;
    uint64_t *p = (uint64_t *) buf;
    for (vq = 0; vq < cpu->sve_max_vq; vq++) {
        env->vfp.zregs[reg].d[vq * 2 + 1] = *p++;
        env->vfp.zregs[reg].d[vq * 2] = *p++;
        len += 16;
    }
    return len;

The suggestion here is similar to the one above for SIMD: ldq_p()
should be used rather than plain pointer dereferencing. This suggestion
aims to allow the QEMU gdbstub to support getting/setting register
values correctly regardless of the target endianness.
This suggestion aims to yield results such as the following from a
remote GDB session, regardless of target endianness:

(gdb) p/x $z0
$1 = {q = {u = {0x0 <repeats 16 times>}, s = {0x0 <repeats 16 times>}},
  d = {f = {0x0 <repeats 32 times>}, u = {0x0 <repeats 32 times>},
    s = {0x0 <repeats 32 times>}},
  s = {f = {0x0 <repeats 64 times>}, u = {0x0 <repeats 64 times>},
    s = {0x0 <repeats 64 times>}},
  h = {f = {0x0 <repeats 128 times>}, u = {0x0 <repeats 128 times>},
    s = {0x0 <repeats 128 times>}},
  b = {u = {0x0 <repeats 256 times>}, s = {0x0 <repeats 256 times>}}}
(gdb) set $z0.q.u[0] = 0x010203
(gdb) p/x $z0
$2 = {q = {u = {0x10203, 0x0 <repeats 15 times>},
    s = {0x10203, 0x0 <repeats 15 times>}},
  d = {f = {0x10203, 0x0 <repeats 31 times>},
    u = {0x10203, 0x0 <repeats 31 times>},
    s = {0x10203, 0x0 <repeats 31 times>}},
  s = {f = {0x10203, 0x0 <repeats 63 times>},
    u = {0x10203, 0x0 <repeats 63 times>},
    s = {0x10203, 0x0 <repeats 63 times>}},
  h = {f = {0x203, 0x1, 0x0 <repeats 126 times>},
    u = {0x203, 0x1, 0x0 <repeats 126 times>},
    s = {0x203, 0x1, 0x0 <repeats 126 times>}},
  b = {u = {0x3, 0x2, 0x1, 0x0 <repeats 253 times>},
    s = {0x3, 0x2, 0x1, 0x0 <repeats 253 times>}}}

The first patch implements this change for NEON registers, and the
second patch does so for SVE registers.
Vacha Bhavsar (2):
  The first patch adds big-endian support for NEON GDB remote
  debugging. It replaces the use of ldq_le_p() with ldq_p(), and checks
  the target endianness to ensure the most significant bits are always
  in the second element.
  The second patch adds big-endian support for SVE GDB remote
  debugging. It replaces pointer dereferencing with ldq_p(), and checks
  the target endianness to ensure the most significant bits are always
  in the second element.

-- 
2.34.1