Examining the current implementation for getting/setting SIMD and SVE
registers via remote GDB revealed a problem with mixed-endian support.
This patch series addresses that problem, allowing the values of NEON
and SVE registers to be read and written via remote GDB regardless of
the target endianness.
Consider the following snippet from a GDB session in which a SIMD
register's value is set via remote GDB, where the QEMU host is little
endian and the target is big endian:

(gdb) p/x $v0
$1 = {d = {f = {0x0, 0x0}, u = {0x0, 0x0}, s = {0x0, 0x0}},
  s = {f = {0x0, 0x0, 0x0, 0x0}, u = {0x0, 0x0, 0x0, 0x0},
    s = {0x0, 0x0, 0x0, 0x0}},
  h = {bf = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
    f = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
    u = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
    s = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}},
  b = {u = {0x0 <repeats 16 times>}, s = {0x0 <repeats 16 times>}},
  q = {u = {0x0}, s = {0x0}}}
(gdb) set $v0.d.u[0] = 0x010203
(gdb) p/x $v0
$2 = {d = {f = {0x302010000000000, 0x0}, u = {0x302010000000000, 0x0},
    s = {0x302010000000000, 0x0}},
  s = {f = {0x3020100, 0x0, 0x0, 0x0}, u = {0x3020100, 0x0, 0x0, 0x0},
    s = {0x3020100, 0x0, 0x0, 0x0}},
  h = {bf = {0x302, 0x100, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
    f = {0x302, 0x100, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
    u = {0x302, 0x100, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
    s = {0x302, 0x100, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}},
  b = {u = {0x3, 0x2, 0x1, 0x0 <repeats 13 times>},
    s = {0x3, 0x2, 0x1, 0x0 <repeats 13 times>}},
  q = {u = {0x3020100000000000000000000000000},
    s = {0x3020100000000000000000000000000}}}

The above snippet exemplifies an issue with how the SIMD register value
is set when the target endianness differs from the host endianness.
A similar issue is evident when setting SVE registers, as shown by the
snippet below, where the QEMU host is little endian and the target is
big endian:

(gdb) p/x $z0
$1 = {q = {u = {0x0 <repeats 16 times>}, s = {0x0 <repeats 16 times>}},
  d = {f = {0x0 <repeats 32 times>}, u = {0x0 <repeats 32 times>},
    s = {0x0 <repeats 32 times>}},
  s = {f = {0x0 <repeats 64 times>}, u = {0x0 <repeats 64 times>},
    s = {0x0 <repeats 64 times>}},
  h = {f = {0x0 <repeats 128 times>}, u = {0x0 <repeats 128 times>},
    s = {0x0 <repeats 128 times>}},
  b = {u = {0x0 <repeats 256 times>}, s = {0x0 <repeats 256 times>}}}
(gdb) set $z0.q.u[0] = 0x010203
(gdb) p/x $z0
$2 = {q = {u = {0x302010000000000, 0x0 <repeats 15 times>},
    s = {0x302010000000000, 0x0 <repeats 15 times>}},
  d = {f = {0x0, 0x302010000000000, 0x0 <repeats 30 times>},
    u = {0x0, 0x302010000000000, 0x0 <repeats 30 times>},
    s = {0x0, 0x302010000000000, 0x0 <repeats 30 times>}},
  s = {f = {0x0, 0x0, 0x3020100, 0x0 <repeats 61 times>},
    u = {0x0, 0x0, 0x3020100, 0x0 <repeats 61 times>},
    s = {0x0, 0x0, 0x3020100, 0x0 <repeats 61 times>}},
  h = {f = {0x0, 0x0, 0x0, 0x0, 0x302, 0x100, 0x0 <repeats 122 times>},
    u = {0x0, 0x0, 0x0, 0x0, 0x302, 0x100, 0x0 <repeats 122 times>},
    s = {0x0, 0x0, 0x0, 0x0, 0x302, 0x100, 0x0 <repeats 122 times>}},
  b = {u = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3, 0x2, 0x1,
      0x0 <repeats 245 times>},
    s = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3, 0x2, 0x1,
      0x0 <repeats 245 times>}}}

Note that in the case of SVE, this issue is also present when the host
and target are both little endian.
Consider the GDB remote session snippet below showcasing this:

(gdb) p/x $z0
$6 = {q = {u = {0x0 <repeats 16 times>}, s = {0x0 <repeats 16 times>}},
  d = {f = {0x0 <repeats 32 times>}, u = {0x0 <repeats 32 times>},
    s = {0x0 <repeats 32 times>}},
  s = {f = {0x0 <repeats 64 times>}, u = {0x0 <repeats 64 times>},
    s = {0x0 <repeats 64 times>}},
  h = {f = {0x0 <repeats 128 times>}, u = {0x0 <repeats 128 times>},
    s = {0x0 <repeats 128 times>}},
  b = {u = {0x0 <repeats 256 times>}, s = {0x0 <repeats 256 times>}}}
(gdb) set $z0.q.u[0] = 0x010203
(gdb) p/x $z0
$7 = {q = {u = {0x102030000000000000000, 0x0 <repeats 15 times>},
    s = {0x102030000000000000000, 0x0 <repeats 15 times>}},
  d = {f = {0x0, 0x10203, 0x0 <repeats 30 times>},
    u = {0x0, 0x10203, 0x0 <repeats 30 times>},
    s = {0x0, 0x10203, 0x0 <repeats 30 times>}},
  s = {f = {0x0, 0x0, 0x10203, 0x0 <repeats 61 times>},
    u = {0x0, 0x0, 0x10203, 0x0 <repeats 61 times>},
    s = {0x0, 0x0, 0x10203, 0x0 <repeats 61 times>}},
  h = {f = {0x0, 0x0, 0x0, 0x0, 0x203, 0x1, 0x0 <repeats 122 times>},
    u = {0x0, 0x0, 0x0, 0x0, 0x203, 0x1, 0x0 <repeats 122 times>},
    s = {0x0, 0x0, 0x0, 0x0, 0x203, 0x1, 0x0 <repeats 122 times>}},
  b = {u = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3, 0x2, 0x1,
      0x0 <repeats 245 times>},
    s = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3, 0x2, 0x1,
      0x0 <repeats 245 times>}}}

In all of these scenarios, the value read back after setting the
register to 0x010203 is not preserved in the appropriate byte order,
and hence does not print back as 0x010203 as expected.

The current implementation for getting and setting SIMD registers via
the gdbstub is as follows:

aarch64_gdb_set_fpu_reg:
    <omitted code>
    uint64_t *q = aa64_vfp_qreg(env, reg);
    q[0] = ldq_le_p(buf);
    q[1] = ldq_le_p(buf + 8);
    return 16;
    <omitted code>

The following code is a suggested fix for the current implementation
that should allow for mixed-endian support when getting/setting SIMD
registers via the remote GDB protocol.
aarch64_gdb_set_fpu_reg:
    <omitted code>
    /* case 0 ... 31 */
    uint64_t *q = aa64_vfp_qreg(env, reg);
    if (target_big_endian()) {
        q[1] = ldq_p(buf);
        q[0] = ldq_p(buf + 8);
    } else {
        q[0] = ldq_p(buf);
        q[1] = ldq_p(buf + 8);
    }
    return 16;
    <omitted code>

This use of ldq_p() rather than ldq_le_p() (which the current
implementation uses) to load the bytes into the host-endian register
structure is inspired by the current implementation for getting/setting
general purpose registers via remote GDB (which works correctly
regardless of target endianness), as well as by the implementation for
getting/setting GPRs via GDB with ppc as a target (see
ppc_cpu_gdb_write_register() for an example). Note that the order of
setting q[0] and q[1] is swapped for big-endian targets to ensure that
q[1] always holds the most significant half and q[0] always holds the
least significant half (refer to the comment in target/arm/cpu.h, line
155).

For SVE, the current implementation for the zregs is as follows:

aarch64_gdb_set_sve_reg:
    <omitted code>
    /* case 0 ... 31 */
    int vq, len = 0;
    uint64_t *p = (uint64_t *) buf;
    for (vq = 0; vq < cpu->sve_max_vq; vq++) {
        env->vfp.zregs[reg].d[vq * 2 + 1] = *p++;
        env->vfp.zregs[reg].d[vq * 2] = *p++;
        len += 16;
    }
    return len;

The suggestion here is similar to the one above for SIMD: ldq_p()
should be used rather than plain pointer dereferencing. This suggestion
aims to allow the QEMU gdbstub to support getting/setting register
values correctly regardless of the target endianness.
This suggestion aims to yield results such as the following from a
remote GDB session, regardless of target endianness:

(gdb) p/x $z0
$1 = {q = {u = {0x0 <repeats 16 times>}, s = {0x0 <repeats 16 times>}},
  d = {f = {0x0 <repeats 32 times>}, u = {0x0 <repeats 32 times>},
    s = {0x0 <repeats 32 times>}},
  s = {f = {0x0 <repeats 64 times>}, u = {0x0 <repeats 64 times>},
    s = {0x0 <repeats 64 times>}},
  h = {f = {0x0 <repeats 128 times>}, u = {0x0 <repeats 128 times>},
    s = {0x0 <repeats 128 times>}},
  b = {u = {0x0 <repeats 256 times>}, s = {0x0 <repeats 256 times>}}}
(gdb) set $z0.q.u[0] = 0x010203
(gdb) p/x $z0
$2 = {q = {u = {0x10203, 0x0 <repeats 15 times>},
    s = {0x10203, 0x0 <repeats 15 times>}},
  d = {f = {0x10203, 0x0 <repeats 31 times>},
    u = {0x10203, 0x0 <repeats 31 times>},
    s = {0x10203, 0x0 <repeats 31 times>}},
  s = {f = {0x10203, 0x0 <repeats 63 times>},
    u = {0x10203, 0x0 <repeats 63 times>},
    s = {0x10203, 0x0 <repeats 63 times>}},
  h = {f = {0x203, 0x1, 0x0 <repeats 126 times>},
    u = {0x203, 0x1, 0x0 <repeats 126 times>},
    s = {0x203, 0x1, 0x0 <repeats 126 times>}},
  b = {u = {0x3, 0x2, 0x1, 0x0 <repeats 253 times>},
    s = {0x3, 0x2, 0x1, 0x0 <repeats 253 times>}}}

The first patch implements this change for NEON registers, and the
second patch does so for SVE registers.
Vacha Bhavsar (2):
  The first patch adds big-endian support for NEON GDB remote
  debugging. It replaces the use of ldq_le_p() with ldq_p(), and checks
  the target endianness to ensure the most significant bits are always
  in the second element.
  The second patch adds big-endian support for SVE GDB remote
  debugging. It replaces pointer dereferencing with ldq_p(), and checks
  the target endianness to ensure the most significant bits are always
  in the second element.

-- 
2.34.1