On Mon, 10 Mar 2025 at 14:42, Peter Maydell <peter.mayd...@linaro.org> wrote:
>
> On Mon, 10 Mar 2025 at 01:28, Philippe Mathieu-Daudé <phi...@linaro.org> wrote:
> >
> > Hi,
> >
> > This series adds support for an (async) FIFO on the transmit path
> > of the PL011 UART.
>
> This hasn't made the last pre-softfreeze arm pullreq, but
> I think we can reasonably call "don't do blocking I/O"
> enough of a bugfix for it to be ok to go in early in the
> freeze cycle for rc0.
>
> I've applied it to target-arm.next.

...but it still fails 'make check-functional', though in a way
that is harder to reproduce than before. The problem turns out
to be that when the guest kernel is doing its earlycon
output (which is by polling, not interrupt driven) the output
can be corrupted, which makes the aarch64/test_arm_virt test
fail to find the "Kernel command line:" output it is looking for.

This seems to be because the pl011 code and the chardev
code disagree about how "couldn't write anything" is
reported. pl011 here is looking for "0 means wrote nothing",
but the chardev code reports it as "-1 and errno is EAGAIN".

I think the chardev code is actually what we need to fix here,
because it makes basically no effort to guarantee that the
errno from the underlying write is still in 'errno' by the
time qemu_chr_fe_write() returns. In particular it may
call qemu_chr_write_log() or replay_char_write_event_save(),
both of which will happily trash errno if something fails
during their execution.
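
To illustrate the errno hazard described above, here is a minimal
standalone sketch. It is not QEMU code: fake_backend_write() and
fake_log_step() are hypothetical stand-ins for the underlying write
and for a later step (such as logging) that fails and overwrites errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a non-blocking backend write that cannot
 * make progress: returns -1 with errno set to EAGAIN. */
static int fake_backend_write(const unsigned char *buf, int len)
{
    assert(buf != NULL && len > 0);
    errno = EAGAIN;
    return -1;
}

/* Hypothetical stand-in for a later step that fails internally and
 * leaves its own error code in errno. */
static void fake_log_step(void)
{
    errno = ENOSPC;
}

/* Fragile: the caller gets -1, but errno now belongs to the log step. */
static int write_then_log_fragile(const unsigned char *buf, int len)
{
    int ret = fake_backend_write(buf, len);
    fake_log_step();
    return ret;
}

/* More robust: capture errno right after the write and return it as a
 * negative value, so later calls cannot change what the caller sees. */
static int write_then_log_robust(const unsigned char *buf, int len)
{
    int ret = fake_backend_write(buf, len);
    if (ret < 0) {
        ret = -errno;
    }
    fake_log_step();
    return ret;
}
```

With these stubs, a caller of write_then_log_fragile() that checks
"ret < 0 && errno == EAGAIN" misses the would-block case, because errno
is ENOSPC by the time it looks. write_then_log_robust() returns -EAGAIN
regardless of what happens afterwards.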

So my long-term preference for fixing this is:
 * fix up any callsites that can't handle a 0 return for
   "wrote no bytes"
 * make (and document) qemu_chr_fe_write()'s return value be
    - 0 == wrote no bytes
    - >0 == wrote some bytes
    - <0 == a negative-errno indicating a definite error
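
The return convention listed above could be sketched roughly as follows.
This is only an illustration under stated assumptions, not the actual
QEMU change: normalize_write_result() is a hypothetical helper, and it
assumes the caller saved errno immediately after the failing write.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper (not a QEMU function): map a POSIX-style result
 * ("-1 with errno set") onto the proposed convention:
 *    0  == wrote no bytes (try again later)
 *   >0  == wrote some bytes
 *   <0  == negative errno for a definite error
 * saved_errno must have been captured right after the write call.
 */
static int normalize_write_result(int ret, int saved_errno)
{
    assert(saved_errno >= 0);   /* errno values are never negative */

    if (ret >= 0) {
        return ret;             /* byte count, possibly zero */
    }
    if (saved_errno == EAGAIN) {
        return 0;               /* would block: nothing was written */
    }
    if (saved_errno == 0) {
        return -EIO;            /* failure with no errno: generic error */
    }
    return -saved_errno;        /* definite error */
}
```

A caller written against this convention only needs to test the sign
of the result and never has to consult errno, which is the point of
the proposal.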


I had planned in the meantime that we could deal with
this by squashing in this change to the last patch in
this series:

--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -275,6 +275,9 @@ static gboolean pl011_xmit_cb(void *do_not_use, GIOCondition cond, void *opaque)
     /* Transmit as much data as we can. */
     bytes_consumed = qemu_chr_fe_write(&s->chr, buf, count);
     trace_pl011_fifo_tx_xmit_consumed(bytes_consumed);
+    if (bytes_consumed < 0 && errno == EAGAIN) {
+        bytes_consumed = 0;
+    }
     if (bytes_consumed < 0) {
         /* Error in back-end: drain the fifo. */
         printf("oops, bytes_consumed = %d errno = %d\n", bytes_consumed, errno);


which makes the code handle both "returns 0" and "returns -1
with errno=EAGAIN" as "try again later".

But even with that I still see the check-functional
test failing on a clang sanitizer build, though without
any clear reason why. It's intermittent; running the
test like this:

(cd build/arm-clang/ ; PYTHONPATH=../../python:../../tests/functional \
  QEMU_TEST_QEMU_BINARY=./qemu-system-aarch64 ./pyvenv/bin/python3 \
  ../../tests/functional/test_arm_virt.py)

I got one pass once but mostly it hangs after printing
some of the early console output. A debug build seems
more reliable, oddly.

I'll try to continue investigating this this week, but
in the meantime I'm going to have to drop this series
from target-arm.next again, I'm afraid :-(

thanks
-- PMM
