On Mon, 10 Mar 2025 at 14:42, Peter Maydell <peter.mayd...@linaro.org> wrote:
>
> On Mon, 10 Mar 2025 at 01:28, Philippe Mathieu-Daudé <phi...@linaro.org> wrote:
> >
> > Hi,
> >
> > This series add support for (async) FIFO on the transmit path
> > of the PL011 UART.
>
> This hasn't made the last pre-softfreeze arm pullreq, but
> I think we can reasonably call "don't do blocking I/O"
> enough of a bugfix for it to be ok to go in early in the
> freeze cycle for rc0.
>
> I've applied it to target-arm.next.

...but it still fails 'make check-functional', though in a less
easy-to-reproduce way than it did.

The problem turns out to be that when the guest kernel is doing
its earlycon output (which is by polling, not interrupt driven)
the output can be corrupted, which makes the aarch64/test_arm_virt
test fail to find the "Kernel command line:" output it is looking for.

This seems to be because the pl011 code and the chardev code
disagree about how "couldn't write anything" is reported.
pl011 here is looking for "0 means wrote nothing", but
the chardev code reports it as "-1 and errno is EAGAIN".

I think the chardev code is actually what we need to fix here,
because it makes basically no effort to guarantee that the errno
from the underlying write is still in 'errno' by the time
qemu_chr_fe_write() returns. In particular it may call
qemu_chr_write_log() or replay_char_write_event_save(), both
of which will happily trash errno if something fails during
their execution.

So my long-term preference for fixing this is:
 * fix up any callsites that can't handle a 0 return for
   "wrote no bytes"
 * make (and document) qemu_chr_fe_write()'s return value be
   - 0 == wrote no bytes
   - >0 == wrote some bytes
   - <0 == a negative-errno indicating a definite error

I had planned in the meantime that we could deal with this
by squashing in this change to the last patch in this series:

--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -275,6 +275,9 @@ static gboolean pl011_xmit_cb(void *do_not_use, GIOCondition cond, void *opaque)
     /* Transmit as much data as we can. */
     bytes_consumed = qemu_chr_fe_write(&s->chr, buf, count);
     trace_pl011_fifo_tx_xmit_consumed(bytes_consumed);
+    if (bytes_consumed < 0 && errno == EAGAIN) {
+        bytes_consumed = 0;
+    }
     if (bytes_consumed < 0) {
         /* Error in back-end: drain the fifo. */
         printf("oops, bytes_consumed = %d errno = %d\n", bytes_consumed, errno);

which makes the code handle both "returns 0" and "returns -1
with errno=EAGAIN" as "try again later".

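To make the proposed return convention concrete, here is a minimal
standalone sketch. It does not use the QEMU chardev API: write_some(),
noisy_hook() and set_nonblocking() are hypothetical names invented for
illustration, and plain POSIX write() stands in for the backend. The
point it tries to show is that errno has to be captured immediately
after the underlying write, before any logging or replay hook can
overwrite it, and that the result is then folded into a single return
value.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for a logging/replay hook that clobbers errno. */
static void noisy_hook(void)
{
    errno = ENOENT;
}

/* Helper for the sketch: put a file descriptor into non-blocking mode. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0) {
        return -errno;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -errno : 0;
}

/*
 * Hypothetical wrapper following the convention proposed above:
 *    0 == wrote no bytes (caller should try again later)
 *   >0 == wrote some bytes
 *   <0 == negative errno indicating a definite error
 */
static int write_some(int fd, const void *buf, size_t len)
{
    ssize_t ret;
    int saved_errno;

    assert(len <= INT_MAX);      /* result must fit in the int return */

    ret = write(fd, buf, len);
    saved_errno = errno;         /* capture before anything can trash it */

    noisy_hook();                /* errno is no longer trustworthy here */

    if (ret >= 0) {
        return (int)ret;
    }
    if (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK) {
        return 0;                /* "couldn't write anything", not an error */
    }
    return -saved_errno;
}
```

With a wrapper shaped like this, a caller only has to compare the
return value against zero and never needs to look at errno, so it
does not matter what the hooks do to it in between.
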
But even with that I still see the check-functional test failing
on a clang sanitizer build, though without any clear reason why.
It's intermittent; running the test like this:

(cd build/arm-clang/ ; PYTHONPATH=../../python:../../tests/functional QEMU_TEST_QEMU_BINARY=./qemu-system-aarch64 ./pyvenv/bin/python3 ../../tests/functional/test_arm_virt.py)

I got one pass once but mostly it hangs after printing some of
the early console output. A debug build seems more reliable, oddly.

I'll try to continue investigating this this week, but in the
meantime I'm going to have to drop this series from target-arm.next
again, I'm afraid :-(

thanks
-- PMM