I gave this another try and I have a couple of comments.

Booting Linux with earlycon enabled takes quite a while. I can see the
characters coming slower than on the minitel. It seems to be a bit better
after switching off the bootconsole. Overall Linux is taking ~20 times longer
to boot with pl011 than with the HVC console.

I do agree that pl011 is emulated and therefore you have to trap after each
character. But 20 times sounds far too much.

I think this slowness could be due to rate limiting of the pl011 events
in xenconsoled. Currently, the rate limit is set to 30 events per
200 msecs (see RATE_LIMIT_ALLOWANCE/RATE_LIMIT_PERIOD).

I increased the rate limit to 600 events (30 * 20) per 200 msecs. With
this change, I see that the find command runs faster and smoother.
Earlier the find output would be jerky.
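
To put rough numbers on the two settings above, here is a minimal sketch of
the arithmetic. It assumes one event per character, as described elsewhere in
this thread; events_per_second() is a hypothetical helper for illustration,
not something from xenconsoled.

```c
#include <assert.h>

/*
 * Upper bound on events per second for a given allowance and period.
 * Hypothetical helper: the parameters mirror RATE_LIMIT_ALLOWANCE and
 * RATE_LIMIT_PERIOD (in msecs), but this is not xenconsoled code.
 */
static unsigned int events_per_second(unsigned int allowance,
                                      unsigned int period_ms)
{
    assert(period_ms > 0);
    return allowance * 1000u / period_ms;
}
```

With 30 events per 200 msecs this gives at most 150 events per second, and
with 600 events per 200 msecs it gives 3000. If every character costs one
event, that is also the ceiling on characters per second.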

I think there might be another solution avoiding increasing the rate limit.

If you look at the earlycon code for pl011 in Linux:

static void pl011_putc(struct uart_port *port, int c)
{
        while (readl(port->membase + UART01x_FR) & UART01x_FR_TXFF)
                cpu_relax();
        if (port->iotype == UPIO_MEM32)
                writel(c, port->membase + UART01x_DR);
        else
                writeb(c, port->membase + UART01x_DR);
        while (readl(port->membase + UART01x_FR) & UART01x_FR_BUSY)
                cpu_relax();
}

Linux will wait for the UART to be idle before sending a new character.

Now looking at the vpl011 emulation, the busy bit is set when a new character is
queued (see vpl011_write_data). This bit is only cleared when the
console daemon raises an event and the queue is empty (see

This means that for earlycon, you will need a round trip Guest -> Xen -> Dom0 ->
Xen -> Guest for every single character. This is a bit counterproductive and,
combined with the rate limit, it makes things worse.

I would take a different approach on the BUSY bit. We can consider the queue
between Xen and xenconsoled as outside of the UART. If the character is
queued, then job done. I think this would improve quite a lot of the

Yes. This.

The guest sees a register, which is essentially a synchronous interface
to the guest. The current code, as you already see, will issue one event
for every character. That's excessive.

I am actually not suggesting to modify that at the moment. I think you may have other trouble with the interaction between the user and the console by doing that. Imagine you want to print the prompt: it may lag a bit before you get it.

The only thing I suggest is to not set the BUSY bit in the UART every time a character is queued.
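
To illustrate why the BUSY bit matters for a putc loop like the earlycon one,
here is a toy model, not the vpl011 code. Everything prefixed toy_ is
hypothetical, the queue size of 16 is an arbitrary assumption, and the guest
spinning on a flag is modelled as one backend round trip that drains the
queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_FR_BUSY    (1u << 3)  /* same bit position as PL011 FR.BUSY */
#define TOY_FR_TXFF    (1u << 5)  /* same bit position as PL011 FR.TXFF */
#define TOY_QUEUE_SIZE 16         /* arbitrary size, an assumption */

struct toy_uart {
    unsigned int fr;                   /* emulated flag register */
    size_t queued;                     /* characters waiting for the backend */
    bool busy_on_queue;                /* policy: set BUSY on every queued char */
    unsigned long backend_round_trips; /* times the guest waited on the backend */
};

/* The backend consumes the whole queue and clears both flags. */
static void toy_backend_drain(struct toy_uart *u)
{
    u->queued = 0;
    u->fr &= ~(TOY_FR_BUSY | TOY_FR_TXFF);
    u->backend_round_trips++;
}

/* Emulated write to the data register. */
static void toy_write_data(struct toy_uart *u)
{
    assert(u->queued < TOY_QUEUE_SIZE);
    u->queued++;
    if (u->queued == TOY_QUEUE_SIZE)
        u->fr |= TOY_FR_TXFF;
    if (u->busy_on_queue)
        u->fr |= TOY_FR_BUSY;
}

/* Same shape as the earlycon putc: wait on TXFF, write, wait on BUSY. */
static void toy_guest_putc(struct toy_uart *u)
{
    while (u->fr & TOY_FR_TXFF)
        toy_backend_drain(u);
    toy_write_data(u);
    while (u->fr & TOY_FR_BUSY)
        toy_backend_drain(u);
}

/* Count the round trips needed to print nchars under a given policy. */
static unsigned long toy_round_trips(bool busy_on_queue, size_t nchars)
{
    struct toy_uart u = { .busy_on_queue = busy_on_queue };

    for (size_t i = 0; i < nchars; i++)
        toy_guest_putc(&u);
    return u.backend_round_trips;
}
```

In this model, printing 64 characters costs 64 round trips when BUSY is set on
every queued character, and 3 when the guest only waits once the queue is
full. The real numbers depend on the actual queue size and on when the backend
runs.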

The interface between Xen and xenconsoled can be asynchronous: it can
opt to queue X characters before sending an event, and also set up a
one-shot timer to avoid hanging.
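
A minimal sketch of what such batching could look like, purely as an
illustration: the toy_ names, the threshold and the timeout are all
hypothetical, time is an abstract tick counter, and none of this is taken
from Xen.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_batcher {
    unsigned int pending;     /* characters queued since the last event */
    unsigned int threshold;   /* send an event once this many are queued */
    unsigned int timeout;     /* one-shot timer length, in ticks */
    bool timer_armed;
    unsigned int deadline;    /* tick at which the armed timer fires */
    unsigned int events_sent;
};

/* Notify the backend: everything pending is handed over in one event. */
static void toy_send_event(struct toy_batcher *b)
{
    b->pending = 0;
    b->timer_armed = false;
    b->events_sent++;
}

/* Called for each character the guest writes, at time 'now'. */
static void toy_queue_char(struct toy_batcher *b, unsigned int now)
{
    assert(b->threshold > 0);
    b->pending++;
    if (b->pending >= b->threshold) {
        toy_send_event(b);
    } else if (!b->timer_armed) {
        /* First character of a new batch: arm the one-shot timer. */
        b->timer_armed = true;
        b->deadline = now + b->timeout;
    }
}

/* Called periodically; flushes a partial batch once the timer expires. */
static void toy_timer_tick(struct toy_batcher *b, unsigned int now)
{
    if (b->timer_armed && now >= b->deadline)
        toy_send_event(b);
}
```

The timer is what keeps a short write, such as a prompt, from sitting in the
queue forever. It also bounds how long that prompt can lag behind.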

This however has some other implications -- it might not be as reliable
as the original method because data is not guaranteed to hit the backend. If
the guest crashes very early on, depending on the actual implementation you
might not be able to get the data.

Would it be possible to ask xenconsoled to dump everything on domain crash? Some kind of synchronization.


Julien Grall

