Hi,

Apologies, I haven't been in touch for a while. I have an update that
I would like to share.

On 4/16/25 12:50 PM, Eugenio Perez Martin wrote:
On Mon, Apr 14, 2025 at 11:20 AM Sahil Siddiq <icegambi...@gmail.com> wrote:

Hi,

On 3/26/25 1:05 PM, Eugenio Perez Martin wrote:
On Mon, Mar 24, 2025 at 2:59 PM Sahil Siddiq <icegambi...@gmail.com> wrote:
I managed to fix a few issues while testing this patch series.
There is still one issue that I am unable to resolve. I thought
I would send this patch series for review in case I have missed
something.

The issue is that this patch series does not work every time. I
am able to ping L0 from L2 and vice versa via packed SVQ when it
works.

So we're on a very good track then!

When this doesn't work, both VMs throw a "Destination Host
Unreachable" error. This is sometimes (not always) accompanied
by the following kernel error (thrown by L2-kernel):

virtio_net virtio1: output.0:id 1 is not a head!


How many packets have been sent or received before hitting this? If
the answer to that is "the vq size", maybe there is a bug in the code
that handles the wraparound of the packed vq, as the used and avail
flags need to be twisted. You can count them in the SVQ code.

I did a lot more testing. The issue is quite unpredictable in terms of
when it appears after booting L2, but so far it has almost always
appeared at some point. Even when pinging works initially, the issue
shows up after several seconds of pinging.


Maybe you can speed it up with ping -f?

Thank you, I was able to run tests much faster with the -f option. So
far I have noticed that the RX queue does not cause problems: when all
of its descriptors have been used, it wraps around without issues.
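
For reference, this is the wrap behaviour I am checking: in the packed
ring layout, the avail/used flag bits written by the driver flip each
time the ring wraps. A minimal sketch of that bookkeeping (illustrative
names and a standalone struct, not the actual SVQ code):

#include <stdbool.h>
#include <stdint.h>

#define VIRTQ_DESC_F_AVAIL  (1 << 7)   /* avail bit, virtio 1.1 packed ring */
#define VIRTQ_DESC_F_USED   (1 << 15)  /* used bit                          */

/* Illustrative driver-side state, not the QEMU structs. */
struct packed_ring {
    uint16_t size;
    uint16_t next_avail;
    bool avail_wrap_counter;   /* starts as true after reset */
};

/* Flags written when making a descriptor available:
 * avail bit == wrap counter, used bit == !wrap counter.
 * They must be written after the descriptor's addr/len/id. */
static uint16_t avail_flags(const struct packed_ring *r)
{
    return (r->avail_wrap_counter ? VIRTQ_DESC_F_AVAIL : 0) |
           (r->avail_wrap_counter ? 0 : VIRTQ_DESC_F_USED);
}

/* Advance the avail index; the flags "twist" on wraparound. */
static void advance_avail(struct packed_ring *r)
{
    if (++r->next_avail == r->size) {
        r->next_avail = 0;
        r->avail_wrap_counter = !r->avail_wrap_counter;
    }
}

This matches what I observed on the RX queue when it wrapped.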

The total number of svq descriptors varied in every test run. But in
every case, all 256 indices were filled in the descriptor region for
vq with vq_idx = 0. This is the RX vq, right?

Right!

The TX queue seems to be problematic. More on this below.

This was filled while L2 was booting. When the ctrl vq is disabled, I
am not sure what is responsible for filling the data-plane vqs during
boot.

The nested guest's driver fills the rx queue at startup. After that,
that nested guest kicks and SVQ receives the descriptors. It copies
the descriptors to the shadow virtqueue and then kicks L0 QEMU.

Understood.
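
Just to confirm my understanding of that path, this is how I picture
the rx startup flow (an outline only; vhost_svq_add_packed() is from my
series, the rest are existing functions in
hw/virtio/vhost-shadow-virtqueue.c):

/*
 * L2 driver fills its rx ring and kicks
 *   -> vhost_handle_guest_kick()
 *        -> virtqueue_pop() the guest's buffers
 *        -> vhost_svq_add() / vhost_svq_add_packed() copies them into
 *           the shadow ring
 *        -> vhost_svq_kick() notifies L0 QEMU (the vhost device)
 * L0 fills the buffers and signals the call fd
 *   -> vhost_svq_handle_call() -> vhost_svq_flush() returns the used
 *      buffers to L2
 */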

=====
The issue is hit most frequently when the following command is run
in L0:
$ ip addr add 111.1.1.1/24 dev tap0
$ ip link set tap0 up

or, running the following in L2:
# ip addr add 111.1.1.2/24 dev eth0


I guess those are able to start the network, aren't they?

Yes, that's correct.

The other vq (vq_idx=1) is not filled completely before the issue is
hit.
I have been noting down the numbers and here is an example:

295 descriptors were added individually to the queues, i.e., there were
no chains (vhost_svq_add_packed)
|_ 256 additions in vq_idx = 0, all with unique ids
|  |_ 27 descriptors (ids 0 through 26) were received later from the
|     device (vhost_svq_get_buf_packed)
|_ 39 additions in vq_idx = 1
   |_ 13 descriptors had id = 0
   |_ 26 descriptors had id = 1
   |_ all descriptors were received at some point from the device
      (vhost_svq_get_buf_packed)

There was one case in which vq_idx=0 had wrapped around. I verified
that flags were set appropriately during the wrap (avail and used flags
were flipped as expected).


Ok sounds like you're able to reach it before filling the queue. I'd
go for debugging notifications for this one then. More on this below.

=====
The next common situation where this issue is hit is during startup.
Before L2 can finish booting successfully, this error is thrown:

virtio_net virtio1: output.0:id 0 is not a head!

258 descriptors were added individually to the queues during startup
(there were no chains) (vhost_svq_add_packed)
|_ 256 additions in vq_idx = 0, all with unique ids
|  |_ none of them were received back from the device
|     (vhost_svq_get_buf_packed)
|_ 2 additions in vq_idx = 1
   |_ id = 0 in index 0
   |_ id = 1 in index 1
   |_ both descriptors were received at some point during startup from
      the device (vhost_svq_get_buf_packed)

=====
Another case is after several seconds of pinging L0 from L2.

[   99.034114] virtio_net virtio1: output.0:id 0 is not a head!


So the L2 guest sees a descriptor it has not made available
previously. This can be caused because SVQ returns the same descriptor
twice, or it doesn't fill the id or flags properly. It can also be
caused because we're not protecting the write ordering in the ring,
but I don't see anything obviously wrong by looking at the code.

366 descriptors were added individually to the queues, i.e., there were
no chains (vhost_svq_add_packed)
|_ 289 additions in vq_idx = 0, wrap-around was observed with avail and
|  used flags inverted for 33 descriptors
|  |_ 40 descriptors (ids 0 through 39) were received from the device
|     (vhost_svq_get_buf_packed)
|_ 77 additions in vq_idx = 1
   |_ 76 descriptors had id = 0
   |_ 1 descriptor had id = 1
   |_ all 77 descriptors were received at some point from the device
      (vhost_svq_get_buf_packed)

I am not entirely sure now whether there is an issue in the packed vq
implementation in QEMU or whether this is caused by some sort of race
condition in Linux.

"id is not a head" is being thrown because vq->packed.desc_state[id].data
doesn't exist for the corresponding id in Linux [1]. But QEMU seems to have
stored some data for this id via vhost_svq_add() [2]. Linux sets the value
of vq->packed.desc_state[id].data in its version of virtqueue_add_packed() [3].


Let's keep debugging further. Can you trace the ids that the L2 kernel
makes available, and then the ones that it uses? At the same time, can
you trace the ids that the SVQ sees in vhost_svq_get_buf and the ones
that it flushes? This allows us to check the set of available
descriptors at any given time.

In the Linux kernel, I am printing which descriptor is received in which
queue in drivers/virtio/virtio_ring.c:virtqueue_get_buf_ctx_packed() [1].
I see the following lines getting printed for the TX queue:

[  192.101591] output.0 -> id: 0
[  213.737417] output.0 -> id: 0
[  213.738714] output.0 -> id: 1
[  213.740093] output.0 -> id: 0
[  213.741521] virtio_net virtio1: output.0:id 0 is not a head!
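
For context, the error itself comes from a check in
virtqueue_get_buf_ctx_packed() (paraphrased from [1], not a verbatim
copy); the pr_info() is my local debug print and not upstream:

id = le16_to_cpu(vq->packed.vring.desc[last_used].id);

/* local debug print: which id the driver sees as used */
pr_info("%s -> id: %u\n", vq->vq.name, id);

if (unlikely(id >= vq->packed.vring.num)) {
        BAD_RING(vq, "id %u out of range\n", id);
        return NULL;
}
if (unlikely(!vq->packed.desc_state[id].data)) {
        BAD_RING(vq, "id %u is not a head!\n", id);
        return NULL;
}

This is the vq->packed.desc_state[id].data check I mentioned above.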

In QEMU's hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_add_packed(), I am
printing the head_idx, id, len, flags and vq_idx. Just before the crash,
the following lines are printed:

head_idx: 157, id: 0, len: 122, flags: 32768, vq idx: 1
head_idx: 158, id: 0, len: 122, flags: 32768, vq idx: 1
head_idx: 159, id: 0, len: 66, flags: 32768, vq idx: 1
head_idx: 160, id: 1, len: 102, flags: 32768, vq idx: 1

In QEMU's hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_get_buf_packed(), I
am printing the id, last_used index, used wrap counter and vq_idx. These
are the lines just before the crash:

id: 0, last_used: 158, used_wrap_counter: 0, vq idx: 1
id: 0, last_used: 159, used_wrap_counter: 0, vq idx: 1
id: 0, last_used: 160, used_wrap_counter: 0, vq idx: 1
id: 1, last_used: 161, used_wrap_counter: 0, vq idx: 1
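
Both QEMU-side prints are plain fprintf()s added locally for debugging
(variable names are from my local patch, so treat them as illustrative):

/* in vhost_svq_add_packed(), after the descriptor has been written */
fprintf(stderr, "head_idx: %u, id: %u, len: %u, flags: %u, vq idx: %u\n",
        head_idx, id, len, flags, vq_idx);

/* in vhost_svq_get_buf_packed(), after reading the used descriptor */
fprintf(stderr, "id: %u, last_used: %u, used_wrap_counter: %u, vq idx: %u\n",
        id, last_used, used_wrap_counter, vq_idx);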

In QEMU's hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_flush() [2], I am
tracking the values of i and vq_idx in the outer do..while() loop as
well as in the inner while(true) loop. The value of i is used as the
"idx" in virtqueue_fill() [3] and as the "count" in virtqueue_flush()
[4]. Lines printed in each iteration of the outer do...while loop are
enclosed between "===" lines. These are the lines just before the crash:

===
in_loop: i: 0, vq idx: 1
in_loop: i: 1, vq idx: 1
out_loop: i: 1, vq idx: 1
===
in_loop: i: 0, vq idx: 1
in_loop: i: 1, vq idx: 1
out_loop: i: 1, vq idx: 1
===
in_loop: i: 0, vq idx: 1
in_loop: i: 1, vq idx: 1
in_loop: i: 2, vq idx: 1
out_loop: i: 2, vq idx: 1
===
in_loop: i: 0, vq idx: 1
out_loop: i: 0, vq idx: 1
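
For reference, the loop I am tracing is the one in vhost_svq_flush()
[2]; simplified here, with vhost_svq_get_buf_packed() being the helper
from my series (the in_loop/out_loop prints themselves are not shown):

VirtQueue *vq = svq->vq;

do {
    unsigned i = 0;   /* reset on every outer iteration */

    vhost_svq_disable_notification(svq);
    while (true) {
        uint32_t len;
        g_autofree VirtQueueElement *elem = vhost_svq_get_buf_packed(svq, &len);
        if (!elem) {
            break;
        }
        /* i here is the "idx" passed to virtqueue_fill() [3] */
        virtqueue_fill(vq, elem, len, i++);
    }
    /* i here is the "count" passed to virtqueue_flush() [4] */
    virtqueue_flush(vq, i);
} while (!vhost_svq_enable_notification(svq));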

So far I have only investigated which descriptors the kernel uses. I'll
also check which descriptors the kernel makes available and let you know
what I find.
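
Concretely, on the kernel side I am thinking of a print like this near
the end of virtqueue_add_packed() in drivers/virtio/virtio_ring.c, once
desc_state[id].data has been set (illustrative and not upstream; "id"
and the wrap counter field are taken from my reading of that function):

/* local debug print: which id was just made available to the device */
pr_info("%s -> made avail id: %u, avail_wrap_counter: %d\n",
        vq->vq.name, id, (int)vq->packed.avail_wrap_counter);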

Thanks,
Sahil

[1] 
https://github.com/torvalds/linux/blob/master/drivers/virtio/virtio_ring.c#L1727
[2] 
https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/vhost-shadow-virtqueue.c#L499
[3] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/virtio.c#L1008
[4] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/virtio.c#L1147

