Hi,
On 3/26/25 1:05 PM, Eugenio Perez Martin wrote:
On Mon, Mar 24, 2025 at 2:59 PM Sahil Siddiq <icegambi...@gmail.com> wrote:
I managed to fix a few issues while testing this patch series.
There is still one issue that I am unable to resolve. I thought
I would send this patch series for review in case I have missed
something.
The issue is that this patch series does not work every time. I
am able to ping L0 from L2 and vice versa via packed SVQ when it
works.
So we're on a very good track then!
When this doesn't work, both VMs throw a "Destination Host
Unreachable" error. This is sometimes (not always) accompanied
by the following kernel error (thrown by L2-kernel):
virtio_net virtio1: output.0:id 1 is not a head!
How many packets have been sent or received before hitting this? If
the answer to that is "the vq size", maybe there is a bug in the code
that handles the wraparound of the packed vq, as the used and avail
flags need to be twisted. You can count them in the SVQ code.
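(For reference, counting them only needs a couple of ad-hoc debug
counters in the series' vhost_svq_add_packed() and
vhost_svq_get_buf_packed(). A rough sketch, with invented counter and
variable names, could look like this:)

    /* Debug-only sketch, not part of the series: per-vq counters in
     * hw/virtio/vhost-shadow-virtqueue.c. "vq_idx" and "id" stand in
     * for whatever the series uses to identify the queue and the
     * descriptor id. */
    static unsigned long dbg_added[2], dbg_used[2];

    /* In vhost_svq_add_packed(), after the descriptor is written: */
    dbg_added[vq_idx]++;
    fprintf(stderr, "svq %u: add id=%u (total added %lu)\n",
            vq_idx, id, dbg_added[vq_idx]);

    /* In vhost_svq_get_buf_packed(), after a used id is read back: */
    dbg_used[vq_idx]++;
    fprintf(stderr, "svq %u: used id=%u (total used %lu)\n",
            vq_idx, id, dbg_used[vq_idx]);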
I did a lot more testing. The issue is quite unpredictable in terms of
when it appears after booting L2, but so far it has almost always
appeared at some point. Even when pinging works initially, the issue
shows up after several seconds of pinging.
The total number of SVQ descriptors varied in every test run, but in
every case all 256 entries in the descriptor region were filled for the
vq with vq_idx = 0. This is the RX vq, right? It was filled while L2
was booting. With the ctrl vq disabled, I am not sure what is
responsible for filling the data plane vqs during boot.
=====
The issue is hit most frequently when the following commands are run
in L0:
$ ip addr add 111.1.1.1/24 dev tap0
$ ip link set tap0 up
or when the following is run in L2:
# ip addr add 111.1.1.2/24 dev eth0
The other vq (vq_idx=1) is not filled completely before the issue is
hit. I have been noting down the numbers and here is an example:
295 descriptors were added individually to the queues, i.e., there were
no chains (vhost_svq_add_packed)
|_ 256 additions in vq_idx = 0, all with unique ids
|  |---- 27 descriptors (ids 0 through 26) were received later from the
|        device (vhost_svq_get_buf_packed)
|_ 39 additions in vq_idx = 1
   |_ 13 descriptors had id = 0
   |_ 26 descriptors had id = 1
   |---- all descriptors were received at some point from the device
         (vhost_svq_get_buf_packed)
There was one case in which vq_idx=0 had wrapped around. I verified
that flags were set appropriately during the wrap (avail and used flags
were flipped as expected).
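("Flipped as expected" here refers to the virtio 1.1 packed ring
layout: the driver writes the AVAIL descriptor-flag bit equal to its
avail wrap counter and the USED bit equal to its inverse, and inverts
the wrap counter each time it wraps past the end of the ring. A minimal
sketch, with names local to this example:)

    /* Sketch only: how a driver-side ring sets the packed-ring
     * descriptor flags and handles the wrap, per the virtio 1.1 spec.
     * Bit 7 of desc->flags is AVAIL, bit 15 is USED. */
    #include <stdint.h>
    #include <stdbool.h>

    #define DESC_F_AVAIL (1u << 7)
    #define DESC_F_USED  (1u << 15)

    struct ring_state {
        uint16_t avail_idx;        /* next free slot in the ring    */
        bool avail_wrap_counter;   /* starts at true (1)            */
        uint16_t num;              /* ring size, e.g. 256           */
    };

    static uint16_t next_avail_flags(struct ring_state *r)
    {
        /* Available: AVAIL == wrap counter, USED == !wrap counter. */
        uint16_t flags = r->avail_wrap_counter ? DESC_F_AVAIL
                                               : DESC_F_USED;

        if (++r->avail_idx == r->num) { /* end of ring: wrap, flip  */
            r->avail_idx = 0;
            r->avail_wrap_counter = !r->avail_wrap_counter;
        }
        return flags;
    }

    /* The device marks a descriptor used by writing AVAIL == USED ==
     * its own used wrap counter, so both bits invert after a wrap. */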
=====
The next common situation where this issue is hit is during startup.
Before L2 can finish booting successfully, this error is thrown:
virtio_net virtio1: output.0:id 0 is not a head!
258 descriptors were added individually to the queues during startup
(there were no chains) (vhost_svq_add_packed)
|_ 256 additions in vq_idx = 0, all with unique ids
|  |---- none of them were received back from the device
|        (vhost_svq_get_buf_packed)
|_ 2 additions in vq_idx = 1
   |_ id = 0 in index 0
   |_ id = 1 in index 1
   |---- both descriptors were received at some point during startup
         from the device (vhost_svq_get_buf_packed)
=====
Another case is after several seconds of pinging L0 from L2.
[ 99.034114] virtio_net virtio1: output.0:id 0 is not a head!
366 descriptors were added individually to the queues, i.e., there were
no chains (vhost_svq_add_packed)
|_ 289 additions in vq_idx = 0; a wrap-around was observed, with the
|  avail and used flags inverted for 33 descriptors
|  |---- 40 descriptors (ids 0 through 39) were received from the
|        device (vhost_svq_get_buf_packed)
|_ 77 additions in vq_idx = 1
   |_ 76 descriptors had id = 0
   |_ 1 descriptor had id = 1
   |---- all 77 descriptors were received at some point from the device
         (vhost_svq_get_buf_packed)
I am not entirely sure now whether there's an issue in the packed vq
implementation in QEMU or whether this is caused by some sort of race
condition in Linux.
"id is not a head" is being thrown because vq->packed.desc_state[id].data
doesn't exist for the corresponding id in Linux [1]. But QEMU seems to have
stored some data for this id via vhost_svq_add() [2]. Linux sets the value
of vq->packed.desc_state[id].data in its version of virtqueue_add_packed() [3].
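(For reference, the check in Linux's virtqueue_get_buf_ctx_packed()
that prints this message looks roughly like the following, paraphrased
from [1]:)

    /* Paraphrased from drivers/virtio/virtio_ring.c,
     * virtqueue_get_buf_ctx_packed(): */
    id = le16_to_cpu(vq->packed.vring.desc[last_used].id);

    if (unlikely(id >= vq->packed.vring.num)) {
            BAD_RING(vq, "id %u out of range\n", id);
            return NULL;
    }
    if (unlikely(!vq->packed.desc_state[id].data)) {
            BAD_RING(vq, "id %u is not a head!\n", id);
            return NULL;
    }

    /* desc_state[id].data is set by virtqueue_add_packed() when the
     * buffer is exposed, so hitting the second branch means the
     * device (SVQ here) returned an id the driver never made
     * available, or one it has already reclaimed. */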
This error is not always thrown, but when it is, the id varies. It is
invariably followed by a soft lockup:
[...]
[ 284.662292] Call Trace:
[ 284.662292] <IRQ>
[ 284.662292] ? watchdog_timer_fn+0x1e6/0x270
[ 284.662292] ? __pfx_watchdog_timer_fn+0x10/0x10
[ 284.662292] ? __hrtimer_run_queues+0x10f/0x2b0
[ 284.662292] ? hrtimer_interrupt+0xf8/0x230
[ 284.662292] ? __sysvec_apic_timer_interrupt+0x4d/0x140
[ 284.662292] ? sysvec_apic_timer_interrupt+0x39/0x90
[ 284.662292] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 284.662292] ? virtqueue_enable_cb_delayed+0x115/0x150
[ 284.662292] start_xmit+0x2a6/0x4f0 [virtio_net]
[ 284.662292] ? netif_skb_features+0x98/0x300
[ 284.662292] dev_hard_start_xmit+0x61/0x1d0
[ 284.662292] sch_direct_xmit+0xa4/0x390
[ 284.662292] __dev_queue_xmit+0x84f/0xdc0
[ 284.662292] ? nf_hook_slow+0x42/0xf0
[ 284.662292] ip_finish_output2+0x2b8/0x580
[ 284.662292] igmp_ifc_timer_expire+0x1d5/0x430
[ 284.662292] ? __pfx_igmp_ifc_timer_expire+0x10/0x10
[ 284.662292] call_timer_fn+0x21/0x130
[ 284.662292] ? __pfx_igmp_ifc_timer_expire+0x10/0x10
[ 284.662292] __run_timers+0x21f/0x2b0
[ 284.662292] run_timer_softirq+0x1d/0x40
[ 284.662292] __do_softirq+0xc9/0x2c8
[ 284.662292] __irq_exit_rcu+0xa6/0xc0
[ 284.662292] sysvec_apic_timer_interrupt+0x72/0x90
[ 284.662292] </IRQ>
[ 284.662292] <TASK>
[ 284.662292] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 284.662292] RIP: 0010:pv_native_safe_halt+0xf/0x20
[ 284.662292] Code: 22 d7 c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 53 75 3f 00 fb f4 <c3> cc c0
[ 284.662292] RSP: 0018:ffffb8f0000b3ed8 EFLAGS: 00000212
[ 284.662292] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
[ 284.662292] RDX: 4000000000000000 RSI: 0000000000000083 RDI: 00000000000289ec
[ 284.662292] RBP: ffff96f200810000 R08: 0000000000000000 R09: 0000000000000001
[ 284.662292] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 284.662292] R13: 0000000000000000 R14: ffff96f200810000 R15: 0000000000000000
[ 284.662292] default_idle+0x9/0x20
[ 284.662292] default_idle_call+0x2c/0xe0
[ 284.662292] do_idle+0x226/0x270
[ 284.662292] cpu_startup_entry+0x2a/0x30
[ 284.662292] start_secondary+0x11e/0x140
[ 284.662292] secondary_startup_64_no_verify+0x184/0x18b
[ 284.662292] </TASK>
The soft lockup seems to happen in
drivers/net/virtio_net.c:start_xmit() [1].
Maybe it gets stuck in the do {} while(...
!virtqueue_enable_cb_delayed()) loop? You can add a printk at the
return of virtqueue_enable_cb_delayed() and check whether it matches
the rate at which you're sending or receiving pings. For example, if
ping runs once per second, you should not see a lot of traces.
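(A minimal, debug-only way to do that, with the packed/split dispatch
paraphrased from the existing function, would be something like:)

    /* drivers/virtio/virtio_ring.c (debug sketch): log the result of
     * virtqueue_enable_cb_delayed() so its rate can be compared with
     * the ping interval. */
    bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
    {
            struct vring_virtqueue *vq = to_vvq(_vq);
            bool ret = vq->packed_ring ?
                       virtqueue_enable_cb_delayed_packed(_vq) :
                       virtqueue_enable_cb_delayed_split(_vq);

            pr_info("%s: vq %s returned %d\n", __func__, _vq->name, ret);
            return ret;
    }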
If that does not work, I'd try never disabling notifications, both in
the kernel and in SVQ, and check whether that helps.
To keep notifications enabled at all times, will something have to be
commented out in the implementation?
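I suppose one crude way to test the kernel half of this (assuming
"notifications" here means the used-buffer interrupts that the driver
suppresses via virtqueue_disable_cb()) would be to short-circuit that
function, e.g.:

    /* drivers/virtio/virtio_ring.c (experiment only): never ask the
     * device to suppress interrupts, i.e. keep callbacks enabled.
     * The original body dispatches to the packed/split helpers;
     * turning it into a no-op leaves notifications always on. */
    void virtqueue_disable_cb(struct virtqueue *_vq)
    {
            return;
    }

and something equivalent would be needed on the SVQ side.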
[...]
QEMU command to boot L1:
$ sudo ./qemu/build/qemu-system-x86_64 \
-enable-kvm \
-drive file=//home/valdaarhun/valdaarhun/qcow2_img/L1.qcow2,media=disk,if=virtio \
-net nic,model=virtio \
-net user,hostfwd=tcp::2222-:22 \
-device intel-iommu,snoop-control=on \
-device virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off,iommu_platform=on,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,mq=off,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_mac_addr=off,packed=on,event_idx=off,bus=pcie.0,addr=0x4 \
-netdev tap,id=net0,script=no,downscript=no,vhost=off \
-nographic \
-m 8G \
-smp 4 \
-M q35 \
-cpu host 2>&1 | tee vm.log
I have added "-device virtio-net-pci,indirect_desc=off,queue_reset=off"
to the L0 QEMU command used to boot L1.
Thanks,
Sahil
[1] https://github.com/torvalds/linux/blob/master/drivers/virtio/virtio_ring.c#L1762
[2] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/vhost-shadow-virtqueue.c#L290
[3] https://github.com/torvalds/linux/blob/master/drivers/virtio/virtio_ring.c#L1564