Hi,
I think I have finally found the reason behind this issue.
The order in which "add_packed" and "get_buf_packed" are performed in the
nested guest kernel (L2 kernel) and QEMU are a little different. Due to
this, the values in free_head and svq->desc_next[] differ and the guest
crashes at some point. More below.
On 6/26/25 1:07 PM, Eugenio Perez Martin wrote:
On Thu, Jun 26, 2025 at 7:16 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
I think there's something off in the way "free_head", "last_used_idx" and
"desc_next" values are calculated in vhost_svq_get_buf_packed() [1].
In the latest test run, QEMU sent ids 0 through 28 to L2. L2 started receiving
them in order till id 8. At this point it received id 7 again for some reason
and then crashed.
L2:
[ 1641.129218] (prepare_packed) output.0 -> needs_kick: 1
[ 1641.130621] (notify) output.0 -> function will return true
[ 1641.132022] output.0 -> id: 0
[ 1739.502358] input.0 -> id: 0
[ 1739.503003] input.0 -> id: 1
[ 1739.562024] input.0 -> id: 2
[ 1739.578682] input.0 -> id: 3
[ 1739.661913] input.0 -> id: 4
[ 1739.828796] input.0 -> id: 5
[ 1739.829789] input.0 -> id: 6
[ 1740.078757] input.0 -> id: 7
[ 1740.079749] input.0 -> id: 8
[ 1740.080382] input.0 -> id: 7 <----Received 7 again
[ 1740.081614] virtio_net virtio1: input.0:id 7 is not a head!
QEMU logs (vhost_svq_get_buf_packed):
------
size : svq->vring.num
len : svq->vring_packed.vring.desc[last_used].len
id : svq->vring_packed.vring.desc[last_used].id
num : svq->desc_state[id].ndescs
last_used_chain : Result of vhost_svq_last_desc_of_chain(svq, num, id) [2]
free_head : svq->free_head
last_used : (last_used_idx & ~(1 << VRING_PACKED_EVENT_F_WRAP_CTR)) +
num
used_wrap_counter : !!(last_used_idx & (1 << VRING_PACKED_EVENT_F_WRAP_CTR))
------
size: 256, len: 82, id: 7, vq idx: 0
id: 7, last_used_chain: 7, free_head: 6, vq idx: 0
num: 1, free_head: 7, id: 7, last_used: 8, used_wrap_counter: 1, vq idx: 0
------
size: 256, len: 104, id: 8, vq idx: 0
id: 8, last_used_chain: 8, free_head: 7, vq idx: 0
num: 1, free_head: 8, id: 8, last_used: 9, used_wrap_counter: 1, vq idx: 0
------
size: 256, len: 98, id: 9, vq idx: 0
id: 9, last_used_chain: 9, free_head: 8, vq idx: 0
num: 1, free_head: 9, id: 9, last_used: 10, used_wrap_counter: 1, vq idx: 0
------
size: 256, len: 104, id: 10, vq idx: 0
id: 10, last_used_chain: 10, free_head: 9, vq idx: 0
num: 1, free_head: 10, id: 10, last_used: 11, used_wrap_counter: 1, vq idx: 0
I have a few more ideas of what to do. I'll let you know if I find something
else.
I cannot find anything just by inspection. What about printing all the
desc_state and all desc_next to check for incoherencies in each
svq_add and get_buf?
In this test, all 256 descriptors were filled in the RX vq.
In the TX queue, L2 kernel would add one descriptor at a time and notify
QEMU. QEMU would then register it in its SVQ and mark it as "available".
After processing the descriptor, QEMU would mark it as "used" and flush it
back to L2. L2, in turn, would mark this descriptor as "used". After this
process, L2 would add the next descriptor in the TX vq while reusing this
ID. This was observed from idx 0 till idx 7.
L2's debug logs:
[ 18.379112] (use_indirect?) output.0 -> verdict: 0 <-----
Begin adding descriptor in idx 6
[ 18.387134] (add_packed) output.0 -> idx: 6
[ 18.389897] (add_packed) output.0 -> id: 0
[ 18.392290] (add_packed) output.0 -> len: 74
[ 18.394606] (add_packed) output.0 -> addr: 5012315726
[ 18.397043] (add_packed) output.0 -> next id: 1
[ 18.399861] Entering prepare_packed: output.0
[ 18.402478] (prepare_packed) output.0 -> needs_kick: 1
[ 18.404998] (notify) output.0 -> function will return true <-----
Notify QEMU
[ 18.406349] output.0 -> id: 0, idx: 6 <-----
Mark ID 0 in idx 6 as used
[ 18.409482] output.0 -> old free_head: 1, new free_head: 0 <----- ID
0 can be reused
[ 18.410919] (after get_buf processed) output.0 -> id: 0, idx: 7 <-----
Next slot is idx 7
[ 18.921895] (use_indirect?) output.0 -> verdict: 0 <-----
Begin adding descriptor with ID = 0 in idx 7
[ 18.930093] (add_packed) output.0 -> idx: 7
[ 18.935715] (add_packed) output.0 -> id: 0
[ 18.937609] (add_packed) output.0 -> len: 122
[ 18.939614] (add_packed) output.0 -> addr: 4925868038
[ 18.941710] (add_packed) output.0 -> next id: 1
[ 18.944032] Entering prepare_packed: output.0
[ 18.946148] (prepare_packed) output.0 -> needs_kick: 1
[ 18.948234] (notify) output.0 -> function will return true <-----
Notify QEMU
[ 18.949606] output.0 -> id: 0, idx: 7 <-----
Mark ID 0 in idx 7 as used
[ 18.952756] output.0 -> old free_head: 1, new free_head: 0 <----- ID
0 can be reused
[ 18.955154] (after get_buf processed) output.0 -> id: 0, idx: 8 <-----
Next slot is idx 8
There was no issue in QEMU till this point.
[ 19.177536] (use_indirect?) output.0 -> verdict: 0 <-----
Begin adding descriptor with ID = 0 in idx 8
[ 19.182415] (add_packed) output.0 -> idx: 8
[ 19.187257] (add_packed) output.0 -> id: 0
[ 19.191355] (add_packed) output.0 -> len: 102
[ 19.195131] (add_packed) output.0 -> addr: 4370702342
[ 19.199224] (add_packed) output.0 -> next id: 1
[ 19.204929] Entering prepare_packed: output.0
[ 19.209505] (prepare_packed) output.0 -> needs_kick: 1
[ 19.213820] (notify) output.0 -> function will return true <-----
Notify QEMU
[ 19.218792] (use_indirect?) output.0 -> verdict: 0 <----- Next
slot is idx 9
[ 19.224730] (add_packed) output.0 -> idx: 9
[ 19.227067] (add_packed) output.0 -> id: 1 <----- ID 0
can't be reused yet, so use ID = 1
[ 19.229090] (add_packed) output.0 -> len: 330
[ 19.231182] (add_packed) output.0 -> addr: 4311020614
[ 19.233302] (add_packed) output.0 -> next id: 2
[ 19.235620] Entering prepare_packed: output.0
[ 19.237781] (prepare_packed) output.0 -> needs_kick: 1
[ 19.239958] (notify) output.0 -> function will return true <-----
Notify QEMU
[ 19.237780] output.0 -> id: 0, idx: 8 <----- Mark
ID 0 in idx 8 as used
[ 19.243676] output.0 -> old free_head: 2, new free_head: 0 <----- ID 0
can now be reused
[ 19.245214] (after get_buf processed) output.0 -> id: 0, idx: 9 <----- Next
slot is idx 9
[ 19.247097] output.0 -> id: 1, idx: 9 <----- Mark
ID 1 in idx 9 as used
[ 19.249612] output.0 -> old free_head: 0, new free_head: 1 <----- ID 1
can now be reused
[ 19.252266] (after get_buf processed) output.0 -> id: 1, idx: 10 <----- Next
slot is idx 10
ID 0 and ID 1 in idx 8 and idx 9 respectively are pushed to QEMU
before either of them are marked as used.
But in QEMU, the order is slightly different.
num: 1, init_flags: 128
<----- vhost_svq_add_packed()
idx: 8, id: 0, len: 0, flags: 0, vq idx: 1
<----- Before adding descriptor
idx: 8, id: 0, len: 102, flags: 128, vq idx: 1
<----- After adding descriptor
Finally: new_idx: 9, head_idx: 8, id: 0, len: 102, flags: 128, vq idx: 1
svq->vring.num: 256
<----- Begin vhost_svq_get_buf_packed()
descriptor_len: 0
descriptor_id: 0
<----- Mark ID = 0 as used
last_used: 8
<----- Processing idx 8
used_wrap_counter: 1
svq->desc_state[id].ndescs: 1
free_head: 0
<----- Update free_head to 0.
last_used: 9
<----- Update last_used to 9.
vq idx: 1
<----- End vhost_svq_get_buf_packed()
i: 0
<----- vhost_svq_flush()
descriptor_len: 0
elem->len: 22086
i: 1
elem_is_null: 1
vq idx: 1
<----- End vhost_svq_flush()
num: 1, init_flags: 128
<----- vhost_svq_add_packed()
idx: 9, id: 0, len: 0, flags: 0, curr: 0, vq idx: 1
<----- Before adding descriptor
idx: 9, id: 0, len: 330, flags: 128, curr: 1, vq idx: 1
<----- After adding descriptor
Finally: new_idx: 10, head_idx: 9, id: 0, len: 330, flags: 128, vq idx: 1
<----- ID 0 has been reused (versus ID 1 in L2)
svq->vring.num: 256
<----- Begin vhost_svq_get_buf_packed()
descriptor_len: 0
descriptor_id: 0
<----- Mark ID = 0 as used
last_used: 9
<----- Processing idx 9
used_wrap_counter: 1
svq->desc_state[id].ndescs: 1
free_head: 0
<----- Update free_head to 0.
last_used: 10
<----- Update last_used to 10.
vq idx: 1
<----- End vhost_svq_get_buf_packed()
i: 0
<----- vhost_svq_flush()
descriptor_len: 0
elem->len: 22086
i: 1
elem_is_null: 1
vq idx: 1
<----- End vhost_svq_flush()
In QEMU, id 0 is added in idx 8. But it's marked as used before a
descriptor can be added in idx 9. Because of this there's a discrepancy
in the value of free_head and in svq->desc_next.
In the current implementation, the values of ID are generated, maintained
and processed by QEMU instead of reading from the guest's memory. I think
reading the value of ID from the guest memory (similar to reading the
descriptor length from guest memory) should resolve this issue.
The alternative would be to ensure that "add_packed" and "get_buf_packed"
are synchronized between the guest and QEMU.
What are your thoughts on this?
Thanks,
Sahil