On Wed, Jul 30, 2025 at 4:33 PM Sahil Siddiq <icegambi...@gmail.com> wrote:
>
> Hi,
>
> I think I have finally found the reason behind this issue.
>
> The order in which "add_packed" and "get_buf_packed" are performed in the
> nested guest kernel (L2 kernel) and QEMU are a little different. Due to
> this, the values in free_head and svq->desc_next[] differ and the guest
> crashes at some point. More below.
>
> On 6/26/25 1:07 PM, Eugenio Perez Martin wrote:
> > On Thu, Jun 26, 2025 at 7:16 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
> >> I think there's something off in the way "free_head", "last_used_idx" and
> >> "desc_next" values are calculated in vhost_svq_get_buf_packed() [1].
> >>
> >> In the latest test run, QEMU sent ids 0 through 28 to L2. L2 started 
> >> receiving
> >> them in order till id 8. At this point it received id 7 again for some 
> >> reason
> >> and then crashed.
> >>
> >> L2:
> >>
> >> [ 1641.129218] (prepare_packed) output.0 -> needs_kick: 1
> >> [ 1641.130621] (notify) output.0 -> function will return true
> >> [ 1641.132022] output.0 -> id: 0
> >> [ 1739.502358] input.0 -> id: 0
> >> [ 1739.503003] input.0 -> id: 1
> >> [ 1739.562024] input.0 -> id: 2
> >> [ 1739.578682] input.0 -> id: 3
> >> [ 1739.661913] input.0 -> id: 4
> >> [ 1739.828796] input.0 -> id: 5
> >> [ 1739.829789] input.0 -> id: 6
> >> [ 1740.078757] input.0 -> id: 7
> >> [ 1740.079749] input.0 -> id: 8
> >> [ 1740.080382] input.0 -> id: 7    <----Received 7 again
> >> [ 1740.081614] virtio_net virtio1: input.0:id 7 is not a head!
> >>
> >> QEMU logs (vhost_svq_get_buf_packed):
> >> ------
> >> size              : svq->vring.num
> >> len               : svq->vring_packed.vring.desc[last_used].len
> >> id                : svq->vring_packed.vring.desc[last_used].id
> >> num               : svq->desc_state[id].ndescs
> >> last_used_chain   : Result of vhost_svq_last_desc_of_chain(svq, num, id) 
> >> [2]
> >> free_head         : svq->free_head
> >> last_used         : (last_used_idx & ~(1 << 
> >> VRING_PACKED_EVENT_F_WRAP_CTR)) + num
> >> used_wrap_counter : !!(last_used_idx & (1 << 
> >> VRING_PACKED_EVENT_F_WRAP_CTR))
> >> ------
> >> size: 256, len: 82, id: 7, vq idx: 0
> >> id: 7, last_used_chain: 7, free_head: 6, vq idx: 0
> >> num: 1, free_head: 7, id: 7, last_used: 8, used_wrap_counter: 1, vq idx: 0
> >> ------
> >> size: 256, len: 104, id: 8, vq idx: 0
> >> id: 8, last_used_chain: 8, free_head: 7, vq idx: 0
> >> num: 1, free_head: 8, id: 8, last_used: 9, used_wrap_counter: 1, vq idx: 0
> >> ------
> >> size: 256, len: 98, id: 9, vq idx: 0
> >> id: 9, last_used_chain: 9, free_head: 8, vq idx: 0
> >> num: 1, free_head: 9, id: 9, last_used: 10, used_wrap_counter: 1, vq idx: 0
> >> ------
> >> size: 256, len: 104, id: 10, vq idx: 0
> >> id: 10, last_used_chain: 10, free_head: 9, vq idx: 0
> >> num: 1, free_head: 10, id: 10, last_used: 11, used_wrap_counter: 1, vq 
> >> idx: 0
> >>
> >> I have a few more ideas of what to do. I'll let you know if I find 
> >> something
> >> else.
> >>
> > I cannot find anything just by inspection. What about printing all the
> > desc_state and all desc_next to check for incoherencies in each
> > svq_add and get_buf?
> In this test, all 256 descriptors were filled in the RX vq.
>
> In the TX queue, L2 kernel would add one descriptor at a time and notify
> QEMU. QEMU would then register it in its SVQ and mark it as "available".
> After processing the descriptor, QEMU would mark it as "used" and flush it
> back to L2. L2, in turn, would mark this descriptor as "used". After this
> process, L2 would add the next descriptor in the TX vq while reusing this
> ID. This was observed from idx 0 till idx 7.
>
> L2's debug logs:
>
> [   18.379112] (use_indirect?) output.0 -> verdict: 0                <----- 
> Begin adding descriptor in idx 6
> [   18.387134] (add_packed) output.0 -> idx: 6
> [   18.389897] (add_packed) output.0 -> id: 0
> [   18.392290] (add_packed) output.0 -> len: 74
> [   18.394606] (add_packed) output.0 -> addr: 5012315726
> [   18.397043] (add_packed) output.0 -> next id: 1
> [   18.399861] Entering prepare_packed: output.0
> [   18.402478] (prepare_packed) output.0 -> needs_kick: 1
> [   18.404998] (notify) output.0 -> function will return true        <----- 
> Notify QEMU
> [   18.406349] output.0 -> id: 0, idx: 6                             <----- 
> Mark ID 0 in idx 6 as used
> [   18.409482] output.0 -> old free_head: 1, new free_head: 0        <----- 
> ID 0 can be reused
> [   18.410919] (after get_buf processed) output.0 -> id: 0, idx: 7   <----- 
> Next slot is idx 7
> [   18.921895] (use_indirect?) output.0 -> verdict: 0                <----- 
> Begin adding descriptor with ID = 0 in idx 7
> [   18.930093] (add_packed) output.0 -> idx: 7
> [   18.935715] (add_packed) output.0 -> id: 0
> [   18.937609] (add_packed) output.0 -> len: 122
> [   18.939614] (add_packed) output.0 -> addr: 4925868038
> [   18.941710] (add_packed) output.0 -> next id: 1
> [   18.944032] Entering prepare_packed: output.0
> [   18.946148] (prepare_packed) output.0 -> needs_kick: 1
> [   18.948234] (notify) output.0 -> function will return true        <----- 
> Notify QEMU
> [   18.949606] output.0 -> id: 0, idx: 7                             <----- 
> Mark ID 0 in idx 7 as used
> [   18.952756] output.0 -> old free_head: 1, new free_head: 0        <----- 
> ID 0 can be reused
> [   18.955154] (after get_buf processed) output.0 -> id: 0, idx: 8   <----- 
> Next slot is idx 8
>
> There was no issue in QEMU till this point.
>
> [   19.177536] (use_indirect?) output.0 -> verdict: 0                <----- 
> Begin adding descriptor with ID = 0 in idx 8
> [   19.182415] (add_packed) output.0 -> idx: 8
> [   19.187257] (add_packed) output.0 -> id: 0
> [   19.191355] (add_packed) output.0 -> len: 102
> [   19.195131] (add_packed) output.0 -> addr: 4370702342
> [   19.199224] (add_packed) output.0 -> next id: 1
> [   19.204929] Entering prepare_packed: output.0
> [   19.209505] (prepare_packed) output.0 -> needs_kick: 1
> [   19.213820] (notify) output.0 -> function will return true       <----- 
> Notify QEMU
> [   19.218792] (use_indirect?) output.0 -> verdict: 0               <----- 
> Next slot is idx 9
> [   19.224730] (add_packed) output.0 -> idx: 9
> [   19.227067] (add_packed) output.0 -> id: 1                       <----- ID 
> 0 can't be reused yet, so use ID = 1
> [   19.229090] (add_packed) output.0 -> len: 330
> [   19.231182] (add_packed) output.0 -> addr: 4311020614
> [   19.233302] (add_packed) output.0 -> next id: 2
> [   19.235620] Entering prepare_packed: output.0
> [   19.237781] (prepare_packed) output.0 -> needs_kick: 1
> [   19.239958] (notify) output.0 -> function will return true       <----- 
> Notify QEMU
> [   19.237780] output.0 -> id: 0, idx: 8                            <----- 
> Mark ID 0 in idx 8 as used
> [   19.243676] output.0 -> old free_head: 2, new free_head: 0       <----- ID 
> 0 can now be reused
> [   19.245214] (after get_buf processed) output.0 -> id: 0, idx: 9  <----- 
> Next slot is idx 9
> [   19.247097] output.0 -> id: 1, idx: 9                            <----- 
> Mark ID 1 in idx 9 as used
> [   19.249612] output.0 -> old free_head: 0, new free_head: 1       <----- ID 
> 1 can now be reused
> [   19.252266] (after get_buf processed) output.0 -> id: 1, idx: 10 <----- 
> Next slot is idx 10
>
> ID 0 and ID 1 in idx 8 and idx 9 respectively are pushed to QEMU
> before either of them are marked as used.
>
> But in QEMU, the order is slightly different.
>
> num: 1, init_flags: 128                                                       
>  <----- vhost_svq_add_packed()
> idx: 8, id: 0, len: 0, flags: 0, vq idx: 1                                    
>  <----- Before adding descriptor
> idx: 8, id: 0, len: 102, flags: 128, vq idx: 1                                
>  <----- After adding descriptor
> Finally: new_idx: 9, head_idx: 8, id: 0, len: 102, flags: 128, vq idx: 1
> svq->vring.num: 256                                                           
>  <----- Begin vhost_svq_get_buf_packed()
> descriptor_len: 0
> descriptor_id: 0                                                              
>  <----- Mark ID = 0 as used
> last_used: 8                                                                  
>  <----- Processing idx 8
> used_wrap_counter: 1
> svq->desc_state[id].ndescs: 1
> free_head: 0                                                                  
>  <----- Update free_head to 0.
> last_used: 9                                                                  
>  <----- Update last_used to 9.
> vq idx: 1                                                                     
>  <----- End vhost_svq_get_buf_packed()
> i: 0                                                                          
>  <----- vhost_svq_flush()
> descriptor_len: 0
> elem->len: 22086
> i: 1
> elem_is_null: 1
> vq idx: 1                                                                     
>  <----- End vhost_svq_flush()
> num: 1, init_flags: 128                                                       
>  <----- vhost_svq_add_packed()
> idx: 9, id: 0, len: 0, flags: 0, curr: 0, vq idx: 1                           
>  <----- Before adding descriptor
> idx: 9, id: 0, len: 330, flags: 128, curr: 1, vq idx: 1                       
>  <----- After adding descriptor
> Finally: new_idx: 10, head_idx: 9, id: 0, len: 330, flags: 128, vq idx: 1     
>  <----- ID 0 has been reused (versus ID 1 in L2)
> svq->vring.num: 256                                                           
>  <----- Begin vhost_svq_get_buf_packed()
> descriptor_len: 0
> descriptor_id: 0                                                              
>  <----- Mark ID = 0 as used
> last_used: 9                                                                  
>  <----- Processing idx 9
> used_wrap_counter: 1
> svq->desc_state[id].ndescs: 1
> free_head: 0                                                                  
>  <----- Update free_head to 0.
> last_used: 10                                                                 
>  <----- Update last_used to 10.
> vq idx: 1                                                                     
>  <----- End vhost_svq_get_buf_packed()
> i: 0                                                                          
>  <----- vhost_svq_flush()
> descriptor_len: 0
> elem->len: 22086
> i: 1
> elem_is_null: 1
> vq idx: 1                                                                     
>  <----- End vhost_svq_flush()
>
> In QEMU, id 0 is added in idx 8. But it's marked as used before a
> descriptor can be added in idx 9. Because of this there's a discrepancy
> in the value of free_head and in svq->desc_next.
>
> In the current implementation, the values of ID are generated, maintained
> and processed by QEMU instead of reading from the guest's memory. I think
> reading the value of ID from the guest memory (similar to reading the
> descriptor length from guest memory) should resolve this issue.
>

Ok you made a good catch here :).

The 1:1 sync is hard to achieve as a single buffer in the guest may
need to be splitted in many buffers in the host.

> The alternative would be to ensure that "add_packed" and "get_buf_packed"
> are synchronized between the guest and QEMU.
>

Yes, they're synchronized. When the guest makes an available
descriptor, its head is saved in the VirtQueueElement of the SVQ's
head idx on svq->desc_state.

Reviewing patch 3/7 I see you're actually returning the id of the
first descriptor of the chain in *head, while it should be the id of
the *last* descriptor. It should not be the cause of the failure, as I
don't see any descriptor chain in the log. To keep the free linked
list happy we may need to store the head of the descriptor chain in
the vq too.

Now, why is SVQ id 0 being reused? Sounds like free_list is not
initialized to 0, 1, 2... but to something else like 0, 0, 0, etc. Can
you print the whole list in each iteration?

> What are your thoughts on this?
>
> Thanks,
> Sahil
>
>
>


Reply via email to