Michael,

Thank you very much for your help. I greatly appreciate it.

Steven

On 12/17/18, 4:16 PM, "Michael S. Tsirkin" <m...@redhat.com> wrote:

    On Mon, Dec 17, 2018 at 11:56:59PM +0000, Steven Luong (sluong) wrote:
    > 
    > 
    > On 12/17/18, 2:55 PM, "Michael S. Tsirkin" <m...@redhat.com> wrote:
    > 
    >     On Thu, Dec 13, 2018 at 11:24:28PM +0000, Steven Luong (sluong) via Virtualization wrote:
    >     > Folks,
    >     > 
    >     >  
    >     > 
    >     > We came across a memory race condition between the VPP vhost driver
    >     > and the kernel vhost. VPP is running a tap interface over the vhost
    >     > backend. In this case, VPP is acting in vhost driver mode and the
    >     > kernel vhost is acting in vhost device mode.
    >     > 
    >     > In the kernel vhost’s TX traffic direction, which is VPP’s RX traffic
    >     > direction, kernel vhost is the producer and VPP is the consumer.
    >     
    >     Looking at vhost net kernel code, it seems to use the
    >     same terminology as virtio net.
    Can you please clarify which ring number you use, 0 or 1?
    > 
    > <SVL> 0. </SVL>
    >     
    >     I will assume ring 0 below.
    >     
    >     
    >     
    >     > Kernel vhost places traffic in kernel vhost’s TX vring. VPP removes
    >     > traffic from VPP’s RX vring.
    >     
    >     
    >     So VPP makes buffers available and vhost kernel uses them?
    >     
    > <SVL> Correct. </SVL>
    >     
    >     > It is inevitable that the vring may become full under heavy traffic,
    >     > and the kernel vhost may have to stop and wait for the guest (VPP) to
    >     > empty the vring and refill it with descriptors. When that happens,
    >     > kernel vhost clears the bit in the vring’s used flags to request an
    >     > interrupt/notification. Due to a shared-memory race condition, VPP may
    >     > miss the clearing of the vring’s used flags by kernel vhost and fail
    >     > to send kernel vhost an interrupt after VPP empties and refills the
    >     > vring with new descriptors.
    >     
    >     So kernel vhost's wakeups are signalled through eventfd - I assume
    >     this is what you mean by an interrupt?
    >     
    >     
    >     To prevent the race that you describe, vhost has this code:
    >     
    >                     /* OK, now we need to know about added descriptors. */
    >                     if (!headcount) {
    >                             if (unlikely(busyloop_intr)) {
    >                                     vhost_poll_queue(&vq->poll);
    >                             } else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
    >                                     /* They have slipped one in as we were
    >                                      * doing that: check again. */
    >                                     vhost_disable_notify(&net->dev, vq);
    >                                     continue;
    >                             }
    >                             /* Nothing new?  Wait for eventfd to tell us
    >                              * they refilled. */
    >     
    >     So the ring should be re-checked after notifications are enabled.
    >     If the ring is no longer empty, vhost will proceed to handle it.
    >     This code has been around for many years.
    >     
    >     Thus, if VPP doesn't work with it, I suspect it does something
    >     differently. E.g., is it possible that it is missing a memory
    >     barrier after writing out the available buffers and before
    >     checking the flags?
    >     
    >     <SVL> Can the race condition be avoided totally? Here is the scenario.
    > 
    > t0: VPP refills the vring with new descriptors but does not yet set
    > avail->idx to make them available to kernel vhost.
    > t1: kernel vhost checks the vring, still sees no space available, but has
    > not yet executed the line of code that clears used->flags.
    > t2: VPP executes sfence and sets avail->idx to make the descriptors
    > available to kernel vhost.
    
    At this point you need a full memory barrier, e.g. mfence or a locked
    instruction such as lock xor; sfence only orders stores. Otherwise the
    read can get reordered before the write.
    
    > t3: VPP checks used->flags, sees that it is still 1, and skips sending the
    > interrupt.
    > t4: kernel vhost clears used->flags to indicate that an interrupt is
    > required, but VPP has just missed it.
    
    Fine, but at this point kernel vhost will re-check using vhost_get_avail
    and process the buffers, so the lack of a notification isn't a problem:
    
    bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
    {
            __virtio16 avail_idx;
            int r;
    
            if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
                    return false;
            vq->used_flags &= ~VRING_USED_F_NO_NOTIFY;
            if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
                    r = vhost_update_used_flags(vq);
                    if (r) {
                            vq_err(vq, "Failed to enable notification at %p: %d\n",
                                   &vq->used->flags, r);
                            return false;
                            return false;
                    }
            } else {
                    r = vhost_update_avail_event(vq, vq->avail_idx);
                    if (r) {
                            vq_err(vq, "Failed to update avail event index at %p: %d\n",
                                   vhost_avail_event(vq), r);
                            return false;
                    }
            }
            /* They could have slipped one in as we were doing that: make
             * sure it's written, then check again. */
            smp_mb();
            r = vhost_get_avail(vq, avail_idx, &vq->avail->idx);
            if (r) {
                    vq_err(vq, "Failed to check avail idx at %p: %d\n",
                           &vq->avail->idx, r);
                    return false;
            }
    
            return vhost16_to_cpu(vq, avail_idx) != vq->avail_idx;
    }
    
    > The result is that kernel vhost gets stuck even though the vring is
    > filled with new descriptors.
    
    It shouldn't be. Maybe you are missing a fence before reading used->flags?
    
    
    > Steven
    > 
    > </SVL>
    >     
    >     
    >     > Unfortunately, this missed notification causes the kernel vhost to
    >     > be stuck, because once the kernel vhost is waiting for an
    >     > interrupt/notification from the guest, only an interrupt/notification
    >     > from the guest can get it processing new packets from the vring
    >     > again. This design seems vulnerable.
    >     
    >     The protocol right now relies on notifications never being lost.
    >     
    >     I can imagine an alternative where the device also re-checks the
    >     rings periodically, but that would involve running timers host-side,
    >     which is only possible if there's a free processor to handle them.
    >     Further, it would still lead to stalls and packet drops, which would
    >     be harder to debug.
    >     
    >     That's just my $.02; please feel free to take it to the virtio TC,
    >     maybe others will feel differently.
    >     
    >     > Should the kernel vhost totally depend on the notification from the
    >     > guest? Quoting the text from
    >     > 
    >     >  
    >     > 
    >     > http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html
    >     > 
    >     >  
    >     > 
    >     > /* The device uses this in used->flags to advise the driver: don’t kick me
    >     >  * when you add a buffer.  It’s unreliable, so it’s simply an
    >     >  * optimization. */
    >     > #define VIRTQ_USED_F_NO_NOTIFY  1
    >     > 
    >     >  
    >     > 
    >     > I interpret that to mean the notification is simply an optimization,
    >     > not a reliable notification mechanism.
    >     
    >     What was meant, I think, is that the suppression is unreliable.
    >     
    >     > So the kernel vhost should not bet the farm on it.
    >     > 
    >     >  
    >     > 
    >     > We encounter the same race condition in kernel vhost’s RX traffic
    >     > direction, which causes the kernel vhost to be stuck due to a missed
    >     > interrupt/notification, although it is less frequent.
    >     
    >     
    >     For the reverse direction, the spec actually has some pseudo-code
    >     and a suggestion about handling that race:
    >     
    >           For optimal performance, a driver MAY disable used buffer
    >           notifications while processing the used ring, but beware the
    >           problem of missing notifications between emptying the ring and
    >           reenabling notifications. This is usually handled by
    >           re-checking for more used buffers after notifications are
    >           re-enabled:
    >     
    >                   virtq_disable_used_buffer_notifications(vq);
    >                   for (;;) {
    >                           if (vq->last_seen_used != le16_to_cpu(virtq->used.idx)) {
    >                                   virtq_enable_used_buffer_notifications(vq);
    >                                   mb();
    >                                   if (vq->last_seen_used != le16_to_cpu(virtq->used.idx))
    >                                           break;
    >                                   virtq_disable_used_buffer_notifications(vq);
    >                           }
    >                           struct virtq_used_elem *e = virtq.used->ring[vq->last_seen_used%vsz];
    >                           process_buffer(e);
    >                           vq->last_seen_used++;
    >                   }
    >     
    >     
    >     
    >     >  
    >     > 
    >     > Steven
    >     > 
    >     
    >     > _______________________________________________
    >     > Virtualization mailing list
    >     > Virtualization@lists.linux-foundation.org
    >     > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
    >     
    >     
    > 
    
