Michael,

Thank you very much for your help. I greatly appreciate it.

Steven

On 12/17/18, 4:16 PM, "Michael S. Tsirkin" <m...@redhat.com> wrote:

    On Mon, Dec 17, 2018 at 11:56:59PM +0000, Steven Luong (sluong) wrote:
    > 
    > 
    > On 12/17/18, 2:55 PM, "Michael S. Tsirkin" <m...@redhat.com> wrote:
    > 
    >     On Thu, Dec 13, 2018 at 11:24:28PM +0000, Steven Luong (sluong) via Virtualization wrote:
    >     > Folks,
    >     > 
    >     >  
    >     > 
    >     > We came across a memory race condition between the VPP vhost driver
    >     > and the kernel vhost. VPP is running a tap interface over the vhost
    >     > backend. In this case, VPP is acting in vhost driver mode and the
    >     > kernel vhost is acting in vhost device mode.
    >     > 
    >     > In the kernel vhost’s TX traffic direction, which is VPP’s RX traffic
    >     > direction, kernel vhost is the producer and VPP is the consumer.
    >     
    >     Looking at vhost net kernel code, it seems to use the
    >     same terminology as virtio net.
    Can you please clarify which ring number you use, 0 or 1?
    > 
    > <SVL> 0. </SVL>
    >     
    >     I will assume ring 0 below.
    >     
    >     
    >     
    >     > Kernel vhost places traffic in kernel vhost’s TX vring. VPP removes
    >     > traffic from VPP’s RX vring.
    >     
    >     
    >     So VPP makes buffers available and vhost kernel uses them?
    >     
    > <SVL> Correct. </SVL>
    >     
    >     > It is inevitable that the vring may become full under heavy traffic,
    >     > and the kernel vhost may have to stop and wait for the guest (VPP) to
    >     > empty the vring and refill it with descriptors. When that happens,
    >     > kernel vhost clears the bit in the vring’s used flags to request an
    >     > interrupt/notification. Due to a shared-memory race condition, VPP may
    >     > miss the clearing of the vring’s used flags by kernel vhost and fail
    >     > to send kernel vhost an interrupt after VPP empties and refills the
    >     > vring with new descriptors.
    >     
    >     So kernel vhost's wakeups are signalled through eventfd - I assume
    >     this is what you mean by an interrupt?
    >     
    >     
    >     To prevent the race that you describe, vhost has this code:
    >     
    >                     /* OK, now we need to know about added descriptors. */
    >                     if (!headcount) {
    >                             if (unlikely(busyloop_intr)) {
    >                                     vhost_poll_queue(&vq->poll);
    >                             } else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
    >                                     /* They have slipped one in as we were
    >                                      * doing that: check again. */
    >                                     vhost_disable_notify(&net->dev, vq);
    >                                     continue;
    >                             }
    >                             /* Nothing new?  Wait for eventfd to tell us
    >                              * they refilled. */
    >     
    >     So the ring should be re-checked after notifications are enabled.
    >     If the ring is no longer empty, vhost will proceed to handle it.
    >     This code has been around for many years.
    >     
    >     Thus, if VPP doesn't work with it, I suspect it does something
    >     differently. E.g., is it possible that it is missing a memory
    >     barrier after writing out the available buffers and before
    >     checking the flags?
    >     
    >     <SVL> Can the race condition be avoided totally? Here is the scenario.
    > 
    > t0: VPP refills the vring with new descriptors but does not yet set
    > avail->idx to make them available to kernel vhost.
    > t1: kernel vhost checks the vring, still sees no space available, but has
    > not yet executed the line of code that clears used->flags.
    > t2: VPP executes sfence and sets avail->idx to make the descriptors
    > available to kernel vhost.
    
    At this point you need a full memory barrier, e.g. mfence or a locked
    instruction such as lock xor; sfence only orders stores. Otherwise the
    read can get reordered before the write.
    
    > t3: VPP checks used->flags, sees that it is still 1, and skips sending the
    > interrupt.
    > t4: kernel vhost clears used->flags to indicate that an interrupt is
    > required, but VPP has just missed it.
    
    Fine, but at this point kernel vhost will re-check using vhost_get_avail
    and process the buffers, so the lack of a notification isn't a problem:
    
    bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
    {
            __virtio16 avail_idx;
            int r;
    
            if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
                    return false;
            vq->used_flags &= ~VRING_USED_F_NO_NOTIFY;
            if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
                    r = vhost_update_used_flags(vq);
                    if (r) {
                            vq_err(vq, "Failed to enable notification at %p: %d\n",
                                   &vq->used->flags, r);
                            return false;
                            return false;
                    }
            } else {
                    r = vhost_update_avail_event(vq, vq->avail_idx);
                    if (r) {
                            vq_err(vq, "Failed to update avail event index at %p: %d\n",
                                   vhost_avail_event(vq), r);
                            return false;
                    }
            }
            /* They could have slipped one in as we were doing that: make
             * sure it's written, then check again. */
            smp_mb();
            r = vhost_get_avail(vq, avail_idx, &vq->avail->idx);
            if (r) {
                    vq_err(vq, "Failed to check avail idx at %p: %d\n",
                           &vq->avail->idx, r);
                    return false;
            }
    
            return vhost16_to_cpu(vq, avail_idx) != vq->avail_idx;
    }
    
    > The result is that kernel vhost gets stuck even though the vring is
    > filled with new descriptors.
    
    It shouldn't be. Maybe you are missing a fence before reading used->flags?
    
    
    > Steven
    > 
    > </SVL>
    >     
    >     
    >     > Unfortunately, this missed notification causes the kernel vhost to
    >     > be stuck, because once the kernel vhost is waiting for an
    >     > interrupt/notification from the guest, only an interrupt/notification
    >     > from the guest can get it processing new packets from the vring
    >     > again. This design seems vulnerable.
    >     
    >     The protocol right now relies on notifications never being lost.
    >     
    >     I can imagine an alternative where the device also re-checks the
    >     rings periodically, but that would involve running timers host-side,
    >     which is only possible if there's a free processor to handle them.
    >     Further, it would still lead to stalls and packet drops, which would
    >     be harder to debug.
    >     
    >     That's just my $.02; please feel free to take it to the virtio TC,
    >     maybe others will feel differently.
    >     
    >     > Should the kernel vhost totally depend on the notification from the
    >     > guest? Quoting the text from
    >     > 
    >     >  
    >     > 
    >     > http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html
    >     > 
    >     >  
    >     > 
    >     > /* The device uses this in used->flags to advise the driver: don’t kick me
    >     >  * when you add a buffer.  It’s unreliable, so it’s simply an
    >     >  * optimization. */
    >     > #define VIRTQ_USED_F_NO_NOTIFY  1
    >     > 
    >     >  
    >     > 
    >     > I interpret that to mean the notification is simply an optimization,
    >     > not a reliable notification mechanism.
    >     
    >     What was meant, I think, is that the suppression is unreliable.
    >     
    >     > So the kernel vhost should not bet the farm on it.
    >     > 
    >     >  
    >     > 
    >     > We encounter the same race condition in kernel vhost’s RX traffic
    >     > direction, which causes the kernel vhost to be stuck due to a missed
    >     > interrupt/notification, although it is less frequent.
    >     
    >     
    >     For the reverse direction, the spec actually has some pseudo-code
    >     and a suggestion about handling that race:
    >     
    >           For optimal performance, a driver MAY disable used buffer
    >           notifications while processing the used ring, but beware the
    >           problem of missing notifications between emptying the ring and
    >           reenabling notifications. This is usually handled by
    >           re-checking for more used buffers after notifications are
    >           re-enabled:
    >     
    >                   virtq_disable_used_buffer_notifications(vq);
    >                   for (;;) {
    >                           if (vq->last_seen_used != le16_to_cpu(virtq->used.idx)) {
    >                                   virtq_enable_used_buffer_notifications(vq);
    >                                   mb();
    >                                   if (vq->last_seen_used != le16_to_cpu(virtq->used.idx))
    >                                           break;
    >                                   virtq_disable_used_buffer_notifications(vq);
    >                           }
    >                           struct virtq_used_elem *e = virtq.used->ring[vq->last_seen_used%vsz];
    >                           process_buffer(e);
    >                           vq->last_seen_used++;
    >                   }
    >     
    >     
    >     
    >     >  
    >     > 
    >     > Steven
    >     > 
    >     
    >     > _______________________________________________
    >     > Virtualization mailing list
    >     > Virtualization@lists.linux-foundation.org
    >     > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
    >     
    >     
    > 
    
