On 12/17/18, 2:55 PM, "Michael S. Tsirkin" <m...@redhat.com> wrote:

    On Thu, Dec 13, 2018 at 11:24:28PM +0000, Steven Luong (sluong) via Virtualization wrote:
    > Folks,
    > 
    >  
    > 
    > We came across a memory race condition between the VPP vhost driver and the
    > kernel vhost. VPP is running a tap interface over the vhost backend. In this
    > case, VPP is acting in vhost driver mode and the kernel vhost is acting in
    > vhost device mode.
    >  
    > 
    > In the kernel vhost’s TX traffic direction, which is VPP’s RX traffic
    > direction, the kernel vhost is the producer and VPP is the consumer.
    
    Looking at the vhost net kernel code, it seems to use the
    same terminology as virtio net.
    Can you please clarify which ring number you use, 0 or 1?

<SVL> 0. </SVL>
    
    I will assume ring 0 below.
    
    
    
    > Kernel vhost places
    > traffic in kernel vhost’s TX vring. VPP removes traffic in VPP’s RX vring.
    
    
    So VPP makes buffers available and the kernel vhost uses them?
    
<SVL> Correct. </SVL>
    
    > It is inevitable that the vring may become full under heavy traffic and the
    > kernel vhost may have to stop and wait for the guest (VPP) to empty the vring
    > and to refill it with descriptors. When that happens, the kernel vhost clears
    > the bit in the vring’s used flags to request an interrupt/notification. Due to
    > the shared memory race condition, VPP may miss the clearing of the vring’s
    > used flag by the kernel vhost and not send the kernel vhost an interrupt after
    > VPP empties and refills the vring with new descriptors.
    
    So kernel vhost's wakeups are signalled through eventfd - I assume
    this is what you mean by an interrupt?
    
    
    To prevent the race that you describe, vhost has this code:
    
                    /* OK, now we need to know about added descriptors. */
                    if (!headcount) {
                            if (unlikely(busyloop_intr)) {
                                    vhost_poll_queue(&vq->poll);
                            } else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
                                    /* They have slipped one in as we were
                                     * doing that: check again. */
                                    vhost_disable_notify(&net->dev, vq);
                                    continue;
                            }
                            /* Nothing new?  Wait for eventfd to tell us
                             * they refilled. */
                            goto out;
                    }
    
    So the ring should be re-checked after notifications are enabled.
    If the ring is no longer empty, vhost will proceed to handle it.
    This code has been around for many years.
    
    Thus if VPP doesn't work with it, then I suspect it does something
    differently. For example, is it possible that it is missing a memory
    barrier after writing out the available buffers and before
    checking the flags?
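
    For illustration, a minimal sketch of the publish-then-check ordering being
    asked about here; the struct layouts, publish_and_maybe_kick(), and kick_fd
    are simplified stand-ins, not VPP's actual code:

            #include <stdint.h>
            #include <unistd.h>

            #define VIRTQ_USED_F_NO_NOTIFY 1

            struct virtq_avail { uint16_t flags; uint16_t idx; uint16_t ring[]; };
            struct virtq_used_elem { uint32_t id; uint32_t len; };
            struct virtq_used { uint16_t flags; uint16_t idx; struct virtq_used_elem ring[]; };

            /* Driver side: expose newly refilled descriptors, then decide whether
             * to notify the device.  The full fence between the avail->idx store
             * and the used->flags load is the barrier in question: a store-only
             * fence (sfence) does not keep the later load from being ordered
             * ahead of the store, which is how the notification can be lost. */
            static void publish_and_maybe_kick(struct virtq_avail *avail,
                                               const struct virtq_used *used,
                                               uint16_t new_avail_idx, int kick_fd)
            {
                    /* Descriptor contents must be visible before the new index. */
                    __atomic_thread_fence(__ATOMIC_RELEASE);
                    avail->idx = new_avail_idx;

                    /* Full barrier: order the idx store before the flags load. */
                    __atomic_thread_fence(__ATOMIC_SEQ_CST);

                    if (!(used->flags & VIRTQ_USED_F_NO_NOTIFY)) {
                            uint64_t one = 1;
                            (void)write(kick_fd, &one, sizeof(one));  /* eventfd kick */
                    }
            }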
    
    <SVL> Can the race condition be avoided totally? Here is the scenario.

t0: VPP refills the vring with new descriptors but has not yet set avail->idx to
make them available to the kernel vhost.
t1: kernel vhost checks the vring, still sees no descriptors available, but has
not yet executed the line of code that clears used->flags.
t2: VPP executes sfence and sets avail->idx to make the descriptors available to
the kernel vhost.
t3: VPP checks used->flags; it is still 1, so VPP skips sending the interrupt.
t4: kernel vhost clears used->flags to indicate that an interrupt is required,
but VPP has just missed it. The result is that the kernel vhost gets stuck even
though the vring is filled with new descriptors.

Steven

</SVL>
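
In that interleaving, the lost kick only matters if the device side also fails to
see the t2 update. The kernel code quoted earlier pairs the flag clear with a
re-check of the avail ring (vhost_enable_notify() returning true when new
buffers appeared), so after t4 it should observe the index written at t2 even
though the kick was skipped. A minimal sketch of that device-side pairing, with
simplified names rather than the actual vhost internals:

        #include <stdbool.h>
        #include <stdint.h>

        #define VIRTQ_USED_F_NO_NOTIFY 1

        struct virtq_avail { uint16_t flags; uint16_t idx; uint16_t ring[]; };
        struct virtq_used_elem { uint32_t id; uint32_t len; };
        struct virtq_used { uint16_t flags; uint16_t idx; struct virtq_used_elem ring[]; };

        /* Device side: ran out of buffers, so request a notification, then
         * re-check.  With a full barrier here and a full barrier on the driver
         * side, at least one of the two must happen in the t0..t4 interleaving:
         * either the driver loads used->flags after it was cleared (and kicks),
         * or this re-read of avail->idx sees the index stored at t2 (and no
         * kick is needed). */
        static bool enable_notify_and_recheck(struct virtq_used *used,
                                              const struct virtq_avail *avail,
                                              uint16_t last_avail_idx)
        {
                used->flags &= ~VIRTQ_USED_F_NO_NOTIFY;   /* "please kick me" (t4) */

                /* Full barrier: order the flags store before the idx re-read. */
                __atomic_thread_fence(__ATOMIC_SEQ_CST);

                /* true: new descriptors slipped in, keep processing without a
                 * kick; false: really empty, wait for the eventfd. */
                return avail->idx != last_avail_idx;
        }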
    
    
    > Unfortunately, this missed notification causes the kernel vhost to be stuck
    > because once the kernel vhost is waiting for an interrupt/notification from
    > the guest, only an interrupt/notification from the guest can get it to resume
    > processing new packets submitted to the vring. This design seems vulnerable.
    
    The protocol right now relies on notifications never being lost.
    
    I can imagine an alternative where the device also re-checks
    the rings periodically, but that would involve running
    timers on the host side, which is only possible if there's a
    free processor that can handle them. Further,
    it would still lead to stalls and packet drops, which would
    be harder to debug.
    
    That's just my $.02. Please feel free to take it up with the virtio TC;
    maybe others will feel differently.
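
    Purely to illustrate the periodic re-check alternative described above, and
    its cost, here is a toy sketch; this is not something vhost does today, and
    ring_has_new_avail()/process_ring() are hypothetical helpers standing in for
    the real device-side work:

            #include <pthread.h>
            #include <stdbool.h>
            #include <time.h>

            /* Hypothetical helpers standing in for the real device-side work. */
            bool ring_has_new_avail(void);   /* did avail->idx move while stalled? */
            void process_ring(void);

            /* A watchdog thread (started with pthread_create()) that re-checks a
             * stalled ring every few milliseconds.  It papers over a lost
             * notification, but it ties up a host thread/timer and still stalls
             * traffic for up to one polling period, which is the objection above. */
            static void *ring_watchdog(void *arg)
            {
                    const struct timespec period = { 0, 5 * 1000 * 1000 };  /* 5 ms */

                    (void)arg;
                    for (;;) {
                            nanosleep(&period, NULL);
                            if (ring_has_new_avail())
                                    process_ring();
                    }
                    return NULL;
            }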
    
    > Should the kernel vhost
    > totally depend on the notification from the guest? Quoting the text from
    > 
    >  
    > 
    > http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html
    > 
    >  
    > 
    > /* The device uses this in used->flags to advise the driver: don’t kick me
    >  * when you add a buffer.  It’s unreliable, so it’s simply an
    >  * optimization. */
    > #define VIRTQ_USED_F_NO_NOTIFY  1
    > 
    >  
    > 
    > I interpret this to mean that the notification is simply an optimization,
    > not a reliable notification mechanism.
    
    What was meant, I think, is that notification suppression is unreliable.
    
    > So the kernel vhost should not bet the farm on it.
    > 
    >  
    > 
    > We encounter the same race condition in the kernel vhost’s RX traffic
    > direction, which causes the kernel vhost to be stuck due to a missed
    > interrupt/notification, although it is less frequent.
    
    
    For the reverse direction the spec actually has some pseudo-code and a
    suggestion about handling that race:
    
        For optimal performance, a driver MAY disable used buffer notifications
        while processing the used ring, but beware the problem of missing
        notifications between emptying the ring and reenabling notifications.
        This is usually handled by re-checking for more used buffers after
        notifications are re-enabled:
    
                virtq_disable_used_buffer_notifications(vq);
                for (;;) {
                        if (vq->last_seen_used != le16_to_cpu(virtq->used.idx)) {
                                virtq_enable_used_buffer_notifications(vq);
                                mb();
                                if (vq->last_seen_used != le16_to_cpu(virtq->used.idx))
                                        break;
                                virtq_disable_used_buffer_notifications(vq);
                        }
                        struct virtq_used_elem *e = virtq.used->ring[vq->last_seen_used%vsz];
                        process_buffer(e);
                        vq->last_seen_used++;
                }
    
    
    
    >  
    > 
    > Steven
    > 
    
    
    

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
