Re: [Qemu-devel] [PATCH] virtio-net: announce self by guest

Jason Wang Mon, 19 May 2014 03:03:12 -0700

On 05/18/2014 05:04 PM, Michael S. Tsirkin wrote:
> On Fri, May 16, 2014 at 01:02:51PM +0800, Jason Wang wrote:
>> On 05/15/2014 05:45 PM, Michael S. Tsirkin wrote:
>>> On Thu, May 15, 2014 at 05:22:28PM +0800, Jason Wang wrote:
>>>> On 05/15/2014 04:28 PM, Michael S. Tsirkin wrote:
>>>>> Thanks, looks good.
>>>>> Some minor comments below,
>>>>>
>>>>> On Thu, May 15, 2014 at 03:16:47PM +0800, Jason Wang wrote:
>>>>>> It's hard to track all mac addresses and their configurations (e.g
>>>>>> vlan or ipv6) in qemu. Without those informations, it's impossible to
>>>>> s/those informations/this information/
>>>>>
>>>>>> build proper garp packet after migration. The only possible solution
>>>>>> to this is let guest (who knew all configurations) to do this.
>>>>> s/knew/knows/
>>>>>
>>>>>> So, this patch introduces a new readonly config status bit of virtio-net,
>>>>>> VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce
>>>>>> presence of its link through config update interrupt.When guest has
>>>>>> done the announcement, it should ack the notification through
>>>>>> VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new
>>>>>> feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by
>>>>>> Linux guest).
>>>>>>
>>>>>> During load, a counter of announcing rounds were set so that the after
>>>>> s/were/is/
>>>>> s/the after/after/
>>>> Will correct those typos.
>>>>>> the vm is running it can trigger rounds of config interrupts to notify
>>>>>> the guest to build and send the correct garps.
>>>>>>
>>>>>> Tested with ping to guest with vlan during migration. Without the
>>>>>> patch, lots of the packets were lost after migration. With the patch,
>>>>>> could not notice packet loss after migration.
>>>>> below changelog should go after ---, until the ack list.
>>>>>
>>>> Ok.
>>>>>> Reference:
>>>>>> RFC v2: 
>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html
>>>>>> RFC v1: 
>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html
>>>>>> V7:     
>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html
>>>>>>
>>>>>> Changes from RFC v2:
>>>>>> - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME
>>>>>> - compat self announce for 2.0 machine type
>>>>>>
>>>>>> Changes from RFC v1:
>>>>>> - clean VIRTIO_NET_S_ANNOUNCE bit during reset
>>>>>> - free announce timer during clean
>>>>>> - make announce work for non-vhost case
>>>>>>
>>>>>> Changes from V7:
>>>>>> - Instead of introducing a global method for each kind of nic, this
>>>>>>   version limits the changes to virtio-net itself.
>>>>>>
>>>>>> Cc: Liuyongan <liuyon...@huawei.com>
>>>>>> Cc: Amos Kong <ak...@redhat.com>
>>>>>> Signed-off-by: Jason Wang <jasow...@redhat.com>
>>>>>> ---
>>>>>>  hw/net/virtio-net.c            |   48 
>>>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>>>  include/hw/i386/pc.h           |    5 ++++
>>>>>>  include/hw/virtio/virtio-net.h |   16 +++++++++++++
>>>>>>  3 files changed, 69 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>>> index 940a7cf..98d59e9 100644
>>>>>> --- a/hw/net/virtio-net.c
>>>>>> +++ b/hw/net/virtio-net.c
>>>>>> @@ -25,6 +25,7 @@
>>>>>>  #include "monitor/monitor.h"
>>>>>>  
>>>>>>  #define VIRTIO_NET_VM_VERSION    11
>>>>>> +#define VIRTIO_NET_ANNOUNCE_ROUNDS    3
>>>>>>  
>>>>>>  #define MAC_TABLE_ENTRIES    64
>>>>>>  #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
>>>>> I would make it  5 to be consistent with SELF_ANNOUNCE_ROUNDS
>>>>> in savevm.c
>>>>>
>>>>> in fact, let's move SELF_ANNOUNCE_ROUNDS to include/migration/vmstate.h
>>>>> and reuse it.
>>>> Ok.
>>>>>> @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet *n, uint8_t 
>>>>>> status)
>>>>>>          (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
>>>>>>  }
>>>>>>  
>>>>>> +static void virtio_net_announce_timer(void *opaque)
>>>>>> +{
>>>>>> +    VirtIONet *n = opaque;
>>>>>> +    VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>>>> +
>>>>>> +    if (n->announce &&
>>>>> I would make it > 0 here, just in case it becomes negative as a result
>>>>> of some bug.
>>>> Sure.
>>>>>> +        virtio_net_started(n, vdev->status) &&
>>>>>> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>>>>>> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_CTRL_VQ)) {
>>>>>> +
>>>>>> +        n->announce--;
>>>>>> +        n->status |= VIRTIO_NET_S_ANNOUNCE;
>>>>>> +        virtio_notify_config(vdev);
>>>>>> +    } else {
>>>>>> +        timer_del(n->announce_timer);
>>>>> why do this here?
>>>>>
>>>>>> +        n->announce = 0;
>>>>> why is this here?
>>>>>
>>>> If guest shuts down the device, there's no need to do the announcing.
>>> It's still weird.
>>> Either announce is 0 and then timer was not running
>>> or it's > 0 and then this won't trigger.
>> Right, the logic here is for QEMU_CLOCK_REALTIME. But there's another
>> question, we use QEMU_CLOCK_VIRTUAL while qemu is using
>> QEMU_CLOCK_REALTIME for its announcing. This looks fine but not sure
>> whether this is safe.
> Meaning QEMU_CLOCK_REALTIME that qemu uses?
> Not sure either but it doesn't modify guest state so it seems safe.
>
>
>> And if the link was down, it's better for us to
>> stop the announcing?
> I think that it doesn't matter: guest won't announce when
> link is down, anyway.
> Not worth it to write extra logic here in the host.


Ok.
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>>>>>  {
>>>>>>      VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>>>> @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct 
>>>>>> VirtIODevice *vdev, uint8_t status)
>>>>>>  
>>>>>>      virtio_net_vhost_status(n, status);
>>>>>>  
>>>>>> +    virtio_net_announce_timer(n);
>>>>>> +
>>>>> why do this here? why not right after we set announce counter?
>>>> The reasons are:
>>>>
>>>> - The counters were set in load, but the device is not running so we
>>>> could not inject the interrupt at that time.
>>> I see. This makes sense - but this isn't intuitive.
>>> Why don't we simply start timer with current time?
>>> Need to make sure it runs fine if time passes, but
>>> I think it's fine.
>> Not sure I get the point, I didn't see any differences except for an
>> extra timer fire.
> The only reason you call virtio_net_announce_timer from  set_status
> is because it gets run on vm start/stop.
> It's true but not intuitive.
> Just run timer always from timer, it's clearer this way :)
>

Sure.
>>>> - We can stop the progress when guest is shutting down the device.
>>> On shut down guest will reset device stopping timer - this seems enough.
>> Yes, I see.
>>>>>>      for (i = 0; i < n->max_queues; i++) {
>>>>>>          q = &n->vqs[i];
>>>>>>  
>>>>>> @@ -322,6 +344,9 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>>>>>      n->nobcast = 0;
>>>>>>      /* multiqueue is disabled by default */
>>>>>>      n->curr_queues = 1;
>>>>>> +    timer_del(n->announce_timer);
>>>>>> +    n->announce = 0;
>>>>>> +    n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>>>>  
>>>>>>      /* Flush any MAC and VLAN filter table state */
>>>>>>      n->mac_table.in_use = 0;
>>>>>> @@ -731,6 +756,22 @@ static int virtio_net_handle_vlan_table(VirtIONet 
>>>>>> *n, uint8_t cmd,
>>>>>>      return VIRTIO_NET_OK;
>>>>>>  }
>>>>>>  
>>>>>> +static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd,
>>>>>> +                                      struct iovec *iov, unsigned int 
>>>>>> iov_cnt)
>>>>>> +{
>>>>>> +    if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) {
>>>>>> +        n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>>>> +        if (n->announce) {
>>>>> I would make it > 0 here, just in case it becomes negative as a result
>>>>> of some bug.
>>>> Ok.
>>>>>> +            timer_mod(n->announce_timer,
>>>>>> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 50 +
>>>>>> +                      (VIRTIO_NET_ANNOUNCE_ROUNDS - n->announce - 1) * 
>>>>>> 100);
>>>>> savevm.c has this code, and a comment:
>>>>>         /* delay 50ms, 150ms, 250ms, ... */
>>>>>   50 + (SELF_ANNOUNCE_ROUNDS - count - 1) * 100
>>>>>
>>>>> how about an API in include/migration/vmstate.h
>>>>>
>>>>> static inline
>>>>> int64_t self_announce_delay(int round)
>>>>> {
>>>>>   assert(round < SELF_ANNOUNCE_ROUNDS && round > 0);
>>>>>         /* delay 50ms, 150ms, 250ms, ... */
>>>>>   return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100;
>>>>> }
>>>>>
>>>>> or something to this end.
>>>>>
>>>> Good idea, will do this.
>>>>>> +        }
>>>>>> +        return VIRTIO_NET_OK;
>>>>>> +    } else {
>>>>>> +        return VIRTIO_NET_ERR;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>>  static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
>>>>>>                                  struct iovec *iov, unsigned int iov_cnt)
>>>>>>  {
>>>>>> @@ -794,6 +835,8 @@ static void virtio_net_handle_ctrl(VirtIODevice 
>>>>>> *vdev, VirtQueue *vq)
>>>>>>              status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
>>>>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
>>>>>>              status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, 
>>>>>> iov_cnt);
>>>>>> +        } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
>>>>>> +            status = virtio_net_handle_announce(n, ctrl.cmd, iov, 
>>>>>> iov_cnt);
>>>>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
>>>>>>              status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
>>>>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
>>>>>> @@ -1451,6 +1494,7 @@ static int virtio_net_load(QEMUFile *f, void 
>>>>>> *opaque, int version_id)
>>>>>>          qemu_get_subqueue(n->nic, i)->link_down = link_down;
>>>>>>      }
>>>>>>  
>>>>>> +    n->announce = VIRTIO_NET_ANNOUNCE_ROUNDS;
>>>>> Well if virtio_net_handle_announce runs now it will start timer
>>>>> in the past, right?
>>>>> This seems wrong.
>>>> Not sure I get the case. When in virtio_net_load() the vm is not even
>>>> running so looks like virtio_net_handle_announce() could not run in the
>>>> same time.
>>> I see, this works because you decrement it when VM starts running.
>>> I think it's not a good idea to rely on this though,
>>> better do everything from timer, and handle
>>> the case of command arriving too early.
>>>
>> Right, if QEMU_CLOCK_VIRTUAL is fine, we can do everything in a timer.
>>
>> For the case of command arriving too early. Is there anything else we
>> need to do? Since we only start the next timer when we get ack command.
>>
>> Thanks
> I think we need to make sure we don't set the timer in the past or
> very far in the future.

n->announce is between 0 and VIRTIO_NET_ANNOUNCE_ROUNDS - 1, when doing
timer_mod(), so it looks ok. There's another case that if vm was stopped
for a long time, the timer will also be delayed, but it's still safe in
this case.

Re: [Qemu-devel] [PATCH] virtio-net: announce self by guest

Reply via email to