On 05/15/2014 05:45 PM, Michael S. Tsirkin wrote: > On Thu, May 15, 2014 at 05:22:28PM +0800, Jason Wang wrote: >> On 05/15/2014 04:28 PM, Michael S. Tsirkin wrote: >>> Thanks, looks good. >>> Some minor comments below, >>> >>> On Thu, May 15, 2014 at 03:16:47PM +0800, Jason Wang wrote: >>>> It's hard to track all mac addresses and their configurations (e.g >>>> vlan or ipv6) in qemu. Without those informations, it's impossible to >>> s/those informations/this information/ >>> >>>> build proper garp packet after migration. The only possible solution >>>> to this is let guest (who knew all configurations) to do this. >>> s/knew/knows/ >>> >>>> So, this patch introduces a new readonly config status bit of virtio-net, >>>> VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce >>>> presence of its link through config update interrupt.When guest has >>>> done the announcement, it should ack the notification through >>>> VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new >>>> feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by >>>> Linux guest). >>>> >>>> During load, a counter of announcing rounds were set so that the after >>> s/were/is/ >>> s/the after/after/ >> Will correct those typos. >>>> the vm is running it can trigger rounds of config interrupts to notify >>>> the guest to build and send the correct garps. >>>> >>>> Tested with ping to guest with vlan during migration. Without the >>>> patch, lots of the packets were lost after migration. With the patch, >>>> could not notice packet loss after migration. >>> below changelog should go after ---, until the ack list. >>> >> Ok. >>>> Reference: >>>> RFC v2: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html >>>> RFC v1: https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html >>>> V7: https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html >>>> >>>> Changes from RFC v2: >>>> - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME >>>> - compat self announce for 2.0 machine type >>>> >>>> Changes from RFC v1: >>>> - clean VIRTIO_NET_S_ANNOUNCE bit during reset >>>> - free announce timer during clean >>>> - make announce work for non-vhost case >>>> >>>> Changes from V7: >>>> - Instead of introducing a global method for each kind of nic, this >>>> version limits the changes to virtio-net itself. >>>> >>>> Cc: Liuyongan <liuyon...@huawei.com> >>>> Cc: Amos Kong <ak...@redhat.com> >>>> Signed-off-by: Jason Wang <jasow...@redhat.com> >>>> --- >>>> hw/net/virtio-net.c | 48 >>>> ++++++++++++++++++++++++++++++++++++++++ >>>> include/hw/i386/pc.h | 5 ++++ >>>> include/hw/virtio/virtio-net.h | 16 +++++++++++++ >>>> 3 files changed, 69 insertions(+), 0 deletions(-) >>>> >>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>>> index 940a7cf..98d59e9 100644 >>>> --- a/hw/net/virtio-net.c >>>> +++ b/hw/net/virtio-net.c >>>> @@ -25,6 +25,7 @@ >>>> #include "monitor/monitor.h" >>>> >>>> #define VIRTIO_NET_VM_VERSION 11 >>>> +#define VIRTIO_NET_ANNOUNCE_ROUNDS 3 >>>> >>>> #define MAC_TABLE_ENTRIES 64 >>>> #define MAX_VLAN (1 << 12) /* Per 802.1Q definition */ >>> I would make it 5 to be consistent with SELF_ANNOUNCE_ROUNDS >>> in savevm.c >>> >>> in fact, let's move SELF_ANNOUNCE_ROUNDS to include/migration/vmstate.h >>> and reuse it. >> Ok. >>>> @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet *n, uint8_t >>>> status) >>>> (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running; >>>> } >>>> >>>> +static void virtio_net_announce_timer(void *opaque) >>>> +{ >>>> + VirtIONet *n = opaque; >>>> + VirtIODevice *vdev = VIRTIO_DEVICE(n); >>>> + >>>> + if (n->announce && >>> I would make it > 0 here, just in case it becomes negative as a result >>> of some bug. >> Sure. >>>> + virtio_net_started(n, vdev->status) && >>>> + vdev->guest_features & (0x1 << VIRTIO_NET_F_GUEST_ANNOUNCE) && >>>> + vdev->guest_features & (0x1 << VIRTIO_NET_F_CTRL_VQ)) { >>>> + >>>> + n->announce--; >>>> + n->status |= VIRTIO_NET_S_ANNOUNCE; >>>> + virtio_notify_config(vdev); >>>> + } else { >>>> + timer_del(n->announce_timer); >>> why do this here? >>> >>>> + n->announce = 0; >>> why is this here? >>> >> If guest shuts down the device, there's no need to do the announcing. > It's still weird. > Either announce is 0 and then timer was not running > or it's > 0 and then this won't trigger.
Right, the logic here is for QEMU_CLOCK_REALTIME. But there's another question, we use QEMU_CLOCK_VIRTUAL while qemu is using QEMU_CLOCK_REALTIME for its announcing. This looks fine but not sure whether this is safe. And if the link was down, it's better for us to stop the announcing? > >>>> + } >>>> +} >>>> + >>>> static void virtio_net_vhost_status(VirtIONet *n, uint8_t status) >>>> { >>>> VirtIODevice *vdev = VIRTIO_DEVICE(n); >>>> @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct VirtIODevice >>>> *vdev, uint8_t status) >>>> >>>> virtio_net_vhost_status(n, status); >>>> >>>> + virtio_net_announce_timer(n); >>>> + >>> why do this here? why not right after we set announce counter? >> The reasons are: >> >> - The counters were set in load, but the device is not running so we >> could not inject the interrupt at that time. > I see. This makes sense - but this isn't intuitive. > Why don't we simply start timer with current time? > Need to make sure it runs fine if time passes, but > I think it's fine. Not sure I get the point, I didn't see any differences except for an extra timer fire. > >> - We can stop the progress when guest is shutting down the device. > On shut down guest will reset device stopping timer - this seems enough. Yes, I see. >>>> for (i = 0; i < n->max_queues; i++) { >>>> q = &n->vqs[i]; >>>> >>>> @@ -322,6 +344,9 @@ static void virtio_net_reset(VirtIODevice *vdev) >>>> n->nobcast = 0; >>>> /* multiqueue is disabled by default */ >>>> n->curr_queues = 1; >>>> + timer_del(n->announce_timer); >>>> + n->announce = 0; >>>> + n->status &= ~VIRTIO_NET_S_ANNOUNCE; >>>> >>>> /* Flush any MAC and VLAN filter table state */ >>>> n->mac_table.in_use = 0; >>>> @@ -731,6 +756,22 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, >>>> uint8_t cmd, >>>> return VIRTIO_NET_OK; >>>> } >>>> >>>> +static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd, >>>> + struct iovec *iov, unsigned int >>>> iov_cnt) >>>> +{ >>>> + if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) { >>>> + n->status &= ~VIRTIO_NET_S_ANNOUNCE; >>>> + if (n->announce) { >>> I would make it > 0 here, just in case it becomes negative as a result >>> of some bug. >> Ok. >>>> + timer_mod(n->announce_timer, >>>> + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 50 + >>>> + (VIRTIO_NET_ANNOUNCE_ROUNDS - n->announce - 1) * >>>> 100); >>> savevm.c has this code, and a comment: >>> /* delay 50ms, 150ms, 250ms, ... */ >>> 50 + (SELF_ANNOUNCE_ROUNDS - count - 1) * 100 >>> >>> how about an API in include/migration/vmstate.h >>> >>> static inline >>> int64_t self_announce_delay(int round) >>> { >>> assert(round < SELF_ANNOUNCE_ROUNDS && round > 0); >>> /* delay 50ms, 150ms, 250ms, ... */ >>> return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100; >>> } >>> >>> or something to this end. >>> >> Good idea, will do this. >>>> + } >>>> + return VIRTIO_NET_OK; >>>> + } else { >>>> + return VIRTIO_NET_ERR; >>>> + } >>>> +} >>>> + >>>> static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd, >>>> struct iovec *iov, unsigned int iov_cnt) >>>> { >>>> @@ -794,6 +835,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, >>>> VirtQueue *vq) >>>> status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt); >>>> } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) { >>>> status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, >>>> iov_cnt); >>>> + } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) { >>>> + status = virtio_net_handle_announce(n, ctrl.cmd, iov, >>>> iov_cnt); >>>> } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) { >>>> status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt); >>>> } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) { >>>> @@ -1451,6 +1494,7 @@ static int virtio_net_load(QEMUFile *f, void >>>> *opaque, int version_id) >>>> qemu_get_subqueue(n->nic, i)->link_down = link_down; >>>> } >>>> >>>> + n->announce = VIRTIO_NET_ANNOUNCE_ROUNDS; >>> Well if virtio_net_handle_announce runs now it will start timer >>> in the past, right? >>> This seems wrong. >> Not sure I get the case. When in virtio_net_load() the vm is not even >> running so looks like virtio_net_handle_announce() could not run in the >> same time. > I see, this works because you decrement it when VM starts running. > I think it's not a good idea to rely on this though, > better do everything from timer, and handle > the case of command arriving too early. > Right, if QEMU_CLOCK_VIRTUAL is fine, we can do everything in a timer. For the case of command arriving too early. Is there anything else we need to do? Since we only start the next timer when we get ack command. Thanks