On 05/18/2014 05:04 PM, Michael S. Tsirkin wrote: > On Fri, May 16, 2014 at 01:02:51PM +0800, Jason Wang wrote: >> On 05/15/2014 05:45 PM, Michael S. Tsirkin wrote: >>> On Thu, May 15, 2014 at 05:22:28PM +0800, Jason Wang wrote: >>>> On 05/15/2014 04:28 PM, Michael S. Tsirkin wrote: >>>>> Thanks, looks good. >>>>> Some minor comments below, >>>>> >>>>> On Thu, May 15, 2014 at 03:16:47PM +0800, Jason Wang wrote: >>>>>> It's hard to track all mac addresses and their configurations (e.g >>>>>> vlan or ipv6) in qemu. Without those informations, it's impossible to >>>>> s/those informations/this information/ >>>>> >>>>>> build proper garp packet after migration. The only possible solution >>>>>> to this is let guest (who knew all configurations) to do this. >>>>> s/knew/knows/ >>>>> >>>>>> So, this patch introduces a new readonly config status bit of virtio-net, >>>>>> VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce >>>>>> presence of its link through config update interrupt.When guest has >>>>>> done the announcement, it should ack the notification through >>>>>> VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new >>>>>> feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by >>>>>> Linux guest). >>>>>> >>>>>> During load, a counter of announcing rounds were set so that the after >>>>> s/were/is/ >>>>> s/the after/after/ >>>> Will correct those typos. >>>>>> the vm is running it can trigger rounds of config interrupts to notify >>>>>> the guest to build and send the correct garps. >>>>>> >>>>>> Tested with ping to guest with vlan during migration. Without the >>>>>> patch, lots of the packets were lost after migration. With the patch, >>>>>> could not notice packet loss after migration. >>>>> below changelog should go after ---, until the ack list. >>>>> >>>> Ok. >>>>>> Reference: >>>>>> RFC v2: >>>>>> https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html >>>>>> RFC v1: >>>>>> https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html >>>>>> V7: >>>>>> https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html >>>>>> >>>>>> Changes from RFC v2: >>>>>> - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME >>>>>> - compat self announce for 2.0 machine type >>>>>> >>>>>> Changes from RFC v1: >>>>>> - clean VIRTIO_NET_S_ANNOUNCE bit during reset >>>>>> - free announce timer during clean >>>>>> - make announce work for non-vhost case >>>>>> >>>>>> Changes from V7: >>>>>> - Instead of introducing a global method for each kind of nic, this >>>>>> version limits the changes to virtio-net itself. >>>>>> >>>>>> Cc: Liuyongan <liuyon...@huawei.com> >>>>>> Cc: Amos Kong <ak...@redhat.com> >>>>>> Signed-off-by: Jason Wang <jasow...@redhat.com> >>>>>> --- >>>>>> hw/net/virtio-net.c | 48 >>>>>> ++++++++++++++++++++++++++++++++++++++++ >>>>>> include/hw/i386/pc.h | 5 ++++ >>>>>> include/hw/virtio/virtio-net.h | 16 +++++++++++++ >>>>>> 3 files changed, 69 insertions(+), 0 deletions(-) >>>>>> >>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>>>>> index 940a7cf..98d59e9 100644 >>>>>> --- a/hw/net/virtio-net.c >>>>>> +++ b/hw/net/virtio-net.c >>>>>> @@ -25,6 +25,7 @@ >>>>>> #include "monitor/monitor.h" >>>>>> >>>>>> #define VIRTIO_NET_VM_VERSION 11 >>>>>> +#define VIRTIO_NET_ANNOUNCE_ROUNDS 3 >>>>>> >>>>>> #define MAC_TABLE_ENTRIES 64 >>>>>> #define MAX_VLAN (1 << 12) /* Per 802.1Q definition */ >>>>> I would make it 5 to be consistent with SELF_ANNOUNCE_ROUNDS >>>>> in savevm.c >>>>> >>>>> in fact, let's move SELF_ANNOUNCE_ROUNDS to include/migration/vmstate.h >>>>> and reuse it. >>>> Ok. >>>>>> @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet *n, uint8_t >>>>>> status) >>>>>> (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running; >>>>>> } >>>>>> >>>>>> +static void virtio_net_announce_timer(void *opaque) >>>>>> +{ >>>>>> + VirtIONet *n = opaque; >>>>>> + VirtIODevice *vdev = VIRTIO_DEVICE(n); >>>>>> + >>>>>> + if (n->announce && >>>>> I would make it > 0 here, just in case it becomes negative as a result >>>>> of some bug. >>>> Sure. >>>>>> + virtio_net_started(n, vdev->status) && >>>>>> + vdev->guest_features & (0x1 << VIRTIO_NET_F_GUEST_ANNOUNCE) && >>>>>> + vdev->guest_features & (0x1 << VIRTIO_NET_F_CTRL_VQ)) { >>>>>> + >>>>>> + n->announce--; >>>>>> + n->status |= VIRTIO_NET_S_ANNOUNCE; >>>>>> + virtio_notify_config(vdev); >>>>>> + } else { >>>>>> + timer_del(n->announce_timer); >>>>> why do this here? >>>>> >>>>>> + n->announce = 0; >>>>> why is this here? >>>>> >>>> If guest shuts down the device, there's no need to do the announcing. >>> It's still weird. >>> Either announce is 0 and then timer was not running >>> or it's > 0 and then this won't trigger. >> Right, the logic here is for QEMU_CLOCK_REALTIME. But there's another >> question, we use QEMU_CLOCK_VIRTUAL while qemu is using >> QEMU_CLOCK_REALTIME for its announcing. This looks fine but not sure >> whether this is safe. > Meaning QEMU_CLOCK_REALTIME that qemu uses? > Not sure either but it doesn't modify guest state so it seems safe. > > >> And if the link was down, it's better for us to >> stop the announcing? > I think that it doesn't matter: guest won't announce when > link is down, anyway. > Not worth it to write extra logic here in the host.
Ok. >>>>>> + } >>>>>> +} >>>>>> + >>>>>> static void virtio_net_vhost_status(VirtIONet *n, uint8_t status) >>>>>> { >>>>>> VirtIODevice *vdev = VIRTIO_DEVICE(n); >>>>>> @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct >>>>>> VirtIODevice *vdev, uint8_t status) >>>>>> >>>>>> virtio_net_vhost_status(n, status); >>>>>> >>>>>> + virtio_net_announce_timer(n); >>>>>> + >>>>> why do this here? why not right after we set announce counter? >>>> The reasons are: >>>> >>>> - The counters were set in load, but the device is not running so we >>>> could not inject the interrupt at that time. >>> I see. This makes sense - but this isn't intuitive. >>> Why don't we simply start timer with current time? >>> Need to make sure it runs fine if time passes, but >>> I think it's fine. >> Not sure I get the point, I didn't see any differences except for an >> extra timer fire. > The only reason you call virtio_net_announce_timer from set_status > is because it gets run on vm start/stop. > It's true but not intuitive. > Just run timer always from timer, it's clearer this way :) > Sure. >>>> - We can stop the progress when guest is shutting down the device. >>> On shut down guest will reset device stopping timer - this seems enough. >> Yes, I see. >>>>>> for (i = 0; i < n->max_queues; i++) { >>>>>> q = &n->vqs[i]; >>>>>> >>>>>> @@ -322,6 +344,9 @@ static void virtio_net_reset(VirtIODevice *vdev) >>>>>> n->nobcast = 0; >>>>>> /* multiqueue is disabled by default */ >>>>>> n->curr_queues = 1; >>>>>> + timer_del(n->announce_timer); >>>>>> + n->announce = 0; >>>>>> + n->status &= ~VIRTIO_NET_S_ANNOUNCE; >>>>>> >>>>>> /* Flush any MAC and VLAN filter table state */ >>>>>> n->mac_table.in_use = 0; >>>>>> @@ -731,6 +756,22 @@ static int virtio_net_handle_vlan_table(VirtIONet >>>>>> *n, uint8_t cmd, >>>>>> return VIRTIO_NET_OK; >>>>>> } >>>>>> >>>>>> +static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd, >>>>>> + struct iovec *iov, unsigned int >>>>>> iov_cnt) >>>>>> +{ >>>>>> + if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) { >>>>>> + n->status &= ~VIRTIO_NET_S_ANNOUNCE; >>>>>> + if (n->announce) { >>>>> I would make it > 0 here, just in case it becomes negative as a result >>>>> of some bug. >>>> Ok. >>>>>> + timer_mod(n->announce_timer, >>>>>> + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 50 + >>>>>> + (VIRTIO_NET_ANNOUNCE_ROUNDS - n->announce - 1) * >>>>>> 100); >>>>> savevm.c has this code, and a comment: >>>>> /* delay 50ms, 150ms, 250ms, ... */ >>>>> 50 + (SELF_ANNOUNCE_ROUNDS - count - 1) * 100 >>>>> >>>>> how about an API in include/migration/vmstate.h >>>>> >>>>> static inline >>>>> int64_t self_announce_delay(int round) >>>>> { >>>>> assert(round < SELF_ANNOUNCE_ROUNDS && round > 0); >>>>> /* delay 50ms, 150ms, 250ms, ... */ >>>>> return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100; >>>>> } >>>>> >>>>> or something to this end. >>>>> >>>> Good idea, will do this. >>>>>> + } >>>>>> + return VIRTIO_NET_OK; >>>>>> + } else { >>>>>> + return VIRTIO_NET_ERR; >>>>>> + } >>>>>> +} >>>>>> + >>>>>> static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd, >>>>>> struct iovec *iov, unsigned int iov_cnt) >>>>>> { >>>>>> @@ -794,6 +835,8 @@ static void virtio_net_handle_ctrl(VirtIODevice >>>>>> *vdev, VirtQueue *vq) >>>>>> status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt); >>>>>> } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) { >>>>>> status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, >>>>>> iov_cnt); >>>>>> + } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) { >>>>>> + status = virtio_net_handle_announce(n, ctrl.cmd, iov, >>>>>> iov_cnt); >>>>>> } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) { >>>>>> status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt); >>>>>> } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) { >>>>>> @@ -1451,6 +1494,7 @@ static int virtio_net_load(QEMUFile *f, void >>>>>> *opaque, int version_id) >>>>>> qemu_get_subqueue(n->nic, i)->link_down = link_down; >>>>>> } >>>>>> >>>>>> + n->announce = VIRTIO_NET_ANNOUNCE_ROUNDS; >>>>> Well if virtio_net_handle_announce runs now it will start timer >>>>> in the past, right? >>>>> This seems wrong. >>>> Not sure I get the case. When in virtio_net_load() the vm is not even >>>> running so looks like virtio_net_handle_announce() could not run in the >>>> same time. >>> I see, this works because you decrement it when VM starts running. >>> I think it's not a good idea to rely on this though, >>> better do everything from timer, and handle >>> the case of command arriving too early. >>> >> Right, if QEMU_CLOCK_VIRTUAL is fine, we can do everything in a timer. >> >> For the case of command arriving too early. Is there anything else we >> need to do? Since we only start the next timer when we get ack command. >> >> Thanks > I think we need to make sure we don't set the timer in the past or > very far in the future. n->announce is between 0 and VIRTIO_NET_ANNOUNCE_ROUNDS - 1, when doing timer_mod(), so it looks ok. There's another case that if vm was stopped for a long time, the timer will also be delayed, but it's still safe in this case.