* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
> On 2014/8/18 20:27, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
> >>For all NICs (except virtio-net) emulated by qemu, such as e1000,
> >>rtl8139, pcnet and ne2k_pci, qemu can still receive packets when the
> >>VM is not running. If this happens in *migration's* last "pause VM"
> >>stage, the new dirty RAM related to the packets will be missed, and
> >>this will lead to serious network faults in the VM.
> >
> >I'd like to understand more about when exactly this happens.
> >migration.c:migration_thread, in the last step, takes the iothread
> >lock, puts the runstate into RUN_STATE_FINISH_MIGRATE and only then
> >calls qemu_savevm_state_complete, before unlocking the iothread.
> >
>
> Hmm, sorry, the description above may not be exact.
>
> Actually, the receiving of packets happens after the migration thread
> unlocks the iothread lock (when the VM is stopped), but before the end
> of the migration (to be exact, before the socket fd that is used to
> send data to the destination is closed).
>
> I think that before the socket fd is closed we cannot assume that all
> of the VM's RAM has been copied into the network card's memory or sent
> to the destination, so we still have the chance to modify its content.
> (If I am wrong, please let me know, thanks. ;) )
>
> If the above situation happens, it will dirty parts of the RAM that
> are still being sent, and some of the newly dirtied RAM pages may be
> missed. As a result, the memory contents on the destination are not
> complete, and this will lead to network faults in the VM.
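To make the window being described concrete, here is a minimal
standalone sketch (not QEMU code; every name in it is invented for
illustration). queue_page() stands in for a deferred write that records
a pointer to the page rather than copying it, flush_to_wire() for the
moment the bytes actually leave for the destination, and the late
memcpy models a receive DMA landing in guest RAM after the final pass:

/*
 * Standalone model of the race window, with invented names.
 * Build with: cc -std=c99 -o window window.c
 */
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 8
#define NUM_PAGES 2

static char guest_ram[NUM_PAGES][PAGE_SIZE];
static const char *queued[NUM_PAGES];   /* pointers recorded, data not copied */
static char wire[NUM_PAGES][PAGE_SIZE]; /* what the destination receives */

static void queue_page(int i)
{
    queued[i] = guest_ram[i];   /* record a pointer, do not copy the data */
}

static void flush_to_wire(void)
{
    for (int i = 0; i < NUM_PAGES; i++) {
        memcpy(wire[i], queued[i], PAGE_SIZE);
    }
}

int main(void)
{
    memcpy(guest_ram[0], "AAAAAAA", PAGE_SIZE);
    memcpy(guest_ram[1], "BBBBBBB", PAGE_SIZE);

    /* Final migration pass with the VM "stopped": queue every page. */
    queue_page(0);
    queue_page(1);

    /* The lock is dropped; a late RX "DMA" still lands in guest RAM
     * before the socket is flushed and closed. */
    memcpy(guest_ram[1], "packet!", PAGE_SIZE);

    /* Only now do the queued pages actually go out. */
    flush_to_wire();

    /* The destination sees post-stop contents for page 1, and no
     * dirty-tracking pass remains to notice that it changed. */
    printf("page 1 on the wire: %s\n", wire[1]);
    return 0;
}

Page 1 arrives at the destination with its post-stop contents, while
pages that were flushed earlier (and the migrated device state) do not
match it, and nothing on the source records that it changed after the
final pass.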
Interesting; the only reason I can think that would happen is that
arch_init.c:ram_save_page uses qemu_put_buffer_async to send most of
the RAM pages, and that doesn't make sure the write happens until
later.

This fix will probably help other migration code as well; postcopy
runs the migration for a lot longer after stopping the CPUs, and I am
very occasionally seeing something that might be due to this.

Dave

> Here I have made a test:
> (1) Download the latest qemu.
> (2) Extend the time window between qemu_savevm_state_complete and
> migrate_fd_cleanup (where qemu_fclose(s->file) will be called) with a
> patch like the one below; this increases the chance of receiving
> packets inside the time window:
>
> diff --git a/migration.c b/migration.c
> index 8d675b3..597abf9 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -614,7 +614,7 @@ static void *migration_thread(void *opaque)
>                  qemu_savevm_state_complete(s->file);
>              }
>              qemu_mutex_unlock_iothread();
> -
> +            sleep(2); /* extend the time window between stop vm and end migration */
>              if (ret < 0) {
>                  migrate_set_state(s, MIG_STATE_ACTIVE,
>                                    MIG_STATE_ERROR);
>                  break;
>
> (3) Start the VM (suse11sp1) and, inside it, run 'ping xxx -i 0.1'.
> (4) Migrate the VM to another host.
>
> The *PING* command in the VM will then very likely fail with
> 'Destination Host Unreachable', and the NIC in the VM will stay
> unavailable unless you run 'service network restart'. (Without step 2,
> you may have to migrate between the two hosts many times before this
> happens.)
>
> Thanks,
> zhanghailiang
>
> >If that's true, how does this problem happen - can the net code
> >still receive packets with the iothread lock taken?
> >qemu_savevm_state_complete does a call into the RAM code, so I think
> >this should pick up any RAM changes that happened before the lock
> >was taken.
> >
> >I'm gently worried about threading stuff as well - is all this net
> >code running off fd handlers on the main thread, or are there
> >separate threads?
> >
> >Dave
> >
> >>To avoid this, we forbid receiving packets in generic net code when
> >>the VM is not running. Also, when the runstate changes back to
> >>running, we definitely need to flush the queues to get packets
> >>flowing again.
> >>
> >>Here we implement this in the net layer:
> >>(1) Check the VM runstate in qemu_can_send_packet.
> >>(2) Add a member 'VMChangeStateEntry *vmstate' to struct NICState,
> >>    which will listen for VM runstate changes.
> >>(3) Register a handler function for VM state changes. When the VM
> >>    changes back to running, we flush all queues in the callback
> >>    function.
> >>(4) Remove checking vm state in virtio_net_can_receive.
> >>
> >>Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
> >>---
> >> hw/net/virtio-net.c |  4 ----
> >> include/net/net.h   |  2 ++
> >> net/net.c           | 32 ++++++++++++++++++++++++++++++++
> >> 3 files changed, 34 insertions(+), 4 deletions(-)
> >>
> >>diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>index 268eff9..287d762 100644
> >>--- a/hw/net/virtio-net.c
> >>+++ b/hw/net/virtio-net.c
> >>@@ -839,10 +839,6 @@ static int virtio_net_can_receive(NetClientState *nc)
> >>     VirtIODevice *vdev = VIRTIO_DEVICE(n);
> >>     VirtIONetQueue *q = virtio_net_get_subqueue(nc);
> >>
> >>-    if (!vdev->vm_running) {
> >>-        return 0;
> >>-    }
> >>-
> >>     if (nc->queue_index >= n->curr_queues) {
> >>         return 0;
> >>     }
> >>diff --git a/include/net/net.h b/include/net/net.h
> >>index ed594f9..a294277 100644
> >>--- a/include/net/net.h
> >>+++ b/include/net/net.h
> >>@@ -8,6 +8,7 @@
> >> #include "net/queue.h"
> >> #include "migration/vmstate.h"
> >> #include "qapi-types.h"
> >>+#include "sysemu/sysemu.h"
> >>
> >> #define MAX_QUEUE_NUM 1024
> >>
> >>@@ -96,6 +97,7 @@ typedef struct NICState {
> >>     NICConf *conf;
> >>     void *opaque;
> >>     bool peer_deleted;
> >>+    VMChangeStateEntry *vmstate;
> >> } NICState;
> >>
> >> NetClientState *qemu_find_netdev(const char *id);
> >>diff --git a/net/net.c b/net/net.c
> >>index 6d930ea..21f0d48 100644
> >>--- a/net/net.c
> >>+++ b/net/net.c
> >>@@ -242,6 +242,29 @@ NetClientState *qemu_new_net_client(NetClientInfo *info,
> >>     return nc;
> >> }
> >>
> >>+static void nic_vmstate_change_handler(void *opaque,
> >>+                                       int running,
> >>+                                       RunState state)
> >>+{
> >>+    NICState *nic = opaque;
> >>+    NetClientState *nc;
> >>+    int i, queues;
> >>+
> >>+    if (!running) {
> >>+        return;
> >>+    }
> >>+
> >>+    queues = MAX(1, nic->conf->peers.queues);
> >>+    for (i = 0; i < queues; i++) {
> >>+        nc = &nic->ncs[i];
> >>+        if (nc->receive_disabled
> >>+            || (nc->info->can_receive && !nc->info->can_receive(nc))) {
> >>+            continue;
> >>+        }
> >>+        qemu_flush_queued_packets(nc);
> >>+    }
> >>+}
> >>+
> >> NICState *qemu_new_nic(NetClientInfo *info,
> >>                        NICConf *conf,
> >>                        const char *model,
> >>@@ -259,6 +282,8 @@ NICState *qemu_new_nic(NetClientInfo *info,
> >>     nic->ncs = (void *)nic + info->size;
> >>     nic->conf = conf;
> >>     nic->opaque = opaque;
> >>+    nic->vmstate = qemu_add_vm_change_state_handler(nic_vmstate_change_handler,
> >>+                                                    nic);
> >>
> >>     for (i = 0; i < queues; i++) {
> >>         qemu_net_client_setup(&nic->ncs[i], info, peers[i], model, name,
> >>@@ -379,6 +404,7 @@ void qemu_del_nic(NICState *nic)
> >>         qemu_free_net_client(nc);
> >>     }
> >>
> >>+    qemu_del_vm_change_state_handler(nic->vmstate);
> >>     g_free(nic);
> >> }
> >>
> >>@@ -452,6 +478,12 @@ void qemu_set_vnet_hdr_len(NetClientState *nc, int len)
> >>
> >> int qemu_can_send_packet(NetClientState *sender)
> >> {
> >>+    int vmstat = runstate_is_running();
> >>+
> >>+    if (!vmstat) {
> >>+        return 0;
> >>+    }
> >>+
> >>     if (!sender->peer) {
> >>         return 1;
> >>     }
> >>--
> >>1.7.12.4
> >>
> >--
> >Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> >
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
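Boiled down, the pattern the patch applies in the generic net layer can
be shown as a self-contained sketch (again, not QEMU code; the names
below are invented, while in QEMU the real queueing lives in
net/queue.c and is drained by qemu_flush_queued_packets): delivery is
refused while the run state says "stopped", and a run-state callback
flushes whatever queued up once the VM resumes.

/*
 * Standalone model of the "gate on runstate, flush on resume" pattern.
 * Build with: cc -std=c99 -o gate gate.c
 */
#include <stdbool.h>
#include <stdio.h>

#define MAX_QUEUED 16

static bool vm_running;
static const char *queued[MAX_QUEUED];
static int num_queued;

/* Counterpart of qemu_can_send_packet(): refuse delivery while stopped. */
static bool can_deliver(void)
{
    return vm_running;
}

static void deliver(const char *pkt)
{
    printf("delivered: %s\n", pkt);
}

/* Incoming packet: deliver now, or park it until the VM runs again. */
static void receive(const char *pkt)
{
    if (can_deliver()) {
        deliver(pkt);
    } else if (num_queued < MAX_QUEUED) {
        queued[num_queued++] = pkt;     /* nothing touches "guest RAM" */
    }
}

/* Counterpart of nic_vmstate_change_handler(): only act on resume. */
static void vmstate_changed(bool running)
{
    vm_running = running;
    if (!running) {
        return;
    }
    for (int i = 0; i < num_queued; i++) {
        deliver(queued[i]);             /* flush what queued up while stopped */
    }
    num_queued = 0;
}

int main(void)
{
    vmstate_changed(false);     /* VM paused for the final migration stage */
    receive("arp reply");       /* parked instead of dirtying guest memory */
    vmstate_changed(true);      /* VM resumes: the queue is flushed */
    receive("ping reply");      /* delivered immediately */
    return 0;
}

Doing the check once in qemu_can_send_packet and the flush once in a
VM-state handler covers every emulated NIC at the same time, instead of
patching e1000, rtl8139, pcnet and ne2k_pci individually.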