Hi Wolfgang, I've done more tests with drive-mirror and an NBD target.
It seems the hang only occurs if the target IP is unreachable (no network response):

# drive_mirror -n drive-scsi0 nbd://66.66.66.66/target
ERROR: VM 183 qmp command 'human-monitor-command' failed - got timeout

If the IP address exists and is up:

# drive_mirror -n drive-scsi0 nbd://10.3.94.89:666/target
Failed to connect socket: Connection refused

I'm not sure, but it may also hang if pve-firewall does a DROP instead of a REJECT on the target port.

I think this comes from qemu's net/socket.c, where we have an infinite loop. I'm not sure how to add a timeout here, help is welcome :)

static int net_socket_connect_init(NetClientState *peer,
                                   const char *model,
                                   const char *name,
                                   const char *host_str)
{
    NetSocketState *s;
    int fd, connected, ret;
    struct sockaddr_in saddr;

    if (parse_host_port(&saddr, host_str) < 0)
        return -1;

    fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }
    qemu_set_nonblock(fd);

    connected = 0;
    for (;;) {
        ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr));
        if (ret < 0) {
            if (errno == EINTR || errno == EWOULDBLOCK) {
                /* continue */
            } else if (errno == EINPROGRESS ||
                       errno == EALREADY ||
                       errno == EINVAL) {
                break;
            } else {
                perror("connect");
                closesocket(fd);
                return -1;
            }
        } else {
            connected = 1;
            break;
        }
    }
    s = net_socket_fd_init(peer, model, name, fd, connected);
    if (!s)
        return -1;
    snprintf(s->nc.info_str, sizeof(s->nc.info_str),
             "socket: connect to %s:%d",
             inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
    return 0;
}

----- Original Message -----
From: "aderumier" <aderum...@odiso.com>
To: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Wednesday, December 21, 2016 13:57:10
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

>> Then it can still hang if the destination disappears between tcp_ping()
>> and the `drive-mirror` command, so I'd rather get better behavior on
>> qemu's side. It needs a time-out or a way to cancel it or something.

Yes, sure!
I'm currently looking at the qemu code to see how the nbd client works.

----- Original Message -----
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>, "dietmar" <diet...@proxmox.com>
Sent: Wednesday, December 21, 2016 12:20:28
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

> On December 21, 2016 at 10:51 AM Alexandre DERUMIER <aderum...@odiso.com> wrote:
>
> >> IIRC that was the only blocker.
> >>
> >> Basically the patchset has to work *without* tcp_ping() since it is an
> >> unreliable check, and then we still have to catch failing connections
> >> _correctly_. (There's no point in knowing that "some time in the past
> >> you were able to connect to something which may or may not have been a
> >> qemu nbd server", we need to know whether the drive-mirror job itself
> >> was able to connect.)
>
> For me, the mirror job auto-aborts if the connection fails during the
> migration. Do you see another behaviour?

That covers one problem. IIRC the disk-deletion problem was that due to
wrong [] usage around an ipv6 address it could not connect in the first
place and didn't error as I would have hoped.

> the tcp_ping was just before launching the drive mirror command, because
> it was hanging in this case.

Then it can still hang if the destination disappears between tcp_ping()
and the `drive-mirror` command, so I'd rather get better behavior on
qemu's side. It needs a time-out or a way to cancel it or something.

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel