Just has done small test with a socat tunnel
socat UNIX-LISTEN:/tmp/foo.sock,reuseaddr,fork TCP:8.8.8.8:1234
drive_mirror -n drive-scsi0 nbd:unix:/tmp/foo.sock:test
then it's hanging like before, because no reponse. (we can make a small timeout
to qmp response wait)
then,we can kill socat, which is dropping the unix socket, and drive-mirror
abort and freeing qmp commands.
I'll try to send a demo patch monday.
- Mail original -
De: "aderumier"
À: "Wolfgang Bumiller"
Cc: "pve-devel"
Envoyé: Vendredi 23 Décembre 2016 11:33:18
Objet: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
>>Doesn't the job show up in query-block-jobs before the connection
>>finishes? If so then we could just assume if it's still at 0% after a
>>timeout the connection doesn't work? The question is though, what is a
>>good timeout?
the problem is that the qemu monitor is totally hanging...
>>Or we find (or add) a way to query the connection status, then we can
>>poll that before we start polling the block-job status.
>>(we already poll for stuff, so it's not worse than before ;p)
maybe can we mount a tunnel (it's possible to use unix socket like for live
migration).
but I would like to have something without ssh if possible.
maybe with socat or something like (it's possible to do native tls in qemu now
for migration and drive mirror)
Like this we can be sure that the tunnel is established before launching
drive-mirror ?
But yes, it could be great to avoid to maintain another qemu patch.
(and I'm pretty poor with C language and network programming)
- Mail original -
De: "Wolfgang Bumiller"
À: "aderumier"
Cc: "pve-devel"
Envoyé: Vendredi 23 Décembre 2016 10:53:12
Objet: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
On Fri, Dec 23, 2016 at 10:20:03AM +0100, Alexandre DERUMIER wrote:
> Hi wolfgang,
>
> I have done more test with drive mirror and nbd target.
>
> It seem that the hang occur only if the target ip is unreacheable (no network
> reponse)
>
> # drive_mirror -n drive-scsi0 nbd://66.66.66.66/target
> ERROR: VM 183 qmp command 'human-monitor-command' failed - got timeout
>
> If the ip address exist and up,
> # drive_mirror -n drive-scsi0 nbd://10.3.94.89:666/target
> Failed to connect socket: Connection refused
>
>
>
> I'm not sure, maybe it can hang too if pve-firewall do a drop instead a
> reject on target port.
>
>
>
> I think this come from in qemu net/socket.c,
>
> where we have an infinite loop.
>
> I'm not sure how to add a timeout here, help is welcome :)
Doesn't the job show up in query-block-jobs before the connection
finishes? If so then we could just assume if it's still at 0% after a
timeout the connection doesn't work? The question is though, what is a
good timeout?
Or we find (or add) a way to query the connection status, then we can
poll that before we start polling the block-job status.
(we already poll for stuff, so it's not worse than before ;p)
Then again, an actual timeout on qemu's side might even make it
upstream...? Maybe? (If it doesn't I'd prefer polling as described above
over maintaining yet another qemu patch ;-) )
But it won't be a simple select/poll timeout as they use non-blocking
sockets and register the fd with a callback for the connection
completion later (in net_socket_fd_init() it calls
qemu_set_fd_handler() where it ends up controlled by the aio handlers).
It might be possible to add a timer (qemu_timer_new() & friends
perhaps?) to NetSocketState (the NetSocktState cleanup would have to
remove it, and if it gets triggered it cancels the connection instead).
> static int net_socket_connect_init(NetClientState *peer,
> const char *model,
> const char *name,
> const char *host_str)
> {
> NetSocketState *s;
> int fd, connected, ret;
> struct sockaddr_in saddr;
>
> if (parse_host_port(, host_str) < 0)
> return -1;
>
> fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
> if (fd < 0) {
> perror("socket");
> return -1;
> }
> qemu_set_nonblock(fd);
>
> connected = 0;
> for(;;) {
> ret = connect(fd, (struct sockaddr *), sizeof(saddr));
> if (ret < 0) {
> if (errno == EINTR || errno == EWOULDBLOCK) {
> /* continue */
> } else if (errno == EINPROGRESS ||
> errno == EALREADY ||
> errno == EINVAL) {
> break;
> } else {
> perror("connect");
> closesocket(fd);
> return -1;
> }
> } else {
> connected = 1;
> break;
> }
> }
> s = net_socket_fd_init(peer, model, name, fd, connected);
> if (!s)
> return -1;
> snprintf(s->nc.info_str, sizeof(s->nc.info_str),
> "socket: connect to %s:%d",
> inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
> return 0;
> }
>
>
>
>
> - Mail original -
> De: "aderumier"
> À: "Wolfgang Bumiller"