Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-12-26 Thread Alexandre DERUMIER
>>it has a `connect-timeout` option (which is almost surprising). 

Oh, works even better.



#socat UNIX-LISTEN:/tmp/foo.sock  TCP:8.8.8.8:1234,connect-timeout=1


# drive_mirror -n drive-scsi0 nbd:unix:/tmp/foo.sock
Failed to read data

(and socat is killed)
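
A rough sketch of how that could be wired into the PVE side (hypothetical socket
path and variable names like $server, $port and $exportname; untested):

# relay the NBD connection through a local unix socket, so a dead target makes
# socat give up (connect-timeout) and the mirror job abort instead of hanging
my $sock = "/run/qemu-server/$vmid.mirror-$drive.sock";   # hypothetical path

my $socat_pid = fork() // die "fork failed: $!\n";
if ($socat_pid == 0) {
    exec('socat', "UNIX-LISTEN:$sock,unlink-early",
         "TCP:$server:$port,connect-timeout=5");
    exit(1);   # only reached if exec fails
}

# point drive-mirror at the unix socket instead of the remote tcp address
my $opts = { timeout => 10, device => "drive-$drive", mode => 'existing',
             sync => 'full', target => "nbd:unix:$sock:exportname=$exportname" };
eval { vm_mon_cmd($vmid, 'drive-mirror', %$opts); };
if (my $err = $@) {
    kill 'TERM', $socat_pid;   # drop the socket so a stuck job gets aborted
    die "mirroring error: $err";
}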

----- Original Message -----
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>, "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Sunday, December 25, 2016 20:00:07
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

> On December 24, 2016 at 10:12 AM Alexandre DERUMIER <aderum...@odiso.com> 
> wrote: 
> 
> 
> I just did a small test with a socat tunnel 
> 
> 
> 
> socat UNIX-LISTEN:/tmp/foo.sock,reuseaddr,fork TCP:8.8.8.8:1234 
> 
> drive_mirror -n drive-scsi0 nbd:unix:/tmp/foo.sock:test 
> 
> 
> then it hangs like before, because there is no response (we could add a small 
> timeout to the qmp response wait) 
> 
> then we can kill socat, which drops the unix socket, and drive-mirror 
> aborts, freeing the qmp commands. 

A nasty trick, but I suppose it gets the job done for now (until someone 
sends a patch upstream). Btw. before you go around killing socats: it has 
a `connect-timeout` option (which is almost surprising). 



Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-12-25 Thread Wolfgang Bumiller
> On December 24, 2016 at 10:12 AM Alexandre DERUMIER  
> wrote:
> 
> 
> I just did a small test with a socat tunnel
> 
> 
> 
> socat UNIX-LISTEN:/tmp/foo.sock,reuseaddr,fork  TCP:8.8.8.8:1234
> 
> drive_mirror -n drive-scsi0 nbd:unix:/tmp/foo.sock:test
> 
> 
> then it hangs like before, because there is no response (we could add a small 
> timeout to the qmp response wait)
> 
> then we can kill socat, which drops the unix socket, and drive-mirror 
> aborts, freeing the qmp commands.

A nasty trick, but I suppose it gets the job done for now (until someone
sends a patch upstream). Btw. before you go around killing socats: it has
a `connect-timeout` option (which is almost surprising).



Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-12-24 Thread Alexandre DERUMIER
I just did a small test with a socat tunnel



socat UNIX-LISTEN:/tmp/foo.sock,reuseaddr,fork  TCP:8.8.8.8:1234

drive_mirror -n drive-scsi0 nbd:unix:/tmp/foo.sock:test


then it hangs like before, because there is no response (we could add a small timeout 
to the qmp response wait)

then we can kill socat, which drops the unix socket, and drive-mirror 
aborts, freeing the qmp commands.


I'll try to send a demo patch on Monday.


----- Original Message -----
From: "aderumier" <aderum...@odiso.com>
To: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Friday, December 23, 2016 11:33:18
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

>>Doesn't the job show up in query-block-jobs before the connection 
>>finishes? If so then we could just assume if it's still at 0% after a 
>>timeout the connection doesn't work? The question is though, what is a 
>>good timeout? 

the problem is that the qemu monitor is totally hanging... 


>>Or we find (or add) a way to query the connection status, then we can 
>>poll that before we start polling the block-job status. 
>>(we already poll for stuff, so it's not worse than before ;p) 

maybe we could set up a tunnel (it's possible to use a unix socket, like for live 
migration), 
but I would like to have something without ssh if possible, 
maybe with socat or something similar (it's possible to do native tls in qemu now 
for migration and drive mirror). 
That way we can be sure that the tunnel is established before launching 
drive-mirror? 


But yes, it would be great to avoid maintaining another qemu patch 
(and I'm pretty poor at C and network programming). 


----- Original Message ----- 
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com> 
To: "aderumier" <aderum...@odiso.com> 
Cc: "pve-devel" <pve-devel@pve.proxmox.com> 
Sent: Friday, December 23, 2016 10:53:12 
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs 

On Fri, Dec 23, 2016 at 10:20:03AM +0100, Alexandre DERUMIER wrote: 
> Hi wolfgang, 
> 
> I have done more test with drive mirror and nbd target. 
> 
> It seems that the hang occurs only if the target IP is unreachable (no network 
> response) 
> 
> # drive_mirror -n drive-scsi0 nbd://66.66.66.66/target 
> ERROR: VM 183 qmp command 'human-monitor-command' failed - got timeout 
> 
> If the IP address exists and is up, 
> # drive_mirror -n drive-scsi0 nbd://10.3.94.89:666/target 
> Failed to connect socket: Connection refused 
> 
> 
> 
> I'm not sure, maybe it can hang too if pve-firewall does a drop instead of a 
> reject on the target port. 
> 
> 
> 
> I think this comes from qemu net/socket.c, 
> 
> where we have an infinite loop. 
> 
> I'm not sure how to add a timeout here, help is welcome :) 

Doesn't the job show up in query-block-jobs before the connection 
finishes? If so then we could just assume if it's still at 0% after a 
timeout the connection doesn't work? The question is though, what is a 
good timeout? 

Or we find (or add) a way to query the connection status, then we can 
poll that before we start polling the block-job status. 
(we already poll for stuff, so it's not worse than before ;p) 

Then again, an actual timeout on qemu's side might even make it 
upstream...? Maybe? (If it doesn't I'd prefer polling as described above 
over maintaining yet another qemu patch ;-) ) 

But it won't be a simple select/poll timeout as they use non-blocking 
sockets and register the fd with a callback for the connection 
completion later (in net_socket_fd_init() it calls 
qemu_set_fd_handler() where it ends up controlled by the aio handlers). 

It might be possible to add a timer (qemu_timer_new() & friends 
perhaps?) to NetSocketState (the NetSocketState cleanup would have to 
remove it, and if it gets triggered it cancels the connection instead). 

> static int net_socket_connect_init(NetClientState *peer, 
> const char *model, 
> const char *name, 
> const char *host_str) 
> { 
> NetSocketState *s; 
> int fd, connected, ret; 
> struct sockaddr_in saddr; 
> 
> if (parse_host_port(&saddr, host_str) < 0) 
> return -1; 
> 
> fd = qemu_socket(PF_INET, SOCK_STREAM, 0); 
> if (fd < 0) { 
> perror("socket"); 
> return -1; 
> } 
> qemu_set_nonblock(fd); 
> 
> connected = 0; 
> for(;;) { 
> ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr)); 
> if (ret < 0) { 
> if (errno == EINTR || errno == EWOULDBLOCK) { 
> /* continue */ 
> } else if (errno == EINPROGRESS || 
> errno == EALREADY || 
> errno == EINVAL) { 
> break; 
> } else { 
> perror("connect"); 
> closesocket(fd); 
> return -1; 
> } 
> } else { 
> connected = 1; 
> break; 
> }

Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-12-23 Thread Alexandre DERUMIER
>>Doesn't the job show up in query-block-jobs before the connection
>>finishes? If so then we could just assume if it's still at 0% after a
>>timeout the connection doesn't work? The question is though, what is a
>>good timeout?

the problem is that the qemu monitor is totally hanging...


>>Or we find (or add) a way to query the connection status, then we can 
>>poll that before we start polling the block-job status. 
>>(we already poll for stuff, so it's not worse than before ;p) 

maybe we could set up a tunnel (it's possible to use a unix socket, like for live 
migration),
but I would like to have something without ssh if possible,
maybe with socat or something similar (it's possible to do native tls in qemu now 
for migration and drive mirror).
That way we can be sure that the tunnel is established before launching 
drive-mirror?


But yes, it would be great to avoid maintaining another qemu patch
(and I'm pretty poor at C and network programming).


----- Original Message -----
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Friday, December 23, 2016 10:53:12
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

On Fri, Dec 23, 2016 at 10:20:03AM +0100, Alexandre DERUMIER wrote: 
> Hi wolfgang, 
> 
> I have done more test with drive mirror and nbd target. 
> 
> It seems that the hang occurs only if the target IP is unreachable (no network 
> response) 
> 
> # drive_mirror -n drive-scsi0 nbd://66.66.66.66/target 
> ERROR: VM 183 qmp command 'human-monitor-command' failed - got timeout 
> 
> If the IP address exists and is up, 
> # drive_mirror -n drive-scsi0 nbd://10.3.94.89:666/target 
> Failed to connect socket: Connection refused 
> 
> 
> 
> I'm not sure, maybe it can hang too if pve-firewall does a drop instead of a 
> reject on the target port. 
> 
> 
> 
> I think this comes from qemu net/socket.c, 
> 
> where we have an infinite loop. 
> 
> I'm not sure how to add a timeout here, help is welcome :) 

Doesn't the job show up in query-block-jobs before the connection 
finishes? If so then we could just assume if it's still at 0% after a 
timeout the connection doesn't work? The question is though, what is a 
good timeout? 

Or we find (or add) a way to query the connection status, then we can 
poll that before we start polling the block-job status. 
(we already poll for stuff, so it's not worse than before ;p) 

Then again, an actual timeout on qemu's side might even make it 
upstream...? Maybe? (If it doesn't I'd prefer polling as described above 
over maintaining yet another qemu patch ;-) ) 

But it won't be a simple select/poll timeout as they use non-blocking 
sockets and register the fd with a callback for the connection 
completion later (in net_socket_fd_init() it calls 
qemu_set_fd_handler() where it ends up controlled by the aio handlers). 

It might be possible to add a timer (qemu_timer_new() & friends 
perhaps?) to NetSocketState (the NetSocketState cleanup would have to 
remove it, and if it gets triggered it cancels the connection instead). 

> static int net_socket_connect_init(NetClientState *peer, 
> const char *model, 
> const char *name, 
> const char *host_str) 
> { 
> NetSocketState *s; 
> int fd, connected, ret; 
> struct sockaddr_in saddr; 
> 
> if (parse_host_port(&saddr, host_str) < 0) 
> return -1; 
> 
> fd = qemu_socket(PF_INET, SOCK_STREAM, 0); 
> if (fd < 0) { 
> perror("socket"); 
> return -1; 
> } 
> qemu_set_nonblock(fd); 
> 
> connected = 0; 
> for(;;) { 
> ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr)); 
> if (ret < 0) { 
> if (errno == EINTR || errno == EWOULDBLOCK) { 
> /* continue */ 
> } else if (errno == EINPROGRESS || 
> errno == EALREADY || 
> errno == EINVAL) { 
> break; 
> } else { 
> perror("connect"); 
> closesocket(fd); 
> return -1; 
> } 
> } else { 
> connected = 1; 
> break; 
> } 
> } 
> s = net_socket_fd_init(peer, model, name, fd, connected); 
> if (!s) 
> return -1; 
> snprintf(s->nc.info_str, sizeof(s->nc.info_str), 
> "socket: connect to %s:%d", 
> inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port)); 
> return 0; 
> } 
> 
> 
> 
> 
> ----- Original Message ----- 
> From: "aderumier" <aderum...@odiso.com> 
> To: "Wolfgang Bumiller" <w.bumil...@proxmox.com> 
> Cc: "pve-devel" <pve-devel@pve.proxmox.com> 
> Sent: Wednesday, December 21, 2016 13:57:10 
> Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs 
> 
> >>Then it can still hang if the destination disappears between tcp_ping() 
> >

Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-12-23 Thread Wolfgang Bumiller
On Fri, Dec 23, 2016 at 10:20:03AM +0100, Alexandre DERUMIER wrote:
> Hi wolfgang,
> 
> I have done more test with drive mirror and nbd target.
> 
> It seems that the hang occurs only if the target IP is unreachable (no network 
> response) 
> 
> # drive_mirror -n drive-scsi0 nbd://66.66.66.66/target 
> ERROR: VM 183 qmp command 'human-monitor-command' failed - got timeout
> 
> If the IP address exists and is up,
> # drive_mirror -n drive-scsi0 nbd://10.3.94.89:666/target 
> Failed to connect socket: Connection refused
> 
> 
> 
> I'm not sure, maybe it can hang too if pve-firewall does a drop instead of a 
> reject on the target port.
> 
> 
> 
> I think this comes from qemu net/socket.c,
> 
> where we have an infinite loop.
> 
> I'm not sure how to add a timeout here, help is welcome :)

Doesn't the job show up in query-block-jobs before the connection
finishes? If so then we could just assume if it's still at 0% after a
timeout the connection doesn't work? The question is though, what is a
good timeout?

Or we find (or add) a way to query the connection status, then we can
poll that before we start polling the block-job status.
(we already poll for stuff, so it's not worse than before ;p)
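
A sketch of what that polling could look like on the PVE side (timeout value made
up; this only helps if the monitor itself stays responsive while the NBD client is
connecting, which is exactly what is in question in this thread):

my $timeout = 30;   # made-up value
my $connected = 0;
for (1 .. $timeout) {
    my $stats = vm_mon_cmd($vmid, "query-block-jobs");
    my ($stat) = grep { $_->{device} eq "drive-$drive" } @$stats;
    die "mirror job for drive-$drive vanished\n" if !$stat;
    if (($stat->{offset} // 0) > 0 || $stat->{ready}) {
        $connected = 1;   # some data moved, so the connection works
        last;
    }
    sleep 1;
}
if (!$connected) {
    eval { vm_mon_cmd($vmid, "block-job-cancel", device => "drive-$drive") };
    die "drive-$drive: still at 0% after ${timeout}s, assuming the NBD connection failed\n";
}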

Then again, an actual timeout on qemu's side might even make it
upstream...? Maybe? (If it doesn't I'd prefer polling as described above
over maintaining yet another qemu patch ;-) )

But it won't be a simple select/poll timeout as they use non-blocking
sockets and register the fd with a callback for the connection
completion later (in net_socket_fd_init() it calls
qemu_set_fd_handler() where it ends up controlled by the aio handlers).

It might be possible to add a timer (qemu_timer_new() & friends
perhaps?) to NetSocketState (the NetSocketState cleanup would have to
remove it, and if it gets triggered it cancels the connection instead).

> static int net_socket_connect_init(NetClientState *peer,
>const char *model,
>const char *name,
>const char *host_str)
> {
> NetSocketState *s;
> int fd, connected, ret;
> struct sockaddr_in saddr;
> 
> if (parse_host_port(&saddr, host_str) < 0)
> return -1;
> 
> fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
> if (fd < 0) {
> perror("socket");
> return -1;
> }
> qemu_set_nonblock(fd);
> 
> connected = 0;
> for(;;) {
> ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr));
> if (ret < 0) {
> if (errno == EINTR || errno == EWOULDBLOCK) {
> /* continue */
> } else if (errno == EINPROGRESS ||
>errno == EALREADY ||
>errno == EINVAL) {
> break;
> } else {
> perror("connect");
> closesocket(fd);
> return -1;
> }
> } else {
> connected = 1;
> break;
> }
> }
> s = net_socket_fd_init(peer, model, name, fd, connected);
> if (!s)
> return -1;
> snprintf(s->nc.info_str, sizeof(s->nc.info_str),
>  "socket: connect to %s:%d",
>  inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
> return 0;
> }
> 
> 
> 
> 
> ----- Original Message -----
> From: "aderumier" <aderum...@odiso.com>
> To: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
> Cc: "pve-devel" <pve-devel@pve.proxmox.com>
> Sent: Wednesday, December 21, 2016 13:57:10
> Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
> 
> >>Then it can still hang if the destination disappears between tcp_ping() 
> >>and the `drive-mirror` command, so I'd rather get better behavior on qemu's 
> >>side. It needs a time-out or a way to cancel it or something. 
> Yes sure! 
> 
> I'm currently looking at qemu code to see how nbd client works. 
> 
> ----- Original Message ----- 
> From: "Wolfgang Bumiller" <w.bumil...@proxmox.com> 
> To: "aderumier" <aderum...@odiso.com> 
> Cc: "pve-devel" <pve-devel@pve.proxmox.com>, "dietmar" <diet...@proxmox.com> 
> Sent: Wednesday, December 21, 2016 12:20:28 
> Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs 
> 
> > On December 21, 2016 at 10:51 AM Alexandre DERUMIER <aderum...@odiso.com> 
> > wrote: 
> > 
> > 
> > >>IIRC that was the only blocker. 
> > >> 
> > >>Basically the patchset has to work *without* tcp_ping() since it is an 

Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-12-23 Thread Alexandre DERUMIER
Hi wolfgang,

I have done more test with drive mirror and nbd target.

It seems that the hang occurs only if the target IP is unreachable (no network 
response) 

# drive_mirror -n drive-scsi0 nbd://66.66.66.66/target 
ERROR: VM 183 qmp command 'human-monitor-command' failed - got timeout

If the IP address exists and is up,
# drive_mirror -n drive-scsi0 nbd://10.3.94.89:666/target 
Failed to connect socket: Connection refused



I'm not sure, maybe it can hang too if pve-firewall does a drop instead of a reject 
on the target port.



I think this comes from qemu net/socket.c,

where we have an infinite loop.

I'm not sure how to add a timeout here, help is welcome :)




static int net_socket_connect_init(NetClientState *peer,
                                   const char *model,
                                   const char *name,
                                   const char *host_str)
{
    NetSocketState *s;
    int fd, connected, ret;
    struct sockaddr_in saddr;

    if (parse_host_port(&saddr, host_str) < 0)
        return -1;

    fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }
    qemu_set_nonblock(fd);

    connected = 0;
    for(;;) {
        ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr));
        if (ret < 0) {
            if (errno == EINTR || errno == EWOULDBLOCK) {
                /* continue */
            } else if (errno == EINPROGRESS ||
                       errno == EALREADY ||
                       errno == EINVAL) {
                break;
            } else {
                perror("connect");
                closesocket(fd);
                return -1;
            }
        } else {
            connected = 1;
            break;
        }
    }
    s = net_socket_fd_init(peer, model, name, fd, connected);
    if (!s)
        return -1;
    snprintf(s->nc.info_str, sizeof(s->nc.info_str),
             "socket: connect to %s:%d",
             inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
    return 0;
}




----- Original Message -----
From: "aderumier" <aderum...@odiso.com>
To: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Wednesday, December 21, 2016 13:57:10
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

>>Then it can still hang if the destination disappears between tcp_ping() 
>>and the `drive-mirror` command, so I'd rather get better behavior on qemu's 
>>side. It needs a time-out or a way to cancel it or something. 
Yes sure! 

I'm currently looking at qemu code to see how nbd client works. 

----- Original Message ----- 
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com> 
To: "aderumier" <aderum...@odiso.com> 
Cc: "pve-devel" <pve-devel@pve.proxmox.com>, "dietmar" <diet...@proxmox.com> 
Sent: Wednesday, December 21, 2016 12:20:28 
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs 

> On December 21, 2016 at 10:51 AM Alexandre DERUMIER <aderum...@odiso.com> 
> wrote: 
> 
> 
> >>IIRC that was the only blocker. 
> >> 
> >>Basically the patchset has to work *without* tcp_ping() since it is an 
> >>unreliable check, and then we still have to catch failing connections 
> >>_correctly_. (There's no point in knowing that "some time in the past 
> >>you were able to connect to something which may or may not have been a 
> >>qemu nbd server", we need to know whether the drive-mirror job itself 
> >>was able to connect.) 
> 
> For me, the mirror job auto-aborts if the connection fails during the 
> migration. Do you see another behaviour? 

That covers one problem. IIRC the disk-deletion problem was that due 
to wrong [] usage around an ipv6 address it could not connect in the 
first place and didn't error as I would have hoped. 

> 
> the tcp_ping was just before launching the drive mirror command, because it 
> was hanging in this case. 

Then it can still hang if the destination disappears between tcp_ping() 
and the `drive-mirror` command, so I'd rather get better behavior on qemu's 
side. It needs a time-out or a way to cancel it or something. 




Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-12-21 Thread Alexandre DERUMIER
>>Then it can still hang if the destination disappears between tcp_ping()
>>and the `drive-mirror` command, so I'd rather get better behavior on qemu's
>>side. It needs a time-out or a way to cancel it or something.
Yes sure!

I'm currently looking at qemu code to see how nbd client works.

----- Original Message -----
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>, "dietmar" <diet...@proxmox.com>
Sent: Wednesday, December 21, 2016 12:20:28
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

> On December 21, 2016 at 10:51 AM Alexandre DERUMIER <aderum...@odiso.com> 
> wrote: 
> 
> 
> >>IIRC that was the only blocker. 
> >> 
> >>Basically the patchset has to work *without* tcp_ping() since it is an 
> >>unreliable check, and then we still have to catch failing connections 
> >>_correctly_. (There's no point in knowing that "some time in the past 
> >>you were able to connect to something which may or may not have been a 
> >>qemu nbd server", we need to know whether the drive-mirror job itself 
> >>was able to connect.) 
> 
> For me, the mirror job auto-aborts if the connection fails during the 
> migration. Do you see another behaviour? 

That covers one problem. IIRC the disk-deletion problem was that due 
to wrong [] usage around an ipv6 address it could not connect in the 
first place and didn't error as I would have hoped. 

> 
> the tcp_ping was just before launching the drive mirror command, because it 
> was hanging in this case. 

Then it can still hang if the destination disappears between tcp_ping() 
and the `drive-mirror` command, so I'd rather get better behavior on qemu's 
side. It needs a time-out or a way to cancel it or something. 



Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-12-21 Thread Wolfgang Bumiller

> On December 21, 2016 at 10:51 AM Alexandre DERUMIER  
> wrote:
> 
> 
> >>IIRC that was the only blocker.
> >>
> >>Basically the patchset has to work *without* tcp_ping() since it is an
> >>unreliable check, and then we still have to catch failing connections
> >>_correctly_. (There's no point in knowing that "some time in the past
> >>you were able to connect to something which may or may not have been a
> >>qemu nbd server", we need to know whether the drive-mirror job itself
> >>was able to connect.)
> 
> For me, the mirror job auto-aborts if the connection fails during the 
> migration. Do you see another behaviour?

That covers one problem. IIRC the disk-deletion problem was that due
to wrong [] usage around an ipv6 address it could not connect in the
first place and didn't error as I would have hoped.
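
For reference, the nbd regex in the patch captures the brackets as part of the
server string, so anything that later does a plain connect()/tcp_ping() on it
would have to strip them first (this may or may not be the [] problem meant
here, just an observation on the regex as posted):

# the capture group includes the brackets for IPv6 targets
if ("nbd:[fc00::1]:10809:exportname=drive-scsi0" =~
        /^nbd:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)/) {
    my ($server, $port) = ($1, $2);    # $server = "[fc00::1]", $port = 10809
    (my $host = $server) =~ s/^\[(.*)\]$/$1/;   # "fc00::1" for a plain connect
}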

> 
> the tcp_ping was just before launching the drive mirror command, because it 
> was hanging in this case.

Then it can still hang if the destination disappears between tcp_ping()
and the `drive-mirror` command, so I'd rather get better behavior on qemu's
side. It needs a time-out or a way to cancel it or something.



Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-12-21 Thread Alexandre DERUMIER
I have seen a lot of improvement with nbd in qemu 2.8, maybe the connection 
hang has been fixed.

I'll try to do tests.


----- Original Message -----
From: "aderumier" <aderum...@odiso.com>
To: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Wednesday, December 21, 2016 10:51:55
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

>>IIRC that was the only blocker. 
>> 
>>Basically the patchset has to work *without* tcp_ping() since it is an 
>>unreliable check, and then we still have to catch failing connections 
>>_correctly_. (There's no point in knowing that "some time in the past 
>>you were able to connect to something which may or may not have been a 
>>qemu nbd server", we need to know whether the drive-mirror job itself 
>>was able to connect.) 

For me, the mirror job auto-aborts if the connection fails during the 
migration. Do you see another behaviour? 

the tcp_ping was just before launching the drive mirror command, because it was 
hanging in this case. 


----- Original Message ----- 
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com> 
To: "aderumier" <aderum...@odiso.com> 
Cc: "dietmar" <diet...@proxmox.com>, "pve-devel" <pve-devel@pve.proxmox.com> 
Sent: Wednesday, December 21, 2016 10:47:48 
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs 

On Mon, Dec 19, 2016 at 07:05:58AM +0100, Alexandre DERUMIER wrote: 
> >>AFAIR the patch looks already quite good. We currently prepare for 
> >>the 4.4 release, but when that is done we can start adding new features 
> >>like local disk live migration. 
> 
> Hi, 
> Now that 4.4 has been released, could it be possible to apply the live storage 
> migration patches ? 
> 
> I'll have some time until end of the year to polish them. 

So far they've been working, but IIRC there's still the issue where 
breaking/failing connections _during_ the operation and/or the block 
jobs failing to establish the connection in the first place go unnoticed, 
and we end up with possibly deleted disks instead of properly failing. 
(Which is why I said I didn't like the tcp_ping check.) 

IIRC that was the only blocker. 

Basically the patchset has to work *without* tcp_ping() since it is an 
unreliable check, and then we still have to catch failing connections 
_correctly_. (There's no point in knowing that "some time in the past 
you were able to connect to something which may or may not have been a 
qemu nbd server", we need to know whether the drive-mirror job itself 
was able to connect.) 

> Also, I would like to improve them later, to add another feature, live 
> migration to remote external proxmox cluster. (on different proxmox && 
> storage). 
> VMware has this feature in enterprise premium, and I'll need it for some 
> customers next year where I can't have downtime. 

Sounds like a piece of work ;-) 




Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-12-21 Thread Alexandre DERUMIER
>>IIRC that was the only blocker.
>>
>>Basically the patchset has to work *without* tcp_ping() since it is an
>>unreliable check, and then we still have to catch failing connections
>>_correctly_. (There's no point in knowing that "some time in the past
>>you were able to connect to something which may or may not have been a
>>qemu nbd server", we need to know whether the drive-mirror job itself
>>was able to connect.)

For me, the mirror job auto-aborts if the connection fails during the 
migration. Do you see another behaviour?

the tcp_ping was just before launching the drive mirror command, because it was 
hanging in this case.


----- Original Message -----
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>
Cc: "dietmar" <diet...@proxmox.com>, "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Wednesday, December 21, 2016 10:47:48
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

On Mon, Dec 19, 2016 at 07:05:58AM +0100, Alexandre DERUMIER wrote: 
> >>AFAIR the patch looks already quite good. We currently prepare for 
> >>the 4.4 release, but when that is done we can start adding new features 
> >>like local disk live migration. 
> 
> Hi, 
> Now that 4.4 has been released, could it be possible to apply the live storage 
> migration patches ? 
> 
> I'll have some time until end of the year to polish them. 

So far they've been working, but IIRC there's still the issue where 
breaking/failing connections _during_ the operation and/or the block 
jobs failing to establish the connection in the first place go unnoticed, 
and we end up with possibly deleted disks instead of properly failing. 
(Which is why I said I didn't like the tcp_ping check.) 

IIRC that was the only blocker. 

Basically the patchset has to work *without* tcp_ping() since it is an 
unreliable check, and then we still have to catch failing connections 
_correctly_. (There's no point in knowing that "some time in the past 
you were able to connect to something which may or may not have been a 
qemu nbd server", we need to know whether the drive-mirror job itself 
was able to connect.) 

> Also, I would like to improve them later, to add another feature, live 
> migration to remote external proxmox cluster. (on different proxmox && 
> storage). 
> VMware has this feature in enterprise premium, and I'll need it for some 
> customers next year where I can't have downtime. 

Sounds like a piece of work ;-) 



Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-12-21 Thread Wolfgang Bumiller
On Mon, Dec 19, 2016 at 07:05:58AM +0100, Alexandre DERUMIER wrote:
> >>AFAIR the patch looks already quite good. We currently prepare for 
> >>the 4.4 release, but when that is done we can start adding new features 
> >>like local disk live migration. 
> 
> Hi,
> Now that 4.4 has been released, could it be possible to apply the live storage 
> migration patches ?
> 
> I'll have some time until end of the year to polish them.

So far they've been working, but IIRC there's still the issue where
breaking/failing connections _during_ the operation and/or the block
jobs failing to establish the connection in the first place go unnoticed,
and we end up with possibly deleted disks instead of properly failing.
(Which is why I said I didn't like the tcp_ping check.)

IIRC that was the only blocker.

Basically the patchset has to work *without* tcp_ping() since it is an
unreliable check, and then we still have to catch failing connections
_correctly_. (There's no point in knowing that "some time in the past
you were able to connect to something which may or may not have been a
qemu nbd server", we need to know whether the drive-mirror job itself
was able to connect.)

> Also, I would like to improve them later, to add another feature, live 
> migration to remote external proxmox cluster. (on different proxmox && 
> storage).
> VMware has this feature in enterprise premium, and I'll need it for some 
> customers next year where I can't have downtime.

Sounds like a piece of work ;-)



Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-12-18 Thread Alexandre DERUMIER
>>AFAIR the patch looks already quite good. We currently prepare for 
>>the 4.4 release, but when that is done we can start adding new features 
>>like local disk live migration. 

Hi,
Now that 4.4 has been released, could it be possible to apply the live storage 
migration patches ?

I'll have some time until end of the year to polish them.


Also, I would like to improve them later, to add another feature, live 
migration to remote external proxmox cluster. (on different proxmox && storage).
VMware has this feature in enterprise premium, and I'll need it for some 
customers next year where I can't have downtime.


----- Original Message -----
From: "dietmar" <diet...@proxmox.com>
To: "Alexandre Derumier" <aderum...@odiso.com>, "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Thursday, December 1, 2016 16:30:22
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

> Any news for local disk live migration ? 
> 
> I have time to work on it this month if cleanup is needed, but I'll be very 
> busy at the beginning of next year. 

AFAIR the patch looks already quite good. We currently prepare for 
the 4.4 release, but when that is done we can start adding new features 
like local disk live migration. 



Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-12-01 Thread Alexandre DERUMIER
>>We currently prepare for 
>>the 4.4 release, but when that is done we can start adding new features 
>>like local disk live migration. 

Ok, great!


----- Original Message -----
From: "dietmar" <diet...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>, "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Thursday, December 1, 2016 16:30:22
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

> Any news for local disk live migration ? 
> 
> I have time to work on it this month if cleanup is needed, but I'll be very 
> busy at the beginning of next year. 

AFAIR the patch looks already quite good. We currently prepare for 
the 4.4 release, but when that is done we can start adding new features 
like local disk live migration. 



Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-12-01 Thread Dietmar Maurer
> Any news for local disk live migration ?
> 
> I have time to work on it this month if cleanup is needed, but I'll be very
> busy at the beginning of next year.

AFAIR the patch looks already quite good. We currently prepare for
the 4.4 release, but when that is done we can start adding new features
like local disk live migration.



Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-12-01 Thread Alexandre DERUMIER
Any news for local disk live migration ?

I have time to work on it this month if cleanup is needed, but I'll be very 
busy at the beginning of next year.

----- Original Message -----
From: "aderumier" <aderum...@odiso.com>
To: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Wednesday, November 16, 2016 12:10:05
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

Wolfgang, any comment ? 

----- Original Message ----- 
From: "aderumier" <aderum...@odiso.com> 
To: "Wolfgang Bumiller" <w.bumil...@proxmox.com> 
Cc: "pve-devel" <pve-devel@pve.proxmox.com> 
Sent: Thursday, November 10, 2016 15:47:47 
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs 

>>Well, yes, but what happens when the connection fails or gets interrupted? 
>>A vanishing connection (timeout) as well as when the connection gets killed 
>>for some reason (eg. tcpkill) need to be handled in the 
>>qemu_drive_mirror_monitor() functions properly. 

a running mirroring job auto-aborts if the network fails (or on any write/read error on 
source/destination). 


This check was more for the connection: 
the drive-mirror hangs when trying to establish the connection if the target is not 
responding 
(and I'm not sure whether there is a timeout, as I have waited some minutes 
without a response). 



----- Original Message ----- 
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com> 
To: "aderumier" <aderum...@odiso.com> 
Cc: "pve-devel" <pve-devel@pve.proxmox.com> 
Sent: Thursday, November 10, 2016 14:43:15 
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs 

> On November 10, 2016 at 1:54 PM Alexandre DERUMIER <aderum...@odiso.com> 
> wrote: 
> 
> 
> > + die "can't connect remote nbd server $server:$port" if 
> > !PVE::Network::tcp_ping($server, $port, 2); 
> >>I'm not all too happy about this check here as it is a TOCTOU race. 
> >>(I'm not happy about some of the other uses of it as well, but some are 
> >>only for status queries, iow. to display information, where it's fine.) 
> 
> >>However, in this case, if broken/missing connections can still not be 
> >>caught (like in my previous tests), then this only hides 99.99% of the 
> >>cases while still wrongly deleting disks in the other 0.01%, which is 
> >>unacceptable behavior. 
> 
> So, do you want me to remove the check? 

Well, yes, but what happens when the connection fails or gets interrupted? 
A vanishing connection (timeout) as well as when the connection gets killed 
for some reason (eg. tcpkill) need to be handled in the 
qemu_drive_mirror_monitor() functions properly. 

> 
> 
> >>$jobs is still empty at this point. The assignment below should be moved 
> >>up. 
> 
> > + die "mirroring error: $err"; 
> > + } 
> > + 
> > + $jobs->{"drive-$drive"} = {}; 
> >>This one ^ 
> 
> Ok. 
> 
> Thanks for the review! 
> 
> 
> 
> ----- Original Message ----- 
> From: "Wolfgang Bumiller" <w.bumil...@proxmox.com> 
> To: "aderumier" <aderum...@odiso.com> 
> Cc: "pve-devel" <pve-devel@pve.proxmox.com> 
> Sent: Thursday, November 10, 2016 13:21:00 
> Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs 
> 
> On Tue, Nov 08, 2016 at 04:29:30AM +0100, Alexandre Derumier wrote: 
> > we can use multiple drive_mirror in parralel. 
> > 
> > block-job-complete can be skipped, if we want to add more mirror job later. 
> > 
> > also add support for nbd uri to qemu_drive_mirror 
> > 
> > Signed-off-by: Alexandre Derumier <aderum...@odiso.com> 
> > --- 
> > PVE/QemuServer.pm | 171 
> > +++--- 
> > 1 file changed, 123 insertions(+), 48 deletions(-) 
> > 
> > diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm 
> > index 54edd96..e989670 100644 
> > --- a/PVE/QemuServer.pm 
> > +++ b/PVE/QemuServer.pm 
> > @@ -5824,91 +5824,165 @@ sub qemu_img_format { 
> > } 
> > 
> > sub qemu_drive_mirror { 
> > - my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized) = @_; 
> > + my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, 
> > $skipcomplete) = @_; 
> > 
> > - my $storecfg = PVE::Storage::config(); 
> > - my ($dst_storeid, $dst_volname) = 
> > PVE::Storage::parse_volume_id($dst_volid); 
> > + $jobs = {} if !$jobs; 
> > + 
> > + my $qemu_target; 
> > + my $format; 
> > 
> > - my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid); 
> > + if($dst_volid =~ 

Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-11-16 Thread Alexandre DERUMIER
Wolfgang, any comment ?

----- Original Message -----
From: "aderumier" <aderum...@odiso.com>
To: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Thursday, November 10, 2016 15:47:47
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

>>Well, yes, but what happens when the connection fails or gets interrupted? 
>>A vanishing connection (timeout) as well as when the connection gets killed 
>>for some reason (eg. tcpkill) need to be handled in the 
>>qemu_drive_mirror_monitor() functions properly. 

a running mirroring job auto-aborts if the network fails (or on any write/read error on 
source/destination). 


This check was more for the connection: 
the drive-mirror hangs when trying to establish the connection if the target is not 
responding 
(and I'm not sure whether there is a timeout, as I have waited some minutes 
without a response). 



----- Original Message ----- 
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com> 
To: "aderumier" <aderum...@odiso.com> 
Cc: "pve-devel" <pve-devel@pve.proxmox.com> 
Sent: Thursday, November 10, 2016 14:43:15 
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs 

> On November 10, 2016 at 1:54 PM Alexandre DERUMIER <aderum...@odiso.com> 
> wrote: 
> 
> 
> > + die "can't connect remote nbd server $server:$port" if 
> > !PVE::Network::tcp_ping($server, $port, 2); 
> >>I'm not all too happy about this check here as it is a TOCTOU race. 
> >>(I'm not happy about some of the other uses of it as well, but some are 
> >>only for status queries, iow. to display information, where it's fine.) 
> 
> >>However, in this case, if broken/missing connections can still not be 
> >>caught (like in my previous tests), then this only hides 99.99% of the 
> >>cases while still wrongly deleting disks in the other 0.01%, which is 
> >>unacceptable behavior. 
> 
> So, do you want me to remove the check? 

Well, yes, but what happens when the connection fails or gets interrupted? 
A vanishing connection (timeout) as well as when the connection gets killed 
for some reason (eg. tcpkill) need to be handled in the 
qemu_drive_mirror_monitor() functions properly. 

> 
> 
> >>$jobs is still empty at this point. The assignment below should be moved 
> >>up. 
> 
> > + die "mirroring error: $err"; 
> > + } 
> > + 
> > + $jobs->{"drive-$drive"} = {}; 
> >>This one ^ 
> 
> Ok. 
> 
> Thanks for the review! 
> 
> 
> 
> ----- Original Message ----- 
> From: "Wolfgang Bumiller" <w.bumil...@proxmox.com> 
> To: "aderumier" <aderum...@odiso.com> 
> Cc: "pve-devel" <pve-devel@pve.proxmox.com> 
> Sent: Thursday, November 10, 2016 13:21:00 
> Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs 
> 
> On Tue, Nov 08, 2016 at 04:29:30AM +0100, Alexandre Derumier wrote: 
> > we can use multiple drive_mirror in parralel. 
> > 
> > block-job-complete can be skipped, if we want to add more mirror job later. 
> > 
> > also add support for nbd uri to qemu_drive_mirror 
> > 
> > Signed-off-by: Alexandre Derumier <aderum...@odiso.com> 
> > --- 
> > PVE/QemuServer.pm | 171 
> > +++--- 
> > 1 file changed, 123 insertions(+), 48 deletions(-) 
> > 
> > diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm 
> > index 54edd96..e989670 100644 
> > --- a/PVE/QemuServer.pm 
> > +++ b/PVE/QemuServer.pm 
> > @@ -5824,91 +5824,165 @@ sub qemu_img_format { 
> > } 
> > 
> > sub qemu_drive_mirror { 
> > - my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized) = @_; 
> > + my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, 
> > $skipcomplete) = @_; 
> > 
> > - my $storecfg = PVE::Storage::config(); 
> > - my ($dst_storeid, $dst_volname) = 
> > PVE::Storage::parse_volume_id($dst_volid); 
> > + $jobs = {} if !$jobs; 
> > + 
> > + my $qemu_target; 
> > + my $format; 
> > 
> > - my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid); 
> > + if($dst_volid =~ /^nbd:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)/) { 
> > + $qemu_target = $dst_volid; 
> > + my $server = $1; 
> > + my $port = $2; 
> > + $format = "nbd"; 
> > + die "can't connect remote nbd server $server:$port" if 
> > !PVE::Network::tcp_ping($server, $port, 2); 
> 
> I'm not all too happy about this check here as it is a TOCTOU race. 
> (I'm n

Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-11-10 Thread Alexandre DERUMIER
>>Well, yes, but what happens when the connection fails or gets interrupted?
>>A vanishing connection (timeout) as well as when the connection gets killed
>>for some reason (eg. tcpkill) need to be handled in the
>>qemu_drive_mirror_monitor() functions properly.

a running mirroring job auto-aborts if the network fails (or on any write/read error on 
source/destination).


This check was more for the connection:
the drive-mirror hangs when trying to establish the connection if the target is not 
responding
(and I'm not sure whether there is a timeout, as I have waited some minutes 
without a response).



----- Original Message -----
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Thursday, November 10, 2016 14:43:15
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

> On November 10, 2016 at 1:54 PM Alexandre DERUMIER <aderum...@odiso.com> 
> wrote: 
> 
> 
> > + die "can't connect remote nbd server $server:$port" if 
> > !PVE::Network::tcp_ping($server, $port, 2); 
> >>I'm not all too happy about this check here as it is a TOCTOU race. 
> >>(I'm not happy about some of the other uses of it as well, but some are 
> >>only for status queries, iow. to display information, where it's fine.) 
> 
> >>However, in this case, if broken/missing connections can still not be 
> >>caught (like in my previous tests), then this only hides 99.99% of the 
> >>cases while still wrongly deleting disks in the other 0.01%, which is 
> >>unacceptable behavior. 
> 
> So, do you want me to remove the check? 

Well, yes, but what happens when the connection fails or gets interrupted? 
A vanishing connection (timeout) as well as when the connection gets killed 
for some reason (eg. tcpkill) need to be handled in the 
qemu_drive_mirror_monitor() functions properly. 

> 
> 
> >>$jobs is still empty at this point. The assignment below should be moved 
> >>up. 
> 
> > + die "mirroring error: $err"; 
> > + } 
> > + 
> > + $jobs->{"drive-$drive"} = {}; 
> >>This one ^ 
> 
> Ok. 
> 
> Thanks for the review! 
> 
> 
> 
> ----- Original Message ----- 
> From: "Wolfgang Bumiller" <w.bumil...@proxmox.com> 
> To: "aderumier" <aderum...@odiso.com> 
> Cc: "pve-devel" <pve-devel@pve.proxmox.com> 
> Sent: Thursday, November 10, 2016 13:21:00 
> Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs 
> 
> On Tue, Nov 08, 2016 at 04:29:30AM +0100, Alexandre Derumier wrote: 
> > we can use multiple drive_mirror in parralel. 
> > 
> > block-job-complete can be skipped, if we want to add more mirror job later. 
> > 
> > also add support for nbd uri to qemu_drive_mirror 
> > 
> > Signed-off-by: Alexandre Derumier <aderum...@odiso.com> 
> > --- 
> > PVE/QemuServer.pm | 171 
> > +++--- 
> > 1 file changed, 123 insertions(+), 48 deletions(-) 
> > 
> > diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm 
> > index 54edd96..e989670 100644 
> > --- a/PVE/QemuServer.pm 
> > +++ b/PVE/QemuServer.pm 
> > @@ -5824,91 +5824,165 @@ sub qemu_img_format { 
> > } 
> > 
> > sub qemu_drive_mirror { 
> > - my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized) = @_; 
> > + my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, 
> > $skipcomplete) = @_; 
> > 
> > - my $storecfg = PVE::Storage::config(); 
> > - my ($dst_storeid, $dst_volname) = 
> > PVE::Storage::parse_volume_id($dst_volid); 
> > + $jobs = {} if !$jobs; 
> > + 
> > + my $qemu_target; 
> > + my $format; 
> > 
> > - my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid); 
> > + if($dst_volid =~ /^nbd:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)/) { 
> > + $qemu_target = $dst_volid; 
> > + my $server = $1; 
> > + my $port = $2; 
> > + $format = "nbd"; 
> > + die "can't connect remote nbd server $server:$port" if 
> > !PVE::Network::tcp_ping($server, $port, 2); 
> 
> I'm not all too happy about this check here as it is a TOCTOU race. 
> (I'm not happy about some of the other uses of it as well, but some are 
> only for status queries, iow. to display information, where it's fine.) 
> 
> However, in this case, if broken/missing connections can still not be 
> caught (like in my previous tests), then this only hides 99.99% of the 
> cases while still wrongly deleting disks i

Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-11-10 Thread Wolfgang Bumiller

> On November 10, 2016 at 1:54 PM Alexandre DERUMIER <aderum...@odiso.com> 
> wrote:
> 
> 
> > + die "can't connect remote nbd server $server:$port" if 
> > !PVE::Network::tcp_ping($server, $port, 2); 
> >>I'm not all too happy about this check here as it is a TOCTOU race. 
> >>(I'm not happy about some of the other uses of it as well, but some are 
> >>only for status queries, iow. to display information, where it's fine.) 
> 
> >>However, in this case, if broken/missing connections can still not be 
> >>caught (like in my previous tests), then this only hides 99.99% of the 
> >>cases while still wrongly deleting disks in the other 0.01%, which is 
> >>unacceptable behavior. 
> 
> So, do you want me to remove the check?

Well, yes, but what happens when the connection fails or gets interrupted?
A vanishing connection (timeout) as well as when the connection gets killed
for some reason (eg. tcpkill) need to be handled in the
qemu_drive_mirror_monitor() functions properly.
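
One way the monitor side could look (sketch only, reusing the names from the
patch; it assumes a mirror job that loses its connection drops out of
query-block-jobs, as reported elsewhere in this thread):

# inside the qemu_drive_mirror_monitor() polling loop: treat a job that
# disappeared from query-block-jobs as a fatal error instead of a success
my $stats = vm_mon_cmd($vmid, "query-block-jobs");
my $running = {};
$running->{$_->{device}} = $_ for grep { $_->{type} eq 'mirror' } @$stats;

foreach my $job (sort keys %$jobs) {
    next if $running->{$job};
    # connection killed (tcpkill, timeout, ...): cancel what is left and bail
    # out, so the caller never reaches the disk-deletion step
    eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) };
    die "$job: mirror job vanished - lost connection?\n";
}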

> 
> 
> >>$jobs is still empty at this point. The assignment below should be moved 
> >>up. 
> 
> > + die "mirroring error: $err"; 
> > + } 
> > + 
> > + $jobs->{"drive-$drive"} = {}; 
> >>This one ^ 
> 
> Ok.
> 
> Thanks for the review!
> 
> 
> 
> ----- Original Message -----
> From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
> To: "aderumier" <aderum...@odiso.com>
> Cc: "pve-devel" <pve-devel@pve.proxmox.com>
> Sent: Thursday, November 10, 2016 13:21:00
> Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
> 
> On Tue, Nov 08, 2016 at 04:29:30AM +0100, Alexandre Derumier wrote: 
> > we can use multiple drive_mirror in parralel. 
> > 
> > block-job-complete can be skipped, if we want to add more mirror job later. 
> > 
> > also add support for nbd uri to qemu_drive_mirror 
> > 
> > Signed-off-by: Alexandre Derumier <aderum...@odiso.com> 
> > --- 
> > PVE/QemuServer.pm | 171 
> > +++--- 
> > 1 file changed, 123 insertions(+), 48 deletions(-) 
> > 
> > diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm 
> > index 54edd96..e989670 100644 
> > --- a/PVE/QemuServer.pm 
> > +++ b/PVE/QemuServer.pm 
> > @@ -5824,91 +5824,165 @@ sub qemu_img_format { 
> > } 
> > 
> > sub qemu_drive_mirror { 
> > - my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized) = @_; 
> > + my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, 
> > $skipcomplete) = @_; 
> > 
> > - my $storecfg = PVE::Storage::config(); 
> > - my ($dst_storeid, $dst_volname) = 
> > PVE::Storage::parse_volume_id($dst_volid); 
> > + $jobs = {} if !$jobs; 
> > + 
> > + my $qemu_target; 
> > + my $format; 
> > 
> > - my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid); 
> > + if($dst_volid =~ /^nbd:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)/) { 
> > + $qemu_target = $dst_volid; 
> > + my $server = $1; 
> > + my $port = $2; 
> > + $format = "nbd"; 
> > + die "can't connect remote nbd server $server:$port" if 
> > !PVE::Network::tcp_ping($server, $port, 2); 
> 
> I'm not all too happy about this check here as it is a TOCTOU race. 
> (I'm not happy about some of the other uses of it as well, but some are 
> only for status queries, iow. to display information, where it's fine.) 
> 
> However, in this case, if broken/missing connections can still not be 
> caught (like in my previous tests), then this only hides 99.99% of the 
> cases while still wrongly deleting disks in the other 0.01%, which is 
> unacceptable behavior. 
> 
> > + } else { 
> > + 
> > + my $storecfg = PVE::Storage::config(); 
> > + my ($dst_storeid, $dst_volname) = 
> > PVE::Storage::parse_volume_id($dst_volid); 
> > + 
> > + my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid); 
> > 
> > - my $format = qemu_img_format($dst_scfg, $dst_volname); 
> > + $format = qemu_img_format($dst_scfg, $dst_volname); 
> > 
> > - my $dst_path = PVE::Storage::path($storecfg, $dst_volid); 
> > + my $dst_path = PVE::Storage::path($storecfg, $dst_volid); 
> > 
> > - my $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : 
> > $dst_path; 
> > + $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path; 
> > + } 
> > 
> > my $opts = { timeout => 10, device => "drive-$drive&qu

Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-11-10 Thread Alexandre DERUMIER
> + die "can't connect remote nbd server $server:$port" if 
> !PVE::Network::tcp_ping($server, $port, 2); 
>>I'm not all too happy about this check here as it is a TOCTOU race. 
>>(I'm not happy about some of the other uses of it as well, but some are 
>>only for status queries, iow. to display information, where it's fine.) 

>>However, in this case, if broken/missing connections can still not be 
>>caught (like in my previous tests), then this only hides 99.99% of the 
>>cases while still wrongly deleting disks in the other 0.01%, which is 
>>unacceptable behavior. 

So, do you want me to remove the check?




>>$jobs is still empty at this point. The assignment below should be moved 
>>up. 

> + die "mirroring error: $err"; 
> + } 
> + 
> + $jobs->{"drive-$drive"} = {}; 
>>This one ^ 

Ok.

Thanks for the review!
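
For completeness, the reordering asked for above would look roughly like this
(sketch, same names as in the patch):

# register the job before issuing the command, so the cancel path below also
# covers this device if drive-mirror fails or times out
$jobs->{"drive-$drive"} = {};

eval { vm_mon_cmd($vmid, "drive-mirror", %$opts); };
if (my $err = $@) {
    eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) };
    die "mirroring error: $err";
}

qemu_drive_mirror_monitor($vmid, $vmiddst, $jobs, $skipcomplete);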



----- Original Message -----
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Thursday, November 10, 2016 13:21:00
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

On Tue, Nov 08, 2016 at 04:29:30AM +0100, Alexandre Derumier wrote: 
> we can use multiple drive_mirror in parralel. 
> 
> block-job-complete can be skipped, if we want to add more mirror job later. 
> 
> also add support for nbd uri to qemu_drive_mirror 
> 
> Signed-off-by: Alexandre Derumier <aderum...@odiso.com> 
> --- 
> PVE/QemuServer.pm | 171 
> +++--- 
> 1 file changed, 123 insertions(+), 48 deletions(-) 
> 
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm 
> index 54edd96..e989670 100644 
> --- a/PVE/QemuServer.pm 
> +++ b/PVE/QemuServer.pm 
> @@ -5824,91 +5824,165 @@ sub qemu_img_format { 
> } 
> 
> sub qemu_drive_mirror { 
> - my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized) = @_; 
> + my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, 
> $skipcomplete) = @_; 
> 
> - my $storecfg = PVE::Storage::config(); 
> - my ($dst_storeid, $dst_volname) = 
> PVE::Storage::parse_volume_id($dst_volid); 
> + $jobs = {} if !$jobs; 
> + 
> + my $qemu_target; 
> + my $format; 
> 
> - my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid); 
> + if($dst_volid =~ /^nbd:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)/) { 
> + $qemu_target = $dst_volid; 
> + my $server = $1; 
> + my $port = $2; 
> + $format = "nbd"; 
> + die "can't connect remote nbd server $server:$port" if 
> !PVE::Network::tcp_ping($server, $port, 2); 

I'm not all too happy about this check here as it is a TOCTOU race. 
(I'm not happy about some of the other uses of it as well, but some are 
only for status queries, iow. to display information, where it's fine.) 

However, in this case, if broken/missing connections can still not be 
caught (like in my previous tests), then this only hides 99.99% of the 
cases while still wrongly deleting disks in the other 0.01%, which is 
unacceptable behavior. 

> + } else { 
> + 
> + my $storecfg = PVE::Storage::config(); 
> + my ($dst_storeid, $dst_volname) = 
> PVE::Storage::parse_volume_id($dst_volid); 
> + 
> + my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid); 
> 
> - my $format = qemu_img_format($dst_scfg, $dst_volname); 
> + $format = qemu_img_format($dst_scfg, $dst_volname); 
> 
> - my $dst_path = PVE::Storage::path($storecfg, $dst_volid); 
> + my $dst_path = PVE::Storage::path($storecfg, $dst_volid); 
> 
> - my $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path; 
> + $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path; 
> + } 
> 
> my $opts = { timeout => 10, device => "drive-$drive", mode => "existing", 
> sync => "full", target => $qemu_target }; 
> $opts->{format} = $format if $format; 
> 
> - print "drive mirror is starting (scanning bitmap) : this step can take some 
> minutes/hours, depend of disk size and storage speed\n"; 
> + print "drive mirror is starting for drive-$drive\n"; 
> 
> - my $finish_job = sub { 
> - while (1) { 
> - my $stats = vm_mon_cmd($vmid, "query-block-jobs"); 
> - my $stat = @$stats[0]; 
> - last if !$stat; 
> - sleep 1; 
> - } 
> - }; 
> + eval { vm_mon_cmd($vmid, "drive-mirror", %$opts); }; #if a job already run 
> for this device,it's throw an error 
> + if (my $err = $@) { 
> + eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) }; 

$jobs is still empty at this point. The assignment below should be moved 

Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-11-10 Thread Wolfgang Bumiller
On Tue, Nov 08, 2016 at 04:29:30AM +0100, Alexandre Derumier wrote:
> we can use multiple drive_mirror in parralel.
> 
> block-job-complete can be skipped, if we want to add more mirror job later.
> 
> also add support for nbd uri to qemu_drive_mirror
> 
> Signed-off-by: Alexandre Derumier 
> ---
>  PVE/QemuServer.pm | 171 
> +++---
>  1 file changed, 123 insertions(+), 48 deletions(-)
> 
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 54edd96..e989670 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -5824,91 +5824,165 @@ sub qemu_img_format {
>  }
>  
>  sub qemu_drive_mirror {
> -my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized) = @_;
> +my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, 
> $skipcomplete) = @_;
>  
> -my $storecfg = PVE::Storage::config();
> -my ($dst_storeid, $dst_volname) = 
> PVE::Storage::parse_volume_id($dst_volid);
> +$jobs = {} if !$jobs;
> +
> +my $qemu_target;
> +my $format;
>  
> -my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
> +if($dst_volid =~ /^nbd:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)/) {
> + $qemu_target = $dst_volid;
> + my $server = $1;
> + my $port = $2;
> + $format = "nbd";
> + die "can't connect remote nbd server $server:$port" if 
> !PVE::Network::tcp_ping($server, $port, 2);

I'm not all too happy about this check here as it is a TOCTOU race.
(I'm not happy about some of the other uses of it as well, but some are
only for status queries, iow. to display information, where it's fine.)

However, in this case, if broken/missing connections can still not be
caught (like in my previous tests), then this only hides 99.99% of the
cases while still wrongly deleting disks in the other 0.01%, which is
unacceptable behavior.

> +} else {
> +
> + my $storecfg = PVE::Storage::config();
> + my ($dst_storeid, $dst_volname) = 
> PVE::Storage::parse_volume_id($dst_volid);
> +
> + my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
>  
> -my $format = qemu_img_format($dst_scfg, $dst_volname);
> + $format = qemu_img_format($dst_scfg, $dst_volname);
>  
> -my $dst_path = PVE::Storage::path($storecfg, $dst_volid);
> + my $dst_path = PVE::Storage::path($storecfg, $dst_volid);
>  
> -my $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : 
> $dst_path;
> + $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path;
> +}
>  
>  my $opts = { timeout => 10, device => "drive-$drive", mode => 
> "existing", sync => "full", target => $qemu_target };
>  $opts->{format} = $format if $format;
>  
> -print "drive mirror is starting (scanning bitmap) : this step can take 
> some minutes/hours, depend of disk size and storage speed\n";
> +print "drive mirror is starting for drive-$drive\n";
>  
> -my $finish_job = sub {
> - while (1) {
> - my $stats = vm_mon_cmd($vmid, "query-block-jobs");
> - my $stat = @$stats[0];
> - last if !$stat;
> - sleep 1;
> - }
> -};
> +eval { vm_mon_cmd($vmid, "drive-mirror", %$opts); }; #if a job already 
> run for this device,it's throw an error
> +if (my $err = $@) {
> + eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) };

$jobs is still empty at this point. The assignment below should be moved
up.

> + die "mirroring error: $err";
> +}
> +
> +$jobs->{"drive-$drive"} = {};
This one ^

> +
> +qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $skipcomplete);
> +}
> +
> +sub qemu_drive_mirror_monitor {
> +my ($vmid, $vmiddst, $jobs, $skipcomplete) = @_;
>  
>  eval {
> -vm_mon_cmd($vmid, "drive-mirror", %$opts);
> +
> + my $err_complete = 0;
> +
>   while (1) {
> + die "storage migration timed out\n" if $err_complete > 300;
> +
>   my $stats = vm_mon_cmd($vmid, "query-block-jobs");
> - my $stat = @$stats[0];
> - die "mirroring job seem to have die. Maybe do you have bad 
> sectors?" if !$stat;
> - die "error job is not mirroring" if $stat->{type} ne "mirror";
>  
> - my $busy = $stat->{busy};
> - my $ready = $stat->{ready};
> + my $running_mirror_jobs = {};
> + foreach my $stat (@$stats) {
> + next if $stat->{type} ne 'mirror';
> + $running_mirror_jobs->{$stat->{device}} = $stat;
> + }
>  
> - if (my $total = $stat->{len}) {
> - my $transferred = $stat->{offset} || 0;
> - my $remaining = $total - $transferred;
> - my $percent = sprintf "%.2f", ($transferred * 100 / $total);
> + my $readycounter = 0;
>  
> - print "transferred: $transferred bytes remaining: $remaining 
> bytes total: $total bytes progression: $percent % busy: $busy ready: $ready 
> \n";
> - }
> + foreach my $job