Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-12-23 Thread Wolfgang Bumiller
On Fri, Dec 23, 2016 at 10:20:03AM +0100, Alexandre DERUMIER wrote:
> Hi Wolfgang,
> 
> I have done more tests with drive mirror and an NBD target.
> 
> It seems that the hang occurs only if the target IP is unreachable (no network 
> response).
> 
> # drive_mirror -n drive-scsi0 nbd://66.66.66.66/target 
> ERROR: VM 183 qmp command 'human-monitor-command' failed - got timeout
> 
> If the IP address exists and is up,
> # drive_mirror -n drive-scsi0 nbd://10.3.94.89:666/target 
> Failed to connect socket: Connection refused
> 
> 
> 
> I'm not sure, but maybe it can hang too if pve-firewall does a DROP instead of a 
> REJECT on the target port.
> 
> 
> 
> I think this comes from qemu's net/socket.c,
> 
> where we have an infinite loop.
> 
> I'm not sure how to add a timeout here, help is welcome :)

Doesn't the job show up in query-block-jobs before the connection
finishes? If so then we could just assume if it's still at 0% after a
timeout the connection doesn't work? The question is though, what is a
good timeout?

Or we find (or add) a way to query the connection status, then we can
poll that before we start polling the block-job status.
(we already poll for stuff, so it's not worse than before ;p)

Then again, an actual timeout on qemu's side might even make it
upstream...? Maybe? (If it doesn't I'd prefer polling as described above
over maintaining yet another qemu patch ;-) )
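
A rough sketch of what that polling could look like on the PVE side, untested;
it assumes the existing vm_mon_cmd() QMP helper, that query-block-jobs still
answers while the connection is stuck, and an arbitrary 30-second no-progress
window:

# Sketch only: give up on a drive-mirror job that never leaves 0%.
use strict;
use warnings;
use PVE::QemuServer;

sub wait_for_mirror_progress {
    my ($vmid, $device, $timeout) = @_;
    $timeout //= 30;    # seconds without any progress before giving up

    for (my $i = 0; $i < $timeout; $i++) {
        my $jobs = PVE::QemuServer::vm_mon_cmd($vmid, 'query-block-jobs');
        my ($job) = grep { $_->{device} eq $device } @$jobs;
        die "mirror job for '$device' disappeared\n" if !$job;
        # some bytes transferred => the connection to the target works
        return 1 if $job->{offset} > 0;
        sleep(1);
    }

    # still at 0% after $timeout seconds - assume the target is unreachable
    eval { PVE::QemuServer::vm_mon_cmd($vmid, 'block-job-cancel', device => $device); };
    die "drive-mirror made no progress within ${timeout}s, aborting\n";
}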

But it won't be a simple select/poll timeout as they use non-blocking
sockets and register the fd with a callback for the connection
completion later (in net_socket_fd_init() it calls
qemu_set_fd_handler() where it ends up controlled by the aio handlers).

It might be possible to add a timer (qemu_timer_new() & friends
perhaps?) to NetSocketState (the NetSocketState cleanup would have to
remove it, and if it gets triggered it cancels the connection instead).

> static int net_socket_connect_init(NetClientState *peer,
>const char *model,
>const char *name,
>const char *host_str)
> {
> NetSocketState *s;
> int fd, connected, ret;
> struct sockaddr_in saddr;
> 
> if (parse_host_port(&saddr, host_str) < 0)
> return -1;
> 
> fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
> if (fd < 0) {
> perror("socket");
> return -1;
> }
> qemu_set_nonblock(fd);
> 
> connected = 0;
> for(;;) {
> ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr));
> if (ret < 0) {
> if (errno == EINTR || errno == EWOULDBLOCK) {
> /* continue */
> } else if (errno == EINPROGRESS ||
>errno == EALREADY ||
>errno == EINVAL) {
> break;
> } else {
> perror("connect");
> closesocket(fd);
> return -1;
> }
> } else {
> connected = 1;
> break;
> }
> }
> s = net_socket_fd_init(peer, model, name, fd, connected);
> if (!s)
> return -1;
> snprintf(s->nc.info_str, sizeof(s->nc.info_str),
>  "socket: connect to %s:%d",
>  inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
> return 0;
> }
> 
> 
> 
> 
> ----- Original Message -----
> From: "aderumier" 
> To: "Wolfgang Bumiller" 
> Cc: "pve-devel" 
> Sent: Wednesday, December 21, 2016 13:57:10
> Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
> 
> >>Then it can still hang if the destination disappears between tcp_ping() 
> >>and the `drive-mirror` command, so I'd rather get better behavior on qemu's 
> >>side. It needs a time-out or a way to cancel it or something. 
> Yes sure! 
> 
> I'm currently looking at qemu code to see how nbd client works. 
> 
> ----- Original Message -----
> From: "Wolfgang Bumiller" 
> To: "aderumier" 
> Cc: "pve-devel" , "dietmar" 
> Sent: Wednesday, December 21, 2016 12:20:28
> Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
> 
> > On December 21, 2016 at 10:51 AM Alexandre DERUMIER  
> > wrote: 
> > 
> > 
> > >>IIRC that was the only blocker. 
> > >> 
> > >>Basically the patchset has to work *without* tcp_ping() since it is an 
> > >>unreliable check, and then we still have to catch failing connections 
> > >>_correctly_. (There's no point in knowing that "some time in the past 
> > >>you were able to connect to something which may or may not have been a 
> > >>qemu nbd server", we need to know whether the drive-mirror job itself 
> > >>was able to connect.) 
> > 
> For me, the mirror job auto-aborts if the connection fails during the 
> migration. Do you see another behaviour?
> 
> That covers one problem. IIRC the disk-deletion problem was that due 
> to wrong [] usage around an IPv6 address it could not connect in the 
> first place and didn't error out as I would have hoped.
> 

Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-12-23 Thread Alexandre DERUMIER
Hi Wolfgang,

I have done more tests with drive mirror and an NBD target.

It seems that the hang occurs only if the target IP is unreachable (no network 
response).

# drive_mirror -n drive-scsi0 nbd://66.66.66.66/target 
ERROR: VM 183 qmp command 'human-monitor-command' failed - got timeout

If the IP address exists and is up,
# drive_mirror -n drive-scsi0 nbd://10.3.94.89:666/target 
Failed to connect socket: Connection refused



I'm not sure, but maybe it can hang too if pve-firewall does a DROP instead of a 
REJECT on the target port.



I think this comes from qemu's net/socket.c,

where we have an infinite loop.

I'm not sure how to add a timeout here, help is welcome :)




static int net_socket_connect_init(NetClientState *peer,
                                   const char *model,
                                   const char *name,
                                   const char *host_str)
{
    NetSocketState *s;
    int fd, connected, ret;
    struct sockaddr_in saddr;

    if (parse_host_port(&saddr, host_str) < 0)
        return -1;

    fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }
    qemu_set_nonblock(fd);

    connected = 0;
    for (;;) {
        ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr));
        if (ret < 0) {
            if (errno == EINTR || errno == EWOULDBLOCK) {
                /* continue */
            } else if (errno == EINPROGRESS ||
                       errno == EALREADY ||
                       errno == EINVAL) {
                break;
            } else {
                perror("connect");
                closesocket(fd);
                return -1;
            }
        } else {
            connected = 1;
            break;
        }
    }
    s = net_socket_fd_init(peer, model, name, fd, connected);
    if (!s)
        return -1;
    snprintf(s->nc.info_str, sizeof(s->nc.info_str),
             "socket: connect to %s:%d",
             inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
    return 0;
}




----- Original Message -----
From: "aderumier" 
To: "Wolfgang Bumiller" 
Cc: "pve-devel" 
Sent: Wednesday, December 21, 2016 13:57:10
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

>>Then it can still hang if the destination disappears between tcp_ping() 
>>and the `drive-mirror` command, so I'd rather get better behavior on qemu's 
>>side. It needs a time-out or a way to cancel it or something. 
Yes sure! 

I'm currently looking at qemu code to see how nbd client works. 

----- Original Message -----
From: "Wolfgang Bumiller" 
To: "aderumier" 
Cc: "pve-devel" , "dietmar" 
Sent: Wednesday, December 21, 2016 12:20:28
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

> On December 21, 2016 at 10:51 AM Alexandre DERUMIER  
> wrote: 
> 
> 
> >>IIRC that was the only blocker. 
> >> 
> >>Basically the patchset has to work *without* tcp_ping() since it is an 
> >>unreliable check, and then we still have to catch failing connections 
> >>_correctly_. (There's no point in knowing that "some time in the past 
> >>you were able to connect to something which may or may not have been a 
> >>qemu nbd server", we need to know whether the drive-mirror job itself 
> >>was able to connect.) 
> 
> For me, the mirror job auto-aborts if the connection fails during the 
> migration. Do you see another behaviour?

That covers one problem. IIRC the disk-deletion problem was that due 
to wrong [] usage around an IPv6 address it could not connect in the 
first place and didn't error out as I would have hoped. 

> 
> The tcp_ping was done just before launching the drive-mirror command, because it 
> was hanging in this case. 

Then it can still hang if the destination disappears between tcp_ping() 
and the `drive-mirror` command, so I'd rather get better behavior on qemu's 
side. It needs a time-out or a way to cancel it or something. 


___
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

2016-12-23 Thread Alexandre DERUMIER
>>Doesn't the job show up in query-block-jobs before the connection
>>finishes? If so then we could just assume if it's still at 0% after a
>>timeout the connection doesn't work? The question is though, what is a
>>good timeout?

The problem is that the qemu monitor itself is hanging completely...


>>Or we find (or add) a way to query the connection status, then we can 
>>poll that before we start polling the block-job status. 
>>(we already poll for stuff, so it's not worse than before ;p) 

Maybe we can set up a tunnel (it's possible to use a unix socket like for live 
migration), but I would like to have something without ssh if possible;
maybe with socat or something similar (it's possible to do native TLS in qemu now 
for migration and drive-mirror).
This way we could be sure that the tunnel is established before launching 
drive-mirror?
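
For illustration, a very rough sketch of that tunnel idea, untested; the socket
path, port, and the exact nbd+unix URI syntax are assumptions, and cleanup and
error handling are left out:

# Sketch only: forward a local unix socket to the remote NBD port with socat,
# so the local endpoint provably exists before drive-mirror is started.
use strict;
use warnings;

my ($vmid, $device) = (183, 'drive-scsi0');
my ($target_host, $nbd_port) = ('10.3.94.89', 60001);      # hypothetical target
my $socket = "/run/qemu-server/mirror-$vmid.sock";          # hypothetical path

my $pid = fork() // die "fork failed: $!\n";
if (!$pid) {
    exec('socat', "UNIX-LISTEN:$socket,fork", "TCP:$target_host:$nbd_port");
    exit(1);
}

# wait until socat has created the listening socket
for (my $i = 0; $i < 10 && ! -S $socket; $i++) { sleep(1); }
die "tunnel socket '$socket' not ready\n" if ! -S $socket;

# point drive-mirror at the unix socket instead of a raw TCP target
my $nbd_uri = "nbd+unix:///$device?socket=$socket";
print "would start drive-mirror of $device to $nbd_uri\n";

Whether that actually avoids the monitor hang would still need to be verified,
since socat only connects to the target once qemu opens the local socket.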


But yes, it would be great to avoid maintaining another qemu patch
(and I'm pretty poor at C and network programming).


----- Original Message -----
From: "Wolfgang Bumiller" 
To: "aderumier" 
Cc: "pve-devel" 
Sent: Friday, December 23, 2016 10:53:12
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

On Fri, Dec 23, 2016 at 10:20:03AM +0100, Alexandre DERUMIER wrote: 
> Hi Wolfgang, 
> 
> I have done more tests with drive mirror and an NBD target. 
> 
> It seems that the hang occurs only if the target IP is unreachable (no network 
> response). 
> 
> # drive_mirror -n drive-scsi0 nbd://66.66.66.66/target 
> ERROR: VM 183 qmp command 'human-monitor-command' failed - got timeout 
> 
> If the IP address exists and is up, 
> # drive_mirror -n drive-scsi0 nbd://10.3.94.89:666/target 
> Failed to connect socket: Connection refused 
> 
> 
> 
> I'm not sure, but maybe it can hang too if pve-firewall does a DROP instead of a 
> REJECT on the target port. 
> 
> 
> 
> I think this comes from qemu's net/socket.c, 
> 
> where we have an infinite loop. 
> 
> I'm not sure how to add a timeout here, help is welcome :) 

Doesn't the job show up in query-block-jobs before the connection 
finishes? If so then we could just assume if it's still at 0% after a 
timeout the connection doesn't work? The question is though, what is a 
good timeout? 

Or we find (or add) a way to query the connection status, then we can 
poll that before we start polling the block-job status. 
(we already poll for stuff, so it's not worse than before ;p) 

Then again, an actual timeout on qemu's side might even make it 
upstream...? Maybe? (If it doesn't I'd prefer polling as described above 
over maintaining yet another qemu patch ;-) ) 

But it won't be a simple select/poll timeout as they use non-blocking 
sockets and register the fd with a callback for the connection 
completion later (in net_socket_fd_init() it calls 
qemu_set_fd_handler() where it ends up controlled by the aio handlers). 

It might be possible to add a timer (qemu_timer_new() & friends 
perhaps?) to NetSocketState (the NetSocketState cleanup would have to 
remove it, and if it gets triggered it cancels the connection instead). 

> static int net_socket_connect_init(NetClientState *peer, 
> const char *model, 
> const char *name, 
> const char *host_str) 
> { 
> NetSocketState *s; 
> int fd, connected, ret; 
> struct sockaddr_in saddr; 
> 
> if (parse_host_port(&saddr, host_str) < 0) 
> return -1; 
> 
> fd = qemu_socket(PF_INET, SOCK_STREAM, 0); 
> if (fd < 0) { 
> perror("socket"); 
> return -1; 
> } 
> qemu_set_nonblock(fd); 
> 
> connected = 0; 
> for(;;) { 
> ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr)); 
> if (ret < 0) { 
> if (errno == EINTR || errno == EWOULDBLOCK) { 
> /* continue */ 
> } else if (errno == EINPROGRESS || 
> errno == EALREADY || 
> errno == EINVAL) { 
> break; 
> } else { 
> perror("connect"); 
> closesocket(fd); 
> return -1; 
> } 
> } else { 
> connected = 1; 
> break; 
> } 
> } 
> s = net_socket_fd_init(peer, model, name, fd, connected); 
> if (!s) 
> return -1; 
> snprintf(s->nc.info_str, sizeof(s->nc.info_str), 
> "socket: connect to %s:%d", 
> inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port)); 
> return 0; 
> } 
> 
> 
> 
> 
> ----- Original Message ----- 
> From: "aderumier"  
> To: "Wolfgang Bumiller"  
> Cc: "pve-devel"  
> Sent: Wednesday, December 21, 2016 13:57:10 
> Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs 
> 
> >>Then it can still hang if the destination disappears between tcp_ping() 
> >>and the `drive-mirror` command, so I'd rather get better behavior on qemu's 
> >>side. It needs a time-out or a way to cancel it or something. 
> Yes sure! 
> 
> I'm currently looking at qemu code to see how nbd client works. 
> 
> ----- Original Message ----- 
> From: "Wolfgang Bumiller"  
> To: "aderumier"  
> Cc: "pve-devel" , "dietmar"  
> Sent: Wednesday, December 21

[pve-devel] applied: [PATCH docs 2/2] small corrections and clarifications

2016-12-23 Thread Dietmar Maurer
applied



[pve-devel] [PATCH Storage] Boost LVM Migration Speed

2016-12-23 Thread jlavoy
See commit message.

jlavoy (1):
  Boost LVM Migration Speed. This speeds up host-to-host LVM migration
    to be more in line with what you see using qcow2 or raw files.

 PVE/Storage.pm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

-- 
2.10.2



[pve-devel] [PATCH Storage] Boost LVM Migration Speed

2016-12-23 Thread jlavoy
Boost LVM Migration Speed. This speeds up host-to-host LVM migration to be
 more in line with what you see using qcow2 or raw files.

Signed-off-by: jlavoy 
---
 PVE/Storage.pm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/PVE/Storage.pm b/PVE/Storage.pm
index a904f4e..71b699c 100755
--- a/PVE/Storage.pm
+++ b/PVE/Storage.pm
@@ -613,11 +613,11 @@ sub storage_migrate {
 
     eval {
 	if ($tcfg->{type} eq 'lvmthin') {
-	    run_command([["dd", "if=$src"],["/usr/bin/ssh", "root\@${target_host}",
-			  "dd", 'conv=sparse', "of=$dst"]]);
+	    run_command([["dd", "if=$src", "bs=4k"],["/usr/bin/ssh", "root\@${target_host}",
+			  "dd", 'conv=sparse', "of=$dst", "bs=4k"]]);
 	} else {
-	    run_command([["dd", "if=$src"],["/usr/bin/ssh", "root\@${target_host}",
-			  "dd", "of=$dst"]]);
+	    run_command([["dd", "if=$src", "bs=4k"],["/usr/bin/ssh", "root\@${target_host}",
+			  "dd", "of=$dst", "bs=4k"]]);
 	}
     };
     if (my $err = $@) {
-- 
2.10.2
