Re: [pve-devel] Balloon Device is the problem! Re: migration problems since qemu 1.3
Hi,

the difference is in sub vm_start in QemuServer.pm. After the kvm process is started, PVE sends some monitor commands. If I comment out this line:

    # eval { vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => $migrate_downtime); };

everything works fine and everything is OK again. I also get the following in my logs, but I checked: $migrate_downtime is 1, so it IS a NUMBER:

    Dec 26 13:09:27 cloud1-1202 qm[8726]: VM 105 qmp command failed - VM 105 qmp command 'migrate_set_downtime' failed - Invalid parameter type for 'value', expected: number

Stefan

On 26.12.2012 07:45, Alexandre DERUMIER wrote:
>> I can even start it with daemonize from the shell. Migration works fine.
>> It just doesn't work when started from PVE.
>
> This is crazy, I don't see any difference between starting it from the shell and from PVE.
>
> And if you remove the balloon device, migration is 100% working when started from PVE?
>
> Just to be sure, can you try "info balloon" from the human monitor console?
> (I would like to see whether the balloon driver is working correctly.)

----- Original message -----
From: Stefan Priebe s.pri...@profihost.ag
To: Alexandre DERUMIER aderum...@odiso.com
Cc: pve-devel@pve.proxmox.com
Sent: Tuesday, 25 December 2012 10:05:10
Subject: Re: [pve-devel] Balloon Device is the problem! Re: migration problems since qemu 1.3

I can even start it with daemonize from the shell. Migration works fine.
It just doesn't work when started from PVE.

Stefan

On 24.12.2012 15:48, Alexandre DERUMIER wrote:
> Does it work if you keep the virtio-balloon device enabled, but comment out QemuServer.pm line 3005:
>
>     vm_mon_cmd_nocheck($vmid, 'qom-set', path => "machine/peripheral/balloon0", property => "stats-polling-interval", value => 2);
>
> and line 2081:
>
>     $qmpclient->queue_cmd($vmid, $ballooncb, 'query-balloon');
>
> ?

----- Original message -----
From: Alexandre DERUMIER aderum...@odiso.com
To: Stefan Priebe s.pri...@profihost.ag
Cc: pve-devel@pve.proxmox.com
Sent: Monday, 24 December 2012 15:38:13
Subject: Re: [pve-devel] Balloon Device is the problem! Re: migration problems since qemu 1.3

Maybe it's related to the qmp queries to the balloon driver (for stats) during migration?

----- Original message -----
From: Stefan Priebe s.pri...@profihost.ag
To: Dietmar Maurer diet...@proxmox.com
Cc: Alexandre DERUMIER aderum...@odiso.com, pve-devel@pve.proxmox.com
Sent: Monday, 24 December 2012 15:32:52
Subject: Balloon Device is the problem! Re: [pve-devel] migration problems since qemu 1.3

Hello,

it works fine again when I remove the balloon PCI device. If I remove this line, everything is fine again:

    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3

Greets,
Stefan

On 24.12.2012 15:05, Stefan Priebe wrote:
> On 24.12.2012 14:08, Dietmar Maurer wrote:
>>> virtio0: cephkvmpool1:vm-105-disk-1,iops_rd=215,iops_wr=155,mbps_rd=130,mbps_wr=90,size=20G
>>
>> Please can you also test without ceph?
>
> The same. I now also tried a debian netboot cd (6.0.5), but then 32bit doesn't work either.
> I had no disks attached at all. I filled the tmpfs ramdisk under /dev with:
>
>     dd if=/dev/urandom of=/dev/myfile bs=1M count=900
>
> Greets,
> Stefan

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
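[Editor's note] The "expected: number" error Stefan reports above is easy to reproduce outside PVE. Perl scalars carry an internal string/number flag, and the JSON encoder uses that flag to decide between emitting `"1"` and `1` on the wire; a value parsed from a config file (or last used in string context) encodes as a JSON string, which QMP rejects regardless of its printable content. A minimal sketch of the mechanism, using the core JSON::PP module (variable names are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;    # core module; qemu-server itself uses the JSON module

my $json = JSON::PP->new;

# a value read from a config file arrives as a string-flagged scalar,
# even though it prints as "1" and compares numerically equal to 1
my $migrate_downtime = "1";
print $json->encode({ value => $migrate_downtime }), "\n";      # {"value":"1"}  -> QMP: expected number

# arithmetic re-flags the scalar as numeric, so it encodes as a JSON number
print $json->encode({ value => $migrate_downtime + 0 }), "\n";  # {"value":1}    -> accepted
```

This is why "I checked, it IS 1" and "invalid parameter type" can both be true at the same time: the printed value is identical, only the JSON type differs.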
Re: [pve-devel] Balloon Device is the problem! Re: migration problems since qemu 1.3
I see that Dietmar has recently changed

    vm_mon_cmd($vmid, "migrate_set_downtime", value => $migrate_downtime);

to

    vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => $migrate_downtime);

https://git.proxmox.com/?p=qemu-server.git;a=blobdiff;f=PVE/QemuServer.pm;h=165eaf6be6e5fe4b1c88454d28b113bc2b1f20af;hp=81a935176aca16e013fd6987f2ddbc72260092cf;hb=95381ce06cea266d40911a7129da6067a1640cbf;hpb=4bdb05142cfcef09495a45ffb256955f7b947caa

So maybe before, migrate_set_downtime was not applied (because vm_mon_cmd checks whether the vm config file exists on the target).

Do you have any migrate_downtime parameter in your vm config? Because it shouldn't be sent if not:

    my $migrate_downtime = $defaults->{migrate_downtime};
    $migrate_downtime = $conf->{migrate_downtime} if defined($conf->{migrate_downtime});
    if (defined($migrate_downtime)) {
        eval { vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => $migrate_downtime); };
    }
Re: [pve-devel] Balloon Device is the problem! Re: migration problems since qemu 1.3
The migrate_downtime = 1 comes from QemuServer.pm:

    migrate_downtime => {
        optional => 1,
        type => 'integer',
        description => "Set maximum tolerated downtime (in seconds) for migrations.",
        minimum => 0,
        default => 1,    # DEFAULT VALUE
    },

I don't know if we really need a default value, because it always sets migrate_downtime to 1.

Now, I don't know what really happens for you, because the recent changes can set migrate_downtime on the target vm (vm_mon_cmd_nocheck). But I don't think it does anything there, because migrate_set_downtime should be done on the source vm.

Can you try to replace vm_mon_cmd_nocheck with vm_mon_cmd? (Then it should run only at vm_start, but not when a live migration starts the target vm.)

Also, migrate_downtime should be set on the source vm before the migration begins (QemuMigrate.pm). I don't know why we set it at vm start.
Re: [pve-devel] Balloon Device is the problem! Re: migration problems since qemu 1.3
Hi,

On 26.12.2012 17:40, Alexandre DERUMIER wrote:
> I don't know if we really need a default value, because it always sets migrate_downtime to 1.

It also isn't accepted; you get the answer back that 1 isn't a number. I don't know what format a number needs?

> Now, I don't know what really happens for you, because the recent changes can set migrate_downtime on the target vm (vm_mon_cmd_nocheck). But I don't think it does anything there, because migrate_set_downtime should be done on the source vm.

You get the error message that 1 isn't a number. If I get this message, the migration fails afterwards.

> Can you try to replace vm_mon_cmd_nocheck with vm_mon_cmd? (Then it should run only at vm_start, but not when a live migration starts the target vm.)

Done - works, see my other post.

Stefan
[pve-devel] [PATCH] move migration speed/downtime from QemuServer vm_start to QemuMigrate phase2
Signed-off-by: Stefan Priebe <s.pri...@profihost.ag>
---
 PVE/QemuMigrate.pm | 28
 PVE/QemuServer.pm  | 15 ---
 2 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 0711681..de84fed 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -323,24 +323,44 @@ sub phase2 {
     $self->{tunnel} = $self->fork_tunnel($self->{nodeip}, $lport, $rport);

     $self->log('info', "starting online/live migration on port $lport");
-    # start migration
-    my $start = time();
+    # load_defaults
+    my $defaults = PVE::QemuServer::load_defaults();
+
+    # always set migrate speed (overwrite kvm default of 32m)
+    # we set a very hight default of 8192m which is basically unlimited
+    my $migrate_speed = $defaults->{migrate_speed} || 8192;
+    $migrate_speed = $conf->{migrate_speed} || $migrate_speed;
+    $migrate_speed = $migrate_speed * 1048576;
+    $self->log('info', "migrate_set_speed: $migrate_speed");
+    eval {
+        PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_speed", value => $migrate_speed);
+    };
+    $self->log('info', "migrate_set_speed error: $@") if $@;
+
+    my $migrate_downtime = $defaults->{migrate_downtime};
+    $migrate_downtime = $conf->{migrate_downtime} if defined($conf->{migrate_downtime});
+    $self->log('info', "migrate_set_downtime: $migrate_downtime");
+    eval {
+        PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => $migrate_downtime);
+    };
+    $self->log('info', "migrate_set_downtime error: $@") if $@;

     my $capabilities = {};
     $capabilities->{capability} = "xbzrle";
     $capabilities->{state} = JSON::false;
-
     eval { PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate-set-capabilities", capabilities => [$capabilities]); };

-    #set cachesize 10% of the total memory
+    # set cachesize 10% of the total memory
     my $cachesize = int($conf->{memory}*1048576/10);
     eval { PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate-set-cache-size", value => $cachesize); };

+    # start migration
+    my $start = time();
     eval { PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate", uri => "tcp:localhost:$lport"); };

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 1d4c275..d56fe65 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -2979,21 +2979,6 @@ sub vm_start {
 	warn $@ if $@;
     }

-    # always set migrate speed (overwrite kvm default of 32m)
-    # we set a very hight default of 8192m which is basically unlimited
-    my $migrate_speed = $defaults->{migrate_speed} || 8192;
-    $migrate_speed = $conf->{migrate_speed} || $migrate_speed;
-    $migrate_speed = $migrate_speed * 1048576;
-    eval {
-        vm_mon_cmd_nocheck($vmid, "migrate_set_speed", value => $migrate_speed);
-    };
-
-    my $migrate_downtime = $defaults->{migrate_downtime};
-    $migrate_downtime = $conf->{migrate_downtime} if defined($conf->{migrate_downtime});
-    if (defined($migrate_downtime)) {
-        eval { vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => $migrate_downtime); };
-    }
-
     if ($migratedfrom) {
 	my $capabilities = {};
 	$capabilities->{capability} = "xbzrle";
--
1.7.10.4
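[Editor's note] The two size computations the patch moves into phase2 are plain unit conversions, and checking them confirms the 8192 MB/s default converts to exactly 8589934592 bytes/s, the value Stefan's migration log shows. A standalone sketch (the 4096 MB memory size is an assumed example value):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# migrate_speed is configured in MB/s; QMP's migrate_set_speed expects bytes/s,
# hence the * 1048576 (bytes per MB) in the patch
my $migrate_speed = 8192;                     # the "basically unlimited" default
my $speed_bytes   = $migrate_speed * 1048576;
print "migrate_set_speed: $speed_bytes\n";    # 8589934592

# xbzrle cache size: 10% of the VM's memory (memory is configured in MB)
my $memory    = 4096;                         # assumed example: 4 GiB guest
my $cachesize = int($memory * 1048576 / 10);
print "migrate-set-cache-size: $cachesize\n"; # 429496729
```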
Re: [pve-devel] [PATCH] move migration speed/downtime from QemuServer vm_start to QemuMigrate phase2
Hello list,

this patch fixes my migration problems. The strange thing is that there must be another bug too - NOT related to this patch. My output looks like this:

    Dec 26 21:42:24 migrate_set_speed: 8589934592
    Dec 26 21:42:24 migrate_set_speed error: VM 105 qmp command 'migrate_set_speed' failed - Invalid parameter type for 'value', expected: int
    Dec 26 21:42:24 migrate_set_downtime: 1
    Dec 26 21:42:24 migrate_set_downtime error: VM 105 qmp command 'migrate_set_downtime' failed - Invalid parameter type for 'value', expected: number

To me these values and the expected formats seem to be OK.

Greets,
Stefan
Re: [pve-devel] Balloon Device is the problem! Re: migration problems since qemu 1.3
> It also isn't accepted; you get the answer back that 1 isn't a number.
> I don't know what format a number needs?

The default migrate downtime is 30ms (if we don't send the qmp command). I think we set 1 sec by default because of infinite migrations (30ms was too short in the past with high memory-change workloads).

I see that the latest migration code from qemu git (1.4) seems to improve the downtime a lot (from 500 to 30ms) with high memory-change workloads. I don't know whether qemu 1.3 works fine without setting the downtime to 1 sec.

I think we need to cast the value to int for the json:

    vm_mon_cmd($vmid, "migrate_set_downtime", value => $migrate_downtime);
->
    vm_mon_cmd($vmid, "migrate_set_downtime", value => int($migrate_downtime));

I remember the same problem with qemu_block_set_io_throttle():

    vm_mon_cmd($vmid, "block_set_io_throttle", device => $deviceid, bps => int($bps), bps_rd => int($bps_rd), bps_wr => int($bps_wr), iops => int($iops), iops_rd => int($iops_rd), iops_wr => int($iops_wr));

So maybe it sends crap if the value is not casted?

Also, the value should not be int but float; the qmp doc says we can use 0.5, 0.30, ... as values.

Also, query-migrate returns two cool new values about downtime; I think we should display them in the query-migrate log:

- downtime: only present when migration has finished correctly; total amount in ms of downtime that happened (json-int)
- expected-downtime: only present while migration is active; total amount in ms of downtime calculated on the last bitmap round (json-int)
Re: [pve-devel] [PATCH V2] - fix setting migration parameters
Some comments:

>  -    default => 1,
>  +    default => 0,

I'm not sure about lowering the default migration downtime value to 0, because 0 downtime is nearly impossible to reach. The default qemu value is 0.030, so maybe we can simply remove the default and not send any qmp command by default.

>  +    PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => $migrate_downtime*1);
>  +    PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_speed", value => $migrate_speed*1);

Try

    PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => int($migrate_downtime));
    PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_speed", value => int($migrate_speed));

(more clean)

----- Original message -----
From: Stefan Priebe s.pri...@profihost.ag
To: pve-devel@pve.proxmox.com
Sent: Wednesday, 26 December 2012 23:17:56
Subject: [pve-devel] [PATCH V2] - fix setting migration parameters

- move migration speed/downtime from QemuServer vm_start to QemuMigrate phase2
- lower default migration downtime value to 0
---
 PVE/QemuMigrate.pm | 30 ++
 PVE/QemuServer.pm  | 17 +
 2 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 0711681..af8813c 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -323,24 +323,46 @@ sub phase2 {
     $self->{tunnel} = $self->fork_tunnel($self->{nodeip}, $lport, $rport);

     $self->log('info', "starting online/live migration on port $lport");
-    # start migration
-    my $start = time();
+    # load_defaults
+    my $defaults = PVE::QemuServer::load_defaults();
+
+    # always set migrate speed (overwrite kvm default of 32m)
+    # we set a very hight default of 8192m which is basically unlimited
+    my $migrate_speed = $defaults->{migrate_speed} || 8192;
+    $migrate_speed = $conf->{migrate_speed} || $migrate_speed;
+    $migrate_speed = $migrate_speed * 1048576;
+    $self->log('info', "migrate_set_speed: $migrate_speed");
+    eval {
+        # *1 ensures that JSON module convert the value to number
+        PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_speed", value => $migrate_speed*1);
+    };
+    $self->log('info', "migrate_set_speed error: $@") if $@;
+
+    my $migrate_downtime = $defaults->{migrate_downtime};
+    $migrate_downtime = $conf->{migrate_downtime} if defined($conf->{migrate_downtime});
+    $self->log('info', "migrate_set_downtime: $migrate_downtime");
+    eval {
+        # *1 ensures that JSON module convert the value to number
+        PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => $migrate_downtime*1);
+    };
+    $self->log('info', "migrate_set_downtime error: $@") if $@;

     my $capabilities = {};
     $capabilities->{capability} = "xbzrle";
     $capabilities->{state} = JSON::false;
-
     eval { PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate-set-capabilities", capabilities => [$capabilities]); };

-    #set cachesize 10% of the total memory
+    # set cachesize 10% of the total memory
     my $cachesize = int($conf->{memory}*1048576/10);
     eval { PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate-set-cache-size", value => $cachesize); };

+    # start migration
+    my $start = time();
     eval { PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate", uri => "tcp:localhost:$lport"); };

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 1d4c275..bb7fd16 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -385,7 +385,7 @@ EODESCR
 	type => 'integer',
 	description => "Set maximum tolerated downtime (in seconds) for migrations.",
 	minimum => 0,
-	default => 1,
+	default => 0,
     },
     cdrom => {
 	optional => 1,
@@ -2979,21 +2979,6 @@ sub vm_start {
 	warn $@ if $@;
     }

-    # always set migrate speed (overwrite kvm default of 32m)
-    # we set a very hight default of 8192m which is basically unlimited
-    my $migrate_speed = $defaults->{migrate_speed} || 8192;
-    $migrate_speed = $conf->{migrate_speed} || $migrate_speed;
-    $migrate_speed = $migrate_speed * 1048576;
-    eval {
-        vm_mon_cmd_nocheck($vmid, "migrate_set_speed", value => $migrate_speed);
-    };
-
-    my $migrate_downtime = $defaults->{migrate_downtime};
-    $migrate_downtime = $conf->{migrate_downtime} if defined($conf->{migrate_downtime});
-    if (defined($migrate_downtime)) {
-        eval { vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => $migrate_downtime); };
-    }
-
     if ($migratedfrom) {
 	my $capabilities = {};
 	$capabilities->{capability} = "xbzrle";
--
1.7.10.4
[pve-devel] [PATCH] - fix setting migration parameters
From: Stefan Priebe <s.pri...@profihost.ag>

- move migration speed/downtime from QemuServer vm_start to QemuMigrate phase2
- lower default migration downtime value to 0

Changelog by aderumier:
- remove default value of 1s for migrate_downtime
- add logs for the downtime and expected-downtime migration stats
- only send qmp migrate_downtime if migrate_downtime is defined
- add error logs on qm start of the target vm
- cast int() for json values

Tested with a youtube video playing; without qmp migrate_downtime (default of 30ms), the downtime is around 500-600ms.
---
 PVE/QemuMigrate.pm | 43 +++
 PVE/QemuServer.pm  | 16
 2 files changed, 35 insertions(+), 24 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 0711681..dbbeb69 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -312,7 +312,10 @@ sub phase2 {
 	if ($line =~ m/^migration listens on port (\d+)$/) {
 	    $rport = $1;
 	}
-    }, errfunc => sub {});
+    }, errfunc => sub {
+	my $line = shift;
+	print $line."\n";
+    });

     die "unable to detect remote migration port\n" if !$rport;

@@ -323,24 +326,46 @@ sub phase2 {
     $self->{tunnel} = $self->fork_tunnel($self->{nodeip}, $lport, $rport);

     $self->log('info', "starting online/live migration on port $lport");
-    # start migration
-    my $start = time();
+    # load_defaults
+    my $defaults = PVE::QemuServer::load_defaults();
+
+    # always set migrate speed (overwrite kvm default of 32m)
+    # we set a very hight default of 8192m which is basically unlimited
+    my $migrate_speed = $defaults->{migrate_speed} || 8192;
+    $migrate_speed = $conf->{migrate_speed} || $migrate_speed;
+    $migrate_speed = $migrate_speed * 1048576;
+    $self->log('info', "migrate_set_speed: $migrate_speed");
+    eval {
+        PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_speed", value => int($migrate_speed));
+    };
+    $self->log('info', "migrate_set_speed error: $@") if $@;
+
+    my $migrate_downtime = $defaults->{migrate_downtime};
+    $migrate_downtime = $conf->{migrate_downtime} if defined($conf->{migrate_downtime});
+    if (defined($migrate_downtime)) {
+        $self->log('info', "migrate_set_downtime: $migrate_downtime");
+        eval {
+            PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => int($migrate_downtime));
+        };
+        $self->log('info', "migrate_set_downtime error: $@") if $@;
+    }

     my $capabilities = {};
     $capabilities->{capability} = "xbzrle";
     $capabilities->{state} = JSON::false;
-
     eval { PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate-set-capabilities", capabilities => [$capabilities]); };

-    #set cachesize 10% of the total memory
+    # set cachesize 10% of the total memory
     my $cachesize = int($conf->{memory}*1048576/10);
     eval { PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate-set-cache-size", value => $cachesize); };

+    # start migration
+    my $start = time();
     eval { PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate", uri => "tcp:localhost:$lport"); };

@@ -353,7 +378,6 @@ sub phase2 {
     while (1) {
 	$i++;
 	my $avglstat = $lstat/$i if $lstat;
-
 	usleep($usleep);

 	my $stat;
 	eval {

@@ -375,7 +399,8 @@ sub phase2 {
 		    my $delay = time() - $start;
 		    if ($delay > 0) {
 			my $mbps = sprintf "%.2f", $conf->{memory}/$delay;
-			$self->log('info', "migration speed: $mbps MB/s");
+			$self->log('info', "migration speed: $mbps MB/s - downtime $stat->{downtime} ms");
+
 		    }
 		}

@@ -397,11 +422,13 @@ sub phase2 {
 		my $xbzrlepages = $stat->{"xbzrle-cache"}->{pages} || 0;
 		my $xbzrlecachemiss = $stat->{"xbzrle-cache"}->{"cache-miss"} || 0;
 		my $xbzrleoverflow = $stat->{"xbzrle-cache"}->{overflow} || 0;
+		my $expected_downtime = $stat->{"expected-downtime"} || 0;
+
 		#reduce sleep if remainig memory if lower than the everage transfert
 		$usleep = 30 if $avglstat && $rem < $avglstat;

 		$self->log('info', "migration status: $stat->{status} (transferred ${trans}, " .
-			   "remaining ${rem}), total ${total})");
+			   "remaining ${rem}), total ${total}, expected downtime ${expected_downtime})");

 		#$self->log('info', "migration xbzrle cachesize: ${xbzrlecachesize} transferred ${xbzrlebytes} pages ${xbzrlepages} cachemiss ${xbzrlecachemiss} overflow ${xbzrleoverflow}");
 	    }

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 165eaf6..b168c74 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -385,7 +385,6 @@ EODESCR
 	type => 'integer',
 	description => "Set maximum tolerated downtime (in seconds) for migrations.",
 	minimum => 0,
-	default =
Re: [pve-devel] [PATCH V2] - fix setting migration parameters
> - move migration speed/downtime from QemuServer vm_start to QemuMigrate phase2

What is the advantage?

> - lower default migration downtime value to 0

We do not want that, because this is known to cause long wait times on busy VMs.

> +    # always set migrate speed (overwrite kvm default of 32m)
> +    # we set a very hight default of 8192m which is basically unlimited
> +    my $migrate_speed = $defaults->{migrate_speed} || 8192;
> +    $migrate_speed = $conf->{migrate_speed} || $migrate_speed;
> +    $migrate_speed = $migrate_speed * 1048576;

This makes sure that $migrate_speed is an integer.

> +    $self->log('info', "migrate_set_speed: $migrate_speed");
> +    eval {
> +        # *1 ensures that JSON module convert the value to number
> +        PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_speed", value => $migrate_speed*1);

So $migrate_speed*1 is not needed here.

> +    };
> +    $self->log('info', "migrate_set_speed error: $@") if $@;
> +
> +    my $migrate_downtime = $defaults->{migrate_downtime};
> +    $migrate_downtime = $conf->{migrate_downtime} if defined($conf->{migrate_downtime});
> +    $self->log('info', "migrate_set_downtime: $migrate_downtime");
> +    eval {
> +        # *1 ensures that JSON module convert the value to number
> +        PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => $migrate_downtime*1);

Please use: value => int($migrate_downtime). Or what is the advantage of '*1'?

> +    # start migration
> +    my $start = time();

What is this?
Re: [pve-devel] Baloon Device is the problem! Re: migration problems since qemu 1.3
I remember same problem with qemu_block_set_io_throttle()

vm_mon_cmd($vmid, "block_set_io_throttle", device => $deviceid,
    bps => int($bps), bps_rd => int($bps_rd), bps_wr => int($bps_wr),
    iops => int($iops), iops_rd => int($iops_rd), iops_wr => int($iops_wr));

So maybe does it send crap if the value is not casted ?

No, it sends a string value instead (value => "1"). Also the value should not be int but float; the qmp doc says that we can use 0.5, 0.30, ... as value.

honestly, I am glad if migration works at all ;-) What is the use case of setting it to 0.5 or 0.3?

Note: The current time estimation is wrong anyways, and it will always be a rough estimation.

also query-migrate returns two cool new values about downtime, I think we should display them in the query-migrate log:

- downtime: only present when migration has finished correctly. total amount in ms for downtime that happened (json-int)
- expected-downtime: only present while migration is active. total amount in ms for downtime that was calculated on the last bitmap round (json-int)

Yes, that sounds interesting.

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
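As a minimal sketch (Python, not PVE/Perl code) of how the two query-migrate fields described above could be surfaced in a migration log: `downtime` is only present once migration has completed, `expected-downtime` only while it is active. The sample response dicts are invented for illustration.

```python
def downtime_summary(stat):
    """Build a log-friendly summary of the downtime info in a query-migrate result."""
    status = stat.get("status", "unknown")
    if status == "active":
        # expected-downtime: only present while migration is active (ms)
        return f"status: {status}, expected downtime {stat.get('expected-downtime', 0)} ms"
    if status == "completed":
        # downtime: only present when migration has finished correctly (ms)
        return f"status: {status}, downtime {stat.get('downtime', 0)} ms"
    return f"status: {status}"

print(downtime_summary({"status": "active", "expected-downtime": 300}))
print(downtime_summary({"status": "completed", "downtime": 523}))
```

Defaulting the missing field to 0 mirrors the `|| 0` fallbacks used elsewhere in QemuMigrate.pm.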
Re: [pve-devel] new migration patches in qemu.git
I see that the last migration code from qemu git (1.4) seems to improve the downtime a lot (from 500 to 30 ms) with a high memory-change workload. This is this commit:

http://git.qemu.org/?p=qemu.git;a=commit;h=bb5801f551ee8591d576d87a9290af297998e322

The changes are huge, but maybe we can put them in the pve-qemu-kvm package? (Dietmar? any opinion?)

IMHO those changes are too large to include now.

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH] move migration speed/downtime from QemuServer vm_start to QemuMigrate phase2
try: value => int($migrate_downtime));

The downtime is a float so that won't work.

-----Original Message-----
From: pve-devel-boun...@pve.proxmox.com [mailto:pve-devel-boun...@pve.proxmox.com] On Behalf Of Stefan Priebe
Sent: Mittwoch, 26. Dezember 2012 21:44
To: Stefan Priebe
Cc: pve-devel@pve.proxmox.com
Subject: Re: [pve-devel] [PATCH] move migration speed/downtime from QemuServer vm_start to QemuMigrate phase2

Hello list,

this patch fixes my migration problems. The strange thing is that there must be another bug too - NOT related to this patch. My output looks like this:

Dec 26 21:42:24 migrate_set_speed: 8589934592
Dec 26 21:42:24 migrate_set_speed error: VM 105 qmp command 'migrate_set_speed' failed - Invalid parameter type for 'value', expected: int
Dec 26 21:42:24 migrate_set_downtime: 1
Dec 26 21:42:24 migrate_set_downtime error: VM 105 qmp command 'migrate_set_downtime' failed - Invalid parameter type for 'value', expected: number

To me these values and the expected format seem to be OK.

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
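The "Invalid parameter type" errors above come from QMP type-checking the decoded JSON: a `"value"` encoded as the string `"1"` fails where the number `1` would pass. In Perl, whether a scalar serializes as a string or a number depends on how it was last used, which is why the thread debates `*1` vs `int()`. A Python sketch of the wire-level difference (the QMP payload shape is real; no QEMU instance is contacted):

```python
import json

# The same logical value, encoded two ways. QMP's schema check accepts the
# second form for migrate_set_downtime ("number") and rejects the first.
payload_string = json.dumps({"execute": "migrate_set_downtime",
                             "arguments": {"value": "1"}})
payload_number = json.dumps({"execute": "migrate_set_downtime",
                             "arguments": {"value": 1}})

print(payload_string)  # "value" serialized as a JSON string
print(payload_number)  # "value" serialized as a JSON number
```

Python's `json` module keys the output type off the object's type directly, so the ambiguity is a Perl-specific hazard; the point here is only what the two payloads look like on the wire.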
Re: [pve-devel] fix setting migration parameters V3
That works for me too. A value of 1 does not work: the whole VM stalls immediately and the socket is unavailable. Migration then takes 5-10 minutes for an idle VM.

Am 27.12.2012 um 06:45 schrieb Alexandre Derumier aderum...@odiso.com:

this is a V3 rework of stefan patch.

main change: remove default value of migrate_downtime, so we will use the qemu default of 30ms.

tested with youtube HD video, downtime is around 500 ms even though the default target is 30ms.

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH V2] - fix setting migration parameters
Hi,

Am 27.12.2012 um 02:57 schrieb Alexandre DERUMIER aderum...@odiso.com:

some comments:

- default => 1,
+ default => 0,

Not sure about lowering the default migration downtime value to 0, because 0 downtime is nearly impossible to target. The default qemu value is 0.030, so maybe we can simply remove the default and not send any qmp command by default.

That's ok / fine and works for me too.

+ PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => $migrate_downtime*1);
+ PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_speed", value => $migrate_speed*1);

try

+ PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => int($migrate_downtime));
+ PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_speed", value => int($migrate_speed));

(more clean)

----- Mail original -----
De: Stefan Priebe s.pri...@profihost.ag
À: pve-devel@pve.proxmox.com
Envoyé: Mercredi 26 Décembre 2012 23:17:56
Objet: [pve-devel] [PATCH V2] - fix setting migration parameters

- move migration speed/downtime from QemuServer vm_start to QemuMigrate phase2
- lower default migration downtime value to 0
---
 PVE/QemuMigrate.pm | 30 ++
 PVE/QemuServer.pm | 17 +
 2 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 0711681..af8813c 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -323,24 +323,46 @@ sub phase2 {
     $self->{tunnel} = $self->fork_tunnel($self->{nodeip}, $lport, $rport);
     $self->log('info', "starting online/live migration on port $lport");
-    # start migration
-    my $start = time();
+    # load_defaults
+    my $defaults = PVE::QemuServer::load_defaults();
+
+    # always set migrate speed (overwrite kvm default of 32m)
+    # we set a very hight default of 8192m which is basically unlimited
+    my $migrate_speed = $defaults->{migrate_speed} || 8192;
+    $migrate_speed = $conf->{migrate_speed} || $migrate_speed;
+    $migrate_speed = $migrate_speed * 1048576;
+    $self->log('info', "migrate_set_speed: $migrate_speed");
+    eval {
+        # *1 ensures that JSON module convert the value to number
+        PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_speed", value => $migrate_speed*1);
+    };
+    $self->log('info', "migrate_set_speed error: $@") if $@;
+
+    my $migrate_downtime = $defaults->{migrate_downtime};
+    $migrate_downtime = $conf->{migrate_downtime} if defined($conf->{migrate_downtime});
+    $self->log('info', "migrate_set_downtime: $migrate_downtime");
+    eval {
+        # *1 ensures that JSON module convert the value to number
+        PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => $migrate_downtime*1);
+    };
+    $self->log('info', "migrate_set_downtime error: $@") if $@;

     my $capabilities = {};
     $capabilities->{capability} = "xbzrle";
     $capabilities->{state} = JSON::false;
-
     eval { PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate-set-capabilities", capabilities => [$capabilities]); };

-    #set cachesize 10% of the total memory
+    # set cachesize 10% of the total memory
     my $cachesize = int($conf->{memory}*1048576/10);
     eval { PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate-set-cache-size", value => $cachesize); };

+    # start migration
+    my $start = time();
     eval {
         PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate", uri => "tcp:localhost:$lport");
     };

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 1d4c275..bb7fd16 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -385,7 +385,7 @@ EODESCR
     type => 'integer',
     description => "Set maximum tolerated downtime (in seconds) for migrations.",
     minimum => 0,
-    default => 1,
+    default => 0,
 },
 cdrom => {
     optional => 1,
@@ -2979,21 +2979,6 @@ sub vm_start {
     warn $@ if $@;
 }

-    # always set migrate speed (overwrite kvm default of 32m)
-    # we set a very hight default of 8192m which is basically unlimited
-    my $migrate_speed = $defaults->{migrate_speed} || 8192;
-    $migrate_speed = $conf->{migrate_speed} || $migrate_speed;
-    $migrate_speed = $migrate_speed * 1048576;
-    eval {
-        vm_mon_cmd_nocheck($vmid, "migrate_set_speed", value => $migrate_speed);
-    };
-
-    my $migrate_downtime = $defaults->{migrate_downtime};
-    $migrate_downtime = $conf->{migrate_downtime} if defined($conf->{migrate_downtime});
-    if (defined($migrate_downtime)) {
-        eval { vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => $migrate_downtime); };
-    }
-
     if ($migratedfrom) {
         my $capabilities = {};
         $capabilities->{capability} = "xbzrle";
--
1.7.10.4

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
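The `migrate-set-cache-size` call in the patch above sizes the xbzrle cache at 10% of guest RAM, converting the configured memory from MB to bytes. A Python sketch of that arithmetic (the 4096 MB VM size is an assumed example, not from the thread):

```python
# Mirrors int($conf->{memory}*1048576/10) from QemuMigrate.pm:
# memory is configured in MB, the QMP value is in bytes, cache = 10% of RAM.
memory_mb = 4096  # assumed guest RAM for illustration
cachesize = int(memory_mb * 1048576 / 10)
print(cachesize)  # 429496729
```

The `int()` truncation is harmless here because QMP's `migrate-set-cache-size` takes a byte count, where fractional values are meaningless.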
Re: [pve-devel] [PATCH] - fix setting migration parameters
- move migration speed/downtime from QemuServer vm_start to QemuMigrate phase2

This is not needed - or why do we need that?

- lower default migration downtime value to 0

changelog by aderumier
- remove default value of 1s for migrate_downtime

why? - we added that because the default caused serious problems!

- add logs downtime and expected downtime migration stats

Please can you send an extra patch for that?

- only send qmp migrate_downtime if migrate_downtime is defined

I want to have that as default (set on startup).

- add errors logs on qm start of target vm.

great, but can we have an extra commit for that? Those changes seem to be unrelated.

- cast int() for json values

tested with youtube video playing, no qmp migrate_downtime (default of 30ms), the downtime is around 500-600ms.

---
 PVE/QemuMigrate.pm | 43 +++--- -
 PVE/QemuServer.pm | 16
 2 files changed, 35 insertions(+), 24 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 0711681..dbbeb69 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -312,7 +312,10 @@ sub phase2 {
     if ($line =~ m/^migration listens on port (\d+)$/) {
         $rport = $1;
     }
-}, errfunc => sub {});
+}, errfunc => sub {
+    my $line = shift;
+    print $line."\n";
+});

 die "unable to detect remote migration port\n" if !$rport;

@@ -323,24 +326,46 @@ sub phase2 {
     $self->{tunnel} = $self->fork_tunnel($self->{nodeip}, $lport, $rport);
     $self->log('info', "starting online/live migration on port $lport");
-    # start migration
-    my $start = time();
+    # load_defaults
+    my $defaults = PVE::QemuServer::load_defaults();
+
+    # always set migrate speed (overwrite kvm default of 32m)
+    # we set a very hight default of 8192m which is basically unlimited
+    my $migrate_speed = $defaults->{migrate_speed} || 8192;
+    $migrate_speed = $conf->{migrate_speed} || $migrate_speed;
+    $migrate_speed = $migrate_speed * 1048576;
+    $self->log('info', "migrate_set_speed: $migrate_speed");
+    eval {
+        PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_speed", value => int($migrate_speed));
+    };
+    $self->log('info', "migrate_set_speed error: $@") if $@;
+
+    my $migrate_downtime = $defaults->{migrate_downtime};
+    $migrate_downtime = $conf->{migrate_downtime} if defined($conf->{migrate_downtime});
+    if (defined($migrate_downtime)) {
+        $self->log('info', "migrate_set_downtime: $migrate_downtime");
+        eval {
+            PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => int($migrate_downtime));
+        };
+        $self->log('info', "migrate_set_downtime error: $@") if $@;
+    }

     my $capabilities = {};
     $capabilities->{capability} = "xbzrle";
     $capabilities->{state} = JSON::false;
-
     eval { PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate-set-capabilities", capabilities => [$capabilities]); };

-    #set cachesize 10% of the total memory
+    # set cachesize 10% of the total memory
     my $cachesize = int($conf->{memory}*1048576/10);
     eval { PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate-set-cache-size", value => $cachesize); };

+    # start migration
+    my $start = time();
     eval {
         PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate", uri => "tcp:localhost:$lport");
     };

@@ -353,7 +378,6 @@ sub phase2 {
     while (1) {
         $i++;
         my $avglstat = $lstat/$i if $lstat;
-        usleep($usleep);

         my $stat;
         eval {
@@ -375,7 +399,8 @@ sub phase2 {
     my $delay = time() - $start;
     if ($delay > 0) {
         my $mbps = sprintf "%.2f", $conf->{memory}/$delay;
-        $self->log('info', "migration speed: $mbps MB/s");
+        $self->log('info', "migration speed: $mbps MB/s - downtime $stat->{downtime} ms");
+    }
 }

@@ -397,11 +422,13 @@ sub phase2 {
     my $xbzrlepages = $stat->{"xbzrle-cache"}->{pages} || 0;
     my $xbzrlecachemiss = $stat->{"xbzrle-cache"}->{"cache-miss"} || 0;
     my $xbzrleoverflow = $stat->{"xbzrle-cache"}->{overflow} || 0;
+    my $expected_downtime = $stat->{"expected-downtime"} || 0;
+
     #reduce sleep if remainig memory if lower than the everage transfert
     $usleep = 30 if $avglstat && $rem < $avglstat;

     $self->log('info', "migration status: $stat->{status} (transferred ${trans}, " .
-        "remaining ${rem}), total ${total})");
+        "remaining ${rem}), total ${total}, expected downtime ${expected_downtime})");
     #$self->log('info', "migration xbzrle cachesize: ${xbzrlecachesize} transferred ${xbzrlebytes} pages ${xbzrlepages} cachemiss ${xbzrlecachemiss} overflow ${xbzrleoverflow}");
 }

diff --git
Re: [pve-devel] [PATCH V2] - fix setting migration parameters
Hi,

Am 27.12.2012 um 06:55 schrieb Dietmar Maurer diet...@proxmox.com:

- move migration speed/downtime from QemuServer vm_start to QemuMigrate phase2

what is the advantage?

Cleaner code. Nobody expects that those values were set at vm start.

- lower default migration downtime value to 0

We do not want that, because this is known to cause long wait times on busy VMs.

A value of 1 does not work for me. The whole VM stalls immediately and the socket is unavailable. Migration then takes 5-10 minutes for an idle VM.

---
 PVE/QemuMigrate.pm | 30 ++
 PVE/QemuServer.pm | 17 +
 2 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 0711681..af8813c 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -323,24 +323,46 @@ sub phase2 {
     $self->{tunnel} = $self->fork_tunnel($self->{nodeip}, $lport, $rport);
     $self->log('info', "starting online/live migration on port $lport");
-    # start migration
-    my $start = time();
+    # load_defaults
+    my $defaults = PVE::QemuServer::load_defaults();
+
+    # always set migrate speed (overwrite kvm default of 32m)
+    # we set a very hight default of 8192m which is basically unlimited
+    my $migrate_speed = $defaults->{migrate_speed} || 8192;
+    $migrate_speed = $conf->{migrate_speed} || $migrate_speed;
+    $migrate_speed = $migrate_speed * 1048576;

This makes sure that $migrate_speed is an integer.

+    $self->log('info', "migrate_set_speed: $migrate_speed");
+    eval {
+        # *1 ensures that JSON module convert the value to number
+        PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_speed", value => $migrate_speed*1);

so $migrate_speed*1 is not needed here.

+    };
+    $self->log('info', "migrate_set_speed error: $@") if $@;
+
+    my $migrate_downtime = $defaults->{migrate_downtime};
+    $migrate_downtime = $conf->{migrate_downtime} if defined($conf->{migrate_downtime});
+    $self->log('info', "migrate_set_downtime: $migrate_downtime");
+    eval {
+        # *1 ensures that JSON module convert the value to number
+        PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => $migrate_downtime*1);

Please use: value => int($migrate_downtime)
Or what is the advantage of '*1'?

+    # start migration
+    my $start = time();

what is this?

     eval {
         PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate", uri => "tcp:localhost:$lport");
     };

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 1d4c275..bb7fd16 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -385,7 +385,7 @@ EODESCR
     type => 'integer',
     description => "Set maximum tolerated downtime (in seconds) for migrations.",
     minimum => 0,
-    default => 1,
+    default => 0,
 },
 cdrom => {
     optional => 1,
@@ -2979,21 +2979,6 @@ sub vm_start {
     warn $@ if $@;
 }

-    # always set migrate speed (overwrite kvm default of 32m)
-    # we set a very hight default of 8192m which is basically unlimited
-    my $migrate_speed = $defaults->{migrate_speed} || 8192;
-    $migrate_speed = $conf->{migrate_speed} || $migrate_speed;
-    $migrate_speed = $migrate_speed * 1048576;
-    eval {
-        vm_mon_cmd_nocheck($vmid, "migrate_set_speed", value => $migrate_speed);
-    };
-
-    my $migrate_downtime = $defaults->{migrate_downtime};
-    $migrate_downtime = $conf->{migrate_downtime} if defined($conf->{migrate_downtime});
-    if (defined($migrate_downtime)) {
-        eval { vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => $migrate_downtime); };
-    }
-
     if ($migratedfrom) {
         my $capabilities = {};
         $capabilities->{capability} = "xbzrle";
--
1.7.10.4

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH V2] - fix setting migration parameters
Cleaner code. Nobody expects that those values were set at vm start.

We set that since version 1.0 - so 'everybody' expects that this value is set at startup.

- lower default migration downtime value to 0

We do not want that, because this is known to cause long wait times on busy VMs.

A value of 1 does not work for me. The whole VM stalls immediately and the socket is unavailable. Migration then takes 5-10 minutes for an idle VM.

So this is a qemu bug - we should try to fix that. Else any VM which uses migrate_downtime is broken!

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH] move migration speed/downtime from QemuServer vm_start to QemuMigrate phase2
try: value => int($migrate_downtime));

The downtime is a float so that won't work.

We defined that as integer in our framework:

-migrate_downtime integer (0 - N) (default=1)
    Set maximum tolerated downtime (in seconds) for migrations.

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
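The int-vs-float disagreement above has a concrete failure mode: `migrate_set_downtime` accepts a JSON number, and sub-second values like 0.5 are valid, so an `int()` cast silently turns them into 0 ("no tolerated downtime"). A Python sketch of the difference (in Perl, `*1` forces the scalar to be numeric without truncating; Python's `* 1` shown here only preserves the value's type):

```python
# int() truncates toward zero, destroying any sub-second downtime setting;
# a type-preserving numeric coercion keeps the fractional value intact.
for configured in (1, 0.5, 0.3):
    as_int = int(configured)    # what int($migrate_downtime) would send
    as_number = configured * 1  # value-preserving, like $migrate_downtime*1 in Perl
    print(configured, "->", as_int, "vs", as_number)
```

This is why casting to `int` "fixes" the QMP type error for whole-second values but quietly breaks the fractional downtimes the QMP documentation allows.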
Re: [pve-devel] [PATCH V2] - fix setting migration parameters
Cleaner code. Nobody expects that those values were set at vm start.

We set that since version 1.0 - so 'everybody' expects that this value is set at startup.

I agree with stefan, this value should be set before migration. Currently, we can't change the migrate_downtime without restarting the VM. (example: I have a migration which takes too much time because of a too low migrate_downtime value; I stop the migration, increase the migrate_downtime, and restart the migration)

----- Mail original -----
De: Dietmar Maurer diet...@proxmox.com
À: Stefan Priebe - Profihost AG s.pri...@profihost.ag
Cc: pve-devel@pve.proxmox.com
Envoyé: Jeudi 27 Décembre 2012 07:27:29
Objet: Re: [pve-devel] [PATCH V2] - fix setting migration parameters

Cleaner code. Nobody expects that those values were set at vm start.

We set that since version 1.0 - so 'everybody' expects that this value is set at startup.

- lower default migration downtime value to 0

We do not want that, because this is known to cause long wait times on busy VMs.

A value of 1 does not work for me. The whole VM stalls immediately and the socket is unavailable. Migration then takes 5-10 minutes for an idle VM.

So this is a qemu bug - we should try to fix that. Else any VM which uses migrate_downtime is broken!

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH V2] - fix setting migration parameters
I agree with stefan, this value should be set before migration. currently, we can't change the migrate_downtime without restart the vm. (example: I have a migration which take to much time because of too low migrate_downtime value, I stop the migration, I increase the migrate_downtime, I restart the migration) I see - so that change is OK for me. ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH V2] - fix setting migration parameters
Hi,

Am 27.12.2012 um 07:27 schrieb Dietmar Maurer diet...@proxmox.com:

A value of 1 does not work for me. The whole VM stalls immediately and the socket is unavailable. Migration then takes 5-10 minutes for an idle VM.

So this is a qemu bug - we should try to fix that.

Fine, but I can't ...

Else any VM which uses migrate_downtime is broken!

If we use the qemu defaults, why is then every VM using migrate_downtime broken? The value is still set if somebody has specified it in the VM conf.

To me, values bigger than 0.5 do not work at all, no idea why. Maybe because qemu guesses it can migrate the VM in 1s but then can't manage that.

Stefan

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
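The stall Stefan describes fits how precopy migration converges: QEMU only stops the guest when the remaining dirty memory can be sent within the downtime limit at the measured bandwidth, so a too-tight limit (or a dirty rate above the link speed) keeps it iterating indefinitely. A rough model (not QEMU code; all numbers are illustrative):

```python
def can_finish(remaining_mb, bandwidth_mb_s, max_downtime_s):
    """True if the final stop-and-copy phase fits into the allowed downtime."""
    return remaining_mb / bandwidth_mb_s <= max_downtime_s

# 100 MB left at 1000 MB/s fits into a 1 s downtime window...
print(can_finish(100, 1000, 1.0))
# ...but with a 30 ms target QEMU must keep iterating until ~30 MB remain.
print(can_finish(100, 1000, 0.030))

def converges(dirty_rate_mb_s, bandwidth_mb_s):
    """If the guest dirties memory faster than the link transfers it,
    the remaining set never shrinks and migration runs without bound."""
    return dirty_rate_mb_s < bandwidth_mb_s

print(converges(900, 800))  # a busy VM on a saturated link never converges
```

This also matches Dietmar's objection: lowering the default downtime toward 0 shrinks the window the final copy must fit into, which on busy VMs means arbitrarily long migrations.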
Re: [pve-devel] [PATCH] - fix setting migration parameters
changelog by aderumier
- remove default value of 1s for migrate_downtime

why? - we added that because the default caused serious problems!

This was because of stefan's problem, and qemu 1.3 seems to do migration fine with the default 30ms. What blocks the monitor exactly? A too low migrate_downtime?

- add logs downtime and expected downtime migration stats

Please can you send an extra patch for that?

sure no problem

- only send qmp migrate_downtime if migrate_downtime is defined

I want to have that as default (set on startup).

- add errors logs on qm start of target vm.

great, but can we have an extra commit for that? Those changes seem to be unrelated.

sure no problem

----- Mail original -----
De: Dietmar Maurer diet...@proxmox.com
À: Alexandre Derumier aderum...@odiso.com, pve-devel@pve.proxmox.com
Envoyé: Jeudi 27 Décembre 2012 07:21:38
Objet: RE: [pve-devel] [PATCH] - fix setting migration parameters

- move migration speed/downtime from QemuServer vm_start to QemuMigrate phase2

This is not needed - or why do we need that?

- lower default migration downtime value to 0

changelog by aderumier
- remove default value of 1s for migrate_downtime

why? - we added that because the default caused serious problems!

- add logs downtime and expected downtime migration stats

Please can you send an extra patch for that?

- only send qmp migrate_downtime if migrate_downtime is defined

I want to have that as default (set on startup).

- add errors logs on qm start of target vm.

great, but can we have an extra commit for that? Those changes seem to be unrelated.

- cast int() for json values

tested with youtube video playing, no qmp migrate_downtime (default of 30ms), the downtime is around 500-600ms.

---
 PVE/QemuMigrate.pm | 43 +++--- -
 PVE/QemuServer.pm | 16
 2 files changed, 35 insertions(+), 24 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 0711681..dbbeb69 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -312,7 +312,10 @@ sub phase2 {
     if ($line =~ m/^migration listens on port (\d+)$/) {
         $rport = $1;
     }
-}, errfunc => sub {});
+}, errfunc => sub {
+    my $line = shift;
+    print $line."\n";
+});

 die "unable to detect remote migration port\n" if !$rport;

@@ -323,24 +326,46 @@ sub phase2 {
     $self->{tunnel} = $self->fork_tunnel($self->{nodeip}, $lport, $rport);
     $self->log('info', "starting online/live migration on port $lport");
-    # start migration
-    my $start = time();
+    # load_defaults
+    my $defaults = PVE::QemuServer::load_defaults();
+
+    # always set migrate speed (overwrite kvm default of 32m)
+    # we set a very hight default of 8192m which is basically unlimited
+    my $migrate_speed = $defaults->{migrate_speed} || 8192;
+    $migrate_speed = $conf->{migrate_speed} || $migrate_speed;
+    $migrate_speed = $migrate_speed * 1048576;
+    $self->log('info', "migrate_set_speed: $migrate_speed");
+    eval {
+        PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_speed", value => int($migrate_speed));
+    };
+    $self->log('info', "migrate_set_speed error: $@") if $@;
+
+    my $migrate_downtime = $defaults->{migrate_downtime};
+    $migrate_downtime = $conf->{migrate_downtime} if defined($conf->{migrate_downtime});
+    if (defined($migrate_downtime)) {
+        $self->log('info', "migrate_set_downtime: $migrate_downtime");
+        eval {
+            PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate_set_downtime", value => int($migrate_downtime));
+        };
+        $self->log('info', "migrate_set_downtime error: $@") if $@;
+    }

     my $capabilities = {};
     $capabilities->{capability} = "xbzrle";
     $capabilities->{state} = JSON::false;
-
     eval { PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate-set-capabilities", capabilities => [$capabilities]); };

-    #set cachesize 10% of the total memory
+    # set cachesize 10% of the total memory
     my $cachesize = int($conf->{memory}*1048576/10);
     eval { PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate-set-cache-size", value => $cachesize); };

+    # start migration
+    my $start = time();
     eval {
         PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate", uri => "tcp:localhost:$lport");
     };

@@ -353,7 +378,6 @@ sub phase2 {
     while (1) {
         $i++;
         my $avglstat = $lstat/$i if $lstat;
-        usleep($usleep);

         my $stat;
         eval {
@@ -375,7 +399,8 @@ sub phase2 {
     my $delay = time() - $start;
     if ($delay > 0) {
         my $mbps = sprintf "%.2f", $conf->{memory}/$delay;
-        $self->log('info', "migration speed: $mbps MB/s");
+        $self->log('info', "migration speed: $mbps MB/s - downtime $stat->{downtime} ms");
+    }
 }

@@ -397,11 +422,13 @@ sub phase2 {
     my $xbzrlepages = $stat->{"xbzrle-cache"}->{pages} || 0;
     my $xbzrlecachemiss = $stat->{"xbzrle-cache"}->{"cache-miss"} || 0;
     my $xbzrleoverflow = $stat->{"xbzrle-cache"}->{overflow} || 0;
+    my $expected_downtime = $stat->{"expected-downtime"} || 0;
+
Re: [pve-devel] [PATCH V2] - fix setting migration parameters
Maybe we can add a global option migrate_downtime in /etc/datacenter.cfg to override the default value?

Can we please try to find the bug first? Nobody really wants to set a default value for that - instead, migrations should simply work.

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH V2] - fix setting migration parameters
Can we please try to find the bug first?

Sure! but I can't reproduce it with 1sec.

I also can't reproduce the bug. But if I test with migrate_downtime: 20, I get a timeout on query-migrate (but migration works fine). The migration logs give me:

Dec 27 07:57:56 migrate_set_downtime: 20
Dec 27 07:58:16 migration speed: 400.00 MB/s - downtime 20423 ms
Dec 27 07:58:16 migration status: completed

@stefan: how fast is your network?

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
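The "migration speed" line in logs like the one above is computed by the patch in this thread as configured guest memory (MB) divided by the wall-clock duration of phase2, not as measured network throughput. A sketch of that arithmetic, with numbers chosen as assumptions to match the quoted log (20 s elapsed, 8000 MB of RAM would yield exactly 400.00 MB/s):

```python
# Mirrors: my $mbps = sprintf "%.2f", $conf->{memory}/$delay;
memory_mb = 8000   # assumed guest RAM; not stated in the log
delay = 20         # 07:57:56 -> 07:58:16 is 20 seconds

if delay > 0:
    mbps = memory_mb / delay
    print(f"migration speed: {mbps:.2f} MB/s")  # migration speed: 400.00 MB/s
```

Because the figure is RAM/elapsed-time, a long stop phase (like the 20423 ms downtime logged above) inflates it relative to what actually crossed the wire.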
Re: [pve-devel] [PATCH V2] - fix setting migration parameters
@stefan:
- how fast is your network?

10gbe / 20gbe bonded as active/active.

- how much RAM do you assign to the VM?

In that case 4GB, but it also happens with 1GB if more than 800MB are in use. What I don't understand is that Alexandre reports speeds of 400MB/s but I get only 40-80MB/s when memory is full but the VM is idle. I use jumbo frames.

I do not know if you use jumbo frames ;-) Anyways, should be easy to test without jumbo frames. Does it help if you set/unset migrate_speed?

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel