Re: [pve-devel] successfull migration but failed resume

2014-09-02 Thread Alexandre DERUMIER
Hi,

I have tested with both nodes running the latest pve-kernel 3.10 (without the
specific xsave patch),

and good news: no more migration hang between 63XX and 61XX!

It would be great if you could test with it :)


___
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] successfull migration but failed resume

2014-08-29 Thread Alexandre DERUMIER
> I might be able to do some tests but I have to take this E5-2640 server out
> from this production cluster and create a new test cluster. It takes some
> days until I rearrange things. If that’s fine Im okay.
> Does this mean I have to re-install proxmox 3.1 on both cluster nodes?

If you remove a node from a cluster, yes, it's better to reinstall it before
joining a new cluster.

(BTW: it's Proxmox 3.2, right? Not 3.1?)


It would be great to test with the current 3.10 kernel.



- Original message -

From: Christian Tari christ...@zaark.com
To: Alexandre DERUMIER aderum...@odiso.com
Sent: Friday, 29 August 2014 15:19:10
Subject: Re: [pve-devel] successfull migration but failed resume

Good. At least we are on track. 

I might be able to do some tests but I have to take this E5-2640 server out 
from this production cluster and create a new test cluster. It takes some days 
until I rearrange things. If that’s fine Im okay. 
Does this mean I have to re-install proxmox 3.1 on both cluster nodes? 

//Christian 


On 29 Aug 2014, at 15:08, Alexandre DERUMIER aderum...@odiso.com wrote: 

 > Can it lead to issues if we migrate between two different architectures? BTW the
 > former is an HP DL360 G8, the latter an HP DL380 G7.

 I have the same bug when migrating between AMD Opteron 63XX and 61XX.

 I think it is caused by a KVM bug involving the xsave CPU flag, which exists
 on 63XX but not on 61XX.
 https://lkml.org/lkml/2014/2/22/58


 It seems to be your case too, with

 E5-2640 0 @ 2.50GHz : xsave
 CPU E5645 @ 2.40GHz : no xsave.


 Does the migration in the reverse direction work?


 I have a kernel 3.10 patch for this xsave bug, but I haven't tested it yet.
 I don't know if you could test it?
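
 The flag mismatch described above can be checked mechanically by diffing the `flags` lines of /proc/cpuinfo from the two hosts. The following is only a minimal Python sketch: the function names are made up for illustration, and the two sample strings are abbreviated to the tail of the flag lists quoted further down in this thread (E5-2640 as source, E5645 as target).

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def flag_diff(source_text, target_text):
    """Return (flags only on source, flags only on target), both sorted."""
    src = parse_flags(source_text)
    dst = parse_flags(target_text)
    return sorted(src - dst), sorted(dst - src)


# Abbreviated samples: only the tail of each flag list from the thread.
SOURCE = ("flags : sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave "
          "avx lahf_lm ida arat epb xsaveopt pln pts dts")
TARGET = "flags : sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dts"

only_on_source, only_on_target = flag_diff(SOURCE, TARGET)
print("only on source:", only_on_source)
print("only on target:", only_on_target)
```

 On real hosts the input would be the full contents of /proc/cpuinfo from each node; any flag listed as "only on source" (xsave here) is a candidate for trouble when migrating in that direction.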
 
 
 
 
 - Original message -

 From: Christian Tari christ...@zaark.com
 To: Alexandre DERUMIER aderum...@odiso.com
 Sent: Friday, 29 August 2014 14:16:59
 Subject: Re: [pve-devel] successfull migration but failed resume

 Yes, the default, kvm64.
 Can it lead to issues if we migrate between two different architectures? BTW the
 former is an HP DL360 G8, the latter an HP DL380 G7.
 The strange thing is that it doesn’t happen every time. In particular, after a
 failed migration the subsequent migrations always work. It often happens with
 instances with relatively high memory usage (6-18GB). Can it be some
 timeout while the content of the memory is being transferred?
 Aug 29 11:37:42 ERROR: migration finished with problems (duration 00:04:23)
 
 
 
 
 //Christian 
 
 
 
 On 29 Aug 2014, at 14:08, Alexandre DERUMIER  aderum...@odiso.com  wrote: 
 
 
 and your guest CPU is kvm64?


 - Original message -

 From: Christian Tari christ...@zaark.com
 To: Alexandre DERUMIER aderum...@odiso.com
 Sent: Friday, 29 August 2014 13:02:15
 Subject: Re: [pve-devel] successfull migration but failed resume
 
 Source host: 
 processor : 11 
 vendor_id : GenuineIntel 
 cpu family : 6 
 model : 45 
 model name : Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz 
 stepping : 7 
 cpu MHz : 2493.793 
 cache size : 15360 KB 
 physical id : 0 
 siblings : 12 
 core id : 5 
 cpu cores : 6 
 apicid : 11 
 initial apicid : 11 
 fpu : yes 
 fpu_exception : yes 
 cpuid level : 13 
 wp : yes 
 flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat 
 pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb 
 rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc 
 aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 
 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave 
 avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority 
 ept vpid 
 bogomips : 4987.58 
 clflush size : 64 
 cache_alignment : 64 
 address sizes : 46 bits physical, 48 bits virtual 
 power management: 
 
 # pveversion 
 pve-manager/3.2-1/1933730b (running kernel: 2.6.32-27-pve) 
 
 Target host: 
 processor : 11 
 vendor_id : GenuineIntel 
 cpu family : 6 
 model : 44 
 model name : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz 
 stepping : 2 
 cpu MHz : 2399.404 
 cache size : 12288 KB 
 physical id : 1 
 siblings : 12 
 core id : 9 
 cpu cores : 6 
 apicid : 50 
 initial apicid : 50 
 fpu : yes 
 fpu_exception : yes 
 cpuid level : 11 
 wp : yes 
 flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat 
 pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb 
 rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc 
 aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 
 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dts 
 tpr_shadow vnmi flexpriority ept vpid 
 bogomips : 4798.17 
 clflush size : 64 
 cache_alignment : 64 
 address sizes : 40 bits physical, 48 bits virtual 
 power management: 
 
 # pveversion 
 pve-manager/3.2-4/e24a91c1 (running kernel: 2.6.32-29-pve) 
 
 //Christian 
 
 
 On 29 Aug 2014, at 12:56, Alexandre DERUMIER  aderum...@odiso.com  wrote: 
 
 
  
 
 Aug 29 11:37:39 ERROR

Re: [pve-devel] successfull migration but failed resume

2014-08-29 Thread Alexandre DERUMIER
Note, I just received some new Opteron servers, so I'll do tests next week :)



Re: [pve-devel] successfull migration but failed resume

2014-08-29 Thread Michael Rasmussen
On Fri, 29 Aug 2014 17:11:08 +0200 (CEST)
Alexandre DERUMIER aderum...@odiso.com wrote:

> Note, I just receive some new opteron servers, so I'll do tests next week :)
>
As mentioned before, I had the same problems migrating from Opteron to
Phenom and Athlon II based CPUs.

In which CPU generation did AMD introduce the xsave CPU flag?

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael at rasmussen dot cc
http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0xD3C9A00E
mir at datanom dot net
http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0xE501F51C
mir at miras dot org
http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0xE3E80917
--
/usr/games/fortune -es says:
The world is full of people who have never, since childhood, met an
open doorway with an open mind.
-- E. B. White




Re: [pve-devel] successfull migration but failed resume

2014-08-29 Thread Alexandre DERUMIER
> From which CPU generation has AMD introduced the cpu flag xsave?

I see it on Opteron 63XX, but not on 61XX.

BTW, does it work for you with the current 3.10 kernel? (which doesn't have the
xsave patch yet)




Re: [pve-devel] successfull migration but failed resume

2014-08-29 Thread Michael Rasmussen
On Fri, 29 Aug 2014 17:23:31 +0200 (CEST)
Alexandre DERUMIER aderum...@odiso.com wrote:

> > From which CPU generation has AMD introduced the cpu flag xsave?
>
> I see it on Opteron 63XX , but not 61XX.
>
Just found it here: Family 15h and up.
https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2013-2076

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael at rasmussen dot cc
http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0xD3C9A00E
mir at datanom dot net
http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0xE501F51C
mir at miras dot org
http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0xE3E80917
--
/usr/games/fortune -es says:
Writing free verse is like playing tennis with the net down.




Re: [pve-devel] successfull migration but failed resume

2013-02-24 Thread Alexandre DERUMIER
> Yes, that is what I thought about.

Another possibility is a race condition with the file rename.
If the file is renamed on node1 but not yet on node2, the qmp resume will fail with
"unable to find configuration file for VM xxx - no such machine"
(I don't know how the pve cluster filesystem works)

I have sent a patch to the mailing list to display the qm result error in the
migration task log.


> The only safe thing is to stop both sides?

Well, it's already safe, because the target process is in the paused state, and
the source process is paused at the end of the migration.

So if qm resume fails, I think the user simply needs to resume it manually.
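
If the rename race suspected above is real, one way to tolerate it would be to wait briefly for the config file to become visible on the target node before resuming. This is only a minimal Python sketch of that idea, not what the migration code does: `wait_for_config` is a hypothetical helper, and the example path in the usage note is an assumption.

```python
import os
import time


def wait_for_config(path, timeout=5.0, interval=0.1,
                    exists=os.path.exists, sleep=time.sleep,
                    clock=time.monotonic):
    """Poll until `path` exists; return True on success, False on timeout.

    `exists`, `sleep` and `clock` are injectable so the helper can be
    exercised without touching the real filesystem.
    """
    deadline = clock() + timeout
    while True:
        if exists(path):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would run something like `wait_for_config("/etc/pve/qemu-server/129.conf")` (path assumed for illustration) on the target node and only attempt the resume once it returns True.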



- Original message -

From: Dietmar Maurer diet...@proxmox.com
To: Alexandre DERUMIER aderum...@odiso.com
Cc: pve-devel@pve.proxmox.com
Sent: Sunday, 24 February 2013 08:43:18
Subject: RE: [pve-devel] successfull migration but failed resume

> Now, why the 'cont' fails, I really don't know; I can't reproduce it easily.
> What we need to verify is whether we can manually resume the target VM if the
> 'cont' fails.
> Maybe something bad happened during the migration, and the target VM is in a
> strange state so that qmp fails?

Yes, that is what I thought about. The only safe thing is to stop both sides? 


Re: [pve-devel] successfull migration but failed resume

2013-02-24 Thread Dietmar Maurer
> Another possibility is a race condition with the file rename.
> If the file is renamed on node1 but not yet on node2, the qmp resume will fail with
> "unable to find configuration file for VM xxx - no such machine"
> (I don't know how the pve cluster filesystem works)
>
> I have sent a patch to the mailing list to display the qm result error in the
> migration task log

thanks.

> > The only safe thing is to stop both sides?
>
> Well, it's already safe, because the target process is in the paused state, and
> the source process is paused at the end of the migration.
>
> So if qm resume fails, I think the user simply needs to resume it manually.

Ok


Re: [pve-devel] successfull migration but failed resume

2013-02-24 Thread Alexandre DERUMIER
> I've seen this sometimes. Is there any way to see how the output of the
> ssh command was?

Stefan, when you have this error, do you see a resume task in the pve-manager
task list?

If not, that means it fails before fork_worker, so it is not related to the
qmp cont command.

I have sent a patch to display errors in the migration task log if an error
occurs before fork_worker.


The qm resume code is:


qm resume
code => sub {
    my ($param) = @_;

    my $rpcenv = PVE::RPCEnvironment::get();

    my $authuser = $rpcenv->get_user();

    my $node = extract_param($param, 'node');

    my $vmid = extract_param($param, 'vmid');

    my $skiplock = extract_param($param, 'skiplock');
    raise_param_exc({ skiplock => "Only root may use this option." })
        if $skiplock && $authuser ne 'root@pam';

    die "VM $vmid not running\n" if !PVE::QemuServer::check_running($vmid);

    my $realcmd = sub {
        my $upid = shift;

        syslog('info', "resume VM $vmid: $upid\n");

        PVE::QemuServer::vm_resume($vmid, $skiplock);

        return;
    };

    return $rpcenv->fork_worker('qmresume', $vmid, $authuser, $realcmd);
}});




So it's possible that it is failing on
    die "VM $vmid not running\n" if !PVE::QemuServer::check_running($vmid);

because the config file is not yet available.

- Original message -

From: Stefan Priebe - Profihost AG s.pri...@profihost.ag
To: pve-devel@pve.proxmox.com
Sent: Friday, 22 February 2013 15:01:25
Subject: [pve-devel] successfull migration but failed resume

Hello, 

I've seen this sometimes. Is there any way to see how the output of the 
ssh command was? 

Feb 22 14:48:05 migration speed: 819.20 MB/s - downtime 49 ms 
Feb 22 14:48:05 migration status: completed 
Feb 22 14:48:06 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' 
root@10.255.0.20 qm resume 129 --skiplock' failed: exit code 2 
Feb 22 14:48:07 ERROR: migration finished with problems (duration 00:00:10) 

Greets, 
Stefan 
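
The symptom in the log above (migration reported as completed, then the remote resume command failing) can be picked out of a task log automatically. This is a minimal Python sketch under stated assumptions: `classify` and the regular expression are illustrative, not part of PVE, and the wrapped ERROR line from the mail has been joined back into a single line.

```python
import re

LOG = """\
Feb 22 14:48:05 migration speed: 819.20 MB/s - downtime 49 ms
Feb 22 14:48:05 migration status: completed
Feb 22 14:48:06 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' root@10.255.0.20 qm resume 129 --skiplock' failed: exit code 2
Feb 22 14:48:07 ERROR: migration finished with problems (duration 00:00:10)
"""

# Matches the failed remote "qm resume <vmid>" line and captures VM id and exit code.
RESUME_FAILED = re.compile(
    r"ERROR: command '.*\bqm resume (\d+)\b.*' failed: exit code (\d+)"
)


def classify(log_text):
    """Return VM id and exit code if resume failed after a completed migration."""
    completed = False
    for line in log_text.splitlines():
        if "migration status: completed" in line:
            completed = True
        match = RESUME_FAILED.search(line)
        if match and completed:
            return {"vmid": int(match.group(1)), "exit_code": int(match.group(2))}
    return None


print(classify(LOG))
```

A non-None result means the memory transfer itself succeeded and only the final resume step reported an error, which is the case discussed in this thread.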


Re: [pve-devel] successfull migration but failed resume

2013-02-24 Thread Stefan Priebe

Hi Alexandre,
On 24.02.2013 09:34, Alexandre DERUMIER wrote:

>> I've seen this sometimes. Is there any way to see how the output of the
>> ssh command was?
>
> Stefan, when you have this error, do you see a resume task in pve-manager
> task list ?


Yes, I have a resume task and this task shows status OK. But the migration
task says failed.


Stefan





Re: [pve-devel] successfull migration but failed resume

2013-02-24 Thread Stefan Priebe

Hi,

what is the problem / disadvantage of doing it this way:

1.) don't use -S, so the VM starts directly after being migrated (we
reduce downtime by maybe 1s for the ssh resume stuff)


2.) we move the config file at the beginning of the migration

3.) if the source host crashes during migration, the source kvm process is
dead anyway, so starting on the new target won't be a problem


4.) if the target host crashes while migrating, the source host will
detect this and abort the migration + move the config back.


Greets,
Stefan



Re: [pve-devel] successfull migration but failed resume

2013-02-24 Thread Dietmar Maurer
> 4.) if the target host crashes while migrating the source host will detect
> this and abort the migration + move the config back.

This is technically not possible - how do you detect that?


Re: [pve-devel] successfull migration but failed resume

2013-02-24 Thread Alexandre DERUMIER
The config file should always stay with the running kvm process.

Otherwise you'll lose graph stats, for example, during the migration. (Not
everybody has a 10Gb link, so migration can take time.)

And 4) if something bad happens and the config is not moved back, you will
have a phantom running kvm on the source.



Re: [pve-devel] successfull migration but failed resume

2013-02-24 Thread Alexandre DERUMIER
> Yes i have a resume task and this task show status OK. But the migration
> task says failed.
Damn, this is strange...
And what is the state of the target VM? Paused? Crashed?



Re: [pve-devel] successfull migration but failed resume

2013-02-24 Thread Stefan Priebe

On 24.02.2013 14:44, Dietmar Maurer wrote:

>> 4.) if the target host crashes while migrating the source host will detect
>> this and abort the migration + move the config back.
>
> This is technically not possible - how do you detect that?


Hmm, good question... it was just a spontaneous idea. I thought the source
host won't acknowledge the migration finish via qmp.


Stefan


Re: [pve-devel] successfull migration but failed resume

2013-02-24 Thread Stefan Priebe

Hi,

On 24.02.2013 14:44, Alexandre DERUMIER wrote:

> The config file should always be with the kvm running.
>
> Or you'll lost graphs stats for example during the migration. (not everybody
> have 10gb link, so migration can take time)
> and 4) if something bad happen and config is not moving back, you will have a
> phantom running kvm on source.


Yes, sure. Makes sense.

Stefan


Re: [pve-devel] successfull migration but failed resume

2013-02-24 Thread Stefan Priebe


On 24.02.2013 14:51, Alexandre DERUMIER wrote:

>> Yes i have a resume task and this task show status OK. But the migration
>> task says failed.
>
> Damn, this is strange...
> and how is the state of the target vm ? paused ? crashed ?


No idea, as Proxmox kills the target kvm process if the migration fails.
But not crashed, as I don't see a segfault. Most probably paused.


Stefan






Re: [pve-devel] successfull migration but failed resume

2013-02-24 Thread Alexandre DERUMIER
> No idea as proxmox kills the target kvm proces if the migration fails.
Not true for the last phase; the resume is done in phase3_cleanup.

I have done the test, using a die instead of the qmp cont command in the resume
task: the migration task finishes with an error,
but the target VM is paused. I just need to resume it.


> But not crashed i don't see a segfault. Most probably paused.

Maybe my patch will show more info if it happens again...
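
To illustrate why surfacing the remote command's error output helps here, the following minimal Python sketch runs a resume command and reports its stderr when it fails. It is not the patch mentioned above: `build_resume_cmd` and `run_and_report` are hypothetical helpers, and the command line is simply the one shown in Stefan's migration log.

```python
import subprocess


def build_resume_cmd(host, vmid):
    """Build the ssh command line as it appears in the migration log."""
    return ["/usr/bin/ssh", "-o", "BatchMode=yes", f"root@{host}",
            "qm", "resume", str(vmid), "--skiplock"]


def run_and_report(cmd, runner=subprocess.run):
    """Run cmd; return None on success, else a message including stderr.

    `runner` is injectable so the reporting logic can be tested without ssh.
    """
    proc = runner(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        return (f"command failed with exit code {proc.returncode}: "
                f"{proc.stderr.strip()}")
    return None
```

With the error text included, a failure such as a missing config file would be visible in the task log instead of only "exit code 2"; the target VM stays paused and can still be resumed manually.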



- Original message -

From: Stefan Priebe s.pri...@profihost.ag
To: Alexandre DERUMIER aderum...@odiso.com
Cc: pve-devel@pve.proxmox.com
Sent: Sunday, 24 February 2013 20:09:31
Subject: Re: [pve-devel] successfull migration but failed resume


On 24.02.2013 14:51, Alexandre DERUMIER wrote:
>> Yes, I have a resume task, and that task shows status OK. But the migration
>> task says failed.
> Damn, this is strange...
> And what is the state of the target VM? Paused? Crashed?

No idea, as Proxmox kills the target kvm process if the migration fails.
But it has not crashed; I don't see a segfault. Most probably paused.

Stefan 



 - Original message -

 From: Stefan Priebe s.pri...@profihost.ag
 To: Alexandre DERUMIER aderum...@odiso.com
 Cc: pve-devel@pve.proxmox.com
 Sent: Sunday, 24 February 2013 13:48:05
 Subject: Re: [pve-devel] successfull migration but failed resume

 Hi Alexandre,
 On 24.02.2013 09:34, Alexandre DERUMIER wrote:
 I've seen this sometimes. Is there any way to see what the output of the
 ssh command was?
 Stefan, when you have this error, do you see a resume task in the pve-manager
 task list?

 Yes, I have a resume task, and that task shows status OK. But the migration
 task says failed.

 Stefan
 
 


Re: [pve-devel] successfull migration but failed resume

2013-02-23 Thread Stefan Priebe - Profihost AG
But isn't it a simple rename right now? Under which circumstances can this fail?



Re: [pve-devel] successfull migration but failed resume

2013-02-23 Thread Alexandre DERUMIER
> But isn't it a simple rename right now? Under which circumstances can this
> fail?

Yes, it's just a rename; the chance of failure is very small.

A host crash between the end of the migration and the renaming of the file could
cause us problems, for example.
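
To make that window concrete, here is a minimal sketch in Python (not Proxmox code, which is Perl). The directory layout and the name `move_vm_config` are assumptions for illustration only. On a local POSIX filesystem a rename within one filesystem is atomic, so the step itself rarely fails; the exposure is a crash after the migration has completed but before this call runs.

```python
import os


def move_vm_config(root, vmid, source_node, target_node):
    """Move <vmid>.conf from the source node's directory to the target's.

    Hypothetical helper: the nodes/<node>/qemu-server layout under `root`
    is assumed for this sketch.
    """
    src = os.path.join(root, "nodes", source_node, "qemu-server", f"{vmid}.conf")
    dst_dir = os.path.join(root, "nodes", target_node, "qemu-server")
    os.makedirs(dst_dir, exist_ok=True)
    dst = os.path.join(dst_dir, f"{vmid}.conf")
    # Single rename: the config is listed under one node or the other.
    os.rename(src, dst)
    return dst
```

Everything that happens before the call to `os.rename` still sees the VM as belonging to the source node, which is why a crash just before it leaves an invisible process on the target.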





Re: [pve-devel] successfull migration but failed resume

2013-02-23 Thread Alexandre DERUMIER
> Maybe we can hack qemu and do the file rename from qemu, at the end of the
> migration?

In migration.c:

static void migrate_fd_completed(MigrationState *s)
{
    DPRINTF("setting completed state\n");
    if (migrate_fd_cleanup(s) < 0) {
        s->state = MIG_STATE_ERROR;
    } else {
        s->state = MIG_STATE_COMPLETED;

        /* move config file here */

        runstate_set(RUN_STATE_POSTMIGRATE);
    }
    notifier_list_notify(&migration_state_notifiers, s);
}




- Original message -

From: Alexandre DERUMIER aderum...@odiso.com
To: Michael Rasmussen m...@datanom.net
Cc: pve-devel@pve.proxmox.com
Sent: Saturday, 23 February 2013 12:44:50
Subject: Re: [pve-devel] successfull migration but failed resume

I think the main problem is that qemu does the switch itself
(when all memory is transferred, the source process is paused and the target
process continues).

But we move the config file after checking the migration status in a loop
with a qmp command, with a few ms of sleep between checks.
So it's possible that the migration is finished some milliseconds before we see
it and move the file.
Also, the qmp check command could fail.
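
The polling described above can be sketched as follows. This is an illustrative Python sketch, not the Proxmox implementation: `query_status` is a hypothetical callable standing in for the qmp status query, and the status strings and interval are assumptions. It shows why completion is only noticed up to one sleep interval late.

```python
import time


def wait_for_completion(query_status, interval=0.05, sleep=time.sleep, max_polls=1000):
    """Poll query_status() until it reports "completed".

    Returns the number of polls that were needed. Raises RuntimeError if
    the status is "failed" and TimeoutError after max_polls polls.
    """
    for polls in range(1, max_polls + 1):
        status = query_status()
        if status == "completed":
            return polls
        if status == "failed":
            raise RuntimeError("migration failed")
        # The migration may finish during this sleep; we only see it
        # on the next poll, so the config move is delayed by up to `interval`.
        sleep(interval)
    raise TimeoutError("migration did not complete")
```

If `query_status` itself raises, the exception propagates to the caller, which corresponds to the "check command could fail" case.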


Maybe we can hack qemu and do the file rename from qemu, at the end of the
migration?
Dietmar, what do you think about this?


- Original message -

From: Michael Rasmussen m...@datanom.net
To: pve-devel@pve.proxmox.com
Sent: Saturday, 23 February 2013 11:21:35
Subject: Re: [pve-devel] successfull migration but failed resume

On Sat, 23 Feb 2013 08:05:35 +0100 (CET) 
Alexandre DERUMIER aderum...@odiso.com wrote: 

 
 Ideas are welcome to improve this ;) 
 
 
Since it will always be a deal between two nodes, you could implement
something like the TCP 3-way handshake (SYN, SYN-ACK, ACK).

Node A sends a Migrate SYNchronize packet to Node B 

Node B receives A's SYN 

Node B sends a SYNchronize-ACKnowledgement 

Node A receives B's SYN-ACK 

Node A sends ACKnowledge 

Node B receives ACK. 

Migration is completed. 
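
The steps above can be sketched as a small state machine. This is a Python illustration of the message order only; the class and state names are made up for the example, and nothing here is sent over a network or tied to qemu or Proxmox.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    SYN_SENT = "syn-sent"
    SYN_RECEIVED = "syn-received"
    DONE = "done"


class Node:
    """One side of the hypothetical migration handshake."""

    def __init__(self, name):
        self.name = name
        self.state = State.IDLE

    def start(self):
        """Node A opens the handshake."""
        self.state = State.SYN_SENT
        return "SYN"

    def receive(self, message):
        """Handle one message; return the reply, or None if there is none."""
        if message == "SYN" and self.state is State.IDLE:
            self.state = State.SYN_RECEIVED
            return "SYN-ACK"
        if message == "SYN-ACK" and self.state is State.SYN_SENT:
            self.state = State.DONE
            return "ACK"
        if message == "ACK" and self.state is State.SYN_RECEIVED:
            self.state = State.DONE
            return None
        raise ValueError(f"{self.name}: unexpected {message!r} in state {self.state.name}")


def handshake(a, b):
    """Pass the three messages between a and b; True if both sides finished."""
    syn = a.start()
    syn_ack = b.receive(syn)
    ack = a.receive(syn_ack)
    b.receive(ack)
    return a.state is State.DONE and b.state is State.DONE
```

A message arriving in the wrong state raises `ValueError`, so neither side can reach `DONE` without having seen the other side's acknowledgement.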

-- 
Hilsen/Regards 
Michael Rasmussen 

Get my public GnuPG keys: 
michael at rasmussen dot cc 
http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0xD3C9A00E 
mir at datanom dot net 
http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0xE501F51C 
mir at miras dot org 
http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0xE3E80917 
-- 
She's so tough she won't take 'yes' for an answer. 



Re: [pve-devel] successfull migration but failed resume

2013-02-23 Thread Alexandre DERUMIER
> Sounds easy enough. And that solves the whole problem?

It solves 99.99% of it. With this, it's safe to launch the target VM without -S.



One last thing: just after the end of the migration, the source kvm process is
paused (no more access to the disk),
but we only stop it at the end of phase 3.

So if the source host crashes, it's not a problem.
If the Proxmox task crashes (after the migration and before the stop),
we can have a phantom kvm process on the source node, but it does nothing.
Note that it is already like this now.

- Original message -

From: Dietmar Maurer diet...@proxmox.com
To: Alexandre DERUMIER aderum...@odiso.com, Michael Rasmussen
m...@datanom.net
Cc: pve-devel@pve.proxmox.com
Sent: Saturday, 23 February 2013 14:55:17
Subject: RE: [pve-devel] successfull migration but failed resume

> Maybe we can hack qemu and do the file rename from qemu, at the end of
> the migration?
> Dietmar, what do you think about this?

Sounds easy enough. And that solves the whole problem?


Re: [pve-devel] successfull migration but failed resume

2013-02-23 Thread Dietmar Maurer
> I think this last point can be resolved by hacking qemu to kill itself after
> a timeout of X seconds once the migration is finished.
> I don't know if it's easy to implement.
>
> So if the migration task hangs between the file move and the sending of the
> qmp stop, we have a protection.
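
The quoted self-kill timeout could look roughly like this. It is a Python sketch of the idea only, not a qemu patch: `SelfKillTimer` is a made-up name, and the `on_timeout` callback stands in for whatever would terminate the source process.

```python
import threading


class SelfKillTimer:
    """Call on_timeout after `timeout` seconds unless cancel() runs first."""

    def __init__(self, timeout, on_timeout):
        self._timer = threading.Timer(timeout, on_timeout)
        # Daemon thread: a pending timer never keeps the process alive.
        self._timer.daemon = True

    def start(self):
        """Arm the timer, e.g. when the migration is reported as finished."""
        self._timer.start()

    def cancel(self):
        """Disarm the timer, e.g. when the normal stop arrives in time."""
        self._timer.cancel()
```

If the stop arrives in time, `cancel()` disarms the timer; if the migration task hangs, the callback still fires after the timeout.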

Why do we want to change anything (I guess I missed some mails)? 
If the 'cont' command fails, we should try to find out why?



Re: [pve-devel] successfull migration but failed resume

2013-02-23 Thread Alexandre DERUMIER
> Why do we want to change anything (I guess I missed some mails)?
> If the 'cont' command fails, we should try to find out why?

Yes, sure. It was just a proposal to improve things,

mainly for the case where the source host crashes at the end of the migration, or
qmp migrate-status hangs, ...
before the file is moved or the resume command is sent.
And also to shave some ms off the migration time.

But of course not for Proxmox 2.3 ;)



Now, why the 'cont' fails, I really don't know; I can't reproduce it easily.
What we need to verify is whether we can resume the target VM manually if the
'cont' fails.
Maybe something bad happened during the migration, and the target VM is in a
strange state and qmp fails?
 





Re: [pve-devel] successfull migration but failed resume

2013-02-23 Thread Dietmar Maurer
> Now, why the 'cont' fails, I really don't know; I can't reproduce it easily.
> What we need to verify is whether we can resume the target VM manually if the
> 'cont' fails.
> Maybe something bad happened during the migration, and the target VM is in a
> strange state and qmp fails?

Yes, that is what I thought about. The only safe thing is to stop both sides?



Re: [pve-devel] successfull migration but failed resume

2013-02-22 Thread Alexandre DERUMIER
> root@10.255.0.20 qm resume 129 --skiplock' failed: exit code 2
> Feb 22 14:48:07 ERROR: migration finished with problems (duration 00:00:10)

I have also seen this bug sometimes.

I don't know how to display the output, but the command sends a cont command to
the qmp socket of the migrated VM
to resume it. So maybe it fails to connect to the qmp socket.
(Maybe a retry can help?)
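
A minimal sketch of the retry idea, in Python rather than the Perl that Proxmox is written in. `connect_with_retry`, the attempt count and the delay are made-up names and values for illustration; the function only retries the connect to a UNIX socket and does not speak QMP.

```python
import socket
import time


def connect_with_retry(path, attempts=5, delay=0.2, sleep=time.sleep):
    """Connect to the UNIX socket at `path`, retrying on failure.

    Hypothetical helper. Returns the connected socket, or re-raises the
    last OSError once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except OSError as err:
            sock.close()
            last_error = err
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts - 1:
                sleep(delay)
    raise last_error
```

Whether a retry actually helps depends on why the resume fails, which this thread has not established.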

We start it with -S (to pause it) because we want to be sure to
resume it only after moving the config file.





Re: [pve-devel] successfull migration but failed resume

2013-02-22 Thread Stefan Priebe - Profihost AG
Dietmar, why do we pause?

Stefan



Re: [pve-devel] successfull migration but failed resume

2013-02-22 Thread Stefan Priebe - Profihost AG
Mhm, but in cases like mine we have no running VM on either side. So are you sure
that when migrating there could be a reason to have two VMs running?

Stefan

On 22.02.2013 at 18:51, Dietmar Maurer diet...@proxmox.com wrote:

>> Dietmar why do we pause?
>
> For safety reasons. We want to avoid the same VM running twice.


Re: [pve-devel] successfull migration but failed resume

2013-02-22 Thread Alexandre DERUMIER
> yes

Well, not really.

If the migration fails, the target VM process is always killed, so that's not a
problem.

The problem is when the target VM has migrated correctly but the VM
config file is still on the first node (the time window is very short, maybe
1s).
In this case you have a phantom kvm process on the target: the user doesn't see
the VM on the target node and can start the VM again on the first node, and boom.



It really was a problem last year. If I remember correctly, the VM config file was
moved at the beginning of the migration, and we killed the source VM when the
migration failed.
But killing the source VM didn't always work, so we had a phantom process on
the source node, and the user could start the VM again on the target node, and boom.


So this is why we start in paused mode. But the risk currently lies in the short
time window at the end of the migration, when we need to move the config file.

Ideas to improve this are welcome ;)





- Original message -

From: Dietmar Maurer diet...@proxmox.com
To: Stefan Priebe - Profihost AG s.pri...@profihost.ag
Cc: Alexandre DERUMIER aderum...@odiso.com, pve-devel@pve.proxmox.com
Sent: Friday, 22 February 2013 20:04:37
Subject: RE: [pve-devel] successfull migration but failed resume

> Mhm, but in cases like mine we have no running VM on either side. So are you
> sure that when migrating there could be a reason to have two VMs running?

yes