Re: [one-users] VM life cycle - error handling

2012-03-28 Thread Carlos Martín Sánchez
Hi Danny,


On Thu, Mar 22, 2012 at 6:48 PM, Danny Sternkopf <danny.sternk...@csc.fi> wrote:

 1) onevm shutdown fails:
 [...]

 However, ONE already released the VM's IP and assigned it to another VM, which
 of course causes a clash. I wonder if this is intended to work like this?
 Obviously ONE knows that the VM is still running, so it should keep the
 associated IP allocated.


The network leases and the disk images are released only once the VM reaches
the DONE state. If the shutdown timed out and the VM returned to RUNNING,
this should not happen. Are you sure the OpenNebula VM is in the RUNNING state,
or did I misunderstand you?


 2) onevm delete fails:
 It is similar to 1). virsh destroy gives an error (ExitCode: 42), but the
 transfer manager wipes the disks even though the VM is still running
 (although it might not be fully functional anymore). I also wonder if this
 makes any sense? In this case neither the user nor the administrator realizes
 that the VM is still running unless they check the physical host locally or
 take a look at the VM's log file.


Yes, in this case OpenNebula assumes that the destroy action always
succeeds. Unlike the graceful shutdown action, the VM is not monitored
after the delete action.
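
Since nothing monitors the VM after a delete, a manual check on the host is the only way to notice a domain that survived. A minimal Python sketch of such a check, assuming `virsh domstate` is available on the host and prints "running" or "shut off" for a known domain; the function names and the three-valued result are illustrative and not part of OpenNebula:

```python
import subprocess

def virsh_domstate(domain, uri="qemu:///system"):
    """Return (exit_code, stdout) of `virsh domstate <domain>`."""
    proc = subprocess.run(
        ["virsh", "--connect", uri, "domstate", domain],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True)
    return proc.returncode, proc.stdout

def domain_status(domain, query=virsh_domstate):
    """Return True if the domain looks alive, False if it is shut off,
    and None if virsh itself failed (unknown domain, or no connection
    to libvirtd), because that case proves nothing either way."""
    code, out = query(domain)
    if code != 0:
        return None
    return out.strip() != "shut off"
```

For example, `domain_status("one-20740")` run on the host after a delete should not return True. A None result deserves a second look, since a virsh that cannot reach libvirtd is exactly the failure described in this thread.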


As a workaround for these erratic virsh failures, you can set a retry count in
the IM and VMM drivers in oned.conf, using the -r option [1]:

IM_MAD = [
    name       = "im_kvm",
    executable = "one_im_ssh",
    arguments  = "-r 3 -t 15 kvm" ]

VM_MAD = [
    name       = "vmm_kvm",
    executable = "one_vmm_exec",
    arguments  = "-t 15 -r 3 kvm",
    default    = "vmm_exec/vmm_exec_kvm.conf",
    type       = "kvm" ]
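
To illustrate what a retry count buys you against a transient failure, here is a minimal Python sketch. It is not the driver code: the function name, the delay between attempts, and the reading of the count as "additional attempts after the first failure" are all assumptions made for the example.

```python
import subprocess
import time

def run_with_retries(cmd, retries=3, delay=1.0, run=subprocess.call):
    """Run cmd; on a non-zero exit code, try again up to `retries` more times.

    Returns (exit_code, attempts), where exit_code is that of the last run.
    """
    attempts = 0
    while True:
        attempts += 1
        code = run(cmd)
        # Stop on success, or once the retry budget is used up.
        if code == 0 or attempts > retries:
            return code, attempts
        time.sleep(delay)
```

With this, a call such as `run_with_retries(["virsh", "--connect", "qemu:///system", "destroy", "one-20740"])` would survive a couple of failed attempts before giving up, which matches the symptom reported here: the same command succeeds when run again later.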

Regards

[1] http://opennebula.org/documentation:documentation:devel-vmm
--
Carlos Martín, MSc
Project Engineer
OpenNebula - The Open-source Solution for Data Center Virtualization
www.OpenNebula.org | cmar...@opennebula.org | @OpenNebula <http://twitter.com/opennebula>



On Thu, Mar 22, 2012 at 6:48 PM, Danny Sternkopf <danny.sternk...@csc.fi> wrote:

 Hi,

 I occasionally (very rarely, it seems) encounter problems where VMs are not
 properly deleted or shut off by onevm commands. I use ONE 3.0, with hosts
 running Fedora 15, KVM and libvirt.

 1) onevm shutdown fails:
 I can see in the VM log file that the shutdown operation timed out and the
 VM is still running. Unfortunately I don't see the reason why 'virsh
 shutdown' failed. There is nothing in the system or libvirt logs. It looks
 to me as if virsh can't properly communicate with libvirtd. That is still
 harmless. However, ONE already released the VM's IP and assigned it to
 another VM, which of course causes a clash. I wonder if this is intended to
 work like this? Obviously ONE knows that the VM is still running, so it
 should keep the associated IP allocated.

 2) onevm delete fails:
 It is similar to 1). virsh destroy gives an error (ExitCode: 42), but the
 transfer manager wipes the disks even though the VM is still running
 (although it might not be fully functional anymore). I also wonder if this
 makes any sense? In this case neither the user nor the administrator realizes
 that the VM is still running unless they check the physical host locally or
 take a look at the VM's log file.

 I was not able to find out why virsh failed and could not reproduce it.
 The hosts are healthy, but might have had a strange problem at the very
 moment when users requested a VM shutdown or deletion.

 For example, I could later manually run the same command ONE had executed,
 and it worked. (/var/tmp/one/vmm/kvm/cancel one-20740 n020504 20740
 n020504)

 Any hints about a libvirt issue?

 Regards,

 Danny
 ___
 Users mailing list
 Users@lists.opennebula.org
 http://lists.opennebula.org/listinfo.cgi/users-opennebula.org



Re: [one-users] VM life cycle - error handling

2012-03-28 Thread Danny Sternkopf

Hi,

The shutdown issue might be related to the double VNet IP allocation bug:
http://dev.opennebula.org/issues/1178

As you said, the VM doesn't reach the DONE state and therefore the IP 
doesn't get released. The user has probably assumed it is gone and 
requested the IP again.


Thanks for pointing to the retry argument for IM and VMM drivers! I'll 
give it a try.


Regards,

Danny
