Hi,

the shutdown issue might be related to the double Vnet IP allocation bug:
http://dev.opennebula.org/issues/1178

As you said, the VM doesn't reach the DONE state and therefore the IP doesn't get released. The user has probably assumed it is gone and requested the IP again.

Thanks for pointing to the retry argument for IM and VMM drivers! I'll give it a try.

Regards,

Danny

On 2012-03-28 12:44, Carlos Martín Sánchez wrote:
Hi Danny,


On Thu, Mar 22, 2012 at 6:48 PM, Danny Sternkopf <[email protected]> wrote:
1) onevm shutdown fails:
[...]
However, ONE already released the VM's IP and assigned it to another VM, which 
of course causes a clash. I wonder if this is intended to work like this? Obviously 
ONE knows that the VM is still running so it should keep the associated IP 
allocated.

The network leases and the disk images are released only once the VM reaches 
the DONE state. If the shutdown timed out and the VM returned to RUNNING, this 
should not happen. Are you sure the OpenNebula VM is in the RUNNING state? Or 
did I misunderstand you?
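
To make the expected behaviour concrete, here is a toy model of the rule described above. It is only a sketch: LeasePool, State and finish_shutdown are hypothetical names made up for illustration and are not OpenNebula's actual code.

```python
from enum import Enum


class State(Enum):
    RUNNING = "RUNNING"
    DONE = "DONE"


class LeasePool:
    """Toy address pool (hypothetical, not OpenNebula's real lease code)."""

    def __init__(self, addresses):
        self.free = list(addresses)
        self.used = {}  # vm_id -> ip

    def allocate(self, vm_id):
        ip = self.free.pop(0)
        self.used[vm_id] = ip
        return ip

    def release(self, vm_id):
        self.free.append(self.used.pop(vm_id))


def finish_shutdown(pool, vm_id, succeeded):
    """Return the VM's next state; release its lease only when it is DONE."""
    if succeeded:
        pool.release(vm_id)
        return State.DONE
    # Shutdown timed out: the VM goes back to RUNNING and keeps its IP.
    return State.RUNNING


pool = LeasePool(["10.0.0.1", "10.0.0.2"])
first_ip = pool.allocate(1)
state = finish_shutdown(pool, 1, succeeded=False)
second_ip = pool.allocate(2)  # must not reuse the address of VM 1
```

In this model a timed-out shutdown leaves VM 1 holding its address, so VM 2 gets a different one. A clash can only appear if the lease is released while the VM is not DONE.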

2) onevm delete fails:
It is similar to 1). virsh destroy gives an error (ExitCode: 42), but the 
transfer manager wipes the disks even though the VM is still running (though it 
might not be fully functional anymore). I also wonder if this makes any sense? 
In this case neither the user nor the administrator realizes that the VM is 
still running unless you check the physical host locally or you take a look at 
the VM's log file.

Yes, in this case OpenNebula assumes that the destroy action always succeeds. 
Unlike the graceful shutdown action, the VM is not monitored after the delete 
action.


As a workaround for these erratic virsh failures, you can set a retry count in 
the IM and VMM drivers in oned.conf, using the -r argument [1]:

IM_MAD = [
     name       = "im_kvm",
     executable = "one_im_ssh",
     arguments  = "-r 3 -t 15 kvm" ]

VM_MAD = [
     name       = "vmm_kvm",
     executable = "one_vmm_exec",
     arguments  = "-t 15 -r 3 kvm",
     default    = "vmm_exec/vmm_exec_kvm.conf",
     type       = "kvm" ]
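
The idea behind a retry setting can be sketched in a few lines of Python. This is only an illustration of retrying a command that fails intermittently, not the drivers' actual implementation. The helper name run_with_retries, the delay parameter and the way attempts are counted (one initial attempt plus up to `retries` repeats) are assumptions.

```python
import subprocess
import time


def run_with_retries(cmd, retries=3, delay=1.0):
    """Run cmd; on a non-zero exit code, retry up to `retries` more times.

    Returns (exit_code, attempts). Hypothetical helper for illustration only.
    """
    attempts = 0
    while True:
        attempts += 1
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0 or attempts > retries:
            return result.returncode, attempts
        # Give a transient problem a moment to clear before trying again.
        time.sleep(delay)
```

On a host this could be called as run_with_retries(["virsh", "destroy", "one-20740"]), using the domain name from the original report. A failure that only occurs at one particular moment would then be absorbed by a later attempt.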

Regards

[1] http://opennebula.org/documentation:documentation:devel-vmm
--
Carlos Martín, MSc
Project Engineer
OpenNebula - The Open-source Solution for Data Center Virtualization
www.OpenNebula.org | [email protected] | @OpenNebula



On Thu, Mar 22, 2012 at 6:48 PM, Danny Sternkopf <[email protected]> wrote:
Hi,

I encounter (very rarely, it seems) problems where VMs are not properly 
deleted or shut off by onevm commands. I use ONE 3.0, with hosts running 
Fedora 15, KVM and libvirt.

1) onevm shutdown fails:
I can see in the VM log file that the shutdown operation timed out and the VM 
is still running. Unfortunately I don't see the reason why 'virsh shutdown' 
failed. There is nothing in the system or libvirt logs. It looks to me as if 
virsh can't properly communicate with libvirtd. That is still harmless. 
However, ONE already released the VM's IP and assigned it to another VM, which 
of course causes a clash. I wonder if this is intended to work like this? Obviously 
ONE knows that the VM is still running so it should keep the associated IP 
allocated.

2) onevm delete fails:
It is similar to 1). virsh destroy gives an error (ExitCode: 42), but the 
transfer manager wipes the disks even though the VM is still running (though it 
might not be fully functional anymore). I also wonder if this makes any sense? 
In this case neither the user nor the administrator realizes that the VM is 
still running unless you check the physical host locally or you take a look at 
the VM's log file.

I was not able to find out why virsh failed and could not reproduce it. The 
hosts are healthy, but might have a strange problem at the very moment when 
users requested a VM shutdown or deletion.

For example, I could later manually run the same command ONE was executing, 
and it worked. (/var/tmp/one/vmm/kvm/cancel one-20740 n020504 20740 n020504)

Any hints about a possible libvirt issue?

Regards,

Danny
_______________________________________________
Users mailing list
[email protected]
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
