Just a wild guess... The 'one-294' argument for the poll script is taken from VM/DEPLOYMENT_ID. Maybe a bug caused the core to lose that string?
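To make that lookup concrete, here is a minimal Python sketch that pulls the string out of a VM's XML description, such as the text printed by onevm show -x. The function name deployment_id is hypothetical and not part of OpenNebula. The element name DEPLOYMENT_ID is the one used in this message; DEPLOY_ID is included only as an assumed alternative spelling, so check the tag against your own output.

```python
import xml.etree.ElementTree as ET

# Element names to try, in order. "DEPLOYMENT_ID" is the name used in this
# thread; "DEPLOY_ID" is an assumed alternative. Adjust the tuple to match
# what your own `onevm show -x` output actually contains.
CANDIDATE_TAGS = ("DEPLOYMENT_ID", "DEPLOY_ID")


def deployment_id(vm_xml):
    """Return the deployment ID found directly under the root element.

    Returns None when the element is missing or empty, which is the
    situation this message is guessing at.
    """
    root = ET.fromstring(vm_xml)
    for tag in CANDIDATE_TAGS:
        text = root.findtext(tag)
        if text and text.strip():
            return text.strip()
    return None


# Hypothetical, heavily trimmed sample; real output has many more elements.
SAMPLE = "<VM><ID>294</ID><DEPLOYMENT_ID>one-294</DEPLOYMENT_ID></VM>"
print(deployment_id(SAMPLE))
```

If the function returns None for the XML saved from onevm show -x, the string the poll script needs is missing on the core side.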
Can you please check that attribute in the onevm show -x output? If it
looks good, edit /var/tmp/one/vmm/kvm/poll and have it write its arguments
somewhere (a temporary file, for example), just to double check.

Cheers

--
Carlos Martín, MSc
Project Engineer
OpenNebula - The Open-source Solution for Data Center Virtualization
www.OpenNebula.org | [email protected] | @OpenNebula <http://twitter.com/opennebula>

On Thu, Apr 4, 2013 at 11:42 PM, Ruben S. Montero <[email protected]> wrote:

> I've been thinking about this and I can't see any point where this
> information is cached. The probe is executed and its output is sent right
> away to the core, which processes it. In fact you should see the same
> line "... STATE=a" in the logs.
>
> Cheers
>
> On Wed, Apr 3, 2013 at 4:18 PM, Duverne, Cyrille <[email protected]> wrote:
>
>> Ok ok, that's indeed fun:
>>
>> ruby -wd /var/tmp/one/vmm/kvm/poll one-294
>> STATE=a NETTX=19039830 USEDCPU=0.1 USEDMEMORY=1121828 NETRX=416126660
>>
>> It seems that the polling works correctly.
>> Is it possible that the state is still cached, or stored in the DB and
>> not updated, or something similar?
>>
>> Cheers
>> Cyrille
>>
>> At Wednesday, 03/04/2013 on 15:15 Ruben S. Montero wrote:
>>
>> Could you execute the vmm probe in the host
>>
>> /var/tmp/one/vmm/kvm/poll one-294
>>
>> and check for errors, or try to debug the script (maybe by running it
>> with ruby -wd)?
>>
>> Ruben
>>
>> On Wed, Apr 3, 2013 at 10:42 AM, Duverne, Cyrille <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> Indeed, the state is still "d", as you can see here:
>>>
>>> Wed Apr 3 10:34:13 2013 [VMM][I]: Monitoring VM 294.
>>> Wed Apr 3 10:34:13 2013 [VMM][D]: Message received: LOG I 294 ExitCode: 0
>>> Wed Apr 3 10:34:13 2013 [VMM][D]: Message received: POLL SUCCESS 294 STATE=d
>>>
>>> Any thoughts?
>>> To be thorough, I verified that all users etc. are still correct on
>>> all machines, that oneadmin is able to ssh directly, etc.
>>>
>>> Thanks in advance
>>> Cyrille
>>>
>>> At Tuesday, 02/04/2013 on 22:31 Ruben S. Montero wrote:
>>>
>>> So the VMs are now running, and correctly reported by libvirt, but
>>> OpenNebula does not move them from UNKNOWN to RUNNING? Are the messages
>>> still reporting STATE=d for these VMs in oned.log?
>>>
>>> Ruben
>>>
>>> On Tue, Apr 2, 2013 at 3:57 PM, Duverne, Cyrille <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> Anything new on this?
>>>>
>>>> Seems really weird to me...
>>>>
>>>> Thanks in advance
>>>> Cyrille
>>>>
>>>> At Friday, 29/03/2013 on 10:06 Duverne, Cyrille wrote:
>>>>
>>>> Hello Ruben!
>>>>
>>>> Thanks for this feedback.
>>>>
>>>> I tried to restart libvirt, which succeeded (WOW! :p)
>>>>
>>>> But the VMs are still stuck in UNKNOWN state.
>>>>
>>>> 'virsh list' correctly shows the domains, which are running:
>>>>
>>>> virsh list
>>>>  Id Name                 State
>>>> ----------------------------------
>>>>   1 one-294              running
>>>>   2 one-304              running
>>>>
>>>> Any other thoughts? I'm a bit confused by this behaviour and by the
>>>> workflow used to monitor the VMs. It could be useful to have a 'refresh
>>>> monitoring' button or similar in Sunstone to request fresh monitoring
>>>> information.
>>>>
>>>> Thanks in advance
>>>> Cyrille
>>>>
>>>> "Always do right. This will gratify some people and astonish the rest."
>>>> Mark Twain
>>>>
>>>> At Thursday, 28/03/2013 on 0:56 Ruben S. Montero wrote:
>>>>
>>>> Ok
>>>>
>>>> So this is strange...
>>>>
>>>> On one hand, you try to restart the VM and virsh says it is already
>>>> defined (vm.log: domain 'one-294' already exists). On the other hand,
>>>> when you monitor the VM, virsh list does not show it (oned.log: POLL
>>>> SUCCESS 294 STATE=d).
>>>>
>>>> Is the domain really defined at the host (virsh list)? Could this be a
>>>> libvirt issue? Any chance you could restart libvirt and try again?
>>>>
>>>> Cheers
>>>>
>>>> Ruben
>>>>
>>>> On Tue, Mar 26, 2013 at 10:37 PM, Duverne, Cyrille <[email protected]> wrote:
>>>>
>>>>> Hello Ruben,
>>>>>
>>>>> Indeed this happens for some of them, but some others are still in
>>>>> UNKNOWN state.
>>>>> Here is an extract of the VM log:
>>>>>
>>>>> "Thu Mar 21 11:55:56 2013 [LCM][I]: New VM state is SAVE_SUSPEND
>>>>> Thu Mar 21 11:57:49 2013 [VMM][I]: ExitCode: 0
>>>>> Thu Mar 21 11:57:49 2013 [VMM][I]: Successfully execute virtualization driver operation: save.
>>>>> Thu Mar 21 11:57:50 2013 [VMM][I]: ExitCode: 0
>>>>> Thu Mar 21 11:57:50 2013 [VMM][I]: Successfully execute network driver operation: clean.
>>>>> Thu Mar 21 11:57:50 2013 [DiM][I]: New VM state is SUSPENDED
>>>>> Tue Mar 26 17:27:48 2013 [DiM][I]: New VM state is ACTIVE.
>>>>> Tue Mar 26 17:27:48 2013 [LCM][I]: Restoring VM
>>>>> Tue Mar 26 17:27:48 2013 [LCM][I]: New state is BOOT_SUSPENDED
>>>>> Tue Mar 26 17:27:49 2013 [VMM][I]: ExitCode: 0
>>>>> Tue Mar 26 17:27:49 2013 [VMM][I]: Successfully execute network driver operation: pre.
>>>>> Tue Mar 26 17:28:37 2013 [VMM][I]: ExitCode: 0
>>>>> Tue Mar 26 17:28:37 2013 [VMM][I]: Successfully execute virtualization driver operation: restore.
>>>>> Tue Mar 26 17:28:37 2013 [VMM][I]: ExitCode: 0
>>>>> Tue Mar 26 17:28:37 2013 [VMM][I]: Successfully execute network driver operation: post.
>>>>> Tue Mar 26 17:28:38 2013 [LCM][I]: New VM state is RUNNING
>>>>> Tue Mar 26 17:28:38 2013 [VMM][I]: ExitCode: 0
>>>>> Tue Mar 26 17:28:39 2013 [VMM][I]: VM running but it was not found. Restart and delete actions available or try to recover it manually
>>>>> Tue Mar 26 17:28:39 2013 [LCM][I]: New VM state is UNKNOWN
>>>>> Tue Mar 26 17:36:48 2013 [LCM][I]: New VM state is BOOT_UNKNOWN
>>>>> Tue Mar 26 17:36:48 2013 [VMM][I]: Generating deployment file: /var/lib/one/294/deployment.1
>>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: ExitCode: 0
>>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: Successfully execute network driver operation: pre.
>>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy /var/lib/one/datastores/0/294/deployment.1 whitefall.local 294 whitefall.local
>>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: error: Failed to create domain from /var/lib/one/datastores/0/294/deployment.1
>>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: error: operation failed: domain 'one-294' already exists with uuid 326bc42b-1f8a-8984-e610-4c35f0bdd56f
>>>>> Tue Mar 26 17:36:52 2013 [VMM][E]: Could not create domain from /var/lib/one/datastores/0/294/deployment.1
>>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: ExitCode: 255
>>>>> Tue Mar 26 17:36:52 2013 [VMM][I]: Failed to execute virtualization driver operation: deploy.
>>>>> Tue Mar 26 17:36:52 2013 [VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one/datastores/0/294/deployment.1
>>>>> Tue Mar 26 17:36:52 2013 [LCM][I]: Fail to boot VM. New VM state is UNKNOWN
>>>>> Tue Mar 26 17:37:21 2013 [LCM][I]: New VM state is BOOT_UNKNOWN
>>>>> Tue Mar 26 17:37:21 2013 [VMM][I]: Generating deployment file: /var/lib/one/294/deployment.1
>>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: ExitCode: 0
>>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: Successfully execute network driver operation: pre.
>>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy /var/lib/one/datastores/0/294/deployment.1 whitefall.local 294 whitefall.local
>>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: error: Failed to create domain from /var/lib/one/datastores/0/294/deployment.1
>>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: error: operation failed: domain 'one-294' already exists with uuid 326bc42b-1f8a-8984-e610-4c35f0bdd56f
>>>>> Tue Mar 26 17:37:22 2013 [VMM][E]: Could not create domain from /var/lib/one/datastores/0/294/deployment.1
>>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: ExitCode: 255
>>>>> Tue Mar 26 17:37:22 2013 [VMM][I]: Failed to execute virtualization driver operation: deploy.
>>>>> Tue Mar 26 17:37:22 2013 [VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one/datastores/0/294/deployment.1
>>>>> Tue Mar 26 17:37:23 2013 [LCM][I]: Fail to boot VM. New VM state is UNKNOWN
>>>>> Tue Mar 26 17:38:39 2013 [VMM][I]: ExitCode: 0
>>>>> Tue Mar 26 17:38:41 2013 [VMM][I]: VM running but it was not found. Restart and delete actions available or try to recover it manually
>>>>> Tue Mar 26 17:48:45 2013 [VMM][I]: ExitCode: 0
>>>>> Tue Mar 26 17:48:45 2013 [VMM][I]: VM running but it was not found. Restart and delete actions available or try to recover it manually
>>>>> Tue Mar 26 17:58:45 2013 [VMM][I]: ExitCode: 0
>>>>> Tue Mar 26 17:58:45 2013 [VMM][I]: VM running but it was not found. Restart and delete actions available or try to recover it manually
>>>>> Tue Mar 26 18:08:45 2013 [VMM][I]: ExitCode: 0"
>>>>>
>>>>> The RESTART didn't do anything.
>>>>>
>>>>> Here is the oned.log extract for the same VM:
>>>>>
>>>>> "Tue Mar 26 22:18:45 2013 [VMM][I]: Monitoring VM 294.
>>>>> Tue Mar 26 22:18:45 2013 [VMM][D]: Message received: LOG I 294 ExitCode: 0
>>>>> Tue Mar 26 22:18:45 2013 [VMM][D]: Message received: POLL SUCCESS 294 STATE=d"
>>>>>
>>>>> The VMs that are in UNKNOWN state are located on 2 different hosts.
>>>>> All hosts are configured in the same way.
>>>>>
>>>>> Thanks in advance
>>>>> Cyrille
>>>>>
>>>>> At Tuesday, 26/03/2013 on 18:53 Ruben S. Montero wrote:
>>>>>
>>>>> They should appear after a while, when the VM is monitored... Look for
>>>>> "Monitoring VM..." messages in oned.log.
>>>>>
>>>>> Cheers
>>>>>
>>>>> Ruben
>>>>>
>>>>> On Tue, Mar 26, 2013 at 5:39 PM, Duverne, Cyrille <[email protected]> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I just finished rebooting our lab after an electrical shutdown;
>>>>>> everything went fine.
>>>>>>
>>>>>> But some of the VMs are stuck in UNKNOWN state after resuming them.
>>>>>> I tried to restart them, but they are actually running on the
>>>>>> hypervisors; it's just that Sunstone is displaying UNKNOWN.
>>>>>>
>>>>>> Any thoughts on how to solve this?
>>>>>>
>>>>>> Thanks in advance
>>>>>> Cyrille
>>>>>>
>>>>>> _______________________________________________
>>>>>> Users mailing list
>>>>>> [email protected]
>>>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>>>
>>>>> --
>>>>> Ruben S. Montero, PhD
>>>>> Project co-Lead and Chief Architect
>>>>> OpenNebula - The Open Source Solution for Data Center Virtualization
>>>>> www.OpenNebula.org | [email protected] | @OpenNebula
