Hi Ross,

This issue may come from the sVirt model. Let's try disabling it for now.
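As a rough sketch of that change, and not something I have run against your hosts: the shell function below sets security_driver in a qemu.conf-style file and keeps a backup copy first. The function name and the .bak suffix are made up for illustration; only the /etc/libvirt/qemu.conf path and the security_driver="none" setting come from the documentation note in this mail.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): set security_driver in a
# libvirt qemu.conf-style file. Run as root for /etc/libvirt/qemu.conf, e.g.
#   set_security_driver /etc/libvirt/qemu.conf none
set_security_driver() {
    conf=$1
    value=$2

    # Keep a backup so the change is easy to revert.
    cp "$conf" "$conf.bak" || return 1

    if grep -q '^[[:space:]]*#\{0,1\}[[:space:]]*security_driver[[:space:]]*=' "$conf.bak"; then
        # An active or commented-out setting exists: rewrite it, reading
        # from the backup and writing the result over the original file.
        sed "s|^[[:space:]]*#\{0,1\}[[:space:]]*security_driver[[:space:]]*=.*|security_driver = \"$value\"|" \
            "$conf.bak" > "$conf"
    else
        # No such line yet: append one.
        printf 'security_driver = "%s"\n' "$value" >> "$conf"
    fi
}
```

libvirtd has to be restarted on the node to pick up the new setting. On Ubuntu 10.04 I would expect that to be `sudo /etc/init.d/libvirt-bin restart`, but please check the init script name on your install. To go back, copy qemu.conf.bak over qemu.conf and restart again.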
--
To disable sVirt, and revert to the basic level of AppArmor protection
(host protection only), the /etc/libvirt/qemu.conf file can be used to
change the setting to security_driver="none".
--

Regards,

-Tino

--
Constantino Vázquez Blanco | dsa-research.org/tinova
Virtualization Technology Engineer / Researcher
OpenNebula Toolkit | opennebula.org

On Tue, Jul 27, 2010 at 5:46 PM, Ross Nordeen <[email protected]> wrote:
> Here is the output from:
> ~$ sudo /etc/init.d/apparmor status
>
> libvirt-cd735fe4-b5d9-f550-7576-bbac95b44d86 (enforce)
> /usr/sbin/tcpdump (enforce)
> /usr/sbin/libvirtd (enforce)
> /usr/lib/libvirt/virt-aa-helper (enforce)
> /usr/lib/connman/scripts/dhclient-script (enforce)
> /usr/lib/NetworkManager/nm-dhcp-client.action (enforce)
> /sbin/dhclient3 (enforce)
>
> For one-35 (a VM that has been suspended and resumed):
> $ virsh --connect qemu:///system dominfo one-35
> Id: 2
> Name: one-35
> UUID: 3450f5d0-e0c7-a118-7259-0664c02df8fc
> OS Type: hvm
> State: paused
> CPU(s): 1
> CPU time: 1899.5s
> Max memory: 524288 kB
> Used memory: 524288 kB
> Autostart: disable
> Security model: apparmor
> Security DOI: 0
> error: internal error Failed to get security label
>
> Yes, for one of my running nodes I get:
> $ virsh --connect qemu:///system dominfo one-37
> Id: 5
> Name: one-37
> UUID: cd735fe4-b5d9-f550-7576-bbac95b44d86
> OS Type: hvm
> State: running
> CPU(s): 1
> CPU time: 0.7s
> Max memory: 524288 kB
> Used memory: 524288 kB
> Autostart: disable
> Security model: apparmor
> Security DOI: 0
> Security label: libvirt-cd735fe4-b5d9-f550-7576-bbac95b44d86 (enforcing)
>
> -Ross
>
> ----- Original Message -----
> From: "Tino Vazquez" <[email protected]>
> To: "Ross Nordeen" <[email protected]>
> Cc: "Jaime Melis" <[email protected]>, [email protected]
> Sent: Tuesday, July 27, 2010 8:49:20 AM GMT -07:00 US/Canada Mountain
> Subject: Re: [one-users] migration not working completely
>
> Dear Ross,
>
> This looks like an issue with libvirt.
> What happens if you manually issue an
>
> $ virsh --connect qemu:///system dominfo one-35
>
> in cn2?
>
> Regards,
>
> -Tino
>
> --
> Constantino Vázquez Blanco | dsa-research.org/tinova
> Virtualization Technology Engineer / Researcher
> OpenNebula Toolkit | opennebula.org
>
> On Tue, Jul 27, 2010 at 4:29 PM, Ross Nordeen <[email protected]> wrote:
>>
>> I added the lines to the end of the
>> /etc/apparmor.d/abstractions/libvirt-qemu file and now the migration and
>> suspension work! But now I get these errors in the oned.log file:
>> "internal error Failed to get security label"
>>
>> Tue Jul 27 08:17:01 2010 [DiM][D]: Suspending VM 35
>> Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachinePoolInfo method invoked
>> Tue Jul 27 08:17:01 2010 [ReM][D]: HostPoolInfo method invoked
>> Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachineInfo method invoked
>> Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachineInfo method invoked
>> Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachineInfo method invoked
>> Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachineInfo method invoked
>> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 Command
>> execution fail: 'touch /srv/cloud/one/var//35/images/checkpoint;virsh
>> --connect qemu:///system save one-35
>> /srv/cloud/one/var//35/images/checkpoint'
>>
>> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 STDERR follows.
>>
>> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 Warning:
>> Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>>
>> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 error: Failed
>> to save domain one-35 to /srv/cloud/one/var//35/images/checkpoint
>>
>> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 error:
>> operation failed: failed to create '/srv/cloud/one/var//35/images/checkpoint'
>>
>> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 ExitCode: 1
>>
>> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: SAVE FAILURE 35 -
>>
>> Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 Command
>> execution fail: virsh --connect qemu:///system dominfo one-35
>>
>> Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 STDERR follows.
>>
>> Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 Warning:
>> Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>>
>> Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 error:
>> internal error Failed to get security label
>>
>> Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 ExitCode: 1
>>
>> Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: POLL FAILURE 35 -
>>
>> Tue Jul 27 08:17:04 2010 [VMM][I]: Monitoring VM 35.
>> Tue Jul 27 08:17:04 2010 [VMM][I]: Monitoring VM 36.
>> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: POLL SUCCESS 36
>> STATE=a USEDMEMORY=524288
>>
>> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 Command
>> execution fail: virsh --connect qemu:///system dominfo one-35
>>
>> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 STDERR follows.
>>
>> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 Warning:
>> Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>>
>> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 error:
>> internal error Failed to get security label
>>
>> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 ExitCode: 1
>>
>> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: POLL FAILURE 35 -
>>
>> --
>> Ross Nordeen
>> Computer Networking And Systems Administration
>> Michigan Technological University
>> http://www.linkedin.com/in/rjnordee
>>
>> ----- Original Message -----
>> From: "Ross Nordeen" <[email protected]>
>> To: "Jaime Melis" <[email protected]>
>> Cc: [email protected]
>> Sent: Monday, July 26, 2010 9:54:59 AM GMT -07:00 US/Canada Mountain
>> Subject: Re: [one-users] migration not working completely
>>
>> Tino,
>>
>> I am using Ubuntu 10.04.
>>
>> Jaime,
>>
>> I will try that and let you know if it worked as soon as we can get our air
>> conditioner fixed here.
>>
>> --
>> Ross Nordeen
>> Computer Networking And Systems Administration
>> Michigan Technological University
>> http://www.linkedin.com/in/rjnordee
>>
>> ----- Original Message -----
>> From: "Jaime Melis" <[email protected]>
>> To: "Tino Vazquez" <[email protected]>
>> Cc: "Ross Nordeen" <[email protected]>, [email protected]
>> Sent: Monday, July 26, 2010 9:45:15 AM GMT -07:00 US/Canada Mountain
>> Subject: Re: [one-users] migration not working completly
>>
>> Hi Ross,
>>
>> Actually, in my experience disabling apparmor won't work either. You will
>> have to modify one of its configuration files in order to make it work.
>>
>> Add this to the end of /etc/apparmor.d/abstractions/libvirt-qemu:
>> -------8<--------
>> /srv/cloud/one/var/** rw,
>> ------->8--------
>> (If you have a different VMDIR, change the above line accordingly.)
>> Then restart the apparmor service.
>>
>> Regards,
>> Jaime
>>
>> On Mon, Jul 26, 2010 at 5:30 PM, Tino Vazquez < [email protected] > wrote:
>>
>> Hi Ross,
>>
>> Are you using Ubuntu per chance?
>> It may be an issue with the apparmor
>> service; try disabling it to see if that is the one to blame. In case
>> it is, we can provide rules to disable this apparmor behavior.
>>
>> Regards,
>>
>> -Tino
>>
>> --
>> Constantino Vázquez Blanco | dsa-research.org/tinova
>> Virtualization Technology Engineer / Researcher
>> OpenNebula Toolkit | opennebula.org
>>
>> On Mon, Jul 26, 2010 at 5:13 PM, Ross Nordeen < [email protected] > wrote:
>>> Tino,
>>>
>>> I figured out my live migrate problem, which turned out to be a bad default
>>> gw. As for the migration and checkpointing, though, I have the
>>> /srv/cloud/one directory shared out to all nodes via NFS, with full
>>> permissions for oneadmin... I think it is /srv/cloud/one/var/18. I will
>>> check the VM_DIR variable in the oned.conf file though and see if it is
>>> right. Still, if everything else is working, it seems like the VM_DIR is
>>> exported correctly and functioning for the running VMs.
>>>
>>> -Ross
>>>
>>> ----- Original Message -----
>>> From: "Tino Vazquez" < [email protected] >
>>> To: "Ross Nordeen" < [email protected] >
>>> Cc: [email protected]
>>> Sent: Monday, July 26, 2010 8:41:37 AM GMT -07:00 US/Canada Mountain
>>> Subject: Re: [one-users] migration not working completly
>>>
>>> Hi Ross,
>>>
>>> There seem to be two issues here:
>>>
>>> 1) Live migration from cn2 to cn1 fails --> could it be that the
>>> oneadmin user cannot passwordlessly ssh from cn2 to cn1, but it can
>>> from cn1 to cn2?
>>>
>>> 2) The save problem seems to come from being unable to write the
>>> checkpoint file. This may be because the /srv/cloud/one
>>> directory doesn't exist on the remote nodes, in which case you will
>>> need to use the VM_DIR variable in the oned.conf file.
>>>
>>> Hope it helps,
>>>
>>> -Tino
>>>
>>> --
>>> Constantino Vázquez Blanco | dsa-research.org/tinova
>>> Virtualization Technology Engineer / Researcher
>>> OpenNebula Toolkit | opennebula.org
>>>
>>> On Thu, Jul 22, 2010 at 11:39 PM, Ross Nordeen < [email protected] > wrote:
>>>> I have OpenNebula deployed with one head node and 2 compute nodes. I have
>>>> no problems live migrating from cn1 to cn2, but I get failures live/cold
>>>> migrating from cn2 to cn1. Is there any reason why a) I cannot save the
>>>> state of any of my machines and b) live migration works one way but not
>>>> the other? Thanks
>>>>
>>>> -Ross
>>>>
>>>> Here is my vm.log file after a live migration, a migration, and then a suspend:
>>>>
>>>> Thu Jul 22 11:40:22 2010 [LCM][I]: New VM state is MIGRATE
>>>> Thu Jul 22 11:40:22 2010 [VMM][I]: Command execution fail: virsh --connect
>>>> qemu:///system migrate --live one-18 qemu+ssh://cn1/session
>>>> Thu Jul 22 11:40:22 2010 [VMM][I]: STDERR follows.
>>>> Thu Jul 22 11:40:22 2010 [VMM][I]: Warning: Permanently added
>>>> 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>>>> Thu Jul 22 11:40:22 2010 [VMM][I]: error: cannot recv data: Connection
>>>> reset by peer
>>>> Thu Jul 22 11:40:22 2010 [VMM][I]: ExitCode: 1
>>>> Thu Jul 22 11:40:22 2010 [VMM][E]: Error live-migrating VM, -
>>>> Thu Jul 22 11:40:23 2010 [LCM][I]: Fail to life migrate VM. Assuming that
>>>> the VM is still RUNNING (will poll VM).
>>>> Thu Jul 22 11:40:23 2010 [VMM][D]: Monitor Information:
>>>> .
>>>> .
>>>> .
>>>> .
>>>> .
>>>> Thu Jul 22 15:09:04 2010 [LCM][I]: New VM state is MIGRATE
>>>> Thu Jul 22 15:09:04 2010 [VMM][I]: Command execution fail: virsh --connect
>>>> qemu:///system migrate --live one-18 qemu+ssh://cn1/session
>>>> Thu Jul 22 15:09:04 2010 [VMM][I]: STDERR follows.
>>>> Thu Jul 22 15:09:04 2010 [VMM][I]: Warning: Permanently added
>>>> 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>>>> Thu Jul 22 15:09:04 2010 [VMM][I]: error: cannot recv data: Connection
>>>> reset by peer
>>>> Thu Jul 22 15:09:04 2010 [VMM][I]: ExitCode: 1
>>>> Thu Jul 22 15:09:04 2010 [VMM][E]: Error live-migrating VM, -
>>>> Thu Jul 22 15:09:05 2010 [LCM][I]: Fail to life migrate VM. Assuming that
>>>> the VM is still RUNNING (will poll VM).
>>>> Thu Jul 22 15:09:05 2010 [VMM][D]: Monitor Information:
>>>> .
>>>> .
>>>> .
>>>> .
>>>> .
>>>> Thu Jul 22 15:11:25 2010 [LCM][I]: New VM state is SAVE_MIGRATE
>>>> Thu Jul 22 15:11:25 2010 [VMM][I]: Command execution fail: 'touch
>>>> /srv/cloud/one/var//18/images/checkpoint;virsh --connect qemu:///system
>>>> save one-18 /srv/cloud/one/var//18/images/checkpoint'
>>>> Thu Jul 22 15:11:25 2010 [VMM][I]: STDERR follows.
>>>> Thu Jul 22 15:11:25 2010 [VMM][I]: Warning: Permanently added
>>>> 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>>>> Thu Jul 22 15:11:25 2010 [VMM][I]: error: Failed to save domain one-18 to
>>>> /srv/cloud/one/var//18/images/checkpoint
>>>> Thu Jul 22 15:11:25 2010 [VMM][I]: error: operation failed: failed to
>>>> create '/srv/cloud/one/var//18/images/checkpoint'
>>>> Thu Jul 22 15:11:25 2010 [VMM][I]: ExitCode: 1
>>>> Thu Jul 22 15:11:25 2010 [VMM][E]: Error saving VM state, -
>>>> Thu Jul 22 15:11:25 2010 [LCM][I]: Fail to save VM state while migrating.
>>>> Assuming that the VM is still RUNNING (will poll VM).
>>>> Thu Jul 22 15:11:26 2010 [VMM][I]: VM running but new state from monitor
>>>> is PAUSED.
>>>> Thu Jul 22 15:11:26 2010 [LCM][I]: VM is suspended.
>>>> Thu Jul 22 15:11:26 2010 [DiM][I]: New VM state is SUSPENDED
>>>> Thu Jul 22 15:13:20 2010 [DiM][I]: New VM state is ACTIVE.
>>>> Thu Jul 22 15:13:20 2010 [LCM][I]: Restoring VM
>>>> Thu Jul 22 15:13:20 2010 [LCM][I]: New state is BOOT
>>>> Thu Jul 22 15:13:21 2010 [VMM][I]: Command execution fail: virsh --connect
>>>> qemu:///system restore /srv/cloud/one/var//18/images/checkpoint
>>>> Thu Jul 22 15:13:21 2010 [VMM][I]: STDERR follows.
>>>> Thu Jul 22 15:13:21 2010 [VMM][I]: Warning: Permanently added
>>>> 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>>>> Thu Jul 22 15:13:21 2010 [VMM][I]: error: Failed to restore domain from
>>>> /srv/cloud/one/var//18/images/checkpoint
>>>> Thu Jul 22 15:13:21 2010 [VMM][I]: error: operation failed: cannot read
>>>> domain image
>>>> Thu Jul 22 15:13:21 2010 [VMM][I]: ExitCode: 1
>>>> Thu Jul 22 15:13:21 2010 [VMM][E]: Error restoring VM, -
>>>> Thu Jul 22 15:13:21 2010 [DiM][I]: New VM state is FAILED
>>>> Thu Jul 22 15:13:21 2010 [TM][W]: Ignored: LOG - 18 tm_delete.sh: Deleting
>>>> /srv/cloud/one/var//18/images
>>>>
>>>> Thu Jul 22 15:13:21 2010 [TM][W]: Ignored: LOG - 18 tm_delete.sh: Executed
>>>> "rm -rf /srv/cloud/one/var//18/images".
>>>>
>>>> Thu Jul 22 15:13:21 2010 [TM][W]: Ignored: TRANSFER SUCCESS 18 -
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> [email protected]
>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
