Re: [ovirt-users] Cannot run specific VM in one node
----- Original Message -----
> From: "Diego Remolina"
> To: "Omer Frenkel"
> Cc: Users@ovirt.org
> Sent: Tuesday, July 21, 2015 12:48:31 AM
> Subject: Re: [ovirt-users] Cannot run specific VM in one node
>
> Well... never mind. After leaving the VM running on the other host
> for a few days, today I shut it down, then attempted to start it
> back on ysmha02, and it booted up just fine. I was going to collect more
> logs, but it seems the issue cleared itself, so I guess there is no point
> in looking at this issue anymore.
>
> Diego

Sorry for the slow response, I was out for a couple of days.
I'm happy it is fixed for you. I did look at the logs anyway, and I can see the following:

First, the VM was migrating for a long time, and during this time it jumped between the not-responding, paused and migrating-from statuses.

After 7 minutes, there was an attempt to cancel the migration, which failed with "Timed out during operation: cannot acquire state change lock, code = -32603".

Then there were multiple attempts to stop the VM, which also failed (in the vdsm log I can see one of them: "libvirtError: Failed to terminate process 12479 with SIGTERM: Device or resource busy").

There was also an attempt to resume the VM (when it had been paused for some time), which also failed with "Timed out during operation: cannot acquire state change lock, code = -32603".

Then vdsm restarted and the VM was finally down, but somehow libvirt still thought the VM was running on this host, because on start we get:

libvirtError: Requested operation is not valid: domain 'ysmad02' is already active

I assume that restarting vdsm and libvirt cleared this funky state, so now it works.

I wonder about the error that started all this ("cannot acquire state change lock"). I'm not sure; maybe it is related to a storage issue? I cannot see it in the vdsm log because the log starts after the issue.
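The mismatch described above (vdsm reporting the VM as down while libvirt still treats the domain as active) can be checked by asking libvirt directly on the host. The following is a minimal sketch, not an oVirt tool: the `domain_state` helper is hypothetical, it assumes the usual `virsh list --all` column layout (Id, Name, State), and it only uses read-only `virsh -r` access. For a single domain, `virsh -r domstate <domain>` reports the same information.

```shell
#!/bin/sh
# Hypothetical helper: read `virsh list --all` output on stdin and print the
# State column for the domain named in $1 (prints nothing if it is absent).
domain_state() {
    # Blank out the Id and Name fields, then strip the leading separators,
    # so multi-word states such as "shut off" are printed whole.
    awk -v name="$1" '$2 == name { $1 = ""; $2 = ""; sub(/^ +/, ""); print }'
}

# Read-only look at libvirt's own view of the VM on this host.
# Guarded so the script does nothing on machines without virsh.
if command -v virsh >/dev/null 2>&1; then
    virsh -r list --all | domain_state ysmad02
fi
```

The `--all` flag matters here: by default `virsh list` shows only active domains, while `--all` also lists inactive ones, so the helper can report either state.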
Re: [ovirt-users] Cannot run specific VM in one node
Well... never mind. After leaving the VM running on the other host for a few days, today I shut it down, then attempted to start it back on ysmha02, and it booted up just fine. I was going to collect more logs, but it seems the issue cleared itself, so I guess there is no point in looking at this issue anymore.

Diego

On Mon, Jul 20, 2015 at 7:58 AM, Diego Remolina wrote:
> Omer et al.,
>
> I have uploaded the logs to:
>
> https://www.dropbox.com/s/yziky6f9nk3e8aw/engine.log.xz?dl=0
> https://www.dropbox.com/s/qsweiizwxk37qzg/vdsm.log.4.xz?dl=0
>
> Do you have any recommendations for me, or do you need me to provide more info?
>
> I will be able to re-run and experiment with this in the evening, so I
> can collect specific logs with times, etc. if you have something in
> particular you want me to try.
>
> Diego

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Cannot run specific VM in one node
Omer et al.,

I have uploaded the logs to:

https://www.dropbox.com/s/yziky6f9nk3e8aw/engine.log.xz?dl=0
https://www.dropbox.com/s/qsweiizwxk37qzg/vdsm.log.4.xz?dl=0

Do you have any recommendations for me, or do you need me to provide more info?

I will be able to re-run and experiment with this in the evening, so I can collect specific logs with times, etc. if you have something in particular you want me to try.

Diego
Re: [ovirt-users] Cannot run specific VM in one node
Hi Joop,

There is currently no split-brain in my gluster file systems. The virtualization setup is a two-node hypervisor (ysmha01 and ysmha02), but I have a 3-node gluster where one node has no bricks (10.0.1.6 -> ysmha01, 10.0.1.7 -> ysmha02, and 10.0.1.5 with no bricks); that third node helps define quorum. See below:

[root@ysmha01 ~]# gluster volume status engine
Status of volume: engine
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.0.1.6:/bricks/she/brick                49152   Y       4620
NFS Server on localhost                         2049    Y       4637
Self-heal Daemon on localhost                   N/A     Y       4648
NFS Server on 10.0.1.5                          N/A     N       N/A
Self-heal Daemon on 10.0.1.5                    N/A     Y       14563

Task Status of Volume engine
------------------------------------------------------------------------------
There are no active volume tasks

[root@ysmha01 ~]# gluster volume heal engine info split-brain
Gathering list of split brain entries on volume engine has been successful

Brick 10.0.1.7:/bricks/she/brick
Number of entries: 0

Brick 10.0.1.6:/bricks/she/brick
Number of entries: 0

[root@ysmha01 ~]# gluster volume heal vmstorage info split-brain
Gathering list of split brain entries on volume vmstorage has been successful

Brick 10.0.1.7:/bricks/vmstorage/brick
Number of entries: 0

Brick 10.0.1.6:/bricks/vmstorage/brick
Number of entries: 0

[root@ysmha01 ~]# gluster volume heal export info split-brain
Gathering list of split brain entries on volume export has been successful

Brick 10.0.1.7:/bricks/hdds/brick
Number of entries: 0

Brick 10.0.1.6:/bricks/hdds/brick
Number of entries: 0

Diego

On Thu, Jul 16, 2015 at 8:25 AM, Joop wrote:
> I might have missed it, but did you check for a split-brain situation,
> since you're using a 2-node gluster?
>
> Regards,
>
> Joop
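The split-brain checks in Diego's reply run `gluster volume heal <volume> info split-brain` once per volume and read the "Number of entries:" lines by eye. A minimal sketch of scripting that check, assuming the output format shown in that reply (one "Number of entries: N" line per brick); the `split_brain_total` and `check_volumes` names are hypothetical, and `engine`, `vmstorage` and `export` are simply the three volume names from this setup:

```shell
#!/bin/sh
# Hypothetical helper: read `gluster volume heal <vol> info split-brain`
# output on stdin and print the sum of all "Number of entries:" counts.
split_brain_total() {
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Hypothetical wrapper: print one summary line per volume given as arguments.
check_volumes() {
    for vol in "$@"; do
        n=$(gluster volume heal "$vol" info split-brain | split_brain_total)
        echo "$vol: $n split-brain entries"
    done
}

# Only runs on a host that actually has the gluster CLI installed.
if command -v gluster >/dev/null 2>&1; then
    check_volumes engine vmstorage export
fi
```

A non-zero total on any volume would indicate the kind of split-brain Joop asked about. A total of 0 on every volume matches what Diego reported.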
Re: [ovirt-users] Cannot run specific VM in one node
On 16-7-2015 14:20, Diego Remolina wrote:
> I have two virtualization/storage servers, ysmha01 and ysmha02, running
> oVirt hosted engine on top of glusterfs storage. I have two Windows
> Server VMs called ysmad01 and ysmad02. The current problem is that
> ysmad02 will *not* start on ysmha02 any more.
>
I might have missed it, but did you check for a split-brain situation, since you're using a 2-node gluster?

Regards,

Joop
Re: [ovirt-users] Cannot run specific VM in one node
These are the links to the files; if there is a better/preferred way to post them, let me know:

https://www.dropbox.com/s/yziky6f9nk3e8aw/engine.log.xz?dl=0
https://www.dropbox.com/s/qsweiizwxk37qzg/vdsm.log.4.xz?dl=0

A bit more explanation of the infrastructure:

I have two virtualization/storage servers, ysmha01 and ysmha02, running oVirt hosted engine on top of glusterfs storage. I have two Windows Server VMs called ysmad01 and ysmad02. The current problem is that ysmad02 will *not* start on ysmha02 any more.

Timeline

My problems started at around 8:30PM 7/15/2015 when migrating everything to ysmha01 after having patched and rebooted the server.

I got things back up at around 10:30PM after rebooting servers, etc. The hosted engine was running on ysmha02. I got ysmad01 running on ysmha01, but ysmad02 just would not start at all on ysmha02. I did a Run Once and set ysmad02 to start on ysmha01, and that works.

When I attempt to start or migrate ysmad02 on ysmha02 and run virsh -r list on ysmha02, I just see the state as "Shut off", and the VM does not run on that hypervisor.

Diego

On Thu, Jul 16, 2015 at 3:01 AM, Omer Frenkel wrote:
> Please attach engine + vdsm logs for the time of the failure.
Re: [ovirt-users] Cannot run specific VM in one node
----- Original Message -----
> From: "Diego Remolina"
> To: Users@ovirt.org
> Sent: Thursday, July 16, 2015 7:45:43 AM
> Subject: [ovirt-users] Cannot run specific VM in one node
>
> How can I clear this problematic state for node 2?

Please attach engine + vdsm logs for the time of the failure.
[ovirt-users] Cannot run specific VM in one node
Hi,

I was wondering if I can get some help with this particular situation. I have two oVirt cluster nodes. I had a VM running on node2 and tried to move it to node1. The move failed, and the machine was created and paused on both nodes. I tried stopping the migration, shutting down the machine, etc., but none of that worked.

So I decided to simply look for the process number of that VM and killed it. After that, I was not able to get the VM to run on any of the nodes, so I rebooted them both.

At this point, the VM will *not* start on node2 at all. When I try to start it, it just sits there, and if I run:

virsh -r list

from the command line, the output says the VM state is "shut off".

I am able to use Run Once to fire up the VM on node1, but I cannot migrate it to node2.

How can I clear this problematic state for node2?

Thanks,

Diego