Re: [ovirt-users] Cannot run specific VM in one node

2015-07-21 Thread Omer Frenkel


----- Original Message -----
> From: "Diego Remolina" 
> To: "Omer Frenkel" 
> Cc: Users@ovirt.org
> Sent: Tuesday, July 21, 2015 12:48:31 AM
> Subject: Re: [ovirt-users] Cannot run specific VM in one node
> 
> Well... never mind. After leaving the VM running on the other host
> for a few days, I shut it down today, then attempted to start it
> back on ysmha02, and it booted up just fine. I was going to collect more
> logs, but it seems the issue cleared itself, so I guess there is no point in
> looking at it any further.
> 
> Diego

Sorry for the slow response, I was out for a couple of days.

I'm happy it is fixed for you. I did look at the logs anyway, and I can see that:
First, the VM was migrating for a long time, and during this time it jumped
between the not-responding/paused/migrating-from statuses.
After 7 minutes, there was an attempt to cancel the migration, which failed with
"Timed out during operation: cannot acquire state change lock, code = -32603".
Then there were multiple attempts to stop the VM, which also failed. In the vdsm
log I can see that one of them is:
"libvirtError: Failed to terminate process 12479 with SIGTERM: Device or
resource busy"
An attempt to resume the VM (after it had been paused for some time) also failed
with
"Timed out during operation: cannot acquire state change lock, code = -32603".

Then vdsm restarted and the VM was finally down, but somehow libvirt still
thought the VM was running on this host, because on start we get:
libvirtError: Requested operation is not valid: domain 'ysmad02' is already
active

I assume that restarting vdsm and libvirt cleared this funky state, so now it works.
I wonder about the error that started all this ("cannot acquire state change
lock"). I am not sure whether it is related to a storage issue.
I cannot see this in the vdsm log because the log starts after the issue.
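
For anyone who wants to look for the same messages in their own logs, here is a minimal sketch (not something that was run in this thread). It counts lines that contain the three libvirt errors quoted above. The labels in SIGNATURES and the function name are made up for illustration, and it assumes each error sits on a single log line.

```python
import re
from collections import Counter

# Error texts quoted in the message above; the dictionary keys are
# hypothetical labels chosen only for this sketch.
SIGNATURES = {
    "state_change_lock": re.compile(r"cannot acquire state change lock"),
    "terminate_failed": re.compile(r"Failed to terminate process \d+ with SIGTERM"),
    "already_active": re.compile(r"domain '[^']+' is already active"),
}

def count_signatures(lines):
    """Count how many lines match each known error signature."""
    counts = Counter()
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
    return counts

# Sample lines built from the errors quoted in this thread (unwrapped).
sample = [
    "Timed out during operation: cannot acquire state change lock, code = -32603",
    "libvirtError: Failed to terminate process 12479 with SIGTERM: Device or resource busy",
    "Timed out during operation: cannot acquire state change lock, code = -32603",
    "libvirtError: Requested operation is not valid: domain 'ysmad02' is already active",
]
counts = count_signatures(sample)
```

To scan a real log, pass an open file object for the decompressed vdsm log instead of the sample list.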



Re: [ovirt-users] Cannot run specific VM in one node

2015-07-20 Thread Diego Remolina
Well... never mind. After leaving the VM running on the other host
for a few days, I shut it down today, then attempted to start it
back on ysmha02, and it booted up just fine. I was going to collect more
logs, but it seems the issue cleared itself, so I guess there is no point in
looking at it any further.

Diego

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Cannot run specific VM in one node

2015-07-20 Thread Diego Remolina
Omer et al.,

I uploaded the logs to:

https://www.dropbox.com/s/yziky6f9nk3e8aw/engine.log.xz?dl=0
https://www.dropbox.com/s/qsweiizwxk37qzg/vdsm.log.4.xz?dl=0

Do you have any recommendations for me or need me to provide more info?

I will be able to re-run and experiment with this in the evening, so I
can collect specific logs with times, etc if you have something in
particular you want me to try.

Diego



Re: [ovirt-users] Cannot run specific VM in one node

2015-07-16 Thread Diego Remolina
Hi Joop,

There is currently no split brain in my gluster file systems. The
virtualization setup is a two node hypervisor (ysmha01 and ysmha02),
but I have a 3 node gluster where one node has no bricks
(10.0.1.6->ysmha01, 10.0.1.7->ysmha02 and 10.0.1.5 no bricks), but
helps define quorum, see below:

[root@ysmha01 ~]# gluster volume status engine
Status of volume: engine
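Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.0.1.6:/bricks/she/brick                49152   Y       4620
NFS Server on localhost                         2049    Y       4637
Self-heal Daemon on localhost                   N/A     Y       4648
NFS Server on 10.0.1.5                          N/A     N       N/A
Self-heal Daemon on 10.0.1.5                    N/A     Y       14563

Task Status of Volume engine
------------------------------------------------------------------------------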
There are no active volume tasks

[root@ysmha01 ~]# gluster volume heal engine info split-brain
Gathering list of split brain entries on volume engine has been successful

Brick 10.0.1.7:/bricks/she/brick
Number of entries: 0

Brick 10.0.1.6:/bricks/she/brick
Number of entries: 0
[root@ysmha01 ~]# gluster volume heal vmstorage info split-brain
Gathering list of split brain entries on volume vmstorage has been successful

Brick 10.0.1.7:/bricks/vmstorage/brick
Number of entries: 0

Brick 10.0.1.6:/bricks/vmstorage/brick
Number of entries: 0
[root@ysmha01 ~]# gluster volume heal export info split-brain
Gathering list of split brain entries on volume export has been successful

Brick 10.0.1.7:/bricks/hdds/brick
Number of entries: 0

Brick 10.0.1.6:/bricks/hdds/brick
Number of entries: 0
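
The three checks above could also be summarized with a small script. This is only a sketch, assuming the output keeps the "Brick ..." / "Number of entries: N" layout shown here (other gluster versions may print it differently). The function name is hypothetical.

```python
import re

def split_brain_entries(output):
    """Map each brick to its reported 'Number of entries' count."""
    entries = {}
    brick = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Brick "):
            # Remember which brick the next count belongs to.
            brick = line[len("Brick "):]
        else:
            match = re.match(r"Number of entries:\s*(\d+)$", line)
            if match and brick is not None:
                entries[brick] = int(match.group(1))
    return entries

# Sample text copied from the engine volume output in this message.
sample = """\
Gathering list of split brain entries on volume engine has been successful

Brick 10.0.1.7:/bricks/she/brick
Number of entries: 0

Brick 10.0.1.6:/bricks/she/brick
Number of entries: 0
"""
result = split_brain_entries(sample)
has_split_brain = any(count > 0 for count in result.values())
```

A non-zero count for any brick would be the case worth investigating. Here every brick reports 0.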

Diego



Re: [ovirt-users] Cannot run specific VM in one node

2015-07-16 Thread Joop
On 16-7-2015 14:20, Diego Remolina wrote:
> I have two virtualization/storage servers, ysmha01 and ysmha02 running
> Ovirt hosted engine on top of glusterfs storage. I have two Windows
> server vms called ysmad01 and ysmad02. The current problem is that
> ysmad02 will *not* start on ysmha02 any more.
>
I might have missed it but did you check for a split-brain situation
since you're using a 2-node gluster?

Regards,

Joop





Re: [ovirt-users] Cannot run specific VM in one node

2015-07-16 Thread Diego Remolina
These are the links to the files; if there is a better/preferred
way to post them, let me know:

https://www.dropbox.com/s/yziky6f9nk3e8aw/engine.log.xz?dl=0
https://www.dropbox.com/s/qsweiizwxk37qzg/vdsm.log.4.xz?dl=0

A bit more of an explanation on the infrastructure:

I have two virtualization/storage servers, ysmha01 and ysmha02 running
Ovirt hosted engine on top of glusterfs storage. I have two Windows
server vms called ysmad01 and ysmad02. The current problem is that
ysmad02 will *not* start on ysmha02 any more.


Timeline

My problems started at around 8:30PM 7/15/2015 when migrating
everything to ysmha01 after having patched and rebooted the server.

I got things back up at around 10:30PM after rebooting servers, etc.
The hosted engine is running on ysmha02. I got ysmad01 running on
ysmha01, but ysmad02 just would not start at all on ysmha02. I did a
Run Once and set ysmad02 to start on ysmha01, and that works.

When attempting to start or migrate ysmad02 on ysmha02, if I do a
virsh -r list on ysmha02, I just see the state as: "Shut off" and the
VM just does not run on that hypervisor.

Diego





Re: [ovirt-users] Cannot run specific VM in one node

2015-07-16 Thread Omer Frenkel


----- Original Message -----
> From: "Diego Remolina" 
> To: Users@ovirt.org
> Sent: Thursday, July 16, 2015 7:45:43 AM
> Subject: [ovirt-users] Cannot run specific VM in one node
> 
> Hi,
> 
> Was wondering if I can get some help with this particular situation. I
> have two ovirt cluster nodes. I had a VM running in node2 and tried to
> move it to node1. The move failed and the machine was created and
> paused in both nodes. I tried stopping migration, shutting down the
> machine, etc but none of that worked.
> 
> So I decided to simply look for the process number and I killed it for
> that VM. After that, I was not able to get the VM to run in any of the
> nodes, so I rebooted them both.
> 
> At this point, the vm will *not* start in node2 at all. When I try to
> start it, it just sits there and if I do:
> 
> virsh -r list
> 
> from the command line, the output says the vm state is "shut off".
> 
> I am able to use Run Once to fire up the VM in node 1, but I cannot
> migrate it to node2.
> 
> How can I clear this problematic state for node 2?

Please attach the engine and vdsm logs for the time of the failure.



[ovirt-users] Cannot run specific VM in one node

2015-07-15 Thread Diego Remolina
Hi,

Was wondering if I can get some help with this particular situation. I
have two ovirt cluster nodes. I had a VM running in node2 and tried to
move it to node1. The move failed and the machine was created and
paused in both nodes. I tried stopping migration, shutting down the
machine, etc but none of that worked.

So I decided to simply look for the process number and I killed it for
that VM. After that, I was not able to get the VM to run in any of the
nodes, so I rebooted them both.

At this point, the vm will *not* start in node2 at all. When I try to
start it, it just sits there and if I do:

virsh -r list

from the command line, the output says the vm state is "shut off".

I am able to use Run Once to fire up the VM in node 1, but I cannot
migrate it to node2.

How can I clear this problematic state for node 2?

Thanks,

Diego