Re: [ovirt-users] non-operational host issues following 4.2 upgrade

2017-12-22 Thread Jason Brooks
I was able to get my hosts active. During the upgrade, my master data
domain's metadata was corrupted -- I had duplicates of some of the
dom_md files, and my metadata file was corrupt. Vdsm was looking at
that metadata file and throwing up its hands. I added a new data
domain but it couldn't take over as master because my old data domain
was messed up. I ended up creating a new metadata file in that domain,
and my hosts came up. It might be nice to have some way of resetting
corrupt metadata or at least of making the error clearer.
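
For anyone who hits something similar, here is a minimal sketch of a
sanity check for such a file. It assumes the metadata file is plain
text made of KEY=VALUE lines; the function name check_metadata is
hypothetical, and this is not how vdsm itself validates the file. It
only flags malformed lines and duplicate keys.

```python
from collections import Counter


def check_metadata(text):
    """Return a list of problems found in KEY=VALUE metadata text.

    Assumption: every non-blank line should look like KEY=VALUE.
    """
    problems = []
    keys = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        key, sep, _value = line.partition("=")
        if not sep or not key.strip():
            problems.append(f"line {lineno}: not KEY=VALUE: {line!r}")
            continue
        keys.append(key.strip())
    # Report every key that appears more than once.
    for key, count in Counter(keys).items():
        if count > 1:
            problems.append(f"duplicate key {key} ({count} times)")
    return problems
```

Read the metadata file from a copy of the domain's dom_md directory
and pass its contents to check_metadata(); an empty list means the
sketch found nothing obviously wrong, not that the file is valid.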

I did have a gluster hiccup during the upgrade -- the upgrade brought
my gluster version from 3.8 to 3.12, and the other peers in the
cluster refused connections from my first upgraded host. I upgraded
all the others, and got them all talking to each other again, but it
may have been during that time that my master data domain metadata
became corrupted. I haven't noticed any issues w/ my vms yet, and all
through the migration travail, I was able to keep 5 important VMs
running. They kept chugging away, even though their host and
surrounding hosts were unhealthy.

Anyway, I'm back ;)

Jason

On Thu, Dec 21, 2017 at 9:42 AM, Jason Brooks wrote:
> On Wed, Dec 20, 2017 at 11:47 PM, Sandro Bonazzola wrote:
>>
>>
>> 2017-12-21 4:26 GMT+01:00 Jason Brooks:
>>>
>>> Hi all, I upgraded my 4 host converged gluster/ovirt lab setup to 4.2
>>> yesterday, and now 3 of my hosts won't connect to my main data domain,
>>> so they're non-operational when I try to activate them.
>>>
>>> Here's what seems like a relevant passage of vdsm.log:
>>> https://paste.fedoraproject.org/paste/JZuxul6-HZjjl8uHzgqL-w
>>
>>
>>
>> Adding some relevant developers.
>> Jason, do you mind opening a bug on
>> https://bugzilla.redhat.com/enter_bug.cgi?product=vdsm to track this?
>
> I filed an issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1528391
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] non-operational host issues following 4.2 upgrade

2017-12-21 Thread Jason Brooks
On Wed, Dec 20, 2017 at 11:47 PM, Sandro Bonazzola wrote:
>
>
> 2017-12-21 4:26 GMT+01:00 Jason Brooks:
>>
>> Hi all, I upgraded my 4 host converged gluster/ovirt lab setup to 4.2
>> yesterday, and now 3 of my hosts won't connect to my main data domain,
>> so they're non-operational when I try to activate them.
>>
>> Here's what seems like a relevant passage of vdsm.log:
>> https://paste.fedoraproject.org/paste/JZuxul6-HZjjl8uHzgqL-w
>
>
>
> Adding some relevant developers.
> Jason, do you mind opening a bug on
> https://bugzilla.redhat.com/enter_bug.cgi?product=vdsm to track this?

I filed an issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1528391


Re: [ovirt-users] non-operational host issues following 4.2 upgrade

2017-12-20 Thread Sandro Bonazzola
2017-12-21 4:26 GMT+01:00 Jason Brooks:

> Hi all, I upgraded my 4 host converged gluster/ovirt lab setup to 4.2
> yesterday, and now 3 of my hosts won't connect to my main data domain,
> so they're non-operational when I try to activate them.
>
> Here's what seems like a relevant passage of vdsm.log:
> https://paste.fedoraproject.org/paste/JZuxul6-HZjjl8uHzgqL-w



Adding some relevant developers.
Jason, do you mind opening a bug on
https://bugzilla.redhat.com/enter_bug.cgi?product=vdsm to track this?


>
>
> The hosts can mount the gluster storage just fine, I can mount to a
> test location on the hosts, and I can see that the hosts are mounting
> the storage in the usual place when they attempt to activate.
> Permissions look normal, too.
>
> I undeployed the hosted engine from the three problem machines, in
> case that was causing an issue.
>
> The hosts are running CentOS 7.
>
> Does any of this ring a bell for anyone?
>
> Thanks, Jason



-- 

SANDRO BONAZZOLA

ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D

Red Hat EMEA 



[ovirt-users] non-operational host issues following 4.2 upgrade

2017-12-20 Thread Jason Brooks
Hi all, I upgraded my 4 host converged gluster/ovirt lab setup to 4.2
yesterday, and now 3 of my hosts won't connect to my main data domain,
so they're non-operational when I try to activate them.

Here's what seems like a relevant passage of vdsm.log:
https://paste.fedoraproject.org/paste/JZuxul6-HZjjl8uHzgqL-w

The hosts can mount the gluster storage just fine, I can mount to a
test location on the hosts, and I can see that the hosts are mounting
the storage in the usual place when they attempt to activate.
Permissions look normal, too.

I undeployed the hosted engine from the three problem machines, in
case that was causing an issue.

The hosts are running CentOS 7.

Does any of this ring a bell for anyone?

Thanks, Jason