Re: [ovirt-users] Fwd: Having issues with Hosted Engine

2016-04-29 Thread Luiz Claudio Prazeres Goncalves
Got it. So it should be included by the 3.6.6 GA.

Thanks
Luiz

On Fri, Apr 29, 2016, 04:26, Simone Tiraboschi wrote:

> On Fri, Apr 29, 2016 at 4:44 AM, Luiz Claudio Prazeres Goncalves
>  wrote:
> > Hi Simone, I was reviewing the changelog of 3.6.6 at the link below, but I
> > was not able to find the bug (https://bugzilla.redhat.com/1327516) listed
> > as fixed. According to Bugzilla the target really is 3.6.6, so what's
> > wrong?
> >
> >
> > http://www.ovirt.org/release/3.6.6/
>
> 'oVirt 3.6.6 first release candidate', so it's still not the GA.
>
> > Thanks
> > Luiz
> >
> > On Thu, Apr 28, 2016, 11:33, Luiz Claudio Prazeres Goncalves wrote:
> >>
> >> Nice!... so, I'll survive a bit more with these issues until the version
> >> 3.6.6 gets released...
> >>
> >>
> >> Thanks
> >> -Luiz
> >>
> >> 2016-04-28 4:50 GMT-03:00 Simone Tiraboschi :
> >>>
> >>> On Thu, Apr 28, 2016 at 8:32 AM, Sahina Bose wrote:
> >>> > This seems like the issue reported in
> >>> > https://bugzilla.redhat.com/show_bug.cgi?id=1327121
> >>> >
> >>> > Nir, Simone?
> >>>
> >>> The issue is here:
> >>> MainThread::INFO::2016-04-27
> >>>
> >>>
> 03:26:27,185::storage_server::229::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(disconnect_storage_server)
> >>> Disconnecting storage server
> >>> MainThread::INFO::2016-04-27
> >>>
> >>>
> 03:26:27,816::upgrade::983::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(fix_storage_path)
> >>> Fixing storage path in conf file
> >>>
> >>> And it's tracked here: https://bugzilla.redhat.com/1327516
> >>>
> >>> We already have a patch; it will be fixed in 3.6.6.
> >>>
> >>> As far as I saw, this issue will only cause a lot of mess in the logs
> >>> and some false alerts, but it's basically harmless.
> >>>
> >>> > On 04/28/2016 05:35 AM, Luiz Claudio Prazeres Goncalves wrote:
> >>> >
> >>> >
> >>> > Hi everyone,
> >>> >
> >>> > Until today my environment was fully updated (3.6.5+centos7.2) with 3
> >>> > nodes (kvm1, kvm2 and kvm3 hosts). I also have 3 external gluster nodes
> >>> > (gluster-root1, gluster1 and gluster2 hosts), replica 3, on top of which
> >>> > the engine storage domain sits (3.7.11 fully updated+centos7.2).
> >>> >
> >>> > For some weird reason I've been receiving emails from oVirt with
> >>> > EngineUnexpectedDown (attached picture) on a daily basis, more or less,
> >>> > but the engine seems to be working fine and my VMs are up and running
> >>> > normally. I've never had any issue accessing the User Interface to
> >>> > manage the VMs.
> >>> >
> >>> > Today I ran "yum update" on the nodes and realised that vdsm was
> >>> > outdated, so I updated the kvm hosts and they are now, again, fully
> >>> > updated.
> >>> >
> >>> > Reviewing the logs, it seems to be an intermittent connectivity issue
> >>> > when trying to access the gluster engine storage domain, as you can see
> >>> > below. I don't have any network issue in place and I'm 100% sure about
> >>> > it. I have another oVirt cluster using the same network and an engine
> >>> > storage domain on top of an iSCSI Storage Array with no issues.
> >>> >
> >>> > Here seems to be the issue:
> >>> >
> >>> > Thread-::INFO::2016-04-27
> >>> > 23:01:27,864::fileSD::357::Storage.StorageDomain::(validate)
> >>> > sdUUID=03926733-1872-4f85-bb21-18dc320560db
> >>> >
> >>> > Thread-::DEBUG::2016-04-27
> >>> > 23:01:27,865::persistentDict::234::Storage.PersistentDict::(refresh)
> >>> > read
> >>> > lines (FileMetadataRW)=[]
> >>> >
> >>> > Thread-::DEBUG::2016-04-27
> >>> > 23:01:27,865::persistentDict::252::Storage.PersistentDict::(refresh)
> >>> > Empty
> >>> > metadata
> >>> >
> >>> > Thread-::ERROR::2016-04-27
> >>> > 23:01:27,865::task::866::Storage.TaskManager.Task::(_setError)
> >>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Unexpected error
> >>> >
> >>> > Traceback (most recent call last):
> >>> >
> >>> >   File "/usr/share/vdsm/storage/task.py", line 873, in _run
> >>> >
> >>> > return fn(*args, **kargs)
> >>> >
> >>> >   File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
> >>> >
> >>> > res = f(*args, **kwargs)
> >>> >
> >>> >   File "/usr/share/vdsm/storage/hsm.py", line 2835, in
> >>> > getStorageDomainInfo
> >>> >
> >>> > dom = self.validateSdUUID(sdUUID)
> >>> >
> >>> >   File "/usr/share/vdsm/storage/hsm.py", line 278, in validateSdUUID
> >>> >
> >>> > sdDom.validate()
> >>> >
> >>> >   File "/usr/share/vdsm/storage/fileSD.py", line 360, in validate
> >>> >
> >>> > raise se.StorageDomainAccessError(self.sdUUID)
> >>> >
> >>> > StorageDomainAccessError: Domain is either partially accessible or
> >>> > entirely
> >>> > inaccessible: (u'03926733-1872-4f85-bb21-18dc320560db',)
> >>> >
> >>> > Thread-::DEBUG::2016-04-27
> >>> > 

Re: [ovirt-users] Fwd: Having issues with Hosted Engine

2016-04-29 Thread Simone Tiraboschi
On Fri, Apr 29, 2016 at 4:44 AM, Luiz Claudio Prazeres Goncalves
 wrote:
> Hi Simone, I was reviewing the changelog of 3.6.6 at the link below, but I
> was not able to find the bug (https://bugzilla.redhat.com/1327516) listed
> as fixed. According to Bugzilla the target really is 3.6.6, so what's
> wrong?
>
>
> http://www.ovirt.org/release/3.6.6/

'oVirt 3.6.6 first release candidate', so it's still not the GA.
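
For reference, one quick way to see when the fixed build actually lands on a host (assuming the standard package layout, where the HA agent ships as ovirt-hosted-engine-ha):

  # version currently installed on this host
  rpm -q ovirt-hosted-engine-ha

  # versions currently published by the enabled oVirt repositories;
  # the fix should show up here once the 3.6.6 GA packages are out
  yum --showduplicates list ovirt-hosted-engine-ha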

> Thanks
> Luiz
>
> On Thu, Apr 28, 2016, 11:33, Luiz Claudio Prazeres Goncalves wrote:
>>
>> Nice!... so, I'll survive a bit more with these issues until the version
>> 3.6.6 gets released...
>>
>>
>> Thanks
>> -Luiz
>>
>> 2016-04-28 4:50 GMT-03:00 Simone Tiraboschi :
>>>
>>> On Thu, Apr 28, 2016 at 8:32 AM, Sahina Bose  wrote:
>>> > This seems like the issue reported in
>>> > https://bugzilla.redhat.com/show_bug.cgi?id=1327121
>>> >
>>> > Nir, Simone?
>>>
>>> The issue is here:
>>> MainThread::INFO::2016-04-27
>>>
>>> 03:26:27,185::storage_server::229::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(disconnect_storage_server)
>>> Disconnecting storage server
>>> MainThread::INFO::2016-04-27
>>>
>>> 03:26:27,816::upgrade::983::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(fix_storage_path)
>>> Fixing storage path in conf file
>>>
>>> And it's tracked here: https://bugzilla.redhat.com/1327516
>>>
>>> We already have a patch; it will be fixed in 3.6.6.
>>>
>>> As far as I saw, this issue will only cause a lot of mess in the logs
>>> and some false alerts, but it's basically harmless.
>>>
>>> > On 04/28/2016 05:35 AM, Luiz Claudio Prazeres Goncalves wrote:
>>> >
>>> >
>>> > Hi everyone,
>>> >
>>> > Until today my environment was fully updated (3.6.5+centos7.2) with 3
>>> > nodes (kvm1, kvm2 and kvm3 hosts). I also have 3 external gluster nodes
>>> > (gluster-root1, gluster1 and gluster2 hosts), replica 3, on top of which
>>> > the engine storage domain sits (3.7.11 fully updated+centos7.2).
>>> >
>>> > For some weird reason I've been receiving emails from oVirt with
>>> > EngineUnexpectedDown (attached picture) on a daily basis, more or less,
>>> > but the engine seems to be working fine and my VMs are up and running
>>> > normally. I've never had any issue accessing the User Interface to
>>> > manage the VMs.
>>> >
>>> > Today I ran "yum update" on the nodes and realised that vdsm was
>>> > outdated, so I updated the kvm hosts and they are now, again, fully
>>> > updated.
>>> >
>>> > Reviewing the logs, it seems to be an intermittent connectivity issue
>>> > when trying to access the gluster engine storage domain, as you can see
>>> > below. I don't have any network issue in place and I'm 100% sure about
>>> > it. I have another oVirt cluster using the same network and an engine
>>> > storage domain on top of an iSCSI Storage Array with no issues.
>>> >
>>> > Here seems to be the issue:
>>> >
>>> > Thread-::INFO::2016-04-27
>>> > 23:01:27,864::fileSD::357::Storage.StorageDomain::(validate)
>>> > sdUUID=03926733-1872-4f85-bb21-18dc320560db
>>> >
>>> > Thread-::DEBUG::2016-04-27
>>> > 23:01:27,865::persistentDict::234::Storage.PersistentDict::(refresh)
>>> > read
>>> > lines (FileMetadataRW)=[]
>>> >
>>> > Thread-::DEBUG::2016-04-27
>>> > 23:01:27,865::persistentDict::252::Storage.PersistentDict::(refresh)
>>> > Empty
>>> > metadata
>>> >
>>> > Thread-::ERROR::2016-04-27
>>> > 23:01:27,865::task::866::Storage.TaskManager.Task::(_setError)
>>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Unexpected error
>>> >
>>> > Traceback (most recent call last):
>>> >
>>> >   File "/usr/share/vdsm/storage/task.py", line 873, in _run
>>> >
>>> > return fn(*args, **kargs)
>>> >
>>> >   File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
>>> >
>>> > res = f(*args, **kwargs)
>>> >
>>> >   File "/usr/share/vdsm/storage/hsm.py", line 2835, in
>>> > getStorageDomainInfo
>>> >
>>> > dom = self.validateSdUUID(sdUUID)
>>> >
>>> >   File "/usr/share/vdsm/storage/hsm.py", line 278, in validateSdUUID
>>> >
>>> > sdDom.validate()
>>> >
>>> >   File "/usr/share/vdsm/storage/fileSD.py", line 360, in validate
>>> >
>>> > raise se.StorageDomainAccessError(self.sdUUID)
>>> >
>>> > StorageDomainAccessError: Domain is either partially accessible or
>>> > entirely
>>> > inaccessible: (u'03926733-1872-4f85-bb21-18dc320560db',)
>>> >
>>> > Thread-::DEBUG::2016-04-27
>>> > 23:01:27,865::task::885::Storage.TaskManager.Task::(_run)
>>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Task._run:
>>> > d2acf575-1a60-4fa0-a5bb-cd4363636b94
>>> > ('03926733-1872-4f85-bb21-18dc320560db',) {} failed - stopping task
>>> >
>>> > Thread-::DEBUG::2016-04-27
>>> > 23:01:27,865::task::1246::Storage.TaskManager.Task::(stop)
>>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::stopping in state
>>> > preparing
>>> > (force False)
>>> >
>>> > 

Re: [ovirt-users] Fwd: Having issues with Hosted Engine

2016-04-28 Thread Luiz Claudio Prazeres Goncalves
Hi Simone, I was reviewing the changelog of 3.6.6 at the link below, but I
was not able to find the bug (https://bugzilla.redhat.com/1327516) listed
as fixed. According to Bugzilla the target really is 3.6.6, so what's
wrong?


http://www.ovirt.org/release/3.6.6/


Thanks
Luiz

On Thu, Apr 28, 2016, 11:33, Luiz Claudio Prazeres Goncalves <
luiz...@gmail.com> wrote:

> Nice!... so, I'll survive a bit more with these issues until the version
> 3.6.6 gets released...
>
>
> Thanks
> -Luiz
>
> 2016-04-28 4:50 GMT-03:00 Simone Tiraboschi :
>
>> On Thu, Apr 28, 2016 at 8:32 AM, Sahina Bose  wrote:
>> > This seems like the issue reported in
>> > https://bugzilla.redhat.com/show_bug.cgi?id=1327121
>> >
>> > Nir, Simone?
>>
>> The issue is here:
>> MainThread::INFO::2016-04-27
>>
>> 03:26:27,185::storage_server::229::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(disconnect_storage_server)
>> Disconnecting storage server
>> MainThread::INFO::2016-04-27
>>
>> 03:26:27,816::upgrade::983::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(fix_storage_path)
>> Fixing storage path in conf file
>>
>> And it's tracked here: https://bugzilla.redhat.com/1327516
>>
>> We already have a patch; it will be fixed in 3.6.6.
>>
>> As far as I saw, this issue will only cause a lot of mess in the logs
>> and some false alerts, but it's basically harmless.
>>
>> > On 04/28/2016 05:35 AM, Luiz Claudio Prazeres Goncalves wrote:
>> >
>> >
>> > Hi everyone,
>> >
>> > Until today my environment was fully updated (3.6.5+centos7.2) with 3
>> > nodes (kvm1, kvm2 and kvm3 hosts). I also have 3 external gluster nodes
>> > (gluster-root1, gluster1 and gluster2 hosts), replica 3, on top of which
>> > the engine storage domain sits (3.7.11 fully updated+centos7.2).
>> >
>> > For some weird reason I've been receiving emails from oVirt with
>> > EngineUnexpectedDown (attached picture) on a daily basis, more or less,
>> > but the engine seems to be working fine and my VMs are up and running
>> > normally. I've never had any issue accessing the User Interface to
>> > manage the VMs.
>> >
>> > Today I ran "yum update" on the nodes and realised that vdsm was
>> > outdated, so I updated the kvm hosts and they are now, again, fully
>> > updated.
>> >
>> > Reviewing the logs, it seems to be an intermittent connectivity issue
>> > when trying to access the gluster engine storage domain, as you can see
>> > below. I don't have any network issue in place and I'm 100% sure about
>> > it. I have another oVirt cluster using the same network and an engine
>> > storage domain on top of an iSCSI Storage Array with no issues.
>> >
>> > Here seems to be the issue:
>> >
>> > Thread-::INFO::2016-04-27
>> > 23:01:27,864::fileSD::357::Storage.StorageDomain::(validate)
>> > sdUUID=03926733-1872-4f85-bb21-18dc320560db
>> >
>> > Thread-::DEBUG::2016-04-27
>> > 23:01:27,865::persistentDict::234::Storage.PersistentDict::(refresh)
>> read
>> > lines (FileMetadataRW)=[]
>> >
>> > Thread-::DEBUG::2016-04-27
>> > 23:01:27,865::persistentDict::252::Storage.PersistentDict::(refresh)
>> Empty
>> > metadata
>> >
>> > Thread-::ERROR::2016-04-27
>> > 23:01:27,865::task::866::Storage.TaskManager.Task::(_setError)
>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Unexpected error
>> >
>> > Traceback (most recent call last):
>> >
>> >   File "/usr/share/vdsm/storage/task.py", line 873, in _run
>> >
>> > return fn(*args, **kargs)
>> >
>> >   File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
>> >
>> > res = f(*args, **kwargs)
>> >
>> >   File "/usr/share/vdsm/storage/hsm.py", line 2835, in
>> getStorageDomainInfo
>> >
>> > dom = self.validateSdUUID(sdUUID)
>> >
>> >   File "/usr/share/vdsm/storage/hsm.py", line 278, in validateSdUUID
>> >
>> > sdDom.validate()
>> >
>> >   File "/usr/share/vdsm/storage/fileSD.py", line 360, in validate
>> >
>> > raise se.StorageDomainAccessError(self.sdUUID)
>> >
>> > StorageDomainAccessError: Domain is either partially accessible or
>> entirely
>> > inaccessible: (u'03926733-1872-4f85-bb21-18dc320560db',)
>> >
>> > Thread-::DEBUG::2016-04-27
>> > 23:01:27,865::task::885::Storage.TaskManager.Task::(_run)
>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Task._run:
>> > d2acf575-1a60-4fa0-a5bb-cd4363636b94
>> > ('03926733-1872-4f85-bb21-18dc320560db',) {} failed - stopping task
>> >
>> > Thread-::DEBUG::2016-04-27
>> > 23:01:27,865::task::1246::Storage.TaskManager.Task::(stop)
>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::stopping in state preparing
>> > (force False)
>> >
>> > Thread-::DEBUG::2016-04-27
>> > 23:01:27,865::task::993::Storage.TaskManager.Task::(_decref)
>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::ref 1 aborting True
>> >
>> > Thread-::INFO::2016-04-27
>> > 23:01:27,865::task::1171::Storage.TaskManager.Task::(prepare)
>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::aborting: Task is aborted:
>> > 

Re: [ovirt-users] Fwd: Having issues with Hosted Engine

2016-04-28 Thread Luiz Claudio Prazeres Goncalves
Nice!... so, I'll survive a bit more with these issues until the version
3.6.6 gets released...


Thanks
-Luiz

2016-04-28 4:50 GMT-03:00 Simone Tiraboschi :

> On Thu, Apr 28, 2016 at 8:32 AM, Sahina Bose  wrote:
> > This seems like the issue reported in
> > https://bugzilla.redhat.com/show_bug.cgi?id=1327121
> >
> > Nir, Simone?
>
> The issue is here:
> MainThread::INFO::2016-04-27
>
> 03:26:27,185::storage_server::229::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(disconnect_storage_server)
> Disconnecting storage server
> MainThread::INFO::2016-04-27
>
> 03:26:27,816::upgrade::983::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(fix_storage_path)
> Fixing storage path in conf file
>
> And it's tracked here: https://bugzilla.redhat.com/1327516
>
> We already have a patch; it will be fixed in 3.6.6.
>
> As far as I saw, this issue will only cause a lot of mess in the logs
> and some false alerts, but it's basically harmless.
>
> > On 04/28/2016 05:35 AM, Luiz Claudio Prazeres Goncalves wrote:
> >
> >
> > Hi everyone,
> >
> > Until today my environment was fully updated (3.6.5+centos7.2) with 3
> > nodes (kvm1, kvm2 and kvm3 hosts). I also have 3 external gluster nodes
> > (gluster-root1, gluster1 and gluster2 hosts), replica 3, on top of which
> > the engine storage domain sits (3.7.11 fully updated+centos7.2).
> >
> > For some weird reason I've been receiving emails from oVirt with
> > EngineUnexpectedDown (attached picture) on a daily basis, more or less,
> > but the engine seems to be working fine and my VMs are up and running
> > normally. I've never had any issue accessing the User Interface to
> > manage the VMs.
> >
> > Today I ran "yum update" on the nodes and realised that vdsm was
> > outdated, so I updated the kvm hosts and they are now, again, fully
> > updated.
> >
> > Reviewing the logs, it seems to be an intermittent connectivity issue
> > when trying to access the gluster engine storage domain, as you can see
> > below. I don't have any network issue in place and I'm 100% sure about
> > it. I have another oVirt cluster using the same network and an engine
> > storage domain on top of an iSCSI Storage Array with no issues.
> >
> > Here seems to be the issue:
> >
> > Thread-::INFO::2016-04-27
> > 23:01:27,864::fileSD::357::Storage.StorageDomain::(validate)
> > sdUUID=03926733-1872-4f85-bb21-18dc320560db
> >
> > Thread-::DEBUG::2016-04-27
> > 23:01:27,865::persistentDict::234::Storage.PersistentDict::(refresh) read
> > lines (FileMetadataRW)=[]
> >
> > Thread-::DEBUG::2016-04-27
> > 23:01:27,865::persistentDict::252::Storage.PersistentDict::(refresh)
> Empty
> > metadata
> >
> > Thread-::ERROR::2016-04-27
> > 23:01:27,865::task::866::Storage.TaskManager.Task::(_setError)
> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Unexpected error
> >
> > Traceback (most recent call last):
> >
> >   File "/usr/share/vdsm/storage/task.py", line 873, in _run
> >
> > return fn(*args, **kargs)
> >
> >   File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
> >
> > res = f(*args, **kwargs)
> >
> >   File "/usr/share/vdsm/storage/hsm.py", line 2835, in
> getStorageDomainInfo
> >
> > dom = self.validateSdUUID(sdUUID)
> >
> >   File "/usr/share/vdsm/storage/hsm.py", line 278, in validateSdUUID
> >
> > sdDom.validate()
> >
> >   File "/usr/share/vdsm/storage/fileSD.py", line 360, in validate
> >
> > raise se.StorageDomainAccessError(self.sdUUID)
> >
> > StorageDomainAccessError: Domain is either partially accessible or
> entirely
> > inaccessible: (u'03926733-1872-4f85-bb21-18dc320560db',)
> >
> > Thread-::DEBUG::2016-04-27
> > 23:01:27,865::task::885::Storage.TaskManager.Task::(_run)
> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Task._run:
> > d2acf575-1a60-4fa0-a5bb-cd4363636b94
> > ('03926733-1872-4f85-bb21-18dc320560db',) {} failed - stopping task
> >
> > Thread-::DEBUG::2016-04-27
> > 23:01:27,865::task::1246::Storage.TaskManager.Task::(stop)
> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::stopping in state preparing
> > (force False)
> >
> > Thread-::DEBUG::2016-04-27
> > 23:01:27,865::task::993::Storage.TaskManager.Task::(_decref)
> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::ref 1 aborting True
> >
> > Thread-::INFO::2016-04-27
> > 23:01:27,865::task::1171::Storage.TaskManager.Task::(prepare)
> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::aborting: Task is aborted:
> > 'Domain is either partially accessible or entirely inaccessible' - code
> 379
> >
> > Thread-::DEBUG::2016-04-27
> > 23:01:27,866::task::1176::Storage.TaskManager.Task::(prepare)
> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Prepare: aborted: Domain is
> > either partially accessible or entirely inaccessible
> >
> >
> > Question: Anyone know what might be happening? I have several gluster
> > config options, as you can see below. All the storage domains are using
> > the same config.
> >
> >
> > More information:

Re: [ovirt-users] Fwd: Having issues with Hosted Engine

2016-04-28 Thread Sahina Bose
This seems like the issue reported in
https://bugzilla.redhat.com/show_bug.cgi?id=1327121


Nir, Simone?

On 04/28/2016 05:35 AM, Luiz Claudio Prazeres Goncalves wrote:


Hi everyone,

Until today my environment was fully updated (3.6.5+centos7.2) with 3
nodes (kvm1, kvm2 and kvm3 hosts). I also have 3 external gluster nodes
(gluster-root1, gluster1 and gluster2 hosts), replica 3, on top of which
the engine storage domain sits (3.7.11 fully updated+centos7.2).


For some weird reason I've been receiving emails from oVirt with
EngineUnexpectedDown (attached picture) on a daily basis, more or less,
but the engine seems to be working fine and my VMs are up and running
normally. I've never had any issue accessing the User Interface to
manage the VMs.
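
For reference, when one of those EngineUnexpectedDown mails arrives, the state the HA agents are actually reporting can be double-checked from any of the kvm hosts (a quick sketch, assuming the standard hosted-engine CLI):

  # prints, for every HA host, the engine VM state, the host score and any
  # local/global maintenance flags read from the shared metadata
  hosted-engine --vm-status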


Today I ran "yum update" on the nodes and realised that vdsm was
outdated, so I updated the kvm hosts and they are now, again, fully
updated.


Reviewing the logs, it seems to be an intermittent connectivity issue
when trying to access the gluster engine storage domain, as you can see
below. I don't have any network issue in place and I'm 100% sure about
it. I have another oVirt cluster using the same network and an engine
storage domain on top of an iSCSI Storage Array with no issues.
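
For reference, a couple of quick checks on the gluster side (a sketch; run the first two from one of the gluster nodes, the volume here is simply named "engine"):

  # brick, self-heal daemon and port status for the engine volume
  gluster volume status engine

  # confirms all peers in the trusted pool are connected
  gluster peer status

  # on the kvm hosts the engine domain is a plain glusterfs (FUSE) mount
  mount | grep glusterfs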


*Here seems to be the issue:*

Thread-::INFO::2016-04-27 
23:01:27,864::fileSD::357::Storage.StorageDomain::(validate) 
sdUUID=03926733-1872-4f85-bb21-18dc320560db


Thread-::DEBUG::2016-04-27 
23:01:27,865::persistentDict::234::Storage.PersistentDict::(refresh) 
read lines (FileMetadataRW)=[]


Thread-::DEBUG::2016-04-27 
23:01:27,865::persistentDict::252::Storage.PersistentDict::(refresh) 
Empty metadata


Thread-::ERROR::2016-04-27 
23:01:27,865::task::866::Storage.TaskManager.Task::(_setError) 
Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Unexpected error


Traceback (most recent call last):

  File "/usr/share/vdsm/storage/task.py", line 873, in _run

return fn(*args, **kargs)

  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper

res = f(*args, **kwargs)

  File "/usr/share/vdsm/storage/hsm.py", line 2835, in 
getStorageDomainInfo


dom = self.validateSdUUID(sdUUID)

  File "/usr/share/vdsm/storage/hsm.py", line 278, in validateSdUUID

sdDom.validate()

  File "/usr/share/vdsm/storage/fileSD.py", line 360, in validate

raise se.StorageDomainAccessError(self.sdUUID)

StorageDomainAccessError: Domain is either partially accessible or 
entirely inaccessible: (u'03926733-1872-4f85-bb21-18dc320560db',)


Thread-::DEBUG::2016-04-27 
23:01:27,865::task::885::Storage.TaskManager.Task::(_run) 
Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Task._run: 
d2acf575-1a60-4fa0-a5bb-cd4363636b94 
('03926733-1872-4f85-bb21-18dc320560db',) {} failed - stopping task


Thread-::DEBUG::2016-04-27 
23:01:27,865::task::1246::Storage.TaskManager.Task::(stop) 
Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::stopping in state 
preparing (force False)


Thread-::DEBUG::2016-04-27 
23:01:27,865::task::993::Storage.TaskManager.Task::(_decref) 
Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::ref 1 aborting True


Thread-::INFO::2016-04-27 
23:01:27,865::task::1171::Storage.TaskManager.Task::(prepare) 
Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::aborting: Task is 
aborted: 'Domain is either partially accessible or entirely 
inaccessible' - code 379


Thread-::DEBUG::2016-04-27 
23:01:27,866::task::1176::Storage.TaskManager.Task::(prepare) 
Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Prepare: aborted: Domain 
is either partially accessible or entirely inaccessible



*Question: Anyone know what might be happening? I have several gluster
config options, as you can see below. All the storage domains are using
the same config.*



*More information:*

I have the "engine" storage domain, "vmos1" storage domain and 
"master" storage domain, so everything looks good.


[root@kvm1 vdsm]# vdsClient -s 0 getStorageDomainsList

03926733-1872-4f85-bb21-18dc320560db

35021ff4-fb95-43d7-92a3-f538273a3c2e

e306e54e-ca98-468d-bb04-3e8900f8840c
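
For reference, each of those UUIDs can also be probed individually; this goes through the same validate() path that fails in the log above (a sketch, run as root on one of the kvm hosts):

  # query every storage domain known to this host, one by one
  for sd in $(vdsClient -s 0 getStorageDomainsList); do
      echo "== $sd =="
      vdsClient -s 0 getStorageDomainInfo "$sd"
  done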


*Gluster config:*

[root@gluster-root1 ~]# gluster volume info

Volume Name: engine

Type: Replicate

Volume ID: 64b413d2-c42e-40fd-b356-3e6975e941b0

Status: Started

Number of Bricks: 1 x 3 = 3

Transport-type: tcp

Bricks:

Brick1: gluster1.xyz.com:/gluster/engine/brick1

Brick2: gluster2.xyz.com:/gluster/engine/brick1

Brick3: gluster-root1.xyz.com:/gluster/engine/brick1

Options Reconfigured:

performance.cache-size: 1GB

performance.write-behind-window-size: 4MB

performance.write-behind: off

performance.quick-read: off

performance.read-ahead: off

performance.io-cache: off

performance.stat-prefetch: off

cluster.eager-lock: enable

cluster.quorum-type: auto

network.remote-dio: enable

cluster.server-quorum-type: server

cluster.data-self-heal-algorithm: full

performance.low-prio-threads: 32

features.shard-block-size: 512MB

features.shard: on

storage.owner-gid: 36

storage.owner-uid: 36
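
For reference, the options under "Options Reconfigured" above are set one at a time with "gluster volume set"; a sketch, using the engine volume as the example:

  # set a single option on the engine volume
  gluster volume set engine cluster.quorum-type auto
  gluster volume set engine network.remote-dio enable

  # re-check the resulting option list
  gluster volume info engine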