Re: [ovirt-users] oVirt split brain resolution

2017-06-27 Thread Abi Askushi
Hi Satheesaran,

gluster volume info engine

Volume Name: engine
Type: Replicate
Volume ID: 3caae601-74dd-40d1-8629-9a61072bec0f
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster0:/gluster/engine/brick
Brick2: gluster1:/gluster/engine/brick
Brick3: gluster2:/gluster/engine/brick (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: enable
nfs.export-volumes: on

As per my previous email, I have resolved this by following the steps described below.
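
To double-check that the fix stuck, the split-brain listing should be empty
again on all three bricks:

gluster volume heal engine info split-brain
# expected on each brick: Number of entries in split-brain: 0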


On Tue, Jun 27, 2017 at 1:42 PM, Satheesaran Sundaramoorthi <
sasun...@redhat.com> wrote:

> On Sat, Jun 24, 2017 at 3:17 PM, Abi Askushi 
> wrote:
>
>> Hi all,
>>
>> For the record, I had to manually remove the conflicting directory and
>> its respective gfid from the arbiter volume:
>>
>>  getfattr -m . -d -e hex e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>>
>> That gave me the gfid: 0x277c9caa9dce4a17a2a93775357befd5
>>
>> Then cd .glusterfs/27/7c
>>
>> rm -rf 277c9caa-9dce-4a17-a2a9-3775357befd5 (or move it out of there)
>>
>> Triggered the heal: gluster volume heal engine
>>
>> Then all ok:
>>
>> gluster volume heal engine info
>> Brick gluster0:/gluster/engine/brick
>> Status: Connected
>> Number of entries: 0
>>
>> Brick gluster1:/gluster/engine/brick
>> Status: Connected
>> Number of entries: 0
>>
>> Brick gluster2:/gluster/engine/brick
>> Status: Connected
>> Number of entries: 0
>>
>> Thanx.
>>
>
> Hi Abi,
>
> What is the volume type of the 'engine' volume?
> Could you also provide the output of 'gluster volume info engine', so we can
> take a closer look at the problem?
>
> -- sas
>
>


Re: [ovirt-users] oVirt split brain resolution

2017-06-27 Thread Satheesaran Sundaramoorthi
On Sat, Jun 24, 2017 at 3:17 PM, Abi Askushi 
wrote:

> Hi all,
>
> For the record, I had to manually remove the conflicting directory and its
> respective gfid from the arbiter volume:
>
>  getfattr -m . -d -e hex e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>
> That gave me the gfid: 0x277c9caa9dce4a17a2a93775357befd5
>
> Then cd .glusterfs/27/7c
>
> rm -rf 277c9caa-9dce-4a17-a2a9-3775357befd5 (or move it out of there)
>
> Triggered the heal: gluster volume heal engine
>
> Then all ok:
>
> gluster volume heal engine info
> Brick gluster0:/gluster/engine/brick
> Status: Connected
> Number of entries: 0
>
> Brick gluster1:/gluster/engine/brick
> Status: Connected
> Number of entries: 0
>
> Brick gluster2:/gluster/engine/brick
> Status: Connected
> Number of entries: 0
>
> Thanx.
>

Hi Abi,

What is the volume type of the 'engine' volume?
Could you also provide the output of 'gluster volume info engine', so we can
take a closer look at the problem?

-- sas


Re: [ovirt-users] oVirt split brain resolution

2017-06-24 Thread Abi Askushi
Hi all,

For the record, I had to manually remove the conflicting directory and its
respective gfid from the arbiter volume:

 getfattr -m . -d -e hex e1c80750-b880-495e-9609-b8bc7760d101/ha_agent

That gave me the gfid: 0x277c9caa9dce4a17a2a93775357befd5

Then cd .glusterfs/27/7c

rm -rf 277c9caa-9dce-4a17-a2a9-3775357befd5 (or move it out of there)

Triggered the heal: gluster volume heal engine
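
Putting the above in one place — a rough sketch, assuming the commands run on
the arbiter node with the brick rooted at /gluster/engine/brick (the backup
destinations are arbitrary; moving instead of deleting keeps a way back):

cd /gluster/engine/brick
# read the gfid xattr of the conflicting directory
getfattr -m . -d -e hex e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
# -> trusted.gfid=0x277c9caa9dce4a17a2a93775357befd5
# the .glusterfs entry sits under the first two / next two hex chars of the gfid
mv .glusterfs/27/7c/277c9caa-9dce-4a17-a2a9-3775357befd5 /root/ha_agent.gfid.bak
mv e1c80750-b880-495e-9609-b8bc7760d101/ha_agent /root/ha_agent.dir.bak
# let self-heal rebuild the entry from the two good replicas,
# then verify with the 'gluster volume heal engine info' output below
gluster volume heal engine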

Then all ok:

gluster volume heal engine info
Brick gluster0:/gluster/engine/brick
Status: Connected
Number of entries: 0

Brick gluster1:/gluster/engine/brick
Status: Connected
Number of entries: 0

Brick gluster2:/gluster/engine/brick
Status: Connected
Number of entries: 0

Thanx.

On Fri, Jun 23, 2017 at 7:21 PM, Abi Askushi 
wrote:

> Hi Denis,
>
> I receive an 'Operation not permitted' error, as below:
>
> gluster volume heal engine split-brain latest-mtime
> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
> Healing /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent failed:Operation
> not permitted.
> Volume heal failed.
>
>
> When I shut down host3, no split brain is reported from the remaining
> two hosts. When I power up host3, I receive the mentioned split brain,
> and host3 logs the following at ovirt-hosted-engine-ha/agent.log:
>
> MainThread::INFO::2017-06-23 16:18:06,067::hosted_engine::
> 594::ovirt_hosted_engine_ha.agent.hosted_engine.
> HostedEngine::(_initialize_broker) Failed set the storage domain: 'Failed
> to set storage domain VdsmBackend, options {'hosted-engine.lockspace': '
> 7B22696D6167655F75756964223A202238323132626637382D66392D
> 346465652D61672D346265633734353035366235222C202270617468
> 223A206E756C6C2C2022766F6C756D655F75756964223A20223632373930
> 3162652D666261332D346263342D393037632D393931356138333632633537227D',
> 'sp_uuid': '00000000-0000-0000-0000-000000000000', 'dom_type':
> 'glusterfs', 'hosted-engine.metadata': '7B22696D6167655F75756964223A20
> 2263353930633034372D613462322D346539312D613832362D6434386239
> 61643537323330222C202270617468223A206E756C6C2C2022766F6C756D
> 655F75756964223A202230353166653865612D39632D346134302D38
> 3438382D386335313138666438373238227D', 'sd_uuid':
> 'e1c80750-b880-495e-9609-b8bc7760d101'}: Request failed: <class
> 'exceptions.OSError'>'. Waiting '5's before the next attempt
>
> and the following at /var/log/messages:
> Jun 23 16:19:43 v2 journal: vdsm root ERROR failed to retrieve Hosted
> Engine HA info#012Traceback (most recent call last):#012  File
> "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in
> _getHaInfo#012stats = instance.get_all_stats()#012  File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
> line 105, in get_all_stats#012stats = 
> broker.get_stats_from_storage(service)#012
> File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
> line 233, in get_stats_from_storage#012result =
> self._checked_communicate(request)#012  File "/usr/lib/python2.7/site-
> packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 261, in
> _checked_communicate#012.format(message or response))#012RequestError:
> Request failed: failed to read metadata: [Errno 5] Input/output error:
> '/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/e1c80750-b880-495e-
> 9609-b8bc7760d101/ha_agent/hosted-engine.metadata'
>
> Thanx
>
>
> On Fri, Jun 23, 2017 at 6:05 PM, Denis Chaplygin 
> wrote:
>
>> Hello Abi,
>>
>> On Fri, Jun 23, 2017 at 4:47 PM, Abi Askushi 
>> wrote:
>>
>>> Hi All,
>>>
>>> I have a 3-node oVirt 4.1 setup. I lost one node due to RAID controller
>>> issues. Upon restoration I have the following split brain, although the
>>> hosts have mounted the storage domains:
>>>
>>> gluster volume heal engine info split-brain
>>> Brick gluster0:/gluster/engine/brick
>>> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>>> Status: Connected
>>> Number of entries in split-brain: 1
>>>
>>> Brick gluster1:/gluster/engine/brick
>>> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>>> Status: Connected
>>> Number of entries in split-brain: 1
>>>
>>> Brick gluster2:/gluster/engine/brick
>>> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>>> Status: Connected
>>> Number of entries in split-brain: 1
>>>
>>>
>>>
>> It is definitely on the gluster side. You could try to use
>>
>> gluster volume heal engine split-brain latest-mtime
>> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>>
>>
>> I also added gluster developers to this thread, so they may provide you
>> with better advice.
>>
>
>


Re: [ovirt-users] oVirt split brain resolution

2017-06-23 Thread Abi Askushi
Hi Denis,

I receive an 'Operation not permitted' error, as below:

gluster volume heal engine split-brain latest-mtime
/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
Healing /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent failed:Operation not
permitted.
Volume heal failed.
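
When the policy-based heal is refused like this, one way to see which replicas
accuse each other is to read the AFR changelog xattrs directly on each brick.
A diagnostic sketch, using the brick path from this setup:

# run on every host against its local brick
getfattr -d -m trusted.afr -e hex \
    /gluster/engine/brick/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
# a non-zero trusted.afr.engine-client-N value means this brick holds
# pending (unhealed) changes for brick N, i.e. it blames that replica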


When I shut down host3, no split brain is reported from the remaining
two hosts. When I power up host3, I receive the mentioned split brain,
and host3 logs the following at ovirt-hosted-engine-ha/agent.log:

MainThread::INFO::2017-06-23
16:18:06,067::hosted_engine::594::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
Failed set the storage domain: 'Failed to set storage domain VdsmBackend,
options {'hosted-engine.lockspace':
'7B22696D6167655F75756964223A202238323132626637382D66392D346465652D61672D346265633734353035366235222C202270617468223A206E756C6C2C2022766F6C756D655F75756964223A202236323739303162652D666261332D346263342D393037632D393931356138333632633537227D',
'sp_uuid': '00000000-0000-0000-0000-000000000000', 'dom_type': 'glusterfs',
'hosted-engine.metadata':
'7B22696D6167655F75756964223A202263353930633034372D613462322D346539312D613832362D643438623961643537323330222C202270617468223A206E756C6C2C2022766F6C756D655F75756964223A202230353166653865612D39632D346134302D383438382D386335313138666438373238227D',
'sd_uuid': 'e1c80750-b880-495e-9609-b8bc7760d101'}: Request failed: <class 'exceptions.OSError'>'. Waiting '5's before the next attempt
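
(As an aside, the long 'hosted-engine.lockspace' / 'hosted-engine.metadata'
values above are hex-encoded JSON; with the full hex string pasted in place of
the ellipsis, xxd can decode them:)

echo '7B22696D6167655F75756964223A2022...' | xxd -r -p
# -> {"image_uuid": "...", "path": null, "volume_uuid": "..."}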

and the following at /var/log/messages:
Jun 23 16:19:43 v2 journal: vdsm root ERROR failed to retrieve Hosted
Engine HA info#012Traceback (most recent call last):#012  File
"/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in
_getHaInfo#012stats = instance.get_all_stats()#012  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
line 105, in get_all_stats#012stats =
broker.get_stats_from_storage(service)#012  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 233, in get_stats_from_storage#012result =
self._checked_communicate(request)#012  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 261, in _checked_communicate#012.format(message or
response))#012RequestError: Request failed: failed to read metadata: [Errno
5] Input/output error: '/rhev/data-center/mnt/glusterSD/10.100.100.1:
_engine/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent/hosted-engine.metadata'

Thanx


On Fri, Jun 23, 2017 at 6:05 PM, Denis Chaplygin 
wrote:

> Hello Abi,
>
> On Fri, Jun 23, 2017 at 4:47 PM, Abi Askushi 
> wrote:
>
>> Hi All,
>>
>> I have a 3-node oVirt 4.1 setup. I lost one node due to RAID controller
>> issues. Upon restoration I have the following split brain, although the
>> hosts have mounted the storage domains:
>>
>> gluster volume heal engine info split-brain
>> Brick gluster0:/gluster/engine/brick
>> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>> Status: Connected
>> Number of entries in split-brain: 1
>>
>> Brick gluster1:/gluster/engine/brick
>> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>> Status: Connected
>> Number of entries in split-brain: 1
>>
>> Brick gluster2:/gluster/engine/brick
>> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>> Status: Connected
>> Number of entries in split-brain: 1
>>
>>
>>
> It is definitely on the gluster side. You could try to use
>
> gluster volume heal engine split-brain latest-mtime
> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>
>
> I also added gluster developers to this thread, so they may provide you
> with better advice.
>


Re: [ovirt-users] oVirt split brain resolution

2017-06-23 Thread Denis Chaplygin
Hello Abi,

On Fri, Jun 23, 2017 at 4:47 PM, Abi Askushi 
wrote:

> Hi All,
>
> I have a 3-node oVirt 4.1 setup. I lost one node due to RAID controller
> issues. Upon restoration I have the following split brain, although the
> hosts have mounted the storage domains:
>
> gluster volume heal engine info split-brain
> Brick gluster0:/gluster/engine/brick
> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
> Status: Connected
> Number of entries in split-brain: 1
>
> Brick gluster1:/gluster/engine/brick
> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
> Status: Connected
> Number of entries in split-brain: 1
>
> Brick gluster2:/gluster/engine/brick
> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
> Status: Connected
> Number of entries in split-brain: 1
>
>
>
It is definitely on the gluster side. You could try to use

gluster volume heal engine split-brain latest-mtime
/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent


I also added gluster developers to this thread, so they may provide you
with better advice.
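
For completeness, the CLI offers other resolution policies besides
latest-mtime; a sketch of the alternatives, using the entry from your listing:

gluster volume heal engine split-brain bigger-file \
    /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
gluster volume heal engine split-brain source-brick \
    gluster1:/gluster/engine/brick \
    /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent

Note that these policies resolve data and metadata split-brain; a gfid/entry
split-brain on a directory usually cannot be fixed by them and needs manual
cleanup on a brick.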


[ovirt-users] oVirt split brain resolution

2017-06-23 Thread Abi Askushi
Hi All,

I have a 3-node oVirt 4.1 setup. I lost one node due to RAID controller
issues. Upon restoration I have the following split brain, although the
hosts have mounted the storage domains:

gluster volume heal engine info split-brain
Brick gluster0:/gluster/engine/brick
/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
Status: Connected
Number of entries in split-brain: 1

Brick gluster1:/gluster/engine/brick
/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
Status: Connected
Number of entries in split-brain: 1

Brick gluster2:/gluster/engine/brick
/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
Status: Connected
Number of entries in split-brain: 1


Hosted engine status gives the following:

hosted-engine --vm-status
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py",
line 173, in <module>
if not status_checker.print_status():
  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py",
line 103, in print_status
all_host_stats = self._get_all_host_stats()
  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py",
line 73, in _get_all_host_stats
all_host_stats = ha_cli.get_all_host_stats()
  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
line 160, in get_all_host_stats
return self.get_all_stats(self.StatModes.HOST)
  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
line 105, in get_all_stats
stats = broker.get_stats_from_storage(service)
  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 233, in get_stats_from_storage
result = self._checked_communicate(request)
  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 261, in _checked_communicate
.format(message or response))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: failed
to read metadata: [Errno 5] Input/output error:
'/rhev/data-center/mnt/glusterSD/10.100.100.1:
_engine/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent/hosted-engine.metadata'
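
The Errno 5 here is the EIO that a gluster client returns when it touches an
entry in split-brain, so the failure reproduces straight from the mount point:

cat /rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent/hosted-engine.metadata > /dev/null
# cat: ...: Input/output error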

Any idea on how to resolve this split brain?

Thanx,
Alex