Re: [ovirt-users] oVirt split brain resolution
Hi Satheesaran,

gluster volume info engine

Volume Name: engine
Type: Replicate
Volume ID: 3caae601-74dd-40d1-8629-9a61072bec0f
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster0:/gluster/engine/brick
Brick2: gluster1:/gluster/engine/brick
Brick3: gluster2:/gluster/engine/brick (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 1
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: enable
nfs.export-volumes: on

As per my previous message, I have resolved this by following the steps described there.

On Tue, Jun 27, 2017 at 1:42 PM, Satheesaran Sundaramoorthi <sasun...@redhat.com> wrote:

> On Sat, Jun 24, 2017 at 3:17 PM, Abi Askushi wrote:
>
>> Hi all,
>>
>> For the record, I had to manually remove the conflicting directory and
>> its respective gfid from the arbiter brick:
>>
>> getfattr -m . -d -e hex e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>>
>> That gave me the gfid: 0x277c9caa9dce4a17a2a93775357befd5
>>
>> Then cd .glusterfs/27/7c
>>
>> rm -rf 277c9caa-9dce-4a17-a2a9-3775357befd5 (or move it out of there)
>>
>> Triggered heal: gluster volume heal engine
>>
>> Then all ok:
>>
>> gluster volume heal engine info
>> Brick gluster0:/gluster/engine/brick
>> Status: Connected
>> Number of entries: 0
>>
>> Brick gluster1:/gluster/engine/brick
>> Status: Connected
>> Number of entries: 0
>>
>> Brick gluster2:/gluster/engine/brick
>> Status: Connected
>> Number of entries: 0
>>
>> Thanx.
>
> Hi Abi,
>
> What is the volume type of the 'engine' volume?
> Could you also provide the output of 'gluster volume info engine' so that
> we can take a closer look at the problem?
>
> -- sas

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] oVirt split brain resolution
On Sat, Jun 24, 2017 at 3:17 PM, Abi Askushi wrote:

> Hi all,
>
> For the record, I had to manually remove the conflicting directory and
> its respective gfid from the arbiter brick:
>
> getfattr -m . -d -e hex e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>
> That gave me the gfid: 0x277c9caa9dce4a17a2a93775357befd5
>
> Then cd .glusterfs/27/7c
>
> rm -rf 277c9caa-9dce-4a17-a2a9-3775357befd5 (or move it out of there)
>
> Triggered heal: gluster volume heal engine
>
> Then all ok:
>
> gluster volume heal engine info
> Brick gluster0:/gluster/engine/brick
> Status: Connected
> Number of entries: 0
>
> Brick gluster1:/gluster/engine/brick
> Status: Connected
> Number of entries: 0
>
> Brick gluster2:/gluster/engine/brick
> Status: Connected
> Number of entries: 0
>
> Thanx.

Hi Abi,

What is the volume type of the 'engine' volume?
Could you also provide the output of 'gluster volume info engine' so that we can take a closer look at the problem?

-- sas
Re: [ovirt-users] oVirt split brain resolution
Hi all,

For the record, I had to manually remove the conflicting directory and its respective gfid from the arbiter brick:

getfattr -m . -d -e hex e1c80750-b880-495e-9609-b8bc7760d101/ha_agent

That gave me the gfid: 0x277c9caa9dce4a17a2a93775357befd5

Then cd .glusterfs/27/7c

rm -rf 277c9caa-9dce-4a17-a2a9-3775357befd5 (or move it out of there)

Triggered heal: gluster volume heal engine

Then all ok:

gluster volume heal engine info
Brick gluster0:/gluster/engine/brick
Status: Connected
Number of entries: 0

Brick gluster1:/gluster/engine/brick
Status: Connected
Number of entries: 0

Brick gluster2:/gluster/engine/brick
Status: Connected
Number of entries: 0

Thanx.

On Fri, Jun 23, 2017 at 7:21 PM, Abi Askushi wrote:

> Hi Denis,
>
> I receive an "Operation not permitted" error, as shown below:
>
> gluster volume heal engine split-brain latest-mtime /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
> Healing /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent failed:Operation not permitted.
> Volume heal failed.
>
> When I shut down host3, no split brain is reported by the remaining two
> hosts. When I power host3 back up, the split brain reappears and host3
> logs the following in ovirt-hosted-engine-ha/agent.log:
>
> MainThread::INFO::2017-06-23 16:18:06,067::hosted_engine::594::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed set the storage domain: 'Failed to set storage domain VdsmBackend, options {'hosted-engine.lockspace': '7B22696D6167655F75756964223A202238323132626637382D66392D346465652D61672D346265633734353035366235222C202270617468223A206E756C6C2C2022766F6C756D655F75756964223A202236323739303162652D666261332D346263342D393037632D393931356138333632633537227D', 'sp_uuid': '----', 'dom_type': 'glusterfs', 'hosted-engine.metadata': '7B22696D6167655F75756964223A202263353930633034372D613462322D346539312D613832362D643438623961643537323330222C202270617468223A206E756C6C2C2022766F6C756D655F75756964223A202230353166653865612D39632D346134302D383438382D386335313138666438373238227D', 'sd_uuid': 'e1c80750-b880-495e-9609-b8bc7760d101'}: Request failed: <type 'exceptions.OSError'>'. Waiting '5's before the next attempt
>
> and the following in /var/log/messages:
>
> Jun 23 16:19:43 v2 journal: vdsm root ERROR failed to retrieve Hosted Engine HA info#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo#012stats = instance.get_all_stats()#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 105, in get_all_stats#012stats = broker.get_stats_from_storage(service)#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 233, in get_stats_from_storage#012result = self._checked_communicate(request)#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 261, in _checked_communicate#012.format(message or response))#012RequestError: Request failed: failed to read metadata: [Errno 5] Input/output error: '/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent/hosted-engine.metadata'
>
> Thanx
>
> On Fri, Jun 23, 2017 at 6:05 PM, Denis Chaplygin wrote:
>
>> Hello Abi,
>>
>> On Fri, Jun 23, 2017 at 4:47 PM, Abi Askushi wrote:
>>
>>> Hi All,
>>>
>>> I have a 3 node ovirt 4.1 setup. I lost one node due to raid controller
>>> issues. Upon restoration I have the following split brain, although the
>>> hosts have mounted the storage domains:
>>>
>>> gluster volume heal engine info split-brain
>>> Brick gluster0:/gluster/engine/brick
>>> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>>> Status: Connected
>>> Number of entries in split-brain: 1
>>>
>>> Brick gluster1:/gluster/engine/brick
>>> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>>> Status: Connected
>>> Number of entries in split-brain: 1
>>>
>>> Brick gluster2:/gluster/engine/brick
>>> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>>> Status: Connected
>>> Number of entries in split-brain: 1
>>
>> It is definitely on the gluster side. You could try to use
>>
>> gluster volume heal engine split-brain latest-mtime /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>>
>> I also added gluster developers to this thread, so they may provide you
>> with better advice.
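The gfid-to-path step in the resolution above follows a fixed layout: the first two pairs of hex digits of the gfid name the two directory levels under .glusterfs, and the entry itself is the gfid written as a UUID. A minimal sketch of that mapping, assuming a POSIX shell; gfid_to_path is a hypothetical helper name (not a gluster command), and the printed path is relative to the brick root:

```shell
# gfid_to_path is a hypothetical helper, not part of the gluster CLI.
# It maps a trusted.gfid value (as printed by `getfattr -e hex`) to the
# matching entry under .glusterfs, relative to the brick root.
gfid_to_path() {
    # Drop the leading 0x (if any) and lowercase the digits.
    gfid_hex=$(printf '%s' "${1#0x}" | tr 'A-F' 'a-f')
    case "$gfid_hex" in
        ''|*[!0-9a-f]*) echo "not a hex gfid: $1" >&2; return 1 ;;
    esac
    if [ "${#gfid_hex}" -ne 32 ]; then
        echo "expected 32 hex digits, got ${#gfid_hex}" >&2
        return 1
    fi
    # Re-insert the UUID dashes in the 8-4-4-4-12 pattern.
    gfid_uuid=$(printf '%s' "$gfid_hex" |
        sed 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/')
    gfid_d1=$(printf '%s' "$gfid_hex" | cut -c1-2)
    gfid_d2=$(printf '%s' "$gfid_hex" | cut -c3-4)
    printf '.glusterfs/%s/%s/%s\n' "$gfid_d1" "$gfid_d2" "$gfid_uuid"
}

gfid_to_path 0x277c9caa9dce4a17a2a93775357befd5
# prints .glusterfs/27/7c/277c9caa-9dce-4a17-a2a9-3775357befd5
```

Printing the path rather than deleting it leaves room to inspect the entry, or to move it aside as suggested above, before triggering the heal.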
Re: [ovirt-users] oVirt split brain resolution
Hi Denis,

I receive an "Operation not permitted" error, as shown below:

gluster volume heal engine split-brain latest-mtime /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
Healing /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent failed:Operation not permitted.
Volume heal failed.

When I shut down host3, no split brain is reported by the remaining two hosts. When I power host3 back up, the split brain reappears and host3 logs the following in ovirt-hosted-engine-ha/agent.log:

MainThread::INFO::2017-06-23 16:18:06,067::hosted_engine::594::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed set the storage domain: 'Failed to set storage domain VdsmBackend, options {'hosted-engine.lockspace': '7B22696D6167655F75756964223A202238323132626637382D66392D346465652D61672D346265633734353035366235222C202270617468223A206E756C6C2C2022766F6C756D655F75756964223A202236323739303162652D666261332D346263342D393037632D393931356138333632633537227D', 'sp_uuid': '----', 'dom_type': 'glusterfs', 'hosted-engine.metadata': '7B22696D6167655F75756964223A202263353930633034372D613462322D346539312D613832362D643438623961643537323330222C202270617468223A206E756C6C2C2022766F6C756D655F75756964223A202230353166653865612D39632D346134302D383438382D386335313138666438373238227D', 'sd_uuid': 'e1c80750-b880-495e-9609-b8bc7760d101'}: Request failed: <type 'exceptions.OSError'>'. Waiting '5's before the next attempt

and the following in /var/log/messages:

Jun 23 16:19:43 v2 journal: vdsm root ERROR failed to retrieve Hosted Engine HA info#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo#012stats = instance.get_all_stats()#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 105, in get_all_stats#012stats = broker.get_stats_from_storage(service)#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 233, in get_stats_from_storage#012result = self._checked_communicate(request)#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 261, in _checked_communicate#012.format(message or response))#012RequestError: Request failed: failed to read metadata: [Errno 5] Input/output error: '/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent/hosted-engine.metadata'

Thanx

On Fri, Jun 23, 2017 at 6:05 PM, Denis Chaplygin wrote:

> Hello Abi,
>
> On Fri, Jun 23, 2017 at 4:47 PM, Abi Askushi wrote:
>
>> Hi All,
>>
>> I have a 3 node ovirt 4.1 setup. I lost one node due to raid controller
>> issues. Upon restoration I have the following split brain, although the
>> hosts have mounted the storage domains:
>>
>> gluster volume heal engine info split-brain
>> Brick gluster0:/gluster/engine/brick
>> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>> Status: Connected
>> Number of entries in split-brain: 1
>>
>> Brick gluster1:/gluster/engine/brick
>> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>> Status: Connected
>> Number of entries in split-brain: 1
>>
>> Brick gluster2:/gluster/engine/brick
>> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>> Status: Connected
>> Number of entries in split-brain: 1
>
> It is definitely on the gluster side. You could try to use
>
> gluster volume heal engine split-brain latest-mtime /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
>
> I also added gluster developers to this thread, so they may provide you
> with better advice.
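Two encodings make the logs in the message above hard to read. The 'hosted-engine.lockspace' and 'hosted-engine.metadata' values appear to be hex-encoded ASCII (the leading bytes 7B 22 69 6D ... decode to {"image_uuid":), and the #012 sequences in /var/log/messages are the syslog escape for a newline (octal 012). A minimal sketch for undoing both, assuming a POSIX shell; hex_to_text and unescape_syslog are hypothetical helper names, not oVirt or gluster tools:

```shell
# hex_to_text: hypothetical helper that decodes a hex string to text,
# two hex digits per byte.
hex_to_text() {
    h2t_rest=$1
    case "$h2t_rest" in
        ''|*[!0-9A-Fa-f]*) echo "not a hex string" >&2; return 1 ;;
    esac
    if [ $(( ${#h2t_rest} % 2 )) -ne 0 ]; then
        echo "odd number of hex digits" >&2
        return 1
    fi
    while [ -n "$h2t_rest" ]; do
        h2t_pair=$(printf '%s' "$h2t_rest" | cut -c1-2)
        h2t_rest=${h2t_rest#??}
        # Convert the hex pair to an octal escape and emit that byte.
        printf "\\$(printf '%03o' "0x$h2t_pair")"
    done
    printf '\n'
}

# unescape_syslog: hypothetical helper that turns the #012 escapes in
# syslog lines read from stdin back into real line breaks.
unescape_syslog() {
    awk '{ gsub(/#012/, "\n"); print }'
}

hex_to_text 7B22696D6167655F75756964223A
# prints {"image_uuid":
```

Passing a full option value from agent.log to hex_to_text, or piping the relevant lines of /var/log/messages through unescape_syslog, should make the broker options and the Python traceback readable.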
Re: [ovirt-users] oVirt split brain resolution
Hello Abi,

On Fri, Jun 23, 2017 at 4:47 PM, Abi Askushi wrote:

> Hi All,
>
> I have a 3 node ovirt 4.1 setup. I lost one node due to raid controller
> issues. Upon restoration I have the following split brain, although the
> hosts have mounted the storage domains:
>
> gluster volume heal engine info split-brain
> Brick gluster0:/gluster/engine/brick
> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
> Status: Connected
> Number of entries in split-brain: 1
>
> Brick gluster1:/gluster/engine/brick
> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
> Status: Connected
> Number of entries in split-brain: 1
>
> Brick gluster2:/gluster/engine/brick
> /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
> Status: Connected
> Number of entries in split-brain: 1

It is definitely on the gluster side. You could try to use

gluster volume heal engine split-brain latest-mtime /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent

I also added gluster developers to this thread, so they may provide you with better advice.
[ovirt-users] oVirt split brain resolution
Hi All,

I have a 3 node ovirt 4.1 setup. I lost one node due to raid controller issues. Upon restoration I have the following split brain, although the hosts have mounted the storage domains:

gluster volume heal engine info split-brain
Brick gluster0:/gluster/engine/brick
/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
Status: Connected
Number of entries in split-brain: 1

Brick gluster1:/gluster/engine/brick
/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
Status: Connected
Number of entries in split-brain: 1

Brick gluster2:/gluster/engine/brick
/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
Status: Connected
Number of entries in split-brain: 1

Hosted engine status gives the following:

hosted-engine --vm-status
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 173, in <module>
    if not status_checker.print_status():
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 103, in print_status
    all_host_stats = self._get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 73, in _get_all_host_stats
    all_host_stats = ha_cli.get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 160, in get_all_host_stats
    return self.get_all_stats(self.StatModes.HOST)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 105, in get_all_stats
    stats = broker.get_stats_from_storage(service)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 233, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 261, in _checked_communicate
    .format(message or response))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: failed to read metadata: [Errno 5] Input/output error: '/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent/hosted-engine.metadata'

Any idea on how to resolve this split brain?

Thanx,
Alex
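In the heal output above, the same path is listed once per brick, so with more entries it can help to reduce the listing to the distinct paths. A minimal sketch, assuming a POSIX shell with awk and the output layout shown in this message (entries on their own line, starting with / or, as an assumption about other cases, with <gfid:); split_brain_entries is a hypothetical helper name:

```shell
# split_brain_entries: hypothetical helper. Reads the output of
# `gluster volume heal <VOL> info split-brain` on stdin and prints each
# affected entry once, skipping the Brick/Status/Number lines.
split_brain_entries() {
    awk '/^\// || /^<gfid:/ { if (!seen[$0]++) print }'
}
```

It could then be used as: gluster volume heal engine info split-brain | split_brain_entries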