Re: [Gluster-devel] [ovirt-users] Hosted-Engine HA problem
On 11/11/2014 05:56 AM, Jaicel wrote:
> Hi Jirka,
>
> the patch works. It stabilized the status of my two hosts. The engine migration during failover also works fine. Thanks guys!

Hi Jaicel,
I'm glad it works for you! Enjoy the hosted engine ;)

--Jirka

*From:* "Jiri Moskovcak"
*To:* "Jaicel"
*Cc:* "Niels de Vos", "Vijay Bellur", us...@ovirt.org, "Gluster Devel"
*Sent:* Monday, November 3, 2014 3:33:16 PM
*Subject:* Re: [ovirt-users] Hosted-Engine HA problem

On 11/01/2014 07:43 AM, Jaicel wrote:
> Hi,
>
> my engine runs on Host1. Current status and agent logs below.
>
> Host 1

Hi,
it seems like you ran into [1]; you can either zero out the metadata file or apply the patch from [1] manually.

--Jirka

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1158925

> MainThread::INFO::2014-10-31 16:55:39,918::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.6 started
> MainThread::INFO::2014-10-31 16:55:39,985::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 192.168.12.11
> MainThread::INFO::2014-10-31 16:55:40,228::hosted_engine::367::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
> MainThread::INFO::2014-10-31 16:55:40,228::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.12.254'}
> MainThread::INFO::2014-10-31 16:55:40,231::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215107920
> MainThread::INFO::2014-10-31 16:55:40,231::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,237::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215108432
> MainThread::INFO::2014-10-31 16:55:40,237::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,240::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 39956688
> MainThread::INFO::2014-10-31 16:55:40,240::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,243::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215107664
> MainThread::INFO::2014-10-31 16:55:40,244::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,249::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634006879632
> MainThread::INFO::2014-10-31 16:55:40,249::hosted_engine::391::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
> MainThread::INFO::2014-10-31 16:55:40,298::hosted_engine::476::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.lockspace)
> MainThread::INFO::2014-10-31 16:55:40,322::state_machine::153::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
> MainThread::INFO::2014-10-31 16:55:40,322::state_machine::158::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host 192.168.12.12 (id 2): {'live-data': False, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=1413882675 (Tue Oct 21 17:11:15 2014)\nhost-id=2\nscore=2400\nmaintenance=False\nstate=EngineDown\n', 'hostname': '192.168.12.12', 'host-id': 2
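For reference, the zero-out option Jirka mentions could look roughly like the following. This is only a sketch, not the exact procedure from the bug report: the metadata path is taken from the agent log above, and it assumes the HA agent and broker are stopped on every host first so nothing writes to the file while it is being reset.

    # stop the HA services on every host before touching the shared metadata
    service ovirt-ha-agent stop
    service ovirt-ha-broker stop
    # path taken from the agent log above; overwrite the file in place
    # with zeros, preserving its current size
    METADATA=/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata
    dd if=/dev/zero of="$METADATA" bs=$(stat -c%s "$METADATA") count=1
    # bring the services back up
    service ovirt-ha-broker start
    service ovirt-ha-agent start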
Re: [Gluster-devel] [ovirt-users] Hosted-Engine HA problem
File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage result = self._checked_communicate(request) File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate .format(message or response)) RequestError: Request failed: [root@ovirt2 ~]# hosted-engine --vm-status Traceback (most recent call last): File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code exec code in run_globals File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 111, in if not status_checker.print_status(): File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 58, in print_status all_host_stats = ha_cli.get_all_host_stats() File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 137, in get_all_host_stats return self.get_all_stats(self.StatModes.HOST) File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 86, in get_all_stats constants.SERVICE_TYPE) File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage result = self._checked_communicate(request) File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate .format(message or response)) ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: [root@ovirt2 ~]# service ovirt-ha-agent status ovirt-ha-agent dead but subsys locked Thanks, Jaicel - Original Message - From: "Jiri Moskovcak" To: "Jaicel" Cc: "Niels de Vos" , "Vijay Bellur" , us...@ovirt.org, "Gluster Devel" Sent: Friday, October 31, 2014 11:05:32 PM Subject: Re: [ovirt-users] Hosted-Engine HA problem On 10/31/2014 10:26 AM, Jaicel wrote: i've increased the limit and then restarted agent and broker. status normalize, but then right now it went to "False" state again but still both having 2400 score. agent logs remains the same, with "ovirt-ha-agent dead but subsys locked" status. 
ha-broker logs below Thread-138::INFO::2014-10-31 17:24:22,981::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established Thread-138::INFO::2014-10-31 17:24:22,991::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed Thread-139::INFO::2014-10-31 17:24:38,385::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established Thread-139::INFO::2014-10-31 17:24:38,395::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed Thread-140::INFO::2014-10-31 17:24:53,816::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established Thread-140::INFO::2014-10-31 17:24:53,827::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed Thread-141::INFO::2014-10-31 17:25:09,172::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established Thread-141::INFO::2014-10-31 17:25:09,182::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed Thread-142::INFO::2014-10-31 17:25:24,551::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established Thread-142::INFO::2014-10-31 17:25:24,562::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed Thanks, Jaicel ok, now it seems that broker runs fine, so I need the recent agent.log to debug it more. --Jirka - Original Message - From: "Jiri Moskovcak" To: "Jaicel R. Sabonsolin" , "Niels de Vos" Cc: "Vijay Bellur" , us...@ovirt.org, "Gluster Devel" Sent: Friday, October 31, 2014 4:32:02 PM Subject: Re: [ovirt-users] Hosted-Engine HA problem On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote: Hi guys, these logs appear on both hosts just like the result of --vm-status. tried to tcpdump on ovirt hosts and gluster nodes but only packets exchange with my monitoring VM(zabbix) appeared. agent.log new_data = self.refresh(self._state.data) File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh stats.update(self.hosted_engine.collect_stats()) File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats constants.SERVI
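As for the "ovirt-ha-agent dead but subsys locked" status above: on EL6 it means the daemon died without its init script releasing the subsystem lock. A typical way to clear it is sketched below, assuming the standard /var/lock/subsys layout (the lock file names match the service names here, which is an assumption worth verifying on the host):

    service ovirt-ha-agent stop
    service ovirt-ha-broker stop
    # remove the stale subsystem lock files left behind by the dead daemons
    rm -f /var/lock/subsys/ovirt-ha-agent /var/lock/subsys/ovirt-ha-broker
    # start the broker first; the agent depends on it
    service ovirt-ha-broker start
    service ovirt-ha-agent start
    service ovirt-ha-agent status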
Re: [Gluster-devel] [ovirt-users] Hosted-Engine HA problem
On 10/31/2014 10:26 AM, Jaicel wrote:
> i've increased the limit and then restarted the agent and broker. The status normalized, but right now it went to the "False" state again, though both hosts still have a 2400 score. The agent logs remain the same, with "ovirt-ha-agent dead but subsys locked" status. ha-broker logs below:
>
> Thread-138::INFO::2014-10-31 17:24:22,981::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-138::INFO::2014-10-31 17:24:22,991::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-139::INFO::2014-10-31 17:24:38,385::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-139::INFO::2014-10-31 17:24:38,395::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-140::INFO::2014-10-31 17:24:53,816::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-140::INFO::2014-10-31 17:24:53,827::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-141::INFO::2014-10-31 17:25:09,172::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-141::INFO::2014-10-31 17:25:09,182::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-142::INFO::2014-10-31 17:25:24,551::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-142::INFO::2014-10-31 17:25:24,562::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>
> Thanks,
> Jaicel

ok, now it seems that the broker runs fine, so I need the recent agent.log to debug it more.

--Jirka

----- Original Message -----
From: "Jiri Moskovcak"
To: "Jaicel R. Sabonsolin", "Niels de Vos"
Cc: "Vijay Bellur", us...@ovirt.org, "Gluster Devel"
Sent: Friday, October 31, 2014 4:32:02 PM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote:
> Hi guys, these logs appear on both hosts, just like the result of --vm-status. I tried to tcpdump on the ovirt hosts and gluster nodes, but only packet exchanges with my monitoring VM (zabbix) appeared.
>
> agent.log
>
>     new_data = self.refresh(self._state.data)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
>     stats.update(self.hosted_engine.collect_stats())
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
>     constants.SERVICE_TYPE)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
>     result = self._checked_communicate(request)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
>     .format(message or response))
> RequestError: Request failed:
>
> broker.log
>
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
>     response = "success " + self._dispatch(data)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
>     .get_all_stats_for_service_type(**options)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
>     d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
>     f = os.open(path, direct_flag | os.O_RDONLY)
> OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'

- ah, there we go ^^ you might need to tweak the limit of allowed open files as described here [1], or find out why the app keeps so many files open.

--Jirka

[1] http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/

> Thread-38160::INFO::2014-10-31 10:28:37,989::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-38161::INFO::2014-10-31 10:28:53,656::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-38161::ERROR::2014-10-31 10:28:53,657::listener::190::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'get-stats storage_dir=/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111
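For reference, checking and raising the open-files limit along the lines of [1] could look like this. A sketch only; the values are examples, not tuned recommendations:

    # current per-process limit in this shell, and the system-wide ceiling
    ulimit -n
    cat /proc/sys/fs/file-max
    # raise the per-user limits permanently (example values)
    echo '* soft nofile 8192' >> /etc/security/limits.conf
    echo '* hard nofile 8192' >> /etc/security/limits.conf
    # note: a daemon started from an init script does not read limits.conf;
    # adding e.g. "ulimit -n 8192" to the init script before the daemon
    # starts may be needed instead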
Re: [Gluster-devel] [ovirt-users] Hosted-Engine HA problem
On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote:
> Hi guys, these logs appear on both hosts, just like the result of --vm-status. I tried to tcpdump on the ovirt hosts and gluster nodes, but only packet exchanges with my monitoring VM (zabbix) appeared.
>
> agent.log
>
>     new_data = self.refresh(self._state.data)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
>     stats.update(self.hosted_engine.collect_stats())
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
>     constants.SERVICE_TYPE)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
>     result = self._checked_communicate(request)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
>     .format(message or response))
> RequestError: Request failed:
>
> broker.log
>
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
>     response = "success " + self._dispatch(data)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
>     .get_all_stats_for_service_type(**options)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
>     d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
>     f = os.open(path, direct_flag | os.O_RDONLY)
> OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'

- ah, there we go ^^ you might need to tweak the limit of allowed open files as described here [1], or find out why the app keeps so many files open.

--Jirka

[1] http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/

> Thread-38160::INFO::2014-10-31 10:28:37,989::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-38161::INFO::2014-10-31 10:28:53,656::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-38161::ERROR::2014-10-31 10:28:53,657::listener::190::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'get-stats storage_dir=/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent service_type=hosted-engine'
> Traceback (most recent call last):
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
>     response = "success " + self._dispatch(data)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
>     .get_all_stats_for_service_type(**options)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
>     d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
>     f = os.open(path, direct_flag | os.O_RDONLY)
> OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
> Thread-38161::INFO::2014-10-31 10:28:53,658::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>
> Thanks,
> Jaicel

----- Original Message -----
From: "Niels de Vos"
To: "Vijay Bellur"
Cc: "Jiri Moskovcak", "Jaicel R. Sabonsolin", us...@ovirt.org, "Gluster Devel"
Sent: Friday, October 31, 2014 4:11:25 AM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On Thu, Oct 30, 2014 at 09:07:24PM +0530, Vijay Bellur wrote:
> On 10/30/2014 06:45 PM, Jiri Moskovcak wrote:
>> On 10/30/2014 09:22 AM, Jaicel R. Sabonsolin wrote:
>>> Hi Guys,
>>>
>>> I need help with my ovirt Hosted-Engine HA setup. I am running on 2 ovirt hosts and 2 gluster nodes with replicated volumes. I already have VMs running on my hosts, and they can migrate normally once I, for example, power off the host that they are running on. The problem is that the engine can't migrate once I switch off the host that hosts the engine.
>>>
>>> oVirt 3.4.3-1.el6
>>> KVM 0.12.1.2-2.415.el6_5.10
>>> libvirt 0.10.2-29.el6_5.9
>>> VDSM 4.14.17-0.el6
>>>
>>> right now, i have this result from hosted-engine --vm-status.
>>>
>>> File "/usr
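To follow Jirka's second suggestion and find out which files the broker keeps open, something like the following could help. A sketch only; the pgrep pattern is an assumption about the broker's process name:

    # find the broker process (the pattern is an assumption)
    BROKER_PID=$(pgrep -f ovirt-ha-broker | head -n1)
    # count its open file descriptors; run repeatedly to confirm a leak
    ls /proc/"$BROKER_PID"/fd | wc -l
    # see which paths are held open most often
    lsof -p "$BROKER_PID" | awk '{print $NF}' | sort | uniq -c | sort -rn | head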
Re: [Gluster-devel] [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?
On 07/19/2014 08:58 AM, Pranith Kumar Karampuri wrote:
> On 07/19/2014 11:25 AM, Andrew Lau wrote:
>> On Sat, Jul 19, 2014 at 12:03 AM, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
>>> On 07/18/2014 05:43 PM, Andrew Lau wrote:
>>>> On Fri, Jul 18, 2014 at 10:06 PM, Vijay Bellur <vbel...@redhat.com> wrote:
>>>>> [Adding gluster-devel]
>>>>>
>>>>> On 07/18/2014 05:20 PM, Andrew Lau wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> As most of you have got hints from previous messages, hosted engine won't work on gluster. A quote from BZ1097639:
>>>>>>
>>>>>> "Using hosted engine with Gluster backed storage is currently something we really warn against. I think this bug should be closed or re-targeted at documentation, because there is nothing we can do here. Hosted engine assumes that all writes are atomic and (immediately) available for all hosts in the cluster. Gluster violates those assumptions."
>>>>>
>>>>> I tried going through BZ1097639 but could not find much detail with respect to gluster there. A few questions around the problem:
>>>>>
>>>>> 1. Can somebody please explain in detail the scenario that causes the problem?
>>>>> 2. Is hosted engine performing synchronous writes to ensure that writes are durable?
>>>>>
>>>>> Also, if there is any documentation that details the hosted engine architecture, that would help in enhancing our understanding of its interactions with gluster.
>>>>
>>>> Now my question: does this theory prevent a scenario of perhaps something like a gluster replicated volume being mounted as a glusterfs filesystem and then re-exported as the native kernel NFS share for the hosted-engine to consume? It could then be possible to chuck ctdb in there to provide a last-resort failover solution. I have tried it myself and suggested it to two people who are running a similar setup. They are now using the native kernel NFS server for hosted-engine and haven't reported as many issues. Curious, could anyone validate my theory on this?
>>>
>>> If we obtain more details on the use case and obtain gluster logs from the failed scenarios, we should be able to understand the problem better. That could be the first step in validating your theory or evolving further recommendations :).
>>>
>>>> I'm not sure how useful this is, but Jiri Moskovcak tracked this down in an off-list message.
>>>>
>>>> Message quote:
>>>>
>>>> ==
>>>> We were able to track it down to this (thanks Andrew for providing the testing setup):
>>>>
>>>> -b686-4363-bb7e-dba99e5789b6/ha_agent service_type=hosted-engine'
>>>> Traceback (most recent call last):
>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
>>>>     response = "success " + self._dispatch(data)
>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
>>>>     .get_all_stats_for_service_type(**options)
>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
>>>>     d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
>>>>     f = os.open(path, direct_flag | os.O_RDONLY)
>>>> OSError: [Errno 116] Stale file handle: '/rhev/data-center/mnt/localhost:_mnt_hosted-engine/c898fd2a-b686-4363-bb7e-dba99e5789b6/ha_agent/hosted-engine.metadata'
>>>
>>> Andrew/Jiri,
>>>
>>> Would it be possible to post gluster logs of both the mount and bricks on the bz? I can take a look at it once. If I gather nothing then probably I will ask for your help in re-creating the issue.
>>>
>>> Pranith
>>
>> Unfortunately, I don't have the logs for that setup any more.. I'll try to replicate it when I get a chance. If I understand the comment from the BZ, I don't think it's a gluster bug per se, more just how gluster does its replication.
>
> hi Andrew,
>
> Thanks for that. I couldn't come to any conclusions because no logs were available. It is unlikely that self-heal is involved, because there were no bricks going down/up according to the bug description.

Hi,
I've never had such a setup; I guessed a problem with gluster based on "OSError: [Errno 116] Stale file handle:", which happens when the file opened by an application on the client gets removed on the server. I'm pretty sure we
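For reference, the kernel-NFS re-export setup Andrew describes might look roughly like this. A sketch under stated assumptions only: the volume name "engine", the server "gluster1", and the mount point are placeholders; gluster's built-in NFS server has to be disabled so the kernel NFS server can bind the NFS ports, and FUSE mounts need an explicit fsid to be exportable:

    # disable gluster's built-in NFS server on the volume (name assumed)
    gluster volume set engine nfs.disable on
    # mount the replicated volume locally over FUSE
    mount -t glusterfs gluster1:/engine /mnt/engine
    # re-export the FUSE mount via kernel NFS; FUSE filesystems need an
    # explicit fsid to be exportable
    echo '/mnt/engine *(rw,no_root_squash,fsid=1)' >> /etc/exports
    service nfs start
    exportfs -ra
    # hosted-engine storage would then point at this export, with ctdb
    # (or similar) floating the NFS address between hosts for failover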