Re: [Gluster-devel] [ovirt-users] Hosted-Engine HA problem

2014-11-10 Thread Jiri Moskovcak

On 11/11/2014 05:56 AM, Jaicel wrote:

Hi Jirka,

The patch works. It stabilized the status of my two hosts. The engine
migration during failover also works fine. Thanks, guys!


Hi Jaicel,
I'm glad it works for you! Enjoy the hosted engine ;)

--Jirka



Jaicel


*From: *"Jiri Moskovcak" 
*To: *"Jaicel" 
*Cc: *"Niels de Vos" , "Vijay Bellur"
, us...@ovirt.org, "Gluster Devel"

*Sent: *Monday, November 3, 2014 3:33:16 PM
*Subject: *Re: [ovirt-users] Hosted-Engine HA problem

On 11/01/2014 07:43 AM, Jaicel wrote:
 > Hi,
 >
 > my engine runs on Host1. current status and agent logs below.
 >
 > Host 1

Hi,
it seems like you ran into [1]; you can either zero out the metadata
file or apply the patch from [1] manually.

--Jirka

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1158925
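
If you go the zero-out route, below is a minimal sketch of the idea in Python (not an official tool): it overwrites the shared metadata file with zeroes in place while keeping its size. It assumes ovirt-ha-agent and ovirt-ha-broker are stopped on all hosts first, and it reuses the metadata path from the logs in this thread; adjust it to your own storage domain.

# Hedged sketch: zero out the shared hosted-engine metadata file in place
# (keeping its size) so the HA agents start from clean metadata.
# Stop ovirt-ha-agent and ovirt-ha-broker on all hosts before running this.
# The path below is taken from this thread's logs; adjust it to your setup.
import os

META = ("/rhev/data-center/mnt/gluster1:_engine/"
        "6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata")

size = os.path.getsize(META)         # preserve the original file size
with open(META, "r+b") as f:
    f.write(b"\0" * size)            # fill the whole file with zero bytes
    f.flush()
    os.fsync(f.fileno())             # make sure the zeroes hit the storage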

>
> MainThread::INFO::2014-10-31 16:55:39,918::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.6 started
> MainThread::INFO::2014-10-31 16:55:39,985::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 192.168.12.11
> MainThread::INFO::2014-10-31 16:55:40,228::hosted_engine::367::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
> MainThread::INFO::2014-10-31 16:55:40,228::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.12.254'}
> MainThread::INFO::2014-10-31 16:55:40,231::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215107920
> MainThread::INFO::2014-10-31 16:55:40,231::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,237::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215108432
> MainThread::INFO::2014-10-31 16:55:40,237::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,240::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 39956688
> MainThread::INFO::2014-10-31 16:55:40,240::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,243::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215107664
> MainThread::INFO::2014-10-31 16:55:40,244::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,249::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634006879632
> MainThread::INFO::2014-10-31 16:55:40,249::hosted_engine::391::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
> MainThread::INFO::2014-10-31 16:55:40,298::hosted_engine::476::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.lockspace)
> MainThread::INFO::2014-10-31 16:55:40,322::state_machine::153::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
> MainThread::INFO::2014-10-31 16:55:40,322::state_machine::158::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host 192.168.12.12 (id 2): {'live-data': False, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=1413882675 (Tue Oct 21 17:11:15 2014)\nhost-id=2\nscore=2400\nmaintenance=False\nstate=EngineDown\n', 'hostname': '192.168.12.12', 'host-id': 2
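
For reference, the 'extra' blob in the host record above is just a newline-separated list of key=value pairs that each agent writes into the shared metadata file. Below is a minimal parsing sketch; the sample string is copied from the log line above, and this is illustrative only, not the parser used by ovirt-hosted-engine-ha itself.

# Minimal sketch: parse the newline-separated key=value "extra" metadata
# shown in the agent.log record above (illustrative only).
extra = ("metadata_parse_version=1\n"
         "metadata_feature_version=1\n"
         "timestamp=1413882675 (Tue Oct 21 17:11:15 2014)\n"
         "host-id=2\n"
         "score=2400\n"
         "maintenance=False\n"
         "state=EngineDown\n")

fields = {}
for line in extra.splitlines():
    if not line or "=" not in line:
        continue
    key, _, value = line.partition("=")   # split on the first '=' only
    fields[key] = value

print(fields["state"])   # EngineDown
print(fields["score"])   # 2400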

Re: [Gluster-devel] [ovirt-users] Hosted-Engine HA problem

2014-11-02 Thread Jiri Moskovcak
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
    .format(message or response))
RequestError: Request failed: 

[root@ovirt2 ~]# hosted-engine --vm-status
Traceback (most recent call last):
  File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 111, in 
    if not status_checker.print_status():
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 58, in print_status
    all_host_stats = ha_cli.get_all_host_stats()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 137, in get_all_host_stats
    return self.get_all_stats(self.StatModes.HOST)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 86, in get_all_stats
    constants.SERVICE_TYPE)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
    .format(message or response))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: 
[root@ovirt2 ~]# service ovirt-ha-agent status
ovirt-ha-agent dead but subsys locked


Thanks,
Jaicel

- Original Message -
From: "Jiri Moskovcak" 
To: "Jaicel" 
Cc: "Niels de Vos" , "Vijay Bellur" , us...@ovirt.org, 
"Gluster Devel" 
Sent: Friday, October 31, 2014 11:05:32 PM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On 10/31/2014 10:26 AM, Jaicel wrote:

I've increased the limit and then restarted the agent and broker. The status normalized, but right now
it went to the "False" state again, although both hosts still have a score of 2400. The agent logs remain the
same, with the "ovirt-ha-agent dead but subsys locked" status. ha-broker logs below

Thread-138::INFO::2014-10-31 17:24:22,981::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-138::INFO::2014-10-31 17:24:22,991::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-139::INFO::2014-10-31 17:24:38,385::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-139::INFO::2014-10-31 17:24:38,395::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-140::INFO::2014-10-31 17:24:53,816::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-140::INFO::2014-10-31 17:24:53,827::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-141::INFO::2014-10-31 17:25:09,172::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-141::INFO::2014-10-31 17:25:09,182::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-142::INFO::2014-10-31 17:25:24,551::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-142::INFO::2014-10-31 17:25:24,562::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed

Thanks,
Jaicel


OK, now it seems that the broker runs fine, so I need the recent agent.log
to debug it further.

--Jirka



- Original Message -
From: "Jiri Moskovcak" 
To: "Jaicel R. Sabonsolin" , "Niels de Vos" 

Cc: "Vijay Bellur" , us...@ovirt.org, "Gluster Devel" 

Sent: Friday, October 31, 2014 4:32:02 PM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote:

Hi guys,

These logs appear on both hosts, just like the result of --vm-status. I tried
tcpdump on the oVirt hosts and Gluster nodes, but only packet exchanges with my
monitoring VM (Zabbix) appeared.

agent.log
    new_data = self.refresh(self._state.data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
    constants.SERVI

Re: [Gluster-devel] [ovirt-users] Hosted-Engine HA problem

2014-10-31 Thread Jiri Moskovcak

On 10/31/2014 10:26 AM, Jaicel wrote:

I've increased the limit and then restarted the agent and broker. The status normalized, but right now
it went to the "False" state again, although both hosts still have a score of 2400. The agent logs remain the
same, with the "ovirt-ha-agent dead but subsys locked" status. ha-broker logs below

Thread-138::INFO::2014-10-31 17:24:22,981::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-138::INFO::2014-10-31 17:24:22,991::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-139::INFO::2014-10-31 17:24:38,385::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-139::INFO::2014-10-31 17:24:38,395::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-140::INFO::2014-10-31 17:24:53,816::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-140::INFO::2014-10-31 17:24:53,827::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-141::INFO::2014-10-31 17:25:09,172::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-141::INFO::2014-10-31 17:25:09,182::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-142::INFO::2014-10-31 17:25:24,551::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-142::INFO::2014-10-31 17:25:24,562::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed

Thanks,
Jaicel


OK, now it seems that the broker runs fine, so I need the recent agent.log
to debug it further.


--Jirka



- Original Message -----
From: "Jiri Moskovcak" 
To: "Jaicel R. Sabonsolin" , "Niels de Vos" 

Cc: "Vijay Bellur" , us...@ovirt.org, "Gluster Devel" 

Sent: Friday, October 31, 2014 4:32:02 PM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote:

Hi guys,

These logs appear on both hosts, just like the result of --vm-status. I tried
tcpdump on the oVirt hosts and Gluster nodes, but only packet exchanges with my
monitoring VM (Zabbix) appeared.

agent.log
    new_data = self.refresh(self._state.data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
    constants.SERVICE_TYPE)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
    .format(message or response))
RequestError: Request failed: 

broker.log
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
    response = "success " + self._dispatch(data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
    .get_all_stats_for_service_type(**options)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
    d = self.get_raw_stats_for_service_type(storage_dir, service_type)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'


- ah, there we go ^^ You might need to tweak the limit of allowed
open files as described here [1], or find out why the app keeps so many files open.


--Jirka

[1]
http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
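
For reference, below is a small sketch of how one could check whether the broker process is actually close to its open-files limit before raising it. The pgrep pattern used to find the process is an assumption about how the broker shows up in the process list; the /proc layout used is standard Linux.

# Minimal sketch: compare the ovirt-ha-broker process's current number of
# open file descriptors with its "Max open files" limit.
# The pgrep pattern is an assumption; adjust it if the broker runs under
# a different command line on your hosts.
import os
import subprocess

pid = int(subprocess.check_output(["pgrep", "-f", "-o", "ovirt-ha-broker"]))

# Count the entries under /proc/<pid>/fd (one per open descriptor).
n_open = len(os.listdir("/proc/%d/fd" % pid))

# Read the per-process limit straight from /proc/<pid>/limits.
with open("/proc/%d/limits" % pid) as f:
    limit_lines = [l for l in f if l.startswith("Max open files")]

print("broker pid %d holds %d open fds" % (pid, n_open))
print(limit_lines[0].rstrip())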


Thread-38160::INFO::2014-10-31 10:28:37,989::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-38161::INFO::2014-10-31 10:28:53,656::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-38161::ERROR::2014-10-31 10:28:53,657::listener::190::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'get-stats storage_dir=/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111

Re: [Gluster-devel] [ovirt-users] Hosted-Engine HA problem

2014-10-31 Thread Jiri Moskovcak

On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote:

Hi guys,

These logs appear on both hosts, just like the result of --vm-status. I tried
tcpdump on the oVirt hosts and Gluster nodes, but only packet exchanges with my
monitoring VM (Zabbix) appeared.

agent.log
    new_data = self.refresh(self._state.data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
    constants.SERVICE_TYPE)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
    .format(message or response))
RequestError: Request failed: 

broker.log
   File 
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", 
line 165, in handle
 response = "success " + self._dispatch(data)
   File 
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", 
line 261, in _dispatch
 .get_all_stats_for_service_type(**options)
   File 
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
 line 41, in get_all_stats_for_service_type
 d = self.get_raw_stats_for_service_type(storage_dir, service_type)
   File 
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
 line 74, in get_raw_stats_for_service_type
 f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 24] Too many open files: 
'/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'


- ah, there we go ^^ You might need to tweak the limit of allowed
open files as described here [1], or find out why the app keeps so many files open.



--Jirka

[1] 
http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/



Thread-38160::INFO::2014-10-31 10:28:37,989::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-38161::INFO::2014-10-31 10:28:53,656::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-38161::ERROR::2014-10-31 10:28:53,657::listener::190::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'get-stats storage_dir=/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent service_type=hosted-engine'
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
    response = "success " + self._dispatch(data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
    .get_all_stats_for_service_type(**options)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
    d = self.get_raw_stats_for_service_type(storage_dir, service_type)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
Thread-38161::INFO::2014-10-31 10:28:53,658::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed

Thanks,
Jaicel

- Original Message -
From: "Niels de Vos" 
To: "Vijay Bellur" 
Cc: "Jiri Moskovcak" , "Jaicel R. Sabonsolin" , 
us...@ovirt.org, "Gluster Devel" 
Sent: Friday, October 31, 2014 4:11:25 AM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On Thu, Oct 30, 2014 at 09:07:24PM +0530, Vijay Bellur wrote:

On 10/30/2014 06:45 PM, Jiri Moskovcak wrote:

On 10/30/2014 09:22 AM, Jaicel R. Sabonsolin wrote:

Hi Guys,

I need help with my oVirt Hosted-Engine HA setup. I am running 2 oVirt hosts
and 2 Gluster nodes with replicated volumes. I already have VMs running on my
hosts and they can migrate normally when I, for example, power off the host
that they are running on. The problem is that the engine can't migrate once
I switch off the host that hosts the engine.

oVirt     3.4.3-1.el6
KVM       0.12.1.2 - 2.415.el6_5.10
LIBVIRT   libvirt-0.10.2-29.el6_5.9
VDSM      vdsm-4.14.17-0.el6


Right now, I have this result from hosted-engine --vm-status.

   File "/usr

Re: [Gluster-devel] [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-21 Thread Jiri Moskovcak

On 07/19/2014 08:58 AM, Pranith Kumar Karampuri wrote:


On 07/19/2014 11:25 AM, Andrew Lau wrote:



On Sat, Jul 19, 2014 at 12:03 AM, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:


On 07/18/2014 05:43 PM, Andrew Lau wrote:


On Fri, Jul 18, 2014 at 10:06 PM, Vijay Bellur <vbel...@redhat.com> wrote:

[Adding gluster-devel]


On 07/18/2014 05:20 PM, Andrew Lau wrote:

Hi all,

As most of you have got hints from previous messages,
hosted engine won't work on gluster. A quote from BZ1097639:

"Using hosted engine with Gluster backed storage is
currently something
we really warn against.


I think this bug should be closed or re-targeted at
documentation, because there is nothing we can do here.
Hosted engine assumes that all writes are atomic and
(immediately) available for all hosts in the cluster.
Gluster violates those assumptions.
"

I tried going through BZ1097639 but could not find much
detail with respect to gluster there.

A few questions around the problem:

1. Can somebody please explain in detail the scenario that
causes the problem?

2. Is hosted engine performing synchronous writes to ensure
that writes are durable?

Also, if there is any documentation that details the hosted
engine architecture that would help in enhancing our
understanding of its interactions with gluster.



Now my question: does this theory prevent a scenario of,
perhaps, something like a gluster replicated volume being mounted
as a glusterfs filesystem and then re-exported as a native kernel NFS
share for the hosted-engine to consume? It could then be possible to
chuck CTDB in there to provide a last-resort failover solution. I have
tried this myself and suggested it to two people who are running a
similar setup. They are now using the native kernel NFS server for
hosted-engine and haven't reported as many issues. Curious, could
anyone validate my theory on this?


If we obtain more details on the use case and obtain gluster
logs from the failed scenarios, we should be able to
understand the problem better. That could be the first step
in validating your theory or evolving further recommendations :).


I'm not sure how useful this is, but Jiri Moskovcak tracked
this down in an off-list message.

Message Quote:

==

We were able to track it down to this (thanks Andrew for
providing the testing setup):

-b686-4363-bb7e-dba99e5789b6/ha_agent service_type=hosted-engine'
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
    response = "success " + self._dispatch(data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
    .get_all_stats_for_service_type(**options)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
    d = self.get_raw_stats_for_service_type(storage_dir, service_type)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 116] Stale file handle: '/rhev/data-center/mnt/localhost:_mnt_hosted-engine/c898fd2a-b686-4363-bb7e-dba99e5789b6/ha_agent/hosted-engine.metadata'

Andrew/Jiri,
Would it be possible to post the gluster logs of both the
mount and the bricks on the BZ? I can take a look at it. If I
gather nothing, then I will probably ask for your help in
re-creating the issue.

Pranith


Unfortunately, I don't have the logs for that setup any more. I'll
try to replicate it when I get a chance. If I understand the comment from
the BZ, I don't think it's a gluster bug per se, more just how
gluster does its replication.

hi Andrew,
  Thanks for that. I couldn't come to any conclusions because no
logs were available. It is unlikely that self-heal is involved because
there were no bricks going down/up according to the bug description.



Hi,
I've never had such a setup; I guessed it was a problem with gluster based on
"OSError: [Errno 116] Stale file handle:", which happens when a file
opened by an application on the client gets removed on the server. I'm pretty
sure we