Re: [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-22 Thread Jason Brooks


> > > RE your bug, what do you use for a mount point for the nfs storage?
> >
> > In the log you attached to your bug, it looks like you're using localhost as
> > the nfs mount point. I use a dns name that resolves to the virtual IP hosted
> > by ctdb. So, you're only ever talking to one nfs server at a time, and
> > failover between the nfs hosts is handled by ctdb.
> 
> I also tried your setup, but hit other complications. I used localhost in an
> old setup, as I was under the assumption that when accessing anything gluster
> related, the connection point only provides the volume info and you then
> connect to any server in the volume group.

As I understand it, with Gluster's nfs, the server you mount is the only
one you're accessing directly, which is why you need to use something else,
like round robin dns, to distribute the load across the nfs servers.
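
Roughly the difference, with placeholder names (this isn't lifted from my
actual setup):

# mounting via localhost means that host's gluster nfs server is the only
# one you ever talk to:
mount -t nfs -o vers=3 localhost:/engine /mnt/hosted-engine

# mounting via a name that resolves to the ctdb-managed virtual IP (or a
# round-robin dns record) lets the endpoint move between the nfs hosts:
mount -t nfs -o vers=3 engine-nfs.example.com:/engine /mnt/hosted-engine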

> 
> >
> > Anyway, like I said, my main testing rig is now using this configuration,
> > help me try and break it. :)
> 
> rm -rf /
> 
> Jokes aside, are you able to reboot a server without losing the VM?
> My experience with ctdb (based on your blog) was that even with the
> "floating/virtual IP" it wasn't fast enough, or something in the gluster
> layer delayed the failover. Either way, the VM goes into a paused state and
> can't be resumed.

I have rebooted my hosts without issue. When I want to reboot the host
that's serving the nfs storage, I stop ctdb on that host first to make it
hand off the nfs -- I've done this out of caution, but I should try just
pulling the plug, too.
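
Roughly the hand-off I do before such a reboot (plain ctdb commands, nothing
oVirt-specific; adjust for your init system):

# on the node that currently holds the public/virtual IP
ctdb status          # confirm the cluster is healthy
ctdb ip              # see which node is hosting the public address
service ctdb stop    # drops the IP so another node takes it over
reboot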

The main source of VM pausing I've seen is when you have two nodes, one
goes down, and gluster's quorum enforcement kicks in. With my current
three-node, replica 3 setup, gluster stays happy with respect to quorum.
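
For reference, this is the shape of the volume and quorum settings in play
(volume and host names are just placeholders):

# one brick per host; with replica 3, quorum survives a single node failure
gluster volume create engine replica 3 \
    host1:/bricks/engine host2:/bricks/engine host3:/bricks/engine
gluster volume start engine

# the usual quorum settings -- with a two-node replica 2 volume, enforcement
# like this is what makes the volume go read-only (and the VMs pause) when a
# node drops out
gluster volume set engine cluster.quorum-type auto
gluster volume set engine cluster.server-quorum-type server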

I'll be sure to post about it if I have problems, but it's been working
well for me.

Jason


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-22 Thread Andrew Lau
On 23/07/2014 1:45 am, "Jason Brooks"  wrote:
>
>
>
> - Original Message -
> > From: "Jason Brooks" 
> > To: "Andrew Lau" 
> > Cc: "users" 
> > Sent: Tuesday, July 22, 2014 8:29:46 AM
> > Subject: Re: [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?
> >
> >
> >
> > - Original Message -
> > > From: "Andrew Lau" 
> > > To: "users" 
> > > Sent: Friday, July 18, 2014 4:50:31 AM
> > > Subject: [ovirt-users] Can we debug some truths/myths/facts about
> > > hosted-engine and gluster?
> > >
> > > Hi all,
> > >
> > > As most of you have got hints from previous messages, hosted engine won't
> > > work on gluster. A quote from BZ1097639
> > >
> > > "Using hosted engine with Gluster backed storage is currently something we
> > > really warn against.
> >
> > My current setup is hosted engine, configured w/ gluster storage as described
> > in my blog post, but with three hosts and replica 3 volumes.
> >
> > Only issue I've seen is an errant message about the Hosted Engine being down
> > following an engine migration. The engine does migrate successfully, though.

That was fixed in 3.4.3 I believe, although when it happened to me my engine
didn't migrate, it just sat there.


> >
> > RE your bug, what do you use for a mount point for the nfs storage?
>
> In the log you attached to your bug, it looks like you're using localhost as
> the nfs mount point. I use a dns name that resolves to the virtual IP hosted
> by ctdb. So, you're only ever talking to one nfs server at a time, and
> failover between the nfs hosts is handled by ctdb.

I also tried your setup, but hit other complications. I used localhost in an
old setup, as I was under the assumption that when accessing anything gluster
related, the connection point only provides the volume info and you then
connect to any server in the volume group.

>
> Anyway, like I said, my main testing rig is now using this configuration,
> help me try and break it. :)

rm -rf /

Jokes aside, are you able to reboot a server without losing the VM?
My experience with ctdb (based on your blog) was that even with the
"floating/virtual IP" it wasn't fast enough, or something in the gluster
layer delayed the failover. Either way, the VM goes into a paused state and
can't be resumed.

>
> >
> > Jason
> >
> >
> > >
> > >
> > > I think this bug should be closed or re-targeted at documentation,
> > > because there is nothing we can do here. Hosted engine assumes that
> > > all writes are atomic and (immediately) available for all hosts in the
> > > cluster. Gluster violates those assumptions.
> > >
> > > "
> > >
> > > Until the documentation gets updated, I hope this serves as a useful
> > > notice at least to save people some of the headaches I hit like
> > > hosted-engine starting up multiple VMs because of above issue.
> > >
> > > Now my question, does this theory prevent a scenario of perhaps something
> > > like a gluster replicated volume being mounted as a glusterfs filesystem
> > > and then re-exported as the native kernel NFS share for the hosted-engine
> > > to consume? It could then be possible to chuck ctdb in there to provide a
> > > last resort failover solution. I have tried myself and suggested it to two
> > > people who are running a similar setup. Now using the native kernel NFS
> > > server for hosted-engine and they haven't reported as many issues. Curious,
> > > could anyone validate my theory on this?
> > >
> > > Thanks,
> > > Andrew
> > >
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-22 Thread Jason Brooks


- Original Message -
> From: "Jason Brooks" 
> To: "Andrew Lau" 
> Cc: "users" 
> Sent: Tuesday, July 22, 2014 8:29:46 AM
> Subject: Re: [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?
> 
> 
> 
> - Original Message -
> > From: "Andrew Lau" 
> > To: "users" 
> > Sent: Friday, July 18, 2014 4:50:31 AM
> > Subject: [ovirt-users] Can we debug some truths/myths/facts about
> > hosted-engine and gluster?
> > 
> > Hi all,
> > 
> > As most of you have got hints from previous messages, hosted engine won't
> > work on gluster. A quote from BZ1097639
> > 
> > "Using hosted engine with Gluster backed storage is currently something we
> > really warn against.
> 
> My current setup is hosted engine, configured w/ gluster storage as described
> in my
> blog post, but with three hosts and replica 3 volumes.
> 
> Only issue I've seen is an errant message about the Hosted Engine being down
> following an engine migration. The engine does migrate successfully, though.
> 
> RE your bug, what do you use for a mount point for the nfs storage?

In the log you attached to your bug, it looks like you're using localhost as
the nfs mount point. I use a dns name that resolves to the virtual IP hosted
by ctdb. So, you're only ever talking to one nfs server at a time, and failover
between the nfs hosts is handled by ctdb.

Anyway, like I said, my main testing rig is now using this configuration, 
help me try and break it. :)

> 
> Jason
> 
> 
> > 
> > 
> > I think this bug should be closed or re-targeted at documentation,
> > because there is nothing we can do here. Hosted engine assumes that
> > all writes are atomic and (immediately) available for all hosts in the
> > cluster. Gluster violates those assumptions.
> > 
> > "
> >
> > Until the documentation gets updated, I hope this serves as a useful
> > notice at least to save people some of the headaches I hit like
> > hosted-engine starting up multiple VMs because of above issue.
> > 
> > Now my question, does this theory prevent a scenario of perhaps something
> > like a gluster replicated volume being mounted as a glusterfs filesystem
> > and then re-exported as the native kernel NFS share for the hosted-engine
> > to consume? It could then be possible to chuck ctdb in there to provide a
> > last resort failover solution. I have tried myself and suggested it to two
> > people who are running a similar setup. Now using the native kernel NFS
> > server for hosted-engine and they haven't reported as many issues. Curious,
> > could anyone validate my theory on this?
> > 
> > Thanks,
> > Andrew
> > 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-22 Thread Jason Brooks


- Original Message -
> From: "Andrew Lau" 
> To: "users" 
> Sent: Friday, July 18, 2014 4:50:31 AM
> Subject: [ovirt-users] Can we debug some truths/myths/facts about 
> hosted-engine and gluster?
> 
> Hi all,
> 
> As most of you have got hints from previous messages, hosted engine won't
> work on gluster. A quote from BZ1097639
> 
> "Using hosted engine with Gluster backed storage is currently something we
> really warn against.

My current setup is hosted engine, configured w/ gluster storage as described
in my blog post, but with three hosts and replica 3 volumes.

Only issue I've seen is an errant message about the Hosted Engine being down 
following an engine migration. The engine does migrate successfully, though.

RE your bug, what do you use for a mount point for the nfs storage?

Jason


> 
> 
> I think this bug should be closed or re-targeted at documentation,
> because there is nothing we can do here. Hosted engine assumes that
> all writes are atomic and (immediately) available for all hosts in the
> cluster. Gluster violates those assumptions.
> 
> "
>
> Until the documentation gets updated, I hope this serves as a useful
> notice at least to save people some of the headaches I hit like
> hosted-engine starting up multiple VMs because of above issue.
> 
> Now my question, does this theory prevent a scenario of perhaps something
> like a gluster replicated volume being mounted as a glusterfs filesystem
> and then re-exported as the native kernel NFS share for the hosted-engine
> to consume? It could then be possible to chuck ctdb in there to provide a
> last resort failover solution. I have tried myself and suggested it to two
> people who are running a similar setup. Now using the native kernel NFS
> server for hosted-engine and they haven't reported as many issues. Curious,
> could anyone validate my theory on this?
> 
> Thanks,
> Andrew
> 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-18 Thread Andrew Lau

On Fri, Jul 18, 2014 at 10:06 PM, Vijay Bellur  wrote:

> [Adding gluster-devel]
>
>
> On 07/18/2014 05:20 PM, Andrew Lau wrote:
>
>> Hi all,
>>
>> As most of you have got hints from previous messages, hosted engine
>> won't work on gluster. A quote from BZ1097639
>>
>> "Using hosted engine with Gluster backed storage is currently something
>> we really warn against.
>>
>>
>> I think this bug should be closed or re-targeted at documentation,
>> because there is nothing we can do here. Hosted engine assumes that all
>> writes are atomic and (immediately) available for all hosts in the cluster.
>> Gluster violates those assumptions.
>> "
>>
> I tried going through BZ1097639 but could not find much detail with
> respect to gluster there.
>
> A few questions around the problem:
>
> 1. Can somebody please explain in detail the scenario that causes the
> problem?
>
> 2. Is hosted engine performing synchronous writes to ensure that writes
> are durable?
>
> Also, if there is any documentation that details the hosted engine
> architecture that would help in enhancing our understanding of its
> interactions with gluster.
>
>
>
>>
>> Now my question, does this theory prevent a scenario of perhaps
>> something like a gluster replicated volume being mounted as a glusterfs
>> filesystem and then re-exported as the native kernel NFS share for the
>> hosted-engine to consume? It could then be possible to chuck ctdb in
>> there to provide a last resort failover solution. I have tried myself
>> and suggested it to two people who are running a similar setup. Now
>> using the native kernel NFS server for hosted-engine and they haven't
>> reported as many issues. Curious, could anyone validate my theory on this?
>>
>>
> If we obtain more details on the use case and obtain gluster logs from the
> failed scenarios, we should be able to understand the problem better. That
> could be the first step in validating your theory or evolving further
> recommendations :).
>
>

I'm not sure how useful this is, but Jiri Moskovcak tracked this down in an
off-list message.

Message Quote:

==

We were able to track it down to this (thanks Andrew for providing the
testing setup):

-b686-4363-bb7e-dba99e5789b6/ha_agent service_type=hosted-engine'
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
    response = "success " + self._dispatch(data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
    .get_all_stats_for_service_type(**options)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
    d = self.get_raw_stats_for_service_type(storage_dir, service_type)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 116] Stale file handle: '/rhev/data-center/mnt/localhost:_mnt_hosted-engine/c898fd2a-b686-4363-bb7e-dba99e5789b6/ha_agent/hosted-engine.metadata'

It's definitely connected to the storage, which leads us to gluster. I'm
not very familiar with gluster, so I need to check this with our gluster
gurus.

==

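For anyone else chasing this: ESTALE is the generic NFS "stale file handle"
error, so it is worth checking whether it is just the broker or the mount
itself. A rough check from the shell, using the path from the traceback
above:

# confirm how the hosted-engine domain is mounted on the affected host
mount | grep hosted-engine

# if the mount itself has gone stale, plain file access may fail with the
# same error; a stale mount generally has to be remounted before the broker
# can read the metadata file again
stat /rhev/data-center/mnt/localhost:_mnt_hosted-engine/c898fd2a-b686-4363-bb7e-dba99e5789b6/ha_agent/hosted-engine.metadata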


> Thanks,
> Vijay
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-18 Thread Vijay Bellur

[Adding gluster-devel]

On 07/18/2014 05:20 PM, Andrew Lau wrote:

Hi all,

As most of you have got hints from previous messages, hosted engine
won't work on gluster. A quote from BZ1097639

"Using hosted engine with Gluster backed storage is currently something
we really warn against.


I think this bug should be closed or re-targeted at documentation, because 
there is nothing we can do here. Hosted engine assumes that all writes are 
atomic and (immediately) available for all hosts in the cluster. Gluster 
violates those assumptions.
"

I tried going through BZ1097639 but could not find much detail with 
respect to gluster there.


A few questions around the problem:

1. Can somebody please explain in detail the scenario that causes the 
problem?


2. Is hosted engine performing synchronous writes to ensure that writes 
are durable?


Also, if there is any documentation that details the hosted engine 
architecture that would help in enhancing our understanding of its 
interactions with gluster.




Now my question, does this theory prevent a scenario of perhaps
something like a gluster replicated volume being mounted as a glusterfs
filesystem and then re-exported as the native kernel NFS share for the
hosted-engine to consume? It could then be possible to chuck ctdb in
there to provide a last resort failover solution. I have tried myself
and suggested it to two people who are running a similar setup. Now
using the native kernel NFS server for hosted-engine and they haven't
reported as many issues. Curious, could anyone validate my theory on this?



If we obtain more details on the use case and obtain gluster logs from 
the failed scenarios, we should be able to understand the problem 
better. That could be the first step in validating your theory or 
evolving further recommendations :).


Thanks,
Vijay
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-18 Thread Andrew Lau
Hi all,

As most of you have got hints from previous messages, hosted engine won't
work on gluster. A quote from BZ1097639

"Using hosted engine with Gluster backed storage is currently something we
really warn against.


I think this bug should be closed or re-targeted at documentation,
because there is nothing we can do here. Hosted engine assumes that
all writes are atomic and (immediately) available for all hosts in the
cluster. Gluster violates those assumptions.

"

Until the documentation gets updated, I hope this serves as a useful
notice at least to save people some of the headaches I hit like
hosted-engine starting up multiple VMs because of above issue.

Now my question: does this theory prevent a scenario where a gluster
replicated volume is mounted as a glusterfs filesystem and then re-exported
as a native kernel NFS share for the hosted-engine to consume? It could then
be possible to chuck ctdb in there to provide a last-resort failover
solution. I have tried it myself and suggested it to two people who are
running a similar setup; they are now using the native kernel NFS server for
hosted-engine and haven't reported as many issues. Curious, could anyone
validate my theory on this?
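
For concreteness, roughly what I mean -- the volume name, brick paths and
virtual IP below are placeholders, not anyone's actual configuration:

# each node carries a brick of the replicated volume and mounts it locally
# with the fuse client; gluster's own nfs server is disabled so it doesn't
# fight with the kernel nfs server over the nfs ports
gluster volume set engine nfs.disable on
mount -t glusterfs localhost:/engine /mnt/engine

# the fuse mount is re-exported by the kernel nfs server (fsid= is needed
# when exporting a FUSE filesystem)
echo '/mnt/engine *(rw,sync,no_root_squash,fsid=100)' >> /etc/exports
exportfs -ra

# ctdb floats a virtual IP (say 10.0.0.100) across the nodes, and
# hosted-engine's storage is pointed at that IP instead of localhost:
#   10.0.0.100:/mnt/engine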

Thanks,
Andrew
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users