Re: [ovirt-users] hyperconverged question

2017-09-12 Thread Charles Kozler
The same applies to my engine storage domain. Shouldn't we see the mount
options in the mount -l output? Fault tolerance did work (sort of - see more
below) during my test

[root@appovirtp01 ~]# grep -i mnt_options
/etc/ovirt-hosted-engine/hosted-engine.conf
mnt_options=backup-volfile-servers=n2:n3

[root@appovirtp02 ~]# grep -i mnt_options
/etc/ovirt-hosted-engine/hosted-engine.conf
mnt_options=backup-volfile-servers=n2:n3

[root@appovirtp03 ~]# grep -i mnt_options
/etc/ovirt-hosted-engine/hosted-engine.conf
mnt_options=backup-volfile-servers=n2:n3

Meanwhile, the option is not visible in the mount -l output:

[root@appovirtp01 ~]# mount -l | grep -i n1:/engine
n1:/engine on /rhev/data-center/mnt/glusterSD/n1:_engine type
fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

[root@appovirtp02 ~]# mount -l | grep -i n1:/engine
n1:/engine on /rhev/data-center/mnt/glusterSD/n1:_engine type
fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

[root@appovirtp03 ~]# mount -l | grep -i n1:/engine
n1:/engine on /rhev/data-center/mnt/glusterSD/n1:_engine type
fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

So, since everything is "pointed" at node 1 for engine storage, I decided to
hard shut down node 1 while the hosted engine VM was running on node 3.

The result was that after ~30 seconds the engine crashed, likely because of
gluster's 42-second ping timeout. The hosted engine VM came back up (with node
1 still down) after about 5-7 minutes.

Is it expected for the VM to go down? I thought the gluster FUSE client
mounted all bricks in the volume
(http://lists.gluster.org/pipermail/gluster-users/2015-May/021989.html), so
I would have expected this to be more seamless.
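One possible explanation for the missing option (my reading, not confirmed in this thread): backup-volfile-servers is consumed by the mount.glusterfs helper when it fetches the volfile, so it never reaches the kernel mount table that `mount -l` reports. The fallback servers should instead surface as flags on the glusterfs client process. A sketch, using a hypothetical process line modeled on this thread's hostnames (the exact flag layout is an assumption):

```shell
# Hypothetical glusterfs client command line, roughly as mount.glusterfs
# would launch it when backup-volfile-servers=n2:n3 is passed:
procline='/usr/sbin/glusterfs --volfile-server=n1 --volfile-server=n2 --volfile-server=n3 --volfile-id=/engine /rhev/data-center/mnt/glusterSD/n1:_engine'

# The backup servers show up as extra --volfile-server flags on the
# client process rather than in `mount -l` output:
echo "$procline" | grep -o -- '--volfile-server=[^ ]*'
```

On a live host the equivalent check would be against `ps ax | grep glusterfs` rather than a canned string.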




>> On 01/09/2017 12:53, Jim Kusznir wrote:
>>
>> Huh... OK, how do I convert the arbiter to a full replica, then?  I was
>> misinformed when I created this setup.  I thought the arbiter held
>> enough metadata that it could validate or repudiate any one replica (kind
>> of like the parity drive for a RAID-4 array).  I was also under the
>> impression that one replica + arbiter is enough to keep the array online
>> and functional.
>>
>> --Jim
>>
>> On Fri, Sep 1, 2017 at 5:22 AM, Charles Kozler 
>> wrote:
>>
>>> @ Jim - you have only two data volumes and lost quorum. The arbiter only
>>> stores metadata, no actual files. So yes, you were running in degraded
>>> mode, so some operations were hindered.
>>>
>>> @ Sahina - Yes, this actually worked fine for me once I did that.
>>> However, the issue I am still facing is when I go to create a new gluster
>>> storage domain (replica 3, hyperconverged): it asks for a "Host to use" and
>>> I select that host. If I fail that host, all VMs halt. I do not recall this
>>> in 3.6 or early 4.0. To me this makes it seem like oVirt is "pinning" a
>>> node to a volume and vice versa, the way a single hyperconverged host
>>> could, for instance, export a local disk via NFS and then mount it as an
>>> oVirt domain - with the caveats that entails. To that end, I am using
>>> gluster replica 3; when configuring it I set "host to use" to node 1, then
>>> in the connection details I give it node1:/data. I fail node1, and all VMs
>>> halt. Did I miss something?
>>>
>>> On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose  wrote:
>>>
 To the OP question, when you set up a gluster storage domain, you need
 to specify backup-volfile-servers=: where server2
 and server3 also have bricks running. When server1 is down, and the volume
 is mounted again - server2 or server3 are queried to get the gluster
 volfiles.

 @Jim, if this does not work, are you using 4.1.5 build with libgfapi
 access? If not, please provide the vdsm and gluster mount logs to analyse

 If VMs go to paused state - this could mean the storage is not
 available. You can check "gluster volume status " to see if at
 least 2 bricks are running.
Re: [ovirt-users] hyperconverged question

2017-09-12 Thread Charles Kozler
Hey All -

So I haven't tested this yet, but what I do know is that I did set up the
backupvol option when I added the data gluster volume; however, the mount
options in mount -l do not show it as being used

n1:/data on /rhev/data-center/mnt/glusterSD/n1:_data type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

I will delete it and re-add it, but I think this might be part of the
problem. Perhaps Jim and I have the same issue because oVirt is actually
not passing the additional mount options from the web UI to the backend to
mount with said parameters?

Thoughts?

>>
>> On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose  wrote:
>>
>>> To the OP question, when you set up a gluster storage domain, you need
>>> to specify backup-volfile-servers=: where server2 and
>>> server3 also have bricks running. When server1 is down, and the volume is
>>> mounted again - server2 or server3 are queried to get the gluster volfiles.
>>>
>>> @Jim, if this does not work, are you using 4.1.5 build with libgfapi
>>> access? If not, please provide the vdsm and gluster mount logs to analyse
>>>
>>> If VMs go to paused state - this could mean the storage is not
>>> available. You can check "gluster volume status " to see if
>>> atleast 2 bricks are running.
>>>
>>> On Fri, Sep 1, 2017 at 11:31 AM, Johan Bernhardsson 
>>> wrote:
>>>
 If gluster drops in quorum so that it has fewer votes than it should, it
 will stop file operations until quorum is back to normal. If I remember
 right, you need two bricks writable for quorum to be met, and the arbiter
 is only a vote to avoid split brain.


 Basically what you have is a RAID-5 solution without a spare: when one
 disk dies it will run in degraded mode, and some RAID systems will stop
 the array until you have replaced the disk or forced it to run anyway.

 You can read up on it here: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/

 /Johan
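The vote math Johan describes can be sketched as follows (illustrative only, not gluster code): with client quorum of type "auto", writes are allowed only while a majority of the bricks in the replica set, arbiter included, are reachable.

```shell
# Illustrative write-quorum check for a replica set (not gluster code).
# $1 = bricks currently up, $2 = bricks in the replica set.
writes_allowed() {
    [ "$1" -gt $(( $2 / 2 )) ]
}

writes_allowed 2 3 && echo "replica 3 (or 2+arbiter), one down: writes continue"
writes_allowed 1 3 || echo "two of three down: file operations stop"
writes_allowed 1 2 || echo "plain replica 2, one down: no majority possible"
```

This is why a two-node setup needs the arbiter: it contributes the third vote without storing file data.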

 On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:

 Hi all:

 Sorry to hijack the thread, but I was about to start essentially the
 same thread.

 I have a 3 node cluster; all three are hosts and gluster nodes (replica
 2 + arbiter).  I DO have the mnt_options=backup-volfile-servers= set:

 storage=192.168.8.11:/engine
 mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13

 I had an issue today where 192.168.8.11 went down.  ALL VMs immediately
 paused, including the engine (all VMs were running on host2:192.168.8.12).
 I couldn't get any gluster stuff working until host1 (192.168.8.11) was
 restored.

 What's wrong / what did I miss?

 (this was set up "manually" through the article on setting up a
 self-hosted gluster cluster back when 4.0 was new; I've upgraded it to 4.1
 since).

 Thanks!
 --Jim


 On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler 
 wrote:

 Typo..."Set it up and then failed that **HOST**"

 And upon that host going down, the 

Re: [ovirt-users] hyperconverged question

2017-09-04 Thread FERNANDO FREDIANI
I had the very same impression. It doesn't look like it works, then.
So for a fully redundant setup where you can lose a complete host, you must
have at least 3 nodes?


Fernando




On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler
> wrote:

Typo..."Set it up and then failed that **HOST**"

 And upon that host going down, the storage domain went
 down. I only have the hosted storage domain and this new one
 - is this why the DC went down and no SPM could be elected?

 I don't recall this working this way in early 4.0 or 3.6

   

Re: [ovirt-users] hyperconverged question

2017-09-04 Thread Kasturi Narra
Hi Charles,

 The right option is backup-volfile-servers, not 'backupvolfile-server'.
Can you please use the first one and test?

Thanks
kasturi
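For reference, a sketch of what the two spellings look like in practice (hostnames are placeholders; mount.glusterfs(8) is the authority here):

```shell
# Supported list form: plural, hyphenated, colon-separated fallbacks
mount -t glusterfs -o backup-volfile-servers=n2:n3 \
    n1:/data /rhev/data-center/mnt/glusterSD/n1:_data

# 'backupvolfile-server=n2:n3' (singular, unhyphenated) is the older
# single-server spelling; relying on it for a server *list* may leave
# the mount without working failover, as this thread suggests.
```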

>> On Fri, Sep 1, 2017 at 7:20 PM, Charles Kozler 
>> wrote:
>>
>>> Jim -
>>>
>>> here is my test:
>>>
>>> - All VM's on node2: hosted engine and 1 test VM
>>> - Test VM on gluster storage domain (with mount options set)
>>> - hosted engine is on gluster as well, with settings persisted to
>>> hosted-engine.conf for backupvol
>>>
>>> All VMs stayed up. Nothing in dmesg of the test VM indicated a pause
>>> or an issue or anything.
>>>
>>> However, what I did notice during this is that my /datatest volume doesn't
>>> have quorum set. So I will set that now and report back what happens
>>>
>>> # gluster volume info datatest
>>>
>>> Volume Name: datatest
>>> Type: Replicate
>>> Volume ID: 229c25f9-405e-4fe7-b008-1d3aea065069
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x 3 = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: node1:/gluster/data/datatest/brick1
>>> Brick2: node2:/gluster/data/datatest/brick1
>>> Brick3: node3:/gluster/data/datatest/brick1
>>> Options Reconfigured:
>>> transport.address-family: inet
>>> nfs.disable: on
>>>
>>> Perhaps quorum may be more trouble than it's worth when you have 3 nodes
>>> and/or 2 nodes + arbiter?
>>>
>>> Since I am keeping my 3rd node out of oVirt, I am more content keeping
>>> it as a warm spare if I **had** to swap it into the oVirt cluster, but it
>>> keeps my storage at 100% quorum
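Setting quorum on a volume like datatest above would look something like this (a sketch; these are standard gluster option names, but the exact values to use were not settled in this thread):

```shell
# Client-side quorum: with "auto", writes require a majority of the
# replica set to be reachable
gluster volume set datatest cluster.quorum-type auto

# Optional server-side quorum for the trusted pool as a whole
gluster volume set datatest cluster.server-quorum-type server

# Verify the option took effect
gluster volume get datatest cluster.quorum-type
```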
>>>
>>> On Fri, Sep 1, 2017 at 5:18 PM, Jim Kusznir  wrote:
>>>
 I can confirm that I did set it up manually, and I did specify
 backupvol, and in the "manage domain" storage settings, I do have under
 mount options, backup-volfile-servers=192.168.8.12:192.168.8.13  (and
 this was done at initial install time).

 The "used managed gluster" checkbox is NOT checked, and if I check it
 and save settings, next time I go in it is not checked.

 --Jim

 On Fri, Sep 1, 2017 at 2:08 PM, Charles Kozler 
 wrote:

> @ Jim - here is my setup which I will test in a few (brand new
> cluster) and report back what I found in my tests
>
> - 3x servers direct connected via 10Gb
> - 2 of those 3 setup in ovirt as hosts
> - Hosted engine
> - Gluster replica 3 (no arbiter) for all volumes
> - 1x engine volume gluster replica 3 manually configured (not using
> ovirt managed gluster)
> - 1x datatest volume (20gb) replica 3 manually configured (not using
> ovirt managed gluster)
> - 1x nfstest domain served from some other server in my infrastructure
> which, at the time of my original testing, was master domain
>
> I tested this earlier and all VMs stayed online. However, ovirt
> cluster reported DC/cluster down, all VM's stayed up
>
> As I am now typing this, can you confirm you setup your gluster
> storage domain with backupvol? Also, confirm you updated 
> hosted-engine.conf
> with backupvol mount option as well?
>

Re: [ovirt-users] hyperconverged question

2017-09-04 Thread Kasturi Narra
Hi Jim,

  I looked at the gluster volume info and that looks fine to me. The
recommended config is arbiter for data and vmstore, and replica 3 for engine,
since we would want the HE to be available always.

 If I understand right, the problem you are facing is that when you shut down
one of the nodes, the HE VM and app VMs go to a paused state, right?

 For debugging further, and to ensure that the volume has been mounted using
the backup-volfile-servers option, you can move the storage domain to
maintenance (which will unmount the volume), then activate it back (which
will mount it again). During this time you can check the mount command passed
in the vdsm logs; it should have the backup-volfile-servers option.

Can you please confirm that you have ovirt-guest-agent installed on the
app VMs and power management enabled? ovirt-guest-agent is required on the
app VMs to ensure HA functionality.

Thanks
kasturi
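A sketch of that log check (the log line below is hypothetical, modeled on Jim's addresses; the real file is typically /var/log/vdsm/vdsm.log):

```shell
# Hypothetical vdsm mount log entry captured while re-activating the domain:
logline='storage.Mount::(mount) /usr/bin/mount -t glusterfs -o backup-volfile-servers=192.168.8.12:192.168.8.13 192.168.8.11:/engine /rhev/data-center/mnt/glusterSD/192.168.8.11:_engine'

# Confirm the option actually made it into the mount invocation:
echo "$logline" | grep -o 'backup-volfile-servers=[^ ]*'
# prints: backup-volfile-servers=192.168.8.12:192.168.8.13
```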

>>
>> On Fri, Sep 1, 2017 at 4:22 PM, Jim Kusznir  wrote:
>>
>>> So, after reading the first document twice and the 2nd link thoroughly
>>> once, I believe that the arbiter volume should be sufficient and count
>>> for replica/split-brain purposes. E.g., if any one full replica is down,
>>> and the arbiter and the other replica are up, then it should have quorum
>>> and all should be good.
>>>
>>> I think my underlying problem has to do more with config than the
>>> replica state.  That said, I did size the drive on my 3rd node planning to
>>> have an identical copy of all data on it, so I'm still not opposed to
>>> making it a full replica.
>>>
>>> Did I miss something here?
>>>
>>> Thanks!
>>>
>>> On Fri, Sep 1, 2017 at 11:59 AM, Charles Kozler 
>>> wrote:
>>>
 These can get a little confusing, but this explains it best:
 https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#replica-2-and-replica-3-volumes

 Basically, in the first paragraph they explain why you can't have
 HA with quorum for 2 nodes. Here is another overview doc that explains
 some more:

 http://openmymind.net/Does-My-Replica-Set-Need-An-Arbiter/

 From my understanding the arbiter is good for resolving split brains.
 Quorum and arbiter are two different things, though: quorum is a mechanism
 to help you **avoid** split brain, and the arbiter helps gluster resolve
 split brain by voting and other internal mechanics (as outlined in link 1).
 How did you create the volume exactly - what command? It looks to me like
 you created it with 'gluster volume create replica 2 arbiter 1 {}' per
 your earlier mention of "replica 2 arbiter 1". That being said, if you did
 that and then set up quorum in the volume configuration, this would cause
 your gluster to halt (as you saw, until you recovered node 1) once quorum
 was lost.

 As you can see from the docs, there is still a corner case for getting
 in to split brain with replica 3, which again, is where arbiter would help
 gluster resolve it

 I need to amend my previous statement: I was told that the arbiter volume
 does not store data, only metadata. I cannot find anything in the docs
 backing this up, however it would make sense for it to be so. That being
 said, in my setup I would not include my arbiter or my third node in the
 oVirt VM cluster component; I would keep it completely separate.
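For comparison, the two layouts discussed would be created roughly like this (hostnames and brick paths are placeholders; `replica 3 arbiter 1` is the documented arbiter syntax):

```shell
# Full replica 3: three complete data copies
gluster volume create engine replica 3 \
    node1:/gluster/engine/brick node2:/gluster/engine/brick node3:/gluster/engine/brick

# Replica 2 + arbiter: the third brick stores metadata only but still
# supplies the third quorum vote
gluster volume create data replica 3 arbiter 1 \
    node1:/gluster/data/brick node2:/gluster/data/brick node3:/gluster/data/brick
```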


 On Fri, Sep 1, 2017 at 2:46 PM, 

Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Charles Kozler
Jim -

result of this test... the engine crashed, but all VMs on the gluster domain
(backed by the same physical nodes/hardware/gluster processes/etc.) stayed up
fine

I guess there is some functional difference between 'backupvolfile-server'
and 'backup-volfile-servers'?

Perhaps try the latter and see what happens. My next test is going to be to
configure hosted-engine.conf with backupvolfile-server=node2:node3 and see
if the engine VM still shuts down. It seems odd that the engine VM would shut
itself down (or vdsm would shut it down) but not the other VMs. Perhaps
built-in HA functionality of sorts.


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Charles Kozler
Jim -

One thing I noticed is that, by accident, I used
'backupvolfile-server=node2:node3', which is apparently a supported setting.
It would appear, reading the man page of mount.glusterfs, that the syntax is
slightly different; not sure if my setting being different has different
impacts.

hosted-engine.conf:

# cat /etc/ovirt-hosted-engine/hosted-engine.conf | grep -i option
mnt_options=backup-volfile-servers=node2:node3

And for my datatest gluster domain I have:

backupvolfile-server=node2:node3

I am now curious what happens when I move everything to node1 and drop node2

To that end, will follow up with that test





Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Charles Kozler
Jim -

here is my test:

- All VM's on node2: hosted engine and 1 test VM
- Test VM on gluster storage domain (with mount options set)
- hosted engine is on gluster as well, with settings persisted to
hosted-engine.conf for backupvol

All VMs stayed up. Nothing in dmesg of the test VM indicated a pause or
any other issue.

However, what I did notice during this is that my /datatest volume doesn't
have quorum set. So I will set that now and report back what happens

# gluster volume info datatest

Volume Name: datatest
Type: Replicate
Volume ID: 229c25f9-405e-4fe7-b008-1d3aea065069
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node1:/gluster/data/datatest/brick1
Brick2: node2:/gluster/data/datatest/brick1
Brick3: node3:/gluster/data/datatest/brick1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on

Perhaps quorum is more trouble than it's worth when you have 3 nodes
and/or 2 nodes + arbiter?

Since I am keeping my 3rd node out of ovirt, I am content to keep it as a
warm spare in case I **had** to swap it into the ovirt cluster, while it
still keeps my storage at 100% quorum
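For anyone following along, the quorum settings I plan to apply are the standard gluster volume options (a sketch only — volume name from above; option names match those visible in the `gluster volume info data` output later in this thread):

```shell
# Client-side quorum: writes are allowed only while a majority of
# bricks in the replica set are reachable
gluster volume set datatest cluster.quorum-type auto

# Server-side quorum: glusterd stops bricks on a node that falls out
# of server quorum
gluster volume set datatest cluster.server-quorum-type server

# Confirm the options now appear under "Options Reconfigured"
gluster volume info datatest
```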


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Jim Kusznir
I can confirm that I did set it up manually, and I did specify backupvol,
and in the "manage domain" storage settings, I do have under mount
options, backup-volfile-servers=192.168.8.12:192.168.8.13  (and this was
done at initial install time).

The "used managed gluster" checkbox is NOT checked, and if I check it and
save settings, next time I go in it is not checked.

--Jim


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Charles Kozler
@ Jim - here is my setup which I will test in a few (brand new cluster) and
report back what I found in my tests

- 3x servers direct connected via 10Gb
- 2 of those 3 setup in ovirt as hosts
- Hosted engine
- Gluster replica 3 (no arbiter) for all volumes
- 1x engine volume gluster replica 3 manually configured (not using ovirt
managed gluster)
- 1x datatest volume (20gb) replica 3 manually configured (not using ovirt
managed gluster)
- 1x nfstest domain served from some other server in my infrastructure
which, at the time of my original testing, was master domain

I tested this earlier and all VMs stayed online. However, even while the
ovirt cluster reported the DC/cluster as down, all VMs stayed up

As I am now typing this, can you confirm you setup your gluster storage
domain with backupvol? Also, confirm you updated hosted-engine.conf with
backupvol mount option as well?
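To spell out the two checks being asked for here (a sketch — the conf path is the oVirt default, and the hostnames are placeholders, not taken from Jim's setup):

```shell
# 1. Storage domain: the backup servers should appear in the domain's
#    "Mount Options" field in the UI, e.g.
#    backup-volfile-servers=node2:node3

# 2. Hosted engine: the same option should be persisted in
#    hosted-engine.conf on every host
grep -i mnt_options /etc/ovirt-hosted-engine/hosted-engine.conf
# mnt_options=backup-volfile-servers=node2:node3
```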


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Jim Kusznir
So, after reading the first document twice and the 2nd link thoroughly
once, I believe that the arbiter volume should be sufficient and count
toward replica quorum / split-brain protection.  E.g., if any one full
replica is down, and the arbiter and the other replica are up, then it
should have quorum and all should be good.

I think my underlying problem has to do more with config than the replica
state.  That said, I did size the drive on my 3rd node planning to have an
identical copy of all data on it, so I'm still not opposed to making it a
full replica.

Did I miss something here?

Thanks!
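The majority math behind that reasoning can be sketched in shell (illustrative only — gluster's actual quorum logic lives in the client/AFR code; `quorum_needed` is a hypothetical helper, not a gluster command):

```shell
# Strict-majority quorum: more than half of the bricks must be up
quorum_needed() { echo $(( $1 / 2 + 1 )); }

quorum_needed 3   # -> 2: in a 1 x (2 + 1) volume, any two of
                  # {replica 1, replica 2, arbiter} keep the volume writable
```

So one full replica plus the arbiter (2 of 3) does satisfy quorum, matching the expectation above.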


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Jim Kusznir
Thank you!

I created my cluster following these instructions:

https://www.ovirt.org/blog/2016/08/up-and-running-with-ovirt-4-0-and-gluster-storage/

(I built it about 10 months ago)

I used their recipe for automated gluster node creation.  Originally I
thought I had 3 replicas, then I started realizing that node 3's disk usage
was essentially nothing compared to node 1 and 2, and eventually on this
list discovered that I had an arbiter.  Currently I am running on a 1Gbps
backbone, but I can dedicate a gig port (or even do bonded gig -- my
servers have 4 1Gbps interfaces, and my switch is only used for this
cluster, so it has the ports to hook them all up).  I am planning on a
10gbps upgrade once I bring in some more cash to pay for it.

Last night, node 2 and 3 were up, and I rebooted node 1 for updates.  As
soon as it shut down, my cluster halted (including the hosted engine), and
everything went messy.  When the node came back up, I still had to recover
the hosted engine via command line, then could go in and start unpausing my
VMs.  I'm glad it happened at 8pm at night...That would have been very ugly
if it happened during the day.  I had thought I had enough redundancy in
the cluster that I could take down any 1 node and not have an issue...That
definitely is not what happened.

--Jim
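For reference, "recover the hosted engine via command line" typically means the hosted-engine helper on one of the surviving hosts (a sketch — check the --vm-status output before starting anything, and flags can differ between ovirt versions):

```shell
# See which host the HA agents think should run the engine VM
hosted-engine --vm-status

# If global maintenance was tripped during the outage, clear it
hosted-engine --set-maintenance --mode=none

# Start the engine VM manually if the agents don't bring it back
hosted-engine --vm-start
```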


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread WK



On 9/1/2017 8:53 AM, Jim Kusznir wrote:
Huh...Ok., how do I convert the arbiter to a full replica, then?  I was
misinformed when I created this setup.  I thought the arbiter held
enough metadata that it could validate or refute any one replica
(kind of like the parity drive for a RAID-4 array).  I was also under
the impression that one replica + arbiter is enough to keep the
array online and functional.


I cannot speak for the oVirt implementation of Rep2+Arbiter as I've not
used it, but on a standalone libvirt VM host cluster, the Arb does exactly
what you want. You can lose one of the two replicas and stay online;
the Arb maintains quorum. Of course, if you lose the second replica
before you have repaired the first failure, you have completely lost your
data, as the Arb doesn't hold it. So Rep2+Arb is not as SAFE as Rep3;
however, it can be faster, especially on less-than-10G networks.


When any node fails, Gluster will pause for 42 seconds or so (its 
configurable) before marking the bad node as bad. Then normal activity 
will resume.


On most people's systems, the 'pause' (I think it's a read-only event)
is noticeable, but not enough to cause issues. One person has reported
that his VMs went read-only during that period, but others have not
reported that.


-wk
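The ~42-second window WK mentions corresponds to gluster's network.ping-timeout volume option (a sketch — Jim's volumes earlier in the thread already show it lowered to 30):

```shell
# Inspect the current timeout (the default is 42 seconds)
gluster volume get data network.ping-timeout

# It can be lowered, with care: too-small values risk spurious brick
# disconnects under transient load or network hiccups
gluster volume set data network.ping-timeout 30
```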
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Charles Kozler
These can get a little confusing but this explains it best:
https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#replica-2-and-replica-3-volumes

Basically, in the first paragraph they explain why you can't have HA
with quorum for 2 nodes. Here is another overview doc that explains some
more

http://openmymind.net/Does-My-Replica-Set-Need-An-Arbiter/

From my understanding, the arbiter is good for resolving split brains. Quorum
and arbiter are two different things, though: quorum is a mechanism to help
you **avoid** split brain, and the arbiter is to help gluster resolve split
brain by voting and other internal mechanics (as outlined in link 1). How
did you create the volume exactly - what command? It looks to me like you
created it with 'gluster volume create replica 2 arbiter 1 {}' per your
earlier mention of "replica 2 arbiter 1". That being said, if you did that
and then set up quorum in the volume configuration, this would cause your
gluster to halt since quorum was lost (as you saw, until you recovered
node 1)

As you can see from the docs, there is still a corner case for getting into
split brain with replica 3, which, again, is where the arbiter would help
gluster resolve it

I need to amend my previous statement: I was told that the arbiter volume does
not store data, only metadata. I cannot find anything in the docs backing
this up; however, it would make sense for it to be so. That being said, in my
setup, I would not include my arbiter or my third node in my ovirt VM
cluster component. I would keep it completely separate
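For comparison, the create commands for the two layouts under discussion look roughly like this (hostnames and brick paths are placeholders; newer gluster releases spell the arbiter form "replica 3 arbiter 1"):

```shell
# Full replica 3: all three bricks hold a complete copy of the data
gluster volume create datavol replica 3 \
    node1:/bricks/datavol node2:/bricks/datavol node3:/bricks/datavol

# Replica with arbiter: the third brick stores only file metadata,
# enough to break ties in a split brain but not to serve data
gluster volume create datavol replica 3 arbiter 1 \
    node1:/bricks/datavol node2:/bricks/datavol node3:/bricks/arbiter
```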


On Fri, Sep 1, 2017 at 2:46 PM, Jim Kusznir  wrote:

> I'm now also confused as to what the point of an arbiter is / what it does
> / why one would use it.
>
> On Fri, Sep 1, 2017 at 11:44 AM, Jim Kusznir  wrote:
>
>> Thanks for the help!
>>
>> Here's my gluster volume info for the data export/brick (I have 3: data,
>> engine, and iso, but they're all configured the same):
>>
>> Volume Name: data
>> Type: Replicate
>> Volume ID: e670c488-ac16-4dd1-8bd3-e43b2e42cc59
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: ovirt1.nwfiber.com:/gluster/brick2/data
>> Brick2: ovirt2.nwfiber.com:/gluster/brick2/data
>> Brick3: ovirt3.nwfiber.com:/gluster/brick2/data (arbiter)
>> Options Reconfigured:
>> performance.strict-o-direct: on
>> nfs.disable: on
>> user.cifs: off
>> network.ping-timeout: 30
>> cluster.shd-max-threads: 8
>> cluster.shd-wait-qlength: 1
>> cluster.locking-scheme: granular
>> cluster.data-self-heal-algorithm: full
>> performance.low-prio-threads: 32
>> features.shard-block-size: 512MB
>> features.shard: on
>> storage.owner-gid: 36
>> storage.owner-uid: 36
>> cluster.server-quorum-type: server
>> cluster.quorum-type: auto
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> performance.stat-prefetch: off
>> performance.io-cache: off
>> performance.read-ahead: off
>> performance.quick-read: off
>> performance.readdir-ahead: on
>> server.allow-insecure: on
>> [root@ovirt1 ~]#
>>
>>
>> all 3 of my brick nodes ARE also members of the virtualization cluster
>> (including ovirt3).  How can I convert it into a full replica instead of
>> just an arbiter?
>>
>> Thanks!
>> --Jim
>>
>> On Fri, Sep 1, 2017 at 9:09 AM, Charles Kozler 
>> wrote:
>>
>>> @Kasturi - Looks good now. Cluster showed down for a moment but VM's
>>> stayed up in their appropriate places. Thanks!
>>>
>>> < Anyone on this list please feel free to correct my response to Jim if
>>> its wrong>
>>>
>>> @ Jim - If you can share your gluster volume info / status I can confirm
>>> (to the best of my knowledge). From my understanding, If you setup the
>>> volume with something like 'gluster volume set  group virt' this will
>>> configure some quorum options as well, Ex: http://i.imgur.com/Mya4N5o
>>> .png
>>>
>>> While, yes, you are configured for arbiter node you're still losing
>>> quorum by dropping from 2 -> 1. You would need 4 node with 1 being arbiter
>>> to configure quorum which is in effect 3 writable nodes and 1 arbiter. If
>>> one gluster node drops, you still have 2 up. Although in this case, you
>>> probably wouldnt need arbiter at all
>>>
>>> If you are configured, you can drop quorum settings and just let arbiter
>>> run since you're not using arbiter node in your VM cluster part (I
>>> believe), just storage cluster part. When using quorum, you need > 50% of
>>> the cluster being up at one time. Since you have 3 nodes with 1 arbiter,
>>> you're actually losing 1/2 which == 50 which == degraded / hindered gluster
>>>
>>> Again, this is to the best of my knowledge based on other quorum backed
>>> softwareand this is what I understand from testing with gluster and
>>> ovirt thus far
>>>
>>> On Fri, Sep 1, 2017 at 11:53 AM, Jim Kusznir 
>>> wrote:
>>>
 Huh...Ok., how do I convert the arbitrar to 

Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Jim Kusznir
I'm now also confused as to what the point of an arbiter is / what it does
/ why one would use it.

Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Jim Kusznir
Thanks for the help!

Here's my gluster volume info for the data export/brick (I have 3: data,
engine, and iso, but they're all configured the same):

Volume Name: data
Type: Replicate
Volume ID: e670c488-ac16-4dd1-8bd3-e43b2e42cc59
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt1.nwfiber.com:/gluster/brick2/data
Brick2: ovirt2.nwfiber.com:/gluster/brick2/data
Brick3: ovirt3.nwfiber.com:/gluster/brick2/data (arbiter)
Options Reconfigured:
performance.strict-o-direct: on
nfs.disable: on
user.cifs: off
network.ping-timeout: 30
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 1
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
performance.low-prio-threads: 32
features.shard-block-size: 512MB
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
server.allow-insecure: on
[root@ovirt1 ~]#


all 3 of my brick nodes ARE also members of the virtualization cluster
(including ovirt3).  How can I convert it into a full replica instead of
just an arbiter?

Thanks!
--Jim
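
Converting the arbiter brick above to a full data brick would, in principle, be a remove-brick/add-brick cycle. The sketch below only prints the commands rather than executing them, since they are destructive and the new brick path is an assumption (gluster will not re-add a brick at its old path without cleanup):

```shell
# Hypothetical sketch: shrink to replica 2 (dropping the arbiter), then
# re-expand to replica 3 with a real data brick. Commands are printed, not run.
VOL=data
ARBITER=ovirt3.nwfiber.com:/gluster/brick2/data
NEWBRICK=ovirt3.nwfiber.com:/gluster/brick3/data   # fresh path on ovirt3 (assumed)
echo "gluster volume remove-brick $VOL replica 2 $ARBITER force"
echo "gluster volume add-brick $VOL replica 3 $NEWBRICK"
```

After the add-brick, a full heal would be needed so the new brick receives actual file data, not just metadata.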

On Fri, Sep 1, 2017 at 9:09 AM, Charles Kozler  wrote:

> @Kasturi - Looks good now. Cluster showed down for a moment but VM's
> stayed up in their appropriate places. Thanks!
>
> < Anyone on this list please feel free to correct my response to Jim if
> its wrong>
>
> @ Jim - If you can share your gluster volume info / status I can confirm
> (to the best of my knowledge). From my understanding, If you setup the
> volume with something like 'gluster volume set  group virt' this will
> configure some quorum options as well, Ex: http://i.imgur.com/Mya4N5o.png
>
> While, yes, you are configured for arbiter node you're still losing quorum
> by dropping from 2 -> 1. You would need 4 node with 1 being arbiter to
> configure quorum which is in effect 3 writable nodes and 1 arbiter. If one
> gluster node drops, you still have 2 up. Although in this case, you
> probably wouldnt need arbiter at all
>
> If you are configured, you can drop quorum settings and just let arbiter
> run since you're not using arbiter node in your VM cluster part (I
> believe), just storage cluster part. When using quorum, you need > 50% of
> the cluster being up at one time. Since you have 3 nodes with 1 arbiter,
> you're actually losing 1/2 which == 50 which == degraded / hindered gluster
>
> Again, this is to the best of my knowledge based on other quorum backed
> softwareand this is what I understand from testing with gluster and
> ovirt thus far
>
> On Fri, Sep 1, 2017 at 11:53 AM, Jim Kusznir  wrote:
>
>> Huh...Ok., how do I convert the arbitrar to full replica, then?  I was
>> misinformed when I created this setup.  I thought the arbitrator held
>> enough metadata that it could validate or refudiate  any one replica (kinda
>> like the parity drive for a RAID-4 array).  I was also under the impression
>> that one replica  + Arbitrator is enough to keep the array online and
>> functional.
>>
>> --Jim
>>
>> On Fri, Sep 1, 2017 at 5:22 AM, Charles Kozler 
>> wrote:
>>
>>> @ Jim - you have only two data volumes and lost quorum. Arbitrator only
>>> stores metadata, no actual files. So yes, you were running in degraded mode
>>> so some operations were hindered.
>>>
>>> @ Sahina - Yes, this actually worked fine for me once I did that.
>>> However, the issue I am still facing, is when I go to create a new gluster
>>> storage domain (replica 3, hyperconverged) and I tell it "Host to use" and
>>> I select that host. If I fail that host, all VMs halt. I do not recall this
>>> in 3.6 or early 4.0. This to me makes it seem like this is "pinning" a node
>>> to a volume and vice versa like you could, for instance, for a singular
>>> hyperconverged to ex: export a local disk via NFS and then mount it via
>>> ovirt domain. But of course, this has its caveats. To that end, I am using
>>> gluster replica 3, when configuring it I say "host to use: " node 1, then
>>> in the connection details I give it node1:/data. I fail node1, all VMs
>>> halt. Did I miss something?
>>>
>>> On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose  wrote:
>>>
 To the OP question, when you set up a gluster storage domain, you need
 to specify backup-volfile-servers=: where server2
 and server3 also have bricks running. When server1 is down, and the volume
 is mounted again - server2 or server3 are queried to get the gluster
 volfiles.

 @Jim, if this does not work, are you using 4.1.5 build with libgfapi
 access? If not, please provide the vdsm and gluster mount logs to analyse

 If VMs go to paused state - this could mean the storage is not
 

Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Charles Kozler
@Kasturi - Looks good now. Cluster showed down for a moment but VMs stayed
up in their appropriate places. Thanks!

< Anyone on this list please feel free to correct my response to Jim if
it's wrong>

@ Jim - If you can share your gluster volume info / status I can confirm
(to the best of my knowledge). From my understanding, if you set up the
volume with something like 'gluster volume set  group virt' this will
configure some quorum options as well, Ex: http://i.imgur.com/Mya4N5o.png

While, yes, you are configured for an arbiter node, you're still losing quorum
by dropping from 2 -> 1. You would need 4 nodes with 1 being an arbiter to
keep quorum, which is in effect 3 writable nodes and 1 arbiter. If one
gluster node drops, you still have 2 up. Although in this case, you
probably wouldn't need an arbiter at all.

If you are configured that way, you can drop the quorum settings and just let
the arbiter run, since you're not using the arbiter node in your VM cluster
part (I believe), just the storage cluster part. When using quorum, you need
> 50% of the cluster up at one time. Since you have 3 nodes with 1 arbiter,
you're actually dropping to 1/2, which == 50%, which == degraded / hindered
gluster.

Again, this is to the best of my knowledge based on other quorum-backed
software, and this is what I understand from testing with gluster and
oVirt thus far.
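
The > 50% rule described here can be sketched with trivial shell arithmetic (an illustration only; the node counts match the two cases discussed, a replica 2 + arbiter volume with 3 voters versus a plain replica 2 with 2 voters):

```shell
# Majority-quorum arithmetic: writes are allowed only while the nodes that
# are up form a strict majority of all voters.
quorum() {  # usage: quorum <voters-up> <total-voters>
  if [ $(( $1 * 2 )) -gt "$2" ]; then echo "quorum held"; else echo "quorum lost"; fi
}
quorum 2 3   # replica 2 + arbiter, one node down: 2/3 is a majority
quorum 1 2   # plain replica 2, one node down: 1/2 is exactly 50%, not a majority
```

This is why losing one of two data nodes degrades the volume even though an arbiter would have kept a 3-voter majority intact.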


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Jim Kusznir
Speaking of the "use managed gluster" option: I created this gluster setup
under ovirt 4.0, when that wasn't there.  I've gone into my settings and
checked the box and saved it at least twice, but when I go back into the
storage settings, it's not checked again.

The "about" box in the GUI reports that I'm using this version: oVirt
Engine Version: 4.1.1.8-1.el7.centos

I thought I was staying up to date, but I'm not sure if I'm doing
everything right on the upgrade... The documentation says to click for
hosted engine upgrade instructions, which takes me to a page-not-found
error... For several versions now, and I haven't found those instructions,
so I've been "winging it".

--Jim


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Jim Kusznir
Huh... OK, how do I convert the arbiter to a full replica, then?  I was
misinformed when I created this setup.  I thought the arbiter held
enough metadata that it could validate or repudiate any one replica (kinda
like the parity drive for a RAID-4 array).  I was also under the impression
that one replica + arbiter is enough to keep the array online and
functional.

--Jim

On Fri, Sep 1, 2017 at 5:22 AM, Charles Kozler  wrote:

> @ Jim - you have only two data volumes and lost quorum. The arbiter only
> stores metadata, no actual files. So yes, you were running in degraded mode,
> so some operations were hindered.
>
> @ Sahina - Yes, this actually worked fine for me once I did that. However,
> the issue I am still facing is when I go to create a new gluster storage
> domain (replica 3, hyperconverged) and I tell it "Host to use" and I select
> that host. If I fail that host, all VMs halt. I do not recall this in 3.6
> or early 4.0. This to me makes it seem like this is "pinning" a node to a
> volume and vice versa, like you could, for instance, with a single-node
> hyperconverged setup: export a local disk via NFS and then mount it via an
> oVirt domain. But of course, this has its caveats. To that end, I am using
> gluster replica 3; when configuring it I say "host to use: " node 1, then
> in the connection details I give it node1:/data. I fail node1, all VMs
> halt. Did I miss something?
>
> On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose  wrote:
>
>> To the OP question, when you set up a gluster storage domain, you need to
>> specify backup-volfile-servers=: where server2 and
>> server3 also have bricks running. When server1 is down, and the volume is
>> mounted again - server2 or server3 are queried to get the gluster volfiles.
>>
>> @Jim, if this does not work, are you using 4.1.5 build with libgfapi
>> access? If not, please provide the vdsm and gluster mount logs to analyse
>>
>> If VMs go to paused state - this could mean the storage is not available.
>> You can check "gluster volume status " to see if at least 2 bricks
>> are running.
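
Sahina's two suggestions map to one mount option and one status command. The sketch below just prints the manual equivalents of what vdsm does (server names, volume name, and mount point here are assumptions, not from the thread):

```shell
# backup-volfile-servers tells the gluster FUSE client where else to fetch
# the volfile if the primary mount server is down at (re)mount time.
backups="server2:server3"
opts="backup-volfile-servers=$backups"
echo "mount -t glusterfs -o $opts server1:/data /mnt/data"
# Liveness check: at least 2 of the 3 bricks should show as online.
echo "gluster volume status data"
```

Note that the option only affects volfile retrieval at mount time; it is consumed by the client, which is consistent with it not showing up in `mount -l` output.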
>>
>> On Fri, Sep 1, 2017 at 11:31 AM, Johan Bernhardsson 
>> wrote:
>>
>>> If gluster drops in quorum so that it has fewer votes than it should, it
>>> will stop file operations until quorum is back to normal. If I remember
>>> right, you need two bricks writable for quorum to be met, and the
>>> arbiter only adds a vote to avoid split brain.
>>>
>>>
>>> Basically what you have is a raid5 solution without a spare. And when
>>> one disk dies it will run in degraded mode. And some raid systems will stop
>>> the raid until you have removed the disk or forced it to run anyway.
>>>
>>> You can read up on it here: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
>>>
>>> /Johan
>>>
>>> On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:
>>>
>>> Hi all:
>>>
>>> Sorry to hijack the thread, but I was about to start essentially the
>>> same thread.
>>>
>>> I have a 3-node cluster; all three are hosts and gluster nodes (replica
>>> 2 + arbiter).  I DO have the mnt_options=backup-volfile-servers= set:
>>>
>>> storage=192.168.8.11:/engine
>>> mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13
>>>
>>> I had an issue today where 192.168.8.11 went down.  ALL VMs immediately
>>> paused, including the engine (all VMs were running on host2:192.168.8.12).
>>> I couldn't get any gluster stuff working until host1 (192.168.8.11) was
>>> restored.
>>>
>>> What's wrong / what did I miss?
>>>
>>> (this was set up "manually" through the article on setting up a
>>> self-hosted gluster cluster back when 4.0 was new. I've upgraded it to 4.1
>>> since).
>>>
>>> Thanks!
>>> --Jim
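
For the hosted-engine domain, the same option lives in /etc/ovirt-hosted-engine/hosted-engine.conf, as in Jim's config above. The sketch below writes the two relevant lines to a scratch file so it is safe to dry-run anywhere; on a real host you would edit the actual file and restart the ovirt-ha-agent service:

```shell
# Sketch: the hosted-engine.conf lines once backup-volfile-servers is set
# (IPs taken from the config quoted above; scratch file, not the real path).
conf=$(mktemp)
cat > "$conf" <<'EOF'
storage=192.168.8.11:/engine
mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13
EOF
line=$(grep '^mnt_options=' "$conf")
echo "$line"
rm -f "$conf"
```

Note that with network.ping-timeout at its default of 42 seconds, VMs can still pause for that window when the mounted server dies; the backup servers only help when the volume is mounted again.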
>>>
>>>
>>> On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler 
>>> wrote:
>>>
>>> Typo..."Set it up and then failed that **HOST**"
>>>
>>> And upon that host going down, the storage domain went down. I only have
>>> the hosted-engine storage domain and this new one - is this why the DC
>>> went down and no SPM could be elected?
>>>
>>> I don't recall this working this way in early 4.0 or 3.6
>>>
>>> On Thu, Aug 31, 2017 at 3:30 PM, Charles Kozler 
>>> wrote:
>>>
>>> So I've tested this today and I failed a node. Specifically, I setup a
>>> glusterfs domain and selected "host to use: node1". Set it up and then
>>> failed that VM
>>>
>>> However, this did not work and the datacenter went down. My engine
>>> stayed up; it seems configuring a domain pinned to a single "host to use"
>>> will obviously cause it to fail.
>>>
>>> This seems counter-intuitive to the point of glusterfs or any redundant
>>> storage. If a single host has to be tied to its function, this introduces a
>>> single point of failure
>>>
>>> Am I missing something obvious?
>>>
>>> On Thu, Aug 31, 2017 at 9:43 AM, Kasturi Narra 
>>> wrote:
>>>
>>> yes, right.  What you can do is edit the 

Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Kasturi Narra
Yes, that is the same option I was asking about. Apologies that I had
mentioned a different name.

So, oVirt will automatically detect it if you select the option 'use
managed gluster volume'. While adding a storage domain, after specifying the
host you can just select the checkbox; it will list all the volumes
managed from the oVirt UI and fill in the mount options for you.




Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Charles Kozler
Are you referring to "Mount Options" - > http://i.imgur.com/bYfbyzz.png

Then no, but that would explain why it wasn't working :-). I guess I had a
silly assumption that oVirt would have detected it and automatically taken
up the redundancy that was configured inside the replica set / brick
detection.

I will test and let you know

Thanks!


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Kasturi Narra
Hi Charles,

  One question: while configuring the storage domain you set "host to use:"
to node1, then in the connection details you give node1:/data. What about the
backup-volfile-servers option in the UI while configuring the storage domain?
Are you specifying that too?

Thanks
kasturi



Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Charles Kozler
@ Jim - you have only two data bricks and lost quorum. The arbiter only
stores metadata, no actual file data. So yes, you were running in degraded
mode and some operations were blocked.

@ Sahina - Yes, this worked fine for me once I did that. However, the issue
I am still facing is when I go to create a new gluster storage domain
(replica 3, hyperconverged): I tell it "Host to use" and select a host, and
if I then fail that host, all VMs halt. I do not recall this in 3.6 or early
4.0. It makes it seem like the domain is "pinning" a node to a volume and
vice versa - like a single-host setup that exports a local disk via NFS and
mounts it as an oVirt domain, with all the caveats that implies. Concretely:
I am using gluster replica 3; when configuring the domain I set "host to
use:" node1, and in the connection details I give it node1:/data. I fail
node1, all VMs halt. Did I miss something?
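
A toy sketch of that majority rule (illustrative only; gluster enforces
quorum itself, this just shows the arithmetic):

```shell
# Toy majority check: with B bricks in the replica set and U of them up,
# writes are allowed only while U is a strict majority (U*2 > B).
# Replica 2 + arbiter has B=3, so 2 of the 3 bricks (data or arbiter)
# must be up for writes to continue.
quorum_ok() { [ $(( $2 * 2 )) -gt "$1" ] && echo yes || echo no; }
quorum_ok 3 2   # one brick down  -> yes (writes continue)
quorum_ok 3 1   # two bricks down -> no  (writes blocked)
```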


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Sahina Bose
To the OP question: when you set up a gluster storage domain, you need to
specify backup-volfile-servers=<server2>:<server3>, where server2 and
server3 also have bricks running. When server1 is down and the volume is
mounted again, server2 or server3 are queried to get the gluster volfiles.

@Jim, if this does not work, are you using the 4.1.5 build with libgfapi
access? If not, please provide the vdsm and gluster mount logs to analyse.

If VMs go to a paused state, this could mean the storage is not available.
You can check "gluster volume status <volname>" to see if at least 2 bricks
are running.
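
For reference, a minimal sketch of setting that option for the hosted-engine
mount, run here against a scratch file so it is safe to try anywhere. On a
real host you would edit /etc/ovirt-hosted-engine/hosted-engine.conf and,
per Kasturi's earlier note, restart ovirt-ha-agent and ovirt-ha-broker on
every node. n1/n2/n3 are placeholder hostnames:

```shell
# Demo on a scratch copy of hosted-engine.conf (swap in the real path on a
# host). n2/n3 stand in for the two backup gluster servers.
CONF=$(mktemp)
printf 'storage=n1:/engine\nmnt_options=\n' > "$CONF"
sed -i 's/^mnt_options=.*/mnt_options=backup-volfile-servers=n2:n3/' "$CONF"
grep '^mnt_options=' "$CONF"   # mnt_options=backup-volfile-servers=n2:n3
```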


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Johan Bernhardsson
If gluster drops below quorum, so that it has fewer votes than it should, it
will stop file operations until quorum is restored. If I remember it right
you need two bricks up for write quorum to be met; the arbiter is only a
vote to avoid split brain.

Basically what you have is like a RAID5 array without a spare: when one disk
dies it runs in degraded mode, and some RAID systems will stop the array
until you have removed the disk or forced it to run anyway.

You can read up on it here:
https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
/Johan

Re: [ovirt-users] hyperconverged question

2017-08-31 Thread Jim Kusznir
Hi all:

Sorry to hijack the thread, but I was about to start essentially the same
thread.

I have a 3 node cluster, all three are hosts and gluster nodes (replica 2 +
arbiter).  I DO have the mnt_options=backup-volfile-servers= set:

storage=192.168.8.11:/engine
mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13

I had an issue today where 192.168.8.11 went down.  ALL VMs immediately
paused, including the engine (all VMs were running on host2:192.168.8.12).
I couldn't get any gluster stuff working until host1 (192.168.8.11) was
restored.

What's wrong / what did I miss?

(this was set up "manually" through the article on setting up a self-hosted
gluster cluster back when 4.0 was new... I've upgraded it to 4.1 since).

Thanks!
--Jim
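
A quick way to sanity-check the configured value is to read it back out of
the conf file. This sketch rebuilds a sample conf matching the lines above;
on a real host, point sed at /etc/ovirt-hosted-engine/hosted-engine.conf:

```shell
# Build a sample conf matching the settings above, then extract the backup
# servers the agent would pass to the gluster mount.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
storage=192.168.8.11:/engine
mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13
EOF
sed -n 's/^mnt_options=backup-volfile-servers=//p' "$CONF"
# -> 192.168.8.12:192.168.8.13
```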



Re: [ovirt-users] hyperconverged question

2017-08-31 Thread Charles Kozler
Typo..."Set it up and then failed that **HOST**"

And upon that host going down, the storage domain went down. I only have the
hosted storage domain and this new one - is this why the DC went down and
no SPM could be elected?

I don't recall it working this way in early 4.0 or 3.6

On Thu, Aug 31, 2017 at 3:30 PM, Charles Kozler 
wrote:

> So I've tested this today and I failed a node. Specifically, I setup a
> glusterfs domain and selected "host to use: node1". Set it up and then
> failed that VM
>
> However, this did not work and the datacenter went down. My engine stayed
> up, however, it seems configuring a domain to pin to a host to use will
> obviously cause it to fail
>
> This seems counter-intuitive to the point of glusterfs or any redundant
> storage. If a single host has to be tied to its function, this introduces a
> single point of failure
>
> Am I missing something obvious?
>
> On Thu, Aug 31, 2017 at 9:43 AM, Kasturi Narra  wrote:
>
>> yes, right.  What you can do is edit the hosted-engine.conf file and
>> there is a parameter as shown below [1] and replace h2 and h3 with your
>> second and third storage servers. Then you will need to restart
>> ovirt-ha-agent and ovirt-ha-broker services in all the nodes .
>>
>> [1] 'mnt_options=backup-volfile-servers=<h2>:<h3>'
>>
>> On Thu, Aug 31, 2017 at 5:54 PM, Charles Kozler 
>> wrote:
>>
>>> Hi Kasturi -
>>>
>>> Thanks for feedback
>>>
>>> > If cockpit+gdeploy plugin would be have been used then that would
>>> have automatically detected glusterfs replica 3 volume created during
>>> Hosted Engine deployment and this question would not have been asked
>>>
>>> Actually, doing hosted-engine --deploy it too also auto detects
>>> glusterfs.  I know glusterfs fuse client has the ability to failover
>>> between all nodes in cluster, but I am still curious given the fact that I
>>> see in ovirt config node1:/engine (being node1 I set it to in hosted-engine
>>> --deploy). So my concern was to ensure and find out exactly how engine
>>> works when one node goes away and the fuse client moves over to the other
>>> node in the gluster cluster
>>>
>>> But you did somewhat answer my question, the answer seems to be no (as
>>> default) and I will have to use hosted-engine.conf and change the parameter
>>> as you list
>>>
>>> So I need to do something manual to create HA for engine on gluster? Yes?
>>>
>>> Thanks so much!
>>>
>>> On Thu, Aug 31, 2017 at 3:03 AM, Kasturi Narra 
>>> wrote:
>>>
 Hi,

During Hosted Engine setup question about glusterfs volume is being
 asked because you have setup the volumes yourself. If cockpit+gdeploy
 plugin would be have been used then that would have automatically detected
 glusterfs replica 3 volume created during Hosted Engine deployment and this
 question would not have been asked.

During new storage domain creation when glusterfs is selected there
 is a feature called 'use managed gluster volumes' and upon checking this
 all glusterfs volumes managed will be listed and you could choose the
 volume of your choice from the dropdown list.

 There is a conf file called /etc/hosted-engine/hosted-engine.conf
 where there is a parameter called backup-volfile-servers="h1:h2" and if one
 of the gluster node goes down engine uses this parameter to provide ha /
 failover.

  Hope this helps !!

 Thanks
 kasturi



 On Wed, Aug 30, 2017 at 8:09 PM, Charles Kozler 
 wrote:

> Hello -
>
> I have successfully created a hyperconverged hosted engine setup
> consisting of 3 nodes - 2 for VM's and the third purely for storage. I
> manually configured it all, did not use ovirt node or anything. Built the
> gluster volumes myself
>
> However, I noticed that when setting up the hosted engine and even
> when adding a new storage domain with glusterfs type, it still asks for
> hostname:/volumename
>
> This leads me to believe that if that one node goes down (ex:
> node1:/data), then ovirt engine wont be able to communicate with that
> volume because its trying to reach it on node 1 and thus, go down
>
> I know glusterfs fuse client can connect to all nodes to provide
> failover/ha but how does the engine handle this?
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>

>>>
>>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] hyperconverged question

2017-08-31 Thread Charles Kozler
So I've tested this today and I failed a node. Specifically, I setup a
glusterfs domain and selected "host to use: node1". Set it up and then
failed that VM

However, this did not work and the datacenter went down. My engine stayed
up, however, it seems configuring a domain to pin to a host to use will
obviously cause it to fail

This seems counter-intuitive to the point of glusterfs or any redundant
storage. If a single host has to be tied to its function, this introduces a
single point of failure

Am I missing something obvious?

 adding a new storage domain with glusterfs type, it still asks for
 hostname:/volumename

 This leads me to believe that if that one node goes down (ex:
 node1:/data), then ovirt engine wont be able to communicate with that
 volume because its trying to reach it on node 1 and thus, go down

 I know glusterfs fuse client can connect to all nodes to provide
 failover/ha but how does the engine handle this?



>>>
>>
>


Re: [ovirt-users] hyperconverged question

2017-08-31 Thread Kasturi Narra
Yes, right. What you can do is edit the hosted-engine.conf file, where
there is a parameter as shown below [1]; replace h2 and h3 with your second
and third storage servers. Then you will need to restart the ovirt-ha-agent
and ovirt-ha-broker services on all the nodes.

[1] 'mnt_options=backup-volfile-servers=h2:h3'
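The steps above can be sketched as a small script. This is a hedged
illustration only: the helper name `set_backup_servers` and the hostnames
n2:n3 are assumptions for the example, not oVirt-provided tooling.

```shell
#!/bin/sh
# Sketch of the manual change described above: set (or add) the
# mnt_options line in a hosted-engine.conf-style file.
# Helper name and example hostnames (n2:n3) are illustrative assumptions.
set_backup_servers() {
    conf="$1"      # path to hosted-engine.conf
    servers="$2"   # e.g. "n2:n3" -- second and third storage servers
    if grep -q '^mnt_options=' "$conf"; then
        # Replace an existing mnt_options line in place
        sed -i "s|^mnt_options=.*|mnt_options=backup-volfile-servers=${servers}|" "$conf"
    else
        # No mnt_options line yet: append one
        printf 'mnt_options=backup-volfile-servers=%s\n' "$servers" >> "$conf"
    fi
}

# Run on every host (as root), then restart the HA services:
#   set_backup_servers /etc/ovirt-hosted-engine/hosted-engine.conf n2:n3
#   systemctl restart ovirt-ha-broker ovirt-ha-agent
```

Restarting both ovirt-ha-broker and ovirt-ha-agent on each host is what
makes the agents pick up the new mount options, per the advice above.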

On Thu, Aug 31, 2017 at 5:54 PM, Charles Kozler 
wrote:

> Hi Kasturi -
>
> Thanks for feedback
>
> > If cockpit+gdeploy plugin would be have been used then that would have
> automatically detected glusterfs replica 3 volume created during Hosted
> Engine deployment and this question would not have been asked
>
> Actually, doing hosted-engine --deploy it too also auto detects
> glusterfs.  I know glusterfs fuse client has the ability to failover
> between all nodes in cluster, but I am still curious given the fact that I
> see in ovirt config node1:/engine (being node1 I set it to in hosted-engine
> --deploy). So my concern was to ensure and find out exactly how engine
> works when one node goes away and the fuse client moves over to the other
> node in the gluster cluster
>
> But you did somewhat answer my question, the answer seems to be no (as
> default) and I will have to use hosted-engine.conf and change the parameter
> as you list
>
> So I need to do something manual to create HA for engine on gluster? Yes?
>
> Thanks so much!
>
> On Thu, Aug 31, 2017 at 3:03 AM, Kasturi Narra  wrote:
>
>> Hi,
>>
>>During Hosted Engine setup question about glusterfs volume is being
>> asked because you have setup the volumes yourself. If cockpit+gdeploy
>> plugin would be have been used then that would have automatically detected
>> glusterfs replica 3 volume created during Hosted Engine deployment and this
>> question would not have been asked.
>>
>>During new storage domain creation when glusterfs is selected there is
>> a feature called 'use managed gluster volumes' and upon checking this all
>> glusterfs volumes managed will be listed and you could choose the volume of
>> your choice from the dropdown list.
>>
>> There is a conf file called /etc/hosted-engine/hosted-engine.conf
>> where there is a parameter called backup-volfile-servers="h1:h2" and if one
>> of the gluster node goes down engine uses this parameter to provide ha /
>> failover.
>>
>>  Hope this helps !!
>>
>> Thanks
>> kasturi
>>
>>
>>
>> On Wed, Aug 30, 2017 at 8:09 PM, Charles Kozler 
>> wrote:
>>
>>> Hello -
>>>
>>> I have successfully created a hyperconverged hosted engine setup
>>> consisting of 3 nodes - 2 for VM's and the third purely for storage. I
>>> manually configured it all, did not use ovirt node or anything. Built the
>>> gluster volumes myself
>>>
>>> However, I noticed that when setting up the hosted engine and even when
>>> adding a new storage domain with glusterfs type, it still asks for
>>> hostname:/volumename
>>>
>>> This leads me to believe that if that one node goes down (ex:
>>> node1:/data), then ovirt engine wont be able to communicate with that
>>> volume because its trying to reach it on node 1 and thus, go down
>>>
>>> I know glusterfs fuse client can connect to all nodes to provide
>>> failover/ha but how does the engine handle this?
>>>
>>>
>>>
>>
>


Re: [ovirt-users] hyperconverged question

2017-08-31 Thread Charles Kozler
Hi Kasturi -

Thanks for feedback

> If cockpit+gdeploy plugin would be have been used then that would have
automatically detected glusterfs replica 3 volume created during Hosted
Engine deployment and this question would not have been asked

Actually, hosted-engine --deploy also auto-detects glusterfs. I know the
glusterfs fuse client has the ability to fail over between all nodes in the
cluster, but I am still curious given that I see node1:/engine in the ovirt
config (node1 being what I set during hosted-engine --deploy). So my
concern was to find out exactly how the engine behaves when one node goes
away and the fuse client moves over to another node in the gluster cluster.

But you did somewhat answer my question: the answer seems to be no (by
default), and I will have to edit hosted-engine.conf and change the
parameter as you list.

So I need to do something manual to create HA for the engine on gluster, yes?

Thanks so much!

On Thu, Aug 31, 2017 at 3:03 AM, Kasturi Narra  wrote:

> Hi,
>
>During Hosted Engine setup question about glusterfs volume is being
> asked because you have setup the volumes yourself. If cockpit+gdeploy
> plugin would be have been used then that would have automatically detected
> glusterfs replica 3 volume created during Hosted Engine deployment and this
> question would not have been asked.
>
>During new storage domain creation when glusterfs is selected there is
> a feature called 'use managed gluster volumes' and upon checking this all
> glusterfs volumes managed will be listed and you could choose the volume of
> your choice from the dropdown list.
>
> There is a conf file called /etc/hosted-engine/hosted-engine.conf
> where there is a parameter called backup-volfile-servers="h1:h2" and if one
> of the gluster node goes down engine uses this parameter to provide ha /
> failover.
>
>  Hope this helps !!
>
> Thanks
> kasturi
>
>
>
> On Wed, Aug 30, 2017 at 8:09 PM, Charles Kozler 
> wrote:
>
>> Hello -
>>
>> I have successfully created a hyperconverged hosted engine setup
>> consisting of 3 nodes - 2 for VM's and the third purely for storage. I
>> manually configured it all, did not use ovirt node or anything. Built the
>> gluster volumes myself
>>
>> However, I noticed that when setting up the hosted engine and even when
>> adding a new storage domain with glusterfs type, it still asks for
>> hostname:/volumename
>>
>> This leads me to believe that if that one node goes down (ex:
>> node1:/data), then ovirt engine wont be able to communicate with that
>> volume because its trying to reach it on node 1 and thus, go down
>>
>> I know glusterfs fuse client can connect to all nodes to provide
>> failover/ha but how does the engine handle this?
>>
>>
>>
>


Re: [ovirt-users] hyperconverged question

2017-08-31 Thread Kasturi Narra
Hi,

   During Hosted Engine setup, the question about the glusterfs volume is
asked because you set up the volumes yourself. If the cockpit+gdeploy
plugin had been used, it would have automatically detected the glusterfs
replica 3 volume created during Hosted Engine deployment and this question
would not have been asked.

   During new storage domain creation, when glusterfs is selected there is
a feature called 'use managed gluster volumes'; upon checking this, all
managed glusterfs volumes will be listed and you can choose the volume of
your choice from the dropdown list.

There is a conf file called /etc/ovirt-hosted-engine/hosted-engine.conf
where there is a parameter called backup-volfile-servers="h1:h2", and if
one of the gluster nodes goes down the engine uses this parameter to
provide ha / failover.
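For context, that parameter maps onto the glusterfs fuse mount option of
the same name. A hedged illustration of the mount this effectively
produces; the hostnames h1/h2/h3 and the mount point are placeholders, not
values from this thread:

```shell
# Illustration only (run as root on a real host): mount the engine volume
# from h1, falling back to h2 or h3 for the volfile if h1 is unreachable.
# This option only covers the volfile fetch at mount time; once mounted,
# the fuse client talks to the bricks directly.
mount -t glusterfs -o backup-volfile-servers=h2:h3 \
    h1:/engine /rhev/data-center/mnt/glusterSD/h1:_engine
```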

 Hope this helps!

Thanks
kasturi



On Wed, Aug 30, 2017 at 8:09 PM, Charles Kozler 
wrote:

> Hello -
>
> I have successfully created a hyperconverged hosted engine setup
> consisting of 3 nodes - 2 for VM's and the third purely for storage. I
> manually configured it all, did not use ovirt node or anything. Built the
> gluster volumes myself
>
> However, I noticed that when setting up the hosted engine and even when
> adding a new storage domain with glusterfs type, it still asks for
> hostname:/volumename
>
> This leads me to believe that if that one node goes down (ex:
> node1:/data), then ovirt engine wont be able to communicate with that
> volume because its trying to reach it on node 1 and thus, go down
>
> I know glusterfs fuse client can connect to all nodes to provide
> failover/ha but how does the engine handle this?
>
>
>