Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Charles Kozler
Jim -

Result of this test: the engine crashed, but all VMs on the gluster domain
(backed by the same physical nodes/hardware/gluster processes/etc.) stayed up
fine.

I guess there is some functional difference between 'backupvolfile-server'
and 'backup-volfile-servers'?

Perhaps try the latter and see what happens. My next test is going to be to
configure hosted-engine.conf with backupvolfile-server=node2:node3 and see
whether the engine VM still shuts down. It seems odd that the engine VM would
shut itself down (or that vdsm would shut it down) but not the other VMs.
Perhaps it's built-in HA functionality of sorts.
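
For the record, here is roughly what the two spellings look like as plain
mounts (a sketch; node names are from my lab and /mnt/test is just a scratch
mount point):

# spelling I accidentally used on the datatest storage domain
mount -t glusterfs -o backupvolfile-server=node2:node3 node1:/datatest /mnt/test
# spelling documented in the mount.glusterfs man page (colon-separated list)
mount -t glusterfs -o backup-volfile-servers=node2:node3 node1:/datatest /mnt/test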

On Fri, Sep 1, 2017 at 7:38 PM, Charles Kozler  wrote:

> Jim -
>
> One thing I noticed is that, by accident, I used
> 'backupvolfile-server=node2:node3' which is apparently a supported
> setting. It would appear, by reading the man page of mount.glusterfs, the
> syntax is slightly different. not sure if my setting being different has
> different impacts
>
> hosted-engine.conf:
>
> # cat /etc/ovirt-hosted-engine/hosted-engine.conf | grep -i option
> mnt_options=backup-volfile-servers=node2:node3
>
> And for my datatest gluster domain I have:
>
> backupvolfile-server=node2:node3
>
> I am now curious what happens when I move everything to node1 and drop
> node2
>
> To that end, will follow up with that test
>
>
>
>
> On Fri, Sep 1, 2017 at 7:20 PM, Charles Kozler 
> wrote:
>
>> Jim -
>>
>> here is my test:
>>
>> - All VM's on node2: hosted engine and 1 test VM
>> - Test VM on gluster storage domain (with mount options set)
>> - hosted engine is on gluster as well, with settings persisted to
>> hosted-engine.conf for backupvol
>>
>> All VM's stayed up. Nothing in dmesg of the test vm indicating a pause or
>> an issue or anything
>>
>> However, what I did notice during this, is my /datatest volume doesnt
>> have quorum set. So I will set that now and report back what happens
>>
>> # gluster volume info datatest
>>
>> Volume Name: datatest
>> Type: Replicate
>> Volume ID: 229c25f9-405e-4fe7-b008-1d3aea065069
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: node1:/gluster/data/datatest/brick1
>> Brick2: node2:/gluster/data/datatest/brick1
>> Brick3: node3:/gluster/data/datatest/brick1
>> Options Reconfigured:
>> transport.address-family: inet
>> nfs.disable: on
>>
>> Perhaps quorum may be more trouble than its worth when you have 3 nodes
>> and/or 2 nodes + arbiter?
>>
>> Since I am keeping my 3rd node out of ovirt, I am more content on keeping
>> it as a warm spare if I **had** to swap it in to ovirt cluster, but keeps
>> my storage 100% quorum
>>
>> On Fri, Sep 1, 2017 at 5:18 PM, Jim Kusznir  wrote:
>>
>>> I can confirm that I did set it up manually, and I did specify
>>> backupvol, and in the "manage domain" storage settings, I do have under
>>> mount options, backup-volfile-servers=192.168.8.12:192.168.8.13  (and
>>> this was done at initial install time).
>>>
>>> The "used managed gluster" checkbox is NOT checked, and if I check it
>>> and save settings, next time I go in it is not checked.
>>>
>>> --Jim
>>>
>>> On Fri, Sep 1, 2017 at 2:08 PM, Charles Kozler 
>>> wrote:
>>>
 @ Jim - here is my setup which I will test in a few (brand new cluster)
 and report back what I found in my tests

 - 3x servers direct connected via 10Gb
 - 2 of those 3 setup in ovirt as hosts
 - Hosted engine
 - Gluster replica 3 (no arbiter) for all volumes
 - 1x engine volume gluster replica 3 manually configured (not using
 ovirt managed gluster)
 - 1x datatest volume (20gb) replica 3 manually configured (not using
 ovirt managed gluster)
 - 1x nfstest domain served from some other server in my infrastructure
 which, at the time of my original testing, was master domain

 I tested this earlier and all VMs stayed online. However, ovirt cluster
 reported DC/cluster down, all VM's stayed up

 As I am now typing this, can you confirm you setup your gluster storage
 domain with backupvol? Also, confirm you updated hosted-engine.conf with
 backupvol mount option as well?

 On Fri, Sep 1, 2017 at 4:22 PM, Jim Kusznir 
 wrote:

> So, after reading the first document twice and the 2nd link thoroughly
> once, I believe that the arbitrator volume should be sufficient and count
> for replica / split brain.  EG, if any one full replica is down, and the
> arbitrator and the other replica is up, then it should have quorum and all
> should be good.
>
> I think my underlying problem has to do more with config than the
> replica state.  That said, I did size the drive on my 3rd node planning to
> have an identical copy of all data on it, so I'm still not opposed to
> making it a full replica.
>
> Did I miss something here?
>
> Thanks!
>
> On Fri, 

Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Charles Kozler
Jim -

One thing I noticed is that, by accident, I used
'backupvolfile-server=node2:node3', which is apparently a supported setting.
Reading the man page of mount.glusterfs, the documented syntax is slightly
different; I am not sure whether my setting being different has a different
impact.

hosted-engine.conf:

# cat /etc/ovirt-hosted-engine/hosted-engine.conf | grep -i option
mnt_options=backup-volfile-servers=node2:node3

And for my datatest gluster domain I have:

backupvolfile-server=node2:node3

I am now curious what happens when I move everything to node1 and drop node2

To that end, will follow up with that test




On Fri, Sep 1, 2017 at 7:20 PM, Charles Kozler  wrote:

> Jim -
>
> here is my test:
>
> - All VM's on node2: hosted engine and 1 test VM
> - Test VM on gluster storage domain (with mount options set)
> - hosted engine is on gluster as well, with settings persisted to
> hosted-engine.conf for backupvol
>
> All VM's stayed up. Nothing in dmesg of the test vm indicating a pause or
> an issue or anything
>
> However, what I did notice during this, is my /datatest volume doesnt have
> quorum set. So I will set that now and report back what happens
>
> # gluster volume info datatest
>
> Volume Name: datatest
> Type: Replicate
> Volume ID: 229c25f9-405e-4fe7-b008-1d3aea065069
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: node1:/gluster/data/datatest/brick1
> Brick2: node2:/gluster/data/datatest/brick1
> Brick3: node3:/gluster/data/datatest/brick1
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
>
> Perhaps quorum may be more trouble than its worth when you have 3 nodes
> and/or 2 nodes + arbiter?
>
> Since I am keeping my 3rd node out of ovirt, I am more content on keeping
> it as a warm spare if I **had** to swap it in to ovirt cluster, but keeps
> my storage 100% quorum
>
> On Fri, Sep 1, 2017 at 5:18 PM, Jim Kusznir  wrote:
>
>> I can confirm that I did set it up manually, and I did specify backupvol,
>> and in the "manage domain" storage settings, I do have under mount
>> options, backup-volfile-servers=192.168.8.12:192.168.8.13  (and this was
>> done at initial install time).
>>
>> The "used managed gluster" checkbox is NOT checked, and if I check it and
>> save settings, next time I go in it is not checked.
>>
>> --Jim
>>
>> On Fri, Sep 1, 2017 at 2:08 PM, Charles Kozler 
>> wrote:
>>
>>> @ Jim - here is my setup which I will test in a few (brand new cluster)
>>> and report back what I found in my tests
>>>
>>> - 3x servers direct connected via 10Gb
>>> - 2 of those 3 setup in ovirt as hosts
>>> - Hosted engine
>>> - Gluster replica 3 (no arbiter) for all volumes
>>> - 1x engine volume gluster replica 3 manually configured (not using
>>> ovirt managed gluster)
>>> - 1x datatest volume (20gb) replica 3 manually configured (not using
>>> ovirt managed gluster)
>>> - 1x nfstest domain served from some other server in my infrastructure
>>> which, at the time of my original testing, was master domain
>>>
>>> I tested this earlier and all VMs stayed online. However, ovirt cluster
>>> reported DC/cluster down, all VM's stayed up
>>>
>>> As I am now typing this, can you confirm you setup your gluster storage
>>> domain with backupvol? Also, confirm you updated hosted-engine.conf with
>>> backupvol mount option as well?
>>>
>>> On Fri, Sep 1, 2017 at 4:22 PM, Jim Kusznir  wrote:
>>>
 So, after reading the first document twice and the 2nd link thoroughly
 once, I believe that the arbitrator volume should be sufficient and count
 for replica / split brain.  EG, if any one full replica is down, and the
 arbitrator and the other replica is up, then it should have quorum and all
 should be good.

 I think my underlying problem has to do more with config than the
 replica state.  That said, I did size the drive on my 3rd node planning to
 have an identical copy of all data on it, so I'm still not opposed to
 making it a full replica.

 Did I miss something here?

 Thanks!

 On Fri, Sep 1, 2017 at 11:59 AM, Charles Kozler 
 wrote:

> These can get a little confusing but this explains it best:
> https://gluster.readthedocs.io/en/latest/Administrator
> %20Guide/arbiter-volumes-and-quorum/#replica-2-and-replica-3-volumes
>
> Basically in the first paragraph they are explaining why you cant have
> HA with quorum for 2 nodes. Here is another overview doc that explains 
> some
> more
>
> http://openmymind.net/Does-My-Replica-Set-Need-An-Arbiter/
>
> From my understanding arbiter is good for resolving split brains.
> Quorum and arbiter are two different things though quorum is a mechanism 
> to
> help you **avoid** split brain and the arbiter is to help gluster resolve
> split brain 

Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Charles Kozler
Jim -

here is my test:

- All VM's on node2: hosted engine and 1 test VM
- Test VM on gluster storage domain (with mount options set)
- hosted engine is on gluster as well, with settings persisted to
hosted-engine.conf for backupvol

All VMs stayed up. Nothing in dmesg of the test VM indicated a pause or any
other issue.

However, what I did notice during this is that my /datatest volume doesn't have
quorum set. So I will set that now and report back what happens.

# gluster volume info datatest

Volume Name: datatest
Type: Replicate
Volume ID: 229c25f9-405e-4fe7-b008-1d3aea065069
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node1:/gluster/data/datatest/brick1
Brick2: node2:/gluster/data/datatest/brick1
Brick3: node3:/gluster/data/datatest/brick1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
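
For reference, this is the sort of change I mean by "setting quorum" (a sketch
using stock gluster options; the same keys show up on the virt-tuned volumes
later in this thread):

gluster volume set datatest cluster.quorum-type auto
gluster volume set datatest cluster.server-quorum-type server
# server-side quorum also wants a cluster-wide ratio if you enforce it
gluster volume set all cluster.server-quorum-ratio 51%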

Perhaps quorum may be more trouble than it's worth when you have 3 nodes
and/or 2 nodes + arbiter?

Since I am keeping my 3rd node out of oVirt, I am content keeping it as a warm
spare I could swap into the oVirt cluster if I **had** to, while it still keeps
my storage at 100% quorum.

On Fri, Sep 1, 2017 at 5:18 PM, Jim Kusznir  wrote:

> I can confirm that I did set it up manually, and I did specify backupvol,
> and in the "manage domain" storage settings, I do have under mount
> options, backup-volfile-servers=192.168.8.12:192.168.8.13  (and this was
> done at initial install time).
>
> The "used managed gluster" checkbox is NOT checked, and if I check it and
> save settings, next time I go in it is not checked.
>
> --Jim
>
> On Fri, Sep 1, 2017 at 2:08 PM, Charles Kozler 
> wrote:
>
>> @ Jim - here is my setup which I will test in a few (brand new cluster)
>> and report back what I found in my tests
>>
>> - 3x servers direct connected via 10Gb
>> - 2 of those 3 setup in ovirt as hosts
>> - Hosted engine
>> - Gluster replica 3 (no arbiter) for all volumes
>> - 1x engine volume gluster replica 3 manually configured (not using ovirt
>> managed gluster)
>> - 1x datatest volume (20gb) replica 3 manually configured (not using
>> ovirt managed gluster)
>> - 1x nfstest domain served from some other server in my infrastructure
>> which, at the time of my original testing, was master domain
>>
>> I tested this earlier and all VMs stayed online. However, ovirt cluster
>> reported DC/cluster down, all VM's stayed up
>>
>> As I am now typing this, can you confirm you setup your gluster storage
>> domain with backupvol? Also, confirm you updated hosted-engine.conf with
>> backupvol mount option as well?
>>
>> On Fri, Sep 1, 2017 at 4:22 PM, Jim Kusznir  wrote:
>>
>>> So, after reading the first document twice and the 2nd link thoroughly
>>> once, I believe that the arbitrator volume should be sufficient and count
>>> for replica / split brain.  EG, if any one full replica is down, and the
>>> arbitrator and the other replica is up, then it should have quorum and all
>>> should be good.
>>>
>>> I think my underlying problem has to do more with config than the
>>> replica state.  That said, I did size the drive on my 3rd node planning to
>>> have an identical copy of all data on it, so I'm still not opposed to
>>> making it a full replica.
>>>
>>> Did I miss something here?
>>>
>>> Thanks!
>>>
>>> On Fri, Sep 1, 2017 at 11:59 AM, Charles Kozler 
>>> wrote:
>>>
 These can get a little confusing but this explains it best:
 https://gluster.readthedocs.io/en/latest/Administrator
 %20Guide/arbiter-volumes-and-quorum/#replica-2-and-replica-3-volumes

 Basically in the first paragraph they are explaining why you cant have
 HA with quorum for 2 nodes. Here is another overview doc that explains some
 more

 http://openmymind.net/Does-My-Replica-Set-Need-An-Arbiter/

 From my understanding arbiter is good for resolving split brains.
 Quorum and arbiter are two different things though quorum is a mechanism to
 help you **avoid** split brain and the arbiter is to help gluster resolve
 split brain by voting and other internal mechanics (as outlined in link 1).
 How did you create the volume exactly - what command? It looks to me like
 you created it with 'gluster volume create replica 2 arbiter 1 {}' per
 your earlier mention of "replica 2 arbiter 1". That being said, if you did
 that and then setup quorum in the volume configuration, this would cause
 your gluster to halt up since quorum was lost (as you saw until you
 recovered node 1)

 As you can see from the docs, there is still a corner case for getting
 in to split brain with replica 3, which again, is where arbiter would help
 gluster resolve it

 I need to amend my previous statement: I was told that arbiter volume
 does not store data, only metadata. I cannot find anything in the docs
 backing this up however it would 

Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Jim Kusznir
I can confirm that I did set it up manually, and I did specify backupvol,
and in the "manage domain" storage settings, I do have under mount
options, backup-volfile-servers=192.168.8.12:192.168.8.13  (and this was
done at initial install time).

The "used managed gluster" checkbox is NOT checked, and if I check it and
save settings, next time I go in it is not checked.

--Jim

On Fri, Sep 1, 2017 at 2:08 PM, Charles Kozler  wrote:

> @ Jim - here is my setup which I will test in a few (brand new cluster)
> and report back what I found in my tests
>
> - 3x servers direct connected via 10Gb
> - 2 of those 3 setup in ovirt as hosts
> - Hosted engine
> - Gluster replica 3 (no arbiter) for all volumes
> - 1x engine volume gluster replica 3 manually configured (not using ovirt
> managed gluster)
> - 1x datatest volume (20gb) replica 3 manually configured (not using ovirt
> managed gluster)
> - 1x nfstest domain served from some other server in my infrastructure
> which, at the time of my original testing, was master domain
>
> I tested this earlier and all VMs stayed online. However, ovirt cluster
> reported DC/cluster down, all VM's stayed up
>
> As I am now typing this, can you confirm you setup your gluster storage
> domain with backupvol? Also, confirm you updated hosted-engine.conf with
> backupvol mount option as well?
>
> On Fri, Sep 1, 2017 at 4:22 PM, Jim Kusznir  wrote:
>
>> So, after reading the first document twice and the 2nd link thoroughly
>> once, I believe that the arbitrator volume should be sufficient and count
>> for replica / split brain.  EG, if any one full replica is down, and the
>> arbitrator and the other replica is up, then it should have quorum and all
>> should be good.
>>
>> I think my underlying problem has to do more with config than the replica
>> state.  That said, I did size the drive on my 3rd node planning to have an
>> identical copy of all data on it, so I'm still not opposed to making it a
>> full replica.
>>
>> Did I miss something here?
>>
>> Thanks!
>>
>> On Fri, Sep 1, 2017 at 11:59 AM, Charles Kozler 
>> wrote:
>>
>>> These can get a little confusing but this explains it best:
>>> https://gluster.readthedocs.io/en/latest/Administrator
>>> %20Guide/arbiter-volumes-and-quorum/#replica-2-and-replica-3-volumes
>>>
>>> Basically in the first paragraph they are explaining why you cant have
>>> HA with quorum for 2 nodes. Here is another overview doc that explains some
>>> more
>>>
>>> http://openmymind.net/Does-My-Replica-Set-Need-An-Arbiter/
>>>
>>> From my understanding arbiter is good for resolving split brains. Quorum
>>> and arbiter are two different things though quorum is a mechanism to help
>>> you **avoid** split brain and the arbiter is to help gluster resolve split
>>> brain by voting and other internal mechanics (as outlined in link 1). How
>>> did you create the volume exactly - what command? It looks to me like you
>>> created it with 'gluster volume create replica 2 arbiter 1 {}' per your
>>> earlier mention of "replica 2 arbiter 1". That being said, if you did that
>>> and then setup quorum in the volume configuration, this would cause your
>>> gluster to halt up since quorum was lost (as you saw until you recovered
>>> node 1)
>>>
>>> As you can see from the docs, there is still a corner case for getting
>>> in to split brain with replica 3, which again, is where arbiter would help
>>> gluster resolve it
>>>
>>> I need to amend my previous statement: I was told that arbiter volume
>>> does not store data, only metadata. I cannot find anything in the docs
>>> backing this up however it would make sense for it to be. That being said,
>>> in my setup, I would not include my arbiter or my third node in my ovirt VM
>>> cluster component. I would keep it completely separate
>>>
>>>
>>> On Fri, Sep 1, 2017 at 2:46 PM, Jim Kusznir  wrote:
>>>
 I'm now also confused as to what the point of an arbiter is / what it
 does / why one would use it.

 On Fri, Sep 1, 2017 at 11:44 AM, Jim Kusznir 
 wrote:

> Thanks for the help!
>
> Here's my gluster volume info for the data export/brick (I have 3:
> data, engine, and iso, but they're all configured the same):
>
> Volume Name: data
> Type: Replicate
> Volume ID: e670c488-ac16-4dd1-8bd3-e43b2e42cc59
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: ovirt1.nwfiber.com:/gluster/brick2/data
> Brick2: ovirt2.nwfiber.com:/gluster/brick2/data
> Brick3: ovirt3.nwfiber.com:/gluster/brick2/data (arbiter)
> Options Reconfigured:
> performance.strict-o-direct: on
> nfs.disable: on
> user.cifs: off
> network.ping-timeout: 30
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 1
> cluster.locking-scheme: granular
> 

Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Charles Kozler
@ Jim - here is my setup, which I will test in a few (brand new cluster) and
report back what I find in my tests

- 3x servers direct connected via 10Gb
- 2 of those 3 setup in ovirt as hosts
- Hosted engine
- Gluster replica 3 (no arbiter) for all volumes
- 1x engine volume gluster replica 3 manually configured (not using ovirt
managed gluster)
- 1x datatest volume (20gb) replica 3 manually configured (not using ovirt
managed gluster)
- 1x nfstest domain served from some other server in my infrastructure
which, at the time of my original testing, was master domain

I tested this earlier and all VMs stayed online. However, the oVirt cluster
reported the DC/cluster as down; all VMs stayed up.

As I am typing this, can you confirm you set up your gluster storage domain
with the backupvol option? Also, can you confirm you updated hosted-engine.conf
with the backupvol mount option as well?
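
(For reference, by "backupvol" I mean having the option in both places - a
sketch, with the values as in my lab:)

# oVirt UI: storage domain -> Manage Domain -> Mount Options
backup-volfile-servers=node2:node3

# on each host, for the hosted engine storage
# /etc/ovirt-hosted-engine/hosted-engine.conf
mnt_options=backup-volfile-servers=node2:node3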

On Fri, Sep 1, 2017 at 4:22 PM, Jim Kusznir  wrote:

> So, after reading the first document twice and the 2nd link thoroughly
> once, I believe that the arbitrator volume should be sufficient and count
> for replica / split brain.  EG, if any one full replica is down, and the
> arbitrator and the other replica is up, then it should have quorum and all
> should be good.
>
> I think my underlying problem has to do more with config than the replica
> state.  That said, I did size the drive on my 3rd node planning to have an
> identical copy of all data on it, so I'm still not opposed to making it a
> full replica.
>
> Did I miss something here?
>
> Thanks!
>
> On Fri, Sep 1, 2017 at 11:59 AM, Charles Kozler 
> wrote:
>
>> These can get a little confusing but this explains it best:
>> https://gluster.readthedocs.io/en/latest/Administrator
>> %20Guide/arbiter-volumes-and-quorum/#replica-2-and-replica-3-volumes
>>
>> Basically in the first paragraph they are explaining why you cant have HA
>> with quorum for 2 nodes. Here is another overview doc that explains some
>> more
>>
>> http://openmymind.net/Does-My-Replica-Set-Need-An-Arbiter/
>>
>> From my understanding arbiter is good for resolving split brains. Quorum
>> and arbiter are two different things though quorum is a mechanism to help
>> you **avoid** split brain and the arbiter is to help gluster resolve split
>> brain by voting and other internal mechanics (as outlined in link 1). How
>> did you create the volume exactly - what command? It looks to me like you
>> created it with 'gluster volume create replica 2 arbiter 1 {}' per your
>> earlier mention of "replica 2 arbiter 1". That being said, if you did that
>> and then setup quorum in the volume configuration, this would cause your
>> gluster to halt up since quorum was lost (as you saw until you recovered
>> node 1)
>>
>> As you can see from the docs, there is still a corner case for getting in
>> to split brain with replica 3, which again, is where arbiter would help
>> gluster resolve it
>>
>> I need to amend my previous statement: I was told that arbiter volume
>> does not store data, only metadata. I cannot find anything in the docs
>> backing this up however it would make sense for it to be. That being said,
>> in my setup, I would not include my arbiter or my third node in my ovirt VM
>> cluster component. I would keep it completely separate
>>
>>
>> On Fri, Sep 1, 2017 at 2:46 PM, Jim Kusznir  wrote:
>>
>>> I'm now also confused as to what the point of an arbiter is / what it
>>> does / why one would use it.
>>>
>>> On Fri, Sep 1, 2017 at 11:44 AM, Jim Kusznir 
>>> wrote:
>>>
 Thanks for the help!

 Here's my gluster volume info for the data export/brick (I have 3:
 data, engine, and iso, but they're all configured the same):

 Volume Name: data
 Type: Replicate
 Volume ID: e670c488-ac16-4dd1-8bd3-e43b2e42cc59
 Status: Started
 Snapshot Count: 0
 Number of Bricks: 1 x (2 + 1) = 3
 Transport-type: tcp
 Bricks:
 Brick1: ovirt1.nwfiber.com:/gluster/brick2/data
 Brick2: ovirt2.nwfiber.com:/gluster/brick2/data
 Brick3: ovirt3.nwfiber.com:/gluster/brick2/data (arbiter)
 Options Reconfigured:
 performance.strict-o-direct: on
 nfs.disable: on
 user.cifs: off
 network.ping-timeout: 30
 cluster.shd-max-threads: 8
 cluster.shd-wait-qlength: 1
 cluster.locking-scheme: granular
 cluster.data-self-heal-algorithm: full
 performance.low-prio-threads: 32
 features.shard-block-size: 512MB
 features.shard: on
 storage.owner-gid: 36
 storage.owner-uid: 36
 cluster.server-quorum-type: server
 cluster.quorum-type: auto
 network.remote-dio: enable
 cluster.eager-lock: enable
 performance.stat-prefetch: off
 performance.io-cache: off
 performance.read-ahead: off
 performance.quick-read: off
 performance.readdir-ahead: on
 server.allow-insecure: on
 [root@ovirt1 ~]#


 all 3 of my brick nodes ARE 

[ovirt-users] ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages General SSLEngine problem

2017-09-01 Thread Gary Balliet
Good day all.

Just playing with ovirt. New to it but seems quite good.

Single instance/nfs share/centos7/ovirt 4.1



Had a power outage and this error message is in my logs whilst trying to
activate a downed host.  The snippet below is from engine.log.

2017-09-01 13:32:03,092-07 INFO
 [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor)
[] Connecting to /192.168.1.147
2017-09-01 13:32:03,097-07 ERROR
[org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) []
Unable to process messages General SSLEngine problem
2017-09-01 13:32:04,547-07 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
(DefaultQuartzScheduler5) [77a871f9-4947-46c9-977f-db5f76cac358] Command
'GetAllVmStatsVDSCommand(HostName = DellServer,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='b8ceb86f-c4e1-4bbd-afad-5044ebe9eddd'})' execution failed:
VDSGenericException: VDSNetworkException: General SSLEngine problem
2017-09-01 13:32:04,547-07 INFO
 [org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher]
(DefaultQuartzScheduler5) [77a871f9-4947-46c9-977f-db5f76cac358] Failed to
fetch vms info for host 'DellServer' - skipping VMs monitoring.
2017-09-01 13:32:19,548-07 INFO
 [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor)
[] Connecting to /192.168.1.147
2017-09-01 13:32:19,552-07 ERROR
[org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) []
Unable to process messages General SSLEngine problem
2017-09-01 13:32:23,115-07 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler4) [77a871f9-4947-46c9-977f-db5f76cac358] EVENT_ID:
VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null,
Custom Event ID: -1, Message: VDSM DellServer command GetCapabilitiesVDS
failed: General SSLEngine problem
2017-09-01 13:32:23,115-07 INFO
 [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
(DefaultQuartzScheduler4) [77a871f9-4947-46c9-977f-db5f76cac358] Command
'org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand'
return value
'org.ovirt.engine.core.vdsbroker.vdsbroker.VDSInfoReturn@65b16430'
2017-09-01 13:32:23,115-07 INFO
 [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
(DefaultQuartzScheduler4) [77a871f9-4947-46c9-977f-db5f76cac358] HostName =
DellServer
2017-09-01 13:32:23,116-07 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
(DefaultQuartzScheduler4) [77a871f9-4947-46c9-977f-db5f76cac358] Command
'GetCapabilitiesVDSCommand(HostName = DellServer,
VdsIdAndVdsVDSCommandParametersBase:{runAsync='true',
hostId='b8ceb86f-c4e1-4bbd-afad-5044ebe9eddd',
vds='Host[DellServer,b8ceb86f-c4e1-4bbd-afad-5044ebe9eddd]'})' execution
failed: VDSGenericException: VDSNetworkException: General SSLEngine problem
2017-09-01 13:32:23,116-07 ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(DefaultQuartzScheduler4) [77a871f9-4947-46c9-977f-db5f76cac358] Failure to
refresh host 'DellServer' runtime info: VDSGenericException:
VDSNetworkException: General SSLEngine problem
2017-09-01 13:32:26,118-07 INFO
 [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor)
[] Connecting to /192.168.1.147
2017-09-01 13:32:26,122-07 ERROR
[org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) []
Unable to process messages General SSLEngine problem
2017-09-01 13:32:39,550-07 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
(DefaultQuartzScheduler1) [77a871f9-4947-46c9-977f-db5f76cac358] Command
'GetAllVmStatsVDSCommand(HostName = DellServer,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='b8ceb86f-c4e1-4bbd-afad-5044ebe9eddd'})' execution failed:
VDSGenericException: VDSNetworkException: General SSLEngine problem
2017-09-01 13:32:39,551-07 INFO
 [org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher]
(DefaultQuartzScheduler1) [77a871f9-4947-46c9-977f-db5f76cac358] Failed to
fetch vms info for host 'DellServer' - skipping VMs monitoring.
2017-09-01 13:32:46,125-07 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler7) [77a871f9-4947-46c9-977f-db5f76cac358] EVENT_ID:
VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null,
Custom Event ID: -1, Message: VDSM DellServer command GetCapabilitiesVDS
failed: General SSLEngine problem
2017-09-01 13:32:46,125-07 INFO
 [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
(DefaultQuartzScheduler7) [77a871f9-4947-46c9-977f-db5f76cac358] Command
'org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand'
return value
'org.ovirt.engine.core.vdsbroker.vdsbroker.VDSInfoReturn@64999c2a'
2017-09-01 13:32:46,125-07 INFO
 [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
(DefaultQuartzScheduler7) [77a871f9-4947-46c9-977f-db5f76cac358] HostName =
DellServer
2017-09-01 13:32:46,125-07 ERROR

Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Jim Kusznir
So, after reading the first document twice and the 2nd link thoroughly
once, I believe that the arbitrator volume should be sufficient and count
for replica / split-brain purposes. E.g., if any one full replica is down, and
the arbitrator and the other replica are up, then it should have quorum and all
should be good.

I think my underlying problem has to do more with config than the replica
state.  That said, I did size the drive on my 3rd node planning to have an
identical copy of all data on it, so I'm still not opposed to making it a
full replica.

Did I miss something here?

Thanks!

On Fri, Sep 1, 2017 at 11:59 AM, Charles Kozler 
wrote:

> These can get a little confusing but this explains it best:
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-
> volumes-and-quorum/#replica-2-and-replica-3-volumes
>
> Basically in the first paragraph they are explaining why you cant have HA
> with quorum for 2 nodes. Here is another overview doc that explains some
> more
>
> http://openmymind.net/Does-My-Replica-Set-Need-An-Arbiter/
>
> From my understanding arbiter is good for resolving split brains. Quorum
> and arbiter are two different things though quorum is a mechanism to help
> you **avoid** split brain and the arbiter is to help gluster resolve split
> brain by voting and other internal mechanics (as outlined in link 1). How
> did you create the volume exactly - what command? It looks to me like you
> created it with 'gluster volume create replica 2 arbiter 1 {}' per your
> earlier mention of "replica 2 arbiter 1". That being said, if you did that
> and then setup quorum in the volume configuration, this would cause your
> gluster to halt up since quorum was lost (as you saw until you recovered
> node 1)
>
> As you can see from the docs, there is still a corner case for getting in
> to split brain with replica 3, which again, is where arbiter would help
> gluster resolve it
>
> I need to amend my previous statement: I was told that arbiter volume does
> not store data, only metadata. I cannot find anything in the docs backing
> this up however it would make sense for it to be. That being said, in my
> setup, I would not include my arbiter or my third node in my ovirt VM
> cluster component. I would keep it completely separate
>
>
> On Fri, Sep 1, 2017 at 2:46 PM, Jim Kusznir  wrote:
>
>> I'm now also confused as to what the point of an arbiter is / what it
>> does / why one would use it.
>>
>> On Fri, Sep 1, 2017 at 11:44 AM, Jim Kusznir  wrote:
>>
>>> Thanks for the help!
>>>
>>> Here's my gluster volume info for the data export/brick (I have 3: data,
>>> engine, and iso, but they're all configured the same):
>>>
>>> Volume Name: data
>>> Type: Replicate
>>> Volume ID: e670c488-ac16-4dd1-8bd3-e43b2e42cc59
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x (2 + 1) = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: ovirt1.nwfiber.com:/gluster/brick2/data
>>> Brick2: ovirt2.nwfiber.com:/gluster/brick2/data
>>> Brick3: ovirt3.nwfiber.com:/gluster/brick2/data (arbiter)
>>> Options Reconfigured:
>>> performance.strict-o-direct: on
>>> nfs.disable: on
>>> user.cifs: off
>>> network.ping-timeout: 30
>>> cluster.shd-max-threads: 8
>>> cluster.shd-wait-qlength: 1
>>> cluster.locking-scheme: granular
>>> cluster.data-self-heal-algorithm: full
>>> performance.low-prio-threads: 32
>>> features.shard-block-size: 512MB
>>> features.shard: on
>>> storage.owner-gid: 36
>>> storage.owner-uid: 36
>>> cluster.server-quorum-type: server
>>> cluster.quorum-type: auto
>>> network.remote-dio: enable
>>> cluster.eager-lock: enable
>>> performance.stat-prefetch: off
>>> performance.io-cache: off
>>> performance.read-ahead: off
>>> performance.quick-read: off
>>> performance.readdir-ahead: on
>>> server.allow-insecure: on
>>> [root@ovirt1 ~]#
>>>
>>>
>>> all 3 of my brick nodes ARE also members of the virtualization cluster
>>> (including ovirt3).  How can I convert it into a full replica instead of
>>> just an arbiter?
>>>
>>> Thanks!
>>> --Jim
>>>
>>> On Fri, Sep 1, 2017 at 9:09 AM, Charles Kozler 
>>> wrote:
>>>
 @Kasturi - Looks good now. Cluster showed down for a moment but VM's
 stayed up in their appropriate places. Thanks!

 < Anyone on this list please feel free to correct my response to Jim if
 its wrong>

 @ Jim - If you can share your gluster volume info / status I can
 confirm (to the best of my knowledge). From my understanding, If you setup
 the volume with something like 'gluster volume set  group virt' this
 will configure some quorum options as well, Ex:
 http://i.imgur.com/Mya4N5o.png

 While, yes, you are configured for arbiter node you're still losing
 quorum by dropping from 2 -> 1. You would need 4 node with 1 being arbiter
 to configure quorum which is in effect 3 writable nodes and 1 arbiter. If
 one gluster node 

Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Jim Kusznir
Thank you!

I created my cluster following these instructions:

https://www.ovirt.org/blog/2016/08/up-and-running-with-ovirt-4-0-and-gluster-storage/

(I built it about 10 months ago)

I used their recipe for automated gluster node creation.  Originally I
thought I had 3 replicas, then I started realizing that node 3's disk usage
was essentially nothing compared to node 1 and 2, and eventually on this
list discovered that I had an arbiter.  Currently I am running on a 1Gbps
backbone, but I can dedicate a gig port (or even do bonded gig -- my
servers have 4 1Gbps interfaces, and my switch is only used for this
cluster, so it has the ports to hook them all up).  I am planning on a
10gbps upgrade once I bring in some more cash to pay for it.

Last night, nodes 2 and 3 were up, and I rebooted node 1 for updates.  As
soon as it shut down, my cluster halted (including the hosted engine), and
everything went messy.  When the node came back up, I still had to recover
the hosted engine via the command line, and then I could go in and start
unpausing my VMs.  I'm glad it happened at 8pm at night... that would have been
very ugly if it had happened during the day.  I had thought I had enough
redundancy in the cluster that I could take down any one node and not have an
issue... that definitely is not what happened.

--Jim

On Fri, Sep 1, 2017 at 11:59 AM, Charles Kozler 
wrote:

> These can get a little confusing but this explains it best:
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-
> volumes-and-quorum/#replica-2-and-replica-3-volumes
>
> Basically in the first paragraph they are explaining why you cant have HA
> with quorum for 2 nodes. Here is another overview doc that explains some
> more
>
> http://openmymind.net/Does-My-Replica-Set-Need-An-Arbiter/
>
> From my understanding arbiter is good for resolving split brains. Quorum
> and arbiter are two different things though quorum is a mechanism to help
> you **avoid** split brain and the arbiter is to help gluster resolve split
> brain by voting and other internal mechanics (as outlined in link 1). How
> did you create the volume exactly - what command? It looks to me like you
> created it with 'gluster volume create replica 2 arbiter 1 {}' per your
> earlier mention of "replica 2 arbiter 1". That being said, if you did that
> and then setup quorum in the volume configuration, this would cause your
> gluster to halt up since quorum was lost (as you saw until you recovered
> node 1)
>
> As you can see from the docs, there is still a corner case for getting in
> to split brain with replica 3, which again, is where arbiter would help
> gluster resolve it
>
> I need to amend my previous statement: I was told that arbiter volume does
> not store data, only metadata. I cannot find anything in the docs backing
> this up however it would make sense for it to be. That being said, in my
> setup, I would not include my arbiter or my third node in my ovirt VM
> cluster component. I would keep it completely separate
>
>
> On Fri, Sep 1, 2017 at 2:46 PM, Jim Kusznir  wrote:
>
>> I'm now also confused as to what the point of an arbiter is / what it
>> does / why one would use it.
>>
>> On Fri, Sep 1, 2017 at 11:44 AM, Jim Kusznir  wrote:
>>
>>> Thanks for the help!
>>>
>>> Here's my gluster volume info for the data export/brick (I have 3: data,
>>> engine, and iso, but they're all configured the same):
>>>
>>> Volume Name: data
>>> Type: Replicate
>>> Volume ID: e670c488-ac16-4dd1-8bd3-e43b2e42cc59
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x (2 + 1) = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: ovirt1.nwfiber.com:/gluster/brick2/data
>>> Brick2: ovirt2.nwfiber.com:/gluster/brick2/data
>>> Brick3: ovirt3.nwfiber.com:/gluster/brick2/data (arbiter)
>>> Options Reconfigured:
>>> performance.strict-o-direct: on
>>> nfs.disable: on
>>> user.cifs: off
>>> network.ping-timeout: 30
>>> cluster.shd-max-threads: 8
>>> cluster.shd-wait-qlength: 1
>>> cluster.locking-scheme: granular
>>> cluster.data-self-heal-algorithm: full
>>> performance.low-prio-threads: 32
>>> features.shard-block-size: 512MB
>>> features.shard: on
>>> storage.owner-gid: 36
>>> storage.owner-uid: 36
>>> cluster.server-quorum-type: server
>>> cluster.quorum-type: auto
>>> network.remote-dio: enable
>>> cluster.eager-lock: enable
>>> performance.stat-prefetch: off
>>> performance.io-cache: off
>>> performance.read-ahead: off
>>> performance.quick-read: off
>>> performance.readdir-ahead: on
>>> server.allow-insecure: on
>>> [root@ovirt1 ~]#
>>>
>>>
>>> all 3 of my brick nodes ARE also members of the virtualization cluster
>>> (including ovirt3).  How can I convert it into a full replica instead of
>>> just an arbiter?
>>>
>>> Thanks!
>>> --Jim
>>>
>>> On Fri, Sep 1, 2017 at 9:09 AM, Charles Kozler 
>>> wrote:
>>>
 @Kasturi - Looks good now. Cluster showed down for a moment but VM's

Re: [ovirt-users] hyperconverged question

2017-09-01 Thread WK



On 9/1/2017 8:53 AM, Jim Kusznir wrote:
Huh... OK, how do I convert the arbitrator to a full replica, then?  I was
misinformed when I created this setup.  I thought the arbitrator held
enough metadata that it could validate or repudiate any one replica
(kind of like the parity drive for a RAID-4 array).  I was also under
the impression that one replica + arbitrator is enough to keep the
array online and functional.


I cannot speak for the oVirt implementation of Rep2+Arbiter, as I've not
used it, but on a standalone libvirt VM host cluster the Arb does exactly
what you want. You can lose one of the two replicas and stay online;
the Arb maintains quorum. Of course, if you lose the second replica
before you have repaired the first failure, you have completely lost your
data, as the Arb doesn't have it. So Rep2+Arb is not as SAFE as Rep3;
however, it can be faster, especially on less-than-10G networks.


When any node fails, Gluster will pause for 42 seconds or so (it's
configurable) before marking the failed node as down. Then normal activity
will resume.
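
(That timer is gluster's network.ping-timeout; a quick sketch of checking and
lowering it on reasonably recent gluster - "myvol" is just a placeholder:)

gluster volume get myvol network.ping-timeout
gluster volume set myvol network.ping-timeout 30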


On most people's systems the 'pause' (I think it's a read-only event) is
noticeable, but not enough to cause an issue. One person has reported
that his VMs went read-only during that period, but others have not
reported that.


-wk
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Charles Kozler
These can get a little confusing but this explains it best:
https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#replica-2-and-replica-3-volumes

Basically, in the first paragraph they are explaining why you can't have HA
with quorum for 2 nodes. Here is another overview doc that explains some
more

http://openmymind.net/Does-My-Replica-Set-Need-An-Arbiter/

From my understanding, the arbiter is good for resolving split brains. Quorum
and arbiter are two different things, though: quorum is a mechanism to help
you **avoid** split brain, and the arbiter is to help gluster resolve split
brain by voting and other internal mechanics (as outlined in link 1). How
did you create the volume exactly - what command? It looks to me like you
created it with 'gluster volume create replica 2 arbiter 1 {}' per your
earlier mention of "replica 2 arbiter 1". That being said, if you did that
and then set up quorum in the volume configuration, this would cause your
gluster to halt since quorum was lost (as you saw until you recovered
node 1)
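
For comparison, the form the gluster admin guide documents for an arbiter
volume is 'replica 3 arbiter 1' followed by three bricks, the third of which
becomes the arbiter - roughly like this (brick paths are only illustrative):

gluster volume create data replica 3 arbiter 1 \
    node1:/gluster/brick2/data \
    node2:/gluster/brick2/data \
    node3:/gluster/brick2/data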

As you can see from the docs, there is still a corner case for getting in
to split brain with replica 3, which again, is where arbiter would help
gluster resolve it

I need to amend my previous statement: I was told that the arbiter volume does
not store data, only metadata. I cannot find anything in the docs backing
this up, but it would make sense for that to be the case. That being said, in
my setup, I would not include my arbiter or my third node in my oVirt VM
cluster component. I would keep it completely separate


On Fri, Sep 1, 2017 at 2:46 PM, Jim Kusznir  wrote:

> I'm now also confused as to what the point of an arbiter is / what it does
> / why one would use it.
>
> On Fri, Sep 1, 2017 at 11:44 AM, Jim Kusznir  wrote:
>
>> Thanks for the help!
>>
>> Here's my gluster volume info for the data export/brick (I have 3: data,
>> engine, and iso, but they're all configured the same):
>>
>> Volume Name: data
>> Type: Replicate
>> Volume ID: e670c488-ac16-4dd1-8bd3-e43b2e42cc59
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: ovirt1.nwfiber.com:/gluster/brick2/data
>> Brick2: ovirt2.nwfiber.com:/gluster/brick2/data
>> Brick3: ovirt3.nwfiber.com:/gluster/brick2/data (arbiter)
>> Options Reconfigured:
>> performance.strict-o-direct: on
>> nfs.disable: on
>> user.cifs: off
>> network.ping-timeout: 30
>> cluster.shd-max-threads: 8
>> cluster.shd-wait-qlength: 1
>> cluster.locking-scheme: granular
>> cluster.data-self-heal-algorithm: full
>> performance.low-prio-threads: 32
>> features.shard-block-size: 512MB
>> features.shard: on
>> storage.owner-gid: 36
>> storage.owner-uid: 36
>> cluster.server-quorum-type: server
>> cluster.quorum-type: auto
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> performance.stat-prefetch: off
>> performance.io-cache: off
>> performance.read-ahead: off
>> performance.quick-read: off
>> performance.readdir-ahead: on
>> server.allow-insecure: on
>> [root@ovirt1 ~]#
>>
>>
>> all 3 of my brick nodes ARE also members of the virtualization cluster
>> (including ovirt3).  How can I convert it into a full replica instead of
>> just an arbiter?
>>
>> Thanks!
>> --Jim
>>
>> On Fri, Sep 1, 2017 at 9:09 AM, Charles Kozler 
>> wrote:
>>
>>> @Kasturi - Looks good now. Cluster showed down for a moment but VM's
>>> stayed up in their appropriate places. Thanks!
>>>
>>> < Anyone on this list please feel free to correct my response to Jim if
>>> its wrong>
>>>
>>> @ Jim - If you can share your gluster volume info / status I can confirm
>>> (to the best of my knowledge). From my understanding, If you setup the
>>> volume with something like 'gluster volume set  group virt' this will
>>> configure some quorum options as well, Ex: http://i.imgur.com/Mya4N5o
>>> .png
>>>
>>> While, yes, you are configured for arbiter node you're still losing
>>> quorum by dropping from 2 -> 1. You would need 4 node with 1 being arbiter
>>> to configure quorum which is in effect 3 writable nodes and 1 arbiter. If
>>> one gluster node drops, you still have 2 up. Although in this case, you
>>> probably wouldnt need arbiter at all
>>>
>>> If you are configured, you can drop quorum settings and just let arbiter
>>> run since you're not using arbiter node in your VM cluster part (I
>>> believe), just storage cluster part. When using quorum, you need > 50% of
>>> the cluster being up at one time. Since you have 3 nodes with 1 arbiter,
>>> you're actually losing 1/2 which == 50 which == degraded / hindered gluster
>>>
>>> Again, this is to the best of my knowledge based on other quorum backed
>>> softwareand this is what I understand from testing with gluster and
>>> ovirt thus far
>>>
>>> On Fri, Sep 1, 2017 at 11:53 AM, Jim Kusznir 
>>> wrote:
>>>
 Huh...Ok., how do I convert the arbitrar to 

Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Jim Kusznir
I'm now also confused as to what the point of an arbiter is / what it does
/ why one would use it.

On Fri, Sep 1, 2017 at 11:44 AM, Jim Kusznir  wrote:

> Thanks for the help!
>
> Here's my gluster volume info for the data export/brick (I have 3: data,
> engine, and iso, but they're all configured the same):
>
> Volume Name: data
> Type: Replicate
> Volume ID: e670c488-ac16-4dd1-8bd3-e43b2e42cc59
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: ovirt1.nwfiber.com:/gluster/brick2/data
> Brick2: ovirt2.nwfiber.com:/gluster/brick2/data
> Brick3: ovirt3.nwfiber.com:/gluster/brick2/data (arbiter)
> Options Reconfigured:
> performance.strict-o-direct: on
> nfs.disable: on
> user.cifs: off
> network.ping-timeout: 30
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 1
> cluster.locking-scheme: granular
> cluster.data-self-heal-algorithm: full
> performance.low-prio-threads: 32
> features.shard-block-size: 512MB
> features.shard: on
> storage.owner-gid: 36
> storage.owner-uid: 36
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.stat-prefetch: off
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> performance.readdir-ahead: on
> server.allow-insecure: on
> [root@ovirt1 ~]#
>
>
> all 3 of my brick nodes ARE also members of the virtualization cluster
> (including ovirt3).  How can I convert it into a full replica instead of
> just an arbiter?
>
> Thanks!
> --Jim
>
> On Fri, Sep 1, 2017 at 9:09 AM, Charles Kozler 
> wrote:
>
>> @Kasturi - Looks good now. Cluster showed down for a moment but VM's
>> stayed up in their appropriate places. Thanks!
>>
>> < Anyone on this list please feel free to correct my response to Jim if
>> its wrong>
>>
>> @ Jim - If you can share your gluster volume info / status I can confirm
>> (to the best of my knowledge). From my understanding, If you setup the
>> volume with something like 'gluster volume set  group virt' this will
>> configure some quorum options as well, Ex: http://i.imgur.com/Mya4N5o.png
>>
>> While, yes, you are configured for arbiter node you're still losing
>> quorum by dropping from 2 -> 1. You would need 4 node with 1 being arbiter
>> to configure quorum which is in effect 3 writable nodes and 1 arbiter. If
>> one gluster node drops, you still have 2 up. Although in this case, you
>> probably wouldnt need arbiter at all
>>
>> If you are configured, you can drop quorum settings and just let arbiter
>> run since you're not using arbiter node in your VM cluster part (I
>> believe), just storage cluster part. When using quorum, you need > 50% of
>> the cluster being up at one time. Since you have 3 nodes with 1 arbiter,
>> you're actually losing 1/2 which == 50 which == degraded / hindered gluster
>>
>> Again, this is to the best of my knowledge based on other quorum backed
>> softwareand this is what I understand from testing with gluster and
>> ovirt thus far
>>
>> On Fri, Sep 1, 2017 at 11:53 AM, Jim Kusznir  wrote:
>>
>>> Huh... OK, how do I convert the arbitrator to a full replica, then?  I was
>>> misinformed when I created this setup.  I thought the arbitrator held
>>> enough metadata that it could validate or repudiate any one replica (kinda
>>> like the parity drive for a RAID-4 array).  I was also under the impression
>>> that one replica  + Arbitrator is enough to keep the array online and
>>> functional.
>>>
>>> --Jim
>>>
>>> On Fri, Sep 1, 2017 at 5:22 AM, Charles Kozler 
>>> wrote:
>>>
 @ Jim - you have only two data volumes and lost quorum. Arbitrator only
 stores metadata, no actual files. So yes, you were running in degraded mode
 so some operations were hindered.

 @ Sahina - Yes, this actually worked fine for me once I did that.
 However, the issue I am still facing, is when I go to create a new gluster
 storage domain (replica 3, hyperconverged) and I tell it "Host to use" and
 I select that host. If I fail that host, all VMs halt. I do not recall this
 in 3.6 or early 4.0. This to me makes it seem like this is "pinning" a node
 to a volume and vice versa like you could, for instance, for a singular
 hyperconverged to ex: export a local disk via NFS and then mount it via
 ovirt domain. But of course, this has its caveats. To that end, I am using
 gluster replica 3, when configuring it I say "host to use: " node 1, then
 in the connection details I give it node1:/data. I fail node1, all VMs
 halt. Did I miss something?

 On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose  wrote:

> To the OP question, when you set up a gluster storage domain, you need
> to specify backup-volfile-servers=<server2>:<server3>, where server2
> and server3 also have bricks running. When server1 is down, and the 

Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Jim Kusznir
Thanks for the help!

Here's my gluster volume info for the data export/brick (I have 3: data,
engine, and iso, but they're all configured the same):

Volume Name: data
Type: Replicate
Volume ID: e670c488-ac16-4dd1-8bd3-e43b2e42cc59
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt1.nwfiber.com:/gluster/brick2/data
Brick2: ovirt2.nwfiber.com:/gluster/brick2/data
Brick3: ovirt3.nwfiber.com:/gluster/brick2/data (arbiter)
Options Reconfigured:
performance.strict-o-direct: on
nfs.disable: on
user.cifs: off
network.ping-timeout: 30
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 1
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
performance.low-prio-threads: 32
features.shard-block-size: 512MB
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
server.allow-insecure: on
[root@ovirt1 ~]#


all 3 of my brick nodes ARE also members of the virtualization cluster
(including ovirt3).  How can I convert it into a full replica instead of
just an arbiter?

Thanks!
--Jim

On Fri, Sep 1, 2017 at 9:09 AM, Charles Kozler  wrote:

> @Kasturi - Looks good now. Cluster showed down for a moment but VM's
> stayed up in their appropriate places. Thanks!
>
> < Anyone on this list please feel free to correct my response to Jim if
> its wrong>
>
> @ Jim - If you can share your gluster volume info / status I can confirm
> (to the best of my knowledge). From my understanding, if you set up the
> volume with something like 'gluster volume set <volname> group virt', this
> will configure some quorum options as well, e.g.: http://i.imgur.com/Mya4N5o.png
>
> While, yes, you are configured for arbiter node you're still losing quorum
> by dropping from 2 -> 1. You would need 4 node with 1 being arbiter to
> configure quorum which is in effect 3 writable nodes and 1 arbiter. If one
> gluster node drops, you still have 2 up. Although in this case, you
> probably wouldnt need arbiter at all
>
> If you are configured, you can drop quorum settings and just let arbiter
> run since you're not using arbiter node in your VM cluster part (I
> believe), just storage cluster part. When using quorum, you need > 50% of
> the cluster being up at one time. Since you have 3 nodes with 1 arbiter,
> you're actually losing 1/2 which == 50 which == degraded / hindered gluster
>
> Again, this is to the best of my knowledge based on other quorum-backed
> software... and this is what I understand from testing with gluster and
> ovirt thus far
>
> On Fri, Sep 1, 2017 at 11:53 AM, Jim Kusznir  wrote:
>
>> Huh... OK, how do I convert the arbitrator to a full replica, then?  I was
>> misinformed when I created this setup.  I thought the arbitrator held
>> enough metadata that it could validate or repudiate any one replica (kinda
>> like the parity drive for a RAID-4 array).  I was also under the impression
>> that one replica  + Arbitrator is enough to keep the array online and
>> functional.
>>
>> --Jim
>>
>> On Fri, Sep 1, 2017 at 5:22 AM, Charles Kozler 
>> wrote:
>>
>>> @ Jim - you have only two data volumes and lost quorum. Arbitrator only
>>> stores metadata, no actual files. So yes, you were running in degraded mode
>>> so some operations were hindered.
>>>
>>> @ Sahina - Yes, this actually worked fine for me once I did that.
>>> However, the issue I am still facing, is when I go to create a new gluster
>>> storage domain (replica 3, hyperconverged) and I tell it "Host to use" and
>>> I select that host. If I fail that host, all VMs halt. I do not recall this
>>> in 3.6 or early 4.0. This to me makes it seem like this is "pinning" a node
>>> to a volume and vice versa like you could, for instance, for a singular
>>> hyperconverged to ex: export a local disk via NFS and then mount it via
>>> ovirt domain. But of course, this has its caveats. To that end, I am using
>>> gluster replica 3, when configuring it I say "host to use: " node 1, then
>>> in the connection details I give it node1:/data. I fail node1, all VMs
>>> halt. Did I miss something?
>>>
>>> On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose  wrote:
>>>
 To the OP question, when you set up a gluster storage domain, you need
 to specify backup-volfile-servers=<server2>:<server3>, where server2
 and server3 also have bricks running. When server1 is down, and the volume
 is mounted again - server2 or server3 are queried to get the gluster
 volfiles.

 @Jim, if this does not work, are you using 4.1.5 build with libgfapi
 access? If not, please provide the vdsm and gluster mount logs to analyse

 If VMs go to paused state - this could mean the storage is not
 

[ovirt-users] Slow booting host - restart loop

2017-09-01 Thread Bernardo Juanicó
Hi everyone,

I installed 2 hosts on a new cluster, and the servers take a really long time
to boot up (about 8 minutes).

When a host crashes or is powered off, the oVirt manager starts it via power
management. Since the server takes all that time to boot up, the manager thinks
it failed to start and proceeds to reboot it several times before giving up, by
which point the server has finally started (about 20 minutes after the
failure).

I changed some engine variables with engine-config trying to set a higher
timeout, but the problem persists.
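
For what it's worth, this is the kind of change I tried (the key name is an
assumption on my part - check engine-config -l for the list in your version;
the value is in seconds):

# key name assumed; list available keys with: engine-config -l
engine-config -g ServerRebootTimeout
# raise it to 20 minutes, then restart the engine so it takes effect
engine-config -s ServerRebootTimeout=1200
systemctl restart ovirt-engine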

Any ideas??


Regards,
Bernardo


PGP Key 
Skype: mattraken


Re: [ovirt-users] Native Access on gluster storage domain

2017-09-01 Thread Stefano Danzi

On host info I can see:

Cluster compatibility level: 3.6,4.0,4.1

Could this be the problem?

On 30/08/2017 16:32, Stefano Danzi wrote:

above the logs.
PS cluster compatibility level is 4.1

engine:

2017-08-30 16:26:07,928+02 INFO 
[org.ovirt.engine.core.bll.UpdateClusterCommand] (default task-8) 
[56d090c5-1097-4641-b745-74af8397d945] Lock Acquired to object 
'EngineLock:{exclusiveLocks='[]', sharedLocks='[]'}'
2017-08-30 16:26:07,951+02 WARN 
[org.ovirt.engine.core.bll.UpdateClusterCommand] (default task-8) 
[56d090c5-1097-4641-b745-74af8397d945] Validation of action 
'UpdateCluster' failed for user admin@internal. Reasons: 
VAR__TYPE__CLUSTER,VAR__ACTION__UPDATE,CLUSTER_CANNOT_UPDATE_SUPPORTED_FEATURES_WITH_LOWER_HOSTS
2017-08-30 16:26:07,952+02 INFO 
[org.ovirt.engine.core.bll.UpdateClusterCommand] (default task-8) 
[56d090c5-1097-4641-b745-74af8397d945] Lock freed to object 
'EngineLock:{exclusiveLocks='[]', sharedLocks='[]'}'


vdsm:

2017-08-30 16:29:23,310+0200 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] 
RPC call GlusterHost.list succeeded in 0.15 seconds (__init__:539)
2017-08-30 16:29:23,419+0200 INFO  (jsonrpc/4) [jsonrpc.JsonRpcServer] 
RPC call Host.getAllVmStats succeeded in 0.01 seconds (__init__:539)
2017-08-30 16:29:23,424+0200 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] 
RPC call Host.getAllVmIoTunePolicies succeeded in 0.00 seconds 
(__init__:539)
2017-08-30 16:29:23,814+0200 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer] 
RPC call GlusterHost.list succeeded in 0.15 seconds (__init__:539)
2017-08-30 16:29:24,011+0200 INFO  (Reactor thread) 
[ProtocolDetector.AcceptorImpl] Accepted connection from ::1:51862 
(protocoldetector:72)
2017-08-30 16:29:24,023+0200 INFO  (Reactor thread) 
[ProtocolDetector.Detector] Detected protocol stomp from ::1:51862 
(protocoldetector:127)
2017-08-30 16:29:24,024+0200 INFO  (Reactor thread) 
[Broker.StompAdapter] Processing CONNECT request (stompreactor:103)
2017-08-30 16:29:24,031+0200 INFO  (JsonRpc (StompReactor)) 
[Broker.StompAdapter] Subscribe command received (stompreactor:130)
2017-08-30 16:29:24,287+0200 INFO  (jsonrpc/2) [jsonrpc.JsonRpcServer] 
RPC call Host.getHardwareInfo succeeded in 0.01 seconds (__init__:539)
2017-08-30 16:29:24,443+0200 INFO  (jsonrpc/7) [vdsm.api] START 
getSpmStatus(spUUID=u'0002-0002-0002-0002-01ef', 
options=None) from=:::192.168.1.55,46502, flow_id=1f664a9, 
task_id=c856903a-0af1-4c0c-8a44-7971fee7dffa (api:46)
2017-08-30 16:29:24,446+0200 INFO  (jsonrpc/7) [vdsm.api] FINISH 
getSpmStatus return={'spm_st': {'spmId': 1, 'spmStatus': 'SPM', 
'spmLver': 1430L}} from=:::192.168.1.55,46502, flow_id=1f664a9, 
task_id=c856903a-0af1-4c0c-8a44-7971fee7dffa (api:52)
2017-08-30 16:29:24,447+0200 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] 
RPC call StoragePool.getSpmStatus succeeded in 0.00 seconds (__init__:539)
2017-08-30 16:29:24,460+0200 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] 
RPC call GlusterHost.list succeeded in 0.16 seconds (__init__:539)
2017-08-30 16:29:24,467+0200 INFO  (jsonrpc/1) [vdsm.api] START 
getStoragePoolInfo(spUUID=u'0002-0002-0002-0002-01ef', 
options=None) from=:::192.168.1.55,46506, flow_id=1f664a9, 
task_id=029ec55e-9c47-4a20-be44-8c80fd1fd5ac (api:46)


On 30/08/2017 16:06, Shani Leviim wrote:

Hi Stefano,
Can you please attach your engine and vdsm logs?

Regards,
Shani Leviim

On Wed, Aug 30, 2017 at 12:46 PM, Stefano Danzi wrote:


Hello,
I have a test environment with a single host and a self-hosted
engine running oVirt Engine: 4.1.5.2-1.el7.centos

I want to try the option "Native Access on gluster storage
domain" but I get an error because I have to put the
host in maintenance mode. I can't do that because I have a single
host, so the hosted engine can't be migrated.

Is there a way to change this option and have it applied at the next reboot?
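
For what it's worth, a minimal sketch of how this option is usually driven
from the engine side rather than per host. LibgfApiSupported and the --cver
argument are assumptions to verify against engine-config -l for your exact
version, and running VMs only pick the change up after a full power cycle:

# engine-config -g LibgfApiSupported
# engine-config -s LibgfApiSupported=true --cver=4.1
# systemctl restart ovirt-engine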

___
Users mailing list
Users@ovirt.org 
http://lists.ovirt.org/mailman/listinfo/users







___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


--

Stefano Danzi
Responsabile sistemi informativi

HAWAI ITALIA S.r.l.
Via Forte Garofolo, 16
37057 S. Giovanni Lupatoto Verona Italia

P. IVA 01680700232

tel. +39/045/8266400
fax +39/045/8266401
Web www.hawai.it

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Still messages about metrics related packages update after 4.1.5

2017-09-01 Thread Gianluca Cecchi
It seems I'm still having problems with metrics related packages after
upgrade to 4.1.5.
Similar to this one at 4.1 time:
http://lists.ovirt.org/pipermail/users/2017-February/079670.html

A cluster with hosts at 4.1.1 has been updated to 4.1.5.
Hosts are CentOS 7.3.

The update was done with yum update on each host, after putting it into
maintenance, and the hosts were rebooted afterwards.

Now after 2 days engine still complains about updates for both hosts:

Check for available updates on host ov300 was completed successfully with
message 'found updates for packages
rubygem-fluent-plugin-collectd-nest-0.1.4-1.el7,
rubygem-fluent-plugin-viaq_data_model-0.0.5-1.el7'.

Check for available updates on host ov301 was completed successfully with
message 'found updates for packages
rubygem-fluent-plugin-collectd-nest-0.1.4-1.el7,
rubygem-fluent-plugin-viaq_data_model-0.0.5-1.el7'.

Actually the packages are not installed at all on the hosts. They were
probably released between 4.1.1 and 4.1.5.
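
A quick way to cross-check what the engine reports is to query the hosts
directly; these are plain rpm/yum commands, nothing oVirt specific:

# rpm -q rubygem-fluent-plugin-collectd-nest rubygem-fluent-plugin-viaq_data_model
# yum list available rubygem-fluent-plugin-collectd-nest rubygem-fluent-plugin-viaq_data_model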
On both hosts I have executed
yum install rubygem-fluent-plugin-collectd-nest-0.1.4-1.el7
rubygem-fluent-plugin-viaq_data_model-0.0.5-1.el7

but in my opinion this should be corrected so that, when new packages are
introduced between minor versions, they are automatically installed during
the update in some way,

e.g. by making the new packages a dependency of the new ovirt-release rpm, so
that during the update

Aug 29 12:13:17 Updated: ovirt-release41-4.1.5-1.el7.centos.noarch

they are automatically pulled in.

Thanks,
Gianluca
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Charles Kozler
@Kasturi - Looks good now. Cluster showed down for a moment but VMs stayed
up in their appropriate places. Thanks!

< Anyone on this list please feel free to correct my response to Jim if it's
wrong>

@ Jim - If you can share your gluster volume info / status I can confirm
(to the best of my knowledge). From my understanding, if you set up the
volume with something like 'gluster volume set <volname> group virt' this
will configure some quorum options as well. Ex: http://i.imgur.com/Mya4N5o.png
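
As a sketch of what that looks like (the volume name datatest is just an
example), the group can be applied in one shot and the resulting quorum
options inspected afterwards:

# gluster volume set datatest group virt
# gluster volume get datatest cluster.quorum-type
# gluster volume get datatest cluster.server-quorum-type

Note the virt group sets performance and caching options beyond quorum, so
it is worth reviewing 'gluster volume get datatest all' before applying it
to a volume that already holds VM images.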

While, yes, you are configured with an arbiter node, you're still losing
quorum by dropping from 2 -> 1. You would need 4 nodes with 1 being the
arbiter to keep quorum in that case, which is in effect 3 writable nodes and
1 arbiter. If one gluster node drops, you still have 2 up. Although in this
case, you probably wouldn't need the arbiter at all.

If you are configured that way, you can drop the quorum settings and just
let the arbiter run, since you're not using the arbiter node in your VM
cluster part (I believe), just in the storage cluster part. When using
quorum, you need > 50% of the cluster to be up at one time. Since you have 3
nodes with 1 arbiter, losing a data node drops you to 1/2, which == 50%,
which == degraded / hindered gluster.

Again, this is to the best of my knowledge based on other quorum-backed
software, and this is what I understand from testing with gluster and
ovirt thus far.

On Fri, Sep 1, 2017 at 11:53 AM, Jim Kusznir  wrote:

> Huh... OK, how do I convert the arbiter to a full replica, then?  I was
> misinformed when I created this setup.  I thought the arbiter held enough
> metadata that it could validate or repudiate any one replica (kind of
> like the parity drive for a RAID-4 array).  I was also under the impression
> that one replica + arbiter is enough to keep the array online and
> functional.
>
> --Jim
>
> On Fri, Sep 1, 2017 at 5:22 AM, Charles Kozler 
> wrote:
>
>> @ Jim - you have only two data volumes and lost quorum. Arbitrator only
>> stores metadata, no actual files. So yes, you were running in degraded mode
>> so some operations were hindered.
>>
>> @ Sahina - Yes, this actually worked fine for me once I did that.
>> However, the issue I am still facing, is when I go to create a new gluster
>> storage domain (replica 3, hyperconverged) and I tell it "Host to use" and
>> I select that host. If I fail that host, all VMs halt. I do not recall this
>> in 3.6 or early 4.0. This to me makes it seem like this is "pinning" a node
>> to a volume and vice versa like you could, for instance, for a singular
>> hyperconverged to ex: export a local disk via NFS and then mount it via
>> ovirt domain. But of course, this has its caveats. To that end, I am using
>> gluster replica 3, when configuring it I say "host to use: " node 1, then
>> in the connection details I give it node1:/data. I fail node1, all VMs
>> halt. Did I miss something?
>>
>> On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose  wrote:
>>
>>> To the OP question, when you set up a gluster storage domain, you need
>>> to specify backup-volfile-servers=: where server2 and
>>> server3 also have bricks running. When server1 is down, and the volume is
>>> mounted again - server2 or server3 are queried to get the gluster volfiles.
>>>
>>> @Jim, if this does not work, are you using 4.1.5 build with libgfapi
>>> access? If not, please provide the vdsm and gluster mount logs to analyse
>>>
>>> If VMs go to paused state - this could mean the storage is not
>>> available. You can check "gluster volume status " to see if
>>> atleast 2 bricks are running.
>>>
>>> On Fri, Sep 1, 2017 at 11:31 AM, Johan Bernhardsson 
>>> wrote:
>>>
 If gluster drops in quorum so that it has less votes than it should it
 will stop file operations until quorum is back to normal.If i rember it
 right you need two bricks to write for quorum to be met and that the
 arbiter only is a vote to avoid split brain.


 Basically what you have is a raid5 solution without a spare. And when
 one disk dies it will run in degraded mode. And some raid systems will stop
 the raid until you have removed the disk or forced it to run anyway.

 You can read up on it here: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/

 /Johan

 On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:

 Hi all:

 Sorry to hijack the thread, but I was about to start essentially the
 same thread.

 I have a 3 node cluster, all three are hosts and gluster nodes (replica
 2 + arbitrar).  I DO have the mnt_options=backup-volfile-servers= set:

 storage=192.168.8.11:/engine
 mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13

 I had an issue today where 192.168.8.11 went down.  ALL VMs immediately
 paused, including the engine (all VMs were running on host2:192.168.8.12).
 I couldn't get any gluster stuff working until 

[ovirt-users] vm grouping

2017-09-01 Thread david caughey
Hi There,

Our 3 node Data center is up and running and I am populating it with the
required vm's.
It's a test lab so I want to provide pre fabricated environments for
different users.
I have already set up a CentOS box for nested virtualisation which works
quite well but I would really like to group multiple machines together in
to one template so that when a user deploys or chooses the template they
get all the vm's together as one. Is there a way to do this in oVirt
without nested virtualisation?
I particularly want to provide a ceph set up with 3 nodes 1 mgmt server and
a couple of clients, which I had planned to do through nested virt but
believe that multiple vm's would be cleaner.

Any suggestions as to how to achieve this?

Any help or hints would be appreciated,

BR/David
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Jim Kusznir
Speaking of the "use managed gluster", I created this gluster setup under
ovirt 4.0 when that wasn't there.  I've gone into my settings and checked
the box and saved it at least twice, but when I go back into the storage
settings, its not checked again.

The "about" box in the gui reports that I'm using this version: oVirt
Engine Version: 4.1.1.8-1.el7.centos

I thought I was staying up to date, but I'm not sure if I'm doing everything
right on the upgrade... The documentation says to click for hosted-engine
upgrade instructions, which has taken me to a page-not-found error for
several versions now, and I haven't found those instructions, so I've been
"winging it".
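
In case it helps, this is a rough sketch of the usual minor-upgrade sequence
as I understand it; double-check it against the release notes for your exact
version before relying on it:

# hosted-engine --set-maintenance --mode=global
(on the engine VM)
# yum update "ovirt-*-setup*"
# engine-setup
# yum update
(back on a host)
# hosted-engine --set-maintenance --mode=none

Hosts are then updated one at a time, putting each into maintenance first.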

--Jim

On Fri, Sep 1, 2017 at 8:53 AM, Jim Kusznir  wrote:

> Huh... OK, how do I convert the arbiter to a full replica, then?  I was
> misinformed when I created this setup.  I thought the arbiter held enough
> metadata that it could validate or repudiate any one replica (kind of
> like the parity drive for a RAID-4 array).  I was also under the impression
> that one replica + arbiter is enough to keep the array online and
> functional.
>
> --Jim
>
> On Fri, Sep 1, 2017 at 5:22 AM, Charles Kozler 
> wrote:
>
>> @ Jim - you have only two data volumes and lost quorum. Arbitrator only
>> stores metadata, no actual files. So yes, you were running in degraded mode
>> so some operations were hindered.
>>
>> @ Sahina - Yes, this actually worked fine for me once I did that.
>> However, the issue I am still facing, is when I go to create a new gluster
>> storage domain (replica 3, hyperconverged) and I tell it "Host to use" and
>> I select that host. If I fail that host, all VMs halt. I do not recall this
>> in 3.6 or early 4.0. This to me makes it seem like this is "pinning" a node
>> to a volume and vice versa like you could, for instance, for a singular
>> hyperconverged to ex: export a local disk via NFS and then mount it via
>> ovirt domain. But of course, this has its caveats. To that end, I am using
>> gluster replica 3, when configuring it I say "host to use: " node 1, then
>> in the connection details I give it node1:/data. I fail node1, all VMs
>> halt. Did I miss something?
>>
>> On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose  wrote:
>>
>>> To the OP question, when you set up a gluster storage domain, you need
>>> to specify backup-volfile-servers=: where server2 and
>>> server3 also have bricks running. When server1 is down, and the volume is
>>> mounted again - server2 or server3 are queried to get the gluster volfiles.
>>>
>>> @Jim, if this does not work, are you using 4.1.5 build with libgfapi
>>> access? If not, please provide the vdsm and gluster mount logs to analyse
>>>
>>> If VMs go to paused state - this could mean the storage is not
>>> available. You can check "gluster volume status " to see if
>>> atleast 2 bricks are running.
>>>
>>> On Fri, Sep 1, 2017 at 11:31 AM, Johan Bernhardsson 
>>> wrote:
>>>
 If gluster drops in quorum so that it has less votes than it should it
 will stop file operations until quorum is back to normal.If i rember it
 right you need two bricks to write for quorum to be met and that the
 arbiter only is a vote to avoid split brain.


 Basically what you have is a raid5 solution without a spare. And when
 one disk dies it will run in degraded mode. And some raid systems will stop
 the raid until you have removed the disk or forced it to run anyway.

 You can read up on it here: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/

 /Johan

 On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:

 Hi all:

 Sorry to hijack the thread, but I was about to start essentially the
 same thread.

 I have a 3 node cluster, all three are hosts and gluster nodes (replica
 2 + arbitrar).  I DO have the mnt_options=backup-volfile-servers= set:

 storage=192.168.8.11:/engine
 mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13

 I had an issue today where 192.168.8.11 went down.  ALL VMs immediately
 paused, including the engine (all VMs were running on host2:192.168.8.12).
 I couldn't get any gluster stuff working until host1 (192.168.8.11) was
 restored.

 What's wrong / what did I miss?

 (this was set up "manually" through the article on setting up
 self-hosted gluster cluster back when 4.0 was new..I've upgraded it to 4.1
 since).

 Thanks!
 --Jim


 On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler 
 wrote:

 Typo..."Set it up and then failed that **HOST**"

 And upon that host going down, the storage domain went down. I only
 have hosted storage domain and this new one - is this why the DC went down
 and no SPM could be elected?

 I dont recall this working 

Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Jim Kusznir
Huh... OK, how do I convert the arbiter to a full replica, then?  I was
misinformed when I created this setup.  I thought the arbiter held enough
metadata that it could validate or repudiate any one replica (kind of
like the parity drive for a RAID-4 array).  I was also under the impression
that one replica + arbiter is enough to keep the array online and
functional.

--Jim

On Fri, Sep 1, 2017 at 5:22 AM, Charles Kozler  wrote:

> @ Jim - you have only two data volumes and lost quorum. Arbitrator only
> stores metadata, no actual files. So yes, you were running in degraded mode
> so some operations were hindered.
>
> @ Sahina - Yes, this actually worked fine for me once I did that. However,
> the issue I am still facing, is when I go to create a new gluster storage
> domain (replica 3, hyperconverged) and I tell it "Host to use" and I select
> that host. If I fail that host, all VMs halt. I do not recall this in 3.6
> or early 4.0. This to me makes it seem like this is "pinning" a node to a
> volume and vice versa like you could, for instance, for a singular
> hyperconverged to ex: export a local disk via NFS and then mount it via
> ovirt domain. But of course, this has its caveats. To that end, I am using
> gluster replica 3, when configuring it I say "host to use: " node 1, then
> in the connection details I give it node1:/data. I fail node1, all VMs
> halt. Did I miss something?
>
> On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose  wrote:
>
>> To the OP question, when you set up a gluster storage domain, you need to
>> specify backup-volfile-servers=: where server2 and
>> server3 also have bricks running. When server1 is down, and the volume is
>> mounted again - server2 or server3 are queried to get the gluster volfiles.
>>
>> @Jim, if this does not work, are you using 4.1.5 build with libgfapi
>> access? If not, please provide the vdsm and gluster mount logs to analyse
>>
>> If VMs go to paused state - this could mean the storage is not available.
>> You can check "gluster volume status " to see if atleast 2 bricks
>> are running.
>>
>> On Fri, Sep 1, 2017 at 11:31 AM, Johan Bernhardsson 
>> wrote:
>>
>>> If gluster drops in quorum so that it has less votes than it should it
>>> will stop file operations until quorum is back to normal.If i rember it
>>> right you need two bricks to write for quorum to be met and that the
>>> arbiter only is a vote to avoid split brain.
>>>
>>>
>>> Basically what you have is a raid5 solution without a spare. And when
>>> one disk dies it will run in degraded mode. And some raid systems will stop
>>> the raid until you have removed the disk or forced it to run anyway.
>>>
>>> You can read up on it here: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
>>>
>>> /Johan
>>>
>>> On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:
>>>
>>> Hi all:
>>>
>>> Sorry to hijack the thread, but I was about to start essentially the
>>> same thread.
>>>
>>> I have a 3 node cluster, all three are hosts and gluster nodes (replica
>>> 2 + arbitrar).  I DO have the mnt_options=backup-volfile-servers= set:
>>>
>>> storage=192.168.8.11:/engine
>>> mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13
>>>
>>> I had an issue today where 192.168.8.11 went down.  ALL VMs immediately
>>> paused, including the engine (all VMs were running on host2:192.168.8.12).
>>> I couldn't get any gluster stuff working until host1 (192.168.8.11) was
>>> restored.
>>>
>>> What's wrong / what did I miss?
>>>
>>> (this was set up "manually" through the article on setting up
>>> self-hosted gluster cluster back when 4.0 was new..I've upgraded it to 4.1
>>> since).
>>>
>>> Thanks!
>>> --Jim
>>>
>>>
>>> On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler 
>>> wrote:
>>>
>>> Typo..."Set it up and then failed that **HOST**"
>>>
>>> And upon that host going down, the storage domain went down. I only have
>>> hosted storage domain and this new one - is this why the DC went down and
>>> no SPM could be elected?
>>>
>>> I dont recall this working this way in early 4.0 or 3.6
>>>
>>> On Thu, Aug 31, 2017 at 3:30 PM, Charles Kozler 
>>> wrote:
>>>
>>> So I've tested this today and I failed a node. Specifically, I setup a
>>> glusterfs domain and selected "host to use: node1". Set it up and then
>>> failed that VM
>>>
>>> However, this did not work and the datacenter went down. My engine
>>> stayed up, however, it seems configuring a domain to pin to a host to use
>>> will obviously cause it to fail
>>>
>>> This seems counter-intuitive to the point of glusterfs or any redundant
>>> storage. If a single host has to be tied to its function, this introduces a
>>> single point of failure
>>>
>>> Am I missing something obvious?
>>>
>>> On Thu, Aug 31, 2017 at 9:43 AM, Kasturi Narra 
>>> wrote:
>>>
>>> yes, right.  What you can do is edit the 

Re: [ovirt-users] Storage slowly expanding

2017-09-01 Thread Jim Kusznir
Thank you!

I created all the VMs using the sparse allocation method.  I wanted a
method that would create disks that did not immediately occupy their full
declared size (e.g., allow overcommit of disk space, as most VM hard drives
are 30-50% empty for their entire life).

I kinda figured that it would not free space on the underlying storage when
a file is deleted within the disk.  What confuses me is a disk that is only
30GB to the OS is using 53GB of space on gluster.  In my understanding, the
actual on-disk usage should be limited to 30GB max if I don't take
snapshots.  (I do like having the ability to take snapshots, and I do use
them from time to time, but I usually don't keep the snapshot for an
extended time...long enough to verify whatever operation I did was
successful).

I did find the "sparcify" command within ovirt and ran that; it reclaimed
some space (the above example of the 30GB disk which is actually using 20GB
inside the VM but was using 53GB on gluster shrunk to 50GB on gluster...But
there's still at least 20GB unaccounted for there.

I would love it if there was something I could do to reclaim the space
inside the disk that isn't in use too (e.g., get that disk down to just the
21GB that the actual VM is using).  If I change to virtio-scsi (it's
currently just "virtio"), will that enable DISCARD support, and is
Gluster a supported underlying storage?
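
For reference, a hedged sketch of the two reclaim paths suggested in the
quoted reply below, with placeholder paths: a periodic fstrim inside the
guest once the disk is attached via virtio-SCSI with discard enabled, and an
offline virt-sparsify of the image while the VM is powered off (whether the
gluster/fuse layer actually honours the discards is something to verify for
your setup):

Inside the guest (needs discard support end to end):
# fstrim -av

On the host, with the VM powered off (the path is illustrative):
# virt-sparsify --in-place /rhev/data-center/mnt/glusterSD/node1:_data/<sd-uuid>/images/<disk-uuid>/<image-uuid>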

Thanks!
--Jim

On Fri, Sep 1, 2017 at 5:45 AM, Yaniv Kaul  wrote:

>
>
> On Fri, Sep 1, 2017 at 8:41 AM, Jim Kusznir  wrote:
>
>> Hi all:
>>
>> I have several VMs, all thin provisioned, on my small storage
>> (self-hosted gluster / hyperconverged cluster).  I'm now noticing that some
>> of my VMs (espicially my only Windows VM) are using even MORE disk space
>> than the blank it was allocated.
>>
>> Example: windows VM: virtual size created at creation: 30GB (thin
>> provisioned).  Actual disk space in use: 19GB.  According to the storage ->
>> Disks tab, its currently using 39GB.  How do I get that down?
>>
>> I have two other VMs that are somewhat heavy DB load (Zabbix and Unifi);
>> both of those are also larger than their created max size despite disk in
>> machine not being fully utilized.
>>
>> None of these have snapshots.
>>
>
> How come you have qcow2 and not raw-sparse, if you are not using
> snapshots? is it a VM from a template?
>
> Generally, this is how thin provisioning works. The underlying qcow2
> doesn't know when you delete a file from within the guest - as file
> deletion is merely marking entries in the file system tables as free, not
> really doing any deletion IO.
> You could run virt-sparsify on the disks to sparsify them, which will, if
> the underlying storage supports it, reclaim storage space.
> You could use IDE or virtio-SCSI and enable DISCARD support, which will,
> if the underlying storage supports it, reclaim storage space.
>
> Those are not exclusive, btw.
> Y.
>
>
>> How do I fix this?
>>
>> Thanks!
>> --Jim
>>
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Kasturi Narra
Yes, that is the same option I was asking about. Apologies that I had
mentioned a different name.

So, ovirt will automatically detect it if you select the option 'use
managed gluster volume'. While adding a storage domain, after specifying the
host you can just select the checkbox; that will list all the volumes
managed from the ovirt UI and will fill in the mount options for you.
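
For a gluster domain added by hand, the same thing can be typed into the
dialog directly; a minimal sketch with placeholder host names:

  Path: node1:/data
  Mount Options: backup-volfile-servers=node2:node3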



On Fri, Sep 1, 2017 at 6:40 PM, Charles Kozler  wrote:

> Are you referring to "Mount Options" - > http://i.imgur.com/bYfbyzz.png
>
> Then no, but that would explain why it wasnt working :-). I guess I had a
> silly assumption that oVirt would have detected it and automatically taken
> up the redundancy that was configured inside the replica set / brick
> detection.
>
> I will test and let you know
>
> Thanks!
>
> On Fri, Sep 1, 2017 at 8:52 AM, Kasturi Narra  wrote:
>
>> Hi Charles,
>>
>>   One question, while configuring a storage domain  you are
>> saying "host to use: " node1,  then in the connection details you say
>> node1:/data. What about the backup-volfile-servers option in the UI while
>> configuring storage domain? Are you specifying that too?
>>
>> Thanks
>> kasturi
>>
>>
>> On Fri, Sep 1, 2017 at 5:52 PM, Charles Kozler 
>> wrote:
>>
>>> @ Jim - you have only two data volumes and lost quorum. Arbitrator only
>>> stores metadata, no actual files. So yes, you were running in degraded mode
>>> so some operations were hindered.
>>>
>>> @ Sahina - Yes, this actually worked fine for me once I did that.
>>> However, the issue I am still facing, is when I go to create a new gluster
>>> storage domain (replica 3, hyperconverged) and I tell it "Host to use" and
>>> I select that host. If I fail that host, all VMs halt. I do not recall this
>>> in 3.6 or early 4.0. This to me makes it seem like this is "pinning" a node
>>> to a volume and vice versa like you could, for instance, for a singular
>>> hyperconverged to ex: export a local disk via NFS and then mount it via
>>> ovirt domain. But of course, this has its caveats. To that end, I am using
>>> gluster replica 3, when configuring it I say "host to use: " node 1, then
>>> in the connection details I give it node1:/data. I fail node1, all VMs
>>> halt. Did I miss something?
>>>
>>> On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose  wrote:
>>>
 To the OP question, when you set up a gluster storage domain, you need
 to specify backup-volfile-servers=: where server2
 and server3 also have bricks running. When server1 is down, and the volume
 is mounted again - server2 or server3 are queried to get the gluster
 volfiles.

 @Jim, if this does not work, are you using 4.1.5 build with libgfapi
 access? If not, please provide the vdsm and gluster mount logs to analyse

 If VMs go to paused state - this could mean the storage is not
 available. You can check "gluster volume status " to see if
 atleast 2 bricks are running.

 On Fri, Sep 1, 2017 at 11:31 AM, Johan Bernhardsson 
 wrote:

> If gluster drops in quorum so that it has less votes than it should it
> will stop file operations until quorum is back to normal.If i rember it
> right you need two bricks to write for quorum to be met and that the
> arbiter only is a vote to avoid split brain.
>
>
> Basically what you have is a raid5 solution without a spare. And when
> one disk dies it will run in degraded mode. And some raid systems will 
> stop
> the raid until you have removed the disk or forced it to run anyway.
>
> You can read up on it here: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
>
> /Johan
>
> On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:
>
> Hi all:
>
> Sorry to hijack the thread, but I was about to start essentially the
> same thread.
>
> I have a 3 node cluster, all three are hosts and gluster nodes
> (replica 2 + arbitrar).  I DO have the mnt_options=backup-volfile-servers=
> set:
>
> storage=192.168.8.11:/engine
> mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13
>
> I had an issue today where 192.168.8.11 went down.  ALL VMs
> immediately paused, including the engine (all VMs were running on
> host2:192.168.8.12).  I couldn't get any gluster stuff working until host1
> (192.168.8.11) was restored.
>
> What's wrong / what did I miss?
>
> (this was set up "manually" through the article on setting up
> self-hosted gluster cluster back when 4.0 was new..I've upgraded it to 4.1
> since).
>
> Thanks!
> --Jim
>
>
> On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler wrote:
>
> Typo..."Set it up and then failed that **HOST**"
>
> And upon that host going down, 

Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Charles Kozler
Are you referring to "Mount Options" - > http://i.imgur.com/bYfbyzz.png

Then no, but that would explain why it wasn't working :-). I guess I had a
silly assumption that oVirt would have detected it and automatically taken
up the redundancy that was configured inside the replica set / brick
detection.

I will test and let you know

Thanks!

On Fri, Sep 1, 2017 at 8:52 AM, Kasturi Narra  wrote:

> Hi Charles,
>
>   One question, while configuring a storage domain  you are saying
> "host to use: " node1,  then in the connection details you say node1:/data.
> What about the backup-volfile-servers option in the UI while configuring
> storage domain? Are you specifying that too?
>
> Thanks
> kasturi
>
>
> On Fri, Sep 1, 2017 at 5:52 PM, Charles Kozler 
> wrote:
>
>> @ Jim - you have only two data volumes and lost quorum. Arbitrator only
>> stores metadata, no actual files. So yes, you were running in degraded mode
>> so some operations were hindered.
>>
>> @ Sahina - Yes, this actually worked fine for me once I did that.
>> However, the issue I am still facing, is when I go to create a new gluster
>> storage domain (replica 3, hyperconverged) and I tell it "Host to use" and
>> I select that host. If I fail that host, all VMs halt. I do not recall this
>> in 3.6 or early 4.0. This to me makes it seem like this is "pinning" a node
>> to a volume and vice versa like you could, for instance, for a singular
>> hyperconverged to ex: export a local disk via NFS and then mount it via
>> ovirt domain. But of course, this has its caveats. To that end, I am using
>> gluster replica 3, when configuring it I say "host to use: " node 1, then
>> in the connection details I give it node1:/data. I fail node1, all VMs
>> halt. Did I miss something?
>>
>> On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose  wrote:
>>
>>> To the OP question, when you set up a gluster storage domain, you need
>>> to specify backup-volfile-servers=: where server2 and
>>> server3 also have bricks running. When server1 is down, and the volume is
>>> mounted again - server2 or server3 are queried to get the gluster volfiles.
>>>
>>> @Jim, if this does not work, are you using 4.1.5 build with libgfapi
>>> access? If not, please provide the vdsm and gluster mount logs to analyse
>>>
>>> If VMs go to paused state - this could mean the storage is not
>>> available. You can check "gluster volume status " to see if
>>> atleast 2 bricks are running.
>>>
>>> On Fri, Sep 1, 2017 at 11:31 AM, Johan Bernhardsson 
>>> wrote:
>>>
 If gluster drops in quorum so that it has less votes than it should it
 will stop file operations until quorum is back to normal.If i rember it
 right you need two bricks to write for quorum to be met and that the
 arbiter only is a vote to avoid split brain.


 Basically what you have is a raid5 solution without a spare. And when
 one disk dies it will run in degraded mode. And some raid systems will stop
 the raid until you have removed the disk or forced it to run anyway.

 You can read up on it here: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/

 /Johan

 On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:

 Hi all:

 Sorry to hijack the thread, but I was about to start essentially the
 same thread.

 I have a 3 node cluster, all three are hosts and gluster nodes (replica
 2 + arbitrar).  I DO have the mnt_options=backup-volfile-servers= set:

 storage=192.168.8.11:/engine
 mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13

 I had an issue today where 192.168.8.11 went down.  ALL VMs immediately
 paused, including the engine (all VMs were running on host2:192.168.8.12).
 I couldn't get any gluster stuff working until host1 (192.168.8.11) was
 restored.

 What's wrong / what did I miss?

 (this was set up "manually" through the article on setting up
 self-hosted gluster cluster back when 4.0 was new..I've upgraded it to 4.1
 since).

 Thanks!
 --Jim


 On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler 
 wrote:

 Typo..."Set it up and then failed that **HOST**"

 And upon that host going down, the storage domain went down. I only
 have hosted storage domain and this new one - is this why the DC went down
 and no SPM could be elected?

 I dont recall this working this way in early 4.0 or 3.6

 On Thu, Aug 31, 2017 at 3:30 PM, Charles Kozler 
 wrote:

 So I've tested this today and I failed a node. Specifically, I setup a
 glusterfs domain and selected "host to use: node1". Set it up and then
 failed that VM

 However, this did not work and the datacenter went down. My engine
 stayed up, however, it seems 

Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Kasturi Narra
Hi Charles,

  One question, while configuring a storage domain  you are saying
"host to use: " node1,  then in the connection details you say node1:/data.
What about the backup-volfile-servers option in the UI while configuring
storage domain? Are you specifying that too?

Thanks
kasturi


On Fri, Sep 1, 2017 at 5:52 PM, Charles Kozler  wrote:

> @ Jim - you have only two data volumes and lost quorum. Arbitrator only
> stores metadata, no actual files. So yes, you were running in degraded mode
> so some operations were hindered.
>
> @ Sahina - Yes, this actually worked fine for me once I did that. However,
> the issue I am still facing, is when I go to create a new gluster storage
> domain (replica 3, hyperconverged) and I tell it "Host to use" and I select
> that host. If I fail that host, all VMs halt. I do not recall this in 3.6
> or early 4.0. This to me makes it seem like this is "pinning" a node to a
> volume and vice versa like you could, for instance, for a singular
> hyperconverged to ex: export a local disk via NFS and then mount it via
> ovirt domain. But of course, this has its caveats. To that end, I am using
> gluster replica 3, when configuring it I say "host to use: " node 1, then
> in the connection details I give it node1:/data. I fail node1, all VMs
> halt. Did I miss something?
>
> On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose  wrote:
>
>> To the OP question, when you set up a gluster storage domain, you need to
>> specify backup-volfile-servers=: where server2 and
>> server3 also have bricks running. When server1 is down, and the volume is
>> mounted again - server2 or server3 are queried to get the gluster volfiles.
>>
>> @Jim, if this does not work, are you using 4.1.5 build with libgfapi
>> access? If not, please provide the vdsm and gluster mount logs to analyse
>>
>> If VMs go to paused state - this could mean the storage is not available.
>> You can check "gluster volume status " to see if atleast 2 bricks
>> are running.
>>
>> On Fri, Sep 1, 2017 at 11:31 AM, Johan Bernhardsson 
>> wrote:
>>
>>> If gluster drops in quorum so that it has less votes than it should it
>>> will stop file operations until quorum is back to normal.If i rember it
>>> right you need two bricks to write for quorum to be met and that the
>>> arbiter only is a vote to avoid split brain.
>>>
>>>
>>> Basically what you have is a raid5 solution without a spare. And when
>>> one disk dies it will run in degraded mode. And some raid systems will stop
>>> the raid until you have removed the disk or forced it to run anyway.
>>>
>>> You can read up on it here: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
>>>
>>> /Johan
>>>
>>> On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:
>>>
>>> Hi all:
>>>
>>> Sorry to hijack the thread, but I was about to start essentially the
>>> same thread.
>>>
>>> I have a 3 node cluster, all three are hosts and gluster nodes (replica
>>> 2 + arbitrar).  I DO have the mnt_options=backup-volfile-servers= set:
>>>
>>> storage=192.168.8.11:/engine
>>> mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13
>>>
>>> I had an issue today where 192.168.8.11 went down.  ALL VMs immediately
>>> paused, including the engine (all VMs were running on host2:192.168.8.12).
>>> I couldn't get any gluster stuff working until host1 (192.168.8.11) was
>>> restored.
>>>
>>> What's wrong / what did I miss?
>>>
>>> (this was set up "manually" through the article on setting up
>>> self-hosted gluster cluster back when 4.0 was new..I've upgraded it to 4.1
>>> since).
>>>
>>> Thanks!
>>> --Jim
>>>
>>>
>>> On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler 
>>> wrote:
>>>
>>> Typo..."Set it up and then failed that **HOST**"
>>>
>>> And upon that host going down, the storage domain went down. I only have
>>> hosted storage domain and this new one - is this why the DC went down and
>>> no SPM could be elected?
>>>
>>> I dont recall this working this way in early 4.0 or 3.6
>>>
>>> On Thu, Aug 31, 2017 at 3:30 PM, Charles Kozler 
>>> wrote:
>>>
>>> So I've tested this today and I failed a node. Specifically, I setup a
>>> glusterfs domain and selected "host to use: node1". Set it up and then
>>> failed that VM
>>>
>>> However, this did not work and the datacenter went down. My engine
>>> stayed up, however, it seems configuring a domain to pin to a host to use
>>> will obviously cause it to fail
>>>
>>> This seems counter-intuitive to the point of glusterfs or any redundant
>>> storage. If a single host has to be tied to its function, this introduces a
>>> single point of failure
>>>
>>> Am I missing something obvious?
>>>
>>> On Thu, Aug 31, 2017 at 9:43 AM, Kasturi Narra 
>>> wrote:
>>>
>>> yes, right.  What you can do is edit the hosted-engine.conf file and
>>> there is a parameter as shown below [1] and replace h2 and h3 with 

Re: [ovirt-users] Storage slowly expanding

2017-09-01 Thread Yaniv Kaul
On Fri, Sep 1, 2017 at 8:41 AM, Jim Kusznir  wrote:

> Hi all:
>
> I have several VMs, all thin provisioned, on my small storage (self-hosted
> gluster / hyperconverged cluster).  I'm now noticing that some of my VMs
> (espicially my only Windows VM) are using even MORE disk space than the
> blank it was allocated.
>
> Example: windows VM: virtual size created at creation: 30GB (thin
> provisioned).  Actual disk space in use: 19GB.  According to the storage ->
> Disks tab, its currently using 39GB.  How do I get that down?
>
> I have two other VMs that are somewhat heavy DB load (Zabbix and Unifi);
> both of those are also larger than their created max size despite disk in
> machine not being fully utilized.
>
> None of these have snapshots.
>

How come you have qcow2 and not raw-sparse, if you are not using snapshots?
Is it a VM from a template?

Generally, this is how thin provisioning works. The underlying qcow2
doesn't know when you delete a file from within the guest - as file
deletion is merely marking entries in the file system tables as free, not
really doing any deletion IO.
You could run virt-sparsify on the disks to sparsify them, which will, if
the underlying storage supports it, reclaim storage space.
You could use IDE or virtio-SCSI and enable DISCARD support, which will, if
the underlying storage supports it, reclaim storage space.

Those are not exclusive, btw.
Y.
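
As a side note, a hedged way to see where the space goes is to compare what
the guest sees with what the image actually occupies on the gluster mount
(the path below is illustrative):

# qemu-img info /rhev/data-center/mnt/glusterSD/node1:_data/<sd-uuid>/images/<disk-uuid>/<image-uuid>

'virtual size' is the size the guest sees; 'disk size' is what the qcow2
currently occupies on storage.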


> How do I fix this?
>
> Thanks!
> --Jim
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Charles Kozler
@ Jim - you have only two data bricks and lost quorum. The arbiter only
stores metadata, no actual files. So yes, you were running in degraded mode,
so some operations were hindered.

@ Sahina - Yes, this actually worked fine for me once I did that. However,
the issue I am still facing, is when I go to create a new gluster storage
domain (replica 3, hyperconverged) and I tell it "Host to use" and I select
that host. If I fail that host, all VMs halt. I do not recall this in 3.6
or early 4.0. This to me makes it seem like this is "pinning" a node to a
volume and vice versa like you could, for instance, for a singular
hyperconverged to ex: export a local disk via NFS and then mount it via
ovirt domain. But of course, this has its caveats. To that end, I am using
gluster replica 3, when configuring it I say "host to use: " node 1, then
in the connection details I give it node1:/data. I fail node1, all VMs
halt. Did I miss something?

On Fri, Sep 1, 2017 at 2:13 AM, Sahina Bose  wrote:

> To the OP question, when you set up a gluster storage domain, you need to
> specify backup-volfile-servers=: where server2 and
> server3 also have bricks running. When server1 is down, and the volume is
> mounted again - server2 or server3 are queried to get the gluster volfiles.
>
> @Jim, if this does not work, are you using 4.1.5 build with libgfapi
> access? If not, please provide the vdsm and gluster mount logs to analyse
>
> If VMs go to paused state - this could mean the storage is not available.
> You can check "gluster volume status " to see if atleast 2 bricks
> are running.
>
> On Fri, Sep 1, 2017 at 11:31 AM, Johan Bernhardsson 
> wrote:
>
>> If gluster drops in quorum so that it has less votes than it should it
>> will stop file operations until quorum is back to normal.If i rember it
>> right you need two bricks to write for quorum to be met and that the
>> arbiter only is a vote to avoid split brain.
>>
>>
>> Basically what you have is a raid5 solution without a spare. And when one
>> disk dies it will run in degraded mode. And some raid systems will stop the
>> raid until you have removed the disk or forced it to run anyway.
>>
>> You can read up on it here: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
>>
>> /Johan
>>
>> On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:
>>
>> Hi all:
>>
>> Sorry to hijack the thread, but I was about to start essentially the same
>> thread.
>>
>> I have a 3 node cluster, all three are hosts and gluster nodes (replica 2
>> + arbitrar).  I DO have the mnt_options=backup-volfile-servers= set:
>>
>> storage=192.168.8.11:/engine
>> mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13
>>
>> I had an issue today where 192.168.8.11 went down.  ALL VMs immediately
>> paused, including the engine (all VMs were running on host2:192.168.8.12).
>> I couldn't get any gluster stuff working until host1 (192.168.8.11) was
>> restored.
>>
>> What's wrong / what did I miss?
>>
>> (this was set up "manually" through the article on setting up self-hosted
>> gluster cluster back when 4.0 was new..I've upgraded it to 4.1 since).
>>
>> Thanks!
>> --Jim
>>
>>
>> On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler 
>> wrote:
>>
>> Typo..."Set it up and then failed that **HOST**"
>>
>> And upon that host going down, the storage domain went down. I only have
>> hosted storage domain and this new one - is this why the DC went down and
>> no SPM could be elected?
>>
>> I dont recall this working this way in early 4.0 or 3.6
>>
>> On Thu, Aug 31, 2017 at 3:30 PM, Charles Kozler 
>> wrote:
>>
>> So I've tested this today and I failed a node. Specifically, I setup a
>> glusterfs domain and selected "host to use: node1". Set it up and then
>> failed that VM
>>
>> However, this did not work and the datacenter went down. My engine stayed
>> up, however, it seems configuring a domain to pin to a host to use will
>> obviously cause it to fail
>>
>> This seems counter-intuitive to the point of glusterfs or any redundant
>> storage. If a single host has to be tied to its function, this introduces a
>> single point of failure
>>
>> Am I missing something obvious?
>>
>> On Thu, Aug 31, 2017 at 9:43 AM, Kasturi Narra  wrote:
>>
>> yes, right.  What you can do is edit the hosted-engine.conf file and
>> there is a parameter as shown below [1] and replace h2 and h3 with your
>> second and third storage servers. Then you will need to restart
>> ovirt-ha-agent and ovirt-ha-broker services in all the nodes .
>>
>> [1] 'mnt_options=backup-volfile-servers=:'
>>
>> On Thu, Aug 31, 2017 at 5:54 PM, Charles Kozler 
>> wrote:
>>
>> Hi Kasturi -
>>
>> Thanks for feedback
>>
>> > If cockpit+gdeploy plugin would be have been used then that would have
>> automatically detected glusterfs replica 3 volume created during Hosted
>> Engine deployment and this 

[ovirt-users] Install failed when adding host in Ovirt

2017-09-01 Thread Khoi Thinh
Hi everyone,
I have a question regarding hosts in oVirt. Is it possible to add a host
that is already registered in a different data-center?


-- 
*Khoi Thinh*
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Sahina Bose
To the OP question, when you set up a gluster storage domain, you need to
specify backup-volfile-servers=<server2>:<server3>, where server2 and
server3 also have bricks running. When server1 is down and the volume is
mounted again, server2 or server3 are queried to get the gluster volfiles.

@Jim, if this does not work, are you using the 4.1.5 build with libgfapi
access? If not, please provide the vdsm and gluster mount logs to analyse.

If VMs go to a paused state, this could mean the storage is not available.
You can check "gluster volume status <volname>" to see if at least 2 bricks
are running.
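
As a concrete sketch with placeholder names: for a data domain this is just
the mount option on the storage domain, for the hosted engine it is the same
option in hosted-engine.conf followed by a restart of the HA services on
each host, and brick health can be checked from any gluster node:

Storage domain mount options:
  backup-volfile-servers=server2:server3

/etc/ovirt-hosted-engine/hosted-engine.conf:
  storage=server1:/engine
  mnt_options=backup-volfile-servers=server2:server3
# systemctl restart ovirt-ha-agent ovirt-ha-broker

Brick check:
# gluster volume status <volname>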

On Fri, Sep 1, 2017 at 11:31 AM, Johan Bernhardsson  wrote:

> If gluster drops in quorum so that it has less votes than it should it
> will stop file operations until quorum is back to normal.If i rember it
> right you need two bricks to write for quorum to be met and that the
> arbiter only is a vote to avoid split brain.
>
>
> Basically what you have is a raid5 solution without a spare. And when one
> disk dies it will run in degraded mode. And some raid systems will stop the
> raid until you have removed the disk or forced it to run anyway.
>
> You can read up on it here: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
>
> /Johan
>
> On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:
>
> Hi all:
>
> Sorry to hijack the thread, but I was about to start essentially the same
> thread.
>
> I have a 3 node cluster, all three are hosts and gluster nodes (replica 2
> + arbitrar).  I DO have the mnt_options=backup-volfile-servers= set:
>
> storage=192.168.8.11:/engine
> mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13
>
> I had an issue today where 192.168.8.11 went down.  ALL VMs immediately
> paused, including the engine (all VMs were running on host2:192.168.8.12).
> I couldn't get any gluster stuff working until host1 (192.168.8.11) was
> restored.
>
> What's wrong / what did I miss?
>
> (this was set up "manually" through the article on setting up self-hosted
> gluster cluster back when 4.0 was new..I've upgraded it to 4.1 since).
>
> Thanks!
> --Jim
>
>
> On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler 
> wrote:
>
> Typo..."Set it up and then failed that **HOST**"
>
> And upon that host going down, the storage domain went down. I only have
> hosted storage domain and this new one - is this why the DC went down and
> no SPM could be elected?
>
> I dont recall this working this way in early 4.0 or 3.6
>
> On Thu, Aug 31, 2017 at 3:30 PM, Charles Kozler 
> wrote:
>
> So I've tested this today and I failed a node. Specifically, I setup a
> glusterfs domain and selected "host to use: node1". Set it up and then
> failed that VM
>
> However, this did not work and the datacenter went down. My engine stayed
> up, however, it seems configuring a domain to pin to a host to use will
> obviously cause it to fail
>
> This seems counter-intuitive to the point of glusterfs or any redundant
> storage. If a single host has to be tied to its function, this introduces a
> single point of failure
>
> Am I missing something obvious?
>
> On Thu, Aug 31, 2017 at 9:43 AM, Kasturi Narra  wrote:
>
> yes, right.  What you can do is edit the hosted-engine.conf file and there
> is a parameter as shown below [1] and replace h2 and h3 with your second
> and third storage servers. Then you will need to restart ovirt-ha-agent and
> ovirt-ha-broker services in all the nodes .
>
> [1] 'mnt_options=backup-volfile-servers=:'
>
> On Thu, Aug 31, 2017 at 5:54 PM, Charles Kozler 
> wrote:
>
> Hi Kasturi -
>
> Thanks for feedback
>
> > If cockpit+gdeploy plugin would be have been used then that would have
> automatically detected glusterfs replica 3 volume created during Hosted
> Engine deployment and this question would not have been asked
>
> Actually, doing hosted-engine --deploy it too also auto detects
> glusterfs.  I know glusterfs fuse client has the ability to failover
> between all nodes in cluster, but I am still curious given the fact that I
> see in ovirt config node1:/engine (being node1 I set it to in hosted-engine
> --deploy). So my concern was to ensure and find out exactly how engine
> works when one node goes away and the fuse client moves over to the other
> node in the gluster cluster
>
> But you did somewhat answer my question, the answer seems to be no (as
> default) and I will have to use hosted-engine.conf and change the parameter
> as you list
>
> So I need to do something manual to create HA for engine on gluster? Yes?
>
> Thanks so much!
>
> On Thu, Aug 31, 2017 at 3:03 AM, Kasturi Narra  wrote:
>
> Hi,
>
>During Hosted Engine setup question about glusterfs volume is being
> asked because you have setup the volumes yourself. If cockpit+gdeploy
> plugin would be have been used then that would have automatically detected
> glusterfs replica 3 volume created during Hosted 

Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Johan Bernhardsson
If gluster drops in quorum so that it has fewer votes than it should, it
will stop file operations until quorum is back to normal. If I remember it
right, you need two writable bricks for quorum to be met, and the arbiter
is only a vote to avoid split brain.

Basically what you have is a raid5 solution without a spare. When one disk
dies it will run in degraded mode, and some raid systems will stop the raid
until you have removed the disk or forced it to run anyway.

You can read up on it here: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/

/Johan

On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:
> Hi all:  
> 
> Sorry to hijack the thread, but I was about to start essentially the
> same thread.
> 
> I have a 3 node cluster, all three are hosts and gluster nodes
> (replica 2 + arbitrar).  I DO have the mnt_options=backup-volfile-
> servers= set:
> 
> storage=192.168.8.11:/engine
> mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13
> 
> I had an issue today where 192.168.8.11 went down.  ALL VMs
> immediately paused, including the engine (all VMs were running on
> host2:192.168.8.12).  I couldn't get any gluster stuff working until
> host1 (192.168.8.11) was restored.
> 
> What's wrong / what did I miss?
> 
> (this was set up "manually" through the article on setting up self-
> hosted gluster cluster back when 4.0 was new..I've upgraded it to 4.1
> since).
> 
> Thanks!
> --Jim
> 
> 
> On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler wrote:
> > Typo..."Set it up and then failed that **HOST**"
> > 
> > And upon that host going down, the storage domain went down. I only
> > have hosted storage domain and this new one - is this why the DC
> > went down and no SPM could be elected?
> > 
> > I dont recall this working this way in early 4.0 or 3.6
> > 
> > On Thu, Aug 31, 2017 at 3:30 PM, Charles Kozler wrote:
> > > So I've tested this today and I failed a node. Specifically, I
> > > setup a glusterfs domain and selected "host to use: node1". Set
> > > it up and then failed that VM
> > > 
> > > However, this did not work and the datacenter went down. My
> > > engine stayed up, however, it seems configuring a domain to pin
> > > to a host to use will obviously cause it to fail
> > > 
> > > This seems counter-intuitive to the point of glusterfs or any
> > > redundant storage. If a single host has to be tied to its
> > > function, this introduces a single point of failure
> > > 
> > > Am I missing something obvious?
> > > 
> > > On Thu, Aug 31, 2017 at 9:43 AM, Kasturi Narra wrote:
> > > > yes, right.  What you can do is edit the hosted-engine.conf
> > > > file and there is a parameter as shown below [1] and replace h2
> > > > and h3 with your second and third storage servers. Then you
> > > > will need to restart ovirt-ha-agent and ovirt-ha-broker
> > > > services in all the nodes .
> > > > 
> > > > [1] 'mnt_options=backup-volfile-servers=:' 
> > > > 
> > > > On Thu, Aug 31, 2017 at 5:54 PM, Charles Kozler wrote:
> > > > > Hi Kasturi -
> > > > > 
> > > > > Thanks for feedback
> > > > > 
> > > > > > If cockpit+gdeploy plugin would be have been used then that
> > > > > would have automatically detected glusterfs replica 3 volume
> > > > > created during Hosted Engine deployment and this question
> > > > > would not have been asked
> > > > >   
> > > > > Actually, doing hosted-engine --deploy it too also auto
> > > > > detects glusterfs.  I know glusterfs fuse client has the
> > > > > ability to failover between all nodes in cluster, but I am
> > > > > still curious given the fact that I see in ovirt config
> > > > > node1:/engine (being node1 I set it to in hosted-engine --
> > > > > deploy). So my concern was to ensure and find out exactly how
> > > > > engine works when one node goes away and the fuse client
> > > > > moves over to the other node in the gluster cluster
> > > > > 
> > > > > But you did somewhat answer my question, the answer seems to
> > > > > be no (as default) and I will have to use hosted-engine.conf
> > > > > and change the parameter as you list
> > > > > 
> > > > > So I need to do something manual to create HA for engine on
> > > > > gluster? Yes?
> > > > > 
> > > > > Thanks so much!
> > > > > 
> > > > > On Thu, Aug 31, 2017 at 3:03 AM, Kasturi Narra wrote:
> > > > > > Hi,
> > > > > > 
> > > > > >    During Hosted Engine setup question about glusterfs
> > > > > > volume is being asked because you have setup the volumes
> > > > > > yourself. If cockpit+gdeploy plugin would be have been used
> > > > > > then that would have automatically detected glusterfs
> > > > > > replica 3 volume created during Hosted Engine deployment
> > > > > > and this question would not have been asked.
> > > > > > 
> > > > > >    During new storage domain creation when glusterfs is
> > > > > > selected there is a feature called 'use managed gluster
> > > > > > volumes' and upon checking