Re: [ovirt-users] [Gluster-users] HA storage based on two nodes with one point of failure

2015-06-08 Thread Юрий Полторацкий
2015-06-08 8:32 GMT+03:00 Ravishankar N ravishan...@redhat.com:



 On 06/08/2015 02:38 AM, Юрий Полторацкий wrote:

 Hi,

 I set up a lab with the config listed below and got an unexpected
 result. Can someone please tell me where I went wrong?

 I am testing oVirt. The Data Center has two clusters: the first for
 compute, with three nodes (node1, node2, node3); the second for storage
 (node5, node6), based on glusterfs (replica 2).

 I want the storage to be HA. I have read the following here:
 https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/sect-Managing_Split-brain.html
 For a replicated volume with two nodes and one brick on each machine, if
 the server-side quorum is enabled and one of the nodes goes offline, the
 other node will also be taken offline because of the quorum configuration.
 As a result, the high availability provided by the replication is
 ineffective. To prevent this situation, a dummy node can be added to the
 trusted storage pool which does not contain any bricks. This ensures that
 even if one of the nodes which contains data goes offline, the other node
 will remain online. Note that if the dummy node and one of the data nodes
 goes offline, the brick on other node will be also be taken offline, and
 will result in data unavailability.

 So, I have added my Engine (not self-hosted) as a dummy node without a
 brick and have configured quorum as listed below:
 cluster.quorum-type: fixed
 cluster.quorum-count: 1
 cluster.server-quorum-type: server
 cluster.server-quorum-ratio: 51%


 Then I ran a VM and dropped the network link from node6; after an hour I
 restored the link, and after a while I got a split-brain. But why? Nothing
 could have written to the brick on node6: the VM was running on node3 and
 node1 was the SPM.



 It could be that after node6 came back up, the client(s) saw a temporary
 disconnect of node5 and a write happened during that window. When node5
 reconnected, the AFR xattrs on both nodes blamed each other, causing
 split-brain. For a replica 2 setup, it is best to set client-quorum to
 auto instead of fixed. This means the first node of the replica must
 always be up for writes to be permitted; if the first node goes down, the
 volume becomes read-only.

Yes, at first I tested with client-quorum auto, but my VMs were paused
when the first node went down, and this is unacceptable.

OK, I understand: there is no way to have fault-tolerant storage with
only two servers using GlusterFS. I have to get another one.

Thanks.


 For better availability, use a replica 3 volume (again with client-quorum
 set to auto). If you are using glusterfs 3.7, you can also consider the
 arbiter configuration [1] for replica 3.

 [1]
 https://github.com/gluster/glusterfs/blob/master/doc/features/afr-arbiter-volumes.md

 Thanks,
 Ravi


  Gluster's log from node6:
 Jun 07 15:35:06 node6.virt.local etc-glusterfs-glusterd.vol[28491]:
 [2015-06-07 12:35:06.106270] C [MSGID: 106002]
 [glusterd-server-quorum.c:356:glusterd_do_volume_quorum_action]
 0-management: Server quorum lost for volume vol3. Stopping local bricks.
 Jun 07 16:30:06 node6.virt.local etc-glusterfs-glusterd.vol[28491]:
 [2015-06-07 13:30:06.261505] C [MSGID: 106003]
 [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action]
 0-management: Server quorum regained for volume vol3. Starting local bricks.


 gluster volume heal vol3 info
 Brick node5.virt.local:/storage/brick12/
 /5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids - Is in split-brain

 Number of entries: 1

 Brick node6.virt.local:/storage/brick13/
 /5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids - Is in split-brain

 Number of entries: 1


 gluster volume info vol3

 Volume Name: vol3
 Type: Replicate
 Volume ID: 69ba8c68-6593-41ca-b1d9-40b3be50ac80
 Status: Started
 Number of Bricks: 1 x 2 = 2
 Transport-type: tcp
 Bricks:
 Brick1: node5.virt.local:/storage/brick12
 Brick2: node6.virt.local:/storage/brick13
 Options Reconfigured:
 storage.owner-gid: 36
 storage.owner-uid: 36
 cluster.server-quorum-type: server
 cluster.quorum-type: fixed
 network.remote-dio: enable
 cluster.eager-lock: enable
 performance.stat-prefetch: off
 performance.io-cache: off
 performance.read-ahead: off
 performance.quick-read: off
 auth.allow: *
 user.cifs: disable
 nfs.disable: on
 performance.readdir-ahead: on
 cluster.quorum-count: 1
 cluster.server-quorum-ratio: 51%



 06.06.2015 12:09, Юрий Полторацкий wrote:

 Hi,

  I want to build HA storage based on two servers. If one goes down, I
 want my storage to remain available in RW mode.

  If I use replica 2, split-brain can occur. To avoid this I would use a
 quorum. As I understand it, I can use quorum on the client side, on the
 server side, or on both. I want to add a dummy node without a brick and
 use this config:

 cluster.quorum-type: fixed
 

Re: [ovirt-users] [Gluster-users] HA storage based on two nodes with one point of failure

2015-06-08 Thread Ravishankar N



On 06/08/2015 02:38 AM, Юрий Полторацкий wrote:

Hi,

I set up a lab with the config listed below and got an unexpected
result. Can someone please tell me where I went wrong?


I am testing oVirt. The Data Center has two clusters: the first for
compute, with three nodes (node1, node2, node3); the second for
storage (node5, node6), based on glusterfs (replica 2).


I want the storage to be HA. I have read the following here:
https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/sect-Managing_Split-brain.html
For a replicated volume with two nodes and one brick on each machine, 
if the server-side quorum is enabled and one of the nodes goes 
offline, the other node will also be taken offline because of the 
quorum configuration. As a result, the high availability provided by 
the replication is ineffective. To prevent this situation, a dummy 
node can be added to the trusted storage pool which does not contain 
any bricks. This ensures that even if one of the nodes which contains 
data goes offline, the other node will remain online. Note that if the 
dummy node and one of the data nodes goes offline, the brick on other 
node will be also be taken offline, and will result in data 
unavailability.
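
The server-quorum arithmetic described in the quoted guide can be sketched
as below. This is only an illustration: the function names are made up, and
it assumes the ratio is applied to the number of peers in the trusted pool
and rounded up, which is consistent with the behaviour the guide describes
but is not taken from the glusterd source.

```python
import math

def peers_needed(total_peers, ratio_percent=51):
    # Assumption: the ratio applies to the peer count and is rounded up.
    return math.ceil(total_peers * ratio_percent / 100)

def server_quorum_met(active_peers, total_peers, ratio_percent=51):
    # active_peers counts the peers a node can still see, itself included.
    return active_peers >= peers_needed(total_peers, ratio_percent)

# Two data nodes only: losing one leaves 1 of 2, so quorum is lost.
print(server_quorum_met(1, 2))
# Two data nodes plus a brickless dummy node: 2 of 3 is enough.
print(server_quorum_met(2, 3))
# A node cut off from both other peers sees only itself: 1 of 3.
print(server_quorum_met(1, 3))
```

The last case corresponds to node6 in this thread: with its link down it
sees one peer out of three, and the log shows its local bricks being stopped.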


So, I have added my Engine (not self-hosted) as a dummy node without 
a brick and have configured quorum as listed below:

cluster.quorum-type: fixed
cluster.quorum-count: 1
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 51%


Then I ran a VM and dropped the network link from node6; after an
hour I restored the link, and after a while I got a split-brain. But
why? Nothing could have written to the brick on node6: the VM was
running on node3 and node1 was the SPM.





It could be that after node6 came back up, the client(s) saw a
temporary disconnect of node5 and a write happened during that window.
When node5 reconnected, the AFR xattrs on both nodes blamed each other,
causing split-brain. For a replica 2 setup, it is best to set
client-quorum to auto instead of fixed. This means the first node of
the replica must always be up for writes to be permitted; if the first
node goes down, the volume becomes read-only.
For better availability, use a replica 3 volume (again with
client-quorum set to auto). If you are using glusterfs 3.7, you can also
consider the arbiter configuration [1] for replica 3.
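
The difference between the fixed and auto client-quorum settings discussed
above can be sketched as follows. This is a simplified model, not GlusterFS
code: the function name is hypothetical, and the "auto" rule is modelled as
"more than half of the bricks up, or exactly half including the first
brick", which matches the replica 2 behaviour described above.

```python
def client_quorum_met(up, quorum_type, quorum_count=None):
    """up: one boolean per brick, in replica order (first brick first)."""
    total = len(up)
    alive = sum(up)
    if quorum_type == "fixed":
        return alive >= quorum_count
    if quorum_type == "auto":
        if 2 * alive > total:
            return True
        # Exactly half of the bricks up: the first brick breaks the tie.
        return 2 * alive == total and up[0]
    raise ValueError("unsupported quorum type: %r" % quorum_type)

# fixed with count 1: a client that sees only node5, and later only node6,
# may write in both cases, so the two copies can end up blaming each other.
print(client_quorum_met([True, False], "fixed", 1))
print(client_quorum_met([False, True], "fixed", 1))

# auto on replica 2: only the side holding the first brick accepts writes.
print(client_quorum_met([True, False], "auto"))
print(client_quorum_met([False, True], "auto"))

# auto on replica 3: any two bricks are enough, including without the first.
print(client_quorum_met([False, True, True], "auto"))
```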


[1] 
https://github.com/gluster/glusterfs/blob/master/doc/features/afr-arbiter-volumes.md


Thanks,
Ravi



Gluster's log from node6:
Jun 07 15:35:06 node6.virt.local etc-glusterfs-glusterd.vol[28491]: 
[2015-06-07 12:35:06.106270] C [MSGID: 106002] 
[glusterd-server-quorum.c:356:glusterd_do_volume_quorum_action] 
0-management: Server quorum lost for volume vol3. Stopping local bricks.
Jun 07 16:30:06 node6.virt.local etc-glusterfs-glusterd.vol[28491]: 
[2015-06-07 13:30:06.261505] C [MSGID: 106003] 
[glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 
0-management: Server quorum regained for volume vol3. Starting local 
bricks.



gluster volume heal vol3 info
Brick node5.virt.local:/storage/brick12/
/5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids - Is in split-brain

Number of entries: 1

Brick node6.virt.local:/storage/brick13/
/5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids - Is in split-brain

Number of entries: 1
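
For checking many files, the heal info output above can be scanned
programmatically. The sketch below assumes the exact text layout shown in
this thread (a "Brick ..." line followed by entries ending in
"- Is in split-brain"); other GlusterFS versions may print it differently,
and the function name is made up for illustration.

```python
MARKER = " - Is in split-brain"

def split_brain_entries(heal_info_text):
    """Map each brick to the entries flagged as split-brain in the text."""
    result = {}
    brick = None
    for raw in heal_info_text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
            result[brick] = []
        elif brick is not None and line.endswith(MARKER):
            result[brick].append(line[:-len(MARKER)])
    return result

# Sample copied from the output in this thread.
SAMPLE = """\
Brick node5.virt.local:/storage/brick12/
/5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids - Is in split-brain

Number of entries: 1

Brick node6.virt.local:/storage/brick13/
/5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids - Is in split-brain

Number of entries: 1
"""

for brick, entries in split_brain_entries(SAMPLE).items():
    print(brick, entries)
```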


gluster volume info vol3

Volume Name: vol3
Type: Replicate
Volume ID: 69ba8c68-6593-41ca-b1d9-40b3be50ac80
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: node5.virt.local:/storage/brick12
Brick2: node6.virt.local:/storage/brick13
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: fixed
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
user.cifs: disable
nfs.disable: on
performance.readdir-ahead: on
cluster.quorum-count: 1
cluster.server-quorum-ratio: 51%



06.06.2015 12:09, Юрий Полторацкий wrote:

Hi,

I want to build HA storage based on two servers. If one goes down, I
want my storage to remain available in RW mode.


If I use replica 2, split-brain can occur. To avoid this I would use a
quorum. As I understand it, I can use quorum on the client side, on the
server side, or on both. I want to add a dummy node without a brick and
use this config:


cluster.quorum-type: fixed
cluster.quorum-count: 1
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 51%

I expect that the client will have RW access as long as one brick is
alive. On the other hand, if server quorum is not met, the bricks
will be RO.


Say, HOST1 with a brick BRICK1, HOST2 with a brick BRICK2, and HOST3 
without a brick.


Once HOST1 loses its network connection, server quorum will not be
met on this node