Re: [ovirt-users] [Gluster-users] HA storage based on two nodes with one point of failure

Ravishankar N Mon, 08 Jun 2015 05:39:08 -0700


On 06/08/2015 02:38 AM, Юрий Полторацкий wrote:

Hi,
I have set up a lab with the config listed below and got an unexpected result. Could someone please tell me where I went wrong?
I am testing oVirt. The Data Center has two clusters: the first for computing, with three nodes (node1, node2, node3); the second for storage (node5, node6), based on glusterfs (replica 2).
I want the storage to be HA. I have read the following here <https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/sect-Managing_Split-brain.html>:
For a replicated volume with two nodes and one brick on each machine, if the server-side quorum is enabled and one of the nodes goes offline, the other node will also be taken offline because of the quorum configuration. As a result, the high availability provided by the replication is ineffective. To prevent this situation, a dummy node can be added to the trusted storage pool which does not contain any bricks. This ensures that even if one of the nodes which contains data goes offline, the other node will remain online. Note that if the dummy node and one of the data nodes goes offline, the brick on other node will also be taken offline, and will result in data unavailability.
So, I have added my "Engine" (not self-hosted) as a dummy node without a brick and have configured quorum as listed below:
cluster.quorum-type: fixed
cluster.quorum-count: 1
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 51%
Then I ran a VM and dropped the network link on node6. After about an hour I restored the link, and after a while I got a split-brain. But why? No one could have written to the brick on node6: the VM was running on node3 and node1 was the SPM.

It could have happened that after node6 came up, the client(s) saw a temporary disconnect of node5 and a write happened at that time. When node5 connected again, the AFR xattrs on both nodes blamed each other, causing split-brain. For a replica 2 setup, it is best to set the client-quorum to auto instead of fixed. What this means is that the first node of the replica must always be up for writes to be permitted. If the first node goes down, the volume becomes read-only. For better availability, it would be better to use a replica 3 volume (again with client-quorum set to auto). If you are using glusterfs 3.7, you can also consider using the arbiter configuration [1] for replica 3.

[1] https://github.com/gluster/glusterfs/blob/master/doc/features/afr-arbiter-volumes.md
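
To make the difference between the two client-quorum settings concrete, here is a minimal Python sketch. It is only a simplified model of the rule described above, not gluster code: the function name writes_allowed and the list-of-booleans representation are assumptions made for illustration. The brick order [node5, node6] follows the volume info for vol3.

```python
def writes_allowed(bricks_up, quorum_type, quorum_count=None):
    """Simplified model of client quorum for a single replica set.

    bricks_up: list of booleans in brick order (index 0 is the first brick).
    This is an illustrative assumption, not the actual AFR implementation.
    """
    up = sum(bricks_up)
    total = len(bricks_up)
    if quorum_type == "fixed":
        # Writes are allowed once at least quorum_count bricks are reachable.
        return up >= quorum_count
    if quorum_type == "auto":
        # More than half of the bricks, or exactly half including the first.
        return 2 * up > total or (2 * up == total and bricks_up[0])
    raise ValueError(f"unknown quorum type: {quorum_type}")


# Replica 2, brick order as in vol3: [node5, node6].
views = {
    "only node5 visible": [True, False],
    "only node6 visible": [False, True],
}

for name, view in views.items():
    print(name,
          "| fixed/1:", writes_allowed(view, "fixed", 1),
          "| auto:", writes_allowed(view, "auto"))
```

In this model, fixed with a count of 1 accepts writes whichever single brick the client can see, so the two bricks can each take writes the other missed. With auto, only the view that includes the first brick accepts writes, which is why the volume becomes read-only when the first node is down.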


Thanks,
Ravi

Gluster's log from node6:
Июн 07 15:35:06 node6.virt.local etc-glusterfs-glusterd.vol[28491]: [2015-06-07 12:35:06.106270] C [MSGID: 106002] [glusterd-server-quorum.c:356:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume vol3. Stopping local bricks.
Июн 07 16:30:06 node6.virt.local etc-glusterfs-glusterd.vol[28491]: [2015-06-07 13:30:06.261505] C [MSGID: 106003] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume vol3. Starting local bricks.
gluster> volume heal vol3 info
Brick node5.virt.local:/storage/brick12/
/5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids - Is in split-brain

Number of entries: 1

Brick node6.virt.local:/storage/brick13/
/5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids - Is in split-brain

Number of entries: 1


gluster> volume info vol3

Volume Name: vol3
Type: Replicate
Volume ID: 69ba8c68-6593-41ca-b1d9-40b3be50ac80
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: node5.virt.local:/storage/brick12
Brick2: node6.virt.local:/storage/brick13
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: fixed
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
user.cifs: disable
nfs.disable: on
performance.readdir-ahead: on
cluster.quorum-count: 1
cluster.server-quorum-ratio: 51%



06.06.2015 12:09, Юрий Полторацкий wrote:
Hi,
I want to build HA storage based on two servers. If one of them goes down, I want the storage to remain available in RW mode.
If I use replica 2, split-brain can occur. To avoid this I would use a quorum. As I understand it, I can use quorum on the client side, on the server side, or on both. I want to add a dummy node without a brick and use this config:
cluster.quorum-type: fixed
cluster.quorum-count: 1
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 51%
I expect that the client will have RW access as long as one brick is alive. On the other hand, if server quorum is not met, the bricks will be RO.
Say, HOST1 with a brick BRICK1, HOST2 with a brick BRICK2, and HOST3 without a brick.
Once HOST1 loses its network connection, server quorum will not be met on that node and BRICK1 will not be writable. But on HOST2 there is no problem with server quorum (HOST2 + HOST3 > 51%), which is why BRICK2 is still accessible for writing. There is no problem with client quorum either: one brick is alive, so the client can write to it.
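
The server-quorum arithmetic in this scenario can be checked with a short Python sketch. It assumes a simplified rule (the share of active peers must reach the configured ratio); the function name server_quorum_met is hypothetical, and the real glusterd check may round differently.

```python
def server_quorum_met(active_peers, total_peers, ratio_percent=51):
    """Simplified model: quorum holds when the share of active peers
    reaches ratio_percent. Not the actual glusterd implementation."""
    return active_peers * 100 >= ratio_percent * total_peers


# Three peers in the pool: HOST1, HOST2 and the brickless HOST3.
print("HOST1 isolated, sees only itself:", server_quorum_met(1, 3))
print("HOST2 and HOST3 see each other:  ", server_quorum_met(2, 3))

# Without the dummy node, a single surviving peer out of two falls short.
print("Two peers, one survivor:         ", server_quorum_met(1, 2))
```

Under this model the isolated HOST1 sees 1 of 3 peers (about 33%) and loses quorum, while HOST2 and HOST3 together make 2 of 3 (about 67%) and keep it.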
I have set up a lab using KVM on my desktop, and it seems to work well and as expected.
The main question is:
Can I use such a storage for production?

Thanks.
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users
