Hello,
we ran the following test, with unwanted results.

Setup:
5-node Gluster cluster
A = replica 3 with arbiter 1 (node1 + node2 + arbiter on node 5)
B = replica 3 with arbiter 1 (node3 + node4 + arbiter on node 5)
C = distributed replica 3 with arbiter 1 (node1 + node2, node3 + node4, each arbiter on node 5)
node 5 holds only arbiter bricks (4x)

TEST:
1) directly reboot one node - OK (it does not matter which one, data node or arbiter node)
2) directly reboot two nodes - OK (if the nodes are not from the same replica)
3) directly reboot three nodes - this is the main problem and the reason for our questions. We rebooted all three nodes of replica "B" (unlikely, but who knows ...)
- all VMs with data on this replica were paused (no data access) - OK
- all VMs running on the replica "B" nodes were lost (started manually later; their data is on other replicas) - acceptable
BUT
- all oVirt domains went down! The master domain is on replica "A", which lost only one member of three, so we did not expect all domains to go down, especially the master with 2 live members.
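
To make our expectation concrete, here is a minimal Python sketch of the quorum arithmetic behind it. It assumes a simple majority rule (a replica set stays usable while more than half of its bricks are on live nodes), which is our understanding of client quorum and not something we verified in the Gluster code. The names VOLUMES, has_quorum and volume_status are made up for illustration and are not Gluster or oVirt APIs.

```python
# Illustrative model only. The brick layout is taken from the setup above;
# the majority rule is an ASSUMPTION about how client quorum behaves.
VOLUMES = {
    "A": [("node1", "node2", "node5")],
    "B": [("node3", "node4", "node5")],
    "C": [("node1", "node2", "node5"), ("node3", "node4", "node5")],
}


def has_quorum(replica_set, down):
    """True if more than half of the bricks in the set are on live nodes."""
    up = sum(1 for node in replica_set if node not in down)
    return up * 2 > len(replica_set)


def volume_status(down):
    """Map each volume to True when every one of its replica sets has quorum."""
    return {
        name: all(has_quorum(rs, down) for rs in replica_sets)
        for name, replica_sets in VOLUMES.items()
    }


if __name__ == "__main__":
    # Test 3: all three nodes of replica "B" are down.
    print(volume_status({"node3", "node4", "node5"}))
```

Under this assumption, test 3 leaves "A" with quorum (2 of 3 bricks up), while "B" and the second replica set of "C" lose it. That is why we expected the master domain on "A" to stay up.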

Results:
- the whole cluster was unreachable until all domains were up, which depended on all nodes being up!
- all paused VMs resumed - OK
- all the remaining VMs were rebooted and are running - OK

Questions:
1) Why did all domains go down if the master domain (on replica "A") had two running members (2 of 3)?
2) How can we recover from that collapse without waiting for all nodes to come up (in the worst case, e.g. if a node has a HW error)?
3) Which oVirt cluster policy can prevent that situation (if any)?

Regards,
Pavel

