Re: [ovs-discuss] How to restart raft cluster after a complete shutdown?
On Tue, 25 Aug 2020 at 17:45, Tony Liu wrote:
>
> Start the first node to create the cluster.
> https://github.com/ovn-org/ovn/blob/master/utilities/ovn-ctl#L228
> https://github.com/openvswitch/ovs/blob/master/utilities/ovs-lib.in#L478
>
> Start the rest of the nodes to join the cluster.
> https://github.com/ovn-org/ovn/blob/master/utilities/ovn-ctl#L226
> https://github.com/openvswitch/ovs/blob/master/utilities/ovs-lib.in#L478

Unfortunately, this is precisely the problem: it doesn't work once the cluster has already been created. The first node fails to come up with:

2020-08-26T08:06:19Z|3|reconnect|INFO|tcp:ovn-ovsdb-1.openstack.svc.cluster.local:6643: connecting...
2020-08-26T08:06:19Z|4|reconnect|INFO|tcp:ovn-ovsdb-2.openstack.svc.cluster.local:6643: connecting...
2020-08-26T08:06:20Z|5|reconnect|INFO|tcp:ovn-ovsdb-1.openstack.svc.cluster.local:6643: connection attempt timed out
2020-08-26T08:06:20Z|6|reconnect|INFO|tcp:ovn-ovsdb-2.openstack.svc.cluster.local:6643: connection attempt timed out

This makes sense: the first node can't come up without joining a quorum, and it can't join a quorum because the other two nodes aren't up.

I 'fixed' this by switching the StatefulSet's pod management policy from OrderedReady to Parallel. This just means that all pods come up simultaneously rather than waiting for the first to come up on its own, which will never work. However, my bootstrapping mechanism relied on the behaviour of OrderedReady, so I'm going to have to come up with a solution for that.

Matt

>
> Tony
>
> > -----Original Message-----
> > From: discuss On Behalf Of Matthew Booth
> > Sent: Tuesday, August 25, 2020 7:08 AM
> > To: ovs-discuss
> > Subject: [ovs-discuss] How to restart raft cluster after a complete shutdown?
> >
> > [...]

--
Matthew Booth
Red Hat OpenStack Engineer, Compute DFG

Phone: +442070094448 (UK)

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
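Matt's diagnosis can be reproduced with a toy model. The following is a minimal Python sketch, not how ovsdb-server or Kubernetes are actually implemented: it assumes (hypothetically) that a pod only reports Ready once a majority of the Raft members are running, and it models OrderedReady as "start the next ordinal only after the previous pod is Ready" and Parallel as "start every pod at once". The names `majority` and `restart` are illustrative only.

```python
def majority(cluster_size: int) -> int:
    # Raft needs strictly more than half of the members to elect a leader.
    return cluster_size // 2 + 1


def restart(cluster_size: int, policy: str) -> bool:
    """Return True if every pod eventually becomes Ready after a full shutdown.

    Hypothetical assumption: a pod is Ready only when a majority of the
    cluster (itself included) is running.
    """
    if policy == "Parallel":
        # All pods are created at once, so every member runs at the same time.
        running = cluster_size
        return running >= majority(cluster_size)
    if policy == "OrderedReady":
        running = 0
        for _ordinal in range(cluster_size):
            running += 1  # start the next pod in ordinal order
            if running < majority(cluster_size):
                # This pod never becomes Ready, so the next ordinal is never
                # started: the rollout is stuck forever.
                return False
        return True
    raise ValueError(f"unknown policy: {policy}")


for policy in ("OrderedReady", "Parallel"):
    print(policy, restart(3, policy))
# OrderedReady False
# Parallel True
```

With a single member the first pod is its own majority, so in this model OrderedReady only deadlocks for clusters of two or more members whose readiness depends on quorum.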
Re: [ovs-discuss] How to restart raft cluster after a complete shutdown?
On Tue, Aug 25, 2020 at 7:08 AM Matthew Booth wrote:
>
> [...]
>
> I've noticed an issue when I kill all 3 pods simultaneously: it is no
> longer possible to start the cluster. The issue is presumably one of
> quorum: when a node comes up it can't contact any other node to make
> quorum, and therefore can't come up. All nodes are similarly affected,
> so the cluster stays down. Ignoring kubernetes, how is this situation
> intended to be handled? Do I have to scale it down to a single-node deployment,
> convert that to a new cluster and re-bootstrap it? This wouldn't be
> ideal. Is there any way, for example, I can bring up the first node
> while asserting to that node that the other 2 are definitely down?
>

In general, you should be able to restart the whole cluster without re-bootstrapping it. The cluster should get back to work as long as 2 of the 3 nodes are back online.

In your case, I am not sure whether you are using the k8s pods' IPs as server addresses. If so, the pods' IPs probably changed after the restart, so the server addresses stored in the Raft log can never be reached again. Is that the problem?
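The reasoning above (2 of 3 is enough, but only if the recorded addresses are still valid) can be sketched as follows. This is a minimal Python illustration, assuming a server can only reach a peer at the exact address recorded for it in the Raft log; `quorum_reachable` is a hypothetical name, the IP addresses are made up, and `ovn-ovsdb-0` is assumed from StatefulSet naming (Matt's later log lines only show `ovn-ovsdb-1` and `ovn-ovsdb-2`, and those already use DNS-style addresses).

```python
from __future__ import annotations


def majority(cluster_size: int) -> int:
    # Raft needs strictly more than half of the members to make progress.
    return cluster_size // 2 + 1


def quorum_reachable(stored_addrs: list[str], live_addrs: set[str]) -> bool:
    """True if a majority of the members recorded in the Raft log answer
    at the address that was recorded for them (hypothetical model)."""
    reachable = sum(1 for addr in stored_addrs if addr in live_addrs)
    return reachable >= majority(len(stored_addrs))


# Hypothetical addresses recorded when the cluster was first created.
stored_by_ip = ["tcp:10.128.0.11:6643", "tcp:10.128.0.12:6643", "tcp:10.128.0.13:6643"]
stored_by_name = [
    "tcp:ovn-ovsdb-0.openstack.svc.cluster.local:6643",
    "tcp:ovn-ovsdb-1.openstack.svc.cluster.local:6643",
    "tcp:ovn-ovsdb-2.openstack.svc.cluster.local:6643",
]

# After a full restart the pods get new IPs but keep their DNS names.
live_after_restart = {
    "tcp:10.128.0.21:6643",
    "tcp:10.128.0.22:6643",
    "tcp:10.128.0.23:6643",
    *stored_by_name,
}

print(quorum_reachable(stored_by_ip, live_after_restart))         # False
print(quorum_reachable(stored_by_name, live_after_restart))       # True
print(quorum_reachable(stored_by_name, set(stored_by_name[:2])))  # True: 2 of 3 is enough
```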
Re: [ovs-discuss] How to restart raft cluster after a complete shutdown?
Start the first node to create the cluster.
https://github.com/ovn-org/ovn/blob/master/utilities/ovn-ctl#L228
https://github.com/openvswitch/ovs/blob/master/utilities/ovs-lib.in#L478

Start the rest of the nodes to join the cluster.
https://github.com/ovn-org/ovn/blob/master/utilities/ovn-ctl#L226
https://github.com/openvswitch/ovs/blob/master/utilities/ovs-lib.in#L478

Tony

> -----Original Message-----
> From: discuss On Behalf Of Matthew Booth
> Sent: Tuesday, August 25, 2020 7:08 AM
> To: ovs-discuss
> Subject: [ovs-discuss] How to restart raft cluster after a complete shutdown?
>
> [...]
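The create/join steps only matter on a node's very first boot. Below is a minimal Python sketch of the per-pod decision, assuming (from a reading of ovn-ctl and ovs-lib, not verified line by line against them) that creating or joining is skipped when the database file already exists. `startup_action`, the `ordinal` argument and the file name are hypothetical.

```python
import tempfile
from pathlib import Path


def startup_action(db_file: Path, ordinal: int) -> str:
    """Hypothetical helper: decide what a StatefulSet pod should do at boot."""
    if db_file.exists():
        # The file already records the cluster ID and the other members,
        # so the server simply restarts as a member and waits for a quorum.
        return "restart-existing-member"
    if ordinal == 0:
        # First-ever boot of the first pod: bootstrap a brand-new cluster.
        return "create-cluster"
    # First-ever boot of any other pod: join through an existing member.
    return "join-cluster"


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "ovnnb_db.db"  # hypothetical database file name
    print(startup_action(db, 0), startup_action(db, 2))  # create-cluster join-cluster
    db.touch()
    print(startup_action(db, 0), startup_action(db, 2))  # restart-existing-member restart-existing-member
```

Once every pod has a database file, all of them take the same path, so which one starts first no longer matters for correctness. What still matters is that enough of them run at the same time to form a majority, which is the OrderedReady problem Matt describes in his later reply in this thread.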
[ovs-discuss] How to restart raft cluster after a complete shutdown?
I'm deploying ovsdb-server (and only ovsdb-server) in K8S as a StatefulSet:

https://github.com/openstack-k8s-operators/dev-tools/blob/master/ansible/files/ocp/ovn/ovsdb.yaml

I'm going to replace this with an operator in due course, which may make the following simpler. I'm not necessarily constrained to only things which are easy to do in a StatefulSet.

I've noticed an issue when I kill all 3 pods simultaneously: it is no longer possible to start the cluster. The issue is presumably one of quorum: when a node comes up it can't contact any other node to make quorum, and therefore can't come up. All nodes are similarly affected, so the cluster stays down. Ignoring Kubernetes, how is this situation intended to be handled? Do I have to scale it down to a single-node deployment, convert that to a new cluster and re-bootstrap it? This wouldn't be ideal. Is there any way, for example, I can bring up the first node while asserting to that node that the other 2 are definitely down?

Thanks,

Matt