Re: [ovs-discuss] How to restart raft cluster after a complete shutdown?

2020-08-25 Thread Han Zhou
On Tue, Aug 25, 2020 at 7:08 AM Matthew Booth wrote:
>
> I'm deploying ovsdb-server (and only ovsdb-server) in K8S as a
> StatefulSet:
>
> https://github.com/openstack-k8s-operators/dev-tools/blob/master/ansible/files/ocp/ovn/ovsdb.yaml
>
> I'm going to replace this with an operator in due course, which may
> make the following simpler. I'm not necessarily constrained to only
> things which are easy to do in a StatefulSet.
>
> I've noticed an issue when I kill all 3 pods simultaneously: it is no
> longer possible to start the cluster. The issue is presumably one of
> quorum: when a node comes up it can't contact any other node to make
> quorum, and therefore can't come up. All nodes are similarly affected,
> so the cluster stays down. Ignoring kubernetes, how is this situation
> intended to be handled? Do I have to reduce it to a single-node deployment,
> convert that to a new cluster and re-bootstrap it? This wouldn't be
> ideal. Is there any way, for example, I can bring up the first node
> while asserting to that node that the other 2 are definitely down?
>
In general you should be able to restart the whole cluster without
re-bootstrapping it. The cluster should resume working as soon as 2 of the
3 nodes are back online.
In your case, I am not sure whether you are using the k8s pods' IPs as server
addresses. If so, the pods' IPs probably changed after the restart, so the
server addresses stored in the raft log can no longer be reached. Is that
the problem?
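
One way to avoid depending on pod IPs is to address each server by the stable DNS name that Kubernetes gives StatefulSet pods behind a headless Service. The sketch below only builds those names. The StatefulSet, Service, and namespace names are hypothetical, "cluster.local" is the default cluster domain and may differ, and whether the ovsdb-server build in use accepts host names in raft addresses is an assumption that needs checking.

```python
from typing import List


def statefulset_pod_hostnames(statefulset: str, service: str, namespace: str,
                              replicas: int,
                              cluster_domain: str = "cluster.local") -> List[str]:
    """Return the per-pod DNS names of a StatefulSet behind a headless Service.

    Kubernetes names the pods <statefulset>-<ordinal> and publishes them as
    <pod>.<service>.<namespace>.svc.<cluster_domain>. These names survive pod
    restarts, unlike pod IPs.
    """
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


# Hypothetical names, for illustration only.
hosts = statefulset_pod_hostnames("ovsdb", "ovsdb", "openstack", 3)
for host in hosts:
    print(host)
```

If host names turn out not to be usable, the same idea applies to any address that stays fixed across restarts, such as a per-pod Service IP.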
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Inquiry for DDlog status for ovn-northd

2020-08-25 Thread Ben Pfaff
On Tue, Aug 25, 2020 at 06:43:51PM +0200, Dumitru Ceara wrote:
> On 8/25/20 6:01 PM, Ben Pfaff wrote:
> > On Mon, Aug 24, 2020 at 04:28:22PM -0700, Han Zhou wrote:
> >> As I remember you were working on the new ovn-northd that utilizes DDlog
> >> for incremental processing. Could you share the current status?
> >>
> >> Now that some more improvements have been made in ovn-controller and OVSDB,
> >> the ovn-northd becomes the more obvious bottleneck for OVN use in large
> >> scale environments. Since you were not in the OVN meetings for the last
> >> couple of weeks, could you share here the status and plan moving forward?
> > 
> > The status is basically that I haven't yet succeeded at getting Red
> > Hat's recommended benchmarks running.  I'm told that is important before
> > we merge it.  I find them super difficult to set up.  I tried a few
> > weeks ago and basically gave up.  Piles and piles of repos all linked
> > together in tricky ways, making it really difficult to substitute my own
> > branches.  I intend to try again soon, though.  I have a new computer
> > that should be arriving soon, which should also allow it to proceed more
> > quickly.
> 
> Hi Ben,
> 
> I can try to help with setting up ovn-heater. In theory it should be
> enough to export OVS_REPO, OVS_BRANCH, OVN_REPO and OVN_BRANCH, point
> them at your repos and branches, and then run "do.sh install"; it
> should take care of installing all the dependencies and repos.
> 
> I can also try to run the scale tests on our downstream if that helps.

It's probably better if I come up with something locally, because I
expect to have to run it multiple times, maybe many times, since I will
presumably discover bottlenecks.

This time around, I'll speak up when I run into problems.


Re: [ovs-discuss] How to restart raft cluster after a complete shutdown?

2020-08-25 Thread Tony Liu
Start the first node to create the cluster.
https://github.com/ovn-org/ovn/blob/master/utilities/ovn-ctl#L228
https://github.com/openvswitch/ovs/blob/master/utilities/ovs-lib.in#L478

Start the remaining nodes to join the cluster.
https://github.com/ovn-org/ovn/blob/master/utilities/ovn-ctl#L226
https://github.com/openvswitch/ovs/blob/master/utilities/ovs-lib.in#L478
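
As a rough sketch of what the create and join steps amount to, the helpers below build the corresponding ovsdb-tool command lines (create-cluster for the first node, join-cluster for the others). The file paths, addresses, and port are hypothetical, and the helpers only construct the arguments; they do not run anything. Both commands are meant for the initial bootstrap, when no database file exists yet on the node.

```python
from typing import List, Sequence


def create_cluster_cmd(db_file: str, schema_file: str, local_addr: str) -> List[str]:
    """Command for the first node: create a new single-server cluster."""
    return ["ovsdb-tool", "create-cluster", db_file, schema_file, local_addr]


def join_cluster_cmd(db_file: str, schema_name: str, local_addr: str,
                     remote_addrs: Sequence[str]) -> List[str]:
    """Command for each additional node: join via one or more existing servers."""
    if not remote_addrs:
        raise ValueError("join-cluster needs at least one remote address")
    return ["ovsdb-tool", "join-cluster", db_file, schema_name, local_addr,
            *remote_addrs]


# Hypothetical paths and addresses, for illustration only.
first = create_cluster_cmd("/etc/ovn/ovnnb_db.db",
                           "/usr/share/ovn/ovn-nb.ovsschema",
                           "tcp:10.0.0.1:6643")
second = join_cluster_cmd("/etc/ovn/ovnnb_db.db", "OVN_Northbound",
                          "tcp:10.0.0.2:6643", ["tcp:10.0.0.1:6643"])
print(" ".join(first))
print(" ".join(second))
```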

Tony
> -----Original Message-----
> From: discuss On Behalf Of Matthew Booth
> Sent: Tuesday, August 25, 2020 7:08 AM
> To: ovs-discuss
> Subject: [ovs-discuss] How to restart raft cluster after a complete
> shutdown?
> 
> I'm deploying ovsdb-server (and only ovsdb-server) in K8S as a
> StatefulSet:
> 
> https://github.com/openstack-k8s-operators/dev-tools/blob/master/ansible/files/ocp/ovn/ovsdb.yaml
> 
> I'm going to replace this with an operator in due course, which may make
> the following simpler. I'm not necessarily constrained to only things
> which are easy to do in a StatefulSet.
> 
> I've noticed an issue when I kill all 3 pods simultaneously: it is no
> longer possible to start the cluster. The issue is presumably one of
> quorum: when a node comes up it can't contact any other node to make
> quorum, and therefore can't come up. All nodes are similarly affected,
> so the cluster stays down. Ignoring kubernetes, how is this situation
> intended to be handled? Do I have to reduce it to a single-node deployment,
> convert that to a new cluster and re-bootstrap it? This wouldn't be
> ideal. Is there any way, for example, I can bring up the first node
> while asserting to that node that the other 2 are definitely down?
> 
> Thanks,
> 
> Matt
> --
> Matthew Booth
> Red Hat OpenStack Engineer, Compute DFG
> 
> Phone: +442070094448 (UK)
> 


Re: [ovs-discuss] Inquiry for DDlog status for ovn-northd

2020-08-25 Thread Dumitru Ceara
On 8/25/20 6:01 PM, Ben Pfaff wrote:
> On Mon, Aug 24, 2020 at 04:28:22PM -0700, Han Zhou wrote:
>> As I remember you were working on the new ovn-northd that utilizes DDlog
>> for incremental processing. Could you share the current status?
>>
>> Now that some more improvements have been made in ovn-controller and OVSDB,
>> the ovn-northd becomes the more obvious bottleneck for OVN use in large
>> scale environments. Since you were not in the OVN meetings for the last
>> couple of weeks, could you share here the status and plan moving forward?
> 
> The status is basically that I haven't yet succeeded at getting Red
> Hat's recommended benchmarks running.  I'm told that is important before
> we merge it.  I find them super difficult to set up.  I tried a few
> weeks ago and basically gave up.  Piles and piles of repos all linked
> together in tricky ways, making it really difficult to substitute my own
> branches.  I intend to try again soon, though.  I have a new computer
> that should be arriving soon, which should also allow it to proceed more
> quickly.

Hi Ben,

I can try to help with setting up ovn-heater. In theory it should be
enough to export OVS_REPO, OVS_BRANCH, OVN_REPO and OVN_BRANCH, point
them at your repos and branches, and then run "do.sh install"; it
should take care of installing all the dependencies and repos.
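
As a minimal sketch of that workflow, assuming an ovn-heater checkout in a local directory: the four variable names and "do.sh install" come from the description above, while the repository URLs, branch names, and checkout path are placeholders.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def heater_env(ovs_repo: str, ovs_branch: str, ovn_repo: str, ovn_branch: str,
               base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the four repo/branch variables set."""
    env = dict(os.environ if base is None else base)
    env.update(OVS_REPO=ovs_repo, OVS_BRANCH=ovs_branch,
               OVN_REPO=ovn_repo, OVN_BRANCH=ovn_branch)
    return env


def run_install(checkout_dir: str, env: Mapping[str, str]) -> None:
    """Run "do.sh install" inside the checkout (not called in this sketch)."""
    subprocess.run(["./do.sh", "install"], cwd=checkout_dir, env=dict(env),
                   check=True)


# Placeholder URLs and branch names, for illustration only.
env = heater_env("https://example.org/me/ovs.git", "my-ovs-branch",
                 "https://example.org/me/ovn.git", "my-ovn-branch", base={})
print(sorted(env))
```

Calling run_install("ovn-heater", env) would then start the installation with those overrides in place.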

I can also try to run the scale tests on our downstream if that helps.

Regards,
Dumitru



Re: [ovs-discuss] Inquiry for DDlog status for ovn-northd

2020-08-25 Thread Ben Pfaff
On Mon, Aug 24, 2020 at 04:28:22PM -0700, Han Zhou wrote:
> As I remember you were working on the new ovn-northd that utilizes DDlog
> for incremental processing. Could you share the current status?
> 
> Now that some more improvements have been made in ovn-controller and OVSDB,
> the ovn-northd becomes the more obvious bottleneck for OVN use in large
> scale environments. Since you were not in the OVN meetings for the last
> couple of weeks, could you share here the status and plan moving forward?

The status is basically that I haven't yet succeeded at getting Red
Hat's recommended benchmarks running.  I'm told that is important before
we merge it.  I find them super difficult to set up.  I tried a few
weeks ago and basically gave up.  Piles and piles of repos all linked
together in tricky ways, making it really difficult to substitute my own
branches.  I intend to try again soon, though.  I have a new computer
that should be arriving soon, which should also allow it to proceed more
quickly.


[ovs-discuss] How to restart raft cluster after a complete shutdown?

2020-08-25 Thread Matthew Booth
I'm deploying ovsdb-server (and only ovsdb-server) in K8S as a StatefulSet:

https://github.com/openstack-k8s-operators/dev-tools/blob/master/ansible/files/ocp/ovn/ovsdb.yaml

I'm going to replace this with an operator in due course, which may
make the following simpler. I'm not necessarily constrained to only
things which are easy to do in a StatefulSet.

I've noticed an issue when I kill all 3 pods simultaneously: it is no
longer possible to start the cluster. The issue is presumably one of
quorum: when a node comes up it can't contact any other node to make
quorum, and therefore can't come up. All nodes are similarly affected,
so the cluster stays down. Ignoring kubernetes, how is this situation
intended to be handled? Do I have to reduce it to a single-node deployment,
convert that to a new cluster and re-bootstrap it? This wouldn't be
ideal. Is there any way, for example, I can bring up the first node
while asserting to that node that the other 2 are definitely down?
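
For reference, the majority rule behind the quorum concern can be sketched as follows. The helper names are illustrative and not part of OVS; the arithmetic is the usual Raft majority.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def has_quorum(cluster_size: int, reachable: int) -> bool:
    """True if `reachable` servers (counting the local one) form a majority."""
    return reachable >= quorum_size(cluster_size)


# A 3-node cluster needs 2 servers up; one node alone cannot make progress.
print(quorum_size(3))
print(has_quorum(3, 1))
print(has_quorum(3, 2))
```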

Thanks,

Matt
-- 
Matthew Booth
Red Hat OpenStack Engineer, Compute DFG

Phone: +442070094448 (UK)
