I came up with an idea, but haven't tried it yet.

Periodically run a kubectl command inside the container to get the current
IPs of the StatefulSet pods, and then reconcile the raft cluster with
those IPs. This work can be done by a readiness probe script.
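A rough sketch of the reconcile step (untested; the pod selector, socket
path, and variable names are made up for illustration, and the
kubectl/ovs-appctl plumbing is only sketched in comments):

```sh
#!/bin/sh
# Sketch of the reconcile logic for a readiness-probe script.
# In a real probe, EXPECTED would come from something like
#   kubectl get pods -l app=ovsdb -o jsonpath='{.items[*].status.podIP}'
# and CURRENT from parsing
#   ovs-appctl -t /pod-run/ovnsb_db.ctl cluster/status OVN_Southbound
# Sample data stands in for both here.
EXPECTED="10.131.0.7 10.129.2.11 10.128.2.15"
CURRENT="10.131.0.4 10.129.2.9 10.128.2.15"

# Raft members whose address is no longer a live pod IP.
stale=""
for ip in $CURRENT; do
    case " $EXPECTED " in
        *" $ip "*) ;;
        *) stale="$stale $ip" ;;
    esac
done

# Live pod IPs not yet present in the raft membership.
missing=""
for ip in $EXPECTED; do
    case " $CURRENT " in
        *" $ip "*) ;;
        *) missing="$missing $ip" ;;
    esac
done

echo "stale:$stale"
echo "missing:$missing"
# A real script would then kick the stale members (ovs-appctl
# cluster/kick) and let the pods at the missing IPs rejoin.
```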


On Thu, 9 Jul 2020 at 21:52, Matthew Booth <[email protected]> wrote:

> On Thu, 9 Jul 2020 at 13:27, Brendan Doyle <[email protected]>
> wrote:
> >
> > Matt,
> >
> > I don't have any answers, just questions, sorry. I'm interested because
> > I've just started
> > playing with this stuff too.
> >
> >
> > On 09/07/2020 11:53, Matthew Booth wrote:
> > > I'm running a 3-node ovsdb raft cluster in kubernetes without using
> > > host networking, NET_ADMIN, or any special networking privileges. I'm
> > > using a StatefulSet, so I have persistent storage and a persistent
> > > network name. However, I don't have a persistent IP. I have studied 2
> > > existing implementations of OVN, including [1], but as they are both
> > > focussed on providing SDN service to the cluster itself (which I'm
> > > not: I'm just a regular tenant of the cluster), they both legitimately
> > > use host networking and therefore don't suffer this issue.
> >
> > So I'm using ovn-setup.yaml and ovnkube-db-raft.yaml which use the
> > scripts in
> > ovndb-raft-functions.sh to start the cluster. I know these yamls use
> > host networking,
> > and I know that most of the stuff in this repo is focused on providing
> > an OVN CNI for
> > kubernetes, but I believe if you just run those two yamls you get
> > just the OVN cluster;
> > the CNI/SDN stuff is not created. However, I noticed with the headless
> > service and
> > host networking that the endpoints that kube creates use the default
> > networking
> > interfaces. I have a multihomed host, and wanted the endpoints on a
> > different subnet to
> > the one kube picked so I had to add a "kind: Endpoints" to
> > ovnkube-db-raft.yaml.
> >
> > But I'm just wondering: what was your motivation for modifying the
> > yamls (or creating
> > your own) to not use host networking?
>
> I'm not using host networking for 2 reasons. Firstly it requires a
> level of privilege, and I want a regular user to be able to deploy my
> ovsdb cluster. Secondly, if you want to allow your pod to float
> between hosts as things are restarted/upgraded/replaced you're going
> to have the same issue with changing IP if your pod changes host.
>
> I have my own yamls, btw.
>
> > >
> > > [1]
> https://github.com/ovn-org/ovn-kubernetes/blob/master/dist/templates/ovnkube-db-raft.yaml.j2
> > >
> > > I finally managed to test what happens when a pod's IP changes, and
> > > the answer is: it breaks. Specifically, the logs are full of:
> > >
> > > 2020-07-09T10:09:16Z|06012|socket_util|ERR|Dropped 59 log messages in
> > > last 59 seconds (most recently, 1 seconds ago) due to excessive rate
> > > 2020-07-09T10:09:16Z|06013|socket_util|ERR|6644:10.131.0.4: bind:
> > > Cannot assign requested address
> > > 2020-07-09T10:09:16Z|06014|raft|WARN|Dropped 59 log messages in last
> > > 59 seconds (most recently, 1 seconds ago) due to excessive rate
> > > 2020-07-09T10:09:16Z|06015|raft|WARN|ptcp:6644:10.131.0.4: listen
> > > failed (Cannot assign requested address)
> > >
> > > The reason it can't bind to 10.131.0.4 is that it's no longer a local
> > > IP address.
> > >
> > > Note that this is binding the raft cluster port, not the client port.
> > > I have clients connecting to a service IP, which is static. I can't
> > > specifically test that it still works after the pod IPs change, but as
> > > it worked before there's no reason to suspect it won't.
> > >
> > > My first thought was to use service IPs for the raft cluster, too, but
> > > if it wants to bind to its local cluster IP that's never going to
> > > work, because the service IP is never a local IP address (traffic is
> > > forwarded by an external service).
> > >
> > > ovsdb-server is invoked in its container by ovn-ctl:
> > >
> > >              exec /usr/share/openvswitch/scripts/ovn-ctl \
> > >              --no-monitor \
> > >              --db-nb-create-insecure-remote=yes \
> > >              --db-nb-cluster-remote-addr="$(bracketify ${initialiser_ip})" \
> > >              --db-nb-cluster-local-addr="$(bracketify ${LOCAL_IP})" \
> > >              --db-nb-cluster-local-proto=tcp \
> > >              --db-nb-cluster-remote-proto=tcp \
> > >              --ovn-nb-log="-vconsole:${OVN_LOG_LEVEL} -vfile:off" \
> > >              run_nb_ovsdb
> >
> > Is this from your own yaml/scripts? I don't see it in
> > ovndb-raft-functions.sh.
> > Again, just curious.
>
> I think I cribbed the initial version of this from the OpenShift
> ovn-kubernetes implementation, but I've significantly departed from it
> because that is also obviously using host networking.
>
> I believe ovn-ctl is a library thing, though. You should have that.
>
> Matt
>
> > > initialiser_ip is the pod IP address of the pod which comes up first.
> > > This is a bootstrapping thing, and afaik isn't relevant once the
> > > cluster is initialised. It certainly doesn't appear in the command
> > > line below. LOCAL_IP is the current ip address of this pod.
> > > Surprisingly (to me), this doesn't appear in the ovsdb-server
> > > invocation either. The actual invocation is:
> > >
> > > ovsdb-server -vconsole:info -vfile:off
> > > --log-file=/var/log/openvswitch/ovsdb-server-sb.log
> > > --remote=punix:/pod-run/ovnsb_db.sock --pidfile=/pod-run/ovnsb_db.pid
> > > --unixctl=ovnsb_db.ctl
> > > --remote=db:OVN_Southbound,SB_Global,connections
> > > --private-key=db:OVN_Southbound,SSL,private_key
> > > --certificate=db:OVN_Southbound,SSL,certificate
> > > --ca-cert=db:OVN_Southbound,SSL,ca_cert
> > > --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
> > > --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers
> > > --remote=ptcp:6642:0.0.0.0 /var/lib/openvswitch/ovnsb_db.db
> > >
> > > So it's getting its former IP address from somewhere. As the only
> > > local state is the database itself, I assume it's reading it from the
> > > DB's cluster table. Here's what it currently thinks about cluster
> > > state:
> > >
> > > # ovs-appctl -t /pod-run/ovnsb_db.ctl cluster/status OVN_Southbound
> > > 83c7
> > > Name: OVN_Southbound
> > > Cluster ID: 1524 (1524187a-8a7b-41d5-89cf-ad2d00141258)
> > > Server ID: 83c7 (83c771fd-d866-4324-bdd6-707c1bf72010)
> > > Address: tcp:10.131.0.4:6644
> > > Status: cluster member
> > > Role: candidate
> > > Term: 41039
> > > Leader: unknown
> > > Vote: self
> > >
> > > Log: [5526, 5526]
> > > Entries not yet committed: 0
> > > Entries not yet applied: 0
> > > Connections: (->7f46) (->66fc)
> > > Servers:
> > >      83c7 (83c7 at tcp:10.131.0.4:6644) (self) (voted for 83c7)
> > >      7f46 (7f46 at tcp:10.129.2.9:6644)
> > >      66fc (66fc at tcp:10.128.2.13:6644)
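If anyone wants to script against that, the Servers section of the
cluster/status output is easy enough to pull apart. A rough, untested
sketch, fed here with the quoted sample text rather than a live
ovs-appctl call:

```sh
#!/bin/sh
# Extract "server-id -> raft address" pairs from cluster/status output.
# A real script would pipe in:
#   ovs-appctl -t /pod-run/ovnsb_db.ctl cluster/status OVN_Southbound
status='Servers:
    83c7 (83c7 at tcp:10.131.0.4:6644) (self) (voted for 83c7)
    7f46 (7f46 at tcp:10.129.2.9:6644)
    66fc (66fc at tcp:10.128.2.13:6644)'

pairs=$(echo "$status" | awk '
    /^Servers:/ { in_servers = 1; next }
    in_servers && / at tcp:/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^tcp:/) {
                addr = $i
                sub(/\)$/, "", addr)   # strip the trailing ")"
                print $1, addr
            }
    }')
echo "$pairs"
```

With the sample input this prints one "sid address" pair per server,
which could then be diffed against the pod IPs kubectl reports.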
> > >
> > > This highlights the next problem, which is that both the other IPs
> > > have changed, too. I know the new IP addresses of the other 2 cluster
> > > nodes, although I don't know which one is 7f46 (but presumably it
> > > knows). Even if I did know, presumably I can't modify the db while
> > > it's not a member of the cluster anyway. The only way I can currently
> > > think of to recover this situation is:
> > >
> > > * Scale back the cluster to just node-0
> > > * node-0 converts itself to a standalone db
> > > * node-0 converts itself to a cluster db with a new local IP
> > > * Scale the cluster back up to 3 nodes, initialised from node-0
> > >
> > > I haven't tested this so there may be problems with it, but in any
> > > case it's not a realistic solution.
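For what it's worth, that standalone round-trip maps onto ovsdb-tool's
cluster-to-standalone and create-cluster commands. Something like the
following on node-0 (an untested sketch: the DB and socket paths are
taken from the invocation quoted above, and NEW_LOCAL_IP is a
placeholder for the pod's new address):

```sh
# With ovsdb-server stopped on node-0 and the other replicas scaled down:

# 1. Collapse the clustered DB into a standalone DB.
ovsdb-tool cluster-to-standalone /tmp/ovnsb_standalone.db \
    /var/lib/openvswitch/ovnsb_db.db

# 2. Re-create a single-member cluster from it at the pod's new IP.
mv /var/lib/openvswitch/ovnsb_db.db /var/lib/openvswitch/ovnsb_db.db.bak
ovsdb-tool create-cluster /var/lib/openvswitch/ovnsb_db.db \
    /tmp/ovnsb_standalone.db "tcp:${NEW_LOCAL_IP}:6644"

# 3. Restart ovsdb-server, then scale the StatefulSet back to 3 so the
#    other nodes rejoin, initialised from node-0.
```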
> > >
> > > A much nicer solution would be to use a service IP for the raft
> > > cluster, but from the above error message I'm not expecting that to
> > > work because it won't be able to bind it. I'm going to test this
> > > today, and I'll update if I find otherwise.
> > I'd be interested to know the results.
> >
> > Brendan
> > > I guess I probably want to tell ovsdb to configure its cluster
> > > identity with some arbitrary IP address that isn't local, then just
> > > bind 0.0.0.0 and wait for traffic sent to its SID.
> > >
> > > Thoughts?
> > >
> > > Thanks,
> > >
> > > Matt
> >
> > _______________________________________________
> > discuss mailing list
> > [email protected]
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> >
>
>
> --
> Matthew Booth
> Red Hat OpenStack Engineer, Compute DFG
>
> Phone: +442070094448 (UK)
>
>


-- 
刘梦馨
Blog: http://oilbeater.com
Weibo: @oilbeater <http://weibo.com/oilbeater>