On 01/24/2018 04:02 PM, Aaron Conole wrote:
Ben Pfaff <[email protected]> writes:

What I'd really like to start from is a high-level description of how an
upgrade would take place.  This patch set covers one low-level part of
that upgrade, but (as you recognize in your description) there is a
bigger set of issues.  There have to be handoffs at multiple levels
(datapath, controller connections, database connections, ...) and I'd
really like to think the whole thing through.

I think there are a few 'types' of upgrade.  I had asked this internally
of the various projects who were requesting this feature, but hadn't
gotten anything concrete for requirements or use cases.  Here's my
rough guess at what exists for upgrade scenarios:

* 'Solution level' upgrades.
   These are the kinds of upgrades where a large project, which
   integrates Open vSwitch for the networking layer, provides upgrades
   for its nodes.  In these cases, I understand that usually the idea is
   to migrate, evacuate, or whatever the correct terminology would be
   for moving VMs and traffic to a 'different node.'  I envision that
   the 'different node' would be running the upgraded version of Open
   vSwitch userspace (and possibly kernel space) already.  This scenario
   is preferred because:
     - It is deterministic (we know how migration / evacuation behaves,
       and we know how a node getting new traffic behaves)
     - It is the recommended way from multiple projects (OpenStack and
       OpenShift)
     - It provides a path to downgrade if something goes wrong (the
       original node hasn't upgraded yet)

   It does have some drawbacks.  Chiefly, it requires having an available
   node to use as the migration node.  I am told that such a requirement
   is a really high burden to place on some of the customers who deploy
   these setups.

* 'Node level' upgrades.
   These are the kinds of upgrades that spawned the RFC.  These are
   upgrades where customers want to run 'apt-get upgrade' or 'yum
   upgrade' and have the new software start up, not lose any flows, and
   none of their vms/pods/containers have to be migrated (no standby
   required).

   I don't know what it means to 'not lose any flows,' though.  I think
   that for the new software to start up and control the kernel datapath,
   the old software needs to have been shut down.  Open vSwitch does
   provide a mechanism to save/restore the OpenFlow rules (the ovs-save
   script), and will do so when a restart is called.  That means we can
   make sure that the OpenFlow rules and the datapath rules are preserved
   by applying this series.
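   To make that save/restore step concrete, here is a minimal in-memory
   sketch of the cycle: dump the OpenFlow rules before the old daemon
   exits, then replay them into the new one.  (The real ovs-save emits a
   shell script of ovs-ofctl commands; the function names and rule
   strings below are purely illustrative, not an OVS API.)

```python
# Conceptual model of an ovs-save-style save/restore cycle.  The rules
# are serialized into a replayable list of commands before shutdown and
# replayed after the new daemon starts.  Names here are hypothetical.

def save_flows(bridge_flows):
    """Serialize the current OpenFlow rules into a replayable form."""
    return [f"add-flow {bridge} {rule}"
            for bridge, rules in bridge_flows.items()
            for rule in rules]

def restore_flows(commands):
    """Replay saved rules into the (hypothetical) new daemon's tables."""
    restored = {}
    for cmd in commands:
        _, bridge, rule = cmd.split(" ", 2)
        restored.setdefault(bridge, []).append(rule)
    return restored

old_state = {"br0": ["priority=100,ip,actions=normal",
                     "priority=0,actions=drop"]}
script = save_flows(old_state)      # analogous to 'ovs-save save-flows'
new_state = restore_flows(script)   # analogous to running the saved script
assert new_state == old_state
```

   The point of the round-trip is that the replay must reproduce the old
   table exactly; anything that doesn't survive serialization is lost
   across the restart.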

   But preserving that information means we need additional
   serialization (whether as a shell script in the case of ovs-save, or
   JSON data, or even some kind of binary format), deserialization (even
   if that is just executing a script or series of scripts), and a
   stable format for that information.  And if something like the MAC
   table changes, that will impose requirements on the upgrade /
   downgrade formats that need to be used.
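   One common way to manage that stability requirement is to stamp the
   serialized state with a format version, so an upgraded daemon can
   accept older dumps and a downgrade can at least refuse newer ones
   cleanly.  A sketch (the field names and FORMAT_VERSION here are
   assumptions for illustration, not an OVS format):

```python
# Versioned serialization sketch: the version field lets a reader decide
# whether it understands the dump.  This is illustrative only.
import json

FORMAT_VERSION = 2

def serialize_state(mac_table):
    """Wrap learned state in a versioned envelope."""
    return json.dumps({"version": FORMAT_VERSION, "mac_table": mac_table})

def deserialize_state(blob, supported_max=FORMAT_VERSION):
    """Reject dumps written by a format we don't understand."""
    data = json.loads(blob)
    if data["version"] > supported_max:
        raise ValueError("state written by a newer version; cannot downgrade")
    return data["mac_table"]

blob = serialize_state({"aa:bb:cc:dd:ee:ff": "port1"})
assert deserialize_state(blob) == {"aa:bb:cc:dd:ee:ff": "port1"}
```

   The downgrade path is the hard part: an older daemon reading a newer
   dump either needs a translation step or has to fall back to starting
   with empty state.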

   I'm not sure what the great advantage is - obviously we can tell
   users "hey just upgrade, even while traffic is running... mostly
   nothing bad will happen?"  There isn't a requirement to have a
   migration node, which probably has a real $ benefit to customers who
   have large data centers and don't need to tie up hardware.

* OVN / orchestration upgrades
   I'm not as familiar with OVN - is there anything *active* that gets
   handled?  Can whatever orchestration tool just be torn-down and
   restarted without impacting the network (not just OVN, but say some
   neutron API back-end that calls into OVS)?

As far as the data plane is concerned, ovn-controller is responsible
for handling certain types of messages (DHCP, ARP, IPv6 NS, etc.).  In
conversations I've had about upgrades, the downtime of a restart of OVN
is not a concern.  This is because the types of packets that
ovn-controller handles are infrequent, and even if we did miss a packet
because we are down, the endpoint would resend that packet type again
eventually.

I can't speak to how neutron is affected by an OVN restart.


* Any other users who will upgrade?
   I'm not sure.  Do we need to classify distros as a different upgrade
   case?  Maybe.  After all, each distribution packages things a bit
   differently and perhaps layers their cloud offerings, or OpenStack,
   or Kubernetes with Open vSwitch slightly differently.  Maybe that can
   be lumped into the other buckets.  Maybe each needs to be broken down.

Sorry - it looks like I haven't even come close to an answer for
anything.

I guess the other part that I'd like to think through is, what is the
actual goal?  It's one thing to not lose packet flows but we also need
to make sure that the new ovs-vswitchd gets the same OpenFlow flows,
etc. and that its internal state (MAC tables etc.) get populated from
the old ovs-vswitchd's state, otherwise when the new one takes over
there will be blips due to that change.
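The "blip" from unpopulated internal state can be sketched very simply:
a MAC-learning lookup that misses falls back to flooding until the
entry is relearned from traffic.  (Everything below is an illustrative
model; OVS does not expose state handoff through any such API.)

```python
# Illustrative model of the restart blip: a new daemon that starts with
# an empty MAC table floods until it relearns, while one that inherits
# the old daemon's table forwards normally from the first packet.

def lookup(mac_table, dst_mac):
    # A miss means flooding - the transient degradation after restart.
    return mac_table.get(dst_mac, "FLOOD")

old_table = {"aa:bb:cc:dd:ee:ff": "port1"}

cold_start = {}                 # restart without any state transfer
warm_start = dict(old_table)    # restart with inherited learned state

assert lookup(cold_start, "aa:bb:cc:dd:ee:ff") == "FLOOD"
assert lookup(warm_start, "aa:bb:cc:dd:ee:ff") == "port1"
```

The same shape applies to other soft state (CT-related caches, learned
flows): the question for each is whether it gets handed off or relearned,
and how long the relearning window is.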

It's probably good to also understand which blips will always exist
(there will be some performance degradation while upgrading equivalent
to XXX, because of the YYY), and which can be handled gracefully.

The other aspect I'd like to think about is downgrades.  One would like
to believe that every upgrade goes perfectly, but of course it's not
true, and users may be more reluctant to upgrade if they believe that
reverting to the previous version is disruptive.  I am not sure whether
downgrades are more difficult in most ways, but at least they should be
considered.

Thanks for this, Ben!  It's a lot to digest, and I'll be asking even
more questions now. :)

On Fri, Jan 12, 2018 at 02:19:33PM -0500, Aaron Conole wrote:
IMPORTANT:  Please remember this is simply a strawman to frame a discussion
             around a concept called 'graceful restart.'  More to be explained.

Now that 2.9 work is frozen and the tree will be forked off, I assumed
more extreme and/or interesting ideas might be welcome.  As such, here's
something fairly small-ish that provides an interesting behavior called
'Graceful Restart.'  The idea is that when the OvS userspace is being
upgraded, we can leave the existing flows installed in the datapath allowing
existing flows to continue.  Once the new versions of the daemons take over,
the standard dump/sweep operations of the revalidator threads will resume
and "Everything Will Just Work(tm)."

Of course, there are some important corner cases and side effects that
need to be thought out.  I've listed the ones I know of here (no particular
order, though):


1. Only the active datapath flows (those installed in the kernel datapath
    at the time of 'reload') will remain while the daemons are down.  This
    means *any* new traffic (possibly even new connections between the same
    endpoints) will fail to pass.  This even means a ping between endpoints
    could start failing (ie: if neighbor entries expire, no ARP/ND can pass
    and the neighbor will not be resolved causing send failures - unless
    those flows are luckily still in the kernel datapath).

    1a.  This also means that some protocol exchanges might *seem* to
         work on first glance, but won't actually proceed.  I'm thinking
         of cases where pings are used as 'keep-alives.'  That's no
         different from the existing system.  What will be different is
         the user expectation.  The expectation with a "graceful" restart
         may be that no such failures would exist.

2. This is a strong knob that a user may accidentally trigger.  If they do,
    flows will *NEVER* die from the kernel datapath while the daemons are
    running.  This might be acceptable to keep around.  After all, it isn't
    a persistent database entry or anything.  The flag only exists for the
    lifetime of the userspace process (so a restart can also be an effect
    which 'clears' the behavior).  I'm not sure if this would be acceptable.

3. Traffic will pass with no userspace knowledge for a time.  I think this
    is okay - after all if the OvS daemon is killed flows will stick around.
    However, this behavior would go from "well, sometimes it could happen," to
    "we plan and/or expect such to happen."

4. This only covers the kernel datapath.  Userspace datapath implementations
    will still lose the entire datapath during restart.


There probably exists a better/more efficient/more functionally appropriate
way of achieving the desired effect.  This is simply meant to spawn some
discussion in the upstream community to see if there's a way to achieve
this "graceful restart" effect (ie: not losing existing packet flow)
during planned outages (upgrades, reloads, etc.).

Since the implementation is subject to complete and total change, I haven't
written any documentation for this feature yet.  I'm saving that work for
another spin after getting some feedback.  There may be other
opportunities, for instance, to integrate with something like ovs-ctl
for a system-agnostic implementation.

Aaron Conole (2):
   datapath: prevent deletion of flows / datapaths
   rhel: tell ovsctl to freeze the datapath

  lib/dpctl.c                                        | 27 +++++++++
  lib/dpif-netdev.c                                  |  2 +
  lib/dpif-netlink.c                                 | 65 ++++++++++++++++------
  lib/dpif-provider.h                                |  8 +++
  lib/dpif.c                                         | 22 ++++++++
  lib/dpif.h                                         |  2 +
  .../usr_lib_systemd_system_ovs-vswitchd.service.in |  2 +-
  utilities/ovs-ctl.in                               |  4 ++
  8 files changed, 115 insertions(+), 17 deletions(-)

--
2.14.3

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev