Ben Pfaff <[email protected]> writes:

> What I'd really like to start from is a high-level description of how an
> upgrade would take place.  This patch set covers one low-level part of
> that upgrade, but (as you recognize in your description) there is a
> bigger set of issues.  There have to be handoffs at multiple levels
> (datapath, controller connections, database connections, ...) and I'd
> really like to think the whole thing through.

I think there are a few 'types' of upgrade.  I had asked about this
internally, polling the various projects that were requesting this
feature, but hadn't gotten anything concrete in the way of requirements
or use cases.  Here's my rough guess at what exists for upgrade
scenarios:

* 'Solution level' upgrades.
  These are the kinds of upgrades where a large project, which
  integrates Open vSwitch for the networking layer, provides upgrades
  for its nodes.  In these cases, I understand that the usual idea is to
  migrate, evacuate, or whatever the correct terminology would be for
  moving VMs and traffic to a 'different node.'
  
  I envision that the 'different node' would be running the upgraded
  version of Open vSwitch userspace (and possibly kernel space)
  already.  This scenario is preferred because:
    - It is deterministic (we know how migration / evacuation behaves,
      and we know how a node getting new traffic behaves)
    - It is the recommended approach from multiple projects (OpenStack
      and OpenShift)
    - It provides a path to downgrade if something goes wrong (the
      original node hasn't upgraded yet)

  It does have some drawbacks.  Chiefly, it requires having an available
  node to use as the migration node.  I am told that such a requirement
  is a really high burden to place on some of the customers who deploy
  these setups.

* 'Node level' upgrades.
  These are the kinds of upgrades that spawned the RFC.  These are
  upgrades where customers want to run 'apt-get upgrade' or 'yum
  upgrade' and have the new software start up, not lose any flows, and
  none of their VMs/pods/containers have to be migrated (no standby
  required).

  I don't know what it means to 'not lose any flows,' though.  I think
  that for the new software to start up and control the kernel datapath,
  the old software needs to have been shut down first.  Open vSwitch
  does provide a mechanism to save/restore the OpenFlow rules (the
  ovs-save script), and it is invoked on restart.  That means that by
  applying this series we can make sure that both the OpenFlow rules and
  the datapath flows are preserved.
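
  For reference, the save/restore that ovs-save performs is roughly
  the following (a simplified sketch; the real script also pins the
  OpenFlow version and iterates over all bridges).  The canned dump
  below stands in for live `ovs-ofctl dump-flows br0` output:

```shell
# Canned sample standing in for:  ovs-ofctl dump-flows br0
cat > /tmp/br0.dump <<'EOF'
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=12.3s, table=0, n_packets=5, n_bytes=370, idle_age=2, priority=0 actions=NORMAL
EOF

# Strip the reply header and the per-flow statistics, which are not
# valid input when re-adding the flows.
grep -v 'NXST_FLOW' /tmp/br0.dump \
  | sed -e 's/duration=[^,]*, //' \
        -e 's/n_packets=[^,]*, //' \
        -e 's/n_bytes=[^,]*, //' \
        -e 's/idle_age=[^,]*, //' > /tmp/br0.flows

cat /tmp/br0.flows
# After the new daemons start:  ovs-ofctl add-flows br0 /tmp/br0.flows
```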

  But preserving that information means additional serialization
  (whether as a shell script, as in the ovs-save case, or JSON data, or
  even some kind of binary format), deserialization (even if that is
  just executing a script or series of scripts), and a stable format
  for that information (if something like the MAC table changes, it
  will impose requirements on the upgrade / downgrade formats that need
  to be used).
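
  To make the format concern concrete, here is a purely hypothetical
  saved-state file (the schema, field names, and version check are
  invented for illustration, not a proposal).  The hard part is the
  downgrade direction: the *older* binary has to accept, or at least
  safely skip, state written by the newer one:

```shell
# Hypothetical versioned state blob that a shutting-down daemon might
# write out (format invented for illustration).
cat > /tmp/ovs-state.json <<'EOF'
{"version": 1,
 "mac_table": [{"port": "eth0", "vlan": 0, "mac": "00:11:22:33:44:55"}]}
EOF

# On restore, check the format version before applying anything.
saved_version=$(sed -n 's/.*"version": \([0-9]*\).*/\1/p' /tmp/ovs-state.json | head -1)
if [ "$saved_version" -gt 1 ]; then
    echo "state written by a newer version; refusing to restore" >&2
    exit 1
fi
echo "restoring version $saved_version state"
```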

  I'm not sure what the great advantage is - obviously we can tell
  users "hey, just upgrade, even while traffic is running... mostly
  nothing bad will happen?"  There isn't a requirement to have a
  migration node, which probably has a real $ benefit for customers who
  have large data centers and don't have to tie up hardware.

* OVN / orchestration upgrades
  I'm not as familiar with OVN - is there anything *active* that gets
  handled?  Can whatever orchestration tool is in use just be torn down
  and restarted without impacting the network (not just OVN, but say
  some Neutron API back-end that calls into OVS)?

* Any other users who will upgrade?
  I'm not sure.  Do we need to classify distros as a different upgrade
  case?  Maybe.  After all, each distribution packages things a bit
  differently and perhaps layers its cloud offerings, or OpenStack, or
  Kubernetes with Open vSwitch slightly differently.  Maybe that can be
  lumped into the other buckets.  Maybe each needs to be broken down.

Sorry - it looks like I haven't even come close to an answer for
anything.

> I guess the other part that I'd like to think through is, what is the
> actual goal?  It's one thing to not lose packet flows but we also need
> to make sure that the new ovs-vswitchd gets the same OpenFlow flows,
> etc. and that its internal state (MAC tables etc.) get populated from
> the old ovs-vswitchd's state, otherwise when the new one takes over
> there will be blips due to that change.

It's probably good to also understand which blips will always exist
(there will be some performance degradation while upgrading, equivalent
to XXX, because of YYY), and which can be handled gracefully.

> The other aspect I'd like to think about is downgrades.  One would like
> to believe that every upgrade goes perfectly, but of course it's not
> true, and users may be more reluctant to upgrade if they believe that
> reverting to the previous version is disruptive.  I am not sure that
> downgrades are more difficult, in most ways, but at least they should be
> considered.

Thanks for this, Ben!  It's a lot to digest, and I'll be asking even
more questions now. :)

> On Fri, Jan 12, 2018 at 02:19:33PM -0500, Aaron Conole wrote:
>> IMPORTANT:  Please remember this is simply a strawman to frame a discussion
>>             around a concept called 'graceful restart.'  More to be
>>             explained.
>> 
>> Now that 2.9 work is frozen and the tree will be forked off, I assumed
>> more extreme and/or interesting ideas might be welcome.  As such, here's
>> something fairly small-ish that provides an interesting behavior called
>> 'Graceful Restart.'  The idea is that when the OvS userspace is being
>> upgraded, we can leave the existing flows installed in the datapath allowing
>> existing flows to continue.  Once the new versions of the daemons take over,
>> the standard dump/sweep operations of the revalidator threads will resume
>> and "Everything Will Just Work(tm)."
>> 
>> Of course, there are some important corner cases and side effects that
>> need to be thought out.  I've listed the ones I know of here (no particular
>> order, though):
>> 
>> 
>> 1. Only the active datapath flows (those installed in the kernel datapath
>>    at the time of 'reload') will remain while the daemons are down.  This
>>    means *any* new traffic (possibly even new connections between the same
>>    endpoints) will fail to pass.  This even means a ping between endpoints
>>    could start failing (ie: if neighbor entries expire, no ARP/ND can pass
>>    and the neighbor will not be resolved causing send failures - unless
>>    those flows are luckily still in the kernel datapath).
>> 
>>    1a.  This also means that some protocol exchanges might *seem* to
>>         work on first glance, but won't actually proceed.  I'm thinking
>>         cases where pings are used as 'keep-alives.'  That's no different
>>         than the existing system.  What will be different is the user
>>         expectation.  The expectation with a "graceful" restart may be
>>         that no such failures would exist.
>> 
>> 2. This is a strong knob that a user may accidentally trigger.  If they do,
>>    flows will *NEVER* die from the kernel datapath while the daemons are
>>    running.  This might be acceptable to keep around.  After all, it isn't
>>    a persistent database entry or anything.  The flag only exists for the
>>    lifetime of the userspace process (so a restart can also be an effect
>>    which 'clears' the behavior).  I'm not sure if this would be acceptable.
>> 
>> 3. Traffic will pass with no userspace knowledge for a time.  I think this
>>    is okay - after all if the OvS daemon is killed flows will stick around.
>>    However, this behavior would go from "well, sometimes it could happen," to
>>    "we plan and/or expect such to happen."
>> 
>> 4. This only covers the kernel datapath.  Userspace datapath implementations
>>    will still lose the entire datapath during restart.
>> 
>> 
>> There probably exists a better/more efficient/more functionally appropriate
>> way of achieving the desired effect.  This is simply to spawn some discussion
>> in the upstream community to see if there's a way to achieve this "graceful 
>> restart" effect (ie: not losing existing packet flow) during planned
>> outages (upgrades, reloads, etc.)
>> 
>> Since the implementation is subject to complete and total change, I haven't
>> written any documentation for this feature yet.  I'm saving that work for
>> another spin after getting some feedback.  There may be other opportunity,
>> for instance, to integrate with something like ovs-ctl for a system-agnostic
>> implementation.
>> 
>> Aaron Conole (2):
>>   datapath: prevent deletion of flows / datapaths
>>   rhel: tell ovsctl to freeze the datapath
>> 
>>  lib/dpctl.c                                        | 27 +++++++++
>>  lib/dpif-netdev.c                                  |  2 +
>>  lib/dpif-netlink.c                                 | 65 ++++++++++++++++------
>>  lib/dpif-provider.h                                |  8 +++
>>  lib/dpif.c                                         | 22 ++++++++
>>  lib/dpif.h                                         |  2 +
>>  .../usr_lib_systemd_system_ovs-vswitchd.service.in |  2 +-
>>  utilities/ovs-ctl.in                               |  4 ++
>>  8 files changed, 115 insertions(+), 17 deletions(-)
>> 
>> -- 
>> 2.14.3
>> 
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev