Tim Bell wrote:
> Michael has been posting very informative blogs on the summary of the
> mid-cycle meetups for Nova. The one on the Nova Network to Neutron
> migration was of particular interest to me as it raises a number of
> potential impacts for the CERN production cloud. The blog itself is at
> http://www.stillhq.com/openstack/juno/000014.html
> 
> I would welcome suggestions from the community on the approach to take
> and areas that the nova/neutron team could review to limit the impact on
> the cloud users.
> 
> For some background, CERN has been running nova-network in flat DHCP
> mode since our first Diablo deployment. We moved to production for our
> users in July last year and are currently supporting around 70,000
> cores, 6 cells, 100s of projects and thousands of VMs. Upgrades
> generally involve disabling the API layer while allowing running VMs to
> carry on without disruption. Within the time scale of the migration to
> Neutron (M release at the latest), these numbers are expected to double.

Thanks for bringing your concerns here. To start this discussion, it's
worth adding some context on the currently-proposed "cold" migration
path. During the Icehouse and Juno cycles the TC reviewed the gaps
between the integration requirements we now place on new entrants and
the currently-integrated projects. That resulted in a number of
identified gaps that we asked projects to address ASAP, ideally within
the Juno cycle.

Most of the Neutron gaps revolved around its failure to be a full
nova-network replacement -- some gaps around supporting basic modes of
operation, and a gap in providing a basic migration path. Neutron devs
promised to close those gaps in Juno, but after a bit of discussion we
decided that a cold migration path was all we'd require them to
provide in Juno.

That doesn't mean a "hot" or "warm" migration path can't be worked on.
There are two questions to solve: how can we technically perform that
migration with a minimal amount of downtime, and is it reasonable to
mark nova-network deprecated before we've solved that issue?

On the first question, migration is typically an operational problem,
and operators could really help design a process that would be acceptable
to them. They may require developers to add features in the code to
support that process, but we don't seem to be at that stage yet. Ideally
I would like ops and devs to join forces to solve that technical challenge.

The answer to the second question lies in the multiple dimensions of
"deprecated".

On one side it means "is no longer in our future plans, new usage is now
discouraged, new development is stopped, explore your options to migrate
out of it". I think it's extremely important that we do that as early as
possible, to reduce duplication of effort and set expectations correctly.

On the other side it means "will be removed in release X" (not
necessarily the next release, but you set a countdown). To do that, you
need to be pretty confident that you'll have your ducks in a row at
removal date, and that you aren't setting operators up for a nightmare
migration.

> For us, the concerns we have with the ‘cold’ approach would be on the
> user impact and operational risk of such a change. Specifically,
> 
> 1.      A big bang approach of shutting down the cloud, upgrading and then
> resuming the cloud would cause significant user disruption
> 
> 2.      The risks involved with a cloud of this size and the open source
> network drivers would be difficult to mitigate through testing and could
> lead to site wide downtime
> 
> 3.      Rebooting VMs may be possible to schedule in batches but would
> need to be staggered to keep availability levels

What minimal level of "hot" would be acceptable to you?

-- 
Thierry Carrez (ttx)
