Tim Bell wrote:
> Michael has been posting very informative blogs on the summary of the
> mid-cycle meetups for Nova. The one on the Nova Network to Neutron
> migration was of particular interest to me as it raises a number of
> potential impacts for the CERN production cloud. The blog itself is at
> http://www.stillhq.com/openstack/juno/000014.html
>
> I would welcome suggestions from the community on the approach to take
> and areas that the nova/neutron team could review to limit the impact on
> the cloud users.
>
> For some background, CERN has been running nova-network in flat DHCP
> mode since our first Diablo deployment. We moved to production for our
> users in July last year and are currently supporting around 70,000
> cores, 6 cells, 100s of projects and thousands of VMs. Upgrades
> generally involve disabling the API layer while allowing running VMs to
> carry on without disruption. Within the time scale of the migration to
> Neutron (M release at the latest), these numbers are expected to double.

Thanks for bringing your concerns here. To start this discussion, it's
worth adding some context on the currently-proposed "cold" migration
path. During the Icehouse and Juno cycles the TC reviewed the gaps
between the integration requirements we now place on new entrants and
the currently-integrated projects. That resulted in a number of
identified gaps that we asked projects to address ASAP, ideally within
the Juno cycle.

Most of the Neutron gaps revolved around its failure to be a full
nova-network replacement -- some gaps around supporting basic modes of
operation, and a gap in providing a basic migration path. Neutron devs
promised to close that in Juno, but after a bit of discussion we
considered that a cold migration path was all we'd require them to
provide in Juno.

That doesn't mean a "hot" or "warm" migration path can't be worked on.
There are two questions to solve: how can we technically perform that
migration with a minimal amount of downtime, and is it reasonable to
mark nova-network deprecated until we've solved that issue.

On the first question, migration is typically an operational problem,
and operators could really help to design one that would be acceptable
to them. They may require developers to add features in the code to
support that process, but we seem to not even be at this stage. Ideally
I would like ops and devs to join to solve that technical challenge.

The answer to the second question lies in the multiple dimensions of
"deprecated". On one side it means "is no longer in our future plans,
new usage is now discouraged, new development is stopped, explore your
options to migrate out of it". I think it's extremely important that we
do that as early as possible, to reduce duplication of effort and set
expectations correctly. On the other side it means "will be removed in
release X" (not necessarily the next release, but you set a countdown).
To do that, you need to be pretty confident that you'll have your ducks
in a row at removal date, and won't set up operators for a nightmare
migration.

> For us, the concerns we have with the ‘cold’ approach would be on the
> user impact and operational risk of such a change. Specifically,
>
> 1. A big bang approach of shutting down the cloud, upgrade and the
> resuming the cloud would cause significant user disruption
>
> 2. The risks involved with a cloud of this size and the open source
> network drivers would be difficult to mitigate through testing and could
> lead to site wide downtime
>
> 3. Rebooting VMs may be possible to schedule in batches but would
> need to be staggered to keep availability levels

What minimal level of "hot" would be acceptable to you?

--
Thierry Carrez (ttx)
