Re: [openstack-dev] [nova][neutron] Migration from nova-network to Neutron for large production clouds

2014-08-29 Thread Joe Harrison



On 27/08/14 12:59, Tim Bell wrote:
 -----Original Message-----
 From: Michael Still [mailto:mi...@stillhq.com]
 Sent: 26 August 2014 22:20
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova][neutron] Migration from
 nova-network to Neutron for large production clouds
 ...
 
 Mark and I finally got a chance to sit down and write out a basic
 proposal. It looks like this:
 
 
 Thanks... I've put a few questions inline and I'll ask the experts
 to review the steps when they're back from holidays
 
 == neutron step 0 ==
 configure neutron to reverse proxy calls to Nova (part to be written)

 == nova-compute restart one ==
 Freeze nova's network state (probably by stopping nova-api, but we
 could be smarter than that if required)
 Update all nova-compute nodes to point to Neutron and replace the
 nova-net agent with the Neutron nova-aware L2 agent
 Enable the Neutron Layer 2 agent on each node; this might have the
 side effect of causing the network configuration to be rebuilt for
 some instances
 The API can be unfrozen at this time until ready for step 2
 
 
 - Would it be possible to only update some of the compute nodes ?
 We'd like to stage the upgrade if we can in view of scaling risks.
 Worst case, we'd look to do it cell by cell but those are quite
 large already (200+ hypervisors)

I have a few what-ifs when it comes to this:

- What if the migration fails halfway through? How do we administer
Nova in this situation?

Unfortunately Tim, last time I checked Neutron has no awareness of
Nova's cells (and only recently became aware of nova regions) so I
don't see how this would be taken into account for a migration.

 
 == neutron restart two ==
 Freeze nova's network state (probably by stopping nova-api, but we
 could be smarter than that if required)
 Dump/translate/restore data from Nova-Net to Neutron
 Configure Neutron to point to its own database
 Unfreeze Nova API
 

I think it's a good idea to be smarter.

 
 - Linked with the point above, we'd like to do the nova-net to
 neutron in stages if we can

Again, this sounds like a nightmare if it fails. It's meant to be one
big transaction, but it is anything but.

For this to be done safely in a production cloud (which is one of the
few reasons to actually do a replacement instead of just swapping out
the component), we need to be able to run Neutron and Nova-net at the
same time or it *does* have to become a transactional migration.

If the migration fails at some stage, you're left in limbo. Does Nova
work? Does Neutron work?

There needs to be some sort of fault tolerance or rollback feature if
you're going down the all or nothing approach to stop a cloud being
left in an inconsistent (and impossible to administrate or operate via
APIs) state.

If the two of them (Nova-network and Neutron) could both exist and
operate at the same time in a cloud, it wouldn't have to be a one-shot
migration. If some nodes fail, that's fine as you could just let them
fall back to Nova-net and fix them whilst your cloud still works and
more importantly nova-api is up and running.
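If the two backends really can coexist, the fallback behaviour described above could be orchestrated roughly like this. This is only a sketch with invented names (neither `migrate_hosts` nor `switch_backend` exists in Nova or Neutron); it just illustrates why coexistence removes the all-or-nothing risk:

```python
NOVA_NET = "nova-net"
NEUTRON = "neutron"

def migrate_hosts(hosts, switch_backend):
    """Try to move each host to Neutron, falling back to nova-net on error.

    `switch_backend` is a hypothetical callable that raises on failure.
    Returns (final backend per host, list of hosts left on nova-net).
    """
    state = {host: NOVA_NET for host in hosts}
    failed = []
    for host in hosts:
        try:
            switch_backend(host)
            state[host] = NEUTRON
        except Exception:
            # The host keeps serving traffic on nova-net; fix it later
            # while the rest of the cloud (and nova-api) stays up.
            failed.append(host)
    return state, failed
```

A failed host is simply left where it was, rather than leaving the whole cloud in limbo.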

 
 *** Stopping point for linuxbridge to linuxbridge translation, or
 continue for rollout of new tech
 
 == nova-compute restart two ==
 Configure OVS or the new technology, ensure that the proper ML2 driver
 is installed
 Restart the Layer 2 agent on each hypervisor where next-gen networking
 should be enabled
 
 
 So, I want to stop using the word "cold" to describe this. It's
 more of a rolling upgrade than a cold migration. So... Would two
 shorter nova API outages be acceptable?
 
 
 Two Nova API outages would be OK for us.

I think the Nova API outages are the least concern in comparison to
being left in a halfway state in a production environment. Hopefully
these concerns can be addressed.

 
 Michael
 
 -- Rackspace Australia

Whilst I wholeheartedly agree that this migration plan seems like a
good idea (and reminds me of a Raiders of the Lost Ark-esque scene),
I'm afraid of what would happen if something went wrong in the middle
of this swap.

It wouldn't be a good idea to simply start nova-api again to fix this,
as users and services would then be able to use it while the cloud is
in an inconsistent state.

Perhaps we should change the policy on nova-api during this migration
to only allow access to a special migration role or the like? This
would prevent services and users from accessing Nova's API while the
special migration policy is applied, but allow administrators to
continue monitoring via the API and fixing any problems. This seems
like a currently absent must-have.
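In spirit, the policy suggested above amounts to something like the following. This is purely an illustrative sketch: real Nova enforces access through oslo.policy and policy.json, and both the rule wiring and the "migration_admin" role name here are hypothetical.

```python
# During the migration, every compute API action would require a special role.
MIGRATION_POLICY = {"default": "role:migration_admin"}

def enforce(action, user_roles, policy=MIGRATION_POLICY):
    """Return True if `user_roles` satisfies the rule for `action`.

    Falls back to the "default" rule when no action-specific rule exists,
    mirroring how policy files commonly declare a default.
    """
    rule = policy.get(action, policy["default"])
    required_role = rule.split("role:", 1)[1]
    return required_role in user_roles
```

With a default like this applied, ordinary members are locked out while operators holding the migration role can still inspect and repair the cloud.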

I like the idea of the migration, but I hope that any and all "what
if?" questions have been addressed and the problems are mitigated.

I wish you and Mark lots of luck with this migration, but please make
sure it's not fragile and that it's fault tolerant!

Cheers,
Joe

Re: [openstack-dev] [nova][neutron] Migration from nova-network to Neutron for large production clouds

2014-08-27 Thread Tim Bell
 -Original Message-
 From: Michael Still [mailto:mi...@stillhq.com]
 Sent: 26 August 2014 22:20
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova][neutron] Migration from nova-network to
 Neutron for large production clouds
...
 
 Mark and I finally got a chance to sit down and write out a basic proposal. It
 looks like this:
 

Thanks... I've put a few questions inline and I'll ask the experts to review 
the steps when they're back from holidays

 == neutron step 0 ==
 configure neutron to reverse proxy calls to Nova (part to be written)
 
 == nova-compute restart one ==
 Freeze nova's network state (probably by stopping nova-api, but we could be
 smarter than that if required)
 Update all nova-compute nodes to point to Neutron and replace the nova-net
 agent with the Neutron nova-aware L2 agent
 Enable the Neutron Layer 2 agent on each node; this might have the side
 effect of causing the network configuration to be rebuilt for some instances
 The API can be unfrozen at this time until ready for step 2
 

- Would it be possible to only update some of the compute nodes ? We'd like to 
stage the upgrade if we can in view of scaling risks. Worst case, we'd look to 
do it cell by cell but those are quite large already (200+ hypervisors)

 == neutron restart two ==
 Freeze nova's network state (probably by stopping nova-api, but we could be
 smarter than that if required)
 Dump/translate/restore data from Nova-Net to Neutron
 Configure Neutron to point to its own database
 Unfreeze Nova API
 

- Linked with the point above, we'd like to do the nova-net to neutron in 
stages if we can

 *** Stopping point for linuxbridge to linuxbridge translation, or continue for
 rollout of new tech
 
 == nova-compute restart two ==
 Configure OVS or the new technology, ensure that the proper ML2 driver is
 installed
 Restart the Layer 2 agent on each hypervisor where next-gen networking
 should be enabled
 
 
 So, I want to stop using the word "cold" to describe this. It's more of a
 rolling upgrade than a cold migration. So... Would two shorter nova API
 outages be acceptable?
 

Two Nova API outages would be OK for us.

 Michael
 
 --
 Rackspace Australia
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Re: [openstack-dev] [nova][neutron] Migration from nova-network to Neutron for large production clouds

2014-08-26 Thread Tim Bell

 From: Michael Still [mailto:mi...@stillhq.com] 
 Sent: 25 August 2014 23:38
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova][neutron] Migration from nova-network to 
 Neutron for large production clouds

...

 Mark McClain and I discussed a possible plan for nova-network to neutron 
 upgrades at the Ops Meetup today, and it seemed generally acceptable. It 
 defines a "cold migration" as
 freezing the ability to create or destroy instances during the upgrade, and 
 then requiring a short network outage for each instance in the cell.
 This is why I'm trying to understand the "no downtime" use case better. Is it 
 literally no downtime, ever? Or is it a simpler "no simultaneous downtime 
 for instances"?
 Michael

The simultaneous downtime across the cloud is the one we really need to avoid. 
Short network outages (depending on how you define "short") can be handled 
along with blocking API operations for short periods.

The other item was how to stage the upgrade. With a cloud of a significant 
size and some concerns about scalability, we would like to be able to do the 
migration as a set of steps rather than a big bang. During the gap between the 
steps, we'd like to open the APIs for usage, such as having new VMs created on 
Neutron hypervisors. Would that be a possibility?
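The staged placement asked about here could be expressed as a scheduler-side filter: while the migration is in flight, land new VMs only on hypervisors already running the Neutron agent. A sketch with invented names (in a real deployment the backend lookup would come from the agent inventory, not a callable):

```python
def neutron_ready_hosts(hosts, backend_of):
    """Filter candidate hosts down to those already migrated to Neutron.

    `backend_of` is a hypothetical lookup returning "neutron" or
    "nova-net" for a given host name.
    """
    return [host for host in hosts if backend_of(host) == "neutron"]
```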

Tim


Re: [openstack-dev] [nova][neutron] Migration from nova-network to Neutron for large production clouds

2014-08-26 Thread Oleg Bondarev
Hi,

I'd like to encourage everybody interested to take a look and leave
comments on the Neutron migration spec here:
https://review.openstack.org/#/c/101921

The design currently includes both cold and live approaches, supports
host-by-host migration (as opposed to a big bang)
and doesn't require freezing the whole deployment during the upgrade.

I've also started prototyping the above spec:
https://review.openstack.org/#/c/111755 - Neutron migration: synchronize IP
(de)allocations with Nova-net
https://review.openstack.org/#/c/115635 - Neutron migration as part of cold
migration



On Tue, Aug 26, 2014 at 1:59 PM, Tim Bell tim.b...@cern.ch wrote:


  From: Michael Still [mailto:mi...@stillhq.com]
  Sent: 25 August 2014 23:38
  To: OpenStack Development Mailing List (not for usage questions)
  Subject: Re: [openstack-dev] [nova][neutron] Migration from nova-network
 to Neutron for large production clouds

 ...

  Mark McClain and I discussed a possible plan for nova-network to neutron
 upgrades at the Ops Meetup today, and it seemed generally acceptable. It
 defines a cold migration as
  freezing the ability to create or destroy instances during the upgrade,
 and then requiring a short network outage for each instance in the cell.
  This is why I'm trying to understand the no downtime use case better.
 Is it literally no downtime, ever? Or is it a more simple no simultaneous
 downtime for instances?
  Michael

 The simultaneous downtime across the cloud is the one we really need to
 avoid. Short network outages (depending on how you define short) can be
 handled along with blocking API operations for short periods.

 The other item was how to stage the upgrade.. with a cloud of a
 significant size and some concerns about scalability, we would like to be
 able to do the migration as a set of steps rather than a big bang. During
 the gap between the steps, we'd like to open the APIs for usage, such as
 new VMs get created on Neutron hypervisors. Would that be a possibility ?

 Tim


Re: [openstack-dev] [nova][neutron] Migration from nova-network to Neutron for large production clouds

2014-08-26 Thread Michael Still
On Tue, Aug 26, 2014 at 7:59 PM, Tim Bell tim.b...@cern.ch wrote:


  From: Michael Still [mailto:mi...@stillhq.com]
  Sent: 25 August 2014 23:38
  To: OpenStack Development Mailing List (not for usage questions)
  Subject: Re: [openstack-dev] [nova][neutron] Migration from nova-network to 
  Neutron for large production clouds

 ...

  Mark McClain and I discussed a possible plan for nova-network to neutron 
  upgrades at the Ops Meetup today, and it seemed generally acceptable. It 
  defines a cold migration as
  freezing the ability to create or destroy instances during the upgrade, and 
  then requiring a short network outage for each instance in the cell.
  This is why I'm trying to understand the no downtime use case better. Is 
  it literally no downtime, ever? Or is it a more simple no simultaneous 
  downtime for instances?
  Michael

 The simultaneous downtime across the cloud is the one we really need to 
 avoid. Short network outages (depending on how you define short) can be 
 handled along with blocking API operations for short periods.

 The other item was how to stage the upgrade.. with a cloud of a significant 
 size and some concerns about scalability, we would like to be able to do the 
 migration as a set of steps rather than a big bang. During the gap between 
 the steps, we'd like to open the APIs for usage, such as new VMs get created 
 on Neutron hypervisors. Would that be a possibility ?

Mark and I finally got a chance to sit down and write out a basic
proposal. It looks like this:

== neutron step 0 ==
configure neutron to reverse proxy calls to Nova (part to be written)

== nova-compute restart one ==
Freeze nova's network state (probably by stopping nova-api, but we
could be smarter than that if required)
Update all nova-compute nodes to point to Neutron and replace the
nova-net agent with the Neutron nova-aware L2 agent
Enable the Neutron Layer 2 agent on each node; this might have the side
effect of causing the network configuration to be rebuilt for some
instances
The API can be unfrozen at this time until ready for step 2

== neutron restart two ==
Freeze nova's network state (probably by stopping nova-api, but we
could be smarter than that if required)
Dump/translate/restore data from Nova-Net to Neutron
Configure Neutron to point to its own database
Unfreeze Nova API
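As a rough sketch, the dump/translate/restore step for the flat-DHCP case might look like the following. The record shapes are invented for illustration; the real nova-network and Neutron schemas differ, and the actual tooling for this step was still to be written at the time:

```python
def translate_network(nova_net_row):
    """Map one flat-DHCP nova-network record to Neutron-style dicts.

    Returns a (network, subnet) pair mirroring Neutron's split of the
    L2 network from its IP addressing.
    """
    network = {
        "name": nova_net_row["label"],
        "shared": True,              # a flat network is visible to all tenants
    }
    subnet = {
        "cidr": nova_net_row["cidr"],
        "gateway_ip": nova_net_row["gateway"],
        "enable_dhcp": True,         # preserve the flat DHCP behaviour
    }
    return network, subnet
```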

*** Stopping point for linuxbridge to linuxbridge translation, or
continue for rollout of new tech

== nova-compute restart two ==
Configure OVS or the new technology, ensure that the proper ML2 driver
is installed
Restart the Layer 2 agent on each hypervisor where next-gen networking
should be enabled


So, I want to stop using the word "cold" to describe this. It's more of
a rolling upgrade than a cold migration. So... Would two shorter nova
API outages be acceptable?

Michael

-- 
Rackspace Australia



Re: [openstack-dev] [nova][neutron] Migration from nova-network to Neutron for large production clouds

2014-08-25 Thread Michael Still
On Thu, Aug 21, 2014 at 1:17 AM, Tim Bell tim.b...@cern.ch wrote:

  Michael has been posting very informative blogs on the summary of the
 mid-cycle meetups for Nova. The one on the Nova Network to Neutron
 migration was of particular interest to me as it raises a number of
 potential impacts for the CERN production cloud. The blog itself is at
 http://www.stillhq.com/openstack/juno/14.html



 I would welcome suggestions from the community on the approach to take and
 areas that the nova/neutron team could review to limit the impact on the
 cloud users.



 For some background, CERN has been running nova-network in flat DHCP mode
 since our first Diablo deployment. We moved to production for our users in
 July last year and are currently supporting around 70,000 cores, 6 cells,
 100s of projects and thousands of VMs. Upgrades generally involve disabling
 the API layer while allowing running VMs to carry on without disruption.
 Within the time scale of the migration to Neutron (M release at the
 latest), these numbers are expected to double.



 For us, the concerns we have with the ‘cold’ approach would be on the user
 impact and operational risk of such a change. Specifically,



 1.  A big bang approach of shutting down the cloud, upgrade and the
 resuming the cloud would cause significant user disruption

 2.  The risks involved with a cloud of this size and the open source
 network drivers would be difficult to mitigate through testing and could
 lead to site wide downtime

 3.  Rebooting VMs may be possible to schedule in batches but would
 need to be staggered to keep availability levels



 Note, we are not looking to use Neutron features initially, just to find a
 functional equivalent of the flat DHCP network.



 We would appreciate suggestions on how we could achieve a smooth migration
 for the simple flat DHCP models.


Thanks for sending this, Tim. Sorry for my slow reply; a day-long meeting
and some international travel got in the way. When we originally talked, I
said I needed to understand more of the background to your need for a
zero-downtime upgrade. That said...

Mark McClain and I discussed a possible plan for nova-network to neutron
upgrades at the Ops Meetup today, and it seemed generally acceptable. It
defines a "cold migration" as freezing the ability to create or destroy
instances during the upgrade, and then requiring a short network outage for
each instance in the cell.

This is why I'm trying to understand the "no downtime" use case better. Is
it literally no downtime, ever? Or is it a simpler "no simultaneous
downtime for instances"?

Michael

-- 
Rackspace Australia


Re: [openstack-dev] [nova][neutron] Migration from nova-network to Neutron for large production clouds

2014-08-21 Thread Thierry Carrez
Tim Bell wrote:
 Michael has been posting very informative blogs on the summary of the
 mid-cycle meetups for Nova. The one on the Nova Network to Neutron
 migration was of particular interest to me as it raises a number of
 potential impacts for the CERN production cloud. The blog itself is at
 http://www.stillhq.com/openstack/juno/14.html
 
 I would welcome suggestions from the community on the approach to take
 and areas that the nova/neutron team could review to limit the impact on
 the cloud users.
 
 For some background, CERN has been running nova-network in flat DHCP
 mode since our first Diablo deployment. We moved to production for our
 users in July last year and are currently supporting around 70,000
 cores, 6 cells, 100s of projects and thousands of VMs. Upgrades
 generally involve disabling the API layer while allowing running VMs to
 carry on without disruption. Within the time scale of the migration to
 Neutron (M release at the latest), these numbers are expected to double.

Thanks for bringing your concerns here. To start this discussion, it's
worth adding some context on the currently-proposed cold migration
path. During the Icehouse and Juno cycles the TC reviewed the gaps
between the integration requirements we now place on new entrants and
the currently-integrated projects. That resulted in a number of
identified gaps that we asked projects to address ASAP, ideally within
the Juno cycle.

Most of the Neutron gaps revolved around its failure to be a full
nova-network replacement -- some gaps around supporting basic modes of
operation, and a gap in providing a basic migration path. Neutron devs
promised to close that in Juno, but after a bit of discussion we
considered that a cold migration path was all we'd require them to
provide in Juno.

That doesn't mean a hot or warm migration path can't be worked on.
There are two questions to solve: how can we technically perform that
migration with a minimal amount of downtime, and is it reasonable to
mark nova-network deprecated until we've solved that issue.

On the first question, migration is typically an operational problem,
and operators could really help to design one that would be acceptable
to them. They may require developers to add features in the code to
support that process, but we seem to not even be at this stage. Ideally
I would like ops and devs to join to solve that technical challenge.

The answer to the second question lies in the multiple dimensions of
"deprecated".

On one side it means "is no longer in our future plans, new usage is now
discouraged, new development is stopped, explore your options to migrate
out of it". I think it's extremely important that we do that as early as
possible, to reduce duplication of effort and set expectations correctly.

On the other side it means "will be removed in release X" (not
necessarily the next release, but you set a countdown). To do that, you
need to be pretty confident that you'll have your ducks in a row at the
removal date, and don't set up operators for a nightmare migration.

 For us, the concerns we have with the ‘cold’ approach would be on the
 user impact and operational risk of such a change. Specifically,
 
 1.  A big bang approach of shutting down the cloud, upgrade and the
 resuming the cloud would cause significant user disruption
 
 2.  The risks involved with a cloud of this size and the open source
 network drivers would be difficult to mitigate through testing and could
 lead to site wide downtime
 
 3.  Rebooting VMs may be possible to schedule in batches but would
 need to be staggered to keep availability levels

What minimal level of "hot" would be acceptable to you?

-- 
Thierry Carrez (ttx)



Re: [openstack-dev] [nova][neutron] Migration from nova-network to Neutron for large production clouds

2014-08-21 Thread Daniel P. Berrange
On Wed, Aug 20, 2014 at 03:17:40PM +, Tim Bell wrote:
 Michael has been posting very informative blogs on the summary of
 the mid-cycle meetups for Nova. The one on the Nova Network to
 Neutron migration was of particular interest to me as it raises a
 number of potential impacts for the CERN production cloud. The blog
 itself is at http://www.stillhq.com/openstack/juno/14.html

FWIW, I do *not* support the following policy statement written
there:

  "The current plan is to go forward with a cold upgrade path,
   unless a user comes forward with an absolute hard requirement
   for a live upgrade, and a plan to fund developers to work on it."

I think that saying that our users are responsible for providing or
identifying funding for live upgrades is user-hostile & unacceptable.
If we as a dev team want to take away major features that our users
currently rely on in production and the users then determine (& tell
us) that the proposed upgrade path is not practical, then it is the
*dev team's* responsibility to figure out how to address that, not the
users'.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova][neutron] Migration from nova-network to Neutron for large production clouds

2014-08-21 Thread Tim Bell

On 21 Aug 2014, at 12:38, Thierry Carrez thie...@openstack.org wrote:

 Tim Bell wrote:
 Michael has been posting very informative blogs on the summary of the
 mid-cycle meetups for Nova. The one on the Nova Network to Neutron
 migration was of particular interest to me as it raises a number of
 potential impacts for the CERN production cloud. The blog itself is at
 http://www.stillhq.com/openstack/juno/14.html
 
 I would welcome suggestions from the community on the approach to take
 and areas that the nova/neutron team could review to limit the impact on
 the cloud users.
 
 For some background, CERN has been running nova-network in flat DHCP
 mode since our first Diablo deployment. We moved to production for our
 users in July last year and are currently supporting around 70,000
 cores, 6 cells, 100s of projects and thousands of VMs. Upgrades
 generally involve disabling the API layer while allowing running VMs to
 carry on without disruption. Within the time scale of the migration to
 Neutron (M release at the latest), these numbers are expected to double.
 
 Thanks for bringing your concerns here. To start this discussion, it's
 worth adding some context on the currently-proposed cold migration
 path. During the Icehouse and Juno cycles the TC reviewed the gaps
 between the integration requirements we now place on new entrants and
 the currently-integrated projects. That resulted in a number of
 identified gaps that we asked projects to address ASAP, ideally within
 the Juno cycle.
 
 Most of the Neutron gaps revolved around its failure to be a full
 nova-network replacement -- some gaps around supporting basic modes of
 operation, and a gap in providing a basic migration path. Neutron devs
 promised to close that in Juno, but after a bit of discussion we
 considered that a cold migration path was all we'd require them to
 provide in Juno.
 
 That doesn't mean a hot or warm migration path can't be worked on.
 There are two questions to solve: how can we technically perform that
 migration with a minimal amount of downtime, and is it reasonable to
 mark nova-network deprecated until we've solved that issue.
 
 On the first question, migration is typically an operational problem,
 and operators could really help to design one that would be acceptable
 to them. They may require developers to add features in the code to
 support that process, but we seem to not even be at this stage. Ideally
 I would like ops and devs to join to solve that technical challenge.
 
 The answer to the second question lies in the multiple dimensions of
 deprecated.
 
 On one side it means is no longer in our future plans, new usage is now
 discouraged, new development is stopped, explore your options to migrate
 out of it. I think it's extremely important that we do that as early as
 possible, to reduce duplication of effort and set expectations correctly.
 
 On the other side it means will be removed in release X (not
 necessarily the next release, but you set a countdown). To do that, you
 need to be pretty confident that you'll have your ducks in a row at
 removal date, and don't set up operators for a nightmare migration.
 
 For us, the concerns we have with the ‘cold’ approach would be on the
 user impact and operational risk of such a change. Specifically,
 
 1.  A big bang approach of shutting down the cloud, upgrade and the
 resuming the cloud would cause significant user disruption
 
 2.  The risks involved with a cloud of this size and the open source
 network drivers would be difficult to mitigate through testing and could
 lead to site wide downtime
 
 3.  Rebooting VMs may be possible to schedule in batches but would
 need to be staggered to keep availability levels
 
 What minimal level of hot would be acceptable to you ?
 

I am wary of using phrases like "not acceptable" as they tend to lead to very 
binary discussions :-)

We could consider rebooting VMs. We would much rather not have to. Rebooting 
all at once would cause major difficulties.

Staggering the VM migrations would allow us to significantly reduce the risk as 
we could pause in the event of an operational issue. My assumption is that 
rollback would be a major development effort so I prefer a way to progress with 
caution.
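The "progress with caution" approach described above amounts to migrating in small batches and stopping at the first batch that reports a problem, rather than attempting a rollback. A hypothetical sketch (the batch driver and per-VM migration callable are invented names):

```python
def staggered_migrate(vms, migrate_one, batch_size=50):
    """Migrate VMs batch by batch; pause on the first failure.

    `migrate_one` is a hypothetical callable that raises on an
    operational issue. Returns (migrated so far, remaining) so the
    operator can investigate and resume instead of rolling back.
    """
    done = []
    for start in range(0, len(vms), batch_size):
        batch = vms[start:start + batch_size]
        try:
            for vm in batch:
                migrate_one(vm)
                done.append(vm)
        except Exception:
            # Pause here: everything in `done` is on Neutron, the
            # remainder is untouched and still served by nova-net.
            return done, vms[len(done):]
    return done, []
```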

Renumbering IPs of VMs would be painful also.

I think, as you say, a small team of developers and operators with this need 
can sit down to find the right balance between a simple migration and an 
implementation which does not require infinite development effort.

Since there is an upcoming Ops meet up next week in San Antonio (Michael S 
thought he would attend), I can suggest to Tom that he gets some volunteers and 
then we discuss further in Paris.

I'm all in favour of early announcements of deprecations so that we can start 
to work this through with the community. I'd also like not to leave it too 
late, as we are adding new VMs and hypervisors all the time and the scale 
challenges will increase.


[openstack-dev] [nova][neutron] Migration from nova-network to Neutron for large production clouds

2014-08-20 Thread Tim Bell
Michael has been posting very informative blogs on the summary of the mid-cycle 
meetups for Nova. The one on the Nova Network to Neutron migration was of 
particular interest to me as it raises a number of potential impacts for the 
CERN production cloud. The blog itself is at 
http://www.stillhq.com/openstack/juno/14.html

I would welcome suggestions from the community on the approach to take and 
areas that the nova/neutron team could review to limit the impact on the cloud 
users.

For some background, CERN has been running nova-network in flat DHCP mode since 
our first Diablo deployment. We moved to production for our users in July last 
year and are currently supporting around 70,000 cores, 6 cells, 100s of 
projects and thousands of VMs. Upgrades generally involve disabling the API 
layer while allowing running VMs to carry on without disruption. Within the 
time scale of the migration to Neutron (M release at the latest), these numbers 
are expected to double.

For us, the concerns we have with the 'cold' approach would be on the user 
impact and operational risk of such a change. Specifically,


1.  A big bang approach of shutting down the cloud, upgrade and the 
resuming the cloud would cause significant user disruption

2.  The risks involved with a cloud of this size and the open source 
network drivers would be difficult to mitigate through testing and could lead 
to site wide downtime

3.  Rebooting VMs may be possible to schedule in batches but would need to 
be staggered to keep availability levels

Note, we are not looking to use Neutron features initially, just to find a 
functional equivalent of the flat DHCP network.

We would appreciate suggestions on how we could achieve a smooth migration for 
the simple flat DHCP models.

Tim
