On 27/08/14 12:59, Tim Bell wrote:
>> -----Original Message-----
>> From: Michael Still [mailto:[email protected]]
>> Sent: 26 August 2014 22:20
>> To: OpenStack Development Mailing List (not for usage questions)
>> Subject: Re: [openstack-dev] [nova][neutron] Migration from
>> nova-network to Neutron for large production clouds
> ...
>>
>> Mark and I finally got a chance to sit down and write out a basic
>> proposal. It looks like this:
>>
>
> Thanks... I've put a few questions inline and I'll ask the experts
> to review the steps when they're back from holidays
>
>> == neutron step 0 ==
>> configure neutron to reverse proxy calls to Nova (part to be
>> written)
>>
>> == nova-compute restart one ==
>> Freeze nova's network state (probably by stopping nova-api, but we
>> could be smarter than that if required)
>> Update all nova-compute nodes to point Neutron and remove nova-net
>> agent for Neutron Nova aware L2 agent
>> Enable Neutron Layer 2 agent on each node, this might have the side
>> effect of causing the network configuration to be rebuilt for some
>> instances
>> API can be unfrozen at this time until ready for step 2
>>
>
> - Would it be possible to only update some of the compute nodes ?
> We'd like to stage the upgrade if we can in view of scaling risks.
> Worst case, we'd look to do it cell by cell but those are quite
> large already (200+ hypervisors)

I have a few what-ifs when it comes to this:

- What if the migration fails halfway through? How do we administer
  nova in this situation?

Unfortunately Tim, last time I checked Neutron has no awareness of
Nova's cells (and only "recently" became aware of nova regions), so I
don't see how this would be taken into account for a migration.
>
>> == neutron restart two ==
>> Freeze nova's network state (probably by stopping nova-api, but we
>> could be smarter than that if required)
>> Dump/translate/restore data from Nova-Net to Neutron
>> Configure Neutron to point to its own database
>> Unfreeze Nova API
>>

I think it's a good idea to be smarter.

>
> - Linked with the point above, we'd like to do the nova-net to
> neutron in stages if we can

Again, this sounds like a nightmare if it fails. This sounds like it's
meant to be one big transaction, but it is anything but.

For this to be done safely in a production cloud (which is one of the
few reasons to actually do a replacement instead of just swapping out
the component), we need to be able to run Neutron and Nova-net at the
same time, or it *does* have to become a transactional migration.

If the migration fails at some stage, you're left in limbo. Does Nova
work? Does Neutron work? If you're going down the "all or nothing"
route, there needs to be some sort of fault tolerance or rollback
feature to stop a cloud being left in an inconsistent state that is
impossible to administer or operate via the APIs.

If the two of them (Nova-network and Neutron) could both exist and
operate at the same time in a cloud, it wouldn't have to be a one-shot
migration. If some nodes fail, that's fine: you could just let them
fall back to Nova-net and fix them whilst your cloud still works and,
more importantly, nova-api is up and running.

>
>> *** Stopping point for linuxbridge to linuxbridge translation, or
>> continue for rollout of new tech
>>
>> == nova-compute restart two ==
>> Configure OVS or new technology, ensure that proper ML2 driver is
>> installed
>> Restart Layer2 agent on each hypervisor where next gen networking
>> should be enabled
>>
>>
>> So, I want to stop using the word "cold" to describe this. It's
>> more of a rolling upgrade than a cold migration. So... Would two
>> shorter nova API outages be acceptable?
>>
>
> Two Nova API outages would be OK for us.
I think the Nova API outages are the least of our concerns compared to
being left in a "halfway" state in a production environment. Hopefully
these concerns can be addressed.

>
>> Michael
>>
>> -- Rackspace Australia

Whilst I wholeheartedly agree that this migration plan seems like a
good idea (and reminds me of a Raiders of the Lost Ark-esque scene),
I'm afraid of what would happen if something went wrong in the middle
of this swap. It wouldn't be a good idea to restart nova-api in order
to fix this, as users and services would then be able to use it again.

Perhaps we should change the policy on nova-api during this migration
to only allow access to a special "migration" role or the like? With
such a policy applied for the duration of the migration, services and
ordinary users would be blocked from Nova's API, but administrators
could continue monitoring via the API and fix any problems. This seems
like a currently absent must-have.

I like the idea of the migration, but I hope that any and all "what
if?" questions have been addressed and the problems are mitigated. I
wish you and Mark lots of luck with this migration, but please make
sure it's not fragile and ensure it's fault tolerant!
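As a rough illustration of the "migration" role idea above (a sketch
only, not how Nova's policy engine is actually wired up; the role
name, the migration_in_progress flag and the is_allowed function are
all made up for this example):

```python
# Hypothetical sketch: none of these names come from Nova's real
# policy code. It only shows the intended behaviour of the gate.
MIGRATION_ROLE = "migration"  # assumed role name


def is_allowed(roles, migration_in_progress):
    """Return True if a caller holding `roles` may use the API."""
    if not migration_in_progress:
        # Normal operation: the usual policy checks would apply here.
        return True
    # During the migration only holders of the special role get in.
    return MIGRATION_ROLE in roles


if __name__ == "__main__":
    # An ordinary user during the migration window.
    print(is_allowed({"member"}, migration_in_progress=True))
    # An administrator who has been granted the migration role.
    print(is_allowed({"admin", "migration"}, migration_in_progress=True))
```

The point is that the API stays up for the people doing the migration
while everyone else is held off, rather than stopping nova-api
outright.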
Cheers,
Joe

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
