Re: [openstack-dev] [nova][ironic] A couple feature freeze exception requests
> Multitenant networking
> ======================

I haven't reviewed this one much either, but it looks smallish, and if
other people are good with it then I think it's probably something we
should do.

> Multi-compute usage via a hash ring
> ===================================

I'm obviously +2 on this one :)

--Dan
Re: [openstack-dev] [nova][ironic] A couple feature freeze exception requests
On 8/1/2016 4:20 PM, Jim Rollenhagen wrote:
> Yes, I know this is stupid late for these. I'd like to request two
> exceptions to the non-priority feature freeze, for a couple of features
> in the Ironic driver. These were not requested at the normal time as I
> thought they were nowhere near ready.
>
> Multitenant networking
> ======================
>
> Ironic's top feature request for around 2 years now has been to make
> networking safe for multitenant use, as opposed to a flat network
> (including control plane access!) for all tenants. We've been working
> on a solution for 3 cycles now, and finally have the Ironic pieces of
> it done, after a heroic effort to finish things up this cycle. There's
> just one patch left to make it work, in the virt driver in Nova. That
> is here: https://review.openstack.org/#/c/297895/
>
> It's important to note that this actually fixes some dead code we
> pushed on before this feature was done, and is only ~50 lines, half of
> which are comments/reno.
>
> Reviewers on this unearthed a problem on the ironic side, which I
> expect to be fixed in the next couple of days:
> https://review.openstack.org/#/q/topic:bug/1608511
>
> We also have CI for this feature in ironic, and I have a depends-on
> testing all of this as a whole:
> https://review.openstack.org/#/c/347004/
>
> Per Matt's request, I'm also adding that job to Nova's experimental
> queue: https://review.openstack.org/#/c/349595/
>
> A couple folks from the ironic team have also done some manual testing
> of this feature, with the nova code in, using real switches.
>
> Merging this patch would bring a *huge* win for deployers and
> operators, and I don't think it's very risky. It'll be ready to go
> sometime this week, once that ironic chain is merged.

I've reviewed this one and it looks good to me. It's dependent on
python-ironicclient>=1.5.0, which Jim has a g-r bump up for as a
dependency. And the gate-tempest-dsvm-ironic-multitenant-network-nv job
is testing this and passing on the test patch in ironic (and that job is
in the nova experimental queue now).

The upgrade procedure had some people scratching their heads in IRC this
week, so I've stated that we need clear documentation there, which will
probably live here:

http://docs.openstack.org/developer/ironic/deploy/upgrade-guide.html

since Ironic isn't in here:

http://docs.openstack.org/ops-guide/ops_upgrades.html#update-services

The docs in the Ironic repo say that Nova should be upgraded first when
going from Juno to Kilo, so it's definitely important to get those docs
updated for upgrades from Mitaka to Newton; Jim said he'd do that this
cycle.

Given how long people have been asking for this in Ironic, that the
Ironic team has made it a priority to get it working on their side, and
that there is already CI and only a small change in Nova, I'm OK with
giving a non-priority FFE for this.

> Multi-compute usage via a hash ring
> ===================================
>
> One of the major problems with the ironic virt driver today is that we
> don't support running multiple nova-compute daemons with the ironic
> driver loaded, because each compute service manages all ironic nodes
> and they stomp on each other. There's currently a hack in the ironic
> virt driver to kind of make this work, but instance locking still
> isn't done:
> https://github.com/openstack/ironic/blob/master/ironic/nova/compute/manager.py
>
> That is also holding back removing the pluggable compute manager in
> nova:
> https://github.com/openstack/nova/blob/master/nova/conf/service.py#L64-L69
>
> And as someone that runs a deployment using this hack, I can tell you
> first-hand that it doesn't work well.
> We (the ironic and nova community) have been working on fixing this
> for 2-3 cycles now, trying to find a solution that isn't terrible and
> doesn't break existing use cases. We've been conflating it with how we
> schedule ironic instances and keep managing to find a big wedge with
> each approach. The best approach we've found involves duplicating the
> compute capabilities and affinity filters in ironic.
>
> Some of us were talking at the nova midcycle and decided we should try
> the hash ring approach (like ironic uses to shard nodes between
> conductors) and see how it works out, even though people have said in
> the past that it wouldn't work. I did a proof of concept last week,
> and started playing with five compute daemons in a devstack
> environment. Two nerd-snipey days later, I had a fully working
> solution, with unit tests, passing CI. That is here:
> https://review.openstack.org/#/c/348443/
>
> We'll need to work on CI for this with multiple compute services. That
> shouldn't be crazy difficult, but I'm not sure we'll have it done this
> cycle (and it might get interesting trying to test computes joining
> and leaving the cluster). It also needs some testing at scale, which
> is hard to do in the upstream gate, but I'll be doing my best to ship
> this downstream as soon as I can, and iterating on any problems we see
> there. It's a huge win for operators, for only a few hundred lines
> (some of which will be pulled out to oslo next cycle, as it's copied
> from ironic).
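For context on the "pluggable compute manager" link quoted above: nova
still exposes an oslo.config option that lets a deployment substitute
its own compute manager class, and the ironic hack works by pointing
that option at the ClusteredComputeManager shim in the ironic tree. A
rough sketch of what that option looks like (the option name and default
match the linked nova/conf/service.py; the help text and surrounding
code here are paraphrased, not copied):

    from oslo_config import cfg

    # Sketch of the pluggable-manager option referenced above.
    # Deployments using the ironic multi-compute hack override the
    # default to point at ironic's ClusteredComputeManager instead.
    service_opts = [
        cfg.StrOpt('compute_manager',
                   default='nova.compute.manager.ComputeManager',
                   help='Full class name for the compute manager.'),
    ]

    CONF = cfg.CONF
    CONF.register_opts(service_opts)

Landing the hash ring work is what would finally let that option be
removed.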
Re: [openstack-dev] [nova][ironic] A couple feature freeze exception requests
On 08/01/2016 05:20 PM, Jim Rollenhagen wrote:
> Yes, I know this is stupid late for these. I'd like to request two
> exceptions to the non-priority feature freeze, for a couple of features
> in the Ironic driver. These were not requested at the normal time as I
> thought they were nowhere near ready.
>
> Multitenant networking
> ======================
>
> Ironic's top feature request for around 2 years now has been to make
> networking safe for multitenant use, as opposed to a flat network
> (including control plane access!) for all tenants. We've been working
> on a solution for 3 cycles now, and finally have the Ironic pieces of
> it done, after a heroic effort to finish things up this cycle. There's
> just one patch left to make it work, in the virt driver in Nova. That
> is here: https://review.openstack.org/#/c/297895/

Reviewed. +2 from me, under the assumption that Ironic must always be
upgraded before Nova, per our discussion on IRC on the same topic today.

> It's important to note that this actually fixes some dead code we
> pushed on before this feature was done, and is only ~50 lines, half of
> which are comments/reno.
>
> Reviewers on this unearthed a problem on the ironic side, which I
> expect to be fixed in the next couple of days:
> https://review.openstack.org/#/q/topic:bug/1608511
>
> We also have CI for this feature in ironic, and I have a depends-on
> testing all of this as a whole:
> https://review.openstack.org/#/c/347004/
>
> Per Matt's request, I'm also adding that job to Nova's experimental
> queue: https://review.openstack.org/#/c/349595/
>
> A couple folks from the ironic team have also done some manual testing
> of this feature, with the nova code in, using real switches.
>
> Merging this patch would bring a *huge* win for deployers and
> operators, and I don't think it's very risky. It'll be ready to go
> sometime this week, once that ironic chain is merged.

++

> Multi-compute usage via a hash ring
> ===================================
>
> One of the major problems with the ironic virt driver today is that we
> don't support running multiple nova-compute daemons with the ironic
> driver loaded, because each compute service manages all ironic nodes
> and they stomp on each other. There's currently a hack in the ironic
> virt driver to kind of make this work, but instance locking still
> isn't done:
> https://github.com/openstack/ironic/blob/master/ironic/nova/compute/manager.py
>
> That is also holding back removing the pluggable compute manager in
> nova:
> https://github.com/openstack/nova/blob/master/nova/conf/service.py#L64-L69
>
> And as someone that runs a deployment using this hack, I can tell you
> first-hand that it doesn't work well.
>
> We (the ironic and nova community) have been working on fixing this
> for 2-3 cycles now, trying to find a solution that isn't terrible and
> doesn't break existing use cases. We've been conflating it with how we
> schedule ironic instances and keep managing to find a big wedge with
> each approach. The best approach we've found involves duplicating the
> compute capabilities and affinity filters in ironic.
>
> Some of us were talking at the nova midcycle and decided we should try
> the hash ring approach (like ironic uses to shard nodes between
> conductors) and see how it works out, even though people have said in
> the past that it wouldn't work. I did a proof of concept last week,
> and started playing with five compute daemons in a devstack
> environment. Two nerd-snipey days later, I had a fully working
> solution, with unit tests, passing CI. That is here:
> https://review.openstack.org/#/c/348443/

w00t :)

> We'll need to work on CI for this with multiple compute services.
> That shouldn't be crazy difficult, but I'm not sure we'll have it done
> this cycle (and it might get interesting trying to test computes
> joining and leaving the cluster). It also needs some testing at scale,
> which is hard to do in the upstream gate, but I'll be doing my best to
> ship this downstream as soon as I can, and iterating on any problems
> we see there. It's a huge win for operators, for only a few hundred
> lines (some of which will be pulled out to oslo next cycle, as it's
> copied from ironic). The single compute mode would still be
> recommended while we iron out any issues here, and that mode is
> well-understood (as this will behave the same in that case).
>
> We have a couple of nova cores on board with helping get this through,
> and I think it's totally doable.
>
> Thanks for hearing me out,
>
> // jim
[openstack-dev] [nova][ironic] A couple feature freeze exception requests
Yes, I know this is stupid late for these. I'd like to request two
exceptions to the non-priority feature freeze, for a couple of features
in the Ironic driver. These were not requested at the normal time as I
thought they were nowhere near ready.

Multitenant networking
======================

Ironic's top feature request for around 2 years now has been to make
networking safe for multitenant use, as opposed to a flat network
(including control plane access!) for all tenants. We've been working on
a solution for 3 cycles now, and finally have the Ironic pieces of it
done, after a heroic effort to finish things up this cycle. There's just
one patch left to make it work, in the virt driver in Nova. That is
here: https://review.openstack.org/#/c/297895/

It's important to note that this actually fixes some dead code we pushed
on before this feature was done, and is only ~50 lines, half of which
are comments/reno.

Reviewers on this unearthed a problem on the ironic side, which I expect
to be fixed in the next couple of days:
https://review.openstack.org/#/q/topic:bug/1608511

We also have CI for this feature in ironic, and I have a depends-on
testing all of this as a whole: https://review.openstack.org/#/c/347004/

Per Matt's request, I'm also adding that job to Nova's experimental
queue: https://review.openstack.org/#/c/349595/

A couple folks from the ironic team have also done some manual testing
of this feature, with the nova code in, using real switches.

Merging this patch would bring a *huge* win for deployers and operators,
and I don't think it's very risky. It'll be ready to go sometime this
week, once that ironic chain is merged.

Multi-compute usage via a hash ring
===================================

One of the major problems with the ironic virt driver today is that we
don't support running multiple nova-compute daemons with the ironic
driver loaded, because each compute service manages all ironic nodes and
they stomp on each other. There's currently a hack in the ironic virt
driver to kind of make this work, but instance locking still isn't done:
https://github.com/openstack/ironic/blob/master/ironic/nova/compute/manager.py

That is also holding back removing the pluggable compute manager in
nova:
https://github.com/openstack/nova/blob/master/nova/conf/service.py#L64-L69

And as someone that runs a deployment using this hack, I can tell you
first-hand that it doesn't work well.

We (the ironic and nova community) have been working on fixing this for
2-3 cycles now, trying to find a solution that isn't terrible and
doesn't break existing use cases. We've been conflating it with how we
schedule ironic instances and keep managing to find a big wedge with
each approach. The best approach we've found involves duplicating the
compute capabilities and affinity filters in ironic.

Some of us were talking at the nova midcycle and decided we should try
the hash ring approach (like ironic uses to shard nodes between
conductors) and see how it works out, even though people have said in
the past that it wouldn't work. I did a proof of concept last week, and
started playing with five compute daemons in a devstack environment. Two
nerd-snipey days later, I had a fully working solution, with unit tests,
passing CI. That is here: https://review.openstack.org/#/c/348443/

We'll need to work on CI for this with multiple compute services. That
shouldn't be crazy difficult, but I'm not sure we'll have it done this
cycle (and it might get interesting trying to test computes joining and
leaving the cluster).
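To make the approach concrete, here is a minimal, self-contained sketch
of the idea (a generic consistent-hash ring; this is not ironic's actual
hash ring code, and the names HashRing and nodes_for_host are invented
for illustration). Each compute service builds the same ring from the
list of live compute hostnames and claims only the ironic nodes that
hash to it, so N computes each manage roughly 1/N of the nodes without
needing to coordinate:

    import bisect
    import hashlib


    class HashRing(object):
        """Toy consistent-hash ring mapping ironic node UUIDs to
        compute service hostnames. Illustrative only."""

        def __init__(self, hosts, replicas=32):
            # Place several virtual points per host on the ring so
            # nodes spread evenly and only ~1/N of them move when
            # membership changes.
            self._ring = sorted(
                (self._hash('%s-%d' % (host, r)), host)
                for host in hosts for r in range(replicas))
            self._keys = [k for k, _ in self._ring]

        @staticmethod
        def _hash(key):
            return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)

        def get_host(self, node_uuid):
            # Walk clockwise to the first virtual point at or after the
            # node's hash, wrapping around the end of the ring.
            idx = bisect.bisect(self._keys,
                                self._hash(node_uuid)) % len(self._keys)
            return self._ring[idx][1]


    def nodes_for_host(ring, node_uuids, hostname):
        """The subset of ironic nodes this compute should manage."""
        return [n for n in node_uuids if ring.get_host(n) == hostname]

Because every compute service computes the same mapping independently,
there is no lock manager or leader election involved; the hard part is
keeping the membership list (which computes are alive) consistent, which
is exactly why testing computes joining and leaving the cluster matters.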
It also needs some testing at scale, which is hard to do in the upstream
gate, but I'll be doing my best to ship this downstream as soon as I
can, and iterating on any problems we see there. It's a huge win for
operators, for only a few hundred lines (some of which will be pulled
out to oslo next cycle, as it's copied from ironic). The single compute
mode would still be recommended while we iron out any issues here, and
that mode is well-understood (as this will behave the same in that
case).

We have a couple of nova cores on board with helping get this through,
and I think it's totally doable.

Thanks for hearing me out,

// jim
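As a quick illustration of the joining/leaving behavior discussed above,
here is what removing one compute does to the node assignment, reusing
the toy HashRing sketch from earlier in the thread (the compute1-5
hostnames are made up):

    import uuid

    nodes = [str(uuid.uuid4()) for _ in range(1000)]
    before = HashRing(['compute1', 'compute2', 'compute3', 'compute4',
                       'compute5'])
    after = HashRing(['compute1', 'compute2', 'compute3', 'compute4'])

    moved = sum(1 for n in nodes
                if before.get_host(n) != after.get_host(n))
    # With consistent hashing, only the ~1/5 of nodes that lived on
    # compute5 get reassigned; the other hosts keep their nodes, so
    # instances aren't needlessly handed between compute services.
    print('%d of %d nodes moved' % (moved, len(nodes)))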