On Wed, Mar 11, 2015 at 5:59 AM, Sean Dague <s...@dague.net> wrote:
> The last couple of days I was at the Operators Meetup acting as Nova
> rep for the meeting. All the sessions were quite nicely recorded to
> etherpads here - https://etherpad.openstack.org/p/PHL-ops-meetup
>
> There was both a specific Nova session -
> https://etherpad.openstack.org/p/PHL-ops-nova-feedback as well as a
> bunch of relevant pieces of information in other sessions.
>
> This is an attempt at a summary; anyone else that was in attendance,
> please feel free to correct me if I'm interpreting something
> incorrectly. There was a lot of content there, so this is in no way
> a comprehensive list, just the highlights that I think make the most
> sense for the Nova team.
>
> =========================
> Nova Network -> Neutron
> =========================
>
> This remains listed as the #1 issue from the Operator Community on
> their burning issues list
> (https://etherpad.openstack.org/p/PHL-ops-burning-issues L18). During
> the tags conversation we straw polled the audience
> (https://etherpad.openstack.org/p/PHL-ops-tags L45) and about 75% of
> attendees were over on Neutron already. However, those on Nova Network
> were disproportionately the largest clusters and longest standing
> OpenStack users.
>
> Of those on nova-network about half had no interest in being on
> Neutron (https://etherpad.openstack.org/p/PHL-ops-nova-feedback
> L24). Some of the primary reasons were the following:
>
> - Complexity concerns - Neutron has a lot more moving parts
> - Performance concerns - nova multihost means there is very little
>   between guests and the fabric, which is really important for the HPC
>   workload use case for OpenStack.
> - Don't want OVS - OVS adds additional complexity and performance
>   concerns.
>   Many large sites are moving off OVS back to linux bridge
>   with Neutron because they are hitting OVS scaling limits (especially
>   if on UDP) - (https://etherpad.openstack.org/p/PHL-ops-OVS L142)
>
> The biggest disconnect in the model seems to be that Neutron assumes
> you want self service networking. Most of these deploys don't. Or even
> more importantly, they live in an organization where that is never
> going to be an option.
>
> Neutron provider networks are close, except they don't provide for
> floating IP / NAT.
>
> Going forward: I think the gap analysis probably needs to be revisited
> with some of the vocal large deployers. I think we assumed the
> functional parity gap was closed with DVR, but it's not clear that in
> its current form it actually meets the n-net multihost users' needs.
>
> ===================
> EC2 going forward
> ===================
>
> Having a sustainable EC2 is of high interest to the operator
> community. Many large deploys have some users that were using AWS
> prior to using OpenStack, or currently are using both. They have
> preexisting tooling for that.
>
> There didn't seem to be any objection to the approach of an external
> proxy service for this function -
> (https://etherpad.openstack.org/p/PHL-ops-nova-feedback L111). Mostly
> the question is timing, and the fact that no one has validated the
> stackforge project. The fact that we landed everything people need to
> run this in Kilo is good, as these production deploys will be able to
> test it for their users when they upgrade.
>
> ============================
> Burning Nova Features/Bugs
> ============================
>
> Hierarchical Projects Quotas
> ----------------------------
>
> Hugely desired feature by the operator community
> (https://etherpad.openstack.org/p/PHL-ops-nova-feedback L116). Missed
> Kilo. This made everyone sad.
>
> Action: we should queue this up as an early Liberty priority item.
>
> Out of sync Quotas
> ------------------
>
> https://etherpad.openstack.org/p/PHL-ops-nova-feedback L63
>
> The quotas code is quite racy (this is kind of a known issue if you
> look at the bug tracker). It was actually marked as a top soft spot
> during last fall's bug triage -
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-September/046517.html
>
> There is an operator proposed spec for an approach here -
> https://review.openstack.org/#/c/161782/
>
> Action: we should make a solution here a top priority for enhanced
> testing and fixing in Liberty. Addressing this would remove a lot of
> pain from ops.
>

To help us better track quota bugs I created a quotas tag:
https://bugs.launchpad.net/nova/+bugs?field.tag=quotas

The next step is to re-triage those bugs: mark fixed bugs as fixed,
deduplicate bugs, etc.

> Reporting on Scheduler Fails
> ----------------------------
>
> Apparently, some time recently, we stopped logging scheduler fails
> above DEBUG, and that behavior also snuck back into Juno
> (https://etherpad.openstack.org/p/PHL-ops-nova-feedback L78). This
> has made tracking down the root cause of failures far more difficult.
>
> Action: this should hopefully be a quick fix we can get in for Kilo
> and backport.
>
> =============================
> Additional Interesting Bits
> =============================
>
> Rabbit
> ------
>
> There was a whole session on Rabbit -
> https://etherpad.openstack.org/p/PHL-ops-rabbit-queue
>
> Rabbit is a top operational concern for most large sites. Almost all
> sites have a "restart everything that talks to rabbit" script because
> during rabbit HA operations queues tend to blackhole.
>
> All other queue systems OpenStack supports are worse than Rabbit (from
> experience in that room).
>
> oslo.messaging < 1.6.0 was a significant regression in dependability
> from the incubator code. It now seems to be getting better, but there
> are still a lot of issues. (L112)
>
> Operators *really* want the concept in
> https://review.openstack.org/#/c/146047/ landed. (I asked them to
> provide such feedback in gerrit).
>
> Nova Rolling Upgrades
> ---------------------
>
> Most people really like the concept, but I couldn't find anyone that
> had used it yet because Neutron doesn't support it, so they had to do
> big bang upgrades anyway.
>
> Galera Upstream Testing
> -----------------------
>
> The majority of deploys run with Galera MySQL. There was a question
> about whether or not we could get that into the upstream testing
> pipeline as that's the common case.
>
>
> -Sean
>
> --
> Sean Dague
> http://dague.net
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>