> On 6/12/2014 10:31 AM, John Garbutt wrote: > > On 11 June 2014 20:07, Joe Gordon <[email protected]> wrote: > >> On Wed, Jun 11, 2014 at 11:38 AM, Matt Riedemann > >> <[email protected]> wrote: > >>> On 6/11/2014 10:01 AM, Eoghan Glynn wrote: > >>>> Thanks for bringing this to the list Matt, comments inline ... > >>>> > >>>>> tl;dr: some pervasive changes were made to nova to enable polling in > >>>>> ceilometer which broke some things and in my opinion shouldn't have > >>>>> been > >>>>> merged as a bug fix but rather should have been a blueprint. > >>>>> > >>>>> === > >>>>> > >>>>> The detailed version: > >>>>> > >>>>> I opened bug 1328694 [1] yesterday and found that came back to some > >>>>> changes made in ceilometer for bug 1262124 [2]. > >>>>> > >>>>> Upon further inspection, the original ceilometer bug 1262124 made some > >>>>> changes to the nova os-floating-ips API extension and the database API > >>>>> [3], and changes to python-novaclient [4] to enable ceilometer to use > >>>>> the new API changes (basically pass --all-tenants when listing floating > >>>>> IPs). > >>>>> > >>>>> The original nova change introduced bug 1328694 which spams the > >>>>> nova-api > >>>>> logs due to the ceilometer change [5] which does the polling, and right > >>>>> now in the gate ceilometer is polling every 15 seconds. > >>>> > >>>> > >>>> IIUC that polling cadence in the gate is in the process of being > >>>> reverted > >>>> to the out-of-the-box default of 600s. > >>>> > >>>>> I pushed a revert in ceilometer to fix the spam bug and a separate > >>>>> patch > >>>>> was pushed to nova to fix the problem in the network API. > >>>> > >>>> > >>>> Thank you for that. The revert is just now approved on the ceilometer > >>>> side, > >>>> and is wending its merry way through the gate. > >>>> > >>>>> The bigger problem I see here is that these changes were all made under > >>>>> the guise of a bug when I think this is actually a blueprint. We have > >>>>> changes to the nova API, changes to the nova database API, CLI changes, > >>>>> potential performance impacts (ceilometer can be hitting the nova > >>>>> database a lot when polling here), security impacts (ceilometer needs > >>>>> admin access to the nova API to list floating IPs for all tenants), > >>>>> documentation impacts (the API and CLI changes are not documented), > >>>>> etc. > >>>>> > >>>>> So right now we're left with, in my mind, two questions: > >>>>> > >>>>> 1. Do we just fix the spam bug 1328694 and move on, or > >>>>> 2. Do we revert the nova API/CLI changes and require this goes through > >>>>> the nova-spec blueprint review process, which should have happened in > >>>>> the first place. > >>>> > >>>> > >>>> So just to repeat the points I made on the unlogged #os-nova IRC channel > >>>> earlier, for posterity here ... > >>>> > >>>> Nova already exposed an all_tenants flag in multiple APIs (servers, > >>>> volumes, > >>>> security-groups etc.) and these would have: > >>>> > >>>> (a) generally pre-existed ceilometer's usage of the corresponding > >>>> APIs > >>>> > >>>> and: > >>>> > >>>> (b) been tracked and proposed at the time via straight-forward LP > >>>> bugs, > >>>> as opposed to being considered blueprint material > >>>> > >>>> So the manner of the addition of the all_tenants flag to the > >>>> floating_ips > >>>> API looks like it just followed existing custom & practice. > >>>> > >>>> Though that said, the blueprint process and in particular the nova-specs > >>>> aspect, has been tightened up since then. > >>>> > >>>> My preference would be to fix the issue in the underlying API, but to > >>>> use > >>>> this as "a teachable moment" ... i.e. to require more oversight (in the > >>>> form of a reviewed & approved BP spec) when such API changes are > >>>> proposed > >>>> in the future. > >>>> > >>>> Cheers, > >>>> Eoghan > >>>> > >>>>> Are there other concerns here? If there are no major objections to the > >>>>> code that's already merged, then #2 might be excessive but we'd still > >>>>> need docs changes. > >>>>> > >>>>> I've already put this on the nova meeting agenda for tomorrow. > >>>>> > >>>>> [1] https://bugs.launchpad.net/ceilometer/+bug/1328694 > >>>>> [2] https://bugs.launchpad.net/nova/+bug/1262124 > >>>>> [3] https://review.openstack.org/#/c/81429/ > >>>>> [4] https://review.openstack.org/#/c/83660/ > >>>>> [5] https://review.openstack.org/#/c/83676/ > >>>>> > >>>>> -- > >>>>> > >>>>> Thanks, > >>>>> > >>>>> Matt Riedemann > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> OpenStack-dev mailing list > >>>>> [email protected] > >>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > >>>>> > >>>> > >>>> _______________________________________________ > >>>> OpenStack-dev mailing list > >>>> [email protected] > >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > >>>> > >>> > >>> While there is precedent for --all-tenants with some of the other APIs, > >>> I'm concerned about where this stops. When ceilometer wants polling on > >>> some > >>> other resources that the nova API exposes, will it need the same thing? > >>> Doing all of this polling for resources in all tenants in nova puts an > >>> undue > >>> burden on the nova API and the database. > >>> > >>> Can we do something with notifications here instead? That's where the > >>> nova-spec process would have probably caught this. > >> > >> ++ to notifications and not polling. > > > > Yeah, I think we need to revert this, and go through the specs > > process. Its been released in Juno-1 now, so this revert feels bad, > > but perhaps its the best of a bad situation? > > > > Word of caution, we need to get notifications versioned correctly if > > we want this as a more formal external "API". I think Heat have > > similar issues in this area, efficiently knowing about something > > happening in Nova. So we do need to "solve this". > > > > Some kind of callback to ceilometer's REST API sounds attractive, but > > messes up dependencies. Maybe implemented as a plugin to a Nova > > "notification system", could work better somehow? A versioned > > interface that issues "server life cycle" events (some versioned > > subset of notifications), and allow services to implement their own > > plugins to that interface. But maybe we should probably just version > > the current notifications, rather than re-invent another interface. > > > > Thanks, > > John > > > > _______________________________________________ > > OpenStack-dev mailing list > > [email protected] > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > > > The revert patch is here: > > https://review.openstack.org/#/c/99938/
As stated on gerrit ... I would disagree with jumping to revert this API flag, for the following reasons: 1. the underlying bug has a fix under review: https://review.openstack.org/83667 2. reverting has the effect of retroactively applying the policy introduced in: http://lists.openstack.org/pipermail/openstack-dev/2014-June/037536.html 3. there are existing all_tenants flags on other nova APIs 4. surely the core problem here is the *rate* at which this API was called by ceilometer in the gate, and the statndard way to protect a service in that case is to use rate-limiting, as opposed to just removing the API? Thanks, Eoghan _______________________________________________ OpenStack-dev mailing list [email protected] http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
