On Wed, Jul 23, 2014 at 7:33 PM, Tim Simpson <[email protected]> wrote:
> To summarize, this is a conversation about the following Launchpad bug:
> https://launchpad.net/bugs/1325512
> and Gerrit review: https://review.openstack.org/#/c/97194/6
>
> You are saying the function "_service_is_active", in addition to polling
> the datastore service status, also polls the status of the Nova resource.
> At first I thought this wasn't the case; however, looking at your pull
> request I was surprised to see that line 320 (
> https://review.openstack.org/#/c/97194/6/trove/taskmanager/models.py)
> polls Nova using the "get" method (which I wish was called "refresh", as
> to me "get" sounds like a lazy-loader or something, despite making a full
> GET request each time).
> So moving this polling out of there into the two respective
> "create_server" methods, as you have done, is not only going to be useful
> for Heat and avoid the issue of calling Nova 99 times that you describe,
> but it will actually help operations teams see more clearly that the
> issue was with a server that didn't provision. We actually had an issue
> in Staging the other day that took us forever to figure out because the
>

Agreed. I guess I need to update the bug report with more information
about this issue, but I'm really glad to hear that the proposed change
would be useful. I also agree that it would help operations and support
teams track provisioning issues that have nothing to do with Trove and
are tied to the infrastructure.

> server wasn't provisioning, but before anything checked that it was ACTIVE
> the DNS code detected the server had no IP address (never mind it was in a
> FAILED state), so the logs surfaced this as a DNS error. This change should
> help us avoid such issues.
>
> Thanks,
>
> Tim
>
>
> ------------------------------
> *From:* Denis Makogon [[email protected]]
> *Sent:* Wednesday, July 23, 2014 7:30 AM
> *To:* OpenStack Development Mailing List
> *Subject:* [openstack-dev] [Trove] Guest prepare call polling mechanism
> issue
>
> Hello, Stackers.
>
>
> I’d like to discuss the guestagent prepare call polling mechanism issue
> (see [1]).
>
> Let me first describe why this is actually an issue and why it should be
> fixed. Those of you who are familiar with Trove know that Trove can
> provision instances through the Nova API and the Heat API (see [2] and
> [3]).
>
>
> What’s the difference between these two ways (in general)? The answer
> is simple:
>
> - The Heat-based provisioning method has a polling mechanism that
> verifies that stack provisioning completed successfully (see [4]),
> which means that all stack resources are in ACTIVE state.
>
> - The Nova-based provisioning method doesn’t do any polling, which is
> wrong: the instance can’t fail as fast as possible because the
> Trove-taskmanager service doesn’t verify that the launched server has
> reached ACTIVE state. That’s issue #1: the compute instance state is
> unknown, whereas resources delivered by Heat are already in ACTIVE
> state right after provisioning.
>
> Once method [2] or [3] has finished, taskmanager prepares data for the
> guest (see [5]) and then tries to send the prepare call to the guest
> (see [6]). Here comes issue #2: the polling mechanism makes at least 100
> API calls to Nova to determine the compute instance status.
>
> Also, taskmanager makes almost the same number of calls to the Trove
> backend to discover the guest status, which is totally normal.
>
> So, here comes the question: why should I call Nova 99 times for the
> same value if the value returned the first time was completely
> acceptable?
>
>
> There’s only one way to fix it. Since Heat-based provisioning delivers
> an instance with a status validation procedure, the same thing should
> be done for Nova-based provisioning (we should extract compute instance
> status polling from the guest prepare polling mechanism and integrate it
> into [2]) and leave only guest status discovery in the guest prepare
> polling mechanism.
>
>
> Benefits?
> The proposed fix would allow corrupted instances to fail fast, and it
> would reduce the number of redundant Nova API calls made while
> attempting to discover guest status.
>
>
> Proposed fix for this issue - [7].
>
> [1] - https://launchpad.net/bugs/1325512
>
> [2] - https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L198-L215
>
> [3] - https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L190-L197
>
> [4] - https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L420-L429
>
> [5] - https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L217-L256
>
> [6] - https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L254-L266
>
> [7] - https://review.openstack.org/#/c/97194/
>
>
> Thoughts?
>
> Best regards,
>
> Denis Makogon
>
> _______________________________________________
> OpenStack-dev mailing list
> [email protected]
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
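
To make the split discussed above concrete, here is a minimal sketch of the idea, not the code from the review. It assumes the compute status is polled once during provisioning and fails fast on a bad state, and that the guest prepare step then polls only the guest status. FakeNova, FakeGuest, poll_until, wait_for_server_active and wait_for_guest_active are hypothetical names used for illustration; they are not Trove or novaclient APIs, and the status strings are assumptions.

```python
import time


class FakeNova:
    """Hypothetical stand-in for a Nova client; counts status requests."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)
        self.calls = 0

    def get_status(self, server_id):
        self.calls += 1
        return next(self._statuses)


class FakeGuest:
    """Hypothetical stand-in for the guest status stored in the Trove backend."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)
        self.calls = 0

    def get_status(self):
        self.calls += 1
        return next(self._statuses)


def poll_until(check, attempts=100, sleep_time=0.0):
    """Call check() until it returns True or the attempts run out."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(sleep_time)
    return False


def wait_for_server_active(nova, server_id, **kwargs):
    """Provisioning step: poll the compute status and fail fast on bad states."""

    def check():
        status = nova.get_status(server_id)
        # Assumption: these strings mark a server that will never become ACTIVE.
        if status in ("ERROR", "FAILED"):
            raise RuntimeError("server %s failed to provision: %s"
                               % (server_id, status))
        return status == "ACTIVE"

    if not poll_until(check, **kwargs):
        raise TimeoutError("server %s did not become ACTIVE" % server_id)


def wait_for_guest_active(guest, **kwargs):
    """Guest prepare step: poll only the guest status, never the compute API."""
    return poll_until(lambda: guest.get_status() == "ACTIVE", **kwargs)


nova = FakeNova(["BUILD", "BUILD", "ACTIVE"])
guest = FakeGuest(["NEW", "BUILDING", "BUILDING", "ACTIVE"])

wait_for_server_active(nova, "server-1")
guest_ready = wait_for_guest_active(guest)
print(nova.calls, guest.calls)
```

With this shape, the number of compute status requests depends only on how long provisioning takes, and a server that lands in a failed state raises an error before any later step (DNS, guest prepare) gets a chance to report a misleading one.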
