Re: [openstack-dev] [Trove] Guest prepare call polling mechanism issue

2014-07-23 Thread Denis Makogon
On Thu, Jul 24, 2014 at 1:01 AM, Nikhil Manchanda wrote:

>
> Tim Simpson writes:
>
> > To summarize, this is a conversation about the following LaunchPad
> > bug: https://launchpad.net/bugs/1325512
> > and Gerrit review: https://review.openstack.org/#/c/97194/6
> >
> > You are saying that the function "_service_is_active", in addition
> > to polling the datastore service status, also polls the status of
> > the Nova resource. At first I thought this wasn't the case, but
> > looking at your pull request I was surprised to see that line 320
> > (https://review.openstack.org/#/c/97194/6/trove/taskmanager/models.py)
> > polls Nova using the "get" method (which I wish was called
> > "refresh", as to me it sounds like a lazy loader or something,
> > despite making a full GET request each time). So moving this
> > polling out of there into the two respective "create_server"
> > methods, as you have done, is not only going to be useful for Heat
> > and avoid the issue you describe of calling Nova 99 times; it will
> > actually help operations teams to see more clearly that the issue
> > was with a server that didn't provision. We actually had an issue
> > in Staging the other day that took us forever to figure out: the
> > server wasn't provisioning, but before anything checked that it was
> > ACTIVE, the DNS code detected that the server had no IP address
> > (never mind that it was in a FAILED state), so the logs surfaced
> > this as a DNS error. This change should help us avoid such issues.
> >
>
> Thanks for bringing this up, Tim / Denis.
>
> As Tim mentions, it does look like the '_service_is_active' call in
> the taskmanager also polls Nova to check whether the instance is in
> ERROR, causing some unnecessary extra polling while figuring out the
> state of the Trove instance.
>
> Given this, it does seem reasonable to split up the polling into two
> separate methods, in a manner similar to what [1] is trying to
> accomplish. However, [1] does seem a bit rough around the edges and
> needs a bit of cleaning up -- and I've commented on the review to
> this effect.
>
>
Of course, all comments are reasonable. I will send a new patch set soon.

Thanks,
Denis


> [1] https://review.openstack.org/#/c/97194
>
> Hope this helps,
>
> Thanks,
> Nikhil
>
> >
> > [...]
>


Re: [openstack-dev] [Trove] Guest prepare call polling mechanism issue

2014-07-23 Thread Nikhil Manchanda

Tim Simpson writes:

> To summarize, this is a conversation about the following LaunchPad
> bug: https://launchpad.net/bugs/1325512
> and Gerrit review: https://review.openstack.org/#/c/97194/6
>
> You are saying that the function "_service_is_active", in addition to
> polling the datastore service status, also polls the status of the Nova
> resource. At first I thought this wasn't the case, but looking at
> your pull request I was surprised to see that line 320
> (https://review.openstack.org/#/c/97194/6/trove/taskmanager/models.py)
> polls Nova using the "get" method (which I wish was called "refresh",
> as to me it sounds like a lazy loader or something, despite making a
> full GET request each time). So moving this polling out of there into
> the two respective "create_server" methods, as you have done, is not
> only going to be useful for Heat and avoid the issue you describe of
> calling Nova 99 times; it will actually help operations teams to see
> more clearly that the issue was with a server that didn't provision.
> We actually had an issue in Staging the other day that took us forever
> to figure out: the server wasn't provisioning, but before anything
> checked that it was ACTIVE, the DNS code detected that the server had
> no IP address (never mind that it was in a FAILED state), so the logs
> surfaced this as a DNS error. This change should help us avoid such
> issues.
>

Thanks for bringing this up, Tim / Denis.

As Tim mentions, it does look like the '_service_is_active' call in
the taskmanager also polls Nova to check whether the instance is in
ERROR, causing some unnecessary extra polling while figuring out the
state of the Trove instance.

Given this, it does seem reasonable to split up the polling into two
separate methods, in a manner similar to what [1] is trying to
accomplish. However, [1] does seem a bit rough around the edges and
needs a bit of cleaning up -- and I've commented on the review to this
effect.

[1] https://review.openstack.org/#/c/97194

Hope this helps,

Thanks,
Nikhil

>
> [...]



Re: [openstack-dev] [Trove] Guest prepare call polling mechanism issue

2014-07-23 Thread Denis Makogon
On Wed, Jul 23, 2014 at 7:33 PM, Tim Simpson wrote:

>  To summarize, this is a conversation about the following LaunchPad bug:
> https://launchpad.net/bugs/1325512
> and Gerrit review: https://review.openstack.org/#/c/97194/6
>
>  You are saying that the function "_service_is_active", in addition to
> polling the datastore service status, also polls the status of the Nova
> resource. At first I thought this wasn't the case, but looking at your pull
> request I was surprised to see that line 320 (
> https://review.openstack.org/#/c/97194/6/trove/taskmanager/models.py)
> polls Nova using the "get" method (which I wish was called "refresh", as to
> me it sounds like a lazy loader or something, despite making a full GET
> request each time).
> So moving this polling out of there into the two respective
> "create_server" methods, as you have done, is not only going to be useful
> for Heat and avoid the issue you describe of calling Nova 99 times; it will
> actually help operations teams to see more clearly that the issue was with
> a server that didn't provision. We actually had an issue in Staging the
> other day that took us forever to figure out because the
>

Agreed. I guess I'll need to update the bug report to add more information
about the issue, but I'm really glad to hear that the proposed change would
be useful. And I agree that for an operations/support team it would be
useful to be able to track provisioning issues that have nothing to do with
Trove but are tied to the infrastructure.


> server wasn't provisioning, but before anything checked that it was ACTIVE,
> the DNS code detected that the server had no IP address (never mind that it
> was in a FAILED state), so the logs surfaced this as a DNS error. This
> change should help us avoid such issues.
>
>  Thanks,
>
>  Tim
>
>
>  --
> *From:* Denis Makogon [dmako...@mirantis.com]
> *Sent:* Wednesday, July 23, 2014 7:30 AM
> *To:* OpenStack Development Mailing List
> *Subject:* [openstack-dev] [Trove] Guest prepare call polling mechanism
> issue
>
>Hello, Stackers.
>
>
>  I’d like to discuss an issue with the guestagent prepare call polling
> mechanism (see [1]).
>
>  Let me first describe why this is actually an issue and why it should be
> fixed. Those of you who are familiar with Trove know that Trove can
> provision instances through the Nova API and the Heat API (see [2] and
> [3]).
>
>
> What’s the difference between these two ways (in general)? The answer
> is simple:
>
> - The Heat-based provisioning method has a polling mechanism that verifies
> that stack provisioning completed successfully (see [4]), which means that
> all stack resources are in ACTIVE state.
>
> - The Nova-based provisioning method doesn’t do any polling (which is
> wrong, since an instance can’t fail as fast as possible: the
> Trove-taskmanager service doesn’t verify that the launched server has
> reached ACTIVE state). That’s issue #1 - the compute instance state is
> unknown, whereas with Heat the delivered resources are already in ACTIVE
> state.
>
>  Once method [2] or [3] finishes, the taskmanager prepares data for the
> guest (see [5]) and then tries to send the prepare call to the guest (see
> [6]). Here comes issue #2 - the polling mechanism makes at least 100 API
> calls to Nova to determine the compute instance status.
>
> The taskmanager also makes almost the same number of calls to the Trove
> backend to discover the guest status, which is completely normal.
>
>  So here comes the question: why should I call Nova 99 more times for
> the same value if the value returned the first time was completely
> acceptable?
>
>
> There’s only one way to fix it. Since Heat-based provisioning
> delivers an instance with a status validation procedure, the same thing
> should be done for Nova-based provisioning: we should extract the compute
> instance status polling from the guest prepare polling mechanism, integrate
> it into [2], and leave only guest status discovery in the guest prepare
> polling mechanism.
>
>
>  Benefits? The proposed fix will give the ability to fail fast for
> corrupted instances, and it will reduce the number of redundant Nova API
> calls made while attempting to discover the guest status.
>
>
>  Proposed fix for this issue - [7].
>
>  [1] - https://launchpad.net/bugs/1325512
>
> [2] -
> https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L198-L215
>
> [3] -
> https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L190-L197
>
> [4] -
> https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L420-L429
>
> [5] -
> https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L217-L

Re: [openstack-dev] [Trove] Guest prepare call polling mechanism issue

2014-07-23 Thread Tim Simpson
To summarize, this is a conversation about the following LaunchPad bug: 
https://launchpad.net/bugs/1325512
and Gerrit review: https://review.openstack.org/#/c/97194/6

You are saying that the function "_service_is_active", in addition to polling
the datastore service status, also polls the status of the Nova resource. At
first I thought this wasn't the case, but looking at your pull request I was
surprised to see that line 320
(https://review.openstack.org/#/c/97194/6/trove/taskmanager/models.py) polls
Nova using the "get" method (which I wish was called "refresh", as to me it
sounds like a lazy loader or something, despite making a full GET request each
time).
So moving this polling out of there into the two respective "create_server"
methods as you have done is not only going to be useful for Heat and avoid the
issue you describe of calling Nova 99 times; it will actually help operations
teams to see more clearly that the issue was with a server that didn't
provision. We actually had an issue in Staging the other day that took us
forever to figure out: the server wasn't provisioning, but before anything
checked that it was ACTIVE, the DNS code detected that the server had no IP
address (never mind that it was in a FAILED state), so the logs surfaced this
as a DNS error. This change should help us avoid such issues.
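
To make this concrete, here is a minimal sketch (illustrative only, not the
actual patch under review) of what polling inside "create_server" could look
like. poll_until below is a simplified stand-in for Trove's
trove.common.utils.poll_until helper, and the function names, sleep time, and
timeout are made up for the example:

    import time

    class PollTimeOut(Exception):
        pass

    def poll_until(condition, sleep_time=2, time_out=300):
        # Simplified stand-in for trove.common.utils.poll_until: call
        # condition() every sleep_time seconds until it returns True or
        # time_out seconds have elapsed.
        deadline = time.time() + time_out
        while time.time() < deadline:
            if condition():
                return
            time.sleep(sleep_time)
        raise PollTimeOut()

    def create_server(nova_client, **create_kwargs):
        server = nova_client.servers.create(**create_kwargs)

        def _server_is_active():
            # Fail fast on ERROR so the operator sees a provisioning
            # failure here, instead of a misleading downstream error
            # (e.g. the DNS "no ip address" case described above).
            status = nova_client.servers.get(server.id).status
            if status == 'ERROR':
                raise RuntimeError("Server %s failed to provision."
                                   % server.id)
            return status == 'ACTIVE'

        poll_until(_server_is_active)
        return server

Once create_server returns, the server is known to be ACTIVE, so nothing
downstream ever needs to ask Nova about it again.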

Thanks,

Tim



From: Denis Makogon [dmako...@mirantis.com]
Sent: Wednesday, July 23, 2014 7:30 AM
To: OpenStack Development Mailing List
Subject: [openstack-dev] [Trove] Guest prepare call polling mechanism issue


Hello, Stackers.


I’d like to discuss an issue with the guestagent prepare call polling
mechanism (see [1]).


Let me first describe why this is actually an issue and why it should be
fixed. Those of you who are familiar with Trove know that Trove can provision
instances through the Nova API and the Heat API (see [2] and [3]).


What’s the difference between these two ways (in general)? The answer is
simple:

- The Heat-based provisioning method has a polling mechanism that verifies
that stack provisioning completed successfully (see [4]), which means that
all stack resources are in ACTIVE state.

- The Nova-based provisioning method doesn’t do any polling (which is wrong,
since an instance can’t fail as fast as possible: the Trove-taskmanager
service doesn’t verify that the launched server has reached ACTIVE state).
That’s issue #1 - the compute instance state is unknown, whereas with Heat
the delivered resources are already in ACTIVE state.


Once method [2] or [3] finishes, the taskmanager prepares data for the guest
(see [5]) and then tries to send the prepare call to the guest (see [6]).
Here comes issue #2 - the polling mechanism makes at least 100 API calls to
Nova to determine the compute instance status.

The taskmanager also makes almost the same number of calls to the Trove
backend to discover the guest status, which is completely normal.


So here comes the question: why should I call Nova 99 more times for the
same value if the value returned the first time was completely acceptable?


There’s only one way to fix it. Since Heat-based provisioning delivers an
instance with a status validation procedure, the same thing should be done
for Nova-based provisioning: we should extract the compute instance status
polling from the guest prepare polling mechanism, integrate it into [2], and
leave only guest status discovery in the guest prepare polling mechanism.


Benefits? The proposed fix will give the ability to fail fast for corrupted
instances, and it will reduce the number of redundant Nova API calls made
while attempting to discover the guest status.



Proposed fix for this issue - [7].


[1] - https://launchpad.net/bugs/1325512

[2] - 
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L198-L215

[3] - 
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L190-L197

[4] - 
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L420-L429

[5] - 
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L217-L256

[6] - 
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L254-L266

[7] - https://review.openstack.org/#/c/97194/



Thoughts?


Best regards,

Denis Makogon


[openstack-dev] [Trove] Guest prepare call polling mechanism issue

2014-07-23 Thread Denis Makogon
Hello, Stackers.


I’d like to discuss an issue with the guestagent prepare call polling
mechanism (see [1]).

Let me first describe why this is actually an issue and why it should be
fixed. Those of you who are familiar with Trove know that Trove can provision
instances through the Nova API and the Heat API (see [2] and [3]).



What’s the difference between these two ways (in general)? The answer is
simple:

- The Heat-based provisioning method has a polling mechanism that verifies
that stack provisioning completed successfully (see [4]), which means that
all stack resources are in ACTIVE state.

- The Nova-based provisioning method doesn’t do any polling (which is wrong,
since an instance can’t fail as fast as possible: the Trove-taskmanager
service doesn’t verify that the launched server has reached ACTIVE state).
That’s issue #1 - the compute instance state is unknown, whereas with Heat
the delivered resources are already in ACTIVE state.

Once method [2] or [3] finishes, the taskmanager prepares data for the guest
(see [5]) and then tries to send the prepare call to the guest (see [6]).
Here comes issue #2 - the polling mechanism makes at least 100 API calls to
Nova to determine the compute instance status.

The taskmanager also makes almost the same number of calls to the Trove
backend to discover the guest status, which is completely normal.
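
To make issue #2 concrete, the combined poll behaves roughly like the
following (a simplified reconstruction for illustration, not the literal code
in taskmanager/models.py; get_guest_status and the timings are placeholders):

    import time

    def service_is_active(nova_client, server_id, get_guest_status,
                          sleep_time=2, time_out=200):
        # Simplified reconstruction of the combined poll: with a 2
        # second sleep and a 200 second timeout this can run up to 100
        # iterations, and every iteration issues one Nova GET, even
        # though the server already reported ACTIVE on the first call.
        deadline = time.time() + time_out
        while time.time() < deadline:
            if get_guest_status() == 'RUNNING':
                return
            if nova_client.servers.get(server_id).status == 'ERROR':
                raise RuntimeError("Server %s went to ERROR."
                                   % server_id)
            time.sleep(sleep_time)
        raise RuntimeError("Timed out waiting for the guest on %s."
                           % server_id)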

So here comes the question: why should I call Nova 99 more times for the
same value if the value returned the first time was completely acceptable?



There’s only one way to fix it. Since Heat-based provisioning delivers an
instance with a status validation procedure, the same thing should be done
for Nova-based provisioning: we should extract the compute instance status
polling from the guest prepare polling mechanism, integrate it into [2], and
leave only guest status discovery in the guest prepare polling mechanism.
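
Under the same assumptions as the sketch above, the split would leave the
guest prepare wait touching only the Trove backend, roughly like this (again
a sketch, not the actual patch in [7]):

    import time

    def wait_for_guest(get_guest_status, sleep_time=2, time_out=200):
        # After the split, this loop consults only the Trove backend;
        # the per-iteration Nova GET is gone because the create_server
        # path has already confirmed the server reached ACTIVE (and has
        # failed fast if it went to ERROR).
        deadline = time.time() + time_out
        while time.time() < deadline:
            if get_guest_status() == 'RUNNING':
                return
            time.sleep(sleep_time)
        raise RuntimeError("Guest did not report RUNNING in time.")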




Benefits? The proposed fix will give the ability to fail fast for corrupted
instances, and it will reduce the number of redundant Nova API calls made
while attempting to discover the guest status.


Proposed fix for this issue - [7].

[1] - https://launchpad.net/bugs/1325512

[2] -
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L198-L215

[3] -
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L190-L197

[4] -
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L420-L429

[5] -
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L217-L256

[6] -
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L254-L266

[7] - https://review.openstack.org/#/c/97194/


Thoughts?

Best regards,

Denis Makogon


[openstack-dev] [TROVE] Guest prepare call polling mechanism issue

2014-07-21 Thread Denis Makogon
Hello Stackers.


I’d like to discuss an issue with the Trove-guestagent prepare call polling
mechanism (see [1]).

Let me first describe why this is actually an issue and why it should be
fixed. Those of you who are familiar with Trove know that Trove can provision
instances through the Nova API and the Heat API (see [2] and [3]).



What’s the difference between these two ways (in general)? The answer is
simple:

- The Heat-based provisioning method has a polling mechanism that verifies
that stack provisioning completed successfully (see [4]), which means that
all stack resources are in ACTIVE state.

- The Nova-based provisioning method doesn’t do any polling (which is wrong,
since an instance can’t fail as fast as possible: the Trove-taskmanager
service doesn’t verify that the launched server has reached ACTIVE state).
That’s issue #1 - the compute instance state is unknown, whereas with Heat
the delivered resources are already in ACTIVE state.

Once method [2] or [3] finishes, the taskmanager prepares data for the guest
(see [5]) and then tries to send the prepare call to the guest (see [6]).
Here comes issue #2 - the polling mechanism makes at least 100 API calls to
Nova to determine the compute instance status.

The taskmanager also makes almost the same number of calls to the Trove
backend to discover the guest status, which is completely normal.

So here comes the question: why should I call Nova 99 more times for the
same value if the value returned the first time was completely OK?



There’s only one way to fix it. Since Heat-based provisioning delivers an
instance with a status validation procedure, the same thing should be done
for Nova-based provisioning: we should extract the compute instance status
polling from the guest prepare polling mechanism, integrate it into [2], and
leave only guest status discovery in the guest prepare polling mechanism.




Benefits? The proposed fix will give the ability to fail fast for corrupted
instances, and it will reduce the number of redundant Nova API calls made
while attempting to discover the guest status.


Proposed fix for this issue - [7].

[1] - https://launchpad.net/bugs/1325512

[2] -
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L198-L215

[3] -
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L190-L197

[4] -
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L420-L429

[5] -
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L217-L256

[6] -
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L254-L266

[7] - https://review.openstack.org/#/c/97194/


Thoughts?

Best regards,

Denis Makogon