Matthew Booth wrote:
On Tue, Nov 10, 2015 at 6:46 PM, Joshua Harlow <[email protected]
<mailto:[email protected]>> wrote:
Matthew Booth wrote:
My patch to MessageHandlingServer is currently being reverted
because it
broke Nova tests:
https://review.openstack.org/#/c/235347/
Specifically it causes a number of tests to take a very long time to
execute, which ultimately results in the total build time limit
being
exceeded. This is very easy to replicate. The
test
nova.tests.functional.test_server_group.ServerGroupTest.test_boot_servers_with_affinity
is an example test which will always hit this issue. The problem is
that ServerGroupTest.setUp() does:
self.compute2 = self.start_service('compute',
host='host2')
self.addCleanup(self.compute2.kill)
The problem with this is that start_service() adds a fixture
which also
adds kill as a cleanup method. kill does stop(), wait(). This
means that
the resulting call order is: start, stop, wait, stop, wait. The
redundant call to kill is obviously a wart, but I feel we should
have
handled it anyway.
The problem is that we decided it should be possible to restart a
server. There are some unit tests in oslo.messaging that do
this. It's
not clear to me that there are any projects which do this, but after
this experience I feel like it would be good to check before
changing it :)
The implication of that is that after wait() the state wraps,
and we're
now waiting on start() again. Consequently, when the second
cleanup call
hangs.
We could fix Nova (at least the usage we have seen) by removing the
wrapping. After wait() if you want to start a server again you
need to
create a new one.
So, to be specific, lets consider the following 2 call sequences:
1. start stop wait stop wait
2. start stop wait start stop wait
What should they do? The behaviours with and without wrapping are:
1. start stop wait stop wait
WRAP: start stop wait HANG HANG
NO WRAP: start stop wait NO-OP NO-OP
2. start stop wait start stop wait
WRAP: start stop wait start stop wait
NO WRAP: start stop wait NO-OP NO-OP NO-OP
I'll refresh my memory on what they did before my change in the
morning.
Perhaps it might be simpler to codify the current behaviour, but
iirc I
proposed this because it was previously undefined due to races.
I personally prefer not allowing restarting, its needless code
complexity imho and a feature that people imho probably aren't using
anyway (just create a new server object if u are doing this), so I'd
be fine with doing the above NO WRAP and turning those into NO-OPs
(and for example raising a runtime error in the case of start stop
wait start ... to denote that restarting isn't
recommended/possible). If we have a strong enough reason to really
to start stop wait start ...
I might be convinced the code complexity is worth it but for now I'm
not convinced...
I agree, and in the hopefully unlikely event that we did break anybody,
at least they would get an obvious exception rather than a hang. A
lesson from breaking nova was that the log messages were generated and
were available in the failed test runs, but nobody noticed them.
Incidentally, I think I'd also merge my second patch into the first
before resubmitting, which adds timeouts and the option not to log.
+1
Makes sense to me.
IMHO we can remove the log output later if we determine it's to noisy
for folks (and/or not helping)...
Matt
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev