On Tue, Nov 10, 2015 at 6:46 PM, Joshua Harlow <[email protected]> wrote:
> Matthew Booth wrote: > >> My patch to MessageHandlingServer is currently being reverted because it >> broke Nova tests: >> >> https://review.openstack.org/#/c/235347/ >> >> Specifically it causes a number of tests to take a very long time to >> execute, which ultimately results in the total build time limit being >> exceeded. This is very easy to replicate. The >> test >> nova.tests.functional.test_server_group.ServerGroupTest.test_boot_servers_with_affinity >> is an example test which will always hit this issue. The problem is >> that ServerGroupTest.setUp() does: >> >> self.compute2 = self.start_service('compute', host='host2') >> self.addCleanup(self.compute2.kill) >> >> The problem with this is that start_service() adds a fixture which also >> adds kill as a cleanup method. kill does stop(), wait(). This means that >> the resulting call order is: start, stop, wait, stop, wait. The >> redundant call to kill is obviously a wart, but I feel we should have >> handled it anyway. >> >> The problem is that we decided it should be possible to restart a >> server. There are some unit tests in oslo.messaging that do this. It's >> not clear to me that there are any projects which do this, but after >> this experience I feel like it would be good to check before changing it >> :) >> >> The implication of that is that after wait() the state wraps, and we're >> now waiting on start() again. Consequently, when the second cleanup call >> hangs. >> >> We could fix Nova (at least the usage we have seen) by removing the >> wrapping. After wait() if you want to start a server again you need to >> create a new one. >> >> So, to be specific, lets consider the following 2 call sequences: >> >> 1. start stop wait stop wait >> 2. start stop wait start stop wait >> >> What should they do? The behaviours with and without wrapping are: >> >> 1. start stop wait stop wait >> WRAP: start stop wait HANG HANG >> NO WRAP: start stop wait NO-OP NO-OP >> >> 2. start stop wait start stop wait >> WRAP: start stop wait start stop wait >> NO WRAP: start stop wait NO-OP NO-OP NO-OP >> >> I'll refresh my memory on what they did before my change in the morning. >> Perhaps it might be simpler to codify the current behaviour, but iirc I >> proposed this because it was previously undefined due to races. >> > > I personally prefer not allowing restarting, its needless code complexity > imho and a feature that people imho probably aren't using anyway (just > create a new server object if u are doing this), so I'd be fine with doing > the above NO WRAP and turning those into NO-OPs (and for example raising a > runtime error in the case of start stop wait start ... to denote that > restarting isn't recommended/possible). If we have a strong enough reason > to really to start stop wait start ... > > I might be convinced the code complexity is worth it but for now I'm not > convinced... > I agree, and in the hopefully unlikely event that we did break anybody, at least they would get an obvious exception rather than a hang. A lesson from breaking nova was that the log messages were generated and were available in the failed test runs, but nobody noticed them. Incidentally, I think I'd also merge my second patch into the first before resubmitting, which adds timeouts and the option not to log. Matt
__________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: [email protected]?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
