-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31423/#review74620
-----------------------------------------------------------

Ship it!


Thought this over - the benefits to overall reliability are significant, and applications should still be motivated to be resilient to the edge case of 'ZK registration but application unresponsive'. Thanks Steve!


src/test/python/apache/aurora/executor/test_thermos_executor.py
<https://reviews.apache.org/r/31423/#comment121243>

    good idea


- Joe Smith


On Feb. 26, 2015, 7:20 a.m., Steve Niemitz wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31423/
> -----------------------------------------------------------
>
> (Updated Feb. 26, 2015, 7:20 a.m.)
>
>
> Review request for Aurora, Brian Wickman and Zameer Manji.
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Stop the announcer and status checkers before starting to kill the runners.
>
> This allows the task to be removed from the ZK ensemble before it begins
> getting killed. The delay can be significant if the task takes some time to
> shut down, and during that time it stops responding to requests.
>
>
> Diffs
> -----
>
> src/main/python/apache/aurora/executor/aurora_executor.py
> 9c0282392dbb9cca308baf47adc1750c1f5cacc6
> src/test/python/apache/aurora/executor/BUILD
> 2ee9b1233e9db47455ddccccffbc48691d379222
> src/test/python/apache/aurora/executor/test_thermos_executor.py
> 8dbfb1db5eb7a6548820ff7cf82a9c7092f61d28
>
> Diff: https://reviews.apache.org/r/31423/diff/
>
>
> Testing
> -------
>
> We're now running this in our production environments. Watching ZK, I can
> confirm that the nodes are removed before process shutdown begins. Watching
> the executor log also confirms this.
>
> I couldn't observe any other side effects either.
>
>
> Thanks,
>
> Steve Niemitz
>
>
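
For readers following the thread, the shutdown ordering in the quoted description can be sketched roughly as below. This is only an illustration under stated assumptions: `Component` and `shutdown` are hypothetical stand-ins invented for this example, not the actual classes or methods in aurora_executor.py, and the real change may be structured differently.

```python
# Hypothetical sketch: Component and shutdown() are illustrative stand-ins,
# not the real Aurora executor API.
from typing import List


class Component:
    """Records its stop event in a shared log so the ordering can be inspected."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log

    def stop(self) -> None:
        self.log.append(f"stop {self.name}")


def shutdown(announcer: Component, checkers: List[Component], runner: Component) -> None:
    # 1. Stop the announcer first, so the task leaves the ZK ensemble
    #    before it begins getting killed.
    announcer.stop()
    # 2. Stop the status checkers next.
    for checker in checkers:
        checker.stop()
    # 3. Only now start killing the runner, which may take some time.
    runner.stop()


events: List[str] = []
shutdown(
    Component("announcer", events),
    [Component("status_checker", events)],
    Component("runner", events),
)
print(events)
```

The point of the ordering is that the slow step (killing the runner) happens only after the task is no longer announced, so the window in which it is registered but unresponsive is kept short.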