-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53263/
-----------------------------------------------------------
Review request for Ambari and Vitalyi Brodetskyi.
Bugs: AMBARI-18728
https://issues.apache.org/jira/browse/AMBARI-18728
Repository: ambari
Description
-------
This was caused by a very tricky race condition in the way Python
multiprocessing queues work, resulting in a deadlock in the
ambari_agent.ActionQueue thread.
The problem is the following flow.
If all three of these happen at the same time (a very rare occasion):
1. Process1 executes queue.get(False)
2. Process2 executes queue.put(largeObjectWhichTakesLongTimeToPut)
3. Someone kills Process2.
This results in a deadlock in Process1's get, which is caused by the queue
locks/semaphores not being released during Process2's put.
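
The flow above can be approximated with a small stand-alone probe. This is
only a sketch under stated assumptions, not the test_race_condition.py script
mentioned in this request: it targets Python 3.7+ on a POSIX host (it asks
for the "fork" start method), the names _put_large_object and
probe_get_after_kill are hypothetical, and it runs the three steps one after
another instead of truly concurrently so that the outcome is repeatable. The
get runs in a daemon thread so that the probe itself cannot hang.

```python
import multiprocessing
import queue
import threading
import time


def _put_large_object(q, started, size):
    """Child (Process2): enqueue a payload far larger than the pipe buffer."""
    started.set()
    q.put(b"x" * size)   # the queue's feeder thread stalls once the pipe is full
    time.sleep(3600)     # stay alive until the parent kills this process


def probe_get_after_kill(size=8 * 1024 * 1024, settle=1.0, wait=2.0):
    """Report what q.get(False) does after the writer was killed mid-put.

    Returns "hung", "empty", "got item", or "error".
    """
    ctx = multiprocessing.get_context("fork")   # assumption: POSIX host
    q = ctx.Queue()
    started = ctx.Event()
    proc = ctx.Process(target=_put_large_object,
                       args=(q, started, size), daemon=True)
    proc.start()

    started.wait(timeout=10)
    time.sleep(settle)   # give the feeder thread time to begin writing
    proc.kill()          # step 3: someone kills Process2 during its put
    proc.join()

    outcome = []

    def reader():
        try:
            q.get(False)                 # step 1: the "non-blocking" get
            outcome.append("got item")
        except queue.Empty:
            outcome.append("empty")

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    t.join(wait)         # bounded wait so the probe always returns
    if t.is_alive():
        return "hung"
    return outcome[0] if outcome else "error"
```

Calling print(probe_get_after_kill()) is expected to report "hung" when the
child was killed after writing only part of its message: the reader sees data
on the pipe, starts receiving, and then waits for bytes that will never
arrive. If the child had not started writing yet, the result is "empty".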
I wrote a script, test_race_condition.py, to emulate this behaviour. With it
I could reproduce the deadlock and test the fix for it.
Diffs
-----
ambari-agent/src/main/python/ambari_agent/ActionQueue.py bf840e2
ambari-agent/src/main/python/ambari_agent/StatusCommandsExecutor.py 20acee4
Diff: https://reviews.apache.org/r/53263/diff/
Testing
-------
mvn clean test
Thanks,
Andrew Onischuk