Hi,

I posted this issue on the Hortonworks forum a while ago but didn't get an answer: https://community.hortonworks.com/questions/110609/ambari-agent-251-deadlock.html

Hopefully somebody on this list can advise.
The issue is that AMBARI-20070 fixes a potential concurrency issue (which I never experienced) but in turn creates a thread deadlock in the agents. If I run the 2.5.1 agent, agents start becoming unresponsive within minutes (yellow icon in Ambari), and before a day goes by all agents are marked as "unknown" and need to be restarted. A thread dump reveals that all worker threads are waiting for a lock introduced in the fix, which is never released.

I manually commented out the line fix_subprocess_popen() in /ambari-agent/src/main/python/ambari_agent/main.py, and thanks to that I have been running 2.5.1 on development environments for months without any issues.

I'm surprised nobody else has seen this. So far I have only tested it on VMs, so that might be a factor.

Thanks,
Gonzalo
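
P.S. For anyone who wants to check for the same symptom, here is a minimal sketch of one way to dump the stacks of all threads in a Python process. It is not necessarily how the dump mentioned above was taken, and dump_all_threads is a hypothetical helper name, not part of the Ambari agent. It only uses the standard library (sys._current_frames, threading, traceback) and would still have to be invoked from inside the affected process somehow.

```python
import sys
import threading
import traceback


def dump_all_threads():
    """Return a text dump of the current stack of every thread in this process.

    Hypothetical helper for illustration; not part of the Ambari agent.
    """
    # Map thread ids to thread names so the dump is easier to read.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    lines = []
    for thread_id, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):" % (thread_id, names.get(thread_id, "unknown")))
        # format_stack returns one string per frame, outermost call first.
        lines.extend(entry.rstrip("\n") for entry in traceback.format_stack(frame))
        lines.append("")
    return "\n".join(lines)
```

The output contains one stack per thread. If most threads show the same lock acquisition as their innermost frame, that would be consistent with a lock that is never released.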
