In Ambari 2.1.1, the ambari-agent on two of my hosts occasionally dies silently, with no messages written to the log file or to stdout. I've also noticed that the memory usage of ambari_agent creeps up over time, and I suspect the crashes are related to this. Here's a snapshot from ps aux, taken a few hours before the agent process died:

$ ps aux | grep ambari_agent
root 3759 25.8 36.2 27152176 23872968 ? Sl Sep15 4708:55 /usr/bin/python2.6 /usr/lib/python2.6/site-packages/ambari_agent/main.py start

(ambari_agent was at about 25% CPU and roughly 27 GB of memory by VSZ.)

This is only affecting two of my hosts; each has a number of master services (mostly NameNode, ResourceManager, HiveServer2). On my other machine with the same set of master services, ambari_agent was restarted a few days ago and is already up to 8 GB of memory. On my machines without the master services (just DataNodes, NodeManagers, etc.), ambari_agent is using ~1.7 GB of memory (VSZ) and has been stable since I last upgraded Ambari in late August.

I don't recall whether this was happening in 2.1.0 or whether it started in 2.1.1; I didn't have 2.1.0 deployed for very long. It wasn't happening in 2.0, though I've also deployed Kerberos since then.

Is this a known issue, or has anyone else seen this?
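
For anyone comparing numbers across hosts, here is a minimal sketch of how the ps aux row above can be read. It is not part of Ambari; the helper names parse_ps_aux_line and kib_to_gib are made up for illustration. It assumes the standard Linux ps aux column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND), where VSZ and RSS are reported in KiB.

```python
def parse_ps_aux_line(line):
    """Split one `ps aux` row into named fields (hypothetical helper).

    The COMMAND column may contain spaces, so only the first ten
    whitespace-separated fields are split off.
    """
    fields = line.split(None, 10)
    return {
        "user": fields[0],
        "pid": int(fields[1]),
        "cpu_percent": float(fields[2]),
        "mem_percent": float(fields[3]),
        "vsz_kib": int(fields[4]),   # virtual size, KiB
        "rss_kib": int(fields[5]),   # resident set size, KiB
        "command": fields[10],
    }


def kib_to_gib(kib):
    """Convert KiB (as printed by ps) to GiB."""
    return kib / (1024.0 * 1024.0)


# The row quoted in the post above.
sample = ("root 3759 25.8 36.2 27152176 23872968 ? Sl Sep15 4708:55 "
          "/usr/bin/python2.6 "
          "/usr/lib/python2.6/site-packages/ambari_agent/main.py start")

row = parse_ps_aux_line(sample)
print("VSZ %.1f GiB, RSS %.1f GiB" % (kib_to_gib(row["vsz_kib"]),
                                      kib_to_gib(row["rss_kib"])))
# prints: VSZ 25.9 GiB, RSS 22.8 GiB
```

Running the same parse on periodic ps aux samples would show whether RSS grows steadily (which would point toward a leak) or only VSZ does.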
