Andrew,

Do you want to go ahead and file a bug in JIRA?
I’m just speculating, but this might be related?
https://issues.apache.org/jira/browse/AMBARI-11349
We saw logs growing geometrically, and I think the change was to simply suppress log messages by changing the log level. So I’m wondering if we have some memory leakage because of that. I could be totally wrong.

Yusaku

On 9/29/15, 12:34 PM, "Andrew Robertson" <[email protected]> wrote:

>In Ambari 2.1.1, the ambari-agent on two of my hosts occasionally
>quietly dies without any messages going to the log file or to stdout.
>I've also noticed that the memory usage in ambari_agent seems to creep
>up over time, and I suspect the crashes are related to this. Here's
>the snapshot from ps aux a few hours before the ambari agent process
>died quietly:
>
>$ ps aux | grep ambari_agent
>root 3759 25.8 36.2 27152176 23872968 ? Sl Sep15 4708:55 /usr/bin/python2.6 /usr/lib/python2.6/site-packages/ambari_agent/main.py start
>
>(ambari_agent was at 25% cpu usage, 27GB of memory).
>
>This happens to be only affecting 2 hosts that I have; each have a
>number of master services (mostly Namenode, ResourceManager,
>HiveServer2). On my other machine with the same set of master
>services, ambari_agent was restarted a few days ago and is already up
>to 8gb of memory. On my machines without the master services - just
>datanodes / nodemanagers / etc - ambari is using ~1.7gb of memory
>(VSZ) and has been stable since I last upgraded Ambari in late August.
>
>I don't recall if this was happening in 2.1.0, or if it started in
>2.1.1. I didn't have 2.1.0 deployed for very long. It wasn't
>happening in 2.0 - though I've also deployed Kerberos since then.
>
>Is this a known issue or has anyone else seen this?
>
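
For the JIRA, it may help to attach the memory numbers in a consistent unit. Below is a minimal sketch (not part of Ambari) that parses one data row of `ps aux` output and converts VSZ and RSS, which ps reports in KiB, to GiB. The helper names parse_ps_aux_line and kib_to_gib are made up for illustration, and the sample row is the one from the quoted message with the wrapped command rejoined.

```python
def parse_ps_aux_line(line):
    """Parse one `ps aux` data row; VSZ and RSS are reported by ps in KiB."""
    # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    # Split at most 10 times so the command keeps its internal spaces.
    fields = line.split(None, 10)
    if len(fields) < 11:
        raise ValueError("not a ps aux data row: %r" % line)
    return {
        "pid": int(fields[1]),
        "cpu_percent": float(fields[2]),
        "mem_percent": float(fields[3]),
        "vsz_kib": int(fields[4]),
        "rss_kib": int(fields[5]),
        "command": fields[10],
    }


def kib_to_gib(kib):
    """Convert KiB to GiB (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024.0 * 1024.0)


# Sample row taken from the report above (command rejoined onto one line).
sample = ("root 3759 25.8 36.2 27152176 23872968 ? Sl Sep15 4708:55 "
          "/usr/bin/python2.6 "
          "/usr/lib/python2.6/site-packages/ambari_agent/main.py start")

row = parse_ps_aux_line(sample)
print("pid=%d vsz=%.1f GiB rss=%.1f GiB" % (
    row["pid"], kib_to_gib(row["vsz_kib"]), kib_to_gib(row["rss_kib"])))
```

Running something like this from cron against the agent's ps row and keeping the output would show how quickly the resident size grows between restarts, which is probably more useful in the bug report than a single snapshot.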
