Hi Rui Gao,
We have tested 2.2.2 with a 1600-node cluster and a single-node AMS. The AMS was on a dedicated host with a local datanode, with 32 GB RAM for the RegionServer, 16 GB for the Collector and 1 GB for the Master. The local datanode writes to several disks and allows reads to scale with HDFS short-circuit reads enabled.

700 hosts should work with AMS just fine with proper configuration.

Note Issue 1 here: https://cwiki.apache.org/confluence/display/AMBARI/Known+Issues

Additionally, make sure to set:
ams-site ::: timeline.metrics.host.aggregator.ttl = 86400

We are actively working on AMS HA for seamless horizontal scaling. We will set the fix version on this very soon.
https://issues.apache.org/jira/browse/AMBARI-15901

BR,
Sid

________________________________
From: Rui Gao <[email protected]>
Sent: Wednesday, September 07, 2016 7:36 PM
To: [email protected]
Subject: Re: About Ganglia configuration in hadoop-metrics2.properties.j2

Hi Sid,

Thank you very much for your email.

With Ambari version 2.2.2, we tried to use AMS for a cluster with about 700 nodes, but the Ambari web UI became very slow, which made operations (like a datanode restart) hard to carry out, so we use Ganglia for monitoring instead. I saw that Ambari 2.4 was released about 2 weeks ago. Does the new version bring any improvements here?

As for the NN port-related settings, the configuration in hdfs-site (hostnames and ports are replaced with fake values) is:

```
<property>
  <name>dfs.namenode.rpc-address.clusterName.nn1</name>
  <value>nn1.hostname:9920</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.clusterName.nn2</name>
  <value>nn2.hostname:9920</value>
</property>
--
<property>
  <name>dfs.namenode.servicerpc-address.clusterName.nn1</name>
  <value>nn1.hostname:9922</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.clusterName.nn2</name>
  <value>nn2.hostname:9922</value>
</property>
```

For hadoop-metrics2.properties:

```
# Namenode rpc ports customization (AMS)
namenode.sink.timeline.metric.rpc.client.port=9920
namenode.sink.timeline.metric.rpc.datanode.port=9922

# Namenode rpc ports customization (ganglia)
namenode.sink.ganglia.metric.rpc.client.port=9920
namenode.sink.ganglia.metric.rpc.datanode.port=9922
```

AMS works well, but Ganglia does not. I checked the rrd file names in /var/lib/ganglia/rrds and found only “rpcdetailed.rpcdetailed.CompleteAvgTime.rrd”. I think the Ganglia-related configuration in hadoop-metrics2.properties may not be right. If the configuration were right, then based on what you described we should see files like “rpcdetailed.rpcdetailed.9920.CompleteAvgTime.rrd” and “rpcdetailed.rpcdetailed.9922.CompleteAvgTime.rrd”.

Do you have any ideas about the right configuration for Ganglia in hadoop-metrics2.properties?

Best regards,
Rui

From: Siddharth Wagle <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, September 8, 2016 at 02:59
To: "[email protected]" <[email protected]>
Subject: Re: About Ganglia configuration in hadoop-metrics2.properties.j2

Note: the AMS sink adds the client/service to the metric name; as far as I remember, Ganglia does not handle it this way. Ganglia adds the port number to the metric name instead, so these metrics will not show up with the Ganglia metrics.

You can try to find out what the name of the metric is by looking at the rrd file names in /var/lib/ganglia/rrds. Then use the UI or API to change the widget definition to pull that metric.

BR,
Sid

________________________________
From: Siddharth Wagle <[email protected]>
Sent: Wednesday, September 07, 2016 9:10 AM
To: [email protected]
Subject: Re: About Ganglia configuration in hadoop-metrics2.properties.j2

Hi Rui Gao,

Ambari no longer supports Ganglia as of version 2.1.0. Instead, the Ambari Metrics Service provides better capability to scale with growing cluster size, as well as customized dashboard-based visualizations using Grafana.

Regarding your question about the ports: is the NN configured to listen on different ports for clients and datanodes?
hdfs-site :: dfs.namenode.servicerpc-address

Again, the settings that you have below, rpcdetailed.rpcdetailed.client.CompleteAvgTime and rpcdetailed.rpcdetailed.datanode.CompleteAvgTime, are verified to work with the AMS backend in 2.2.2 and not with Ganglia.

https://cwiki.apache.org/confluence/display/AMBARI/Metrics

BR,
Sid

________________________________
From: Rui Gao <[email protected]>
Sent: Wednesday, September 07, 2016 1:13 AM
To: [email protected]
Subject: About Ganglia configuration in hadoop-metrics2.properties.j2

Hello everyone,

Nice to meet you all. I’m Gao, an engineer working in Japan.
I’m using Ambari to deploy Hadoop clusters, and Ganglia to monitor them. In a cluster using Ambari version 2.2.2.0, thanks to hadoop-metrics2.properties.j2, we get the following configuration in hadoop-metrics2.properties on the namenodes:

    #Namenode rpc ports customization
    namenode.sink.timeline.metric.rpc.client.port=${rpc_port}
    namenode.sink.timeline.metric.rpc.datanode.port=${serviceRPC_port}

This configuration lets us monitor the two ports separately in the Ambari web UI, as two groups: rpcdetailed.rpcdetailed.client.CompleteAvgTime and rpcdetailed.rpcdetailed.datanode.CompleteAvgTime.

I want to monitor these ports separately in Ganglia as well, so we added

    #Namenode rpc ports customization
    namenode.sink.timeline.metric.rpc.client.port=${rpc_port}
    namenode.sink.timeline.metric.rpc.datanode.port=${serviceRPC_port}

    #Namenode rpc ports customization(ganglia)
    namenode.sink.ganglia.metric.rpc.client.port=${rpc_port}
    namenode.sink.ganglia.metric.rpc.datanode.port=${serviceRPC_port}

in the testing cluster, which uses Ambari version 2.2.1.1. However, we still get the metrics of only one port: we see only rpcdetailed.rpcdetailed.CompleteAvgTime, not rpcdetailed.rpcdetailed.client.CompleteAvgTime and rpcdetailed.rpcdetailed.datanode.CompleteAvgTime separately.

Does anyone know how to get separate per-port metrics in Ganglia by editing Ambari's hadoop-metrics2.properties.j2 file? Is separate port monitoring only possible in Ambari version 2.2.2.0 and newer, and not supported in Ambari version 2.2.1.1? Or did I simply make a mistake in the configuration?

Looking forward to your reply! Thank you very much!

Best regards,
Gao
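
The rrd file name check discussed in this thread (looking under /var/lib/ganglia/rrds to see which metric names Ganglia actually recorded) could be scripted roughly as follows. This is only a sketch: `group_by_tag` and `scan` are hypothetical helpers, not part of Ambari or Ganglia, and the file name shapes are assumed from the examples quoted in the thread (an optional port or client/datanode segment between the `rpcdetailed.rpcdetailed` prefix and the metric name).

```python
import re
from pathlib import Path

# Hypothetical helper, not part of Ambari or Ganglia.
# Assumed file name shapes, taken from the examples in this thread:
#   rpcdetailed.rpcdetailed.CompleteAvgTime.rrd         (no extra segment)
#   rpcdetailed.rpcdetailed.9920.CompleteAvgTime.rrd    (port segment)
#   rpcdetailed.rpcdetailed.client.CompleteAvgTime.rrd  (client/datanode segment)
RRD_NAME = re.compile(
    r"^rpcdetailed\.rpcdetailed\.(?:(?P<tag>[^.]+)\.)?(?P<metric>[^.]+)\.rrd$"
)


def group_by_tag(file_names):
    """Group rpcdetailed metric names by their optional tag segment.

    The key is the tag ("9920", "client", ...) or None when the file
    name has no segment between the prefix and the metric name.
    """
    groups = {}
    for name in file_names:
        match = RRD_NAME.match(name)
        if match is None:
            continue  # not an rpcdetailed rrd file
        groups.setdefault(match.group("tag"), []).append(match.group("metric"))
    return groups


def scan(rrd_root="/var/lib/ganglia/rrds"):
    """Walk the rrd directory recursively and group what is found there."""
    root = Path(rrd_root)
    if not root.is_dir():
        return {}
    return group_by_tag(path.name for path in root.rglob("*.rrd"))


if __name__ == "__main__":
    for tag, metrics in sorted(scan().items(), key=lambda kv: str(kv[0])):
        label = tag if tag is not None else "(no port or client/datanode segment)"
        print(label, sorted(set(metrics)))
```

If the only group printed is the one without a segment, Ganglia is recording a single combined metric, which matches what Rui observed; a `9920` or `9922` group would indicate per-port metrics whose names could then be used in the widget definition.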
