How big is your cluster in terms of number of nodes? You can tune settings for HBase based on cluster size.
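For illustration only (placeholder values, not sizing advice from this thread), that cluster-size tuning usually comes down to the AMS heap settings in ams-env and ams-hbase-env, along these lines:

ams-env:
  metrics_collector_heapsize = 1024        # Metrics Collector JVM heap, in MB (placeholder value)
ams-hbase-env:
  hbase_master_heapsize = 1024             # heap of the HBase instance embedded in AMS, in MB (placeholder value)
  hbase_regionserver_heapsize = 1024       # AMS HBase RegionServer heap, used in distributed mode, in MB (placeholder value)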
Following are the instructions for writing metrics to HDFS instead of the local FS:

ams-site:
  timeline.metrics.service.operation.mode = distributed

ams-hbase-site:
  hbase.rootdir = hdfs://<namenode-host>:8020/amshbase
  hbase.cluster.distributed = true

-Sid

________________________________
From: Jayesh Thakrar <[email protected]>
Sent: Wednesday, May 06, 2015 8:30 PM
To: [email protected]; Siddharth Wagle; Jayesh Thakrar
Subject: Re: Kafka broker metrics not appearing in REST API

More info... I was doing some "stress-testing" and, interestingly, the Metrics Collector crashed twice and I had to restart it (I don't like a file-based HBase for the Metrics Collector, but I am not very confident about configuring the system to point to an existing HBase cluster). Also, after this email thread, I looked up the Metrics Collector logs and see errors like this:

... METRIC_RECORD' at region=METRIC_RECORD,,1429966316307.947cfa22f884d035c09fe804b1f5402c., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243930
13:09:37,619 INFO [phoenix-1-thread-349921] RpcRetryingCaller:129 - Call exception, tries=11, retries=35, started=835564 ms ago, cancelled=false, msg=row 'kafka.network.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate^@dtord01flm27p.dc.dotomi.net^@^@^@^AL��:�kafka_broker' on table 'METRIC_RECORD' at region=METRIC_RECORD,kafkark.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate\x00dtord01flm27p.dc.dotomi.net\x00\x00\x00\x01L\xED\xED:\xE5kafka_broker,1429966316307.d488f5e58d54c3251cb81fdfa475dd45., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243931
13:10:58,082 INFO [phoenix-1-thread-349920] RpcRetryingCaller:129 - Call exception, tries=12, retries=35, started=916027 ms ago, cancelled=false, msg=row '' on table 'METRIC_RECORD' at region=METRIC_RECORD,,1429966316307.947cfa22f884d035c09fe804b1f5402c., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243930
13:10:58,082 INFO [phoenix-1-thread-349921] RpcRetryingCaller:129 - Call exception, tries=12, retries=35, started=916027 ms ago, cancelled=false, msg=row 'kafka.network.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate^@dtord01flm27p.dc.dotomi.net^@^@^@^AL��:�kafka_broker' on table 'METRIC_RECORD' at region=METRIC_RECORD,kafkark.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate\x00dtord01flm27p.dc.dotomi.net\x00\x00\x00\x01L\xED\xED:\xE5kafka_broker,1429966316307.d488f5e58d54c3251cb81fdfa475dd45., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243931
13:10:58,112 ERROR [Thread-25] TimelineMetricAggregator:221 - Exception during aggregating metrics.
org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, exceptions:
Sat Apr 25 13:10:58 UTC 2015, null, java.net.SocketTimeoutException: callTimeout=900000, callDuration=938097: row '' on table 'METRIC_RECORD' at region=METRIC_RECORD,,1429966316307.947cfa22f884d035c09fe804b1f5402c., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243930
    at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:107)
    at org.apache.phoenix.iterate.ParallelIterators.getIterators(ParallelIterators.java:527)
    at org.apache.phoenix.iterate.MergeSortResultIterator.getIterators(MergeSortResultIterator.java:48)
    at org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:63)
    at org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:90)
    at org.apache.phoenix.iterate.MergeSortTopNResultIterator.next(MergeSortTopNResultIterator.java:87)
    at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:739)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.TimelineMetricAggregator.aggregateMetricsFromResultSet(TimelineMetricAggregator.java:104)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.TimelineMetricAggregator.aggregate(TimelineMetricAggregator.java:72)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.AbstractTimelineAggregator.doWork(AbstractTimelineAggregator.java:217)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.AbstractTimelineAggregator.runOnce(AbstractTimelineAggregator.java:94)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.AbstractTimelineAggregator.run(AbstractTimelineAggregator.java:70)

________________________________
From: Jayesh Thakrar <[email protected]>
To: Siddharth Wagle <[email protected]>; "[email protected]" <[email protected]>
Sent: Wednesday, May 6, 2015 10:07 PM
Subject: Re: Kafka broker metrics not appearing in REST API

Hi Siddharth,

Yes, I am using Ambari 2.0 with the Ambari Metrics service. The interesting thing is that I got them for some time, but not anymore. I also know that the metrics are being collected, since I can see them on the dashboard. Any pointer for troubleshooting?

And, by the way, it would be nice to have a count of messages received rather than a computed metric count/min. TSDB does a good job of giving me cumulative and rate-per-sec graphs and numbers.

Thanks in advance,
Jayesh

________________________________
From: Siddharth Wagle <[email protected]>
To: "[email protected]" <[email protected]>; Jayesh Thakrar <[email protected]>
Sent: Wednesday, May 6, 2015 10:03 PM
Subject: Re: Kafka broker metrics not appearing in REST API

Hi Jayesh,

Are you using Ambari 2.0 with the Ambari Metrics service?

BR, Sid

________________________________
From: Jayesh Thakrar <[email protected]>
Sent: Wednesday, May 06, 2015 7:53 PM
To: [email protected]
Subject: Kafka broker metrics not appearing in REST API

Hi,

I have installed 2 clusters with Ambari, running Storm and Kafka. After the install, I was able to get metrics for both Storm and Kafka via the REST API. This worked fine for a week, but for the past 2 days I have not been getting Kafka metrics. I need the metrics to push to an OpenTSDB cluster. I do get host metrics and Nimbus metrics, but not KAFKA_BROKER metrics. I did have maintenance turned on for some time, but maintenance is turned off now.
[jthakrar@dtord01hdp0101d ~]$ curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/NIMBUS?fields=metrics'
{
  "href" : "http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/NIMBUS?fields=metrics",
  "ServiceComponentInfo" : {
    "cluster_name" : "ord_flume_kafka_prod",
    "component_name" : "NIMBUS",
    "service_name" : "STORM"
  },
  "metrics" : {
    "storm" : {
      "nimbus" : {
        "freeslots" : 54.0,
        "supervisors" : 27.0,
        "topologies" : 0.0,
        "totalexecutors" : 0.0,
        "totalslots" : 54.0,
        "totaltasks" : 0.0,
        "usedslots" : 0.0
      }
    }
  }
}

[jthakrar@dtord01hdp0101d ~]$ curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/KAFKA_BROKER?fields=metrics'
{
  "href" : "http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/KAFKA_BROKER?fields=metrics",
  "ServiceComponentInfo" : {
    "cluster_name" : "ord_flume_kafka_prod",
    "component_name" : "KAFKA_BROKER",
    "service_name" : "KAFKA"
  }
}

[jthakrar@dtord01hdp0101d ~]$ curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/SUPERVISOR?fields=metrics'
{
  "href" : "http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/SUPERVISOR?fields=metrics",
  "ServiceComponentInfo" : {
    "cluster_name" : "ord_flume_kafka_prod",
    "component_name" : "SUPERVISOR",
    "service_name" : "STORM"
  }
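If the Ambari API keeps returning an empty KAFKA_BROKER section as above, one way to narrow it down (a sketch, not from this thread) is to query the Metrics Collector's own API directly; the metric name, appId, and hostname are copied from the log excerpt earlier in the thread, and 6188 is assumed to be the default collector port (check timeline.metrics.service.webapp.address in ams-site):

# sketch: assumes the default collector port 6188; metric name, appId and hostname taken from the log excerpt above
curl 'http://<metrics-collector-host>:6188/ws/v1/timeline/metrics?metricNames=kafka.network.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate&appId=kafka_broker&hostname=dtord01flm27p.dc.dotomi.net'

If this returns recent datapoints while the Ambari component query stays empty, the collector is still receiving and storing the Kafka metrics and the problem is more likely on the aggregation/serving side (consistent with the TimelineMetricAggregator errors above); if it returns nothing, the sink on the brokers or the collector's storage is the more likely culprit.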
