More info....
I was doing some "stress-testing" and, interestingly, the Metrics Collector 
crashed twice and I had to restart it (I don't like a file-based HBase for the 
metrics collector, but I am not very confident about configuring the system to 
point to an existing HBase cluster).
Also, after this email thread, I looked up the metrics collector logs and see 
errors like this -
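(For what it's worth, pointing AMS at an existing HBase cluster is supposed to be a matter of switching the collector to distributed mode. A rough sketch of the relevant properties, with placeholder values; the property names come from the AMS configs, but the values here are only illustrative:)

```properties
# ams-site.xml -- switch the collector from embedded (file-based) to distributed mode
timeline.metrics.service.operation.mode=distributed

# ams-hbase-site.xml -- point AMS HBase storage at HDFS and the external ZooKeeper quorum
hbase.cluster.distributed=true
hbase.rootdir=hdfs://<namenode-host>:8020/apps/ams/metrics
hbase.zookeeper.quorum=<zk-host1>,<zk-host2>,<zk-host3>
```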
...on table 'METRIC_RECORD' at region=METRIC_RECORD,,1429966316307.947cfa22f884d035c09fe804b1f5402c., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243930

13:09:37,619 INFO [phoenix-1-thread-349921] RpcRetryingCaller:129 - Call exception, tries=11, retries=35, started=835564 ms ago, cancelled=false, msg=row 'kafka.network.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate^@dtord01flm27p.dc.dotomi.net^@^@^@^AL��:�kafka_broker' on table 'METRIC_RECORD' at region=METRIC_RECORD,kafkark.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate\x00dtord01flm27p.dc.dotomi.net\x00\x00\x00\x01L\xED\xED:\xE5kafka_broker,1429966316307.d488f5e58d54c3251cb81fdfa475dd45., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243931

13:10:58,082 INFO [phoenix-1-thread-349920] RpcRetryingCaller:129 - Call exception, tries=12, retries=35, started=916027 ms ago, cancelled=false, msg=row '' on table 'METRIC_RECORD' at region=METRIC_RECORD,,1429966316307.947cfa22f884d035c09fe804b1f5402c., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243930

13:10:58,082 INFO [phoenix-1-thread-349921] RpcRetryingCaller:129 - Call exception, tries=12, retries=35, started=916027 ms ago, cancelled=false, msg=row 'kafka.network.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate^@dtord01flm27p.dc.dotomi.net^@^@^@^AL��:�kafka_broker' on table 'METRIC_RECORD' at region=METRIC_RECORD,kafkark.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate\x00dtord01flm27p.dc.dotomi.net\x00\x00\x00\x01L\xED\xED:\xE5kafka_broker,1429966316307.d488f5e58d54c3251cb81fdfa475dd45., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243931

13:10:58,112 ERROR [Thread-25] TimelineMetricAggregator:221 - Exception during aggregating metrics.
org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, exceptions:
Sat Apr 25 13:10:58 UTC 2015, null, java.net.SocketTimeoutException: callTimeout=900000, callDuration=938097: row '' on table 'METRIC_RECORD' at region=METRIC_RECORD,,1429966316307.947cfa22f884d035c09fe804b1f5402c., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243930
        at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:107)
        at org.apache.phoenix.iterate.ParallelIterators.getIterators(ParallelIterators.java:527)
        at org.apache.phoenix.iterate.MergeSortResultIterator.getIterators(MergeSortResultIterator.java:48)
        at org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:63)
        at org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:90)
        at org.apache.phoenix.iterate.MergeSortTopNResultIterator.next(MergeSortTopNResultIterator.java:87)
        at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:739)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.TimelineMetricAggregator.aggregateMetricsFromResultSet(TimelineMetricAggregator.java:104)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.TimelineMetricAggregator.aggregate(TimelineMetricAggregator.java:72)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.AbstractTimelineAggregator.doWork(AbstractTimelineAggregator.java:217)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.AbstractTimelineAggregator.runOnce(AbstractTimelineAggregator.java:94)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.AbstractTimelineAggregator.run(AbstractTimelineAggregator.java:70)
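(The failure above is a client-side scan timeout: callTimeout=900000 ms, callDuration=938097 ms. One possible mitigation, assuming the defaults are in play, is to raise the Phoenix/HBase client timeouts in the collector's ams-hbase-site. The property names below are standard HBase/Phoenix knobs; the values are only illustrative, not a tested recommendation:)

```properties
# ams-hbase-site.xml -- illustrative timeout increases for slow METRIC_RECORD scans
phoenix.query.timeoutMs=1200000
hbase.client.scanner.timeout.period=1200000
hbase.rpc.timeout=300000
```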


      From: Jayesh Thakrar <[email protected]>
 To: Siddharth Wagle <[email protected]>; "[email protected]" 
<[email protected]> 
 Sent: Wednesday, May 6, 2015 10:07 PM
 Subject: Re: Kafka broker metrics not appearing in REST API
   
Hi Siddharth,
Yes, I am using Ambari 2.0 with the Ambari Metrics service. The interesting 
thing is that I got them for some time, but not anymore. I also know that the 
metrics are being collected, since I can see them on the dashboard. Any 
pointers for troubleshooting?
And by the way, it would be nice to have a count of messages received rather 
than a computed count/min metric. TSDB does a good job of giving me cumulative 
and rate-per-sec graphs and numbers.
Thanks in advance,
Jayesh
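(In the meantime, a rough way to recover an approximate cumulative count from a per-minute rate metric is to integrate it over the sample timestamps. This is my own helper sketch, not an Ambari or TSDB API:)

```python
# Sketch: approximate a cumulative message count from a rate-per-minute series,
# since the collector exposes rates (e.g. 1MinuteRate) rather than totals.
def cumulative_from_rate(samples):
    """samples: list of (timestamp_seconds, rate_per_minute), sorted by time.
    Returns a list of (timestamp_seconds, approximate_cumulative_count)."""
    total = 0.0
    out = []
    prev_ts = None
    for ts, rate_per_min in samples:
        if prev_ts is not None:
            # integrate: rate (msgs/min) * elapsed time in minutes
            total += rate_per_min * (ts - prev_ts) / 60.0
        out.append((ts, total))
        prev_ts = ts
    return out
```

The result is only an approximation (it assumes the rate was constant between samples), but it gives a monotonically increasing series comparable to a true counter.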
 

     From: Siddharth Wagle <[email protected]>
 To: "[email protected]" <[email protected]>; Jayesh Thakrar 
<[email protected]> 
 Sent: Wednesday, May 6, 2015 10:03 PM
 Subject: Re: Kafka broker metrics not appearing in REST API
   
Hi Jayesh,
Are you using Ambari 2.0 with Ambari Metrics service?
BR,
Sid


From: Jayesh Thakrar <[email protected]>
Sent: Wednesday, May 06, 2015 7:53 PM
To: [email protected]
Subject: Kafka broker metrics not appearing in REST API

Hi,
I have installed 2 clusters with Ambari, with Storm and Kafka. After the 
install, I was able to get metrics for both Storm and Kafka via the REST API. 
This worked fine for a week, but for the past 2 days, I have not been getting 
Kafka metrics.
I need the metrics to push to an OpenTSDB cluster. I do get host metrics and 
Nimbus metrics, but not KAFKA_BROKER metrics.

I did have maintenance mode turned on for some time, but maintenance is turned 
off now.
[jthakrar@dtord01hdp0101d ~]$ curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/NIMBUS?fields=metrics'
{
  "href" : "http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/NIMBUS?fields=metrics",
  "ServiceComponentInfo" : {
    "cluster_name" : "ord_flume_kafka_prod",
    "component_name" : "NIMBUS",
    "service_name" : "STORM"
  },
  "metrics" : {
    "storm" : {
      "nimbus" : {
        "freeslots" : 54.0,
        "supervisors" : 27.0,
        "topologies" : 0.0,
        "totalexecutors" : 0.0,
        "totalslots" : 54.0,
        "totaltasks" : 0.0,
        "usedslots" : 0.0
      }
    }
  }
}
[jthakrar@dtord01hdp0101d ~]$ curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/KAFKA_BROKER?fields=metrics'
{
  "href" : "http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/KAFKA_BROKER?fields=metrics",
  "ServiceComponentInfo" : {
    "cluster_name" : "ord_flume_kafka_prod",
    "component_name" : "KAFKA_BROKER",
    "service_name" : "KAFKA"
  }
}
[jthakrar@dtord01hdp0101d ~]$ curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/SUPERVISOR?fields=metrics'
{
  "href" : "http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/SUPERVISOR?fields=metrics",
  "ServiceComponentInfo" : {
    "cluster_name" : "ord_flume_kafka_prod",
    "component_name" : "SUPERVISOR",
    "service_name" : "STORM"
  }
}
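(For the OpenTSDB push, a minimal sketch of flattening one of these component responses into OpenTSDB-style datapoints for POSTing to /api/put. The function name and the metric/tag naming are my own illustrative choices, not an Ambari or OpenTSDB convention:)

```python
import time

def ambari_to_tsdb(response, cluster, ts=None):
    """Flatten an Ambari component-metrics response (a parsed JSON dict like the
    NIMBUS one above) into OpenTSDB put-style datapoint dicts."""
    ts = ts or int(time.time())
    points = []

    def walk(node, prefix):
        for key, value in node.items():
            name = prefix + "." + key if prefix else key
            if isinstance(value, dict):
                walk(value, name)  # recurse into nested metric groups
            elif isinstance(value, (int, float)):
                points.append({
                    "metric": name,            # e.g. "storm.nimbus.freeslots"
                    "timestamp": ts,
                    "value": value,
                    "tags": {"cluster": cluster},
                })

    walk(response.get("metrics", {}), "")
    return points
```

Each resulting dict (or a list of them) can then be sent as the JSON body of an HTTP POST to OpenTSDB's /api/put endpoint.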