Hi Harsha and Sid,
So I was able to increase the memory for the metrics collector and for the
HBase master from 512 MB to 1500 MB. The metrics collector then restarted
without any issues (I think that may have fixed the instability I had
earlier). Immediately after the restart, I was able to get Kafka metrics a few
times - and then they disappeared again.
Can you point me to the code path/logs involved in satisfying the REST API
for components? I looked at the logs in
/var/log/ambari-server/ambari-server.log, but there was nothing that helps
with this issue.
Greatly appreciate your help,
Thanks,
Jayesh
From: Jayesh Thakrar <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Wednesday, May 6, 2015 11:49 PM
Subject: Re: Kafka broker metrics not appearing in REST API
Thanks Sid. I had trouble restarting the metrics collector. Somehow it was
complaining about not being able to connect to the HBase embedded ZooKeeper on
port 61181. Anyway, after a few tries I was able to bring it up. I had brought
down the whole metrics collection system, and just completed a rolling restart
of the metrics collector.
After all the above I am back to the same reproducible situation of no Kafka
metrics.
However, I examined the servers on the cluster, and apparently there is not
sufficient free RAM - or should I say a lot of it is used by the filesystem
buffer/cache (15 GB, or about 30%). This is not surprising, as both Flume and
Kafka are heavy on (sequential) I/O.
I will need to "resolve" this situation before I can increase memory for HBase.
But all the same, thanks for the pointers - this has given me enough things to
look into.
[root@dtord01flm01p ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         48251      47657        593         32          4      32417
-/+ buffers/cache:      15235      33015
Swap:         8191          0       8191
From: Siddharth Wagle <[email protected]>
To: "[email protected]" <[email protected]>;
"[email protected]" <[email protected]>
Sent: Wednesday, May 6, 2015 11:31 PM
Subject: Re: Kafka broker metrics not appearing in REST API
Is this writing to its own disk, or can it be?
Can you look at the HBase regionserver web UI? In your browser, key in
http://metric-collector-host:61310.
This is the HBase master info port. Click on the link to the regionserver, and
look at the queues and the block cache stats.
Queues should be empty the majority of the time, let's say when you refresh the
page a few times. If not, this points to a disk I/O bottleneck.
The cache hit ratio I have observed is around 70%; the higher the better.
Do you have available physical memory on that box? If yes, the hbase master
heap size in ams-hbase-env and the metric collector heap size in ams-env should
be bumped up. Default is 512m in both cases.
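[Editor's note: for reference, a heap bump like the one described might look roughly as follows. The property names and units below are my assumption for the AMS config types in Ambari 2.0 (ams-hbase-env and ams-env); verify them in the Ambari UI before applying.]

```
# ams-hbase-env -- embedded HBase master heap (assumed property name)
hbase_master_heapsize = 1024

# ams-env -- Metrics Collector heap (assumed property name)
metrics_collector_heapsize = 1024
```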
Sid
Sent by Outlook for Android
On Wed, May 6, 2015 at 9:16 PM -0700, "Jayesh Thakrar" <[email protected]>
wrote:
Here's the embedded HBase data dir size.
[jthakrar@dtord01flm03p data]$ pwd
/localpart0/ambari-metrics-collector/hbase/data
[jthakrar@dtord01flm03p data]$ du -xhs *
14G     default
72K     hbase
From: Siddharth Wagle <[email protected]>
To: "[email protected]" <[email protected]>; "[email protected]"
<[email protected]>
Sent: Wednesday, May 6, 2015 11:00 PM
Subject: Re: Kafka broker metrics not appearing in REST API
We have tested the embedded mode to work with up to a 400-node cluster and
multiple services running on it.
You can change the hbase.rootdir in ams-hbase-site and possibly write to a
partition with a separate disk mount, and then copy over the data from the
existing location. It would be good to know the size of the data written to
hbase.rootdir, to get an idea of what kind of write volume we are looking at.
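[Editor's note: a rough sketch of the relocation described above - stop the collector first, copy, then point hbase.rootdir in ams-hbase-site at the new path. The real paths are cluster-specific, so the demo below uses temporary directories as stand-ins and is safe to run as-is.]

```shell
# Stand-ins for the real directories, e.g.
#   OLD_DIR=/localpart0/ambari-metrics-collector/hbase
#   NEW_DIR=/fast-disk/ambari-metrics-collector/hbase   (separate mount)
OLD_DIR=$(mktemp -d)            # current hbase.rootdir contents
NEW_DIR=$(mktemp -d)/hbase      # new location

echo demo > "$OLD_DIR/sample-hfile"   # fake data for the demo

du -sh "$OLD_DIR"               # how much data will move (write-volume hint)
mkdir -p "$NEW_DIR"
cp -a "$OLD_DIR/." "$NEW_DIR/"  # copy, preserving permissions and times
ls "$NEW_DIR"
```

After the copy, update hbase.rootdir and restart the Metrics Collector.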
Sid
Sent by Outlook for Android
On Wed, May 6, 2015 at 8:52 PM -0700, "Jayesh Thakrar" <[email protected]>
wrote:
We have a 30-node cluster. Unfortunately, this is also our production cluster,
and there's no HDFS, as it is a dedicated Flume cluster. We have installed
Ambari + Storm + Kafka (HDP) on a cluster on which we have production data
being flumed. The Flume data is being sent to an HDFS cluster which is a little
overloaded, so we want to send Flume data to Kafka and then "throttle" the data
being loaded into the HDFS cluster.
But you have given me an idea - maybe I can set up a new HBase file location so
that I can rule out HBase data corruption, if any.
It will take me some time to do that, will let you know once I have tried it
out.
Thanks,
Jayesh
From: Siddharth Wagle <[email protected]>
To: "[email protected]" <[email protected]>; Jayesh Thakrar
<[email protected]>
Sent: Wednesday, May 6, 2015 10:42 PM
Subject: Re: Kafka broker metrics not appearing in REST API
How big is your cluster in terms of the number of nodes? You can tune settings
for HBase based on cluster size.
Following are the instructions for writing metrics to HDFS instead of the local FS.

ams-site:
  timeline.metrics.service.operation.mode = distributed

ams-hbase-site:
  hbase.rootdir = hdfs://<namenode-host>:8020/amshbase
  hbase.cluster.distributed = true
-Sid
From: Jayesh Thakrar <[email protected]>
Sent: Wednesday, May 06, 2015 8:30 PM
To: [email protected]; Siddharth Wagle; Jayesh Thakrar
Subject: Re: Kafka broker metrics not appearing in REST API
More info....
I was doing some "stress-testing" and, interestingly, the Metrics Collector
crashed twice and I had to restart it (I don't like a file-based HBase for the
metrics collector, but I'm not very confident about configuring the system to
point to an existing HBase cluster).
Also, after this email thread, I looked at the metrics collector logs and see
errors like this:
METRIC_RECORD' at region=METRIC_RECORD,,1429966316307.947cfa22f884d035c09fe804b1f5402c., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243930
13:09:37,619 INFO [phoenix-1-thread-349921] RpcRetryingCaller:129 - Call exception, tries=11, retries=35, started=835564 ms ago, cancelled=false, msg=row 'kafka.network.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate^@dtord01flm27p.dc.dotomi.net^@^@^@^AL��:�kafka_broker' on table 'METRIC_RECORD' at region=METRIC_RECORD,kafkark.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate\x00dtord01flm27p.dc.dotomi.net\x00\x00\x00\x01L\xED\xED:\xE5kafka_broker,1429966316307.d488f5e58d54c3251cb81fdfa475dd45., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243931
13:10:58,082 INFO [phoenix-1-thread-349920] RpcRetryingCaller:129 - Call exception, tries=12, retries=35, started=916027 ms ago, cancelled=false, msg=row '' on table 'METRIC_RECORD' at region=METRIC_RECORD,,1429966316307.947cfa22f884d035c09fe804b1f5402c., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243930
13:10:58,082 INFO [phoenix-1-thread-349921] RpcRetryingCaller:129 - Call exception, tries=12, retries=35, started=916027 ms ago, cancelled=false, msg=row 'kafka.network.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate^@dtord01flm27p.dc.dotomi.net^@^@^@^AL��:�kafka_broker' on table 'METRIC_RECORD' at region=METRIC_RECORD,kafkark.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate\x00dtord01flm27p.dc.dotomi.net\x00\x00\x00\x01L\xED\xED:\xE5kafka_broker,1429966316307.d488f5e58d54c3251cb81fdfa475dd45., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243931
13:10:58,112 ERROR [Thread-25] TimelineMetricAggregator:221 - Exception during aggregating metrics.
org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, exceptions:
Sat Apr 25 13:10:58 UTC 2015, null, java.net.SocketTimeoutException: callTimeout=900000, callDuration=938097: row '' on table 'METRIC_RECORD' at region=METRIC_RECORD,,1429966316307.947cfa22f884d035c09fe804b1f5402c., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243930
    at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:107)
    at org.apache.phoenix.iterate.ParallelIterators.getIterators(ParallelIterators.java:527)
    at org.apache.phoenix.iterate.MergeSortResultIterator.getIterators(MergeSortResultIterator.java:48)
    at org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:63)
    at org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:90)
    at org.apache.phoenix.iterate.MergeSortTopNResultIterator.next(MergeSortTopNResultIterator.java:87)
    at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:739)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.TimelineMetricAggregator.aggregateMetricsFromResultSet(TimelineMetricAggregator.java:104)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.TimelineMetricAggregator.aggregate(TimelineMetricAggregator.java:72)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.AbstractTimelineAggregator.doWork(AbstractTimelineAggregator.java:217)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.AbstractTimelineAggregator.runOnce(AbstractTimelineAggregator.java:94)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.AbstractTimelineAggregator.run(AbstractTimelineAggregator.java:70)
From: Jayesh Thakrar <[email protected]>
To: Siddharth Wagle <[email protected]>; "[email protected]"
<[email protected]>
Sent: Wednesday, May 6, 2015 10:07 PM
Subject: Re: Kafka broker metrics not appearing in REST API
Hi Siddharth,
Yes, I am using Ambari 2.0 with the Ambari Metrics service. The interesting
thing is that I got them for some time, and not anymore. And I also know that
the metrics are being collected, since I can see them on the dashboard. Any
pointers for troubleshooting?
And by the way, it would be nice to have a count of messages received and not a
computed count/min metric. TSDB does a good job of giving me cumulative and
rate-per-sec graphs and numbers.
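[Editor's note: on the counts-versus-rates point, a per-second rate is just the delta between two cumulative samples divided by the interval. A minimal sketch with made-up sample values (cumulative count, epoch seconds):]

```shell
# Two cumulative message counts with epoch-second timestamps:
# 1000 msgs at t=1430000000, 1600 msgs at t=1430000060
# -> rate = (1600 - 1000) / 60 = 10 msgs/sec
awk 'NR==1 { c0=$1; t0=$2 }
     NR==2 { print ($1-c0) / ($2-t0) }' <<'EOF'
1000 1430000000
1600 1430000060
EOF
```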
Thanks in advance,
Jayesh
From: Siddharth Wagle <[email protected]>
To: "[email protected]" <[email protected]>; Jayesh Thakrar
<[email protected]>
Sent: Wednesday, May 6, 2015 10:03 PM
Subject: Re: Kafka broker metrics not appearing in REST API
Hi Jayesh,
Are you using Ambari 2.0 with Ambari Metrics service?
BR,
Sid
From: Jayesh Thakrar <[email protected]>
Sent: Wednesday, May 06, 2015 7:53 PM
To: [email protected]
Subject: Kafka broker metrics not appearing in REST API
Hi,
I have installed 2 clusters with Ambari and Storm and Kafka. After the install,
I was able to get metrics for both Storm and Kafka via the REST API. This
worked fine for a week, but for the past 2 days I have not been getting Kafka
metrics.
I need the metrics to push to an OpenTSDB cluster. I do get host metrics and
Nimbus metrics, but not KAFKA_BROKER metrics.
I did have maintenance turned on for some time, but maintenance is turned off
now.
[jthakrar@dtord01hdp0101d ~]$ curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/NIMBUS?fields=metrics'
{
  "href" : "http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/NIMBUS?fields=metrics",
  "ServiceComponentInfo" : {
    "cluster_name" : "ord_flume_kafka_prod",
    "component_name" : "NIMBUS",
    "service_name" : "STORM"
  },
  "metrics" : {
    "storm" : {
      "nimbus" : {
        "freeslots" : 54.0,
        "supervisors" : 27.0,
        "topologies" : 0.0,
        "totalexecutors" : 0.0,
        "totalslots" : 54.0,
        "totaltasks" : 0.0,
        "usedslots" : 0.0
      }
    }
  }
}
[jthakrar@dtord01hdp0101d ~]$ curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/KAFKA_BROKER?fields=metrics'
{
  "href" : "http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/KAFKA_BROKER?fields=metrics",
  "ServiceComponentInfo" : {
    "cluster_name" : "ord_flume_kafka_prod",
    "component_name" : "KAFKA_BROKER",
    "service_name" : "KAFKA"
  }
}
[jthakrar@dtord01hdp0101d ~]$ curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/SUPERVISOR?fields=metrics'
{
  "href" : "http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/SUPERVISOR?fields=metrics",
  "ServiceComponentInfo" : {
    "cluster_name" : "ord_flume_kafka_prod",
    "component_name" : "SUPERVISOR",
    "service_name" : "STORM"
  }