Hi, I see it start the aggregation cycle, followed by a couple of warnings before the FATAL error.
2016-10-10 22:46:30,096 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Start aggregation cycle @ Mon Oct 10 22:46:30 EDT 2016, startTime = Mon Oct 10 22:40:30 EDT 2016, endTime = Mon Oct 10 22:45:30 EDT 2016
2016-10-10 22:47:31,618 WARN org.apache.hadoop.hbase.client.ScannerCallable: Ignore, probably already closed
java.io.IOException: Call to nvperf59.rtp.raleigh.ibm.com/9.42.125.150:31645 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1357, waitTime=60002, operationTimeout=60000 expired.
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1232)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1200)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32651)
    at org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:355)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:195)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:142)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
    at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:258)
    at org.apache.hadoop.hbase.client.ClientScanner.possiblyNextScanner(ClientScanner.java:241)
    at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:532)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
    at org.apache.phoenix.iterate.ScanningResultIterator.next(ScanningResultIterator.java:50)
    at org.apache.phoenix.iterate.TableResultIterator.next(TableResultIterator.java:104)
    at org.apache.phoenix.iterate.ChunkedResultIterator$SingleChunkResultIterator.next(ChunkedResultIterator.java:149)
    at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(SpoolingResultIterator.java:107)
    at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(SpoolingResultIterator.java:74)
    at org.apache.phoenix.iterate.SpoolingResultIterator$SpoolingResultIteratorFactory.newIterator(SpoolingResultIterator.java:68)
    at org.apache.phoenix.iterate.ChunkedResultIterator.<init>(ChunkedResultIterator.java:92)
    at org.apache.phoenix.iterate.ChunkedResultIterator$ChunkedResultIteratorFactory.newIterator(ChunkedResultIterator.java:72)
    at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:94)
    at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:85)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:172)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1357, waitTime=60002, operationTimeout=60000 expired.
    at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1174)
    ... 28 more

Sometimes the aggregation does succeed, but when it fails, it hits the FATAL error. Looks like an HBase issue?
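For anyone hitting the same trace: the scan expired against operationTimeout=60000, i.e. a 60-second HBase client RPC timeout. A minimal sketch of the timeout settings one might raise, assuming AMS's embedded HBase honors the standard HBase/Phoenix client properties via ams-hbase-site.xml (the 300000 ms values are illustrative, not verified AMS defaults):

    <!-- ams-hbase-site.xml: assumed config file for AMS's embedded HBase;
         values are illustrative -->
    <property>
      <name>hbase.rpc.timeout</name>
      <value>300000</value>   <!-- ms; the 60000 ms timeout is what expired above -->
    </property>
    <property>
      <name>hbase.client.scanner.timeout.period</name>
      <value>300000</value>   <!-- ms a scanner may take between next() calls -->
    </property>
    <property>
      <name>phoenix.query.timeoutMs</name>
      <value>300000</value>   <!-- ms; overall Phoenix query timeout (AMS reads via Phoenix) -->
    </property>

Raising timeouts only buys headroom; if the node is resource-starved, the scans may still fall behind, which matches the advice quoted below.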
Regards,
Chin Wei

On Fri, Oct 7, 2016 at 7:20 PM, Dmitry Sen <d...@hortonworks.com> wrote:
> Hi,
>
> Probably there are not enough resources on this single node to run all
> deployed services.
>
> If it fails right after start, try increasing
> timeline.metrics.service.watcher.initial.delay
>
> If AMS works fine but just stops, you can set
> timeline.metrics.service.watcher.disabled to true as a workaround.
>
> ------------------------------
> From: Chin Wei Low <lowchin...@gmail.com>
> Sent: Friday, October 07, 2016 5:31 AM
> To: user@ambari.apache.org
> Subject: Ambari Metric Collector keeps stopping with FATAL error
>
> Hi,
>
> I am running Ambari 2.2 and the Ambari Metrics Collector keeps stopping
> with the following FATAL error. This is a single-node deployment with
> embedded HBase.
>
> FATAL org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.TimelineMetricStoreWatcher: Error getting metrics from TimelineMetricStore. Shutting down by TimelineMetricStoreWatcher.
>
> I have tried setting timeline.metrics.service.watcher.timeout to 90, but
> it does not help.
>
> Any help would be appreciated.
>
> Regards,
> Chin Wei
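For reference, a hedged sketch of how the watcher settings discussed in this thread might look in ams-site.xml (the property names are taken from the thread itself; the values, and my reading of the units as seconds, are assumptions rather than verified defaults):

    <!-- ams-site.xml: assumed config file for the Metrics Collector;
         values are illustrative -->
    <property>
      <name>timeline.metrics.service.watcher.disabled</name>
      <value>true</value>   <!-- workaround: stop the watcher from shutting the collector down -->
    </property>
    <property>
      <name>timeline.metrics.service.watcher.initial.delay</name>
      <value>1200</value>   <!-- delay (assumed seconds) before the watcher's first self-test -->
    </property>
    <property>
      <name>timeline.metrics.service.watcher.timeout</name>
      <value>90</value>     <!-- timeout (assumed seconds) for the watcher's test query; the value tried above -->
    </property>

Note that disabling the watcher hides the symptom (the forced shutdown) rather than the slow scans themselves, so it is best paired with addressing the timeout or resource problem.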