Hi Eric,
Any reason for using 2.2.0? There were a bunch of issues fixed in later versions with regard to AMS. I would suggest using at least 2.2.2 for performance reasons.

Cluster-level data is much smaller in volume than host-level data, so in terms of TTL, lowering the *.host.aggregator.*.ttl settings makes more sense than lowering *.cluster.aggregator.*.ttl.

I would set the memory configs (ams-env and ams-hbase-env) to Collector = 1 GB, Master = 2 GB and RS = 1 GB for AMS in embedded mode. Other recommendations are documented on the wiki.

BR,
Sid

________________________________
From: Eric Troies <[email protected]>
Sent: Thursday, September 22, 2016 12:57 AM
To: [email protected]
Subject: Re: [metrics collector] stopping by itself

Hi Siddharth,

Thank you. We're using version 2.2.0.0 with about 150 hosts. I did not find any error like the ones described on Confluence. I've set timeline.metrics.cluster.aggregator.second.ttl from 15 days to 3; we'll see if that helps.

Regards,
Olivier

On Wed, Sep 21, 2016 at 6:04 PM, Siddharth Wagle <[email protected]> wrote:

Hi Eric,

Please take a look at the troubleshooting section on the wiki: https://cwiki.apache.org/confluence/display/AMBARI/Troubleshooting

How many nodes does your cluster have? Which version of Ambari are you using?

BR,
Sid

________________________________
From: Eric Troies <[email protected]>
Sent: Wednesday, September 21, 2016 6:48 AM
To: [email protected]
Subject: [metrics collector] stopping by itself

Hi,

After a few minutes of running, my Ambari Metrics Collector stops, with these final lines in the log:

2016-09-21 13:13:58,573 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping phoenix metrics system...
2016-09-21 13:13:58,577 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: phoenix metrics system stopped.
2016-09-21 13:13:58,577 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: phoenix metrics system shutdown complete.
2016-09-21 13:13:58,578 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl: Stopping ApplicationHistory
2016-09-21 13:13:58,578 INFO org.apache.hadoop.ipc.Server: Stopping server on 60200
2016-09-21 13:13:58,581 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2016-09-21 13:13:58,581 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ApplicationHistoryServer at hostname
************************************************************/
2016-09-21 13:13:58,581 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 60200

Note that I had previously increased the heap size to 1G because of GC errors. Before the shutdown, the log contains a lot of stack traces like the following.

Thanks,
Eric

2016-09-21 13:13:58,534 WARN org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
javax.ws.rs.WebApplicationException: org.apache.phoenix.execute.CommitException: java.io.InterruptedIOException
    at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TimelineWebServices.postMetrics(TimelineWebServices.java:279)
    at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:895)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:843)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:804)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1243)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.phoenix.execute.CommitException: java.io.InterruptedIOException
    at org.apache.phoenix.execute.MutationState.commit(MutationState.java:444)
    at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:461)
    at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:458)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:458)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor.insertMetricRecords(PhoenixHBaseAccessor.java:429)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.putMetrics(HBaseTimelineMetricStore.java:323)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TimelineWebServices.postMetrics(TimelineWebServices.java:275)
    ... 46 more
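To put rough numbers on the TTL and memory advice in this thread, here is a minimal Python sketch. It assumes the ams-site TTL properties are expressed in seconds, models storage crudely as one row per source, per metric, per write interval (the real Phoenix schema differs), and uses a made-up figure of 500 metrics per host with a hypothetical 60-second interval. The heap property names are illustrative only and should be checked against your Ambari version before changing anything.

```python
# Rough sizing sketch for the AMS advice in this thread.
# ASSUMPTIONS (hypothetical, verify against your Ambari version):
#   - TTL properties in ams-site are expressed in seconds.
#   - Storage is modelled as one row per source, per metric, per write
#     interval; the real Phoenix/HBase layout is different.
#   - Heap property names for ams-env / ams-hbase-env are illustrative.

SECONDS_PER_DAY = 24 * 60 * 60


def days_to_ttl(days: int) -> int:
    """Convert a retention period in days to a TTL in seconds."""
    return days * SECONDS_PER_DAY


def estimate_rows(sources: int, metrics_per_source: int,
                  interval_s: int, ttl_s: int) -> int:
    """Rough count of rows kept alive by a TTL (simplified model)."""
    return sources * metrics_per_source * (ttl_s // interval_s)


# Embedded-mode heap sizes recommended in the thread, in MB.
# Property names are assumptions for illustration only.
HEAP_MB = {
    "ams-env/metrics_collector_heapsize": 1024,         # Collector = 1 GB
    "ams-hbase-env/hbase_master_heapsize": 2048,        # Master = 2 GB
    "ams-hbase-env/hbase_regionserver_heapsize": 1024,  # RS = 1 GB
}


def total_heap_mb(heaps: dict) -> int:
    """Upper bound on heap, assuming each setting backs a separate JVM."""
    return sum(heaps.values())


if __name__ == "__main__":
    ttl = days_to_ttl(3)
    # 150 hosts comes from the thread; 500 metrics per host is made up.
    host_rows = estimate_rows(150, 500, 60, ttl)
    # Cluster-level aggregates behave like a single source.
    cluster_rows = estimate_rows(1, 500, 60, ttl)
    print(f"3-day TTL in seconds: {ttl}")
    print(f"host-level rows:      {host_rows}")
    print(f"cluster-level rows:   {cluster_rows}")
    print(f"heap upper bound MB:  {total_heap_mb(HEAP_MB)}")
```

Under this simplified model, host-level data for 150 hosts is 150 times larger than the cluster-level aggregate at the same retention, which is consistent with the suggestion to lower the host aggregator TTLs rather than the cluster ones. The heap figure is only an upper bound: in embedded mode the settings may not each map to a separate JVM.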
