Hi Again,

The following appears to work around the issue, but I'm not sure of the long-term effect of running these commands, so do not run them unless you are willing to trash your cluster:

delete from topology_host_info;
delete from topology_logical_task;
delete from topology_host_task;
delete from topology_host_request;
delete from topology_hostgroup;
delete from topology_logical_request;
delete from topology_request;

I still need to test whether I can add new hosts once these tables have had their entries cleared. I have a feeling I won't be able to scale automatically, as some of the information needed to do this is held within these tables.

This happens for me every time I install a cluster using a blueprint and then scale using the API and host groups.

Cheers

On Wed, Sep 28, 2016 at 1:19 PM, cs user <[email protected]> wrote:

> Hi All,
>
> I've just had the exact same issue upgrading from 2.2.1.0 to 2.4.0.1. I'm
> using blueprints and then adding the nodes via a curl command, telling
> Ambari which host group they should be in.
>
> This time, however, my topology_request table looks fine. It just loops
> over and over again with these error messages:
>
> 28 Sep 2016 13:14:39,006 INFO [ambari-hearbeat-monitor] HostRequest:127 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
> 28 Sep 2016 13:14:39,007 INFO [ambari-hearbeat-monitor] LogicalRequest:449 - LogicalRequest.createHostRequests: created new outstanding host request ID = 6
> 28 Sep 2016 13:14:39,012 INFO [ambari-hearbeat-monitor] HostRequest:127 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
> 28 Sep 2016 13:14:39,012 INFO [ambari-hearbeat-monitor] LogicalRequest:449 - LogicalRequest.createHostRequests: created new outstanding host request ID = 8
> 28 Sep 2016 13:14:39,022 INFO [ambari-hearbeat-monitor] HostRequest:127 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
> 28 Sep 2016 13:14:39,022 INFO [ambari-hearbeat-monitor] LogicalRequest:449 - LogicalRequest.createHostRequests: created new outstanding host request ID = 2
> 28 Sep 2016 13:14:39,028 INFO [ambari-hearbeat-monitor] HostRequest:127 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
> 28 Sep 2016 13:14:39,028 INFO [ambari-hearbeat-monitor] LogicalRequest:449 - LogicalRequest.createHostRequests: created new outstanding host request ID = 5
> 28 Sep 2016 13:14:39,033 INFO [ambari-hearbeat-monitor] HostRequest:127 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
> 28 Sep 2016 13:14:39,033 INFO [ambari-hearbeat-monitor] LogicalRequest:449 - LogicalRequest.createHostRequests: created new outstanding host request ID = 7
> 28 Sep 2016 13:14:39,042 INFO [ambari-hearbeat-monitor] HostRequest:127 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
> 28 Sep 2016 13:14:39,042 INFO [ambari-hearbeat-monitor] LogicalRequest:449 - LogicalRequest.createHostRequests: created new outstanding host request ID = 4
> 28 Sep 2016 13:14:39,055 INFO [ambari-hearbeat-monitor] HostRequest:127 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
> 28 Sep 2016 13:14:39,056 INFO [ambari-hearbeat-monitor] LogicalRequest:449 - LogicalRequest.createHostRequests: created new outstanding host request ID = 1
> 28 Sep 2016 13:14:39,062 INFO [ambari-hearbeat-monitor] HostRequest:127 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
> 28 Sep 2016 13:14:39,062 INFO [ambari-hearbeat-monitor] LogicalRequest:449 - LogicalRequest.createHostRequests: created new outstanding host request ID = 3
> 28 Sep 2016 13:15:13,119 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:129 - Exception received
> java.lang.NullPointerException
> at java.lang.String.replace(String.java:2240)
> at org.apache.ambari.server.topology.HostRequest.getLogicalTasks(HostRequest.java:303)
> at org.apache.ambari.server.topology.LogicalRequest.getCommands(LogicalRequest.java:158)
> at org.apache.ambari.server.topology.LogicalRequest.getRequestStatus(LogicalRequest.java:231)
> at org.apache.ambari.server.topology.TopologyManager.isLogicalRequestFinished(TopologyManager.java:812)
> at org.apache.ambari.server.topology.TopologyManager.replayRequests(TopologyManager.java:766)
> at org.apache.ambari.server.topology.TopologyManager.ensureInitialized(TopologyManager.java:150)
> at org.apache.ambari.server.topology.TopologyManager.onHostHeartBeatLost(TopologyManager.java:485)
> at org.apache.ambari.server.state.host.HostImpl$HostHeartbeatLostTransition.transition(HostImpl.java:408)
> at org.apache.ambari.server.state.host.HostImpl$HostHeartbeatLostTransition.transition(HostImpl.java:396)
> at org.apache.ambari.server.state.fsm.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:354)
> at org.apache.ambari.server.state.fsm.StateMachineFactory.doTransition(StateMachineFactory.java:294)
> at org.apache.ambari.server.state.fsm.StateMachineFactory.access$300(StateMachineFactory.java:39)
> at org.apache.ambari.server.state.fsm.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:440)
> at org.apache.ambari.server.state.host.HostImpl.handleEvent(HostImpl.java:584)
> at org.apache.ambari.server.agent.HeartbeatMonitor.doWork(HeartbeatMonitor.java:160)
> at org.apache.ambari.server.agent.HeartbeatMonitor.run(HeartbeatMonitor.java:121)
> at java.lang.Thread.run(Thread.java:745)
>
> I've repeated this upgrade over and over in a development environment; the
> initial install and upgrade are automated. Each time I get the same issue
> once the server starts up: none of the agents can register, they just get
> error 500 returned.
>
> Am I the only one who is hitting this issue?
>
> Cheers!
>
> On Wed, Mar 9, 2016 at 7:09 PM, cs user <[email protected]> wrote:
>
>> So I was able to get past this error by removing rows 9 and 10
>> from the table below. It appears that when two hosts I deleted came back,
>> in effect totally new hosts but with the same hostname, a number
>> of duplicate rows were created in the various topology tables. I deleted the
>> duplicates from a number of these tables, but deleting the final two rows
>> below fixed it for me. I don't have a copy of how these looked, but some of
>> them contained duplicate rows with the node names I had deleted and restored
>> listed twice. Perhaps someone can shed some light on what may have caused
>> this?
>>
>> Just to clarify, I have 7 hosts, so this table should contain 8 rows: 1
>> for the cluster, the remaining for the hosts. When things were failing it
>> contained 10 rows.
>>
>> ambari=> select * from topology_request;
>>
>>  id |  action   | cluster_id |   bp_name   | cluster_properties | cluster_attributes | description
>> ----+-----------+------------+-------------+--------------------+--------------------+----------------------------------------
>>   1 | PROVISION |          2 | testcluster | {}                 | {}                 | Provision Cluster 'testcluster'
>>   2 | SCALE     |          2 | testcluster | {}                 | {}                 | Scale Cluster 'testcluster' (+1 hosts)
>>   3 | SCALE     |          2 | testcluster | {}                 | {}                 | Scale Cluster 'testcluster' (+1 hosts)
>>   4 | SCALE     |          2 | testcluster | {}                 | {}                 | Scale Cluster 'testcluster' (+1 hosts)
>>   5 | SCALE     |          2 | testcluster | {}                 | {}                 | Scale Cluster 'testcluster' (+1 hosts)
>>   6 | SCALE     |          2 | testcluster | {}                 | {}                 | Scale Cluster 'testcluster' (+1 hosts)
>>   7 | SCALE     |          2 | testcluster | {}                 | {}                 | Scale Cluster 'testcluster' (+1 hosts)
>>   8 | SCALE     |          2 | testcluster | {}                 | {}                 | Scale Cluster 'testcluster' (+1 hosts)
>>   9 | SCALE     |          2 | testcluster | {}                 | {}                 | Scale Cluster 'testcluster' (+1 hosts)
>>  10 | SCALE     |          2 | testcluster | {}                 | {}                 | Scale Cluster 'testcluster' (+1 hosts)
>> (10 rows)
>>
>>
>> On Tue, Mar 8, 2016 at 3:00 PM, Jonathan Hurley <[email protected]>
>> wrote:
>>
>>> That's very odd, especially since the upgrade doesn't touch the topology
>>> tables. Are you using MySQL by any chance? If so, can you check to make
>>> sure that your database engine is InnoDB and not MyISAM? You have an
>>> integrity violation here which doesn't seem possible unless you're using a
>>> database which doesn't support foreign key constraints.
>>>
>>> There's probably some SQL which you can run to insert an entry into the
>>> topology_logical_request table, but it's probably best to understand why
>>> this happened first.
>>>
>>> On Mar 8, 2016, at 5:55 AM, cs user <[email protected]> wrote:
>>>
>>> Hi All,
>>>
>>> I've upgraded Ambari from version 2.1.2-377 to version 2.2.1.0-161.
>>>
>>> After performing the upgrade on the server, agents, upgrading the
>>> database and starting everything up, I keep seeing the following error in
>>> the logs on the server:
>>>
>>> 08 Mar 2016 10:07:05,087 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
>>> 08 Mar 2016 10:07:05,088 INFO [qtp-ambari-agent-55] LogicalRequest:420 - LogicalRequest.createHostRequests: created new outstanding host request ID = 3
>>> 08 Mar 2016 10:07:05,120 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
>>> 08 Mar 2016 10:07:05,120 INFO [qtp-ambari-agent-55] LogicalRequest:420 - LogicalRequest.createHostRequests: created new outstanding host request ID = 5
>>> 08 Mar 2016 10:07:05,134 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
>>> 08 Mar 2016 10:07:05,134 INFO [qtp-ambari-agent-55] LogicalRequest:420 - LogicalRequest.createHostRequests: created new outstanding host request ID = 8
>>> 08 Mar 2016 10:07:05,147 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
>>> 08 Mar 2016 10:07:05,148 INFO [qtp-ambari-agent-55] LogicalRequest:420 - LogicalRequest.createHostRequests: created new outstanding host request ID = 7
>>> 08 Mar 2016 10:07:05,158 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
>>> 08 Mar 2016 10:07:05,158 INFO [qtp-ambari-agent-55] LogicalRequest:420 - LogicalRequest.createHostRequests: created new outstanding host request ID = 6
>>> 08 Mar 2016 10:07:05,170 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
>>> 08 Mar 2016 10:07:05,170 INFO [qtp-ambari-agent-55] LogicalRequest:420 - LogicalRequest.createHostRequests: created new outstanding host request ID = 2
>>> 08 Mar 2016 10:07:05,184 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
>>> 08 Mar 2016 10:07:05,185 INFO [qtp-ambari-agent-55] LogicalRequest:420 - LogicalRequest.createHostRequests: created new outstanding host request ID = 1
>>> 08 Mar 2016 10:07:05,194 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: Host Assignment Pending
>>> 08 Mar 2016 10:07:05,194 INFO [qtp-ambari-agent-55] LogicalRequest:420 - LogicalRequest.createHostRequests: created new outstanding host request ID = 4
>>> 08 Mar 2016 10:07:05,290 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: ambdevtestdc2host-group-21.node.example
>>> 08 Mar 2016 10:07:05,328 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: ambdevtestdc2host-group-51.node.example
>>> 08 Mar 2016 10:07:05,384 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: ambdevtestdc2host-group-11.node.example
>>> 08 Mar 2016 10:07:05,428 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: ambdevtestdc2host-group-41.node.example
>>> 08 Mar 2016 10:07:05,507 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: ambdevtestdc2host-group-31.node.example
>>> 08 Mar 2016 10:07:05,575 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: ambdevtestdc2host-group-53.node.example
>>> 08 Mar 2016 10:07:05,627 INFO [qtp-ambari-agent-55] HostRequest:125 - HostRequest: Successfully recovered host request for host: ambdevtestdc2host-group-52.node.example
>>> 08 Mar 2016 10:07:05,644 WARN [qtp-ambari-agent-55] ServletHandler:563 - /agent/v1/register/ambdevtestdc2host-group-51.node.example
>>> java.lang.NullPointerException
>>> at org.apache.ambari.server.topology.PersistedStateImpl.getAllRequests(PersistedStateImpl.java:157)
>>> at org.apache.ambari.server.topology.TopologyManager.ensureInitialized(TopologyManager.java:131)
>>> at org.apache.ambari.server.topology.TopologyManager.onHostRegistered(TopologyManager.java:315)
>>> at org.apache.ambari.server.state.host.HostImpl$HostRegistrationReceived.transition(HostImpl.java:301)
>>> at org.apache.ambari.server.state.host.HostImpl$HostRegistrationReceived.transition(HostImpl.java:266)
>>> at org.apache.ambari.server.state.fsm.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:354)
>>> at org.apache.ambari.server.state.fsm.StateMachineFactory.doTransition(StateMachineFactory.java:294)
>>> at org.apache.ambari.server.state.fsm.StateMachineFactory.access$300(StateMachineFactory.java:39)
>>> at org.apache.ambari.server.state.fsm.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:440)
>>> at org.apache.ambari.server.state.host.HostImpl.handleEvent(HostImpl.java:570)
>>> at org.apache.ambari.server.agent.HeartBeatHandler.handleRegistration(HeartBeatHandler.java:966)
>>> at org.apache.ambari.server.agent.rest.AgentResource.register(AgentResource.java:95)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:497)
>>> at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>>> at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
>>> at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>>> at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
>>> at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>>> at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>>> at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>>> at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>>> at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
>>> at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
>>> at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
>>> at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
>>> at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
>>> at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:540)
>>> at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:715)
>>> at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
>>> at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
>>> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1496)
>>> at org.apache.ambari.server.security.SecurityFilter.doFilter(SecurityFilter.java:67)
>>> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
>>> at org.apache.ambari.server.api.AmbariPersistFilter.doFilter(AmbariPersistFilter.java:47)
>>> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
>>> at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82)
>>> at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:294)
>>> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
>>> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
>>> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
>>> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)
>>> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
>>> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>>> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>>> at org.eclipse.jetty.server.Server.handle(Server.java:370)
>>> at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
>>> at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)
>>> at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)
>>> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)
>>> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
>>> at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
>>> at org.eclipse.jetty.io.nio.SslConnection.handle(SslConnection.java:196)
>>> at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
>>> at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
>>> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>>> at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>> This is not specific to host group ambdevtestdc2host-group-51.node.example;
>>> it is happening for all host groups.
>>>
>>> On the agents I see the following:
>>>
>>> <head>
>>> <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
>>> <title>Error 500 Server Error</title>
>>> </head>
>>> <body>
>>> <h2>HTTP ERROR: 500</h2>
>>> <p>Problem accessing
>>> /agent/v1/register/ambdevtestdc2host-group-51.node.example
>>> Reason:
>>> <pre> Server Error</pre></p>
>>> <hr /><i><small>Powered by Jetty://</small></i>
>>>
>>> Is there a workaround for this? It's just a test cluster, but it would
>>> be good to know how to work around this, as I've seen it a number of times
>>> now. Is there anything that can be modified in the database to resolve it?
>>>
>>> Thanks!
>>>
>>
>
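P.S. For anyone tempted to try the delete statements from the top of this thread, here is a minimal sketch of running them inside a single transaction, with a dry-run mode that rolls everything back. The table names and their order are copied from the statements above; everything else is an assumption for illustration. The helper names `clear_topology_tables` and `make_demo_db` are hypothetical, and the demo uses an in-memory SQLite database with one-column placeholder tables, not the real Ambari schema. Against a real Ambari database you would need the matching DB-API driver and a backup first, neither of which is shown here.

```python
import sqlite3

# Table names and their order are copied from the DELETE statements in this thread.
TOPOLOGY_TABLES = [
    "topology_host_info",
    "topology_logical_task",
    "topology_host_task",
    "topology_host_request",
    "topology_hostgroup",
    "topology_logical_request",
    "topology_request",
]


def clear_topology_tables(conn, dry_run=True):
    """Delete all rows from the topology tables in one transaction.

    Returns {table: rows_present_before_delete}. With dry_run=True the
    transaction is rolled back, so nothing is actually removed.
    """
    counts = {}
    cur = conn.cursor()
    try:
        for table in TOPOLOGY_TABLES:
            # Count first instead of relying on cursor.rowcount.
            cur.execute(f"SELECT COUNT(*) FROM {table}")
            counts[table] = cur.fetchone()[0]
            cur.execute(f"DELETE FROM {table}")
    except Exception:
        conn.rollback()  # leave the database untouched on any failure
        raise
    if dry_run:
        conn.rollback()
    else:
        conn.commit()
    return counts


def make_demo_db():
    """Hypothetical stand-in database: placeholder tables, two rows each."""
    conn = sqlite3.connect(":memory:")
    for table in TOPOLOGY_TABLES:
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY)")
        conn.executemany(f"INSERT INTO {table} (id) VALUES (?)", [(1,), (2,)])
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    print(clear_topology_tables(demo, dry_run=True))   # counts only, rolled back
    print(clear_topology_tables(demo, dry_run=False))  # rows really deleted
```

The dry run at least shows how many rows each table would lose before anything is committed. It does not answer the open question above of whether scaling with host groups still works once the tables are empty.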
