Is there any other version of Jackson on your classpath on the cluster machines? What does running ‘hadoop classpath’ show? Are there any other Jackson inclusions in there?
Bikas From: Rajat Jain [mailto:[email protected]] Sent: Tuesday, October 06, 2015 12:18 AM To: Jonathan Eagles <[email protected]> Cc: [email protected]; Gopal Vijayaraghavan <[email protected]> Subject: Re: Flaky queries with 'TezSession has already shut down' Folks, any idea on the failure? On 5 October 2015 at 11:46, Rajat Jain <[email protected]<mailto:[email protected]>> wrote: YARN ships with these jars: /usr/lib/hadoop/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/jackson-xc-1.9.13.jar /usr/lib/hadoop/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/jackson-jaxrs-1.9.13.jar /usr/lib/hadoop/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/jackson-core-asl-1.9.13.jar /usr/lib/hadoop/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/jackson-mapper-asl-1.9.13.jar /usr/lib/hadoop/share/hadoop/yarn/lib/jackson-annotations-2.2.3.jar /usr/lib/hadoop/share/hadoop/yarn/lib/jackson-databind-2.2.3.jar /usr/lib/hadoop/share/hadoop/yarn/lib/jackson-core-2.2.3.jar /usr/lib/hadoop/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/jackson-xc-1.9.13.jar /usr/lib/hadoop/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/jackson-jaxrs-1.9.13.jar /usr/lib/hadoop/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/jackson-core-asl-1.9.13.jar /usr/lib/hadoop/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/jackson-mapper-asl-1.9.13.jar /usr/lib/hadoop/share/hadoop/common/lib/jackson-xc-1.9.13.jar /usr/lib/hadoop/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar /usr/lib/hadoop/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar /usr/lib/hadoop/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar Tez ships with these jars: /usr/lib/tez/lib/jackson-xc-1.9.13.jar /usr/lib/tez/lib/jackson-annotations-2.2.3.jar /usr/lib/tez/lib/jackson-jaxrs-1.9.13.jar /usr/lib/tez/lib/jackson-core-asl-1.9.13.jar /usr/lib/tez/lib/jackson-databind-2.2.3.jar /usr/lib/tez/lib/jackson-core-2.2.3.jar /usr/lib/tez/lib/jackson-mapper-asl-1.9.13.jar On 5 October 2015 at 11:21, Jonathan Eagles <[email protected]<mailto:[email protected]>> wrote: What's happening here is that the Tez AM has a runtime dependency on jackson libraries due to ATSHistoryLoggingService (due to TimelineClientImpl in YARN). It looks like your application is shipping a different version of jackson that YARN was expecting. Can you check what two versions of jackson library you are running with? Jon On Mon, Oct 5, 2015 at 12:30 AM, Rajat Jain <[email protected]<mailto:[email protected]>> wrote: Here is the AM log. My setup doesn't use Ambari. Can you attach the /etc/security/limits.d/yarn.conf file? 2015-10-05 04:43:36,769 INFO [main] Configuration.deprecation: fs.default.name<http://fs.default.name> is deprecated. Instead, use fs.defaultFS 2015-10-05 04:43:36,972 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2015-10-05 04:43:37,077 INFO [main] app.DAGAppMaster: Using Standard Socket Factory for communication 2015-10-05 04:43:37,269 INFO [main] app.DAGAppMaster: Created DAGAppMaster for application appattempt_1444019975627_0001_000001, versionInfo=[ component=tez-dag, version=0.7.0, revision=e1c833e8be209145149cc6d3e1a01f8fa6e9b85f, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2015-09-29T13:26:49Z ] 2015-10-05 04:43:37,292 INFO [main] app.DAGAppMaster: Comparing client version with AM version, clientVersion=0.7.0, AMVersion=0.7.0 2015-10-05 04:43:37,947 WARN [main] shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded. 2015-10-05 04:43:38,086 INFO [main] app.DAGAppMaster: Adding session token to jobTokenSecretManager for application 2015-10-05 04:43:38,090 INFO [main] common.AsyncDispatcher: Registering class org.apache.tez.dag.app.rm.container.AMContainerEventType for class org.apache.tez.dag.app.rm.container.AMContainerMap 2015-10-05 04:43:38,092 INFO [main] common.AsyncDispatcher: Registering class org.apache.tez.dag.app.rm.node.AMNodeEventType for class org.apache.tez.dag.app.rm.node.AMNodeTracker 2015-10-05 04:43:38,093 INFO [main] common.AsyncDispatcher: Registering class org.apache.tez.dag.app.dag.event.DAGAppMasterEventType for class org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler 2015-10-05 04:43:38,094 INFO [main] common.AsyncDispatcher: Registering class org.apache.tez.dag.app.dag.event.DAGEventType for class org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher 2015-10-05 04:43:38,094 INFO [main] common.AsyncDispatcher: Registering class org.apache.tez.dag.app.dag.event.VertexEventType for class org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher 2015-10-05 04:43:38,095 INFO [main] common.AsyncDispatcher: Registering class org.apache.tez.dag.app.dag.event.TaskEventType for class org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher 2015-10-05 04:43:38,096 INFO [main] common.AsyncDispatcher: Registering class org.apache.tez.dag.app.dag.event.TaskAttemptEventType for class org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher 2015-10-05 04:43:38,097 INFO [main] common.AsyncDispatcher: Registering class org.apache.tez.dag.app.dag.event.SpeculatorEventType for independent dispatch using: class org.apache.tez.dag.app.DAGAppMaster$SpeculatorEventHandler 2015-10-05 04:43:38,097 INFO [main] common.AsyncDispatcher: Registering class org.apache.tez.dag.app.dag.event.SpeculatorEventType for class org.apache.tez.dag.app.DAGAppMaster$SpeculatorEventHandler 2015-10-05 04:43:38,111 INFO [main] common.AsyncDispatcher: Registering class org.apache.tez.dag.app.rm.AMSchedulerEventType for class org.apache.tez.dag.app.rm.TaskSchedulerEventHandler 2015-10-05 04:43:38,113 INFO [main] common.AsyncDispatcher: Registering class org.apache.tez.dag.app.rm.NMCommunicatorEventType for class org.apache.tez.dag.app.launcher.ContainerLauncherImpl 2015-10-05 04:43:38,215 INFO [main] node.AMNodeTracker: blacklistDisablePercent is 33, blacklistingEnabled: true, maxTaskFailuresPerNode: 10 2015-10-05 04:43:38,217 INFO [main] web.WebUIService: Tez UI History URL: http://ec2-54-158-112-249.compute-1.amazonaws.com:10500/#/tez-app/application_1444019975627_0001<http://ec2-54-158-112-249.compute-1.amazonaws.com:10500/#/tez-app/application_1444019975627_00012015-10-05> 2015-10-05<http://ec2-54-158-112-249.compute-1.amazonaws.com:10500/#/tez-app/application_1444019975627_00012015-10-05> 04:43:38,219 INFO [main] launcher.ContainerLauncherImpl: Upper limit on the thread pool size is 500 2015-10-05 04:43:38,219 INFO [main] history.HistoryEventHandler: Initializing HistoryEventHandler 2015-10-05 04:43:38,224 INFO [main] ats.ATSHistoryLoggingService: Initializing ATSService 2015-10-05 04:43:38,806 INFO [main] impl.TimelineClientImpl: Timeline service address: http://ec2-54-158-112-249.compute-1.amazonaws.com:8188/ws/v1/timeline/<http://ec2-54-158-112-249.compute-1.amazonaws.com:8188/ws/v1/timeline/2015-10-05> 2015-10-05<http://ec2-54-158-112-249.compute-1.amazonaws.com:8188/ws/v1/timeline/2015-10-05> 04:43:38,806 INFO [main] ats.ATSHistoryLoggingService: Using org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to manage Timeline ACLs 2015-10-05 04:43:38,971 INFO [main] impl.TimelineClientImpl: Timeline service address: http://ec2-54-158-112-249.compute-1.amazonaws.com:8188/ws/v1/timeline/<http://ec2-54-158-112-249.compute-1.amazonaws.com:8188/ws/v1/timeline/2015-10-05> 2015-10-05<http://ec2-54-158-112-249.compute-1.amazonaws.com:8188/ws/v1/timeline/2015-10-05> 04:43:38,971 INFO [main] recovery.RecoveryService: Initializing RecoveryService 2015-10-05 04:43:38,974 INFO [main] history.HistoryEventHandler: [HISTORY][DAG:N/A][Event:APP_LAUNCHED]: applicationId=application_1444019975627_0001, appSubmitTime=1444020210133, launchTime=1444020217266 2015-10-05 04:43:38,975 INFO [main] history.HistoryEventHandler: [HISTORY][DAG:N/A][Event:AM_LAUNCHED]: appAttemptId=appattempt_1444019975627_0001_000001, appSubmitTime=1444020210133, launchTime=1444020217266 2015-10-05 04:43:38,982 INFO [ServiceThread:org.apache.tez.dag.history.HistoryEventHandler] history.HistoryEventHandler: Starting HistoryEventHandler 2015-10-05 04:43:38,982 INFO [ServiceThread:org.apache.tez.dag.history.HistoryEventHandler] ats.ATSHistoryLoggingService: Starting ATSService 2015-10-05 04:43:38,985 INFO [ServiceThread:org.apache.tez.dag.app.launcher.ContainerLauncherImpl] impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0 2015-10-05 04:43:38,990 INFO [ServiceThread:DAGClientRPCServer] ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue 2015-10-05 04:43:38,990 INFO [ServiceThread:org.apache.tez.dag.app.TaskAttemptListenerImpTezDag] ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue 2015-10-05 04:43:38,997 INFO [ServiceThread:org.apache.tez.dag.history.HistoryEventHandler] recovery.RecoveryService: Starting RecoveryService 2015-10-05 04:43:39,001 INFO [Socket Reader #1 for port 47120] ipc.Server: Starting Socket Reader #1 for port 47120 2015-10-05 04:43:39,001 INFO [Socket Reader #1 for port 47029] ipc.Server: Starting Socket Reader #1 for port 47029 2015-10-05 04:43:39,027 INFO [IPC Server Responder] ipc.Server: IPC Server Responder: starting 2015-10-05 04:43:39,033 INFO [IPC Server listener on 47029] ipc.Server: IPC Server listener on 47029: starting 2015-10-05 04:43:39,035 INFO [IPC Server listener on 47120] ipc.Server: IPC Server listener on 47120: starting 2015-10-05 04:43:39,035 INFO [IPC Server Responder] ipc.Server: IPC Server Responder: starting 2015-10-05 04:43:39,035 INFO [ServiceThread:DAGClientRPCServer] client.DAGClientServer: Instantiated DAGClientRPCServer at ip-10-149-127-154.ec2.internal/10.149.127.154:47029<http://10.149.127.154:47029> 2015-10-05 04:43:39,070 INFO [ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 2015-10-05 04:43:39,074 INFO [ServiceThread:org.apache.tez.dag.app.web.WebUIService] http.HttpRequestLog: Http request log for http.requests. is not defined 2015-10-05 04:43:39,083 INFO [ServiceThread:org.apache.tez.dag.app.web.WebUIService] http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter) 2015-10-05 04:43:39,087 INFO [ServiceThread:org.apache.tez.dag.app.web.WebUIService] http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context 2015-10-05 04:43:39,087 INFO [ServiceThread:org.apache.tez.dag.app.web.WebUIService] http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context static 2015-10-05 04:43:39,090 INFO [ServiceThread:org.apache.tez.dag.app.web.WebUIService] http.HttpServer2: adding path spec: /* 2015-10-05 04:43:39,097 INFO [ServiceThread:org.apache.tez.dag.app.web.WebUIService] http.HttpServer2: Jetty bound to port 34881 2015-10-05 04:43:39,098 INFO [ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log: jetty-6.1.26 2015-10-05 04:43:39,127 INFO [ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log: Extract jar:file:/media/ephemeral1/yarn/local/filecache/20/hadoop-yarn-common-2.6.0.jar!/webapps/ to /tmp/Jetty_0_0_0_0_34881_webapps____.97xzaz/webapp 2015-10-05 04:43:39,288 INFO [ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log: NO JSP Support for /, did not find org.apache.jasper.servlet.JspServlet 2015-10-05 04:43:39,298 FATAL [HistoryEventHandlingThread] yarn.YarnUncaughtExceptionHandler: Thread Thread[HistoryEventHandlingThread,5,main] threw an Error. Shutting down now... java.lang.NoSuchMethodError: org.codehaus.jackson.map.ObjectMapper.setSerializationInclusion(Lorg/codehaus/jackson/map/annotate/JsonSerialize$Inclusion;)Lorg/codehaus/jackson/map/ObjectMapper; at org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.configObjectMapper(YarnJacksonJaxbJsonProvider.java:59) at org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.locateMapper(YarnJacksonJaxbJsonProvider.java:52) at org.codehaus.jackson.jaxrs.JacksonJsonProvider.writeTo(JacksonJsonProvider.java:501) at com.sun.jersey.api.client.RequestWriter.writeRequestEntity(RequestWriter.java:300) at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:204) at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:226) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:162) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:237) at com.sun.jersey.api.client.Client.handle(Client.java:648) at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:472) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:321) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301) at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:342) at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:52) at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:186) at java.lang.Thread.run(Thread.java:745) 2015-10-05 04:43:39,303 INFO [HistoryEventHandlingThread] util.ExitUtil: Exiting with status -1 2015-10-05 04:43:39,307 INFO [Thread-2] app.DAGAppMaster: DAGAppMasterShutdownHook invoked 2015-10-05 04:43:39,307 INFO [Thread-2] app.DAGAppMaster: DAGAppMaster received a signal. Signaling TaskScheduler 2015-10-05 04:43:39,307 INFO [Thread-2] rm.TaskSchedulerEventHandler: TaskScheduler notified that iSignalled was : true 2015-10-05 04:43:39,397 INFO [ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log: Started [email protected]:34881<http://[email protected]:34881> 2015-10-05 04:43:39,397 INFO [ServiceThread:org.apache.tez.dag.app.web.WebUIService] webapp.WebApps: Web app / started at 34881 2015-10-05 04:43:39,608 INFO [ServiceThread:org.apache.tez.dag.app.web.WebUIService] webapp.WebApps: Registered webapp guice modules 2015-10-05 04:43:39,610 INFO [ServiceThread:org.apache.tez.dag.app.web.WebUIService] web.WebUIService: Instantiated WebUIService at http://ip-10-149-127-154.ec2.internal:34881/ui/<http://ip-10-149-127-154.ec2.internal:34881/ui/2015-10-05> 2015-10-05<http://ip-10-149-127-154.ec2.internal:34881/ui/2015-10-05> 04:43:39,645 INFO [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerEventHandler] rm.YarnTaskSchedulerService: TaskScheduler initialized with configuration: maxRMHeartbeatInterval: 1000, containerReuseEnabled: true, reuseRackLocal: true, reuseNonLocal: false, localitySchedulingDelay: 250, preemptionPercentage: 10, numHeartbeatsBetweenPreemptions: 3, idleContainerMinTimeout: 5000, idleContainerMaxTimeout: 10000, sessionMinHeldContainers: 0 2015-10-05 04:43:39,719 INFO [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerEventHandler] client.RMProxy: Connecting to ResourceManager at ec2-54-158-112-249.compute-1.amazonaws.com/10.157.14.248:8030<http://ec2-54-158-112-249.compute-1.amazonaws.com/10.157.14.248:80302015-10-05> 2015-10-05<http://ec2-54-158-112-249.compute-1.amazonaws.com/10.157.14.248:80302015-10-05> 04:43:39,812 INFO [main] history.HistoryEventHandler: [HISTORY][DAG:N/A][Event:AM_STARTED]: appAttemptId=appattempt_1444019975627_0001_000001, startTime=1444020219811 2015-10-05 04:43:39,812 INFO [main] app.DAGAppMaster: In Session mode. Waiting for DAG over RPC 2015-10-05 04:43:39,813 INFO [Thread-2] history.HistoryEventHandler: Stopping HistoryEventHandler 2015-10-05 04:43:39,814 INFO [Thread-2] recovery.RecoveryService: Stopping RecoveryService 2015-10-05 04:43:39,814 INFO [Thread-2] recovery.RecoveryService: Handle the remaining events in queue, queue size=0 2015-10-05 04:43:39,814 INFO [RecoveryEventHandlingThread] recovery.RecoveryService: EventQueue take interrupted. Returning 2015-10-05 04:43:39,814 INFO [Thread-2] ats.ATSHistoryLoggingService: Stopping ATSService, eventQueueBacklog=1 2015-10-05 04:43:39,814 WARN [Thread-2] ats.ATSHistoryLoggingService: ATSService being stopped, eventQueueBacklog=1, maxTimeLeftToFlush=-1, waitForever=true On 5 October 2015 at 10:48, Gopal Vijayaraghavan <[email protected]<mailto:[email protected]>> wrote: > The failures are always intermittent. Any idea why this happens? First up you should try 0.7.1, because of TEZ-2663. > Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has >already shutdown. Application application_1444019975627_0001 failed 2 >times due to AM Container for appattempt_1444019975627_0001_000002 exited >with exitCode: 255 Can you say what is printed in the AppMaster logs? I've seen this occasionally happen due to bad setup of cluster uid-limits. The ambari sets that up in a file named /etc/security/limits.d/yarn.conf (yarn.conf.j2) Check for that file. Otherwise as shuffle handler spawns threads, the container launchers will start to intermittently fail (the default is 1024 threads per-user, yarn.conf ups this to 65,000 threads). Cheers, Gopal
