What's happening here is that the Tez AM has a runtime dependency on jackson libraries due to ATSHistoryLoggingService (due to TimelineClientImpl in YARN). It looks like your application is shipping a different version of jackson that YARN was expecting. Can you check what two versions of jackson library you are running with?
Jon On Mon, Oct 5, 2015 at 12:30 AM, Rajat Jain <[email protected]> wrote: > Here is the AM log. My setup doesn't use Ambari. Can you attach the > /etc/security/limits.d/yarn.conf file? > > 2015-10-05 04:43:36,769 INFO [main] Configuration.deprecation: > fs.default.name is deprecated. Instead, use fs.defaultFS > 2015-10-05 04:43:36,972 WARN [main] util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > 2015-10-05 04:43:37,077 INFO [main] app.DAGAppMaster: Using Standard Socket > Factory for communication > 2015-10-05 04:43:37,269 INFO [main] app.DAGAppMaster: Created DAGAppMaster > for application appattempt_1444019975627_0001_000001, versionInfo=[ > component=tez-dag, version=0.7.0, > revision=e1c833e8be209145149cc6d3e1a01f8fa6e9b85f, > SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, > buildTime=2015-09-29T13:26:49Z ] > 2015-10-05 04:43:37,292 INFO [main] app.DAGAppMaster: Comparing client > version with AM version, clientVersion=0.7.0, AMVersion=0.7.0 > 2015-10-05 04:43:37,947 WARN [main] shortcircuit.DomainSocketFactory: The > short-circuit local reads feature cannot be used because libhadoop cannot be > loaded. > 2015-10-05 04:43:38,086 INFO [main] app.DAGAppMaster: Adding session token to > jobTokenSecretManager for application > 2015-10-05 04:43:38,090 INFO [main] common.AsyncDispatcher: Registering class > org.apache.tez.dag.app.rm.container.AMContainerEventType for class > org.apache.tez.dag.app.rm.container.AMContainerMap > 2015-10-05 04:43:38,092 INFO [main] common.AsyncDispatcher: Registering class > org.apache.tez.dag.app.rm.node.AMNodeEventType for class > org.apache.tez.dag.app.rm.node.AMNodeTracker > 2015-10-05 04:43:38,093 INFO [main] common.AsyncDispatcher: Registering class > org.apache.tez.dag.app.dag.event.DAGAppMasterEventType for class > org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler > 2015-10-05 04:43:38,094 INFO [main] common.AsyncDispatcher: Registering class > org.apache.tez.dag.app.dag.event.DAGEventType for class > org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher > 2015-10-05 04:43:38,094 INFO [main] common.AsyncDispatcher: Registering class > org.apache.tez.dag.app.dag.event.VertexEventType for class > org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher > 2015-10-05 04:43:38,095 INFO [main] common.AsyncDispatcher: Registering class > org.apache.tez.dag.app.dag.event.TaskEventType for class > org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher > 2015-10-05 04:43:38,096 INFO [main] common.AsyncDispatcher: Registering class > org.apache.tez.dag.app.dag.event.TaskAttemptEventType for class > org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher > 2015-10-05 04:43:38,097 INFO [main] common.AsyncDispatcher: Registering class > org.apache.tez.dag.app.dag.event.SpeculatorEventType for independent dispatch > using: class org.apache.tez.dag.app.DAGAppMaster$SpeculatorEventHandler > 2015-10-05 04:43:38,097 INFO [main] common.AsyncDispatcher: Registering class > org.apache.tez.dag.app.dag.event.SpeculatorEventType for class > org.apache.tez.dag.app.DAGAppMaster$SpeculatorEventHandler > 2015-10-05 04:43:38,111 INFO [main] common.AsyncDispatcher: Registering class > org.apache.tez.dag.app.rm.AMSchedulerEventType for class > org.apache.tez.dag.app.rm.TaskSchedulerEventHandler > 2015-10-05 04:43:38,113 INFO [main] common.AsyncDispatcher: Registering class > org.apache.tez.dag.app.rm.NMCommunicatorEventType for class > org.apache.tez.dag.app.launcher.ContainerLauncherImpl > 2015-10-05 04:43:38,215 INFO [main] node.AMNodeTracker: > blacklistDisablePercent is 33, blacklistingEnabled: true, > maxTaskFailuresPerNode: 10 > 2015-10-05 04:43:38,217 INFO [main] web.WebUIService: Tez UI History URL: > http://ec2-54-158-112-249.compute-1.amazonaws.com:10500/#/tez-app/application_1444019975627_0001 > 2015-10-05 > <http://ec2-54-158-112-249.compute-1.amazonaws.com:10500/#/tez-app/application_1444019975627_00012015-10-05> > 04:43:38,219 INFO [main] launcher.ContainerLauncherImpl: Upper limit on the > thread pool size is 500 > 2015-10-05 04:43:38,219 INFO [main] history.HistoryEventHandler: Initializing > HistoryEventHandler > 2015-10-05 04:43:38,224 INFO [main] ats.ATSHistoryLoggingService: > Initializing ATSService > 2015-10-05 04:43:38,806 INFO [main] impl.TimelineClientImpl: Timeline service > address: > http://ec2-54-158-112-249.compute-1.amazonaws.com:8188/ws/v1/timeline/ > 2015-10-05 > <http://ec2-54-158-112-249.compute-1.amazonaws.com:8188/ws/v1/timeline/2015-10-05> > 04:43:38,806 INFO [main] ats.ATSHistoryLoggingService: Using > org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to manage > Timeline ACLs > 2015-10-05 04:43:38,971 INFO [main] impl.TimelineClientImpl: Timeline service > address: > http://ec2-54-158-112-249.compute-1.amazonaws.com:8188/ws/v1/timeline/ > 2015-10-05 > <http://ec2-54-158-112-249.compute-1.amazonaws.com:8188/ws/v1/timeline/2015-10-05> > 04:43:38,971 INFO [main] recovery.RecoveryService: Initializing > RecoveryService > 2015-10-05 04:43:38,974 INFO [main] history.HistoryEventHandler: > [HISTORY][DAG:N/A][Event:APP_LAUNCHED]: > applicationId=application_1444019975627_0001, appSubmitTime=1444020210133, > launchTime=1444020217266 > 2015-10-05 04:43:38,975 INFO [main] history.HistoryEventHandler: > [HISTORY][DAG:N/A][Event:AM_LAUNCHED]: > appAttemptId=appattempt_1444019975627_0001_000001, > appSubmitTime=1444020210133, launchTime=1444020217266 > 2015-10-05 04:43:38,982 INFO > [ServiceThread:org.apache.tez.dag.history.HistoryEventHandler] > history.HistoryEventHandler: Starting HistoryEventHandler > 2015-10-05 04:43:38,982 INFO > [ServiceThread:org.apache.tez.dag.history.HistoryEventHandler] > ats.ATSHistoryLoggingService: Starting ATSService > 2015-10-05 04:43:38,985 INFO > [ServiceThread:org.apache.tez.dag.app.launcher.ContainerLauncherImpl] > impl.ContainerManagementProtocolProxy: > yarn.client.max-cached-nodemanagers-proxies : 0 > 2015-10-05 04:43:38,990 INFO [ServiceThread:DAGClientRPCServer] > ipc.CallQueueManager: Using callQueue class > java.util.concurrent.LinkedBlockingQueue > 2015-10-05 04:43:38,990 INFO > [ServiceThread:org.apache.tez.dag.app.TaskAttemptListenerImpTezDag] > ipc.CallQueueManager: Using callQueue class > java.util.concurrent.LinkedBlockingQueue > 2015-10-05 04:43:38,997 INFO > [ServiceThread:org.apache.tez.dag.history.HistoryEventHandler] > recovery.RecoveryService: Starting RecoveryService > 2015-10-05 04:43:39,001 INFO [Socket Reader #1 for port 47120] ipc.Server: > Starting Socket Reader #1 for port 47120 > 2015-10-05 04:43:39,001 INFO [Socket Reader #1 for port 47029] ipc.Server: > Starting Socket Reader #1 for port 47029 > 2015-10-05 04:43:39,027 INFO [IPC Server Responder] ipc.Server: IPC Server > Responder: starting > 2015-10-05 04:43:39,033 INFO [IPC Server listener on 47029] ipc.Server: IPC > Server listener on 47029: starting > 2015-10-05 04:43:39,035 INFO [IPC Server listener on 47120] ipc.Server: IPC > Server listener on 47120: starting > 2015-10-05 04:43:39,035 INFO [IPC Server Responder] ipc.Server: IPC Server > Responder: starting > 2015-10-05 04:43:39,035 INFO [ServiceThread:DAGClientRPCServer] > client.DAGClientServer: Instantiated DAGClientRPCServer at > ip-10-149-127-154.ec2.internal/10.149.127.154:47029 > 2015-10-05 04:43:39,070 INFO > [ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log: Logging > to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via > org.mortbay.log.Slf4jLog > 2015-10-05 04:43:39,074 INFO > [ServiceThread:org.apache.tez.dag.app.web.WebUIService] http.HttpRequestLog: > Http request log for http.requests. is not defined > 2015-10-05 04:43:39,083 INFO > [ServiceThread:org.apache.tez.dag.app.web.WebUIService] http.HttpServer2: > Added global filter 'safety' > (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter) > 2015-10-05 04:43:39,087 INFO > [ServiceThread:org.apache.tez.dag.app.web.WebUIService] http.HttpServer2: > Added filter AM_PROXY_FILTER > (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context > 2015-10-05 04:43:39,087 INFO > [ServiceThread:org.apache.tez.dag.app.web.WebUIService] http.HttpServer2: > Added filter AM_PROXY_FILTER > (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context > static > 2015-10-05 04:43:39,090 INFO > [ServiceThread:org.apache.tez.dag.app.web.WebUIService] http.HttpServer2: > adding path spec: /* > 2015-10-05 04:43:39,097 INFO > [ServiceThread:org.apache.tez.dag.app.web.WebUIService] http.HttpServer2: > Jetty bound to port 34881 > 2015-10-05 04:43:39,098 INFO > [ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log: > jetty-6.1.26 > 2015-10-05 04:43:39,127 INFO > [ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log: Extract > jar:file:/media/ephemeral1/yarn/local/filecache/20/hadoop-yarn-common-2.6.0.jar!/webapps/ > to /tmp/Jetty_0_0_0_0_34881_webapps____.97xzaz/webapp > 2015-10-05 04:43:39,288 INFO > [ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log: NO JSP > Support for /, did not find org.apache.jasper.servlet.JspServlet > 2015-10-05 04:43:39,298 FATAL [HistoryEventHandlingThread] > yarn.YarnUncaughtExceptionHandler: Thread > Thread[HistoryEventHandlingThread,5,main] threw an Error. Shutting down > now... > java.lang.NoSuchMethodError: > org.codehaus.jackson.map.ObjectMapper.setSerializationInclusion(Lorg/codehaus/jackson/map/annotate/JsonSerialize$Inclusion;)Lorg/codehaus/jackson/map/ObjectMapper; > at > org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.configObjectMapper(YarnJacksonJaxbJsonProvider.java:59) > at > org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.locateMapper(YarnJacksonJaxbJsonProvider.java:52) > at > org.codehaus.jackson.jaxrs.JacksonJsonProvider.writeTo(JacksonJsonProvider.java:501) > at > com.sun.jersey.api.client.RequestWriter.writeRequestEntity(RequestWriter.java:300) > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:204) > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:226) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:162) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:237) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:472) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:321) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301) > at > org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:342) > at > org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:52) > at > org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:186) > at java.lang.Thread.run(Thread.java:745) > 2015-10-05 04:43:39,303 INFO [HistoryEventHandlingThread] util.ExitUtil: > Exiting with status -1 > 2015-10-05 04:43:39,307 INFO [Thread-2] app.DAGAppMaster: > DAGAppMasterShutdownHook invoked > 2015-10-05 04:43:39,307 INFO [Thread-2] app.DAGAppMaster: DAGAppMaster > received a signal. Signaling TaskScheduler > 2015-10-05 04:43:39,307 INFO [Thread-2] rm.TaskSchedulerEventHandler: > TaskScheduler notified that iSignalled was : true > 2015-10-05 04:43:39,397 INFO > [ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log: Started > [email protected]:34881 > 2015-10-05 04:43:39,397 INFO > [ServiceThread:org.apache.tez.dag.app.web.WebUIService] webapp.WebApps: Web > app / started at 34881 > 2015-10-05 04:43:39,608 INFO > [ServiceThread:org.apache.tez.dag.app.web.WebUIService] webapp.WebApps: > Registered webapp guice modules > 2015-10-05 04:43:39,610 INFO > [ServiceThread:org.apache.tez.dag.app.web.WebUIService] web.WebUIService: > Instantiated WebUIService at http://ip-10-149-127-154.ec2.internal:34881/ui/ > 2015-10-05 <http://ip-10-149-127-154.ec2.internal:34881/ui/2015-10-05> > 04:43:39,645 INFO > [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerEventHandler] > rm.YarnTaskSchedulerService: TaskScheduler initialized with configuration: > maxRMHeartbeatInterval: 1000, containerReuseEnabled: true, reuseRackLocal: > true, reuseNonLocal: false, localitySchedulingDelay: 250, > preemptionPercentage: 10, numHeartbeatsBetweenPreemptions: 3, > idleContainerMinTimeout: 5000, idleContainerMaxTimeout: 10000, > sessionMinHeldContainers: 0 > 2015-10-05 04:43:39,719 INFO > [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerEventHandler] > client.RMProxy: Connecting to ResourceManager at > ec2-54-158-112-249.compute-1.amazonaws.com/10.157.14.248:8030 > 2015-10-05 > <http://ec2-54-158-112-249.compute-1.amazonaws.com/10.157.14.248:80302015-10-05> > 04:43:39,812 INFO [main] history.HistoryEventHandler: > [HISTORY][DAG:N/A][Event:AM_STARTED]: > appAttemptId=appattempt_1444019975627_0001_000001, startTime=1444020219811 > 2015-10-05 04:43:39,812 INFO [main] app.DAGAppMaster: In Session mode. > Waiting for DAG over RPC > 2015-10-05 04:43:39,813 INFO [Thread-2] history.HistoryEventHandler: Stopping > HistoryEventHandler > 2015-10-05 04:43:39,814 INFO [Thread-2] recovery.RecoveryService: Stopping > RecoveryService > 2015-10-05 04:43:39,814 INFO [Thread-2] recovery.RecoveryService: Handle the > remaining events in queue, queue size=0 > 2015-10-05 04:43:39,814 INFO [RecoveryEventHandlingThread] > recovery.RecoveryService: EventQueue take interrupted. Returning > 2015-10-05 04:43:39,814 INFO [Thread-2] ats.ATSHistoryLoggingService: > Stopping ATSService, eventQueueBacklog=1 > 2015-10-05 04:43:39,814 WARN [Thread-2] ats.ATSHistoryLoggingService: > ATSService being stopped, eventQueueBacklog=1, maxTimeLeftToFlush=-1, > waitForever=true > > > On 5 October 2015 at 10:48, Gopal Vijayaraghavan <[email protected]> > wrote: > >> >> > The failures are always intermittent. Any idea why this happens? >> >> First up you should try 0.7.1, because of TEZ-2663. >> >> > Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has >> >already shutdown. Application application_1444019975627_0001 failed 2 >> >times due to AM Container for appattempt_1444019975627_0001_000002 exited >> >with exitCode: 255 >> >> Can you say what is printed in the AppMaster logs? >> >> I've seen this occasionally happen due to bad setup of cluster uid-limits. >> >> The ambari sets that up in a file named >> >> /etc/security/limits.d/yarn.conf (yarn.conf.j2) >> >> >> Check for that file. Otherwise as shuffle handler spawns threads, the >> container launchers will start to intermittently fail (the default is 1024 >> threads per-user, yarn.conf ups this to 65,000 threads). >> >> Cheers, >> Gopal >> >> >> >
