Is there any other version of Jackson on your classpath on the cluster 
machines? What does running ‘hadoop classpath’ show? Are there any other 
Jackson inclusions in there?

Bikas

From: Rajat Jain [mailto:[email protected]]
Sent: Tuesday, October 06, 2015 12:18 AM
To: Jonathan Eagles <[email protected]>
Cc: [email protected]; Gopal Vijayaraghavan <[email protected]>
Subject: Re: Flaky queries with 'TezSession has already shut down'

Folks, any idea on the failure?

On 5 October 2015 at 11:46, Rajat Jain 
<[email protected]<mailto:[email protected]>> wrote:
YARN ships with these jars:

/usr/lib/hadoop/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/jackson-xc-1.9.13.jar
/usr/lib/hadoop/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/jackson-jaxrs-1.9.13.jar
/usr/lib/hadoop/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/jackson-core-asl-1.9.13.jar
/usr/lib/hadoop/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/jackson-mapper-asl-1.9.13.jar
/usr/lib/hadoop/share/hadoop/yarn/lib/jackson-annotations-2.2.3.jar
/usr/lib/hadoop/share/hadoop/yarn/lib/jackson-databind-2.2.3.jar
/usr/lib/hadoop/share/hadoop/yarn/lib/jackson-core-2.2.3.jar
/usr/lib/hadoop/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/jackson-xc-1.9.13.jar
/usr/lib/hadoop/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/jackson-jaxrs-1.9.13.jar
/usr/lib/hadoop/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/jackson-core-asl-1.9.13.jar
/usr/lib/hadoop/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/jackson-mapper-asl-1.9.13.jar
/usr/lib/hadoop/share/hadoop/common/lib/jackson-xc-1.9.13.jar
/usr/lib/hadoop/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar
/usr/lib/hadoop/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar
/usr/lib/hadoop/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar


Tez ships with these jars:

/usr/lib/tez/lib/jackson-xc-1.9.13.jar
/usr/lib/tez/lib/jackson-annotations-2.2.3.jar
/usr/lib/tez/lib/jackson-jaxrs-1.9.13.jar
/usr/lib/tez/lib/jackson-core-asl-1.9.13.jar
/usr/lib/tez/lib/jackson-databind-2.2.3.jar
/usr/lib/tez/lib/jackson-core-2.2.3.jar
/usr/lib/tez/lib/jackson-mapper-asl-1.9.13.jar




On 5 October 2015 at 11:21, Jonathan Eagles 
<[email protected]<mailto:[email protected]>> wrote:
What's happening here is that the Tez AM has a runtime dependency on jackson 
libraries due to ATSHistoryLoggingService (due to TimelineClientImpl in YARN). 
It looks like your application is shipping a different version of jackson that 
YARN was expecting. Can you check what two versions of jackson library you are 
running with?

Jon

On Mon, Oct 5, 2015 at 12:30 AM, Rajat Jain 
<[email protected]<mailto:[email protected]>> wrote:
Here is the AM log. My setup doesn't use Ambari. Can you attach the 
/etc/security/limits.d/yarn.conf file?


2015-10-05 04:43:36,769 INFO [main] Configuration.deprecation: 
fs.default.name<http://fs.default.name> is deprecated. Instead, use fs.defaultFS

2015-10-05 04:43:36,972 WARN [main] util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

2015-10-05 04:43:37,077 INFO [main] app.DAGAppMaster: Using Standard Socket 
Factory for communication

2015-10-05 04:43:37,269 INFO [main] app.DAGAppMaster: Created DAGAppMaster for 
application appattempt_1444019975627_0001_000001, versionInfo=[ 
component=tez-dag, version=0.7.0, 
revision=e1c833e8be209145149cc6d3e1a01f8fa6e9b85f, 
SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, 
buildTime=2015-09-29T13:26:49Z ]

2015-10-05 04:43:37,292 INFO [main] app.DAGAppMaster: Comparing client version 
with AM version, clientVersion=0.7.0, AMVersion=0.7.0

2015-10-05 04:43:37,947 WARN [main] shortcircuit.DomainSocketFactory: The 
short-circuit local reads feature cannot be used because libhadoop cannot be 
loaded.

2015-10-05 04:43:38,086 INFO [main] app.DAGAppMaster: Adding session token to 
jobTokenSecretManager for application

2015-10-05 04:43:38,090 INFO [main] common.AsyncDispatcher: Registering class 
org.apache.tez.dag.app.rm.container.AMContainerEventType for class 
org.apache.tez.dag.app.rm.container.AMContainerMap

2015-10-05 04:43:38,092 INFO [main] common.AsyncDispatcher: Registering class 
org.apache.tez.dag.app.rm.node.AMNodeEventType for class 
org.apache.tez.dag.app.rm.node.AMNodeTracker

2015-10-05 04:43:38,093 INFO [main] common.AsyncDispatcher: Registering class 
org.apache.tez.dag.app.dag.event.DAGAppMasterEventType for class 
org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler

2015-10-05 04:43:38,094 INFO [main] common.AsyncDispatcher: Registering class 
org.apache.tez.dag.app.dag.event.DAGEventType for class 
org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher

2015-10-05 04:43:38,094 INFO [main] common.AsyncDispatcher: Registering class 
org.apache.tez.dag.app.dag.event.VertexEventType for class 
org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher

2015-10-05 04:43:38,095 INFO [main] common.AsyncDispatcher: Registering class 
org.apache.tez.dag.app.dag.event.TaskEventType for class 
org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher

2015-10-05 04:43:38,096 INFO [main] common.AsyncDispatcher: Registering class 
org.apache.tez.dag.app.dag.event.TaskAttemptEventType for class 
org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher

2015-10-05 04:43:38,097 INFO [main] common.AsyncDispatcher: Registering class 
org.apache.tez.dag.app.dag.event.SpeculatorEventType for independent dispatch 
using: class org.apache.tez.dag.app.DAGAppMaster$SpeculatorEventHandler

2015-10-05 04:43:38,097 INFO [main] common.AsyncDispatcher: Registering class 
org.apache.tez.dag.app.dag.event.SpeculatorEventType for class 
org.apache.tez.dag.app.DAGAppMaster$SpeculatorEventHandler

2015-10-05 04:43:38,111 INFO [main] common.AsyncDispatcher: Registering class 
org.apache.tez.dag.app.rm.AMSchedulerEventType for class 
org.apache.tez.dag.app.rm.TaskSchedulerEventHandler

2015-10-05 04:43:38,113 INFO [main] common.AsyncDispatcher: Registering class 
org.apache.tez.dag.app.rm.NMCommunicatorEventType for class 
org.apache.tez.dag.app.launcher.ContainerLauncherImpl

2015-10-05 04:43:38,215 INFO [main] node.AMNodeTracker: blacklistDisablePercent 
is 33, blacklistingEnabled: true, maxTaskFailuresPerNode: 10

2015-10-05 04:43:38,217 INFO [main] web.WebUIService: Tez UI History URL: 
http://ec2-54-158-112-249.compute-1.amazonaws.com:10500/#/tez-app/application_1444019975627_0001<http://ec2-54-158-112-249.compute-1.amazonaws.com:10500/#/tez-app/application_1444019975627_00012015-10-05>

2015-10-05<http://ec2-54-158-112-249.compute-1.amazonaws.com:10500/#/tez-app/application_1444019975627_00012015-10-05>
 04:43:38,219 INFO [main] launcher.ContainerLauncherImpl: Upper limit on the 
thread pool size is 500

2015-10-05 04:43:38,219 INFO [main] history.HistoryEventHandler: Initializing 
HistoryEventHandler

2015-10-05 04:43:38,224 INFO [main] ats.ATSHistoryLoggingService: Initializing 
ATSService

2015-10-05 04:43:38,806 INFO [main] impl.TimelineClientImpl: Timeline service 
address: 
http://ec2-54-158-112-249.compute-1.amazonaws.com:8188/ws/v1/timeline/<http://ec2-54-158-112-249.compute-1.amazonaws.com:8188/ws/v1/timeline/2015-10-05>

2015-10-05<http://ec2-54-158-112-249.compute-1.amazonaws.com:8188/ws/v1/timeline/2015-10-05>
 04:43:38,806 INFO [main] ats.ATSHistoryLoggingService: Using 
org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to manage 
Timeline ACLs

2015-10-05 04:43:38,971 INFO [main] impl.TimelineClientImpl: Timeline service 
address: 
http://ec2-54-158-112-249.compute-1.amazonaws.com:8188/ws/v1/timeline/<http://ec2-54-158-112-249.compute-1.amazonaws.com:8188/ws/v1/timeline/2015-10-05>

2015-10-05<http://ec2-54-158-112-249.compute-1.amazonaws.com:8188/ws/v1/timeline/2015-10-05>
 04:43:38,971 INFO [main] recovery.RecoveryService: Initializing RecoveryService

2015-10-05 04:43:38,974 INFO [main] history.HistoryEventHandler: 
[HISTORY][DAG:N/A][Event:APP_LAUNCHED]: 
applicationId=application_1444019975627_0001, appSubmitTime=1444020210133, 
launchTime=1444020217266

2015-10-05 04:43:38,975 INFO [main] history.HistoryEventHandler: 
[HISTORY][DAG:N/A][Event:AM_LAUNCHED]: 
appAttemptId=appattempt_1444019975627_0001_000001, appSubmitTime=1444020210133, 
launchTime=1444020217266

2015-10-05 04:43:38,982 INFO 
[ServiceThread:org.apache.tez.dag.history.HistoryEventHandler] 
history.HistoryEventHandler: Starting HistoryEventHandler

2015-10-05 04:43:38,982 INFO 
[ServiceThread:org.apache.tez.dag.history.HistoryEventHandler] 
ats.ATSHistoryLoggingService: Starting ATSService

2015-10-05 04:43:38,985 INFO 
[ServiceThread:org.apache.tez.dag.app.launcher.ContainerLauncherImpl] 
impl.ContainerManagementProtocolProxy: 
yarn.client.max-cached-nodemanagers-proxies : 0

2015-10-05 04:43:38,990 INFO [ServiceThread:DAGClientRPCServer] 
ipc.CallQueueManager: Using callQueue class 
java.util.concurrent.LinkedBlockingQueue

2015-10-05 04:43:38,990 INFO 
[ServiceThread:org.apache.tez.dag.app.TaskAttemptListenerImpTezDag] 
ipc.CallQueueManager: Using callQueue class 
java.util.concurrent.LinkedBlockingQueue

2015-10-05 04:43:38,997 INFO 
[ServiceThread:org.apache.tez.dag.history.HistoryEventHandler] 
recovery.RecoveryService: Starting RecoveryService

2015-10-05 04:43:39,001 INFO [Socket Reader #1 for port 47120] ipc.Server: 
Starting Socket Reader #1 for port 47120

2015-10-05 04:43:39,001 INFO [Socket Reader #1 for port 47029] ipc.Server: 
Starting Socket Reader #1 for port 47029

2015-10-05 04:43:39,027 INFO [IPC Server Responder] ipc.Server: IPC Server 
Responder: starting

2015-10-05 04:43:39,033 INFO [IPC Server listener on 47029] ipc.Server: IPC 
Server listener on 47029: starting

2015-10-05 04:43:39,035 INFO [IPC Server listener on 47120] ipc.Server: IPC 
Server listener on 47120: starting

2015-10-05 04:43:39,035 INFO [IPC Server Responder] ipc.Server: IPC Server 
Responder: starting

2015-10-05 04:43:39,035 INFO [ServiceThread:DAGClientRPCServer] 
client.DAGClientServer: Instantiated DAGClientRPCServer at 
ip-10-149-127-154.ec2.internal/10.149.127.154:47029<http://10.149.127.154:47029>

2015-10-05 04:43:39,070 INFO 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log: Logging to 
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog

2015-10-05 04:43:39,074 INFO 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] http.HttpRequestLog: 
Http request log for http.requests. is not defined

2015-10-05 04:43:39,083 INFO 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] http.HttpServer2: Added 
global filter 'safety' 
(class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)

2015-10-05 04:43:39,087 INFO 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] http.HttpServer2: Added 
filter AM_PROXY_FILTER 
(class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context

2015-10-05 04:43:39,087 INFO 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] http.HttpServer2: Added 
filter AM_PROXY_FILTER 
(class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context 
static

2015-10-05 04:43:39,090 INFO 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] http.HttpServer2: 
adding path spec: /*

2015-10-05 04:43:39,097 INFO 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] http.HttpServer2: Jetty 
bound to port 34881

2015-10-05 04:43:39,098 INFO 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log: 
jetty-6.1.26

2015-10-05 04:43:39,127 INFO 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log: Extract 
jar:file:/media/ephemeral1/yarn/local/filecache/20/hadoop-yarn-common-2.6.0.jar!/webapps/
 to /tmp/Jetty_0_0_0_0_34881_webapps____.97xzaz/webapp

2015-10-05 04:43:39,288 INFO 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log: NO JSP 
Support for /, did not find org.apache.jasper.servlet.JspServlet

2015-10-05 04:43:39,298 FATAL [HistoryEventHandlingThread] 
yarn.YarnUncaughtExceptionHandler: Thread 
Thread[HistoryEventHandlingThread,5,main] threw an Error.  Shutting down now...

java.lang.NoSuchMethodError: 
org.codehaus.jackson.map.ObjectMapper.setSerializationInclusion(Lorg/codehaus/jackson/map/annotate/JsonSerialize$Inclusion;)Lorg/codehaus/jackson/map/ObjectMapper;

  at 
org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.configObjectMapper(YarnJacksonJaxbJsonProvider.java:59)

  at 
org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.locateMapper(YarnJacksonJaxbJsonProvider.java:52)

  at 
org.codehaus.jackson.jaxrs.JacksonJsonProvider.writeTo(JacksonJsonProvider.java:501)

  at 
com.sun.jersey.api.client.RequestWriter.writeRequestEntity(RequestWriter.java:300)

  at 
com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:204)

  at 
com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)

  at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:226)

  at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:162)

  at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:237)

  at com.sun.jersey.api.client.Client.handle(Client.java:648)

  at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)

  at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)

  at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)

  at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:472)

  at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:321)

  at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)

  at 
org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:342)

  at 
org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:52)

  at 
org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:186)

  at java.lang.Thread.run(Thread.java:745)

2015-10-05 04:43:39,303 INFO [HistoryEventHandlingThread] util.ExitUtil: 
Exiting with status -1

2015-10-05 04:43:39,307 INFO [Thread-2] app.DAGAppMaster: 
DAGAppMasterShutdownHook invoked

2015-10-05 04:43:39,307 INFO [Thread-2] app.DAGAppMaster: DAGAppMaster received 
a signal. Signaling TaskScheduler

2015-10-05 04:43:39,307 INFO [Thread-2] rm.TaskSchedulerEventHandler: 
TaskScheduler notified that iSignalled was : true

2015-10-05 04:43:39,397 INFO 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log: Started 
[email protected]:34881<http://[email protected]:34881>

2015-10-05 04:43:39,397 INFO 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] webapp.WebApps: Web app 
/ started at 34881

2015-10-05 04:43:39,608 INFO 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] webapp.WebApps: 
Registered webapp guice modules

2015-10-05 04:43:39,610 INFO 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] web.WebUIService: 
Instantiated WebUIService at 
http://ip-10-149-127-154.ec2.internal:34881/ui/<http://ip-10-149-127-154.ec2.internal:34881/ui/2015-10-05>

2015-10-05<http://ip-10-149-127-154.ec2.internal:34881/ui/2015-10-05> 
04:43:39,645 INFO 
[ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerEventHandler] 
rm.YarnTaskSchedulerService: TaskScheduler initialized with configuration: 
maxRMHeartbeatInterval: 1000, containerReuseEnabled: true, reuseRackLocal: 
true, reuseNonLocal: false, localitySchedulingDelay: 250, preemptionPercentage: 
10, numHeartbeatsBetweenPreemptions: 3, idleContainerMinTimeout: 5000, 
idleContainerMaxTimeout: 10000, sessionMinHeldContainers: 0

2015-10-05 04:43:39,719 INFO 
[ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerEventHandler] 
client.RMProxy: Connecting to ResourceManager at 
ec2-54-158-112-249.compute-1.amazonaws.com/10.157.14.248:8030<http://ec2-54-158-112-249.compute-1.amazonaws.com/10.157.14.248:80302015-10-05>

2015-10-05<http://ec2-54-158-112-249.compute-1.amazonaws.com/10.157.14.248:80302015-10-05>
 04:43:39,812 INFO [main] history.HistoryEventHandler: 
[HISTORY][DAG:N/A][Event:AM_STARTED]: 
appAttemptId=appattempt_1444019975627_0001_000001, startTime=1444020219811

2015-10-05 04:43:39,812 INFO [main] app.DAGAppMaster: In Session mode. Waiting 
for DAG over RPC

2015-10-05 04:43:39,813 INFO [Thread-2] history.HistoryEventHandler: Stopping 
HistoryEventHandler

2015-10-05 04:43:39,814 INFO [Thread-2] recovery.RecoveryService: Stopping 
RecoveryService

2015-10-05 04:43:39,814 INFO [Thread-2] recovery.RecoveryService: Handle the 
remaining events in queue, queue size=0

2015-10-05 04:43:39,814 INFO [RecoveryEventHandlingThread] 
recovery.RecoveryService: EventQueue take interrupted. Returning

2015-10-05 04:43:39,814 INFO [Thread-2] ats.ATSHistoryLoggingService: Stopping 
ATSService, eventQueueBacklog=1

2015-10-05 04:43:39,814 WARN [Thread-2] ats.ATSHistoryLoggingService: 
ATSService being stopped, eventQueueBacklog=1, maxTimeLeftToFlush=-1, 
waitForever=true

On 5 October 2015 at 10:48, Gopal Vijayaraghavan 
<[email protected]<mailto:[email protected]>> wrote:

> The failures are always intermittent. Any idea why this happens?

First up you should try 0.7.1, because of TEZ-2663.

> Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has
>already shutdown. Application application_1444019975627_0001 failed 2
>times due to AM Container for appattempt_1444019975627_0001_000002 exited
>with exitCode: 255

Can you say what is printed in the AppMaster logs?

I've seen this occasionally happen due to bad setup of cluster uid-limits.

The ambari sets that up in a file named

/etc/security/limits.d/yarn.conf (yarn.conf.j2)


Check for that file. Otherwise as shuffle handler spawns threads, the
container launchers will start to intermittently fail (the default is 1024
threads per-user, yarn.conf ups this to 65,000 threads).

Cheers,
Gopal





Reply via email to