[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346175#comment-14346175 ]
zhihai xu commented on YARN-2893:
---------------------------------
Hi [~vinodkv],
The sporadic job failures are due to the cascading sharing of credentials between
jobs. Because the Credentials class is not thread-safe, concurrent access to the
shared credentials by multiple jobs creates a race condition, which causes the
sporadic job failures.
The shared credentials are introduced in the JobConf constructor: if we create a
new job using the JobConf from an old job, the two jobs will share the same
credentials.
{code}
public JobConf(Configuration conf) {
  super(conf);
  if (conf instanceof JobConf) {
    JobConf that = (JobConf)conf;
    credentials = that.credentials;
  }
  checkAndWarnDeprecation();
}
{code}
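To make the aliasing concrete, here is a minimal, self-contained sketch. CredsSketch and ConfSketch are hypothetical stand-ins, not the Hadoop classes; they only mirror the reference assignment in the constructor quoted above. The withCopiedCredentials helper is an assumption that shows one possible way to isolate the two jobs, and is not necessarily what the attached patch does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a credentials holder; NOT the Hadoop Credentials class.
class CredsSketch {
    final Map<String, String> tokens = new HashMap<>();

    CredsSketch() {}

    // Copy constructor: takes an independent snapshot of the tokens.
    CredsSketch(CredsSketch other) {
        tokens.putAll(other.tokens);
    }
}

// Hypothetical stand-in for a job configuration; NOT the Hadoop JobConf class.
class ConfSketch {
    final CredsSketch credentials;

    ConfSketch() {
        this.credentials = new CredsSketch();
    }

    // Mirrors the pattern quoted above: the reference is copied, not the contents.
    ConfSketch(ConfSketch that) {
        this.credentials = that.credentials;
    }

    private ConfSketch(CredsSketch creds) {
        this.credentials = creds;
    }

    // Assumed alternative: give the new configuration its own copy.
    static ConfSketch withCopiedCredentials(ConfSketch that) {
        return new ConfSketch(new CredsSketch(that.credentials));
    }
}

class SharedCredentialsDemo {
    public static void main(String[] args) {
        ConfSketch oldJob = new ConfSketch();
        ConfSketch aliased = new ConfSketch(oldJob);
        ConfSketch isolated = ConfSketch.withCopiedCredentials(oldJob);

        // A token added through the new job's configuration...
        aliased.credentials.tokens.put("hdfs", "token-A");

        // ...is visible through the old job, because both hold the same object.
        System.out.println(oldJob.credentials.tokens.containsKey("hdfs"));
        // The defensively copied configuration is unaffected.
        System.out.println(isolated.credentials.tokens.containsKey("hdfs"));
    }
}
```

With the aliasing constructor, any thread that mutates one job's credentials is also mutating the other job's credentials, which is what makes the lack of thread safety matter.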
The credentials from JobConf are passed to YARNRunner#submitJob, which calls
createApplicationSubmissionContext to configure the tokens in
ContainerLaunchContext:
{code}
DataOutputBuffer dob = new DataOutputBuffer();
ts.writeTokenStorageToStream(dob);
ByteBuffer securityTokens = ByteBuffer.wrap(dob.getData(), 0,
    dob.getLength());
ContainerLaunchContext amContainer =
    ContainerLaunchContext.newInstance(localResources, environment,
        vargsFinal, null, securityTokens, acls);
{code}
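As an illustration of how an unsynchronized change during serialization could leave a reader short of data, the sketch below uses a simplified count-then-entries format. This format and the TokenStreamSketch class are assumptions for demonstration only; they are not Hadoop's actual token storage format. The Runnable hook deterministically stands in for another thread mutating the shared map between the moment the count is written and the moment the entries are written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified serializer; NOT Hadoop's token storage format.
class TokenStreamSketch {

    // Writes an entry count followed by the entries. The hook runs between
    // the two steps and simulates a concurrent mutation by another thread.
    static byte[] write(Map<String, String> tokens, Runnable betweenCountAndEntries) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(tokens.size());
            betweenCountAndEntries.run();
            for (Map.Entry<String, String> e : tokens.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the count, then exactly that many entries.
    static Map<String, String> read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int count = in.readInt();
        Map<String, String> tokens = new HashMap<>();
        for (int i = 0; i < count; i++) {
            String key = in.readUTF();
            String value = in.readUTF();
            tokens.put(key, value);
        }
        return tokens;
    }

    // Returns true if reading runs out of bytes before all entries are read.
    static boolean readHitsEof(byte[] data) {
        try {
            read(data);
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> shared = new HashMap<>();
        shared.put("hdfs", "token-A");

        // No interference: the stream is consistent.
        byte[] consistent = write(shared, () -> {});
        System.out.println("EOF on consistent stream: " + readHitsEof(consistent));

        // Simulated interference: the map is emptied after the count is written.
        byte[] corrupt = write(shared, shared::clear);
        System.out.println("EOF on corrupt stream: " + readHitsEof(corrupt));
    }
}
```

In this sketch the corrupt stream announces one entry but contains none, so the reader fails with an EOFException. Whether the real failures follow this exact interleaving is not established here; the point is only that a count written before a concurrent change can disagree with the bytes that follow it.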
It looks like we have two other potential issues in JobConf and Credentials.
I created MAPREDUCE-6269 and HADOOP-11667 for separate discussion.
> AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
> ------------------------------------------------------------------------------
>
> Key: YARN-2893
> URL: https://issues.apache.org/jira/browse/YARN-2893
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.4.0
> Reporter: Gera Shegalov
> Assignee: zhihai xu
> Attachments: YARN-2893.000.patch
>
>
> MapReduce jobs on our clusters experience sporadic failures due to corrupt
> tokens in the AM launch context.