[ 
https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027979#comment-14027979
 ] 

Jason Lowe commented on YARN-2147:
----------------------------------

For example, here's a sample log from a client submitting a job that failed:

{noformat}
2014-05-14 10:36:16,111 [JobControl] INFO org.apache.hadoop.mapred.ResourceMgrDelegate  - Submitted application application_1394826486018_9924515 to ResourceManager at xx/xx:xx
2014-05-14 10:36:16,116 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter  - Cleaning up the staging area /user/xx/.staging/job_1394826486018_9924515
2014-05-14 10:36:16,117 [JobControl] ERROR org.apache.hadoop.security.UserGroupInformation  - PriviledgedActionException as:xx (auth:SIMPLE) cause:java.io.IOException: Failed to run job : Read timed out
2014-05-14 10:36:16,118 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob  - xx got an error while submitting 
java.io.IOException: Failed to run job : Read timed out
                at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
                at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:410)
                at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
                at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:415)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1284)
                at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
                at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.lang.reflect.Method.invoke(Method.java:601)
                at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
                at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
{noformat}

All the user sees is a read timeout, with no details about what it was 
connecting to or which service was involved.  Was this a timeout connecting 
to the RM?  A timeout on the RM side?  Something else entirely?  It's hard to 
tell from just "Read timed out".  The full stacktrace of the exception logged 
on the RM side shows that it timed out trying to fetch a delegation token 
from a remote server for webhdfs.  Those kinds of details need to be conveyed 
back to the client, either via the full stacktrace from the RM exception or 
via a more informative exception message when delegation token renewal fails 
during app submission.
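
To illustrate the "more informative exception message" option, here is a 
minimal sketch using only JDK classes.  It is not the RM's actual code: the 
class name SubmissionErrorDetail, the wrap() helper, and the webhdfs host 
below are all hypothetical, chosen only to show the kind of message that 
would have answered the questions above.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical helper, not part of Hadoop: shows one way to keep the
// operation and remote service in the message that reaches the client.
public class SubmissionErrorDetail {

    // Wraps a low-level failure in an IOException whose message names the
    // operation, the service involved, and the original exception type.
    // The original exception is kept as the cause so its stacktrace survives.
    public static IOException wrap(String operation, String service, Throwable cause) {
        String message = "Failed to " + operation + " for service " + service
                + ": " + cause.getClass().getName() + ": " + cause.getMessage();
        return new IOException(message, cause);
    }

    public static void main(String[] args) {
        // Stand-in for the timeout seen while fetching the token.
        Throwable lowLevel = new SocketTimeoutException("Read timed out");
        // The service string is an assumed placeholder, not a real endpoint.
        IOException detailed = wrap("renew delegation token",
                "webhdfs://example-host:50070", lowLevel);
        System.out.println(detailed.getMessage());
    }
}
```

With a message along these lines the client log would at least say which 
operation and which remote service timed out, even if only the message 
string (and not the full stacktrace) is propagated back.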

> client lacks delegation token exception details when application submit fails
> -----------------------------------------------------------------------------
>
>                 Key: YARN-2147
>                 URL: https://issues.apache.org/jira/browse/YARN-2147
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Jason Lowe
>            Priority: Minor
>
> When a client submits an application and the delegation token process fails, 
> the client can lack critical details needed to understand the nature of the 
> error.  Only the message of the exception is conveyed to the client, 
> which sometimes isn't enough to debug the failure.



