[ 
https://issues.apache.org/jira/browse/YARN-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124450#comment-17124450
 ] 

Eric Badger commented on YARN-10300:
------------------------------------

bq. Eric Badger can you explain under what circumstances 
attempt.getMasterContainer().getNodeId().getHost() will succeed where 
attempt.getHost() fails?

{{attempt.getHost()}} grabs {{host}} within RMAppAttemptImpl.java. This is set 
during {{AMRegisteredTransition}}, so it will be "N/A" until the AM registers 
with the RM, i.e. until its first heartbeat. 

{{attempt.getMasterContainer()}} grabs {{masterContainer}} within 
RMAppAttemptImpl.java. This is set during {{AMContainerAllocatedTransition}}. 

So between the time the AM container is allocated and the time of the first AM 
heartbeat, {{masterContainer}} will be set, but {{host}} will still be "N/A".
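
To illustrate that window, here is a minimal sketch. {{AttemptSketch}} and its methods are hypothetical stand-ins, not the real RMAppAttemptImpl; they only model the two fields and the two transitions described above.

```java
// Hypothetical, simplified model of the two fields discussed above.
// This is NOT the real RMAppAttemptImpl; names are for illustration only.
public class AttemptSketch {
    // Stands in for 'host': stays "N/A" until the AM registers.
    private String host = "N/A";
    // Stands in for the node host carried by 'masterContainer':
    // null until the AM container is allocated.
    private String masterContainerHost;

    // Models what happens at AMContainerAllocatedTransition.
    public void onAmContainerAllocated(String nodeHost) {
        this.masterContainerHost = nodeHost;
    }

    // Models what happens at AMRegisteredTransition.
    public void onAmRegistered(String amHost) {
        this.host = amHost;
    }

    public String getHost() {
        return host;
    }

    public String getMasterContainerHost() {
        return masterContainerHost;
    }

    public static void main(String[] args) {
        AttemptSketch attempt = new AttemptSketch();
        attempt.onAmContainerAllocated("node1.example.com");
        // The AM fails here, before it ever registers.
        System.out.println("host=" + attempt.getHost());
        System.out.println("masterContainerHost=" + attempt.getMasterContainerHost());
    }
}
```

If the AM dies inside that window, only the container's node host is available, which is why the summary needs to read it from the master container.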

bq. Also, do we need to add some null checks for these? getMasterContainer(), 
getNodeId(), getHost()?
Probably wouldn't hurt. 

bq. Note that the attempt.getHost() defaults to "N/A" before it is set - what 
do we get if NodeID().getHost() isn't valid yet? Is that even a possibility?
I don't know whether it can be invalid at the start. The {{Container}} is 
created via {{newInstance}}, which takes a {{NodeId}} as a parameter, but that 
could potentially be passed in as null. I think it will either be the correct 
nodeId or null, and I can interpret null as "N/A". Containers are instantiated 
in so many places in the scheduler that it would be pretty tough to verify that 
every case sets the nodeId initially. 

I can add in the null checks and default the string to "N/A" if any of them 
is null. 
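
A rough sketch of what that fallback could look like. The nested interfaces are hypothetical stand-ins that mirror only the getter chain {{getMasterContainer().getNodeId().getHost()}}; the real YARN types are different classes, and {{amHostOrNa}} is an illustrative name, not something from the patch.

```java
import java.util.Optional;

public class AmHostLookup {
    // Hypothetical minimal stand-ins for the YARN types; they expose only
    // the getters used in the chain discussed above.
    public interface NodeId { String getHost(); }
    public interface Container { NodeId getNodeId(); }
    public interface Attempt { Container getMasterContainer(); }

    // Returns the AM container's host, or "N/A" if any link in the chain is null.
    public static String amHostOrNa(Attempt attempt) {
        return Optional.ofNullable(attempt)
            .map(Attempt::getMasterContainer)
            .map(Container::getNodeId)
            .map(NodeId::getHost)
            .orElse("N/A");
    }

    public static void main(String[] args) {
        NodeId node = () -> "node1.example.com";
        Container withNode = () -> node;
        Container withoutNode = () -> null;

        Attempt allocated = () -> withNode;
        Attempt missingNodeId = () -> withoutNode;
        Attempt notAllocated = () -> null;

        System.out.println(amHostOrNa(allocated));
        System.out.println(amHostOrNa(missingNodeId));
        System.out.println(amHostOrNa(notAllocated));
    }
}
```

Plain nested null checks would work just as well; the {{Optional}} chain just keeps the "N/A" default in one place.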

> appMasterHost not set in RM ApplicationSummary when AM fails before first 
> heartbeat
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-10300
>                 URL: https://issues.apache.org/jira/browse/YARN-10300
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>         Attachments: YARN-10300.001.patch
>
>
> {noformat}
> 2020-05-23 14:09:10,086 INFO resourcemanager.RMAppManager$ApplicationSummary: appId=application_1586003420099_12444961,name=job_name,user=username,queue=queuename,state=FAILED,trackingUrl=https://cluster:port/applicationhistory/app/application_1586003420099_12444961,appMasterHost=N/A,startTime=1590241207309,finishTime=1590242950085,finalStatus=FAILED,memorySeconds=13750,vcoreSeconds=67,preemptedMemorySeconds=0,preemptedVcoreSeconds=0,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=<memory:0\, vCores:0>,applicationType=MAPREDUCE
> {noformat}
> {{appMasterHost=N/A}} should have the AM hostname instead of N/A



