[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()
[ https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246362#comment-14246362 ] Rohith commented on YARN-2025: -- Hi [~jianhe] , I encoutered same issue even after fix YARN-2834. And I found the scenario for the cause of issue i.e YARN-2340. We shall discuss more on yarn-2340. Possible NPE in schedulers#addApplicationAttempt() -- Key: YARN-2025 URL: https://issues.apache.org/jira/browse/YARN-2025 Project: Hadoop YARN Issue Type: Bug Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2025.1.patch In FifoScheduler/FairScheduler/CapacityScheduler#addApplicationAttempt(), we don't check whether {{application}} is null. This can cause NPE in following sequences: addApplication() - doneApplication() (e.g. AppKilledTransition) - addApplicationAttempt(). {code} SchedulerApplication application = applications.get(applicationAttemptId.getApplicationId()); String user = application.getUser(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()
[ https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223138#comment-14223138 ] Rohith commented on YARN-2025: -- Impact from this is both RM's are in standby and not able to recover at all. Possible NPE in schedulers#addApplicationAttempt() -- Key: YARN-2025 URL: https://issues.apache.org/jira/browse/YARN-2025 Project: Hadoop YARN Issue Type: Bug Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2025.1.patch In FifoScheduler/FairScheduler/CapacityScheduler#addApplicationAttempt(), we don't check whether {{application}} is null. This can cause NPE in following sequences: addApplication() - doneApplication() (e.g. AppKilledTransition) - addApplicationAttempt(). {code} SchedulerApplication application = applications.get(applicationAttemptId.getApplicationId()); String user = application.getUser(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()
[ https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223136#comment-14223136 ] Rohith commented on YARN-2025: -- I ran into weird scenario where I got the NPE in {{CapacityScheduler.addApplicationAttempt}} in a different manner. I could able to get some informationf from the logs but not fully since log were rolled out. Application final state is FAILED but ApplicationAttempt final state is null. This looks very strange that how can RMApp-FAILED but RMAppAttempt-null..? Extracted log from RM is below. Because of this scenario, application recovery throw NPE since RMAppAttempt tries to add attempt to scheduler but application details are not added to schedulers. {noformat} 2014-11-24 23:53:32,608 | INFO | main-EventThread | Recovering app: application_1416805604019_0038 with 1 attempts and final state = FAILED | org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:700) 2014-11-24 23:53:32,609 | INFO | main-EventThread | Recovering attempt: appattempt_1416805604019_0038_01 with final state: null | org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:735) {noformat} NPE trace as follows. {noformat} 2014-11-24 23:53:32,610 | ERROR | main-EventThread | Failed to load/recover state | org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:527) java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:607) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:941) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:97) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:963) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:931) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:698) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:803) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:95) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:825) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:808) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:681) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:335) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1148) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:523) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:927) {noformat} Possible NPE in schedulers#addApplicationAttempt() -- Key: YARN-2025 URL: https://issues.apache.org/jira/browse/YARN-2025 Project: Hadoop YARN Issue Type: Bug Reporter:
[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()
[ https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223142#comment-14223142 ] Hadoop QA commented on YARN-2025: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643596/YARN-2025.1.patch against trunk revision 555fa2d. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5916//console This message is automatically generated. Possible NPE in schedulers#addApplicationAttempt() -- Key: YARN-2025 URL: https://issues.apache.org/jira/browse/YARN-2025 Project: Hadoop YARN Issue Type: Bug Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2025.1.patch In FifoScheduler/FairScheduler/CapacityScheduler#addApplicationAttempt(), we don't check whether {{application}} is null. This can cause NPE in following sequences: addApplication() - doneApplication() (e.g. AppKilledTransition) - addApplicationAttempt(). {code} SchedulerApplication application = applications.get(applicationAttemptId.getApplicationId()); String user = application.getUser(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()
[ https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223993#comment-14223993 ] Tsuyoshi OZAWA commented on YARN-2025: -- Thanks for your point, [~rohithsharma]. I'll take a look. Possible NPE in schedulers#addApplicationAttempt() -- Key: YARN-2025 URL: https://issues.apache.org/jira/browse/YARN-2025 Project: Hadoop YARN Issue Type: Bug Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2025.1.patch In FifoScheduler/FairScheduler/CapacityScheduler#addApplicationAttempt(), we don't check whether {{application}} is null. This can cause NPE in following sequences: addApplication() - doneApplication() (e.g. AppKilledTransition) - addApplicationAttempt(). {code} SchedulerApplication application = applications.get(applicationAttemptId.getApplicationId()); String user = application.getUser(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()
[ https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224028#comment-14224028 ] Jian He commented on YARN-2025: --- bq. This looks very strange that how can RMApp-FAILED but RMAppAttempt-null..? YARN-2834 should fix this. [~rohithsharma], Are you running a build with the patch or without ? Possible NPE in schedulers#addApplicationAttempt() -- Key: YARN-2025 URL: https://issues.apache.org/jira/browse/YARN-2025 Project: Hadoop YARN Issue Type: Bug Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2025.1.patch In FifoScheduler/FairScheduler/CapacityScheduler#addApplicationAttempt(), we don't check whether {{application}} is null. This can cause NPE in following sequences: addApplication() - doneApplication() (e.g. AppKilledTransition) - addApplicationAttempt(). {code} SchedulerApplication application = applications.get(applicationAttemptId.getApplicationId()); String user = application.getUser(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()
[ https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224109#comment-14224109 ] Rohith commented on YARN-2025: -- Thanks Jian He for pointiing out the issue. I do not have applied with this patch, I will test by applying this patch. Possible NPE in schedulers#addApplicationAttempt() -- Key: YARN-2025 URL: https://issues.apache.org/jira/browse/YARN-2025 Project: Hadoop YARN Issue Type: Bug Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2025.1.patch In FifoScheduler/FairScheduler/CapacityScheduler#addApplicationAttempt(), we don't check whether {{application}} is null. This can cause NPE in following sequences: addApplication() - doneApplication() (e.g. AppKilledTransition) - addApplicationAttempt(). {code} SchedulerApplication application = applications.get(applicationAttemptId.getApplicationId()); String user = application.getUser(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()
[ https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990926#comment-13990926 ] Jian He commented on YARN-2025: --- Hi [~ozawa], I think attempt will not be added to scheduler if AppKilledTransition is called. Basically, addApplicationAttempt() will not be invoked if AppKilledTransition is called. There's a separate KillAttemptTransition which waits until the attempt is removed (if there's an attempt). Possible NPE in schedulers#addApplicationAttempt() -- Key: YARN-2025 URL: https://issues.apache.org/jira/browse/YARN-2025 Project: Hadoop YARN Issue Type: Bug Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2025.1.patch In FifoScheduler/FairScheduler/CapacityScheduler#addApplicationAttempt(), we don't check whether {{application}} is null. This can cause NPE in following sequences: addApplication() - doneApplication() (e.g. AppKilledTransition) - addApplicationAttempt(). {code} SchedulerApplication application = applications.get(applicationAttemptId.getApplicationId()); String user = application.getUser(); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()
[ https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13991014#comment-13991014 ] Hadoop QA commented on YARN-2025: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643596/YARN-2025.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3704//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3704//console This message is automatically generated. Possible NPE in schedulers#addApplicationAttempt() -- Key: YARN-2025 URL: https://issues.apache.org/jira/browse/YARN-2025 Project: Hadoop YARN Issue Type: Bug Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2025.1.patch In FifoScheduler/FairScheduler/CapacityScheduler#addApplicationAttempt(), we don't check whether {{application}} is null. This can cause NPE in following sequences: addApplication() - doneApplication() (e.g. AppKilledTransition) - addApplicationAttempt(). {code} SchedulerApplication application = applications.get(applicationAttemptId.getApplicationId()); String user = application.getUser(); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)