[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()

2014-12-14 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246362#comment-14246362
 ] 

Rohith commented on YARN-2025:
--

Hi [~jianhe], I encountered the same issue even after applying the fix from 
YARN-2834. I found the scenario that causes it, which is tracked in YARN-2340. 
Let's continue the discussion on YARN-2340.

 Possible NPE in schedulers#addApplicationAttempt()
 --

 Key: YARN-2025
 URL: https://issues.apache.org/jira/browse/YARN-2025
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2025.1.patch


 In FifoScheduler/FairScheduler/CapacityScheduler#addApplicationAttempt(), we 
 don't check whether {{application}} is null. This can cause an NPE in the 
 following sequence: addApplication() -> doneApplication() (e.g. 
 AppKilledTransition) -> addApplicationAttempt().
 {code}
 SchedulerApplication application =
 applications.get(applicationAttemptId.getApplicationId());
 String user = application.getUser();
 {code}
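
 A minimal sketch of the null check the description asks for. The class 
 {{NullGuardSketch}}, the nested {{App}} type, and the string application IDs 
 are hypothetical stand-ins for illustration only; they are not the real YARN 
 scheduler classes, and the attached patch may handle the case differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified scheduler; not the real YARN classes. */
public class NullGuardSketch {

    /** Minimal stand-in for SchedulerApplication. */
    static final class App {
        private final String user;

        App(String user) {
            this.user = user;
        }

        String getUser() {
            return user;
        }
    }

    // Application ID -> application, mirroring the "applications" map in the snippet above.
    private final Map<String, App> applications = new ConcurrentHashMap<>();

    public void addApplication(String appId, String user) {
        applications.put(appId, new App(user));
    }

    public void doneApplication(String appId) {
        applications.remove(appId);
    }

    /** Returns true if the attempt was accepted, false if the application is unknown. */
    public boolean addApplicationAttempt(String appId) {
        App application = applications.get(appId);
        if (application == null) {
            // Without this guard, application.getUser() below would throw an NPE.
            System.err.println("Application " + appId + " not found; ignoring attempt");
            return false;
        }
        String user = application.getUser();
        return user != null;
    }

    public static void main(String[] args) {
        NullGuardSketch scheduler = new NullGuardSketch();
        // Reproduce the sequence from the description:
        // addApplication() -> doneApplication() -> addApplicationAttempt()
        scheduler.addApplication("app_1", "alice");
        scheduler.doneApplication("app_1");
        System.out.println(scheduler.addApplicationAttempt("app_1"));
    }
}
```

 With the guard in place, the late attempt is rejected instead of crashing the 
 caller. Whether to log and ignore, or to surface an error, is a design choice 
 this sketch does not settle.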



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()

2014-11-24 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223138#comment-14223138
 ] 

Rohith commented on YARN-2025:
--

The impact is that both RMs stay in standby and cannot recover at all.



[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()

2014-11-24 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223136#comment-14223136
 ] 

Rohith commented on YARN-2025:
--

I ran into a strange scenario where I got the NPE in 
{{CapacityScheduler.addApplicationAttempt}} in a different way. I was able to 
get some information from the logs, but not all of it, since the logs had 
been rolled over.

The application's final state is FAILED, but the application attempt's final 
state is null. It is very strange that RMApp can be FAILED while RMAppAttempt 
is null.
The log extracted from the RM is below. In this scenario, application recovery 
throws an NPE because RMAppAttempt tries to add the attempt to the scheduler, 
but the application details have not been added to the scheduler.
{noformat}
2014-11-24 23:53:32,608 | INFO  | main-EventThread | Recovering app: application_1416805604019_0038 with 1 attempts and final state = FAILED | org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:700)
2014-11-24 23:53:32,609 | INFO  | main-EventThread | Recovering attempt: appattempt_1416805604019_0038_01 with final state: null | org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:735)
{noformat}

The NPE trace is as follows.
{noformat}
2014-11-24 23:53:32,610 | ERROR | main-EventThread | Failed to load/recover state | org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:527)
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:607)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:941)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:97)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:963)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:931)
    at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:698)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:803)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:95)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:825)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:808)
    at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:681)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:335)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1148)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:523)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:927)
{noformat}
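
The inconsistency in the logs above can be modeled with a small sketch. It 
assumes, for illustration only, that recovery skips the scheduler for an 
application whose stored final state is set, while an attempt whose stored 
final state is null is treated as unfinished and re-added. The class 
{{RecoverySketch}}, its {{recover}} method, and the returned messages are 
hypothetical and do not reflect the actual RM recovery code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the recovery scenario; not the real RM code. */
public class RecoverySketch {

    // Application ID -> user, standing in for the scheduler's application map.
    private final Map<String, String> schedulerApps = new HashMap<>();

    /**
     * Recovers one application with one attempt.
     * A null final state means no final state was stored.
     */
    public String recover(String appId, String user,
                          String appFinalState, String attemptFinalState) {
        // Assumption: only applications without a final state go back to the scheduler.
        if (appFinalState == null) {
            schedulerApps.put(appId, user);
        }
        // Assumption: an attempt with a stored final state needs no scheduling.
        if (attemptFinalState != null) {
            return "attempt already finished";
        }
        // The attempt looks unfinished, so it is added to the scheduler.
        String owner = schedulerApps.get(appId);
        if (owner == null) {
            // This is the state seen in the logs: app FAILED, attempt null.
            // An unguarded dereference here is where an NPE would occur.
            return "inconsistent: attempt unfinished but app unknown to scheduler";
        }
        return "attempt added for " + owner;
    }

    public static void main(String[] args) {
        RecoverySketch rm = new RecoverySketch();
        System.out.println(rm.recover("app_38", "alice", "FAILED", null));
    }
}
```

Under these assumptions, the combination of a FAILED application and a null 
attempt state is the only input that reaches the lookup with no application 
registered, which matches the symptom reported here.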


[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()

2014-11-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223142#comment-14223142
 ] 

Hadoop QA commented on YARN-2025:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643596/YARN-2025.1.patch
  against trunk revision 555fa2d.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5916//console

This message is automatically generated.



[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()

2014-11-24 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223993#comment-14223993
 ] 

Tsuyoshi OZAWA commented on YARN-2025:
--

Thanks for pointing this out, [~rohithsharma]. I'll take a look.



[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()

2014-11-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224028#comment-14224028
 ] 

Jian He commented on YARN-2025:
---

bq. This looks very strange that how can RMApp-FAILED but RMAppAttempt-null..?
YARN-2834 should fix this. [~rohithsharma], are you running a build with or 
without the patch?



[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()

2014-11-24 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224109#comment-14224109
 ] 

Rohith commented on YARN-2025:
--

Thanks, Jian He, for pointing out the issue. I have not applied this patch 
yet; I will test with it applied.



[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()

2014-05-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990926#comment-13990926
 ] 

Jian He commented on YARN-2025:
---

Hi [~ozawa], I don't think the attempt is added to the scheduler once 
AppKilledTransition has been called; that is, addApplicationAttempt() will not 
be invoked after AppKilledTransition. 
There is a separate KillAttemptTransition that waits until the attempt is 
removed (if there is an attempt).




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2025) Possible NPE in schedulers#addApplicationAttempt()

2014-05-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13991014#comment-13991014
 ] 

Hadoop QA commented on YARN-2025:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643596/YARN-2025.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3704//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3704//console

This message is automatically generated.
