[jira] [Commented] (YARN-2223) NPE on ResourceManager recover

2022-05-17 Thread tony ke (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538005#comment-17538005
 ] 

tony ke commented on YARN-2223:
---

We have fixed a few similar NPE problems on RM recovery recently. Would you 
share the issue URL with me? [~jianhe] 

> NPE on ResourceManager recover
> --
>
> Key: YARN-2223
> URL: https://issues.apache.org/jira/browse/YARN-2223
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.4.1
> Environment: JDK 8u5
> Reporter: Jon Bringhurst
> Priority: Major
>
> I upgraded two clusters from tag 2.2.0 to branch-2.4.1 (latest commit is 
> https://github.com/apache/hadoop-common/commit/c96c8e45a60651b677a1de338b7856a444dc0461).
> Both clusters have the same config (other than hostnames). Both are running 
> on JDK8u5 (I'm not sure if this is a factor here).
> One cluster started up without any errors. The other started up with the 
> following error on the RM:
> {noformat}
> 18:33:45,463  WARN RMAppImpl:331 - The specific max attempts: 0 for 
> application: 1 is invalid, because it is out of the range [1, 50]. Use the 
> global max attempts instead.
> 18:33:45,465  INFO RMAppImpl:651 - Recovering app: 
> application_1398450350082_0001 with 8 attempts and final state = KILLED
> 18:33:45,468  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_01 with final state: KILLED
> 18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_02 with final state: FAILED
> 18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_03 with final state: FAILED
> 18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_04 with final state: FAILED
> 18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_05 with final state: FAILED
> 18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_06 with final state: FAILED
> 18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_07 with final state: FAILED
> 18:33:45,481  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0001_08 with final state: FAILED
> 18:33:45,482  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_01 State change from NEW to KILLED
> 18:33:45,482  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_02 State change from NEW to FAILED
> 18:33:45,482  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_03 State change from NEW to FAILED
> 18:33:45,482  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_04 State change from NEW to FAILED
> 18:33:45,483  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_05 State change from NEW to FAILED
> 18:33:45,483  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_06 State change from NEW to FAILED
> 18:33:45,483  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_07 State change from NEW to FAILED
> 18:33:45,483  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0001_08 State change from NEW to FAILED
> 18:33:45,485  INFO RMAppImpl:639 - application_1398450350082_0001 State 
> change from NEW to KILLED
> 18:33:45,485  WARN RMAppImpl:331 - The specific max attempts: 0 for 
> application: 2 is invalid, because it is out of the range [1, 50]. Use the 
> global max attempts instead.
> 18:33:45,485  INFO RMAppImpl:651 - Recovering app: 
> application_1398450350082_0002 with 8 attempts and final state = KILLED
> 18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_01 with final state: KILLED
> 18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_02 with final state: FAILED
> 18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_03 with final state: FAILED
> 18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_04 with final state: FAILED
> 18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_05 with final state: FAILED
> 18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_06 with final state: FAILED
> 18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_07 with final state: FAILED
> 18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
> appattempt_1398450350082_0002_08 with final state: FAILED
> 18:33:45,490  INFO RMAppAttemptImpl:659 - 
> appattempt_1398450350082_0002_01 State change from NEW to KILLED
> 18:33:45,490  INFO RMAppAttemptIm

[jira] [Commented] (YARN-2223) NPE on ResourceManager recover

2015-05-01 Thread Jon Bringhurst (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523853#comment-14523853
 ] 

Jon Bringhurst commented on YARN-2223:
--

Hey [~jianhe], that sounds good to me -- I haven't seen this problem in a long 
time. We're running 2.6.0 now.


[jira] [Commented] (YARN-2223) NPE on ResourceManager recover

2015-05-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523840#comment-14523840
 ] 

Jian He commented on YARN-2223:
---

We have fixed a few similar NPE problems on RM recovery recently. This has 
probably been fixed by one of them. 
I'm closing this for now. [~jonbringhurst], please feel free to reopen this if 
you still see this problem in the latest build.


[jira] [Commented] (YARN-2223) NPE on ResourceManager recover

2014-06-28 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047017#comment-14047017
 ] 

Jian He commented on YARN-2223:
---

Looks like some attempt data is missing. Can you list the attempt files 
under the state-store directory for application_1398453545406_0001? 
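
One way to gather that list is sketched below. It is only an illustration, not part of YARN: the class name ListAttemptFiles is hypothetical, and it assumes a file-based state store whose per-application directory is reachable as a local path and whose attempt files are named with the appattempt_ prefix seen in the log. The directory is passed as an argument because the exact store layout is not given in this thread. For a store on HDFS, running hadoop fs -ls on the same directory would serve the same purpose.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not a YARN class.
public class ListAttemptFiles {

    // Returns the sorted names of the entries directly under appDir whose
    // names start with "appattempt_" (assumed naming, taken from the log).
    static List<String> listAttemptFiles(Path appDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (Stream<Path> entries = Files.list(appDir)) {
            entries.map(p -> p.getFileName().toString())
                   .filter(n -> n.startsWith("appattempt_"))
                   .forEach(names::add);
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ListAttemptFiles <application-state-dir>");
            System.exit(2);
        }
        // args[0] is the state-store directory of the application in question.
        for (String name : listAttemptFiles(Paths.get(args[0]))) {
            System.out.println(name);
        }
    }
}
```

Comparing the printed names with the attempt count the RM reports for the application during recovery should show whether an attempt file is missing.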
