[jira] [Commented] (YARN-2559) ResourceManager sometime become un-responsive due to NPE in SystemMetricsPublisher

2016-01-22 Thread Cristian Barca (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112600#comment-15112600
 ] 

Cristian Barca commented on YARN-2559:
--

when submitting Unmanaged Application Masters instead

> ResourceManager sometime become un-responsive due to NPE in 
> SystemMetricsPublisher
> --
>
> Key: YARN-2559
> URL: https://issues.apache.org/jira/browse/YARN-2559
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.6.0
> Environment: Generic History Service is enabled in Timelineserver 
> with 
> yarn.resourcemanager.system-metrics-publisher.enabled=true
> so that the ResourceManager uses the Timeline Store for recording application 
> history information 
>Reporter: Karam Singh
>Assignee: Zhijie Shen
> Fix For: 2.6.0
>
> Attachments: YARN-2559.1.patch, YARN-2559.2.patch, YARN-2559.3.patch, 
> YARN-2559.4.patch
>
>
> ResourceManager sometime become un-responsive due to NPE in 
> SystemMetricsPublisher



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2559) ResourceManager sometime become un-responsive due to NPE in SystemMetricsPublisher

2016-01-22 Thread Cristian Barca (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112596#comment-15112596
 ] 

Cristian Barca commented on YARN-2559:
--

Yes, I saw it after a bit of digging. Correct, this is supposed to be fixed by 
YARN-4452, though the title is a bit vague (NPE when submitting unmanaged 
Application). I cannot confirm the fix since the Hadoop version I am working on 
is 2.7.0. The workaround I have in mind for older versions is to disable the 
GHS property by setting 
yarn.resourcemanager.system-metrics-publisher.enabled=false. Any other ideas on 
how to do it without affecting the whole system, i.e. something more specific 
to an AppMaster?

I will confirm the fix on my next upgrade to 2.8.0.

I've added related links to those two issues here as well.

Thanks!
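
For reference, the workaround above amounts to a single override in the ResourceManager's yarn-site.xml. The sketch below only renders that override in the usual Hadoop property layout. The class and method names (PublisherOverride, toPropertyXml) are hypothetical helpers and not Hadoop APIs; only the property name comes from this thread.

```java
// Hypothetical helper, not part of Hadoop: renders one configuration override
// in the <property> layout used by yarn-site.xml.
final class PublisherOverride {
    // Property name taken from the issue's Environment field.
    static final String PUBLISHER_ENABLED =
        "yarn.resourcemanager.system-metrics-publisher.enabled";

    // Builds the XML element for a single name/value pair.
    static String toPropertyXml(String name, String value) {
        return "<property>\n"
             + "  <name>" + name + "</name>\n"
             + "  <value>" + value + "</value>\n"
             + "</property>";
    }

    public static void main(String[] args) {
        // Prints the element that disables the publisher.
        System.out.println(toPropertyXml(PUBLISHER_ENABLED, "false"));
    }
}
```

The switch is read by the ResourceManager, so it applies to every application rather than to a single AppMaster.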



[jira] [Commented] (YARN-2559) ResourceManager sometime become un-responsive due to NPE in SystemMetricsPublisher

2016-01-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112251#comment-15112251
 ] 

Naganarasimha G R commented on YARN-2559:
-

Hi [~cristib],
Can you please share the NPE stack trace? We recently fixed this in YARN-4452 
and YARN-4623, in trunk, 2.7.3 and 2.6.4. Maybe it is the same issue you are 
facing?





[jira] [Commented] (YARN-2559) ResourceManager sometime become un-responsive due to NPE in SystemMetricsPublisher

2016-01-22 Thread Cristian Barca (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112193#comment-15112193
 ] 

Cristian Barca commented on YARN-2559:
--

This is still not fixed for unmanaged AppMasters.





[jira] [Commented] (YARN-2559) ResourceManager sometime become un-responsive due to NPE in SystemMetricsPublisher

2014-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138955#comment-14138955
 ] 

Hudson commented on YARN-2559:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1875 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1875/])
YARN-2559. Fixed NPE in SystemMetricsPublisher when retrieving 
FinalApplicationStatus. Contributed by Zhijie Shen (jianhe: rev 
ee21b13cbd4654d7181306404174329f12193613)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java




[jira] [Commented] (YARN-2559) ResourceManager sometime become un-responsive due to NPE in SystemMetricsPublisher

2014-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138942#comment-14138942
 ] 

Hudson commented on YARN-2559:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1900 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1900/])
YARN-2559. Fixed NPE in SystemMetricsPublisher when retrieving 
FinalApplicationStatus. Contributed by Zhijie Shen (jianhe: rev 
ee21b13cbd4654d7181306404174329f12193613)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/CHANGES.txt




[jira] [Commented] (YARN-2559) ResourceManager sometime become un-responsive due to NPE in SystemMetricsPublisher

2014-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138814#comment-14138814
 ] 

Hudson commented on YARN-2559:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #684 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/684/])
YARN-2559. Fixed NPE in SystemMetricsPublisher when retrieving 
FinalApplicationStatus. Contributed by Zhijie Shen (jianhe: rev 
ee21b13cbd4654d7181306404174329f12193613)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java




[jira] [Commented] (YARN-2559) ResourceManager sometime become un-responsive due to NPE in SystemMetricsPublisher

2014-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138401#comment-14138401
 ] 

Hadoop QA commented on YARN-2559:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12669586/YARN-2559.4.patch
  against trunk revision 47e5e19.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5016//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5016//console

This message is automatically generated.



[jira] [Commented] (YARN-2559) ResourceManager sometime become un-responsive due to NPE in SystemMetricsPublisher

2014-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138306#comment-14138306
 ] 

Hadoop QA commented on YARN-2559:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12669569/YARN-2559.3.patch
  against trunk revision 123f20d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5012//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5012//console

This message is automatically generated.



[jira] [Commented] (YARN-2559) ResourceManager sometime become un-responsive due to NPE in SystemMetricsPublisher

2014-09-17 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138162#comment-14138162
 ] 

Jian He commented on YARN-2559:
---

Looks good overall. Could we just call RMApp#getFinalApplicationStatus here?
{code}
(appAttempt.getFinalApplicationStatus() == null ?
    RMServerUtils.createFinalApplicationStatus(appState) :
    appAttempt.getFinalApplicationStatus())
{code}
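
The null fallback in the snippet above can be sketched in isolation. The enums below are simplified stand-ins for YARN's FinalApplicationStatus and RMAppState, and the state mapping in fromAppState is an assumption made for illustration; it is not copied from RMServerUtils.createFinalApplicationStatus.

```java
// Simplified stand-ins for the YARN types; values and mapping are illustrative assumptions.
enum FinalStatus { UNDEFINED, SUCCEEDED, FAILED, KILLED }

enum AppState { RUNNING, FAILED, KILLED }

final class FinalStatusFallback {
    // Hypothetical mapping from an application state to a final status.
    static FinalStatus fromAppState(AppState state) {
        switch (state) {
            case FAILED:
                return FinalStatus.FAILED;
            case KILLED:
                return FinalStatus.KILLED;
            default:
                return FinalStatus.UNDEFINED;
        }
    }

    // Returns the attempt's own status when it is set, otherwise derives one
    // from the application state, so callers never dereference null.
    static FinalStatus resolve(FinalStatus attemptStatus, AppState state) {
        return attemptStatus != null ? attemptStatus : fromAppState(state);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, AppState.KILLED));
        System.out.println(resolve(FinalStatus.SUCCEEDED, AppState.RUNNING));
    }
}
```

Keeping the fallback in one place means every caller gets the same non-null value, which is the appeal of delegating to a single accessor.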



[jira] [Commented] (YARN-2559) ResourceManager sometime become un-responsive due to NPE in SystemMetricsPublisher

2014-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138003#comment-14138003
 ] 

Hadoop QA commented on YARN-2559:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12669496/YARN-2559.2.patch
  against trunk revision e3803d0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5002//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5002//console

This message is automatically generated.



[jira] [Commented] (YARN-2559) ResourceManager sometime become un-responsive due to NPE in SystemMetricsPublisher

2014-09-17 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137636#comment-14137636
 ] 

Jian He commented on YARN-2559:
---

To be consistent with the FinalApplicationStatus exposed on the RM web UI and 
CLI, should we publish the UNDEFINED state as well when finalStatus is 
unavailable?



[jira] [Commented] (YARN-2559) ResourceManager sometime become un-responsive due to NPE in SystemMetricsPublisher

2014-09-16 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136622#comment-14136622
 ] 

Zhijie Shen commented on YARN-2559:
---

The test failure is not related. Filed a ticket for it: YARN-2564.



[jira] [Commented] (YARN-2559) ResourceManager sometime become un-responsive due to NPE in SystemMetricsPublisher

2014-09-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136339#comment-14136339
 ] 

Hadoop QA commented on YARN-2559:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12669199/YARN-2559.1.patch
  against trunk revision 56119fe.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4978//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4978//console

This message is automatically generated.



[jira] [Commented] (YARN-2559) ResourceManager sometime become un-responsive due to NPE in SystemMetricsPublisher

2014-09-16 Thread Karam Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135357#comment-14135357
 ] 

Karam Singh commented on YARN-2559:
---

Using Timeline Store for AHS/GHS
{code}
yarn.timeline-service.enabled=true
yarn.timeline-service.hostname=
yarn.timeline-service.address=:10200
yarn.timeline-service.webapp.address=<:8188
yarn.timeline-service.handler-thread-count=10
yarn.timeline-service.ttl-enable=true
yarn.timeline-service.ttl-ms=60480
yarn.timeline-service.leveldb-timeline-store.path=/tmp/timeline
yarn.timeline-service.generic-application-history.enabled=true
yarn.timeline-service.http-authentication.type=simple
yarn.resourcemanager.system-metrics-publisher.enabled=true
yarn.resourcemanager.system-metrics-publisher.dispatcher.pool-size=10
yarn.timeline-service.generic-application-history.store-class='' 
{code}

Restart ATS (TimelineServer)
Restart ResourceManager (RM)
After running some applications, the ResourceManager became unresponsive due to 
an NPE in SystemMetricsPublisher:
{code}
2014-09-15 05:01:21,279 FATAL event.AsyncDispatcher (AsyncDispatcher.java:dispatch(179)) - Error in dispatcher thread
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.publishAppAttemptFinishedEvent(SystemMetricsPublisher.java:318)
	at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.handleSystemMetricsEvent(SystemMetricsPublisher.java:209)
	at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:431)
	at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:426)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:745)
2014-09-15 05:01:21,279 INFO  event.AsyncDispatcher (AsyncDispatcher.java:dispatch(184)) - Exiting, bbye..
{code}

But instead of exiting, the RM goes into an infinite loop, repeatedly logging 
"Waiting for AsyncDispatcher to drain."
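
One plausible reading of this behaviour is sketched below, under the assumption that the drain wait cannot notice that the dispatcher loop has already exited. This is a simplified, single-threaded model with hypothetical names (DispatcherSketch, Event, awaitDrained); it is not the actual AsyncDispatcher code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified, single-threaded model of a dispatcher; not the real AsyncDispatcher.
final class DispatcherSketch {
    // Hypothetical event whose finalStatus may be null.
    static final class Event {
        final String appId;
        final String finalStatus;

        Event(String appId, String finalStatus) {
            this.appId = appId;
            this.finalStatus = finalStatus;
        }
    }

    private final Queue<Event> queue = new ArrayDeque<>();
    private boolean exited = false;

    void post(Event e) {
        queue.add(e);
    }

    // Unguarded handler: dereferences finalStatus and throws an NPE when it is null.
    private static int handle(Event e) {
        return e.finalStatus.length();
    }

    // Processes events until the queue is empty or a handler throws. After a
    // failure the loop exits and the remaining events are never handled.
    void dispatchAll() {
        while (!exited && !queue.isEmpty()) {
            Event e = queue.poll();
            try {
                handle(e);
            } catch (RuntimeException ex) {
                exited = true;
            }
        }
    }

    int pending() {
        return queue.size();
    }

    // Bounded wait for the queue to drain. With an unbounded loop here, a
    // dispatcher that has already exited would keep the caller waiting forever.
    boolean awaitDrained(int maxChecks) {
        for (int i = 0; i < maxChecks; i++) {
            if (queue.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        DispatcherSketch d = new DispatcherSketch();
        d.post(new Event("app_1", "SUCCEEDED"));
        d.post(new Event("app_2", null));
        d.post(new Event("app_3", "KILLED"));
        d.dispatchAll();
        System.out.println("pending=" + d.pending() + " drained=" + d.awaitDrained(3));
    }
}
```

In this model the event queued behind the failing one is left in the queue, so a stop routine that waits for an empty queue needs either a bound or a check that the dispatcher is still running.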
Following is some more log information:
{code}
2014-09-15 05:01:21,345 INFO  ahs.RMApplicationHistoryWriter 
(RMApplicationHistoryWriter.java:handleWritingApplicationHistoryEvent(157)) - 
Stored the finish data of application application_1410782288602_0004
2014-09-15 05:01:21,348 ERROR ahs.RMApplicationHistoryWriter 
(RMApplicationHistoryWriter.java:handleWritingApplicationHistoryEvent(211)) - 
Error when storing the finish data of container 
container_1410782288602_0004_01_01
2014-09-15 05:01:21,348 ERROR ahs.RMApplicationHistoryWriter 
(RMApplicationHistoryWriter.java:handleWritingApplicationHistoryEvent(211)) - 
Error when storing the finish data of container 
container_1410782288602_0004_01_02
2014-09-15 05:01:21,451 INFO  ipc.Server (Server.java:stop(2437)) - Stopping 
server on 8032
2014-09-15 05:01:21,462 INFO  ipc.Server (Server.java:run(832)) - Stopping IPC 
Server Responder
2014-09-15 05:01:21,462 INFO  ipc.Server (Server.java:stop(2437)) - Stopping 
server on 8033
2014-09-15 05:01:21,470 INFO  ipc.Server (Server.java:run(832)) - Stopping IPC 
Server Responder
2014-09-15 05:01:21,471 INFO  resourcemanager.ResourceManager 
(ResourceManager.java:transitionToStandby(994)) - Transitioning to standby state
2014-09-15 05:01:21,471 INFO  impl.MetricsSystemImpl 
(MetricsSystemImpl.java:stop(201)) - Stopping ResourceManager metrics system...
2014-09-15 05:01:21,473 INFO  impl.MetricsSystemImpl 
(MetricsSystemImpl.java:stop(207)) - ResourceManager metrics system stopped.
2014-09-15 05:01:21,473 INFO  impl.MetricsSystemImpl 
(MetricsSystemImpl.java:shutdown(584)) - ResourceManager metrics system 
shutdown complete.
2014-09-15 05:01:21,474 INFO  event.AsyncDispatcher 
(AsyncDispatcher.java:serviceStop(138)) - AsyncDispatcher is draining to stop, 
igonring any new events.
2014-09-15 05:01:21,480 INFO  ipc.Server (Server.java:stop(2437)) - Stopping 
server on 8030
2014-09-15 05:01:21,490 INFO  ipc.Server (Server.java:run(832)) - Stopping IPC 
Server Responder
2014-09-15 05:01:21,500 INFO  ipc.Server (Server.java:run(706)) - Stopping IPC 
Server listener on 8031
2014-09-15 05:01:21,501 INFO  ipc.Server (Server.java:run(832)) - Stopping IPC 
Server Responder
2014-09-15 05:01:21,502 INFO  util.AbstractLivelinessMonitor 
(AbstractLivelinessMonitor.java:run(127)) - NMLivelinessMonitor thread 
interrupted
2014-09-15 05:01:21,502 ERROR resourcemanager.ResourceManager 
(ResourceManager.java:run(607)) - Returning, interrupted : 
java.lang.InterruptedException
2014-09-15 05:01:21,504 DEBUG service.CompositeService 
(CompositeService.java:stop(151)) - Stopping service #9: Service Dispatcher in 
state Dispatcher: STARTED
2014-09-15 05:01:21,50