[jira] [Created] (YARN-4192) Add YARN metric logging periodically to a separate file

2015-09-20 Thread nijel (JIRA)
nijel created YARN-4192:
---

 Summary: Add YARN metric logging periodically to a separate file
 Key: YARN-4192
 URL: https://issues.apache.org/jira/browse/YARN-4192
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: nijel
Assignee: nijel
Priority: Minor


HDFS-8880 added a framework for logging metrics at a given interval.
This can be added to YARN as well.

Any thoughts?
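
To make the proposal concrete, here is a minimal stand-alone sketch of periodic metric logging to a dedicated file. It is not the HDFS-8880 framework; the {{PeriodicMetricsLogger}} class, the {{snapshotSource}} supplier, and the sample snapshot string are hypothetical stand-ins.

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Periodically appends a metrics snapshot line to a dedicated file. */
public class PeriodicMetricsLogger {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final Path logFile;
  private final Supplier<String> snapshotSource;  // hypothetical: returns one formatted snapshot

  public PeriodicMetricsLogger(Path logFile, Supplier<String> snapshotSource) {
    this.logFile = logFile;
    this.snapshotSource = snapshotSource;
  }

  /** Start logging one snapshot every intervalSeconds. */
  public void start(long intervalSeconds) {
    scheduler.scheduleAtFixedRate(this::logOnce, 0, intervalSeconds, TimeUnit.SECONDS);
  }

  private void logOnce() {
    String line = Instant.now() + " " + snapshotSource.get() + System.lineSeparator();
    try {
      Files.write(logFile, line.getBytes(),
          StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    } catch (IOException e) {
      // Metrics logging must never take down the daemon; just report and continue.
      e.printStackTrace();
    }
  }

  public void stop() {
    scheduler.shutdownNow();
  }

  public static void main(String[] args) throws InterruptedException {
    PeriodicMetricsLogger logger = new PeriodicMetricsLogger(
        Paths.get("yarn-metrics.log"),
        () -> "activeApplications=3 pendingContainers=12");  // stand-in snapshot
    logger.start(5);       // one snapshot every 5 seconds
    Thread.sleep(12_000);  // let a few snapshots accumulate
    logger.stop();
  }
}
{code}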






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-20 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877467#comment-14877467
 ] 

nijel commented on YARN-4135:
-

thanks [~rohithsharma] and [~adhoot]

> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
>  Labels: test
> Fix For: 2.8.0
>
> Attachments: YARN-4135_1.patch, YARN-4135_2.patch
>
>
> In MockRM, when a test fails after waiting for the given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to track the test failure in the log since there is no 
> relation between the test case and the application id.
> Any thoughts?
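
As a minimal sketch of the idea, outside MockRM: a wait-for-state helper whose failure message carries the application or attempt id, so a failed test can be matched to the logs. The {{WaitForState}} class and the ids below are hypothetical, not the actual MockRM code.

{code}
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Illustrates a wait-for-state helper whose failure message names the entity being waited on. */
public class WaitForState {

  /** Polls until the supplier returns the expected state or the timeout expires. */
  public static void waitForState(String entityId, String expectedState,
      Supplier<String> currentState, long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    String seen = currentState.get();
    while (!expectedState.equals(seen) && System.currentTimeMillis() < deadline) {
      TimeUnit.MILLISECONDS.sleep(100);
      seen = currentState.get();
    }
    if (!expectedState.equals(seen)) {
      // The id in the message ties the failure back to the test's application/attempt.
      throw new AssertionError("Timed out waiting for " + entityId
          + " to reach state " + expectedState + ", last seen state was " + seen);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    try {
      // Stand-in for an application that never leaves ACCEPTED.
      waitForState("application_1442700000000_0001", "RUNNING", () -> "ACCEPTED", 500);
    } catch (AssertionError e) {
      System.out.println(e.getMessage());  // message names the stuck application id
    }
  }
}
{code}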



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2015-09-20 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3446:

Attachment: (was: YARN-3446.003.patch)

> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because, for the reducer 
> preemption calculation, headRoom takes blacklisted nodes into account. This 
> makes jobs hang forever (the ResourceManager does not assign any new 
> containers on blacklisted nodes, but the availableResource the AM gets from 
> the RM includes the available resources of the blacklisted nodes).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2015-09-20 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3446:

Attachment: YARN-3446.003.patch

> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because, for the reducer 
> preemption calculation, headRoom takes blacklisted nodes into account. This 
> makes jobs hang forever (the ResourceManager does not assign any new 
> containers on blacklisted nodes, but the availableResource the AM gets from 
> the RM includes the available resources of the blacklisted nodes).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"

2015-09-20 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-4195:
--

 Summary: Support of node-labels in the ReservationSystem "Plan"
 Key: YARN-4195
 URL: https://issues.apache.org/jira/browse/YARN-4195
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Carlo Curino


As part of YARN-4193 we need to enhance the InMemoryPlan (and related classes) 
to track the per-label available resources, as well as the per-label
reservation-allocations.
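
As a rough illustration of what per-label tracking means here (not the actual InMemoryPlan data structures; the class, label, and numbers below are hypothetical), the plan could keep per-label available and reserved totals and admit a reservation only if the requested label still has headroom:

{code}
import java.util.HashMap;
import java.util.Map;

/** Toy per-label accounting: available capacity and reserved capacity tracked per node-label. */
public class PerLabelPlanAccounting {
  private final Map<String, Long> availableByLabel = new HashMap<>();
  private final Map<String, Long> reservedByLabel = new HashMap<>();

  public void setAvailable(String label, long capacity) {
    availableByLabel.put(label, capacity);
  }

  /** Returns true and records the reservation if the label still has enough headroom. */
  public boolean tryReserve(String label, long amount) {
    long available = availableByLabel.getOrDefault(label, 0L);
    long reserved = reservedByLabel.getOrDefault(label, 0L);
    if (reserved + amount > available) {
      return false;
    }
    reservedByLabel.put(label, reserved + amount);
    return true;
  }

  public static void main(String[] args) {
    PerLabelPlanAccounting plan = new PerLabelPlanAccounting();
    plan.setAvailable("gpu", 8);
    System.out.println(plan.tryReserve("gpu", 6));  // true
    System.out.println(plan.tryReserve("gpu", 4));  // false: only 2 left under "gpu"
  }
}
{code}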




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4196) Enhance ReservationAgent(s) to support node-label

2015-09-20 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-4196:
--

 Summary: Enhance ReservationAgent(s) to support node-label
 Key: YARN-4196
 URL: https://issues.apache.org/jira/browse/YARN-4196
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Carlo Curino


As part of umbrella JIRA YARN-4193 we need to extend the algorithms that place
a ReservationRequest in a Plan to handle multiple node-labels (and expressions).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4194) Extend Reservation Definition Language (RDL) extensions to support not labels

2015-09-20 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-4194:
--

 Summary: Extend Reservation Definition Language (RDL) extensions 
to support not labels
 Key: YARN-4194
 URL: https://issues.apache.org/jira/browse/YARN-4194
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Carlo Curino


This JIRA tracks changes to the APIs of the reservation system to support
the expressivity of node-labels.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4193) Support heterogeneity (node-labels) in the YARN ReservationSystem

2015-09-20 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-4193:
--

 Summary: Support heterogeneity (node-labels) in the YARN 
ReservationSystem
 Key: YARN-4193
 URL: https://issues.apache.org/jira/browse/YARN-4193
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Carlo Curino


This umbrella JIRA tracks the enhancement of the reservation system,
to support node-label expressions. The user will be allowed to book
reservations on specific node-labels. This is key as heterogeneity
of hardware/software/logical configuration should be addressable during
the reservation process. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"

2015-09-20 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-4195:
---
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-4193

> Support of node-labels in the ReservationSystem "Plan"
> --
>
> Key: YARN-4195
> URL: https://issues.apache.org/jira/browse/YARN-4195
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>
> As part of YARN-4193 we need to enhance the InMemoryPlan (and related 
> classes) to track the per-label available resources, as well as the per-label
> reservation-allocations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4176) Resync NM nodelabels with RM every x interval for distributed nodelabels

2015-09-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14899929#comment-14899929
 ] 

Hadoop QA commented on YARN-4176:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 15s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 54s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 58s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 50s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 20s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  0s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |   8m 15s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  57m  7s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.nodemanager.TestDockerContainerExecutor |
|   | hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761329/0002-YARN-4176.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3a9c707 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9221/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9221/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9221/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9221/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9221/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9221/console |


This message was automatically generated.

> Resync NM nodelabels with RM every x interval for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4176.patch, 0002-YARN-4176.patch
>
>
> This JIRA is for handling the below set of issues:
> # For distributed nodelabels, after the NM has registered with the RM, if 
> the cluster nodelabels are removed and added back, the NM does not resend 
> the labels in the heartbeat again until the labels change.
> # If NM registration with nodelabels fails, the labels should be resent to 
> the RM.
> The above cases can be handled by resyncing nodelabels with the RM every x 
> interval.
> # Add property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}}; 
> nodelabels will be resent to the RM based on this config regardless of 
> whether registration fails or succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4176) Resync NM nodelabels with RM every x interval for distributed nodelabels

2015-09-20 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4176:
---
Attachment: 0003-YARN-4176.patch

> Resync NM nodelabels with RM every x interval for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4176.patch, 0002-YARN-4176.patch, 
> 0003-YARN-4176.patch
>
>
> This JIRA is for handling the below set of issues:
> # For distributed nodelabels, after the NM has registered with the RM, if 
> the cluster nodelabels are removed and added back, the NM does not resend 
> the labels in the heartbeat again until the labels change.
> # If NM registration with nodelabels fails, the labels should be resent to 
> the RM.
> The above cases can be handled by resyncing nodelabels with the RM every x 
> interval.
> # Add property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}}; 
> nodelabels will be resent to the RM based on this config regardless of 
> whether registration fails or succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4176) Resync NM nodelabels with RM every x interval for distributed nodelabels

2015-09-20 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4176:
---
Attachment: 0002-YARN-4176.patch

Attaching patch after testcase correction.

> Resync NM nodelabels with RM every x interval for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4176.patch, 0002-YARN-4176.patch
>
>
> This JIRA is for handling the below set of issues:
> # For distributed nodelabels, after the NM has registered with the RM, if 
> the cluster nodelabels are removed and added back, the NM does not resend 
> the labels in the heartbeat again until the labels change.
> # If NM registration with nodelabels fails, the labels should be resent to 
> the RM.
> The above cases can be handled by resyncing nodelabels with the RM every x 
> interval.
> # Add property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}}; 
> nodelabels will be resent to the RM based on this config regardless of 
> whether registration fails or succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4176) Resync NM nodelabels with RM every x interval for distributed nodelabels

2015-09-20 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14899968#comment-14899968
 ] 

Bibin A Chundatt commented on YARN-4176:


To my understanding, the checkstyle issues were not introduced by this patch.

> Resync NM nodelabels with RM every x interval for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4176.patch, 0002-YARN-4176.patch, 
> 0003-YARN-4176.patch
>
>
> This JIRA is for handling the below set of issues:
> # For distributed nodelabels, after the NM has registered with the RM, if 
> the cluster nodelabels are removed and added back, the NM does not resend 
> the labels in the heartbeat again until the labels change.
> # If NM registration with nodelabels fails, the labels should be resent to 
> the RM.
> The above cases can be handled by resyncing nodelabels with the RM every x 
> interval.
> # Add property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}}; 
> nodelabels will be resent to the RM based on this config regardless of 
> whether registration fails or succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4176) Resync NM nodelabels with RM every x interval for distributed nodelabels

2015-09-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14899957#comment-14899957
 ] 

Hadoop QA commented on YARN-4176:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 10s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 47s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  1s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 51s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:red}-1{color} | checkstyle |   2m 36s | The applied patch generated  4 
new checkstyle issues (total was 25, now 29). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 23s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   7m 46s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  56m 33s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761332/0003-YARN-4176.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3a9c707 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9222/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 
https://builds.apache.org/job/PreCommit-YARN-Build/9222/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9222/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9222/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9222/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9222/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9222/console |


This message was automatically generated.

> Resync NM nodelabels with RM every x interval for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4176.patch, 0002-YARN-4176.patch, 
> 0003-YARN-4176.patch
>
>
> This JIRA is for handling the below set of issues:
> # For distributed nodelabels, after the NM has registered with the RM, if 
> the cluster nodelabels are removed and added back, the NM does not resend 
> the labels in the heartbeat again until the labels change.
> # If NM registration with nodelabels fails, the labels should be resent to 
> the RM.
> The above cases can be handled by resyncing nodelabels with the RM every x 
> interval.
> # Add property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}}; 
> nodelabels will be resent to the RM based on this config regardless of 
> whether registration fails or succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4194) Extend Reservation Definition Language (RDL) extensions to support node labels

2015-09-20 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4194:

Summary: Extend Reservation Definition Language (RDL) extensions to support 
node labels  (was: Extend Reservation Definition Language (RDL) extensions to 
support not labels)

> Extend Reservation Definition Language (RDL) extensions to support node labels
> --
>
> Key: YARN-4194
> URL: https://issues.apache.org/jira/browse/YARN-4194
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>
> This JIRA tracks changes to the APIs of the reservation system to support
> the expressivity of node-labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-261) Ability to kill AM attempts

2015-09-20 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900222#comment-14900222
 ] 

Rohith Sharma K S commented on YARN-261:


[~aklochkov] I'd appreciate it if you could have a look at the patch and 
provide your comments.

> Ability to kill AM attempts
> ---
>
> Key: YARN-261
> URL: https://issues.apache.org/jira/browse/YARN-261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-261.patch, YARN-261--n2.patch, 
> YARN-261--n3.patch, YARN-261--n4.patch, YARN-261--n5.patch, 
> YARN-261--n6.patch, YARN-261--n7.patch, YARN-261.patch
>
>
> It would be nice if clients could ask for an AM attempt to be killed.  This 
> is analogous to the task attempt kill support provided by MapReduce.
> This feature would be useful in a scenario where AM retries are enabled, the 
> AM supports recovery, and a particular AM attempt is stuck.  Currently if 
> this occurs the user's only recourse is to kill the entire application, 
> requiring them to resubmit a new application and potentially breaking 
> downstream dependent jobs if it's part of a bigger workflow.  Killing the 
> attempt would allow a new attempt to be started by the RM without killing the 
> entire application, and if the AM supports recovery it could potentially save 
> a lot of work.  It could also be useful in workflow scenarios where the 
> failure of the entire application kills the workflow, but the ability to kill 
> an attempt can keep the workflow going if the subsequent attempt succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4192) Add YARN metric logging periodically to a separate file

2015-09-20 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900226#comment-14900226
 ] 

Rohith Sharma K S commented on YARN-4192:
-

Sounds good to me. It will be helpful for analyzing issues. Can you provide 
more details about which metrics are logged? Are they queue metrics or cluster 
metrics? If queue metrics, does it report all the hierarchical queues?

> Add YARN metric logging periodically to a separate file
> ---
>
> Key: YARN-4192
> URL: https://issues.apache.org/jira/browse/YARN-4192
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Minor
>
> HDFS-8880 added a framework for logging metrics at a given interval.
> This can be added to YARN as well.
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4095) Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.

2015-09-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900178#comment-14900178
 ] 

Hadoop QA commented on YARN-4095:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 23s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  2s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 36s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 12s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   7m 49s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  46m 21s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761363/YARN-4095.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3a9c707 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9226/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9226/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9226/console |


This message was automatically generated.

> Avoid sharing AllocatorPerContext object in LocalDirAllocator between 
> ShuffleHandler and LocalDirsHandlerService.
> -
>
> Key: YARN-4095
> URL: https://issues.apache.org/jira/browse/YARN-4095
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4095.000.patch, YARN-4095.001.patch
>
>
> Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share the 
> {{AllocatorPerContext}} object in {{LocalDirAllocator}} for the configuration 
> {{NM_LOCAL_DIRS}}, because {{AllocatorPerContext}} objects are stored in a 
> static TreeMap with the configuration name as the key:
> {code}
>   private static Map<String, AllocatorPerContext> contexts = 
>      new TreeMap<String, AllocatorPerContext>();
> {code}
> {{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a 
> {{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even though they don't use the 
> same {{Configuration}} object, they will use the same {{AllocatorPerContext}} 
> object. Also, {{LocalDirsHandlerService}} may change the {{NM_LOCAL_DIRS}} 
> value in its {{Configuration}} object to exclude full and bad local dirs, 
> while {{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in 
> its {{Configuration}} object. So every time {{AllocatorPerContext#confChanged}} 
> is called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, the 
> {{AllocatorPerContext}} needs to be reinitialized because the 
> {{NM_LOCAL_DIRS}} value has changed. This causes some overhead.
> {code}
>   String newLocalDirs = conf.get(contextCfgItemName);
>   if (!newLocalDirs.equals(savedLocalDirs)) {
> {code}
> So it will be a good improvement to not share the same 
> {{AllocatorPerContext}} instance between {{ShuffleHandler}} and 
> {{LocalDirsHandlerService}}.
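
A toy reproduction of the sharing pattern described above (not the actual LocalDirAllocator code; the class, field, and config names are simplified and hypothetical): because the context cache is static and keyed only by the configuration item name, two independently configured callers keep re-initializing the same context.

{code}
import java.util.HashMap;
import java.util.Map;

/** Toy reproduction: contexts are cached in a static map keyed only by the
 *  configuration item name, so two independently configured allocators collide. */
public class SharedContextDemo {

  static class Context {
    String dirs;        // last directory list this context was initialized with
    int reinitCount;    // how many times the shared state was flipped
  }

  // Static cache keyed by the config item name, as in the pattern described above.
  private static final Map<String, Context> CONTEXTS = new HashMap<>();

  private final String configItemName;

  SharedContextDemo(String configItemName) {
    this.configItemName = configItemName;
  }

  /** Re-initializes the shared context whenever the caller's dir list differs from the saved one. */
  void allocate(String callerDirs) {
    Context ctx = CONTEXTS.computeIfAbsent(configItemName, k -> new Context());
    if (!callerDirs.equals(ctx.dirs)) {
      ctx.dirs = callerDirs;
      ctx.reinitCount++;   // overhead: both callers keep flipping the shared state
    }
  }

  public static void main(String[] args) {
    SharedContextDemo dirsHandler = new SharedContextDemo("nm.local-dirs");
    SharedContextDemo shuffle = new SharedContextDemo("nm.local-dirs");
    for (int i = 0; i < 3; i++) {
      dirsHandler.allocate("/disk1,/disk2");      // dirs handler excludes a bad disk
      shuffle.allocate("/disk1,/disk2,/disk3");   // shuffle still uses the original list
    }
    System.out.println(CONTEXTS.get("nm.local-dirs").reinitCount);  // 6: reinit on every call
  }
}
{code}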



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4176) Resync NM nodelabels with RM every x interval for distributed nodelabels

2015-09-20 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900185#comment-14900185
 ] 

Naganarasimha G R commented on YARN-4176:
-

Hi [~bibinchundatt],
Thanks for working on the patch. A few comments on NodeStatusUpdaterImpl.java:
# {{diffcheckstylehadoop-yarn-server-nodemanager.txt}}: it looks like the 
issues reported there are related to the modifications in the patch.
# {{areNodeLabelsUpdated || resyncElapsed}} could be {{areNodeLabelsUpdated || 
isResyncIntervalElapsed()}}; in the cases where node labels are updated, 
short-circuit evaluation then avoids calling isResyncIntervalElapsed (see the 
sketch below).
# In {{isResyncIntervalElapsed}}, resyncInterval need not be evaluated on every 
call; we can push this to the constructor or init.
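
A small sketch of what comments 2 and 3 amount to (a hypothetical class, not NodeStatusUpdaterImpl; interval values are made up): the resync interval is read once at construction, and the elapsed-time check is only reached when the labels are unchanged, thanks to short-circuit evaluation.

{code}
/** Illustrates the review suggestions: cache the resync interval once and rely on
 *  short-circuit evaluation so the elapsed-time check runs only when labels are unchanged. */
public class LabelResyncPolicy {
  private final long resyncIntervalMs;  // read once, e.g. from configuration, at construction
  private long lastSentMs;

  public LabelResyncPolicy(long resyncIntervalMs, long nowMs) {
    this.resyncIntervalMs = resyncIntervalMs;
    this.lastSentMs = nowMs;
  }

  public boolean shouldSendLabels(boolean areNodeLabelsUpdated, long nowMs) {
    // If labels changed, the second operand is never evaluated (short circuit).
    return areNodeLabelsUpdated || isResyncIntervalElapsed(nowMs);
  }

  private boolean isResyncIntervalElapsed(long nowMs) {
    return nowMs - lastSentMs >= resyncIntervalMs;
  }

  public void markSent(long nowMs) {
    lastSentMs = nowMs;
  }

  public static void main(String[] args) {
    LabelResyncPolicy policy = new LabelResyncPolicy(120_000, 0);
    System.out.println(policy.shouldSendLabels(false, 60_000));   // false: no change, not elapsed
    System.out.println(policy.shouldSendLabels(true, 60_000));    // true: labels changed
    System.out.println(policy.shouldSendLabels(false, 180_000));  // true: interval elapsed
  }
}
{code}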


> Resync NM nodelabels with RM every x interval for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4176.patch, 0002-YARN-4176.patch, 
> 0003-YARN-4176.patch
>
>
> This JIRA is for handling the below set of issues:
> # For distributed nodelabels, after the NM has registered with the RM, if 
> the cluster nodelabels are removed and added back, the NM does not resend 
> the labels in the heartbeat again until the labels change.
> # If NM registration with nodelabels fails, the labels should be resent to 
> the RM.
> The above cases can be handled by resyncing nodelabels with the RM every x 
> interval.
> # Add property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}}; 
> nodelabels will be resent to the RM based on this config regardless of 
> whether registration fails or succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4198) CapacityScheduler locking / synchronization improvements

2015-09-20 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-4198:
---
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-4193

> CapacityScheduler locking / synchronization improvements
> 
>
> Key: YARN-4198
> URL: https://issues.apache.org/jira/browse/YARN-4198
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>
> In the context of YARN-4193 (which stresses the RM/CS performance) we found 
> several performance problems in the locking/synchronization of the 
> CapacityScheduler, as well as inconsistencies that do not normally surface 
> (incorrect locking order of queues protected by CS locks, etc.). This JIRA 
> proposes several refactorings that improve this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4197) Modify PlanFollower to handle reservation with node-labels

2015-09-20 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-4197:
---
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-4193

> Modify PlanFollower to handle reservation with node-labels
> --
>
> Key: YARN-4197
> URL: https://issues.apache.org/jira/browse/YARN-4197
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>
> This JIRA tracks all the changes needed in the PlanFollower(s) 
> to handle multiple node-labels, properly publish reservations to the 
> underlying scheduler, gather from the NodeLabelsManager the per-label 
> availability of resources, and inform the Plan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4196) Enhance ReservationAgent(s) to support node-label

2015-09-20 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-4196:
---
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-4193

> Enhance ReservationAgent(s) to support node-label
> -
>
> Key: YARN-4196
> URL: https://issues.apache.org/jira/browse/YARN-4196
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>
> As part of umbrella JIRA YARN-4193 we need to extend the algorithms that 
> place a ReservationRequest in a Plan to handle multiple node-labels (and 
> expressions).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4198) CapacityScheduler locking / synchronization improvements

2015-09-20 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-4198:
--

 Summary: CapacityScheduler locking / synchronization improvements
 Key: YARN-4198
 URL: https://issues.apache.org/jira/browse/YARN-4198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Carlo Curino


In the context of YARN-4193 (which stresses the RM/CS performance) we found 
several performance problems in the locking/synchronization of the 
CapacityScheduler, as well as inconsistencies that do not normally surface 
(incorrect locking order of queues protected by CS locks, etc.). This JIRA 
proposes several refactorings that improve this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-20 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900227#comment-14900227
 ] 

Devaraj K commented on YARN-3964:
-

Thanks [~dian.fu] for the patch.

The patch has gone stale; can you please update it? Also, please take care of 
the above Jenkins warnings in the updated patch.

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.1.patch
>
>
> Currently, a CLI/REST API is provided in the Resource Manager to allow users 
> to specify labels for nodes. For labels which may change over time, users 
> will have to run a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will give users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4194) Extend Reservation Definition Language (RDL) extensions to support not labels

2015-09-20 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-4194:
---
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-4193

> Extend Reservation Definition Language (RDL) extensions to support not labels
> -
>
> Key: YARN-4194
> URL: https://issues.apache.org/jira/browse/YARN-4194
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>
> This JIRA tracks changes to the APIs of the reservation system to support
> the expressivity of node-labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"

2015-09-20 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-4195:
--

Assignee: Carlo Curino

> Support of node-labels in the ReservationSystem "Plan"
> --
>
> Key: YARN-4195
> URL: https://issues.apache.org/jira/browse/YARN-4195
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>
> As part of YARN-4193 we need to enhance the InMemoryPlan (and related 
> classes) to track the per-label available resources, as well as the per-label
> reservation-allocations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2015-09-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900144#comment-14900144
 ] 

Hadoop QA commented on YARN-3446:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 42s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  4s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 51s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  59m 10s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  98m 35s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761356/YARN-3446.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3a9c707 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9225/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9225/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9225/console |


This message was automatically generated.

> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because, for the reducer 
> preemption calculation, headRoom takes blacklisted nodes into account. This 
> makes jobs hang forever (the ResourceManager does not assign any new 
> containers on blacklisted nodes, but the availableResource the AM gets from 
> the RM includes the available resources of the blacklisted nodes).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4095) Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.

2015-09-20 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4095:

Attachment: YARN-4095.001.patch

> Avoid sharing AllocatorPerContext object in LocalDirAllocator between 
> ShuffleHandler and LocalDirsHandlerService.
> -
>
> Key: YARN-4095
> URL: https://issues.apache.org/jira/browse/YARN-4095
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4095.000.patch, YARN-4095.001.patch
>
>
> Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share the 
> {{AllocatorPerContext}} object in {{LocalDirAllocator}} for the configuration 
> {{NM_LOCAL_DIRS}}, because {{AllocatorPerContext}} objects are stored in a 
> static TreeMap with the configuration name as the key:
> {code}
>   private static Map<String, AllocatorPerContext> contexts = 
>      new TreeMap<String, AllocatorPerContext>();
> {code}
> {{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a 
> {{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even though they don't use the 
> same {{Configuration}} object, they will use the same {{AllocatorPerContext}} 
> object. Also, {{LocalDirsHandlerService}} may change the {{NM_LOCAL_DIRS}} 
> value in its {{Configuration}} object to exclude full and bad local dirs, 
> while {{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in 
> its {{Configuration}} object. So every time {{AllocatorPerContext#confChanged}} 
> is called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, the 
> {{AllocatorPerContext}} needs to be reinitialized because the 
> {{NM_LOCAL_DIRS}} value has changed. This causes some overhead.
> {code}
>   String newLocalDirs = conf.get(contextCfgItemName);
>   if (!newLocalDirs.equals(savedLocalDirs)) {
> {code}
> So it will be a good improvement to not share the same 
> {{AllocatorPerContext}} instance between {{ShuffleHandler}} and 
> {{LocalDirsHandlerService}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4167) NPE on RMActiveServices#serviceStop when store is null

2015-09-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900193#comment-14900193
 ] 

Hudson commented on YARN-4167:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8494 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8494/])
YARN-4167. NPE on RMActiveServices#serviceStop when store is null. (Bibin A 
Chundatt via rohithsharmaks) (rohithsharmaks: rev 
c9cb6a5960ad335a3ee93a6ee219eae5aad372f9)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* hadoop-yarn-project/CHANGES.txt


> NPE on RMActiveServices#serviceStop when store is null
> --
>
> Key: YARN-4167
> URL: https://issues.apache.org/jira/browse/YARN-4167
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4167.patch, 0001-YARN-4167.patch, 
> 0002-YARN-4167.patch
>
>
> Configure 
> {{yarn.resourcemanager.container-tokens.master-key-rolling-interval-secs}} 
> mismatching with {{yarn.nm.liveness-monitor.expiry-interval-ms}}
> On startup NPE is thrown on {{RMActiveServices#serviceStop}}
> {noformat}
> 2015-09-16 12:23:29,504 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state INITED; cause: 
> java.lang.IllegalArgumentException: 
> yarn.resourcemanager.container-tokens.master-key-rolling-interval-secs should 
> be more than 3 X yarn.nm.liveness-monitor.expiry-interval-ms
> java.lang.IllegalArgumentException: 
> yarn.resourcemanager.container-tokens.master-key-rolling-interval-secs should 
> be more than 3 X yarn.nm.liveness-monitor.expiry-interval-ms
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.<init>(RMContainerTokenSecretManager.java:82)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMSecretManagerService.createContainerTokenSecretManager(RMSecretManagerService.java:109)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMSecretManagerService.<init>(RMSecretManagerService.java:57)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createRMSecretManagerService(ResourceManager.java:)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:423)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:963)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1193)
> 2015-09-16 12:23:29,507 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error closing 
> store.
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:608)
>  at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>  at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>  at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:963)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1193
> {noformat}
> *Impact Area*: RM failover with wrong configuration
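
For illustration only (not the actual ResourceManager code or the committed patch; the classes below are hypothetical), the direction of the fix is a null guard in serviceStop for members that serviceInit may never have created:

{code}
/** Minimal illustration of the fix direction: guard optional members before use in serviceStop,
 *  since serviceInit can fail before they are created. */
public class StopGuardDemo {

  static class Store {
    void close() {
      System.out.println("store closed");
    }
  }

  private Store store;   // may still be null if init failed early

  void serviceInit(boolean failEarly) {
    if (failEarly) {
      throw new IllegalArgumentException("bad configuration");   // store never assigned
    }
    store = new Store();
  }

  void serviceStop() {
    // Guarding against null avoids the secondary NPE that hides the real init failure.
    if (store != null) {
      store.close();
    }
  }

  public static void main(String[] args) {
    StopGuardDemo services = new StopGuardDemo();
    try {
      services.serviceInit(true);
    } catch (IllegalArgumentException e) {
      services.serviceStop();   // stop is still called on init failure; must not NPE
      System.out.println("init failed: " + e.getMessage());
    }
  }
}
{code}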



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4167) NPE on RMActiveServices#serviceStop when store is null

2015-09-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900223#comment-14900223
 ] 

Hudson commented on YARN-4167:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #418 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/418/])
YARN-4167. NPE on RMActiveServices#serviceStop when store is null. (Bibin A 
Chundatt via rohithsharmaks) (rohithsharmaks: rev 
c9cb6a5960ad335a3ee93a6ee219eae5aad372f9)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java


> NPE on RMActiveServices#serviceStop when store is null
> --
>
> Key: YARN-4167
> URL: https://issues.apache.org/jira/browse/YARN-4167
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4167.patch, 0001-YARN-4167.patch, 
> 0002-YARN-4167.patch
>
>
> Configure 
> {{yarn.resourcemanager.container-tokens.master-key-rolling-interval-secs}} 
> mismatching with {{yarn.nm.liveness-monitor.expiry-interval-ms}}
> On startup NPE is thrown on {{RMActiveServices#serviceStop}}
> {noformat}
> 2015-09-16 12:23:29,504 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state INITED; cause: 
> java.lang.IllegalArgumentException: 
> yarn.resourcemanager.container-tokens.master-key-rolling-interval-secs should 
> be more than 3 X yarn.nm.liveness-monitor.expiry-interval-ms
> java.lang.IllegalArgumentException: 
> yarn.resourcemanager.container-tokens.master-key-rolling-interval-secs should 
> be more than 3 X yarn.nm.liveness-monitor.expiry-interval-ms
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.<init>(RMContainerTokenSecretManager.java:82)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMSecretManagerService.createContainerTokenSecretManager(RMSecretManagerService.java:109)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMSecretManagerService.<init>(RMSecretManagerService.java:57)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createRMSecretManagerService(ResourceManager.java:)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:423)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:963)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1193)
> 2015-09-16 12:23:29,507 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error closing 
> store.
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:608)
>  at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>  at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>  at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:963)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1193
> {noformat}
> *Impact Area*: RM failover with wrong configuration



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2015-09-20 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900067#comment-14900067
 ] 

zhihai xu commented on YARN-3446:
-

Hi [~kasha], thanks for the review! I attached a new patch YARN-3446.003.patch, 
which addressed your first comment. I also added more test cases to verify 
{{getHeadroom}} with blacklisted node removal and addition.
About your second comment: IMHO, if we didn't do the optimization, it would be 
a very big overhead for a large cluster. For example, suppose we have 2000 AMs 
running on a 5000-node cluster. For each AM, we would need to go through the 
5000-node list to find the blacklisted {{SchedulerNode}}s in the heartbeat; 
with 2000 AMs, it would loop 10,000,000 times. Normally the number of 
blacklisted nodes is very small for each application, so iterating over the 
blacklisted nodes should not be a performance issue. Also, an AM won't change 
its blacklisted nodes frequently.
About your third comment: it is because currently {{SchedulerNode}}s are stored 
in {{AbstractYarnScheduler#nodes}} with {{NodeId}} as the key, but 
{{AppSchedulingInfo}} stores the blacklisted nodes using the {{String}} node 
name or rack name. I can't find an easy way to translate a node name or rack 
name to a {{NodeId}}, so it looks like we would need to iterate through 
{{AbstractYarnScheduler#nodes}} to find the blacklisted {{SchedulerNode}}s if 
we used {{AppSchedulingInfo#getBlacklist}}. That means for a 5000-node cluster 
we would loop 5000 times, a big overhead. {{AbstractYarnScheduler#nodes}} is 
defined in the following code:
{code}
  protected Map<NodeId, N> nodes = new ConcurrentHashMap<NodeId, N>();
{code}
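
To illustrate the point about iterating only over the blacklist (a toy sketch, not the patch; the method and node names are hypothetical), the headroom can be computed by subtracting the blacklisted nodes' available resources from the cluster-wide available resources, with cost proportional to the blacklist size rather than the cluster size:

{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy headroom calculation that subtracts blacklisted nodes' available memory, iterating only
 *  over the (typically small) blacklist rather than all nodes in the cluster. */
public class HeadroomDemo {

  public static long headroomMb(long clusterAvailableMb,
      Map<String, Long> availableMbByNode, Set<String> blacklistedNodes) {
    long blacklistedAvailable = 0;
    for (String node : blacklistedNodes) {            // loop is O(|blacklist|), not O(|nodes|)
      blacklistedAvailable += availableMbByNode.getOrDefault(node, 0L);
    }
    return Math.max(0, clusterAvailableMb - blacklistedAvailable);
  }

  public static void main(String[] args) {
    Map<String, Long> availableMbByNode = new HashMap<>();
    availableMbByNode.put("node1", 4096L);
    availableMbByNode.put("node2", 2048L);
    availableMbByNode.put("node3", 8192L);
    // Headroom reported to the AM should not count node3, which the AM has blacklisted.
    System.out.println(headroomMb(14336, availableMbByNode, Collections.singleton("node3")));  // 6144
  }
}
{code}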

> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because, for the reducer 
> preemption calculation, headRoom takes blacklisted nodes into account. This 
> makes jobs hang forever (the ResourceManager does not assign any new 
> containers on blacklisted nodes, but the availableResource the AM gets from 
> the RM includes the available resources of the blacklisted nodes).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2015-09-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900071#comment-14900071
 ] 

Hadoop QA commented on YARN-3446:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 33s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 47s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  3s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 50s | The applied patch generated  4 
new checkstyle issues (total was 89, now 92). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  58m 31s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  97m 44s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761350/YARN-3446.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3a9c707 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9224/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9224/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9224/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9224/console |


This message was automatically generated.

> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because, for the reducer 
> preemption calculation, headRoom takes blacklisted nodes into account. This 
> makes jobs hang forever (the ResourceManager does not assign any new 
> containers on blacklisted nodes, but the availableResource the AM gets from 
> the RM includes the available resources of the blacklisted nodes).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2369) Environment variable handling assumes values should be appended

2015-09-20 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote updated YARN-2369:
--
Attachment: YARN-2369-7.patch

[~jlowe] sorry this took so long, I got sidetracked with other projects!  I 
think I've now included your suggestions in v7 of this patch and there are a 
couple of checkstyle warnings as you might have expected.  If there's anything 
missing from this please let me know :)

> Environment variable handling assumes values should be appended
> ---
>
> Key: YARN-2369
> URL: https://issues.apache.org/jira/browse/YARN-2369
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jason Lowe
>Assignee: Dustin Cote
> Attachments: YARN-2369-1.patch, YARN-2369-2.patch, YARN-2369-3.patch, 
> YARN-2369-4.patch, YARN-2369-5.patch, YARN-2369-6.patch, YARN-2369-7.patch
>
>
> When processing environment variables for a container context the code 
> assumes that the value should be appended to any pre-existing value in the 
> environment.  This may be desired behavior for handling path-like environment 
> variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a 
> non-intuitive and harmful way to handle any variable that does not have 
> path-like semantics.
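
To illustrate the distinction (a stand-alone sketch, not the YARN implementation or the patch), appending with a separator is what path-like variables want, while plain replacement is what scalar variables need:

{code}
import java.util.HashMap;
import java.util.Map;

/** Contrasts append semantics (sensible for path-like variables) with plain replacement
 *  (what most other variables need). */
public class EnvMergeDemo {

  /** Appends with a ':' separator when the variable already has a value. */
  static void appendVar(Map<String, String> env, String name, String value) {
    String existing = env.get(name);
    env.put(name, existing == null || existing.isEmpty() ? value : existing + ":" + value);
  }

  /** Simply overwrites any pre-existing value. */
  static void replaceVar(Map<String, String> env, String name, String value) {
    env.put(name, value);
  }

  public static void main(String[] args) {
    Map<String, String> env = new HashMap<>();
    env.put("CLASSPATH", "/base.jar");
    env.put("JAVA_HOME", "/usr/lib/jvm/default");

    appendVar(env, "CLASSPATH", "/extra.jar");    // path-like: appending is what we want
    replaceVar(env, "JAVA_HOME", "/opt/jdk1.7");  // scalar: appending would corrupt it

    System.out.println(env.get("CLASSPATH"));  // /base.jar:/extra.jar
    System.out.println(env.get("JAVA_HOME"));  // /opt/jdk1.7
  }
}
{code}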



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2369) Environment variable handling assumes values should be appended

2015-09-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900037#comment-14900037
 ] 

Hadoop QA commented on YARN-2369:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m 42s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 46s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  0s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 19s | The applied patch generated  1 
new checkstyle issues (total was 176, now 173). |
| {color:red}-1{color} | checkstyle |   2m 51s | The applied patch generated  4 
new checkstyle issues (total was 517, now 521). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 57s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  23m 35s | Tests passed in 
hadoop-common. |
| {color:green}+1{color} | mapreduce tests |   0m 47s | Tests passed in 
hadoop-mapreduce-client-common. |
| {color:green}+1{color} | mapreduce tests |   1m 46s | Tests passed in 
hadoop-mapreduce-client-core. |
| {color:green}+1{color} | yarn tests |   1m 57s | Tests passed in 
hadoop-yarn-common. |
| | |  78m 18s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761348/YARN-2369-7.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3a9c707 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9223/artifact/patchprocess/diffcheckstylehadoop-common.txt
 
https://builds.apache.org/job/PreCommit-YARN-Build/9223/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-core.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9223/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-mapreduce-client-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9223/artifact/patchprocess/testrun_hadoop-mapreduce-client-common.txt
 |
| hadoop-mapreduce-client-core test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9223/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9223/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9223/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9223/console |


This message was automatically generated.

> Environment variable handling assumes values should be appended
> ---
>
> Key: YARN-2369
> URL: https://issues.apache.org/jira/browse/YARN-2369
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jason Lowe
>Assignee: Dustin Cote
> Attachments: YARN-2369-1.patch, YARN-2369-2.patch, YARN-2369-3.patch, 
> YARN-2369-4.patch, YARN-2369-5.patch, YARN-2369-6.patch, YARN-2369-7.patch
>
>
> When processing environment variables for a container context the code 
> assumes that the value should be appended to any pre-existing value in the 
> environment.  This may be desired behavior for handling path-like environment 
> variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a 
> non-intuitive and harmful way to handle any variable that does not have 
> path-like semantics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2015-09-20 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3446:

Attachment: YARN-3446.003.patch

> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because, for the reducer 
> preemption calculation, headRoom takes blacklisted nodes into account. This 
> makes jobs hang forever (the ResourceManager does not assign any new 
> containers on blacklisted nodes, but the availableResource the AM gets from 
> the RM includes the available resources of the blacklisted nodes).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)