[jira] [Commented] (YARN-4849) [YARN-3368] cleanup code base, integrate web UI related build to mvn, and add licenses.

2016-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229715#comment-15229715
 ] 

Hadoop QA commented on YARN-4849:
-

(!) A patch to the testing environment has been detected. 
Re-executing against the patched versions to perform further tests. 
The console is at 
https://builds.apache.org/job/PreCommit-YARN-Build/10981/console in case of 
problems.


> [YARN-3368] cleanup code base, integrate web UI related build to mvn, and add 
> licenses.
> ---
>
> Key: YARN-4849
> URL: https://issues.apache.org/jira/browse/YARN-4849
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4849-YARN-3368.1.patch, 
> YARN-4849-YARN-3368.2.patch, YARN-4849-YARN-3368.3.patch, 
> YARN-4849-YARN-3368.4.patch, YARN-4849-YARN-3368.5.patch, 
> YARN-4849-YARN-3368.6.patch, YARN-4849-YARN-3368.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4849) [YARN-3368] cleanup code base, integrate web UI related build to mvn, and add licenses.

2016-04-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4849:
-
Attachment: YARN-4849-YARN-3368.7.patch

Attached ver.7 patch, addressed all comments from [~sunilg]. Thanks for the 
review!

Please let me know your thoughts on the latest patch.

> [YARN-3368] cleanup code base, integrate web UI related build to mvn, and add 
> licenses.
> ---
>
> Key: YARN-4849
> URL: https://issues.apache.org/jira/browse/YARN-4849
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4849-YARN-3368.1.patch, 
> YARN-4849-YARN-3368.2.patch, YARN-4849-YARN-3368.3.patch, 
> YARN-4849-YARN-3368.4.patch, YARN-4849-YARN-3368.5.patch, 
> YARN-4849-YARN-3368.6.patch, YARN-4849-YARN-3368.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4794) Distributed shell app gets stuck on stopping containers after App completes

2016-04-06 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229688#comment-15229688
 ] 

Rohith Sharma K S commented on YARN-4794:
-

Thanks [~jianhe] for working on the patch. Nice issue!!:-)

To make it clear to folks before giving +1: since NMClientImpl is annotated as 
@Private, removing a protected method should not be a compatibility issue (see 
the illustration below).

+1 LGTM. If there are no objections I will commit it EOD.
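For readers unfamiliar with the annotation, a minimal illustration of the 
compatibility rule being invoked; the class below is a hypothetical stand-in, 
not the actual NMClientImpl source:
{code}
import org.apache.hadoop.classification.InterfaceAudience;

// @Private marks a class as internal to Hadoop: it carries no cross-version
// compatibility guarantee, so removing a protected method from it between
// releases does not violate the compatibility policy.
@InterfaceAudience.Private
public class NMClientImplLike {
  protected void someInternalHelper() { }  // safe to remove in a later release
}
{code}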

> Distributed shell app gets stuck on stopping containers after App completes
> ---
>
> Key: YARN-4794
> URL: https://issues.apache.org/jira/browse/YARN-4794
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-4794.1.patch
>
>
> Distributed shell app gets stuck on stopping containers after App completes 
> with the following exception
> {code:title = app log}
> 15/12/10 14:52:20 INFO distributedshell.ApplicationMaster: Application 
> completed. Stopping running containers
> 15/12/10 14:52:20 WARN ipc.Client: Exception encountered while connecting to 
> the server : java.nio.channels.ClosedByInterruptException
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3196) [Compatibility] Make TS next gen be compatible with the current TS

2016-04-06 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3196:
--
Labels:   (was: yarn-2928-1st-milestone)

> [Compatibility] Make TS next gen be compatible with the current TS
> --
>
> Key: YARN-3196
> URL: https://issues.apache.org/jira/browse/YARN-3196
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Junping Du
>
> Filing a jira to make sure that we don't forget to be compatible with the 
> current TS, so that we can smoothly move users to the new TS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4821) have a separate NM timeline publishing interval

2016-04-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229677#comment-15229677
 ] 

Sangjin Lee commented on YARN-4821:
---

I think it would be good if we can get this in, as long as it is not too 
complicated to implement. What do you think?

> have a separate NM timeline publishing interval
> ---
>
> Key: YARN-4821
> URL: https://issues.apache.org/jira/browse/YARN-4821
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> Currently the interval with which NM publishes container CPU and memory 
> metrics is tied to {{yarn.nodemanager.resource-monitor.interval-ms}} whose 
> default is 3 seconds. This is too aggressive.
> There should be a separate configuration that controls how often 
> {{NMTimelinePublisher}} publishes container metrics.
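A minimal sketch of the proposed split, assuming a hypothetical property name 
for the new knob (only {{yarn.nodemanager.resource-monitor.interval-ms}} 
exists today):
{code}
import org.apache.hadoop.conf.Configuration;

public class PublishIntervalSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Existing knob: drives resource monitoring (and, today, publishing).
    long monitorMs = conf.getLong(
        "yarn.nodemanager.resource-monitor.interval-ms", 3000);
    // Hypothetical separate knob for NMTimelinePublisher, defaulting to a
    // less aggressive cadence than the 3-second monitor interval.
    long publishMs = conf.getLong(
        "yarn.nodemanager.timeline-publisher.interval-ms", 10 * monitorMs);
    System.out.println(monitorMs + "ms monitor / " + publishMs + "ms publish");
  }
}
{code}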



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3959) Store application related configurations in Timeline Service v2

2016-04-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229676#comment-15229676
 ] 

Sangjin Lee commented on YARN-3959:
---

I think it would be nice if we can get this in. [~varun_saxena]?

> Store application related configurations in Timeline Service v2
> ---
>
> Key: YARN-3959
> URL: https://issues.apache.org/jira/browse/YARN-3959
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
>
> We already have a configuration field in the HBase schema for the application 
> entity. We need to make sure the AM writes it out when it gets launched.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4736) Issues with HBaseTimelineWriterImpl

2016-04-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229673#comment-15229673
 ] 

Sangjin Lee commented on YARN-4736:
---

Agreed. Please feel free to remove the label.

> Issues with HBaseTimelineWriterImpl
> ---
>
> Key: YARN-4736
> URL: https://issues.apache.org/jira/browse/YARN-4736
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Vrushali C
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: NM_Hang_hbase1.0.3.tar.gz, hbaseException.log, 
> threaddump.log
>
>
> Faced some issues while running ATSv2 in a single-node Hadoop cluster, where 
> HBase with embedded ZooKeeper had been launched on the same node.
> # Due to some NPE issues I could see the NM trying to shut down, but the NM 
> daemon process did not exit due to the locks.
> # Got some exceptions related to HBase after the application finished 
> execution successfully.
> Will attach logs and the trace for the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery

2016-04-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229595#comment-15229595
 ] 

Naganarasimha G R commented on YARN-3971:
-

[~bibinchundatt], thanks for reopening this jira, and yes, I agree with your 
analysis that it is wrong to handle it this way. So what approach do you have 
in mind?
I could think of having a flag in CommonNodeLabelsManager which is set before 
calling initNodeLabelStore and reset after the call finishes (a sketch follows 
below). Thoughts?
Also, this time we need to properly correct the test case too.
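A minimal sketch of the flag idea, with hypothetical member and method names 
(the real CommonNodeLabelsManager differs):
{code}
public class CommonNodeLabelsManagerSketch {
  // Volatile so the store's recover() path sees the current value.
  private volatile boolean duringRecovery = false;

  protected void initNodeLabelStore() throws Exception {
    duringRecovery = true;            // set before recovery starts
    try {
      // recover() replays the label edit log and calls back into
      // removeFromClusterNodeLabels(), which can consult duringRecovery
      // to skip checkRemoveFromClusterNodeLabelsOfQueue().
      recoverFromStore();
    } finally {
      duringRecovery = false;         // reset after the call finishes
    }
  }

  private void recoverFromStore() throws Exception { /* replay edit log */ }
}
{code}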

> Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel 
> recovery
> --
>
> Key: YARN-3971
> URL: https://issues.apache.org/jira/browse/YARN-3971
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 
> 0003-YARN-3971.patch, 0004-YARN-3971.patch, 0005-YARN-3971.patch
>
>
> Steps to reproduce:
> # Create labels x,y
> # Delete labels x,y
> # Create labels x,y and add capacity-scheduler xml entries for labels x and y too
> # Restart RM
>  
> Both RMs will become Standby,
> since the below exception is thrown in {{FileSystemNodeLabelsStore#recover}}:
> {code}
> 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
> state STARTED; cause: java.io.IOException: Cannot remove label=x, because 
> queue=a1 is using this label. Please remove label on queue before remove the 
> label
> java.io.IOException: Cannot remove label=x, because queue=a1 is using this 
> label. Please remove label on queue before remove the label
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels

2016-04-06 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229557#comment-15229557
 ] 

Tao Jie commented on YARN-4855:
---

[~sunilg] [~leftnoteasy] [~naganarasimha] Thanks for your comments.
I agree "--fail-on-unknown-nodes" makes the option clearer.
{quote}
also one more aspect is does it require a protocol change can we do it in the 
client side ?
{quote}
If we do the verification on the client side, the client has to request the 
full active node list (if we verify on the server side, the node list is 
already available in memory). My earlier concern was that doing this 
verification as the default behavior would bring more load. However, since we 
keep the default behavior as it used to be and only add an option for the node 
verification, I think the client side is the more concise place for it. A 
sketch of that check follows below.

One more consideration:
when we use *yarn rmadmin -replaceLabelsOnNode* and add a node label to the 
wrong nodes, that information seems to leak on the RM. It is stored in RM 
memory/filesystem, but cannot be found on the web UI or by a shell cmd. As a 
result, it is difficult to remove or correct. Maybe that could be fixed in 
another jira.
Please correct me if I am wrong, thanks!
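A minimal sketch of what the client-side verification could look like, 
assuming the CLI can build a started YarnClient; {{getNodeReports}} is the 
real API, everything else is illustrative:
{code}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class NodeCheckSketch {
  // Fail fast when --fail-on-unknown-nodes is given and a target host is
  // not among the currently running NodeManagers.
  static void verifyNodesExist(YarnClient client, List<String> hosts)
      throws Exception {
    Set<String> active = new HashSet<String>();
    for (NodeReport report : client.getNodeReports(NodeState.RUNNING)) {
      active.add(report.getNodeId().getHost());
    }
    for (String host : hosts) {
      if (!active.contains(host)) {
        throw new IllegalArgumentException("Unknown node: " + host);
      }
    }
  }
}
{code}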


> Should check if node exists when replace nodelabels
> ---
>
> Key: YARN-4855
> URL: https://issues.apache.org/jira/browse/YARN-4855
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Minor
> Attachments: YARN-4855.001.patch
>
>
> Today when we add nodelabels to nodes, it succeeds without any message even 
> if the nodes are not existing NodeManagers in the cluster.
> It could be like this:
> When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be 
> denied if the node does not exist.
> When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"*, it 
> would add nodelabels no matter whether the node exists.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4886) Add HDFS caller context for EntityGroupFSTimelineStore

2016-04-06 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229529#comment-15229529
 ] 

Xuan Gong commented on YARN-4886:
-

+1 lgtm. pending Jenkins

> Add HDFS caller context for EntityGroupFSTimelineStore
> --
>
> Key: YARN-4886
> URL: https://issues.apache.org/jira/browse/YARN-4886
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4886-trunk.001.patch
>
>
> We need to add a HDFS caller context for the entity group FS storage for 
> better audit log debugging. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4726) [Umbrella] Allocation reuse for application upgrades

2016-04-06 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229494#comment-15229494
 ] 

Arun Suresh commented on YARN-4726:
---

[~kasha], we have posted a slightly scoped-down version of the doc on 
YARN-4876, to be used as an initial building block for application upgrades, 
without the full-blown API changes described in YARN-1040.

We plan to pursue YARN-4876 as phase 1 of the general-purpose problem of 
decoupling the allocation from the container life cycle.

> [Umbrella] Allocation reuse for application upgrades
> 
>
> Key: YARN-4726
> URL: https://issues.apache.org/jira/browse/YARN-4726
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Vinod Kumar Vavilapalli
>
> See overview doc at YARN-4692, copying the sub-section to track all related 
> efforts.
> Once auto-restart of containers is taken care of (YARN-4725), we need to 
> address what I believe is the second most important reason for service 
> containers to restart: upgrades. Once a service is running on YARN, the way 
> the container allocation-lifecycle works, any time the container exits, YARN 
> will reclaim the resources. During an upgrade, with a multitude of other 
> applications running in the system, giving up and getting back resources 
> allocated to the service is hard to manage. Things like NodeLabels in YARN 
> help this cause but are not straightforward to use to address the 
> app-specific use-cases.
> We need a first-class way of letting applications reuse the same 
> resource-allocation for multiple launches of the processes inside the 
> container. This is done by decoupling the allocation lifecycle and the 
> process life-cycle.
> The JIRA YARN-1040 initiated this conversation. We need two things here: 
>  - (1) (Task) the ApplicationMaster should be able to use the same 
> container-allocation and issue multiple startContainer requests to the 
> NodeManager.
>  - (2) (Task) To support the upgrade of the ApplicationMaster itself, 
> clients should be able to inform YARN to restart the AM within the same 
> allocation but with new bits.
> The JIRAs YARN-3417 and YARN-4470 talk about the second task above ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2883) Queuing of container requests in the NM

2016-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229489#comment-15229489
 ] 

Hadoop QA commented on YARN-2883:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 41s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
37s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 58s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 7s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 8s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 40s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 37s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 13 new + 
425 unchanged - 4 fixed = 438 total (was 429) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 9s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 9s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_77. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 15s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | 

[jira] [Commented] (YARN-4924) NM recovery race can lead to container not cleaned up

2016-04-06 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229486#comment-15229486
 ] 

sandflee commented on YARN-4924:


thanks [~nroberts]. Another thought: it seems it's not necessary for the NM to 
store the FINISH_APP event, since the RM will check the running apps when the 
NM registers; we just need to make sure that when the NM registers with the 
RM, it has recovered all containers as best it can, yes? [~jlowe]

> NM recovery race can lead to container not cleaned up
> -
>
> Key: YARN-4924
> URL: https://issues.apache.org/jira/browse/YARN-4924
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>
> It's probably a small window but we observed a case where the NM crashed and 
> then a container was not properly cleaned up during recovery.
> I will add details in first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels

2016-04-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229478#comment-15229478
 ] 

Sunil G commented on YARN-4855:
---

bq. he need not worry whether the node is temporarily down and he can just 
apply the label mappings
Thanks [~Naganarasimha] for the detailed clarification. This makes sense to me.

bq. How about renaming it to "--fail-on-unknown-nodes"
+1 for this, as we all agree that the current naming is confusing. Thanks 
[~leftnoteasy] and [~Naganarasimha].

> Should check if node exists when replace nodelabels
> ---
>
> Key: YARN-4855
> URL: https://issues.apache.org/jira/browse/YARN-4855
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Minor
> Attachments: YARN-4855.001.patch
>
>
> Today when we add nodelabels to nodes, it succeeds without any message even 
> if the nodes are not existing NodeManagers in the cluster.
> It could be like this:
> When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be 
> denied if the node does not exist.
> When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"*, it 
> would add nodelabels no matter whether the node exists.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery

2016-04-06 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt reopened YARN-3971:


> Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel 
> recovery
> --
>
> Key: YARN-3971
> URL: https://issues.apache.org/jira/browse/YARN-3971
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 
> 0003-YARN-3971.patch, 0004-YARN-3971.patch, 0005-YARN-3971.patch
>
>
> Steps to reproduce:
> # Create labels x,y
> # Delete labels x,y
> # Create labels x,y and add capacity-scheduler xml entries for labels x and y too
> # Restart RM
>  
> Both RMs will become Standby,
> since the below exception is thrown in {{FileSystemNodeLabelsStore#recover}}:
> {code}
> 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
> state STARTED; cause: java.io.IOException: Cannot remove label=x, because 
> queue=a1 is using this label. Please remove label on queue before remove the 
> label
> java.io.IOException: Cannot remove label=x, because queue=a1 is using this 
> label. Please remove label on queue before remove the label
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery

2016-04-06 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229476#comment-15229476
 ] 

Bibin A Chundatt commented on YARN-3971:


[~wangda]/[~Naganarasimha]
The issue still exists. Can we reopen the jira so that I can provide an 
updated patch?

The check done last time was for ServiceState=STARTED. But the service state 
will always be STARTED, never INIT, when {{FileSystemNodeLabelsStore#recover}} 
is called, since the call comes from AbstractService.start(), and the state is 
set to STARTED before serviceStart() has finished:

{noformat}
synchronized (stateChangeLock) {
  // enterState() moves the service to STARTED *before* serviceStart() runs,
  // so recover() never observes the INIT state.
  if (stateModel.enterState(STATE.STARTED) != STATE.STARTED) {
    try {
      startTime = System.currentTimeMillis();
      serviceStart();
      ...
}
{noformat}
In {{stateModel.enterState(STATE.STARTED)}} the state is directly set to 
STARTED.


> Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel 
> recovery
> --
>
> Key: YARN-3971
> URL: https://issues.apache.org/jira/browse/YARN-3971
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 
> 0003-YARN-3971.patch, 0004-YARN-3971.patch, 0005-YARN-3971.patch
>
>
> Steps to reproduce:
> # Create labels x,y
> # Delete labels x,y
> # Create labels x,y and add capacity-scheduler xml entries for labels x and y too
> # Restart RM
>  
> Both RMs will become Standby,
> since the below exception is thrown in {{FileSystemNodeLabelsStore#recover}}:
> {code}
> 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
> state STARTED; cause: java.io.IOException: Cannot remove label=x, because 
> queue=a1 is using this label. Please remove label on queue before remove the 
> label
> java.io.IOException: Cannot remove label=x, because queue=a1 is using this 
> label. Please remove label on queue before remove the label
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression

2016-04-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229472#comment-15229472
 ] 

Sunil G commented on YARN-4925:
---

Thanks [~bibinchundatt].
Yes, NODE_LOCAL and OFF-SWITCH should not be set differently, so a change in 
ANY can reset the other 2 types, which is fine. I was just confirming this 
point. No issue :)


> ContainerRequest in AMRMClient, application should be able to specify 
> nodes/racks together with nodeLabelExpression
> ---
>
> Key: YARN-4925
> URL: https://issues.apache.org/jira/browse/YARN-4925
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Currently with nodelabels, AMRMClient is not able to specify nodelabels 
> with Node/Rack requests. For applications like Spark, NODE_LOCAL requests 
> cannot be asked for together with a label expression.
> As per the check in {{AMRMClientImpl#checkNodeLabelExpression}}:
> {noformat}
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null && 
> (!containerRequest.getRacks().isEmpty()))
> || 
> (containerRequest.getNodes() != null && 
> (!containerRequest.getNodes().isEmpty()))) {
>   throw new InvalidContainerRequestException(
>   "Cannot specify node label with rack and node");
> }
> {noformat}
> In {{AppSchedulingInfo#updateResourceRequests}} we reset the labels to that 
> of the OFF-SWITCH request.
> The above check is not required for the ContainerRequest ask. /cc [~wangda], 
> thank you for confirming



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression

2016-04-06 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229469#comment-15229469
 ] 

Bibin A Chundatt commented on YARN-4925:


[~sunilg] and [~Naganarasimha],
Thank you for looking into the issue.

The check is for {{ContainerRequest}}, and IIUC from the discussion with 
[~wangda], the check was done to *properly account how much resource an app 
requests for each partition*, considering that the label for NODE_LOCAL and 
OFF-SWITCH should not be set differently; that accounting is not possible if 
we set the label per {{ContainerRequest}}.

{noformat}
public static class ContainerRequest {
  final Resource capability;
  final List<String> nodes;
  final List<String> racks;
  final Priority priority;
  final boolean relaxLocality;
  final String nodeLabelsExpression;
}
{noformat}
Consider that one NODELABEL covers 100 nodes and a container needs to be 
started on the NODE where its data is available; this is not allowed today. 
The issue we faced was that a Spark container needed to be node-local, and 
that was not allowed.
cc/ [~wangda]
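For reference, a minimal sketch of the ask that is currently rejected; the 
resource, priority, host, and label values are illustrative:
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class NodeLocalWithLabelSketch {
  public static void main(String[] args) {
    // A NODE_LOCAL ask (explicit node list) combined with a label
    // expression. Constructing it succeeds, but passing it to
    // AMRMClientImpl#addContainerRequest trips checkNodeLabelExpression
    // and throws InvalidContainerRequestException today.
    ContainerRequest request = new ContainerRequest(
        Resource.newInstance(1024, 1),
        new String[] { "node1.example.com" },  // nodes (illustrative)
        null,                                  // racks
        Priority.newInstance(1),
        true,                                  // relaxLocality
        "labelX");                             // nodeLabelsExpression
    System.out.println(request);
  }
}
{code}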

> ContainerRequest in AMRMClient, application should be able to specify 
> nodes/racks together with nodeLabelExpression
> ---
>
> Key: YARN-4925
> URL: https://issues.apache.org/jira/browse/YARN-4925
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Currently with nodelabels, AMRMClient is not able to specify nodelabels 
> with Node/Rack requests. For applications like Spark, NODE_LOCAL requests 
> cannot be asked for together with a label expression.
> As per the check in {{AMRMClientImpl#checkNodeLabelExpression}}:
> {noformat}
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null && 
> (!containerRequest.getRacks().isEmpty()))
> || 
> (containerRequest.getNodes() != null && 
> (!containerRequest.getNodes().isEmpty()))) {
>   throw new InvalidContainerRequestException(
>   "Cannot specify node label with rack and node");
> }
> {noformat}
> In {{AppSchedulingInfo#updateResourceRequests}} we reset the labels to that 
> of the OFF-SWITCH request.
> The above check is not required for the ContainerRequest ask. /cc [~wangda], 
> thank you for confirming



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4794) Distributed shell app gets stuck on stopping containers after App completes

2016-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229466#comment-15229466
 ] 

Hadoop QA commented on YARN-4794:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 21s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_77. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 39s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 146m 33s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | hadoop.yarn.client.api.impl.TestAMRMProxy |
|   | hadoop.yarn.client.TestGetGroups |
| JDK v1.8.0_77 Timed out junit tests | 
org.apache.hadoop.yarn.client.cli.TestYarnCLI |
|   | org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestNMClient |
| JDK v1.7.0_95 Failed junit tests | hadoop.yarn.client.api.impl.TestAMRMProxy |
|   | hadoop.yarn.client.TestGetGroups |
| JDK v1.7.0_95 

[jira] [Commented] (YARN-4552) NM ResourceLocalizationService should check and initialize local filecache dir (and log dir) even if NM recover is enabled.

2016-04-06 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229410#comment-15229410
 ] 

Vinod Kumar Vavilapalli commented on YARN-4552:
---

[~djp], let me know if you can update this soon enough for 2.7.3 in a couple 
of days. Otherwise, we can simply move this to 2.7.4 or 2.8 in a few weeks.


> NM ResourceLocalizationService should check and initialize local filecache 
> dir (and log dir) even if NM recover is enabled.
> ---
>
> Key: YARN-4552
> URL: https://issues.apache.org/jira/browse/YARN-4552
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4552-v2.patch, YARN-4552.patch
>
>
> In some cases, users clean up the localized file cache for 
> debugging/troubleshooting purposes during NM downtime. However, after 
> bringing the NM back (with recovery enabled), job submission can fail with 
> an exception like below:
> {noformat}
> Diagnostics: java.io.FileNotFoundException: File 
> /disk/12/yarn/local/filecache does not exist.
> {noformat}
> This is because we only create the filecache dir when recovery is not 
> enabled while ResourceLocalizationService gets initialized/started.
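A minimal sketch of the fix idea, with hypothetical helper names (the actual 
service code differs):
{code}
// Hypothetical sketch: create the local dirs unconditionally instead of
// only on the non-recovery path.
public class LocalizationInitSketch {
  boolean recoveryEnabled;

  void serviceInit() {
    if (recoveryEnabled) {
      recoverLocalizedResources();    // replay the NM state store
    }
    // Previously this ran only when recovery was disabled, so a manually
    // cleaned /disk/N/yarn/local/filecache was never recreated.
    ensureLocalCacheDirsExist();
  }

  void recoverLocalizedResources() { /* ... */ }
  void ensureLocalCacheDirsExist() { /* mkdir filecache, log dirs, etc. */ }
}
{code}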



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3461) Consolidate flow name/version/run defaults

2016-04-06 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3461:
--
Attachment: YARN-3461-YARN-2928.03.patch

Posted patch v.3.

{quote}
i am not sure why the earlier code was trying to set these flags in the 
AMLauncher.setupTokens method; would it be better to set it in the caller 
createAMContainerLaunchContext, or a separate method for it, or change the 
method name to be more meaningful. Thoughts?
{quote}

That's a good suggestion, and I made that change in the latest patch. I also 
thought that the code was buried at an awkward place. I refactored the code out 
to create a separate method ({{setFlowContext()}}).

{quote}
i think instead of calling setFlowTags twice may be we can have additional 
parameter indicating the default value along with pushing of tag.split(":", 2) 
inside setFlowTags
{quote}

We kind of need it in the form it's in. I know it seems somewhat awkward that 
we call it twice for a given tag. However, note that the flow information may 
not be present in the tags (that's why we need the defaults in the first 
place!), in which case the inner {{setFlowTags()}} will not be invoked.

Another approach would be to call {{setFlowTags()}} with the default *after* 
iterating over the tag, but that also adds some more code to handle it so it's 
not clear if things become simpler.

The {{setFlowTags()}} method is pretty cheap (a simple {{Map.put()}} call 
really), so I went for the simplest form of the implementation. Let me know if 
that works.
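For readers following along, a minimal sketch of the pattern under discussion, 
with hypothetical names modeled on the comment (the real code lives in 
AMLauncher and may differ):
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class FlowContextSketch {
  // Seed the default first; a later call for the same key overwrites it,
  // and if no flow tag is present the default simply remains in place.
  static Map<String, String> flowContext(Set<String> appTags,
      String defaultFlowName) {
    Map<String, String> env = new HashMap<String, String>();
    setFlowTags(env, "TIMELINE_FLOW_NAME_TAG", defaultFlowName);
    for (String tag : appTags) {
      // Tags look like "TIMELINE_FLOW_NAME_TAG:myFlow" (assumed format).
      String[] parts = tag.split(":", 2);
      if (parts.length == 2 && !parts[1].isEmpty()) {
        setFlowTags(env, parts[0].toUpperCase(), parts[1]);
      }
    }
    return env;
  }

  static void setFlowTags(Map<String, String> env, String key, String value) {
    env.put(key, value);  // cheap: a simple Map.put(), as noted above
  }
}
{code}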

> Consolidate flow name/version/run defaults
> --
>
> Key: YARN-3461
> URL: https://issues.apache.org/jira/browse/YARN-3461
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Sangjin Lee
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3461-YARN-2928.01.patch, 
> YARN-3461-YARN-2928.02.patch, YARN-3461-YARN-2928.03.patch
>
>
> In YARN-3391, it's not resolved what should be the defaults for flow 
> name/version/run. Let's continue the discussion here and unblock YARN-3391 
> from moving forward.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels

2016-04-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229408#comment-15229408
 ] 

Wangda Tan commented on YARN-4855:
--

Thanks [~Tao Jie], and thanks [~Naganarasimha], [~sunilg] for the discussions.

[Comment|https://issues.apache.org/jira/browse/YARN-4855?focusedCommentId=15227603&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15227603]
 looks doable and backward compatible. However, \-checkNode seems very 
confusing. How about renaming it to "--fail-on-unknown-nodes", which is longer 
but clearer.

Thoughts?

> Should check if node exists when replace nodelabels
> ---
>
> Key: YARN-4855
> URL: https://issues.apache.org/jira/browse/YARN-4855
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Minor
> Attachments: YARN-4855.001.patch
>
>
> Today when we add nodelabels to nodes, it succeeds without any message even 
> if the nodes are not existing NodeManagers in the cluster.
> It could be like this:
> When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be 
> denied if the node does not exist.
> When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"*, it 
> would add nodelabels no matter whether the node exists.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4855) Should check if node exists when replace nodelabels

2016-04-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4855:
-
Assignee: Tao Jie

> Should check if node exists when replace nodelabels
> ---
>
> Key: YARN-4855
> URL: https://issues.apache.org/jira/browse/YARN-4855
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Minor
> Attachments: YARN-4855.001.patch
>
>
> Today when we add nodelabels to nodes, it succeeds without any message even 
> if the nodes are not existing NodeManagers in the cluster.
> It could be like this:
> When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be 
> denied if the node does not exist.
> When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"*, it 
> would add nodelabels no matter whether the node exists.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY

2016-04-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229384#comment-15229384
 ] 

Wangda Tan commented on YARN-2694:
--

[~zhz],

The major reason for only supporting a node label specified on the off-switch 
request is: the existing node label is actually a node partition, and admins 
can limit how much resource can be used by each queue in each partition. So if 
a rack/node level request had a different partition from the off-switch 
request, the scheduler could not correctly calculate how much resource the app 
requested in each partition: all existing logic to calculate the pending 
resources of an app relies on the off-switch request.

This patch didn't allow specifying any label at the rack/node level; any 
rack/node request containing a label was directly rejected. That was not 
correct; we fixed it in YARN-4140, which forces all requests under the 
off-switch request to have the same label. You can take a look at the 
discussions on YARN-4140.

If you want rack/node requests to specify different labels, that will be 
available after YARN-3409 (node constraints). Node constraints will be used to 
tag NMs; no enforcements (such as ACLs, resource limits, etc.) will be added 
to node constraints. 
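For illustration, a minimal sketch of the one place a label expression is 
honored, via the public ResourceRequest factory; the values are illustrative:
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class AnyLevelLabelSketch {
  public static void main(String[] args) {
    // The label expression goes on the off-switch (ANY) request; the
    // scheduler derives the app's per-partition pending resources from it.
    ResourceRequest any = ResourceRequest.newInstance(
        Priority.newInstance(1),
        ResourceRequest.ANY,            // resourceName "*"
        Resource.newInstance(2048, 1),
        4,                              // numContainers
        true,                           // relaxLocality
        "partitionX");                  // nodeLabelExpression
    System.out.println(any);
  }
}
{code}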

> Ensure only single node labels specified in resource request / host, and node 
> label expression only specified when resourceName=ANY
> ---
>
> Key: YARN-2694
> URL: https://issues.apache.org/jira/browse/YARN-2694
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, 
> YARN-2694-20141023-1.patch, YARN-2694-20141023-2.patch, 
> YARN-2694-20141101-1.patch, YARN-2694-20141101-2.patch, 
> YARN-2694-20150121-1.patch, YARN-2694-20150122-1.patch, 
> YARN-2694-20150202-1.patch, YARN-2694-20150203-1.patch, 
> YARN-2694-20150203-2.patch, YARN-2694-20150204-1.patch, 
> YARN-2694-20150205-1.patch, YARN-2694-20150205-2.patch, 
> YARN-2694-20150205-3.patch, YARN-2694-branch-2.6.1.txt
>
>
> Currently, node label expression support in the capacity scheduler is only 
> partially completed. A node label expression specified in a ResourceRequest 
> is only respected when it is specified at the ANY level. And a 
> ResourceRequest/host with multiple node labels makes user limit, etc. 
> computation more tricky.
> For now we need to temporarily disable them; changes include:
> - AMRMClient
> - ApplicationMasterService
> - RMAdminCLI
> - CommonNodeLabelsManager



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2883) Queuing of container requests in the NM

2016-04-06 Thread Konstantinos Karanasos (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantinos Karanasos updated YARN-2883:
-
Attachment: YARN-2883-trunk.010.patch

Thanks [~asuresh] for the feedback.
Addressing your comments and attaching new patch.

> Queuing of container requests in the NM
> ---
>
> Key: YARN-2883
> URL: https://issues.apache.org/jira/browse/YARN-2883
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-2883-trunk.004.patch, YARN-2883-trunk.005.patch, 
> YARN-2883-trunk.006.patch, YARN-2883-trunk.007.patch, 
> YARN-2883-trunk.008.patch, YARN-2883-trunk.009.patch, 
> YARN-2883-trunk.010.patch, YARN-2883-yarn-2877.001.patch, 
> YARN-2883-yarn-2877.002.patch, YARN-2883-yarn-2877.003.patch, 
> YARN-2883-yarn-2877.004.patch
>
>
> We propose to add a queue in each NM, where queueable container requests can 
> be held.
> Based on the available resources in the node and the containers in the queue, 
> the NM will decide when to allow the execution of a queued container.
> In order to ensure the instantaneous start of a guaranteed-start container, 
> the NM may decide to pre-empt/kill running queueable containers.
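A minimal sketch of the queuing decision described above, with hypothetical 
types and memory-only accounting (the patch's actual classes differ):
{code}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Queue;

public class NMQueuingSketch {
  long availableMemMb = 8192;
  final Queue<Long> waiting = new ArrayDeque<Long>();          // not yet started
  final Deque<Long> runningQueueable = new ArrayDeque<Long>(); // started opportunistically

  // Queueable asks start only when resources are free; guaranteed-start
  // asks may kill running queueable containers to start instantly.
  void onStartRequest(long memMb, boolean guaranteed) {
    if (!guaranteed) {
      if (memMb <= availableMemMb) {
        availableMemMb -= memMb;
        runningQueueable.add(memMb);
      } else {
        waiting.add(memMb);            // hold in the NM-local queue
      }
      return;
    }
    while (memMb > availableMemMb && !runningQueueable.isEmpty()) {
      availableMemMb += runningQueueable.pollLast();  // pre-empt/kill
    }
    availableMemMb -= memMb;
  }

  void onContainerFinished(long memMb) {
    availableMemMb += memMb;
    // Start queued containers that now fit.
    while (!waiting.isEmpty() && waiting.peek() <= availableMemMb) {
      long m = waiting.poll();
      availableMemMb -= m;
      runningQueueable.add(m);
    }
  }
}
{code}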



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY

2016-04-06 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229313#comment-15229313
 ] 

Zhe Zhang commented on YARN-2694:
-

Hi [~jianhe], [~leftnoteasy], I have a few questions about this change:
bq. Currently, node label expression support in the capacity scheduler is only 
partially completed. A node label expression specified in a ResourceRequest is 
only respected when it is specified at the ANY level.
Could you elaborate a bit on this? Is it because it's hard to satisfy both node 
label and locality requirements? If so, when do you think we will be ready to 
enable Node or Rack level resource requests?

Thanks,


> Ensure only single node labels specified in resource request / host, and node 
> label expression only specified when resourceName=ANY
> ---
>
> Key: YARN-2694
> URL: https://issues.apache.org/jira/browse/YARN-2694
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, 
> YARN-2694-20141023-1.patch, YARN-2694-20141023-2.patch, 
> YARN-2694-20141101-1.patch, YARN-2694-20141101-2.patch, 
> YARN-2694-20150121-1.patch, YARN-2694-20150122-1.patch, 
> YARN-2694-20150202-1.patch, YARN-2694-20150203-1.patch, 
> YARN-2694-20150203-2.patch, YARN-2694-20150204-1.patch, 
> YARN-2694-20150205-1.patch, YARN-2694-20150205-2.patch, 
> YARN-2694-20150205-3.patch, YARN-2694-branch-2.6.1.txt
>
>
> Currently, node label expression support in the capacity scheduler is only 
> partially completed. A node label expression specified in a ResourceRequest 
> is only respected when it is specified at the ANY level. And a 
> ResourceRequest/host with multiple node labels makes user limit, etc. 
> computation more tricky.
> For now we need to temporarily disable them; changes include:
> - AMRMClient
> - ApplicationMasterService
> - RMAdminCLI
> - CommonNodeLabelsManager



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4794) Distributed shell app gets stuck on stopping containers after App completes

2016-04-06 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4794:
--
Attachment: YARN-4794.1.patch

> Distributed shell app gets stuck on stopping containers after App completes
> ---
>
> Key: YARN-4794
> URL: https://issues.apache.org/jira/browse/YARN-4794
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-4794.1.patch
>
>
> Distributed shell app gets stuck on stopping containers after App completes 
> with the following exception
> {code:title = app log}
> 15/12/10 14:52:20 INFO distributedshell.ApplicationMaster: Application 
> completed. Stopping running containers
> 15/12/10 14:52:20 WARN ipc.Client: Exception encountered while connecting to 
> the server : java.nio.channels.ClosedByInterruptException
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4886) Add HDFS caller context for EntityGroupFSTimelineStore

2016-04-06 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229240#comment-15229240
 ] 

Mingliang Liu commented on YARN-4886:
-

+1 (non-binding)

> Add HDFS caller context for EntityGroupFSTimelineStore
> --
>
> Key: YARN-4886
> URL: https://issues.apache.org/jira/browse/YARN-4886
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4886-trunk.001.patch
>
>
> We need to add a HDFS caller context for the entity group FS storage for 
> better audit log debugging. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4886) Add HDFS caller context for EntityGroupFSTimelineStore

2016-04-06 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-4886:

Attachment: YARN-4886-trunk.001.patch

Upload a simple patch to add HDFS caller context information for ATS storage. 
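For reviewers, a hedged sketch of the idea (assuming the 
{{org.apache.hadoop.ipc.CallerContext}} API from HDFS-9184; the context string 
here is illustrative, not necessarily what the patch uses):
{code}
// Tag subsequent HDFS calls from this thread so they appear in the
// NameNode audit log with an identifiable caller context.
CallerContext.setCurrent(
    new CallerContext.Builder("yarn_ats_entity_group_fs").build());
{code}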

> Add HDFS caller context for EntityGroupFSTimelineStore
> --
>
> Key: YARN-4886
> URL: https://issues.apache.org/jira/browse/YARN-4886
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4886-trunk.001.patch
>
>
> We need to add a HDFS caller context for the entity group FS storage for 
> better audit log debugging. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4930) Convenience command to perform a work-preserving stop of the nodemanager

2016-04-06 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-4930:


 Summary: Convenience command to perform a work-preserving stop of 
the nodemanager
 Key: YARN-4930
 URL: https://issues.apache.org/jira/browse/YARN-4930
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jason Lowe
Priority: Minor


When the nodemanager is not under supervision 
(yarn.nodemanager.recovery.supervised=false) there may be cases where an admin 
wants to perform a work-preserving maintenance of the node but the nodemanager 
stop CLI command will cause the NM to clean up containers.  There should be a 
CLI command that shuts down the NM while preserving local filesystem and 
container state to allow a subsequent nodemanager start to recover containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4930) Convenience command to perform a work-preserving stop of the nodemanager

2016-04-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229060#comment-15229060
 ] 

Jason Lowe commented on YARN-4930:
--

Currently admins can accomplish the task via a kill -9 of the nodemanager 
process, but it would be better to encapsulate the specific mechanism into a 
CLI command.

> Convenience command to perform a work-preserving stop of the nodemanager
> 
>
> Key: YARN-4930
> URL: https://issues.apache.org/jira/browse/YARN-4930
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jason Lowe
>Priority: Minor
>
> When the nodemanager is not under supervision 
> (yarn.nodemanager.recovery.supervised=false) there may be cases where an 
> admin wants to perform a work-preserving maintenance of the node but the 
> nodemanager stop CLI command will cause the NM to clean up containers.  There 
> should be a CLI command that shuts down the NM while preserving local 
> filesystem and container state to allow a subsequent nodemanager start to 
> recover containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4630) Remove useless boxing/unboxing code (Hadoop YARN)

2016-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229052#comment-15229052
 ] 

Hadoop QA commented on YARN-4630:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 5 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 0s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 46s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 27s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 3s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 19s {color} | {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 17s {color} | {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 17s {color} | {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 21s {color} | {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 21s {color} | {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 20s {color} | {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 16s {color} | {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 8s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 47s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 16s {color} | {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 53s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 21s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s {color} | {color:green} hadoop-yarn-server-web-proxy in the patch passed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 27s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} |
| 

[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults

2016-04-06 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229017#comment-15229017
 ] 

Li Lu commented on YARN-3461:
-

I looked at the patch and felt OK with it. 

> Consolidate flow name/version/run defaults
> --
>
> Key: YARN-3461
> URL: https://issues.apache.org/jira/browse/YARN-3461
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Sangjin Lee
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3461-YARN-2928.01.patch, 
> YARN-3461-YARN-2928.02.patch
>
>
> In YARN-3391, it's not resolved what should be the defaults for flow 
> name/version/run. Let's continue the discussion here and unblock YARN-3391 
> from moving forward.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults

2016-04-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228955#comment-15228955
 ] 

Varun Saxena commented on YARN-3461:


Ok...

> Consolidate flow name/version/run defaults
> --
>
> Key: YARN-3461
> URL: https://issues.apache.org/jira/browse/YARN-3461
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Sangjin Lee
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3461-YARN-2928.01.patch, 
> YARN-3461-YARN-2928.02.patch
>
>
> In YARN-3391, it's not resolved what should be the defaults for flow 
> name/version/run. Let's continue the discussion here and unblock YARN-3391 
> from moving forward.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels

2016-04-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228937#comment-15228937
 ] 

Naganarasimha G R commented on YARN-4855:
-

Thanks [~sunilg] for sharing your views. As I mentioned earlier, IIUC the 
intent of not validating the existence of the node is just to make the admin's 
life easier: in certain cases he need not worry whether the node is 
temporarily down, and can just apply the label mappings. So the approaches you 
mentioned override this behavior.

But I can also understand [~Tao Jie]'s concern that if the node (host IP or 
port) is wrongly configured, it is better to warn beforehand than to 
erroneously configure the cluster.
So there are pros and cons to the existing approach.

I am OK with [~Tao Jie]'s approach, but with a better option name 
(-verifyNode?). Also, one more aspect: does it require a protocol change, or 
can we do it on the client side? A rough sketch of a client-side check 
follows.
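For illustration, a hedged sketch of such a client-side check (assuming the 
{{YarnClient#getNodeReports}} API; the option name and host are illustrative, 
imports and exception handling omitted):
{code}
// Verify the host is a known running NodeManager before replacing labels.
YarnClient client = YarnClient.createYarnClient();
client.init(new YarnConfiguration());
client.start();
boolean known = false;
for (NodeReport report : client.getNodeReports(NodeState.RUNNING)) {
  if (report.getNodeId().getHost().equals("node1")) {
    known = true;
    break;
  }
}
if (!known) {
  System.err.println("node1 is not a running NodeManager, refusing to label it");
}
client.stop();
{code}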

> Should check if node exists when replace nodelabels
> ---
>
> Key: YARN-4855
> URL: https://issues.apache.org/jira/browse/YARN-4855
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Priority: Minor
> Attachments: YARN-4855.001.patch
>
>
> Today when we add node labels to nodes, it succeeds without any message even 
> if the nodes are not existing NodeManagers in the cluster.
> It could be like this:
> When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be 
> denied if the node does not exist.
> When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"*, it would 
> add node labels no matter whether the node exists.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4807) MockAM#waitForState sleep duration is too long

2016-04-06 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228934#comment-15228934
 ] 

Yufei Gu commented on YARN-4807:


[~ka...@cloudera.com], you are right. I misunderstood the Hadoop QA message. I 
found that the following unit test cases failed because we removed the minimum 
wait time for an attempt, so I will file YARN-4929 for them.

- TestAMRestart.testRMAppAttemptFailuresValidityInterval 
- TestApplicationMasterService.testResourceTypes
- TestContainerResourceUsage.testUsageAfterAMRestartWithMultipleContainers
- TestRMApplicationHistoryWriter.testRMWritingMassiveHistoryForFairSche
- TestRMApplicationHistoryWriter.testRMWritingMassiveHistoryForCapacitySche

> MockAM#waitForState sleep duration is too long
> --
>
> Key: YARN-4807
> URL: https://issues.apache.org/jira/browse/YARN-4807
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Yufei Gu
>  Labels: newbie
> Attachments: YARN-4807.001.patch, YARN-4807.002.patch, 
> YARN-4807.003.patch, YARN-4807.004.patch, YARN-4807.005.patch, 
> YARN-4807.006.patch, YARN-4807.007.patch
>
>
> MockAM#waitForState sleep duration (500 ms) is too long. Also, there is 
> significant duplication with MockRM#waitForState.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4929) Fix unit test cases failed because of removing the minimum wait time for attempt.

2016-04-06 Thread Yufei Gu (JIRA)
Yufei Gu created YARN-4929:
--

 Summary: Fix unit test cases failed because of removing the 
minimum wait time for attempt.
 Key: YARN-4929
 URL: https://issues.apache.org/jira/browse/YARN-4929
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yufei Gu
Assignee: Yufei Gu


The following unit test cases failed because we removed the minimum wait time 
for an attempt.

- TestAMRestart.testRMAppAttemptFailuresValidityInterval 
- TestApplicationMasterService.testResourceTypes
- TestContainerResourceUsage.testUsageAfterAMRestartWithMultipleContainers
- TestRMApplicationHistoryWriter.testRMWritingMassiveHistoryForFairSche
- TestRMApplicationHistoryWriter.testRMWritingMassiveHistoryForCapacitySche



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4562) YARN WebApp ignores the configuration passed to it for keystore settings

2016-04-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228924#comment-15228924
 ] 

Sergey Shelukhin commented on YARN-4562:


That is true only if ssl-server.xml is present :) Yes, that works.
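For clarity, a hedged sketch of the fix direction described in the issue below 
(assuming the existing {{WebAppUtils.loadSslConfiguration(HttpServer2.Builder, 
Configuration)}} overload; {{conf}} is the Configuration passed to the WebApps 
builder):
{code}
if (httpScheme.equals(WebAppUtils.HTTPS_PREFIX)) {
  // Pass the caller's Configuration through instead of letting
  // loadSslConfiguration create a fresh one that ignores it.
  WebAppUtils.loadSslConfiguration(builder, conf);
}
{code}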

> YARN WebApp ignores the configuration passed to it for keystore settings
> 
>
> Key: YARN-4562
> URL: https://issues.apache.org/jira/browse/YARN-4562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: YARN-4562.patch
>
>
> The conf can be passed to WebApps builder, however the following code in 
> WebApps.java that builds the HttpServer2 object:
> {noformat}
> if (httpScheme.equals(WebAppUtils.HTTPS_PREFIX)) {
>   WebAppUtils.loadSslConfiguration(builder);
> }
> {noformat}
> ...results in loadSslConfiguration creating a new Configuration object; the 
> one that is passed in is ignored, as far as the keystore/etc. settings are 
> concerned.  loadSslConfiguration has another overload with Configuration 
> parameter that should be used instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4810) NM applicationpage cause internal error 500

2016-04-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228925#comment-15228925
 ] 

Naganarasimha G R commented on YARN-4810:
-

Thanks [~bibinchundatt] for the patch, lgtm +1. Will wait for some time before 
committing it.
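For context, a hedged sketch of the defensive parsing this kind of fix needs 
(assuming {{ConverterUtils.toApplicationId(String)}}; the surrounding web-app 
wiring and imports are omitted):
{code}
ApplicationId appId;
try {
  appId = ConverterUtils.toApplicationId(appIdStr);
} catch (IllegalArgumentException | NoSuchElementException e) {
  // Render a "bad request" page for a missing/malformed app id
  // instead of letting the exception surface as an internal error 500.
  appId = null;
}
{code}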

> NM applicationpage cause internal error 500
> ---
>
> Key: YARN-4810
> URL: https://issues.apache.org/jira/browse/YARN-4810
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4810.patch, 0002-YARN-4810.patch, 1.png, 2.png
>
>
> Use url /node/application/
> *Case 1*
> {noformat}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.dao.AppInfo.<init>(AppInfo.java:45)
> at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.ApplicationPage$ApplicationBlock.render(ApplicationPage.java:82)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
> at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
> at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:848)
> at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
> at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
> at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.NMController.application(NMController.java:58)
> ... 44 more
> {noformat}
> *Case 2*
> {noformat}
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> Caused by: java.util.NoSuchElementException
> at 
> com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
> at 
> org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:131)
> at 
> org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:126)
> at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.ApplicationPage$ApplicationBlock.render(ApplicationPage.java:79)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
> at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
> at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:848)
> at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
> at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
> at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.NMController.application(NMController.java:58)
> ... 44 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults

2016-04-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228920#comment-15228920
 ] 

Naganarasimha G R commented on YARN-3461:
-

Thanks for the patch [~sjlee0]
few nits in the patch
# i am not sure why earlier code was trying to set these flags in 
{{AMLauncher.setupTokens}} methods, would it be better to set it in the caller 
{{createAMContainerLaunchContext}} *or* a seperate method for it *or* change 
the method name to be more meaningful. Thoughts?
# i think instead of calling {{setFlowTags}} twice may be we can have 
additional parameter indicating the default value along with pushing of 
{{tag.split(":", 2)}} inside {{setFlowTags}}
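For item 2, a rough sketch of the suggested shape (method and parameter names 
are assumed, not from the patch; imports omitted):
{code}
// One call per tag: split the "<key>:<value>" app tag inside the helper
// and fall back to the supplied default when the tag is absent/malformed.
private static void setFlowTag(Map<String, String> flowContext,
    String key, String tag, String defaultValue) {
  String value = defaultValue;
  if (tag != null) {
    String[] parts = tag.split(":", 2);
    if (parts.length == 2 && !parts[1].isEmpty()) {
      value = parts[1];
    }
  }
  flowContext.put(key, value);
}
{code}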

> Consolidate flow name/version/run defaults
> --
>
> Key: YARN-3461
> URL: https://issues.apache.org/jira/browse/YARN-3461
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Sangjin Lee
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3461-YARN-2928.01.patch, 
> YARN-3461-YARN-2928.02.patch
>
>
> In YARN-3391, it's not resolved what should be the defaults for flow 
> name/version/run. Let's continue the discussion here and unblock YARN-3391 
> from moving forward.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2016-04-06 Thread Marcin Tustin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228878#comment-15228878
 ] 

Marcin Tustin commented on YARN-3216:
-

[~sunilg] Yeah, YARN-4751 will be VERY nice to have. 

[~wangda] [~Naganarasimha Garla] My pleasure! Just trying to build up the 
community's knowledge, and keep it shared. 

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch, 
> 0006-YARN-3216.patch, 0007-YARN-3216.patch, 0008-YARN-3216.patch, 
> 0009-YARN-3216.patch, 0010-YARN-3216.patch, 0011-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2016-04-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228874#comment-15228874
 ] 

Naganarasimha G R commented on YARN-3216:
-

Thanks [~marcin.tustin], it's a useful doc. And yes, [~sunilg], it would be 
ideal to have a fix specific to 2.7.x, as this is a major limitation when 
using node labels.

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch, 
> 0006-YARN-3216.patch, 0007-YARN-3216.patch, 0008-YARN-3216.patch, 
> 0009-YARN-3216.patch, 0010-YARN-3216.patch, 0011-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2016-04-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228860#comment-15228860
 ] 

Sunil G commented on YARN-3216:
---

Yes, this is helpful. Thanks, Marcin.

With the recent progress in YARN-4751, I think I can make some necessary 
changes here too. I will wait for the dependencies of YARN-4751 to be cleared, 
and will then make progress here; a sketch of the per-partition arithmetic 
follows.
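To make the intent concrete, a hedged sketch of the per-partition computation 
(names and numbers are illustrative, assuming the {{Resources.multiply}} 
utility):
{code}
// The AM limit should be derived per accessible partition, not only from
// the DEFAULT_PARTITION's resource.
Resource partitionTotal = Resource.newInstance(100 * 1024, 100); // partition "x"
float maxAMResourcePercent = 0.1f;  // queue's max-am-resource-percent
float absoluteCapacity = 0.5f;      // queue's absolute capacity on "x"
Resource amLimitOnX =
    Resources.multiply(partitionTotal, maxAMResourcePercent * absoluteCapacity);
// => roughly 5 GB and 5 vcores available for AMs on partition "x"
{code}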

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch, 
> 0006-YARN-3216.patch, 0007-YARN-3216.patch, 0008-YARN-3216.patch, 
> 0009-YARN-3216.patch, 0010-YARN-3216.patch, 0011-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2016-04-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228853#comment-15228853
 ] 

Wangda Tan commented on YARN-3216:
--

Thanks [~marcin.tustin], that is very helpful!

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch, 
> 0006-YARN-3216.patch, 0007-YARN-3216.patch, 0008-YARN-3216.patch, 
> 0009-YARN-3216.patch, 0010-YARN-3216.patch, 0011-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults

2016-04-06 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228845#comment-15228845
 ] 

Li Lu commented on YARN-3461:
-

Hi [~varun_saxena] could you please hold a bit so that I can have a look at the 
patch this afternoon? Thanks! 

> Consolidate flow name/version/run defaults
> --
>
> Key: YARN-3461
> URL: https://issues.apache.org/jira/browse/YARN-3461
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Sangjin Lee
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3461-YARN-2928.01.patch, 
> YARN-3461-YARN-2928.02.patch
>
>
> In YARN-3391, it's not resolved what should be the defaults for flow 
> name/version/run. Let's continue the discussion here and unblock YARN-3391 
> from moving forward.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2016-04-06 Thread Marcin Tustin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228839#comment-15228839
 ] 

Marcin Tustin commented on YARN-3216:
-

I've written up how we worked around this issue here: 
https://medium.com/handy-tech/practical-capacity-scheduling-with-yarn-28548ae4fb88#.5ihha0oqy



> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch, 
> 0006-YARN-3216.patch, 0007-YARN-3216.patch, 0008-YARN-3216.patch, 
> 0009-YARN-3216.patch, 0010-YARN-3216.patch, 0011-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression

2016-04-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228836#comment-15228836
 ] 

Sunil G commented on YARN-4925:
---

Thanks [~Naganarasimha Garla] for the clarification. One final doubt: in this 
case, if we change the labelExpression for the ANY ContainerRequest, it will 
replace the expression set earlier by the node/rack-local requests. Correct?

> ContainerRequest in AMRMClient, application should be able to specify 
> nodes/racks together with nodeLabelExpression
> ---
>
> Key: YARN-4925
> URL: https://issues.apache.org/jira/browse/YARN-4925
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Currently, with node labels, AMRMClient is not able to specify node labels 
> with Node/Rack requests. For applications like Spark, NODE_LOCAL requests 
> cannot be asked for together with a label expression.
> As per the check in {{AMRMClientImpl#checkNodeLabelExpression}}:
> {noformat}
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null && 
> (!containerRequest.getRacks().isEmpty()))
> || (containerRequest.getNodes() != null && 
> (!containerRequest.getNodes().isEmpty()))) {
>   throw new InvalidContainerRequestException(
>   "Cannot specify node label with rack and node");
> }
> {noformat}
> In {{AppSchedulingInfo#updateResourceRequests}} we reset labels to that of 
> OFF-SWITCH.
> The above check is not required for the ContainerRequest ask. /cc [~wangda], 
> thank you for confirming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression

2016-04-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228824#comment-15228824
 ] 

Naganarasimha G R commented on YARN-4925:
-

IIUC, [~bibinchundatt]'s solution was to keep the code the same, i.e. set the 
node label expression present in the ContainerRequest only for the *any* 
request; thus the user will be able to specify node/rack locality together 
with a label expression. A sketch of such a request is below.
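In other words, a hedged sketch of the request shape this would allow 
(assuming the {{AMRMClient.ContainerRequest}} constructor that takes a label 
expression; the host, rack, and label values are illustrative):
{code}
// A NODE_LOCAL ask carrying a label expression, which the current
// AMRMClientImpl#checkNodeLabelExpression check rejects.
ContainerRequest req = new ContainerRequest(
    Resource.newInstance(2048, 1),
    new String[] { "node1.example.com" }, // nodes (hypothetical host)
    new String[] { "/rack1" },            // racks
    Priority.newInstance(1),
    true,                                 // relaxLocality
    "labelA");                            // nodeLabelsExpression
{code}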


> ContainerRequest in AMRMClient, application should be able to specify 
> nodes/racks together with nodeLabelExpression
> ---
>
> Key: YARN-4925
> URL: https://issues.apache.org/jira/browse/YARN-4925
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Currently, with node labels, AMRMClient is not able to specify node labels 
> with Node/Rack requests. For applications like Spark, NODE_LOCAL requests 
> cannot be asked for together with a label expression.
> As per the check in {{AMRMClientImpl#checkNodeLabelExpression}}:
> {noformat}
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null && 
> (!containerRequest.getRacks().isEmpty()))
> || (containerRequest.getNodes() != null && 
> (!containerRequest.getNodes().isEmpty()))) {
>   throw new InvalidContainerRequestException(
>   "Cannot specify node label with rack and node");
> }
> {noformat}
> In {{AppSchedulingInfo#updateResourceRequests}} we reset labels to that of 
> OFF-SWITCH.
> The above check is not required for the ContainerRequest ask. /cc [~wangda], 
> thank you for confirming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression

2016-04-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228803#comment-15228803
 ] 

Sunil G commented on YARN-4925:
---

Hi [~bibinchundatt],
I have one doubt here. After removing this check, a ResourceRequest can have a 
label expression for NODE_LOCAL, RACK_LOCAL, or ANY. With YARN-4140, the label 
specified in ANY will reset all other ResourceRequests' label expressions (for 
the same priority). Is this intended, or will it solve the Spark scenario 
mentioned?

> ContainerRequest in AMRMClient, application should be able to specify 
> nodes/racks together with nodeLabelExpression
> ---
>
> Key: YARN-4925
> URL: https://issues.apache.org/jira/browse/YARN-4925
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Currently, with node labels, AMRMClient is not able to specify node labels 
> with Node/Rack requests. For applications like Spark, NODE_LOCAL requests 
> cannot be asked for together with a label expression.
> As per the check in {{AMRMClientImpl#checkNodeLabelExpression}}:
> {noformat}
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null && 
> (!containerRequest.getRacks().isEmpty()))
> || (containerRequest.getNodes() != null && 
> (!containerRequest.getNodes().isEmpty()))) {
>   throw new InvalidContainerRequestException(
>   "Cannot specify node label with rack and node");
> }
> {noformat}
> In {{AppSchedulingInfo#updateResourceRequests}} we reset labels to that of 
> OFF-SWITCH.
> The above check is not required for the ContainerRequest ask. /cc [~wangda], 
> thank you for confirming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults

2016-04-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228748#comment-15228748
 ] 

Varun Saxena commented on YARN-3461:


[~sjlee0], thanks for updating the patch.
It looks good to me overall. Will wait for a while before committing it, to 
give others a chance to look at it.

> Consolidate flow name/version/run defaults
> --
>
> Key: YARN-3461
> URL: https://issues.apache.org/jira/browse/YARN-3461
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Sangjin Lee
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3461-YARN-2928.01.patch, 
> YARN-3461-YARN-2928.02.patch
>
>
> In YARN-3391, it's not resolved what should be the defaults for flow 
> name/version/run. Let's continue the discussion here and unblock YARN-3391 
> from moving forward.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon

2016-04-06 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228743#comment-15228743
 ] 

Steve Loughran commented on YARN-4928:
--

An absolute {{/tmp}} path is "probably" mapped into a path on the current 
drive, e.g. {{c:\tmp}}. (Trivia: every drive on a Windows system has its own 
current directory.)
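To make the failure concrete, a small hedged sketch (assuming 
{{DFSUtil.isValidName}}; the printed values describe a Windows run, imports 
omitted):
{code}
// On Windows the drive letter puts a ":" into a path component,
// which HDFS name validation rejects.
String root = "/" + new File(System.getProperty("java.io.tmpdir"), "TestLogInfo")
    .getAbsolutePath().replace('\\', '/');       // e.g. "/C:/Users/.../TestLogInfo"
System.out.println(DFSUtil.isValidName(root));   // false on Windows (":" present)
System.out.println(DFSUtil.isValidName("/tmp/TestLogInfo")); // true
{code}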

> Some yarn.server.timeline.* tests fail on Windows attempting to use a test 
> root path containing a colon
> ---
>
> Key: YARN-4928
> URL: https://issues.apache.org/jira/browse/YARN-4928
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
> Environment: OS: Windows Server 2012
> JDK: 1.7.0_79
>Reporter: Gergely Novák
>Assignee: Gergely Novák
>Priority: Minor
> Attachments: YARN-4928.001.patch
>
>
> yarn.server.timeline.TestEntityGroupFSTimelineStore.* and 
> yarn.server.timeline.TestLogInfo.* fail on Windows, because they are 
> attempting to use a test root path like 
> "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo",
>  which contains a ":" (after the Windows drive letter) and 
> DFSUtil.isValidName() does not accept paths containing ":".
> This problem is identical to HDFS-6189, so I suggest to use the same 
> approach: using "/tmp/..." as test root dir instead of 
> System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4784) fair scheduler: defaultQueueSchedulingPolicy should not accept fifo as a value

2016-04-06 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-4784:
---
Attachment: YARN-4784.002.patch

[~ka...@cloudera.com], I uploaded a new patch. Would you please take a look?

> fair scheduler: defaultQueueSchedulingPolicy should not accept fifo as a value
> --
>
> Key: YARN-4784
> URL: https://issues.apache.org/jira/browse/YARN-4784
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-4784.001.patch, YARN-4784.002.patch
>
>
> The configure item defaultQueueSchedulingPolicy should not accept fifo as a 
> value since it is an invalid value for non-leaf queues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels

2016-04-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228684#comment-15228684
 ] 

Sunil G commented on YARN-4855:
---

bq. This would be an incompatible change
Yes, I agree with [~Naganarasimha Garla] here. This can be incompatible 
behavior compared to older versions: we may suddenly get exceptions that were 
not thrown earlier.

{{yarn rmadmin -replaceLabelsOnNode -checkNode "node1=label1"}} seems to me 
like a workaround rather than a clean fix, and it may be difficult to 
understand what is meant by {{-checkNode}}. So I would like to get some more 
thoughts on the original point, with a few suggestions:
- Could we support this validation in {{replaceLabelsOnNode}} and document the 
new validation clearly?  OR
- Could we skip nodes that do not exist, but log it clearly (maybe in the 
audit log too)?

I think at this early stage, that may be better. But it is very much 
debatable. Thoughts?

> Should check if node exists when replace nodelabels
> ---
>
> Key: YARN-4855
> URL: https://issues.apache.org/jira/browse/YARN-4855
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Priority: Minor
> Attachments: YARN-4855.001.patch
>
>
> Today when we add node labels to nodes, it succeeds without any message even 
> if the nodes are not existing NodeManagers in the cluster.
> It could be like this:
> When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be 
> denied if the node does not exist.
> When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"*, it would 
> add node labels no matter whether the node exists.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4630) Remove useless boxing/unboxing code (Hadoop YARN)

2016-04-06 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-4630:

Assignee: Kousuke Saruta

> Remove useless boxing/unboxing code (Hadoop YARN)
> -
>
> Key: YARN-4630
> URL: https://issues.apache.org/jira/browse/YARN-4630
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Attachments: YARN-4630.0.patch, YARN-4630.1.patch
>
>
> There are lots of places where useless boxing/unboxing occur.
> To avoid performance issue, let's remove them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults

2016-04-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228509#comment-15228509
 ] 

Sangjin Lee commented on YARN-3461:
---

The unit test failures should be unrelated. I have seen those failures on trunk 
as well. I would greatly appreciate your review. Thanks!

> Consolidate flow name/version/run defaults
> --
>
> Key: YARN-3461
> URL: https://issues.apache.org/jira/browse/YARN-3461
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Sangjin Lee
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3461-YARN-2928.01.patch, 
> YARN-3461-YARN-2928.02.patch
>
>
> In YARN-3391, it's not resolved what should be the defaults for flow 
> name/version/run. Let's continue the discussion here and unblock YARN-3391 
> from moving forward.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4756) Unnecessary wait in Node Status Updater during reboot

2016-04-06 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228520#comment-15228520
 ] 

Eric Badger commented on YARN-4756:
---

[~kasha], does my explanation make sense? Do you have any other concerns with 
this patch?
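For anyone catching up on the thread, a self-contained hedged sketch of the 
wait/notify shape the patch aims for (names are illustrative, not the actual 
NodeStatusUpdater code):
{code}
class StatusUpdaterSketch {
  private final Object monitor = new Object();
  private boolean stopped = false;

  void updaterLoop() throws InterruptedException {
    synchronized (monitor) {
      while (!stopped) {
        // ... send heartbeat here ...
        monitor.wait(1000); // heartbeat interval; wakes early on stop()
      }
    }
  }

  void stop() {
    synchronized (monitor) {
      stopped = true;
      monitor.notifyAll(); // let the loop observe the flag immediately
    }
  }
}
{code}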

> Unnecessary wait in Node Status Updater during reboot
> -
>
> Key: YARN-4756
> URL: https://issues.apache.org/jira/browse/YARN-4756
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: YARN-4756.001.patch, YARN-4756.002.patch, 
> YARN-4756.003.patch, YARN-4756.004.patch, YARN-4756.005.patch
>
>
> The startStatusUpdater thread waits for the isStopped variable to be set to 
> true, but it is waiting for the next heartbeat. During a reboot, the next 
> heartbeat will not come and so the thread waits for a timeout. Instead, we 
> should notify the thread to continue so that it can check the isStopped 
> variable and exit without having to wait for a timeout. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4624) NPE in PartitionQueueCapacitiesInfo while accessing Scheduler UI

2016-04-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228492#comment-15228492
 ] 

Sunil G commented on YARN-4624:
---

I think this has gone stale. Let's move forward.

In YARN-4304, one of the small improvements was to hide 
{{maxAMPercentageLimit}} for parent queues. We achieved this by declaring the 
variable as {{Float}} and setting it to null if the queue is a ParentQueue.

Now we have hit this corner scenario when labels are available in the cluster 
but no label-mappings are defined in CS. We fixed this partially in YARN-4634; 
however, we can still have some more cases like the one here. So ideally it is 
better to handle the null check for {{getMaxAMPercentageLimit}} in the UI, to 
keep the improvement we made as part of YARN-4304.

But after seeing the findbugs warning, essentially we are boxing and unboxing 
this float variable, and there are parallel efforts going on in other tickets 
(e.g. YARN-4630) to avoid such cases.

So I think we can still keep {{float}} and avoid this problem, but we will 
lose a small part of the improvement there. So maybe we can group these new 
metrics (maxAMPercentageLimit etc.) in another DAO object and use that. If 
that is fine, we can go with the v1 patch here and unblock the scheduler 
issue, and the other improvement can be tracked separately. [~leftnoteasy], 
could you please share your thoughts here? (A small sketch of the null-guard 
option is below.)
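For concreteness, a self-contained hedged sketch of the null-guard option 
(field and method names are assumed; not the actual DAO class):
{code}
class PartitionCapacitiesSketch {
  // Wrapper type: null means "hide this metric for parent queues".
  private Float maxAMLimitPercentage;

  float renderableMaxAMLimitPercentage() {
    Float v = maxAMLimitPercentage;
    // Explicit null check and unboxing: avoids both the NPE from the UI
    // and the findbugs autoboxing warning.
    return (v == null) ? 0f : v.floatValue();
  }
}
{code}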


> NPE in PartitionQueueCapacitiesInfo while accessing Scheduler UI
> ---
>
> Key: YARN-4624
> URL: https://issues.apache.org/jira/browse/YARN-4624
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: SchedulerUIWithOutLabelMapping.png, YARN-2674-002.patch, 
> YARN-4624-003.patch, YARN-4624.patch
>
>
> Scenario:
> ===
> Configure nodelables and add to cluster
> Start the cluster
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.PartitionQueueCapacitiesInfo.getMaxAMLimitPercentage(PartitionQueueCapacitiesInfo.java:114)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:105)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:94)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:293)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:447)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon

2016-04-06 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228424#comment-15228424
 ] 

Arpit Agarwal commented on YARN-4928:
-

From looking at the test cases I think these paths will only be used as HDFS 
paths.

> Some yarn.server.timeline.* tests fail on Windows attempting to use a test 
> root path containing a colon
> ---
>
> Key: YARN-4928
> URL: https://issues.apache.org/jira/browse/YARN-4928
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
> Environment: OS: Windows Server 2012
> JDK: 1.7.0_79
>Reporter: Gergely Novák
>Priority: Minor
> Attachments: YARN-4928.001.patch
>
>
> yarn.server.timeline.TestEntityGroupFSTimelineStore.* and 
> yarn.server.timeline.TestLogInfo.* fail on Windows, because they are 
> attempting to use a test root path like 
> "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo",
>  which contains a ":" (after the Windows drive letter) and 
> DFSUtil.isValidName() does not accept paths containing ":".
> This problem is identical to HDFS-6189, so I suggest to use the same 
> approach: using "/tmp/..." as test root dir instead of 
> System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon

2016-04-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228459#comment-15228459
 ] 

Gergely Novák commented on YARN-4928:
-

No, you're right, this actually does use the local file system. HDFS-6189 uses 
a different approach as well, let me update the patch based on that.

> Some yarn.server.timeline.* tests fail on Windows attempting to use a test 
> root path containing a colon
> ---
>
> Key: YARN-4928
> URL: https://issues.apache.org/jira/browse/YARN-4928
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
> Environment: OS: Windows Server 2012
> JDK: 1.7.0_79
>Reporter: Gergely Novák
>Assignee: Gergely Novák
>Priority: Minor
> Attachments: YARN-4928.001.patch
>
>
> yarn.server.timeline.TestEntityGroupFSTimelineStore.* and 
> yarn.server.timeline.TestLogInfo.* fail on Windows, because they are 
> attempting to use a test root path like 
> "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo",
>  which contains a ":" (after the Windows drive letter) and 
> DFSUtil.isValidName() does not accept paths containing ":".
> This problem is identical to HDFS-6189, so I suggest to use the same 
> approach: using "/tmp/..." as test root dir instead of 
> System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon

2016-04-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4928:
-
Assignee: Gergely Novák

> Some yarn.server.timeline.* tests fail on Windows attempting to use a test 
> root path containing a colon
> ---
>
> Key: YARN-4928
> URL: https://issues.apache.org/jira/browse/YARN-4928
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
> Environment: OS: Windows Server 2012
> JDK: 1.7.0_79
>Reporter: Gergely Novák
>Assignee: Gergely Novák
>Priority: Minor
> Attachments: YARN-4928.001.patch
>
>
> yarn.server.timeline.TestEntityGroupFSTimelineStore.* and 
> yarn.server.timeline.TestLogInfo.* fail on Windows, because they are 
> attempting to use a test root path like 
> "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo",
>  which contains a ":" (after the Windows drive letter) and 
> DFSUtil.isValidName() does not accept paths containing ":".
> This problem is identical to HDFS-6189, so I suggest to use the same 
> approach: using "/tmp/..." as test root dir instead of 
> System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon

2016-04-06 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228450#comment-15228450
 ] 

Junping Du commented on YARN-4928:
--

Oh, sorry, my previous comment could be misleading... Yes, it is an HDFS path, 
so the patch should be OK.
+1 from me too.

> Some yarn.server.timeline.* tests fail on Windows attempting to use a test 
> root path containing a colon
> ---
>
> Key: YARN-4928
> URL: https://issues.apache.org/jira/browse/YARN-4928
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
> Environment: OS: Windows Server 2012
> JDK: 1.7.0_79
>Reporter: Gergely Novák
>Priority: Minor
> Attachments: YARN-4928.001.patch
>
>
> yarn.server.timeline.TestEntityGroupFSTimelineStore.* and 
> yarn.server.timeline.TestLogInfo.* fail on Windows, because they are 
> attempting to use a test root path like 
> "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo",
>  which contains a ":" (after the Windows drive letter) and 
> DFSUtil.isValidName() does not accept paths containing ":".
> This problem is identical to HDFS-6189, so I suggest to use the same 
> approach: using "/tmp/..." as test root dir instead of 
> System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon

2016-04-06 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228418#comment-15228418
 ] 

Junping Du commented on YARN-4928:
--

Does the absolute path "/tmp" work on Windows? From other JIRA discussions, I 
think we don't want test cases to touch system absolute paths. 
[~ste...@apache.org], any comments here?

> Some yarn.server.timeline.* tests fail on Windows attempting to use a test 
> root path containing a colon
> ---
>
> Key: YARN-4928
> URL: https://issues.apache.org/jira/browse/YARN-4928
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
> Environment: OS: Windows Server 2012
> JDK: 1.7.0_79
>Reporter: Gergely Novák
>Priority: Minor
> Attachments: YARN-4928.001.patch
>
>
> yarn.server.timeline.TestEntityGroupFSTimelineStore.* and 
> yarn.server.timeline.TestLogInfo.* fail on Windows, because they are 
> attempting to use a test root path like 
> "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo",
>  which contains a ":" (after the Windows drive letter) and 
> DFSUtil.isValidName() does not accept paths containing ":".
> This problem is identical to HDFS-6189, so I suggest to use the same 
> approach: using "/tmp/..." as test root dir instead of 
> System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon

2016-04-06 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228409#comment-15228409
 ] 

Arpit Agarwal commented on YARN-4928:
-

+1 from me.

[~djp], do you have any comments?

> Some yarn.server.timeline.* tests fail on Windows attempting to use a test 
> root path containing a colon
> ---
>
> Key: YARN-4928
> URL: https://issues.apache.org/jira/browse/YARN-4928
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
> Environment: OS: Windows Server 2012
> JDK: 1.7.0_79
>Reporter: Gergely Novák
>Priority: Minor
> Attachments: YARN-4928.001.patch
>
>
> yarn.server.timeline.TestEntityGroupFSTimelineStore.* and 
> yarn.server.timeline.TestLogInfo.* fail on Windows, because they are 
> attempting to use a test root path like 
> "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo",
>  which contains a ":" (after the Windows drive letter) and 
> DFSUtil.isValidName() does not accept paths containing ":".
> This problem is identical to HDFS-6189, so I suggest to use the same 
> approach: using "/tmp/..." as test root dir instead of 
> System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4457) Cleanup unchecked types for EventHandler

2016-04-06 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-4457:
---
Attachment: YARN-4457.004.patch

Here's a rebased patch that includes all the changes and fixes the formatting 
issue.
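For reviewers skimming the thread, a trimmed-down hedged sketch of the typing 
change described in the issue below (the interfaces are simplified stand-ins, 
not the actual Hadoop declarations):
{code}
interface Event {}
interface EventHandler<T extends Event> { void handle(T event); }

interface Dispatcher {
  // Parameterized return type: callers no longer trigger unchecked warnings.
  EventHandler<Event> getHandler();
}

class Usage {
  @SuppressWarnings("rawtypes")
  static void use(Dispatcher d) {
    EventHandler h = d.getHandler(); // raw assignment still compiles,
                                     // with only a raw-types warning
  }
}
{code}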

> Cleanup unchecked types for EventHandler
> 
>
> Key: YARN-4457
> URL: https://issues.apache.org/jira/browse/YARN-4457
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-4457.001.patch, YARN-4457.002.patch, 
> YARN-4457.003.patch, YARN-4457.004.patch
>
>
> The EventHandler class is often used in an untyped context resulting in a 
> bunch of warnings about unchecked usage.  The culprit is the 
> {{Dispatcher.getHandler()}} method.  Fixing the typing on the method to 
> return {{EventHandler<Event>}} instead of the raw {{EventHandler}} clears up the 
> errors and doesn't introduce any incompatible changes.  In the case that 
> some code does:
> {code}
> EventHandler h = dispatcher.getHandler();
> {code}
> it will still work and will issue a compiler warning about raw types.  There 
> are, however, no instances of this issue in the current source base.
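
For illustration, the typing change boils down to roughly the following 
(simplified, assumed shapes; not the actual patch):

{code}
// Simplified sketch: returning EventHandler<Event> instead of the raw
// EventHandler removes the unchecked-usage warnings at typed call sites,
// while raw-typed callers still compile (with a raw-types warning).
interface Event { }

interface EventHandler<T extends Event> {
  void handle(T event);
}

interface Dispatcher {
  EventHandler<Event> getHandler();   // was: raw EventHandler getHandler()
}
{code}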



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4630) Remove useless boxing/unboxing code (Hadoop YARN)

2016-04-06 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated YARN-4630:
-
Attachment: YARN-4630.1.patch

[~ajisakaa] Sorry, I overlooked your feedback.  I've just fixed what you 
pointed out.

> Remove useless boxing/unboxing code (Hadoop YARN)
> -
>
> Key: YARN-4630
> URL: https://issues.apache.org/jira/browse/YARN-4630
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Priority: Minor
> Attachments: YARN-4630.0.patch, YARN-4630.1.patch
>
>
> There are lots of places where useless boxing/unboxing occurs.
> To avoid performance issues, let's remove them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4849) [YARN-3368] cleanup code base, integrate web UI related build to mvn, and add licenses.

2016-04-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228320#comment-15228320
 ] 

Sunil G commented on YARN-4849:
---

Thank you [~leftnoteasy] for the detailed cleanup and mvn integration 
activity. Overall it looks fine to me. I could build this with -Pyarn-ui, but I 
have not tested it; I will cover that as part of YARN-4515.

A few comments from my end. Please correct me if I am wrong.

1. In 
{{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnUI2.md}}
We may need some change in below part
{noformat}
Try it
--

* Packaging and deploying Hadoop in this branch
* Modify `app/adapters/yarn-app.js`, change `host` to your YARN RM web 
address
* If you running YARN RM in your localhost, you should install `npm 
install -g corsproxy` and run `corsproxy` to avoid CORS errors. More details: 
`https://www.npmjs.com/package/corsproxy`. And the `host` of 
`app/adapters/yarn-app.js` should start with `localhost:1337`.
* Run `ember server`
* Visit your app at [http://localhost:4200](http://localhost:4200).
{noformat}
- {{Modify `app/adapters/yarn-app.js`, change `host` to}}: with YARN-4514, we 
have a better way to handle this; we can refer to config.js instead.
- Run `ember server`: I think it should be "ember serve". 

2. In hadoop-yarn-ui/pom.xml, will the "clean" target work? With clean, we 
expect the generated sources and war file to be removed, correct?
3. In src/main/resources/META-INF/LICENSE.txt, Apache Tez UI is mentioned in 
many places. Should it be Apache YARN UI?
4. I may be wrong here, or could not find this info: how will the tests in 
hadoop-yarn-ui/src/main/webapp/tests be triggered by Jenkins? Please help by 
sharing some information.

> [YARN-3368] cleanup code base, integrate web UI related build to mvn, and add 
> licenses.
> ---
>
> Key: YARN-4849
> URL: https://issues.apache.org/jira/browse/YARN-4849
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4849-YARN-3368.1.patch, 
> YARN-4849-YARN-3368.2.patch, YARN-4849-YARN-3368.3.patch, 
> YARN-4849-YARN-3368.4.patch, YARN-4849-YARN-3368.5.patch, 
> YARN-4849-YARN-3368.6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4795) ContainerMetrics drops records

2016-04-06 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-4795:
---
Attachment: YARN-4795.002.patch

Nice catch.  Here's a rebased patch that addresses the issues.

> ContainerMetrics drops records
> --
>
> Key: YARN-4795
> URL: https://issues.apache.org/jira/browse/YARN-4795
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-4795.001.patch, YARN-4795.002.patch
>
>
> The metrics2 system was implemented to deal with persistent sources.  
> {{ContainerMetrics}} is an ephemeral source, and so it causes problems.  
> Specifically, the {{ContainerMetrics}} only reports metrics once after the 
> container has been stopped.  This behavior is a problem because the metrics2 
> system can ask sources for reports that will be quietly dropped by the sinks 
> that care.  (It's a metrics2 feature, not a bug.)  If that final report is 
> silently dropped, it's lost, because the {{ContainerMetrics}} won't ever 
> report anything else.
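
As a toy illustration of the failure mode (assumed names, not Hadoop's 
metrics2 API):

{code}
// Toy one-shot source: while the container runs it reports live values;
// after stop, exactly one final snapshot is ever produced. If a sink drops
// that single record, the container's final metrics are lost for good.
class OneShotSource {
  private boolean stopped;
  private boolean finalReportTaken;
  private long value;

  synchronized Long getReport() {
    if (!stopped) {
      return value;              // live reporting while the container runs
    }
    if (!finalReportTaken) {
      finalReportTaken = true;   // the only chance to observe the final value
      return value;
    }
    return null;                 // the source has gone quiet forever
  }
}
{code}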



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses

2016-04-06 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4514:
--
Attachment: YARN-4514-YARN-3368.4.patch

Thank you [~leftnoteasy] for the comments. Yes, I agree, it's a valid point. We 
will have it configured by default as well.

> [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
> --
>
> Key: YARN-4514
> URL: https://issues.apache.org/jira/browse/YARN-4514
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: YARN-4514-YARN-3368.1.patch, 
> YARN-4514-YARN-3368.2.patch, YARN-4514-YARN-3368.3.patch, 
> YARN-4514-YARN-3368.4.patch
>
>
> We have several configurations are hard-coded, for example, RM/ATS addresses, 
> we should make them configurable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4924) NM recovery race can lead to container not cleaned up

2016-04-06 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228258#comment-15228258
 ] 

Nathan Roberts commented on YARN-4924:
--

Sorry [~sandflee], I missed your comment about updating YARN-4051. That seems 
fine to me!

> NM recovery race can lead to container not cleaned up
> -
>
> Key: YARN-4924
> URL: https://issues.apache.org/jira/browse/YARN-4924
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>
> It's probably a small window, but we observed a case where the NM crashed and 
> then a container was not properly cleaned up during recovery.
> I will add details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4924) NM recovery race can lead to container not cleaned up

2016-04-06 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-4924:
-
Assignee: (was: Nathan Roberts)

> NM recovery race can lead to container not cleaned up
> -
>
> Key: YARN-4924
> URL: https://issues.apache.org/jira/browse/YARN-4924
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>
> It's probably a small window, but we observed a case where the NM crashed and 
> then a container was not properly cleaned up during recovery.
> I will add details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4924) NM recovery race can lead to container not cleaned up

2016-04-06 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228243#comment-15228243
 ] 

Nathan Roberts commented on YARN-4924:
--

Thanks [~sandflee], [~jlowe] for the suggestion. I'll work up a fix soon.

> NM recovery race can lead to container not cleaned up
> -
>
> Key: YARN-4924
> URL: https://issues.apache.org/jira/browse/YARN-4924
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
>
> It's probably a small window, but we observed a case where the NM crashed and 
> then a container was not properly cleaned up during recovery.
> I will add details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4924) NM recovery race can lead to container not cleaned up

2016-04-06 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts reassigned YARN-4924:


Assignee: Nathan Roberts

> NM recovery race can lead to container not cleaned up
> -
>
> Key: YARN-4924
> URL: https://issues.apache.org/jira/browse/YARN-4924
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
>
> It's probably a small window, but we observed a case where the NM crashed and 
> then a container was not properly cleaned up during recovery.
> I will add details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4924) NM recovery race can lead to container not cleaned up

2016-04-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228198#comment-15228198
 ] 

Jason Lowe commented on YARN-4924:
--

I agree with [~sandflee] that postponing the finish app event dispatch until 
after we've waited for the containers to complete recovering would be an 
appropriate fix.
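
Sketched very roughly (hypothetical method names, not the eventual patch), the 
ordering would become:

{code}
// NM startup: don't dispatch finish-application events until container
// recovery has fully settled, so that app cleanup cannot race past a
// container that is still being recovered.
void serviceStart() throws Exception {
  recoverContainers();                      // replay state-store entries
  waitForRecoveredContainers();             // block until recovery settles
  dispatchPendingFinishApplicationEvents(); // only now is cleanup safe
}
{code}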

> NM recovery race can lead to container not cleaned up
> -
>
> Key: YARN-4924
> URL: https://issues.apache.org/jira/browse/YARN-4924
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>
> It's probably a small window, but we observed a case where the NM crashed and 
> then a container was not properly cleaned up during recovery.
> I will add details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels

2016-04-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228183#comment-15228183
 ] 

Naganarasimha G R commented on YARN-4855:
-

Thanks for working on the patch, but I would like to know the views of other 
community members, since you are modifying the protocol in the patch.
Any thoughts? cc [~wangda], [~sunilg]

> Should check if node exists when replace nodelabels
> ---
>
> Key: YARN-4855
> URL: https://issues.apache.org/jira/browse/YARN-4855
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Priority: Minor
> Attachments: YARN-4855.001.patch
>
>
> Today, when we add node labels to nodes, the operation succeeds without any 
> message even if the nodes are not existing NodeManagers in the cluster.
> It could work like this:
> When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be 
> denied if the node does not exist.
> When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"*, it 
> would add node labels no matter whether the node exists.
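
For illustration, the proposed semantics amount to roughly this check in the 
label-replacement path (a hypothetical sketch with assumed names, not the 
attached patch):

{code}
// Hypothetical sketch: deny labeling an unknown node unless -force was given.
boolean nodeKnown = rmContext.getRMNodes().containsKey(nodeId);
if (force || nodeKnown) {
  nodeLabelsManager.replaceLabelsOnNode(
      Collections.singletonMap(nodeId, labels));
} else {
  throw new IOException("Node " + nodeId
      + " is not a running NodeManager; use -force to label it anyway");
}
{code}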



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon

2016-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228082#comment-15228082
 ] 

Hadoop QA commented on YARN-4928:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 13s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 26s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage
 in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 9s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 8s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 50s 
{color} | {color:green} hadoop-yarn-server-timeline-pluginstorage in the patch 
passed with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 55s 
{color} | {color:green} hadoop-yarn-server-timeline-pluginstorage in the patch 
passed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 59s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12797274/YARN-4928.001.patch |
| JIRA Issue | YARN-4928 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 769b4dfe493c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Updated] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon

2016-04-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Novák updated YARN-4928:

Attachment: YARN-4928.001.patch

> Some yarn.server.timeline.* tests fail on Windows attempting to use a test 
> root path containing a colon
> ---
>
> Key: YARN-4928
> URL: https://issues.apache.org/jira/browse/YARN-4928
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
> Environment: OS: Windows Server 2012
> JDK: 1.7.0_79
>Reporter: Gergely Novák
>Priority: Minor
> Attachments: YARN-4928.001.patch
>
>
> yarn.server.timeline.TestEntityGroupFSTimelineStore.* and 
> yarn.server.timeline.TestLogInfo.* fail on Windows because they attempt to 
> use a test root path like 
> "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo",
>  which contains a ":" (after the Windows drive letter), and 
> DFSUtil.isValidName() does not accept paths containing ":".
> This problem is identical to HDFS-6189, so I suggest using the same 
> approach: use "/tmp/..." as the test root dir instead of 
> System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon

2016-04-06 Thread JIRA
Gergely Novák created YARN-4928:
---

 Summary: Some yarn.server.timeline.* tests fail on Windows 
attempting to use a test root path containing a colon
 Key: YARN-4928
 URL: https://issues.apache.org/jira/browse/YARN-4928
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.8.0
 Environment: OS: Windows Server 2012
JDK: 1.7.0_79
Reporter: Gergely Novák
Priority: Minor


yarn.server.timeline.TestEntityGroupFSTimelineStore.* and 
yarn.server.timeline.TestLogInfo.* fail on Windows because they attempt to use 
a test root path like 
"/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo",
 which contains a ":" (after the Windows drive letter), and 
DFSUtil.isValidName() does not accept paths containing ":".

This problem is identical to HDFS-6189, so I suggest using the same approach: 
use "/tmp/..." as the test root dir instead of 
System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4855) Should check if node exists when replace nodelabels

2016-04-06 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated YARN-4855:
--
Attachment: YARN-4855.001.patch

> Should check if node exists when replace nodelabels
> ---
>
> Key: YARN-4855
> URL: https://issues.apache.org/jira/browse/YARN-4855
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Priority: Minor
> Attachments: YARN-4855.001.patch
>
>
> Today, when we add node labels to nodes, the operation succeeds without any 
> message even if the nodes are not existing NodeManagers in the cluster.
> It could work like this:
> When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be 
> denied if the node does not exist.
> When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"*, it 
> would add node labels no matter whether the node exists.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4906) Capture container start/finish time in container metrics

2016-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227943#comment-15227943
 ] 

Hudson commented on YARN-4906:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9566 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9566/])
YARN-4906. Capture container start/finish time in container metrics. (vvasudev: 
rev b41e65e5bc9459b4d950a2c53860a223f1a0d2ec)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestAuxServices.java


> Capture container start/finish time in container metrics
> 
>
> Key: YARN-4906
> URL: https://issues.apache.org/jira/browse/YARN-4906
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.9.0
>
> Attachments: YARN-4906.1.patch, YARN-4906.2.patch, YARN-4906.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4769) Add support for CSRF header in the dump capacity scheduler logs and kill app buttons in RM web UI

2016-04-06 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227904#comment-15227904
 ] 

Varun Vasudev commented on YARN-4769:
-

[~jianhe] - can you please review? Thanks!

> Add support for CSRF header in the dump capacity scheduler logs and kill app 
> buttons in RM web UI
> -
>
> Key: YARN-4769
> URL: https://issues.apache.org/jira/browse/YARN-4769
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4769.001.patch
>
>
> YARN-4737 adds support for CSRF filters in YARN. If the CSRF filter is 
> enabled, the current functionality to dump the capacity scheduler logs and 
> kill an app from the RM web UI will not work due to the missing CSRF header.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4901) MockRM should clear the QueueMetrics when it starts

2016-04-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227867#comment-15227867
 ] 

Karthik Kambatla commented on YARN-4901:


TestNMReconnect fails with FairScheduler because of this. 

> MockRM should clear the QueueMetrics when it starts
> ---
>
> Key: YARN-4901
> URL: https://issues.apache.org/jira/browse/YARN-4901
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>
> The {{ResourceManager}} rightly assumes that when it starts, it's starting 
> from naught.  The {{MockRM}}, however, violates that assumption.  For 
> example, in {{TestNMReconnect}}, each test method creates a new {{MockRM}} 
> instance.  The {{QueueMetrics.queueMetrics}} field is static, which means 
> that when multiple {{MockRM}} instances are created, the {{QueueMetrics}} 
> bleed over.  Having the MockRM clear the {{QueueMetrics}} when it starts 
> should resolve the issue.  I haven't looked yet at the scope to see how easy 
> that is to do.
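
A minimal sketch of that fix, assuming the static QueueMetrics reset hook 
stays visible to tests (not the committed change):

{code}
import org.apache.hadoop.yarn.server.resourcemanager.ResourceManager;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics;

public class MockRM extends ResourceManager {
  @Override
  protected void serviceStart() throws Exception {
    // QueueMetrics.queueMetrics is static, so state bleeds across MockRM
    // instances within one JVM; reset it before this "fresh" RM comes up.
    QueueMetrics.clearQueueMetrics();
    super.serviceStart();
  }
}
{code}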



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels

2016-04-06 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227863#comment-15227863
 ] 

Tao Jie commented on YARN-4855:
---

[~Naganarasimha], thanks for the reply.
We are trying to split off a resource pool for certain applications in our 
cluster by node label. For example, when we need a resource pool of 10 nodes, 
we run a script with the *rmadmin -replaceLabelsOnNode* command. However, this 
command always runs successfully even when nodes are not available in the 
cluster (the node is down, or the hostname is misspelled). As a result, we have 
to do extra work to ensure the capacity of the resource pool, so I hope this 
option may help.

> Should check if node exists when replace nodelabels
> ---
>
> Key: YARN-4855
> URL: https://issues.apache.org/jira/browse/YARN-4855
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Priority: Minor
>
> Today, when we add node labels to nodes, the operation succeeds without any 
> message even if the nodes are not existing NodeManagers in the cluster.
> It could work like this:
> When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be 
> denied if the node does not exist.
> When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"*, it 
> would add node labels no matter whether the node exists.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run

2016-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227816#comment-15227816
 ] 

Hadoop QA commented on YARN-3998:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 30s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 50s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
7s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 43s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 48s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
5s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 40s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 3m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 1s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 3m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 1s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 55s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 17 new + 
773 unchanged - 8 fixed = 790 total (was 781) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 52s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 2s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_77. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 23s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 14s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_77. {color} |

[jira] [Commented] (YARN-4927) TestRMHA#testTransitionedToActiveRefreshFail fails when FairScheduler is the default

2016-04-06 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227786#comment-15227786
 ] 

Rohith Sharma K S commented on YARN-4927:
-

Thanks, Karthik, for finding this test failure. Even the HadoopQA result could 
not catch it, because the tests run with default configurations. I think there 
are two options to handle it. First, the test class can be run parameterized, 
but only one test case depends on CS and all the other tests are independent 
of scheduler specifics. Second, this particular test case can be moved out, or 
skipped if FairScheduler is used.
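
For the second option, a hedged JUnit sketch (assumed test wiring; {{conf}} is 
the test's YarnConfiguration):

{code}
// Skip the CapacityScheduler-specific case when another scheduler (e.g.
// FairScheduler) is configured, instead of letting it fail.
org.junit.Assume.assumeTrue(
    "testTransitionedToActiveRefreshFail relies on CS-specific refreshAll",
    CapacityScheduler.class.getName().equals(
        conf.get(YarnConfiguration.RM_SCHEDULER)));
{code}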

> TestRMHA#testTransitionedToActiveRefreshFail fails when FairScheduler is the 
> default
> 
>
> Key: YARN-4927
> URL: https://issues.apache.org/jira/browse/YARN-4927
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>
> YARN-3893 adds this test, that relies on some CapacityScheduler-specific 
> stuff for refreshAll to fail, which doesn't apply when using FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)