[jira] [Updated] (YARN-3104) RM generates new AMRM tokens every heartbeat between rolling and activation

2016-04-05 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3104:
---
Priority: Critical  (was: Major)

> RM generates new AMRM tokens every heartbeat between rolling and activation
> ---
>
> Key: YARN-3104
> URL: https://issues.apache.org/jira/browse/YARN-3104
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: YARN-3104.001.patch, YARN-3104.002.patch, 
> YARN-3104.003.patch
>
>
> When the RM rolls a new AMRM secret, it conveys this to the AMs when it 
> notices they are still connected with the old key.  However neither the RM 
> nor the AM explicitly close the connection or otherwise try to reconnect with 
> the new secret.  Therefore the RM keeps thinking the AM doesn't have the new 
> token on every heartbeat and keeps sending new tokens for the period between 
> the key roll and the key activation.  Once activated the RM no longer squawks 
> in its logs about needing to generate a new token every heartbeat (i.e.: 
> second) for every app, but the apps can still be using the old token.  The 
> token is only checked upon connection to the RM.  The apps don't reconnect 
> when sent a new token, and the RM doesn't force them to reconnect by closing 
> the connection.
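
A minimal AM-side sketch of what would break this loop, assuming the usual 
heartbeat code path (variable names and elided imports are mine; the calls are 
the public AllocateResponse/ConverterUtils/UserGroupInformation APIs):

{noformat}
// On each heartbeat, if the RM handed back a rolled AMRM token, install it
// in the current UGI so the next RPC (re)connection authenticates with the
// new secret. Until a reconnect happens, the RM keeps re-sending the token.
org.apache.hadoop.yarn.api.records.Token rolled = allocateResponse.getAMRMToken();
if (rolled != null) {
  Token<AMRMTokenIdentifier> token =
      ConverterUtils.convertFromYarn(rolled, (Text) null);
  UserGroupInformation.getCurrentUser().addToken(token);
}
{noformat}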





[jira] [Created] (YARN-4927) TestRMHA#testTransitionedToActiveRefreshFail fails when FairScheduler is the default

2016-04-05 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-4927:
--

 Summary: TestRMHA#testTransitionedToActiveRefreshFail fails when 
FairScheduler is the default
 Key: YARN-4927
 URL: https://issues.apache.org/jira/browse/YARN-4927
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.8.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


YARN-3893 adds this test, which relies on CapacityScheduler-specific behavior 
to make refreshAll fail; that behavior does not apply when FairScheduler is 
used.





[jira] [Commented] (YARN-2883) Queuing of container requests in the NM

2016-04-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227737#comment-15227737
 ] 

Hadoop QA commented on YARN-2883:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 36s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 57s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 12s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 18s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 16s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 42s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 22s 
{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 57s 
{color} | {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_77. 
{color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 0m 57s {color} | 
{color:red} hadoop-yarn in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 57s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 6s 
{color} | {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 1m 6s {color} | 
{color:red} hadoop-yarn in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 6s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 38s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 13 new + 
424 unchanged - 4 fixed = 437 total (was 428) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 23s 
{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 14s 
{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 36s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 23s {color} 
| {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 21s {color} 
| {color:red} 

[jira] [Commented] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses

2016-04-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227728#comment-15227728
 ] 

Wangda Tan commented on YARN-4514:
--

Thanks [~sunilg], 

I tried this patch locally on top of YARN-4849. 

Only one minor comment: it seems to me that we should modify localBaseUrl to be 
"localhost:1337/" before we solve the CORS problem. Otherwise it will fail with 
default settings because corsproxy is not used.

> [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
> --
>
> Key: YARN-4514
> URL: https://issues.apache.org/jira/browse/YARN-4514
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: YARN-4514-YARN-3368.1.patch, 
> YARN-4514-YARN-3368.2.patch, YARN-4514-YARN-3368.3.patch
>
>
> Several configurations are hard-coded, for example the RM/ATS addresses; we 
> should make them configurable. 





[jira] [Commented] (YARN-4898) Avoid sending NODE_LABELS_UPDATE event to scheduler when node label is not configured

2016-04-05 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227726#comment-15227726
 ] 

Rohith Sharma K S commented on YARN-4898:
-

Sure, please go ahead. Thanks.

> Avoid sending NODE_LABELS_UPDATE event to scheduler when node label is not 
> configured
> -
>
> Key: YARN-4898
> URL: https://issues.apache.org/jira/browse/YARN-4898
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>
> It is observed that whenever a new NodeManager is registered with or expired 
> by the ResourceManager, an additional NODE_LABELS_UPDATE event is triggered 
> even though *Node Labels are not enabled*. This makes the dispatcher process 
> an event without any real use. 
> Logs: all 100 node-labels-update events were processed by the dispatcher. 
> {noformat}
> 2016-03-30 15:42:24,461 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(175)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeLabelsUpdateSchedulerEvent.EventType:
>  NODE_LABELS_UPDATE
> 2016-03-30 15:42:24,461 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(175)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeLabelsUpdateSchedulerEvent.EventType:
>  NODE_LABELS_UPDATE
> 2016-03-30 15:42:24,462 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(175)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeLabelsUpdateSchedulerEvent.EventType:
>  NODE_LABELS_UPDATE
> 2016-03-30 15:42:24,462 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(175)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeLabelsUpdateSchedulerEvent.EventType:
>  NODE_LABELS_UPDATE
> {noformat}
> The point of concern: in a large cluster, it is seen that registering 
> thousands of nodes at once to a running cluster might/would cause other 
> event processing to get delayed.
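
A minimal sketch of the proposed guard, assuming the RM dispatch site looks 
roughly like this (the event class is the one in the log above; {{conf}}, 
{{rmContext}} and {{updatedNodeToLabels}} are assumptions about the 
surrounding code):

{noformat}
// Only dispatch NODE_LABELS_UPDATE when node labels are actually enabled, so
// node registration/expiry does not generate no-op events for the dispatcher.
boolean nodeLabelsEnabled = conf.getBoolean(
    YarnConfiguration.NODE_LABELS_ENABLED,
    YarnConfiguration.DEFAULT_NODE_LABELS_ENABLED);
if (nodeLabelsEnabled) {
  rmContext.getDispatcher().getEventHandler().handle(
      new NodeLabelsUpdateSchedulerEvent(updatedNodeToLabels));
}
{noformat}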





[jira] [Commented] (YARN-4924) NM recovery race can lead to container not cleaned up

2016-04-05 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227703#comment-15227703
 ] 

sandflee commented on YARN-4924:


In YARN-4051, we also had containers leak in the NEW to DONE transition while 
recovering. The difference is that in YARN-4051 the NM received the FINISH_APP 
event after NM registration while containers were not yet recovered.
Since, as in YARN-4051, a pending kill event is not suggested, maybe we could 
delay sending the FINISH_APP event until after containers are recovered (a 
sketch follows below). If this is acceptable, I could update the patch in 
YARN-4051 to fix this.
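
A hedged sketch of the deferral idea (all names are hypothetical, not code 
from the YARN-4051 patch):

{noformat}
// Hold application-finish events that arrive during NM recovery and replay
// them once container recovery completes, so a FINISH_APP can never race
// ahead of the containers it is supposed to clean up.
private final List<CMgrCompletedAppsEvent> pendingFinishApps = new ArrayList<>();
private boolean containersRecovered = false;

synchronized void onFinishApps(CMgrCompletedAppsEvent event) {
  if (!containersRecovered) {
    pendingFinishApps.add(event);           // defer until recovery is done
  } else {
    dispatcher.getEventHandler().handle(event);
  }
}

synchronized void onContainersRecovered() {
  containersRecovered = true;
  for (CMgrCompletedAppsEvent e : pendingFinishApps) {
    dispatcher.getEventHandler().handle(e); // replay deferred finishes
  }
  pendingFinishApps.clear();
}
{noformat}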

> NM recovery race can lead to container not cleaned up
> -
>
> Key: YARN-4924
> URL: https://issues.apache.org/jira/browse/YARN-4924
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>
> It's probably a small window but we observed a case where the NM crashed and 
> then a container was not properly cleaned up during recovery.
> I will add details in the first comment.





[jira] [Commented] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run

2016-04-05 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227697#comment-15227697
 ] 

Jun Gong commented on YARN-3998:


[~vvasudev] Thanks for the review and comments. Attaching a new patch, 
08.patch, to address your comments and fix the test case errors. 

> Add retry-times to let NM re-launch container when it fails to run
> --
>
> Key: YARN-3998
> URL: https://issues.apache.org/jira/browse/YARN-3998
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3998.01.patch, YARN-3998.02.patch, 
> YARN-3998.03.patch, YARN-3998.04.patch, YARN-3998.05.patch, 
> YARN-3998.06.patch, YARN-3998.07.patch, YARN-3998.08.patch
>
>
> I'd like to add a field (retry-times) in ContainerLaunchContext. When the AM 
> launches containers, it could specify the value; the NM will then re-launch 
> the container 'retry-times' times when it fails to run (e.g. the exit code 
> is not 0). 
> It will save a lot of time: it avoids container localization, the RM does 
> not need to re-schedule the container, and local files in the container's 
> working directory are left for re-use (if the container has downloaded some 
> big files, it does not need to re-download them when running again). 
> We find this useful in systems like Storm.
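
A hypothetical sketch of the proposed API, for illustration only 
({{setRetryTimes}} is exactly the field this JIRA asks to add; it does not 
exist yet):

{noformat}
// The AM asks the NM to relaunch the container in place up to 3 times on a
// non-zero exit code, reusing the already-localized files in the container's
// working directory instead of going back to the RM.
ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
    localResources, environment, commands, serviceData, tokens, acls);
ctx.setRetryTimes(3);  // proposed field
StartContainerRequest req = StartContainerRequest.newInstance(ctx, containerToken);
{noformat}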





[jira] [Commented] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression

2016-04-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227695#comment-15227695
 ] 

Naganarasimha G R commented on YARN-4925:
-

[~bibinchundatt],
At first glance this seems to be a valid issue. IIUC we can directly apply the 
node label expression to any request, and there is no need for this check. 


> ContainerRequest in AMRMClient, application should be able to specify 
> nodes/racks together with nodeLabelExpression
> ---
>
> Key: YARN-4925
> URL: https://issues.apache.org/jira/browse/YARN-4925
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Currently, with node labels, AMRMClient is not able to specify node labels 
> with Node/Rack requests. For applications like Spark, NODE_LOCAL requests 
> cannot be made with a label expression.
> As per the check in {{AMRMClientImpl#checkNodeLabelExpression}}:
> {noformat}
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null && 
> (!containerRequest.getRacks().isEmpty()))
> || 
> (containerRequest.getNodes() != null && 
> (!containerRequest.getNodes().isEmpty()))) {
>   throw new InvalidContainerRequestException(
>   "Cannot specify node label with rack and node");
> }
> {noformat}
> In {{AppSchedulingInfo#updateResourceRequests}} we reset the labels to those 
> of OFF-SWITCH. 
> The above check is not required for the ContainerRequest ask. /cc [~wangda], 
> thank you for confirming.





[jira] [Updated] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run

2016-04-05 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-3998:
---
Attachment: YARN-3998.08.patch

> Add retry-times to let NM re-launch container when it fails to run
> --
>
> Key: YARN-3998
> URL: https://issues.apache.org/jira/browse/YARN-3998
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3998.01.patch, YARN-3998.02.patch, 
> YARN-3998.03.patch, YARN-3998.04.patch, YARN-3998.05.patch, 
> YARN-3998.06.patch, YARN-3998.07.patch, YARN-3998.08.patch
>
>
> I'd like to add a field (retry-times) in ContainerLaunchContext. When the AM 
> launches containers, it could specify the value; the NM will then re-launch 
> the container 'retry-times' times when it fails to run (e.g. the exit code 
> is not 0). 
> It will save a lot of time: it avoids container localization, the RM does 
> not need to re-schedule the container, and local files in the container's 
> working directory are left for re-use (if the container has downloaded some 
> big files, it does not need to re-download them when running again). 
> We find this useful in systems like Storm.





[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels

2016-04-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227679#comment-15227679
 ] 

Naganarasimha G R commented on YARN-4855:
-

Thanks [~Tao Jie] for sharing the alternate option, but I would like to know 
the background as to why this is required. If there is a strong use case, then 
we can think of supporting it!

> Should check if node exists when replace nodelabels
> ---
>
> Key: YARN-4855
> URL: https://issues.apache.org/jira/browse/YARN-4855
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Priority: Minor
>
> Today, when we add node labels to nodes, the operation succeeds without any 
> message even if the nodes are not existing NodeManagers in the cluster.
> It could be like this:
> When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be 
> denied if the node does not exist.
> When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"*, it 
> would add node labels no matter whether the node exists.





[jira] [Updated] (YARN-2883) Queuing of container requests in the NM

2016-04-05 Thread Konstantinos Karanasos (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantinos Karanasos updated YARN-2883:
-
Attachment: YARN-2883-trunk.009.patch

I am attaching a new version of the patch, in which I have moved the queuing 
of containers to the {{QueuingContainerManagerImpl}}.
I purposely left in the {{QueuingContainerManagerImpl}} the part that deals 
with the utilization of the allocated containers.
I also added all the needed synchronization.

[~kasha] and [~asuresh], please take a look and let me know what you think.

> Queuing of container requests in the NM
> ---
>
> Key: YARN-2883
> URL: https://issues.apache.org/jira/browse/YARN-2883
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-2883-trunk.004.patch, YARN-2883-trunk.005.patch, 
> YARN-2883-trunk.006.patch, YARN-2883-trunk.007.patch, 
> YARN-2883-trunk.008.patch, YARN-2883-trunk.009.patch, 
> YARN-2883-yarn-2877.001.patch, YARN-2883-yarn-2877.002.patch, 
> YARN-2883-yarn-2877.003.patch, YARN-2883-yarn-2877.004.patch
>
>
> We propose to add a queue in each NM, where queueable container requests can 
> be held.
> Based on the available resources in the node and the containers in the queue, 
> the NM will decide when to allow the execution of a queued container.
> In order to ensure the instantaneous start of a guaranteed-start container, 
> the NM may decide to pre-empt/kill running queueable containers.
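
A minimal sketch of the queuing decision described above (the queue, the 
{{startContainer}} helper and the method itself are illustrative; 
{{Resources.fitsIn}} and {{Resources.subtract}} are the existing utility 
methods):

{noformat}
// Start queued containers only while the node has room for the head of the
// queue; anything that does not fit stays queued for a later attempt.
Queue<Container> queuedContainers = new ArrayDeque<>();

void maybeStartQueuedContainers(Resource available) {
  while (!queuedContainers.isEmpty()
      && Resources.fitsIn(queuedContainers.peek().getResource(), available)) {
    Container c = queuedContainers.poll();
    startContainer(c);  // illustrative helper
    available = Resources.subtract(available, c.getResource());
  }
}
{noformat}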





[jira] [Commented] (YARN-4821) have a separate NM timeline publishing interval

2016-04-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227654#comment-15227654
 ] 

Naganarasimha G R commented on YARN-4821:
-

[~sjlee0], I think we can remove this from the *"yarn-2928-1st-milestone"* 
list. Thoughts?

> have a separate NM timeline publishing interval
> ---
>
> Key: YARN-4821
> URL: https://issues.apache.org/jira/browse/YARN-4821
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> Currently the interval with which NM publishes container CPU and memory 
> metrics is tied to {{yarn.nodemanager.resource-monitor.interval-ms}} whose 
> default is 3 seconds. This is too aggressive.
> There should be a separate configuration that controls how often 
> {{NMTimelinePublisher}} publishes container metrics.
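
A sketch of the proposed split (the property name and default below are 
hypothetical placeholders for whatever this JIRA picks; only 
{{yarn.nodemanager.resource-monitor.interval-ms}} exists today):

{noformat}
// Read a dedicated publishing interval instead of reusing the
// resource-monitor interval.
long publishIntervalMs = conf.getLong(
    "yarn.nodemanager.timeline.publish-interval-ms",  // proposed key
    10000L);                                          // illustrative default
{noformat}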





[jira] [Commented] (YARN-3959) Store application related configurations in Timeline Service v2

2016-04-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227653#comment-15227653
 ] 

Naganarasimha G R commented on YARN-3959:
-

I think we can remove this from the *"yarn-2928-1st-milestone"* list. Thoughts?

> Store application related configurations in Timeline Service v2
> ---
>
> Key: YARN-3959
> URL: https://issues.apache.org/jira/browse/YARN-3959
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
>
> We already have a configuration field in the HBase schema for the application 
> entity. We need to make sure the AM writes it out when it gets launched.





[jira] [Commented] (YARN-3196) [Compatibility] Make TS next gen be compatible with the current TS

2016-04-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227650#comment-15227650
 ] 

Naganarasimha G R commented on YARN-3196:
-

Should this be in the *"yarn-2928-1st-milestone"* list?

> [Compatibility] Make TS next gen be compatible with the current TS
> --
>
> Key: YARN-3196
> URL: https://issues.apache.org/jira/browse/YARN-3196
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Junping Du
>  Labels: yarn-2928-1st-milestone
>
> Filing a JIRA to make sure that we don't forget to be compatible with the 
> current TS, so that we can smoothly move users to the new TS.





[jira] [Commented] (YARN-4736) Issues with HBaseTimelineWriterImpl

2016-04-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227648#comment-15227648
 ] 

Naganarasimha G R commented on YARN-4736:
-

Hi [~sjlee0], 
As per the discussions which have happened, I think we can remove this from 
the *"yarn-2928-1st-milestone"* list.

> Issues with HBaseTimelineWriterImpl
> ---
>
> Key: YARN-4736
> URL: https://issues.apache.org/jira/browse/YARN-4736
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Vrushali C
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: NM_Hang_hbase1.0.3.tar.gz, hbaseException.log, 
> threaddump.log
>
>
> Faced some issues while running ATSv2 in a single-node Hadoop cluster, where 
> HBase with embedded ZooKeeper was launched on the same node.
> # Due to some NPE issues, I could see the NM trying to shut down, but the NM 
> daemon process did not complete the shutdown due to locks.
> # Got some HBase-related exceptions after the application finished execution 
> successfully. 
> Will attach logs and the trace for the same.





[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels

2016-04-05 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227603#comment-15227603
 ] 

Tao Jie commented on YARN-4855:
---

[~Naganarasimha], agree. To keep compatibility, how about an improvement like 
this:
*yarn rmadmin -replaceLabelsOnNode "node1=label1"* performs as it does today.
*yarn rmadmin -replaceLabelsOnNode -checkNode "node1=label1"* would check node 
existence and be denied when the node does not exist. A sketch of the check 
follows below.
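
A sketch of the proposed validation (illustrative, not a committed change; 
{{checkNode}}, {{nodeToLabels}} and the surrounding admin-service method are 
assumptions):

{noformat}
// Before applying replaceLabelsOnNode, verify each target host is a known
// active NodeManager unless the caller did not request the check.
for (NodeId nodeId : nodeToLabels.keySet()) {
  if (checkNode && !rmContext.getRMNodes().containsKey(nodeId)) {
    throw new IOException("Node " + nodeId + " is not a registered"
        + " NodeManager; refusing to replace labels on it.");
  }
}
{noformat}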

> Should check if node exists when replace nodelabels
> ---
>
> Key: YARN-4855
> URL: https://issues.apache.org/jira/browse/YARN-4855
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Priority: Minor
>
> Today, when we add node labels to nodes, the operation succeeds without any 
> message even if the nodes are not existing NodeManagers in the cluster.
> It could be like this:
> When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be 
> denied if the node does not exist.
> When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"*, it 
> would add node labels no matter whether the node exists.





[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2016-04-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227600#comment-15227600
 ] 

Hadoop QA commented on YARN-3816:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 31s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 33s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
3s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 56s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
40s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 2s 
{color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
58s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
21s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 5s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 21s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 21s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 33s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 13 new + 
17 unchanged - 0 fixed = 30 total (was 17) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
54s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 35s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.8.0_77 with JDK v1.8.0_77 
generated 20 new + 80 unchanged - 20 fixed = 100 total (was 100) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 35s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-jdk1.8.0_77
 with JDK v1.8.0_77 generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 35s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 6m 50s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.7.0_95 with JDK v1.7.0_95 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 58s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s 
{color} | {color:green} 

[jira] [Created] (YARN-4926) Change nodelabel rest API invalid response status to 400

2016-04-05 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-4926:
--

 Summary: Change nodelabel rest API invalid response status to 400
 Key: YARN-4926
 URL: https://issues.apache.org/jira/browse/YARN-4926
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt


For the cases mentioned below, we need to change the status from 404 to 400:
# Add a label using a POST request with an invalid pattern 
# Try modification of exclusivity for a label using an add request again 
# Replace a label on a node with a label that doesn't exist in the cluster







[jira] [Updated] (YARN-4926) Change nodelabel rest API invalid response status to 400

2016-04-05 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4926:
---
Description: 
For the cases mentioned below, we need to change the status from 404 to 400:
# Add a label using a POST request with an invalid pattern 
# Try modification of exclusivity for a label using an add request again 
# Replace a label on a node with a label that doesn't exist in the cluster



  was:
For cases  mentioned below need to change status from 404 to 400
# Add level in using post request with invalid pattern 
#Try modification of exclusivity for label using add request again 
# Replace label of node with label that doesn’t exist in cluster




> Change nodelabel rest API invalid response status to 400
> ---
>
> Key: YARN-4926
> URL: https://issues.apache.org/jira/browse/YARN-4926
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> For the cases mentioned below, we need to change the status from 404 to 400:
> # Add a label using a POST request with an invalid pattern 
> # Try modification of exclusivity for a label using an add request again 
> # Replace a label on a node with a label that doesn't exist in the cluster
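
An illustrative JAX-RS-style sketch of the intended mapping (the handler, 
{{LABEL_PATTERN}} and {{clusterLabels}} are assumptions; 
{{org.apache.hadoop.yarn.webapp.BadRequestException}} is the existing 
exception that maps to 400):

{noformat}
// Client-side input errors should surface as 400 Bad Request, reserving
// 404 Not Found for genuinely missing resources.
if (!LABEL_PATTERN.matcher(labelName).matches()) {
  throw new BadRequestException("Invalid node label pattern: " + labelName);
}
if (!clusterLabels.contains(labelName)) {
  throw new BadRequestException(
      "Label " + labelName + " does not exist in the cluster");
}
{noformat}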





[jira] [Updated] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression

2016-04-05 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4925:
---
Description: 
Currently, with node labels, AMRMClient is not able to specify node labels 
with Node/Rack requests. For applications like Spark, NODE_LOCAL requests 
cannot be made with a label expression.
As per the check in {{AMRMClientImpl#checkNodeLabelExpression}}:

{noformat}
// Don't allow specify node label against ANY request
if ((containerRequest.getRacks() != null && 
(!containerRequest.getRacks().isEmpty()))
|| 
(containerRequest.getNodes() != null && 
(!containerRequest.getNodes().isEmpty()))) {
  throw new InvalidContainerRequestException(
  "Cannot specify node label with rack and node");
}
{noformat}

In {{AppSchedulingInfo#updateResourceRequests}} we reset the labels to those 
of OFF-SWITCH. 

The above check is not required for the ContainerRequest ask. /cc [~wangda], 
thank you for confirming.

  was:
Currently, with node labels, AMRMClient is not able to specify node labels 
with Node/Rack requests. For applications like Spark, NODE_LOCAL requests 
cannot be made with a label expression.

{noformat}
// Don't allow specify node label against ANY request
if ((containerRequest.getRacks() != null && 
(!containerRequest.getRacks().isEmpty()))
|| 
(containerRequest.getNodes() != null && 
(!containerRequest.getNodes().isEmpty()))) {
  throw new InvalidContainerRequestException(
  "Cannot specify node label with rack and node");
}
{noformat}

In {{AppSchedulingInfo#updateResourceRequests}} we reset the labels to those 
of OFF-SWITCH. 

The above check is not required for the ContainerRequest ask. /cc [~wangda], 
thank you for confirming.


> ContainerRequest in AMRMClient, application should be able to specify 
> nodes/racks together with nodeLabelExpression
> ---
>
> Key: YARN-4925
> URL: https://issues.apache.org/jira/browse/YARN-4925
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Currently, with node labels, AMRMClient is not able to specify node labels 
> with Node/Rack requests. For applications like Spark, NODE_LOCAL requests 
> cannot be made with a label expression.
> As per the check in {{AMRMClientImpl#checkNodeLabelExpression}}:
> {noformat}
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null && 
> (!containerRequest.getRacks().isEmpty()))
> || 
> (containerRequest.getNodes() != null && 
> (!containerRequest.getNodes().isEmpty()))) {
>   throw new InvalidContainerRequestException(
>   "Cannot specify node label with rack and node");
> }
> {noformat}
> In {{AppSchedulingInfo#updateResourceRequests}} we reset the labels to those 
> of OFF-SWITCH. 
> The above check is not required for the ContainerRequest ask. /cc [~wangda], 
> thank you for confirming.





[jira] [Created] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression

2016-04-05 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-4925:
--

 Summary: ContainerRequest in AMRMClient, application should be 
able to specify nodes/racks together with nodeLabelExpression
 Key: YARN-4925
 URL: https://issues.apache.org/jira/browse/YARN-4925
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt


Currently, with node labels, AMRMClient is not able to specify node labels 
with Node/Rack requests. For applications like Spark, NODE_LOCAL requests 
cannot be made with a label expression.

{noformat}
// Don't allow specify node label against ANY request
if ((containerRequest.getRacks() != null && 
(!containerRequest.getRacks().isEmpty()))
|| 
(containerRequest.getNodes() != null && 
(!containerRequest.getNodes().isEmpty()))) {
  throw new InvalidContainerRequestException(
  "Cannot specify node label with rack and node");
}
{noformat}

In {{AppSchedulingInfo#updateResourceRequests}} we reset the labels to those 
of OFF-SWITCH. 

The above check is not required for the ContainerRequest ask. /cc [~wangda], 
thank you for confirming.
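
For illustration, this is the kind of request the JIRA wants to permit (host 
and rack names are made up; today this exact constructor call is rejected by 
the check above):

{noformat}
// A NODE_LOCAL ask that also carries a node label expression.
AMRMClient.ContainerRequest request = new AMRMClient.ContainerRequest(
    Resource.newInstance(1024, 1),   // 1 GB, 1 vcore
    new String[] {"node1"},          // nodes
    new String[] {"/rack1"},         // racks
    Priority.newInstance(1),
    true,                            // relaxLocality
    "label1");                       // nodeLabelsExpression
amrmClient.addContainerRequest(request);
{noformat}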





[jira] [Commented] (YARN-4915) Fix typo in YARN Secure Containers documentation

2016-04-05 Thread Takashi Ohnishi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227524#comment-15227524
 ] 

Takashi Ohnishi commented on YARN-4915:
---

Thank you, [~templedf] and [~iwasakims], for reviewing and committing! :)

> Fix typo in YARN Secure Containers documentation
> 
>
> Key: YARN-4915
> URL: https://issues.apache.org/jira/browse/YARN-4915
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, yarn
>Affects Versions: 2.7.2
>Reporter: Takashi Ohnishi
>Assignee: Takashi Ohnishi
>Priority: Trivial
> Fix For: 2.7.3
>
> Attachments: YARN-4915.1.patch
>
>
> `explictly forbiden` should be `explicitly forbidden`.





[jira] [Commented] (YARN-4917) Fix typos in documentation of Capacity Scheduler.

2016-04-05 Thread Takashi Ohnishi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227521#comment-15227521
 ] 

Takashi Ohnishi commented on YARN-4917:
---

Thank you, [~iwasakims] for reviewing and committing! :)

> Fix typos in documentation of Capacity Scheduler.
> -
>
> Key: YARN-4917
> URL: https://issues.apache.org/jira/browse/YARN-4917
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.2
>Reporter: Takashi Ohnishi
>Assignee: Takashi Ohnishi
>Priority: Minor
> Fix For: 2.7.3
>
> Attachments: YARN-4917.1.patch
>
>
> There are some typos.
> For example,
> 'Adminstrators' should be 'Administrators',
> 'artifical' should be 'artificial', and so on.





[jira] [Updated] (YARN-3461) Consolidate flow name/version/run defaults

2016-04-05 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3461:
--
Attachment: YARN-3461-YARN-2928.02.patch

Fixed the checkstyle issue.

The "test failures" are mostly compilation failures strangely. It looks as 
though the unit tests did not pick up the timelineservice-related code. We'll 
see what happens with the second jenkins run.

> Consolidate flow name/version/run defaults
> --
>
> Key: YARN-3461
> URL: https://issues.apache.org/jira/browse/YARN-3461
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Sangjin Lee
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3461-YARN-2928.01.patch, 
> YARN-3461-YARN-2928.02.patch
>
>
> In YARN-3391, it was not resolved what the defaults for flow 
> name/version/run should be. Let's continue the discussion here and unblock 
> YARN-3391 so it can move forward.





[jira] [Commented] (YARN-4699) Scheduler UI and REST o/p is not in sync when -replaceLabelsOnNode is used to change label of a node

2016-04-05 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227484#comment-15227484
 ] 

Sunil G commented on YARN-4699:
---

Thank you very much [~leftnoteasy] for the review and commit... 

> Scheduler UI and REST o/p is not in sync when -replaceLabelsOnNode is used to 
> change label of a node
> 
>
> Key: YARN-4699
> URL: https://issues.apache.org/jira/browse/YARN-4699
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.7.2
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4699.patch, 0002-YARN-4699.patch, 
> AfterAppFInish-LabelY-Metrics.png, ForLabelX-AfterSwitch.png, 
> ForLabelY-AfterSwitch.png
>
>
> Scenario is as follows:
> a. 2 nodes are available in the cluster (node1 with label "x", node2 with 
> label "y")
> b. Submit an application to node1 for label "x". 
> c. Change node1 label to "y" by using *replaceLabelsOnNode* command.
> d. Verify Scheduler UI for metrics such as "Used Capacity", "Absolute 
> Capacity" etc. "x" still shows some capacity.
> e. Change node1 label back to "x" and verify UI and REST o/p
> Output:
> 1. "Used Capacity", "Absolute Capacity" etc are not decremented once labels 
> is changed for a node.
> 2. UI tab for respective label shows wrong GREEN color in these cases.
> 3. REST o/p is wrong for each label after executing above scenario.
> Attaching screenshots also. This ticket will try to cover the UI and REST 
> o/p fix when a label is changed at runtime.





[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults

2016-04-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227476#comment-15227476
 ] 

Hadoop QA commented on YARN-3461:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 34s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 14s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
6s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 11s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 15s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
13s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 33s 
{color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
23s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
11s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 31s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
1s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 15s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 7s 
{color} | {color:red} root: patch generated 1 new + 4 unchanged - 0 fixed = 5 
total (was 4) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 48s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 58s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 11s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 58s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 17s {color} 
| {color:red} hadoop-yarn-applications-distributedshell in the patch failed 
with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 24s {color} 
| {color:red} hadoop-mapreduce-client-jobclient in the patch failed with JDK 
v1.8.0_77. {color} |
| 

[jira] [Updated] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2016-04-05 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3816:

Attachment: YARN-3816-YARN-2928-v5.patch

OK, I've done a major revision of the existing patch. Some key changes:
- Refactored the patch so that it applies to the latest branch. 
- Had some offline discussion with [~vinodkv]: we focus on real-time 
aggregation for single data metrics for now, and the latest patch reflects 
this. Specifically, this aggregation addresses the case where all containers 
post their metrics to the same collector, and we aggregate to get the total 
metric for the whole application. The aggregation is currently done by 
maintaining an aggregation table in the collector and periodically 
aggregating that table. 
- Provide an extendable interface to support more aggregation operations in 
the future; most binary commutative and associative operations (like average) 
can fit in this model (see the sketch after this list). 
- Extended TimelineMetrics according to the suggestions from [~sjlee0]. 
However, instead of using "counters" and "gauges" to categorize all metrics, 
I used the type of the real-time aggregation operation as the metadata of the 
metric. I was hoping that this way we are not limiting timeline metrics to 
the Hadoop scope. 
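
A hedged sketch of that extendable interface (names are illustrative, not the 
patch's API):

{noformat}
// Any binary, commutative, associative combiner fits the real-time
// aggregation model: the collector folds each container's value into the
// application-level row of its aggregation table.
interface TimelineMetricAggregationOp {
  Number aggregate(Number accumulated, Number incoming);
}

TimelineMetricAggregationOp SUM = (a, b) -> a.longValue() + b.longValue();
TimelineMetricAggregationOp MAX =
    (a, b) -> Math.max(a.longValue(), b.longValue());
{noformat}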

Some future work:
- Decide the reader API for the aggregated entities. From a web UI point of 
view, it would be cool to integrate those data with applications, i.e., when 
a user requests timeline data for one application, we can return the 
aggregated data as well. 
- My goal is to make the aggregation process eventually consistent. However, 
there may be some concurrency-related issues in this patch; please feel free 
to point them out if you find any. 
- More unit tests. 
- Support taking averages in aggregations. With the current code framework I 
think this should be a quick change, but it's of low priority so it is not in 
the first draft. (New JIRAs are welcome if anyone has the bandwidth.)
- Decide configs for the aggregation period. 
- Fault tolerance, not there yet... (New JIRAs are welcome if anyone has the 
bandwidth.)

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Li Lu
>  Labels: yarn-2928-1st-milestone
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, 
> YARN-3816-YARN-2928-v5.patch, YARN-3816-feature-YARN-2928.v4.1.patch, 
> YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient when based 
> on Application-level aggregations rather than raw entity-level data, as many 
> fewer rows need to be scanned (filtering out non-aggregated entities such 
> as events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.

2016-04-05 Thread Shiwei Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227456#comment-15227456
 ] 

Shiwei Guo commented on YARN-3933:
--

And I didn't figure out how YARN-4809 fixes the race condition; maybe you mean 
we should apply both patches?

> Race condition when calling AbstractYarnScheduler.completedContainer.
> -
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.5.2, 2.7.1
>Reporter: Lavkesh Lahngir
>Assignee: Shiwei Guo
> Attachments: YARN-3933.001.patch, YARN-3933.002.patch, 
> YARN-3933.003.patch
>
>
> In our cluster we are seeing available memory and cores go negative. 
> Initial inspection:
> Scenario no. 1: 
> In the capacity scheduler, the method allocateContainersToNode() checks 
> whether there are excess reservations of containers for an application that 
> are no longer needed, and then calls queue.completedContainer(), which 
> causes resources to go negative even though they were never assigned in the 
> first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess container assignments?
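
A hedged sketch of one way to close such a race (the guard and the 
{{completedOnce}} set are illustrative, not the attached patch): make 
completedContainer idempotent so two racing code paths cannot release the 
same container twice.

{noformat}
// Track container IDs that have already been completed; a second call for
// the same container returns without decrementing queue resources again.
private final Set<ContainerId> completedOnce = new HashSet<>();

protected synchronized void completedContainer(RMContainer rmContainer,
    ContainerStatus status, RMContainerEventType event) {
  if (rmContainer == null
      || !completedOnce.add(rmContainer.getContainerId())) {
    return;  // null, or this container was already completed once
  }
  // ... existing completion logic (queue.completedContainer(...), etc.) ...
}
{noformat}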





[jira] [Commented] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.

2016-04-05 Thread Shiwei Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227453#comment-15227453
 ] 

Shiwei Guo commented on YARN-3933:
--

Hi, I do think removing the duplicated code is the better way. 
My only concern is that the counter updated in 
{{FairScheduler#updateRootQueueMetrics()}} is not protected in YARN-4809. 
Any thoughts?

> Race condition when calling AbstractYarnScheduler.completedContainer.
> -
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.5.2, 2.7.1
>Reporter: Lavkesh Lahngir
>Assignee: Shiwei Guo
> Attachments: YARN-3933.001.patch, YARN-3933.002.patch, 
> YARN-3933.003.patch
>
>
> In our cluster we are seeing available memory and cores go negative. 
> Initial inspection:
> Scenario no. 1: 
> In the capacity scheduler, the method allocateContainersToNode() checks 
> whether there are excess reservations of containers for an application that 
> are no longer needed, and then calls queue.completedContainer(), which 
> causes resources to go negative even though they were never assigned in the 
> first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess container assignments?





[jira] [Updated] (YARN-857) Localization failures should be available in container diagnostics

2016-04-05 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-857:
-
Attachment: YARN-857-20160405.txt

The TestNMProxy failure is fixed via YARN-4916. Fixing my test's issue.

> Localization failures should be available in container diagnostics
> --
>
> Key: YARN-857
> URL: https://issues.apache.org/jira/browse/YARN-857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: YARN-857-20160404.txt, YARN-857-20160405.txt, 
> YARN-857.1.patch, YARN-857.2.patch
>
>
> at 
> org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:106)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:978)
> Traced this down to DefaultExecutor which does not look at the exit code for 
> the localizer.





[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom

2016-04-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227409#comment-15227409
 ] 

Wangda Tan commented on YARN-3215:
--

+1 to the latest patch; will commit by the end of this week if there are no 
objections.

> Respect labels in CapacityScheduler when computing headroom
> ---
>
> Key: YARN-3215
> URL: https://issues.apache.org/jira/browse/YARN-3215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-3215.v1.001.patch, YARN-3215.v2.001.patch, 
> YARN-3215.v2.002.patch, YARN-3215.v2.003.patch
>
>
> In the existing CapacityScheduler, when computing the headroom of an 
> application, only the "non-labeled" nodes of this application are considered.
> But it is possible the application is asking for labeled resources, so 
> headroom-by-label (like 5G of resources available under node-label=red) is 
> required to get better resource allocation and avoid deadlocks such as 
> MAPREDUCE-5928.
> This JIRA could involve both API changes (such as adding a 
> label-to-available-resource map in AllocateResponse) and also internal 
> changes in CapacityScheduler.
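
A hypothetical shape for that API change (illustrative only; no such method 
exists on AllocateResponse today):

{noformat}
// Per-label headroom lets the AM see, e.g., 5G available under
// node-label=red, instead of a single unlabeled headroom value.
Map<String, Resource> headroomByLabel =
    allocateResponse.getAvailableResourcesByLabel();  // proposed method
Resource redHeadroom = headroomByLabel.get("red");
{noformat}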



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4849) [YARN-3368] cleanup code base, integrate web UI related build to mvn, and add licenses.

2016-04-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227404#comment-15227404
 ] 

Wangda Tan commented on YARN-4849:
--

The -1 on ASF license files is created by the build framework.

[~sunilg], could you take a look at the latest patch and let me know if you 
have any comments?

> [YARN-3368] cleanup code base, integrate web UI related build to mvn, and add 
> licenses.
> ---
>
> Key: YARN-4849
> URL: https://issues.apache.org/jira/browse/YARN-4849
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4849-YARN-3368.1.patch, 
> YARN-4849-YARN-3368.2.patch, YARN-4849-YARN-3368.3.patch, 
> YARN-4849-YARN-3368.4.patch, YARN-4849-YARN-3368.5.patch, 
> YARN-4849-YARN-3368.6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4911) Bad placement policy in FairScheduler causes the RM to crash

2016-04-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227387#comment-15227387
 ] 

Hadoop QA commented on YARN-4911:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 1 new + 69 unchanged - 0 fixed = 70 total (was 69) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 24s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 46s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 141m 25s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.8.0_77 Timed out junit tests 

[jira] [Commented] (YARN-4699) Scheduler UI and REST o/p is not in sync when -replaceLabelsOnNode is used to change label of a node

2016-04-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227386#comment-15227386
 ] 

Hudson commented on YARN-4699:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9565 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9565/])
YARN-4699. Scheduler UI and REST o/p is not in sync when (wangda: rev 
21eb4284487d6f8e4beedb8a0c3168e952f224fc)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerNodeLabelUpdate.java


> Scheduler UI and REST o/p is not in sync when -replaceLabelsOnNode is used to 
> change label of a node
> 
>
> Key: YARN-4699
> URL: https://issues.apache.org/jira/browse/YARN-4699
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.7.2
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4699.patch, 0002-YARN-4699.patch, 
> AfterAppFInish-LabelY-Metrics.png, ForLabelX-AfterSwitch.png, 
> ForLabelY-AfterSwitch.png
>
>
> Scenario is as follows:
> a. 2 nodes are available in the cluster (node1 with label "x", node2 with 
> label "y")
> b. Submit an application to node1 for label "x". 
> c. Change node1 label to "y" by using *replaceLabelsOnNode* command.
> d. Verify Scheduler UI for metrics such as "Used Capacity", "Absolute 
> Capacity" etc. "x" still shows some capacity.
> e. Change node1 label back to "x" and verify UI and REST o/p
> Output:
> 1. "Used Capacity", "Absolute Capacity" etc are not decremented once labels 
> is changed for a node.
> 2. UI tab for respective label shows wrong GREEN color in these cases.
> 3. REST o/p is wrong for each label after executing above scenario.
> Attaching screen shots also. This ticket will try to cover UI and REST o/p 
> fix when label is changed runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4699) Scheduler UI and REST o/p is not in sync when -replaceLabelsOnNode is used to change label of a node

2016-04-05 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4699:
-
Fix Version/s: 2.8.0

> Scheduler UI and REST o/p is not in sync when -replaceLabelsOnNode is used to 
> change label of a node
> 
>
> Key: YARN-4699
> URL: https://issues.apache.org/jira/browse/YARN-4699
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.7.2
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4699.patch, 0002-YARN-4699.patch, 
> AfterAppFInish-LabelY-Metrics.png, ForLabelX-AfterSwitch.png, 
> ForLabelY-AfterSwitch.png
>
>
> Scenario is as follows:
> a. 2 nodes are available in the cluster (node1 with label "x", node2 with 
> label "y")
> b. Submit an application to node1 for label "x". 
> c. Change node1 label to "y" by using *replaceLabelsOnNode* command.
> d. Verify Scheduler UI for metrics such as "Used Capacity", "Absolute 
> Capacity" etc. "x" still shows some capacity.
> e. Change node1 label back to "x" and verify UI and REST o/p
> Output:
> 1. "Used Capacity", "Absolute Capacity" etc are not decremented once labels 
> is changed for a node.
> 2. UI tab for respective label shows wrong GREEN color in these cases.
> 3. REST o/p is wrong for each label after executing above scenario.
> Attaching screen shots also. This ticket will try to cover UI and REST o/p 
> fix when label is changed runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults

2016-04-05 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227276#comment-15227276
 ] 

Sangjin Lee commented on YARN-3461:
---

Posted patch v.1.

I am essentially proposing the original defaults again, as in the 
[design 
document|https://issues.apache.org/jira/secure/attachment/12691416/ATSv2.rev2.pdf].
 Namely,
- flow name: the YARN app name (or the app id if the YARN app name is not set)
- flow version: "1"
- flow run id: the YARN app start time

I think this will cover most of the important use cases (e.g. old MR jobs that 
do not specify flows) and keep things simple enough. What Junping suggested 
above sounds reasonable and may add value for certain cases. However, I think 
there is something to be said for keeping things simple at this point. We 
could consider making this more configurable if such a need is identified later.

Let me know what you think. Also a review is greatly appreciated. Thanks!
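
In code form, the proposed defaults amount to roughly the following (a sketch with invented helper names, not timeline service code):

{noformat}
// Sketch of the proposed defaults; helper names are invented.
public class FlowDefaults {
  static String defaultFlowName(String appName, String appId) {
    // flow name: the YARN app name, or the app id when no name is set
    return (appName != null && !appName.isEmpty()) ? appName : appId;
  }

  static String defaultFlowVersion() {
    return "1"; // flow version default
  }

  static long defaultFlowRunId(long appStartTimeMillis) {
    return appStartTimeMillis; // flow run id: the YARN app start time
  }

  public static void main(String[] args) {
    // An old MR job that sets no flow context still gets sane values:
    System.out.println(defaultFlowName(null, "application_1459890777944_0001"));
    System.out.println(defaultFlowVersion());
    System.out.println(defaultFlowRunId(1459890777944L));
  }
}
{noformat}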

> Consolidate flow name/version/run defaults
> --
>
> Key: YARN-3461
> URL: https://issues.apache.org/jira/browse/YARN-3461
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Sangjin Lee
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3461-YARN-2928.01.patch
>
>
> In YARN-3391, it's not resolved what should be the defaults for flow 
> name/version/run. Let's continue the discussion here and unblock YARN-3391 
> from moving forward.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3461) Consolidate flow name/version/run defaults

2016-04-05 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3461:
--
Attachment: YARN-3461-YARN-2928.01.patch

> Consolidate flow name/version/run defaults
> --
>
> Key: YARN-3461
> URL: https://issues.apache.org/jira/browse/YARN-3461
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Sangjin Lee
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3461-YARN-2928.01.patch
>
>
> In YARN-3391, it's not resolved what should be the defaults for flow 
> name/version/run. Let's continue the discussion here and unblock YARN-3391 
> from moving forward.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4911) Bad placement policy in FairScheduler causes the RM to crash

2016-04-05 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-4911:
-
Attachment: YARN-4911.001.patch

- Rather than remove the exception, catch it at a higher level.  That allows 
for a clearer error message.  This results in the message:

java.io.IOException: Failed to run job : Unable to match app 
application_1459890777944_0001 to a queue placement policy.  Check with an 
administrator to make sure that you are submitting to a valid queue and/or 
check that the queue placement policies have the create property set to true.
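
A minimal sketch of that approach (hypothetical stand-in types and message, not the actual patch): catch the IllegalStateException where request context is available and reject the submission with an actionable IOException, instead of letting it escape the event dispatcher and kill the RM.

{noformat}
// Sketch only: catch the low-level placement failure at a level where
// the app id is known and rethrow with a clear, actionable message.
import java.io.IOException;

public class PlacementErrorHandling {
  static String assignAppToQueue(String appId) {
    // Stand-in for QueuePlacementPolicy.assignAppToQueue() with a bad policy.
    throw new IllegalStateException("Should have applied a rule before reaching here");
  }

  static String assignToQueue(String appId) throws IOException {
    try {
      return assignAppToQueue(appId);
    } catch (IllegalStateException e) {
      // Reject the app instead of crashing the scheduler event thread.
      throw new IOException("Unable to match app " + appId
          + " to a queue placement policy. Check that you are submitting to a"
          + " valid queue and/or that the placement rules can create queues.", e);
    }
  }

  public static void main(String[] args) {
    try {
      assignToQueue("application_1459890777944_0001");
    } catch (IOException expected) {
      System.out.println(expected.getMessage());
    }
  }
}
{noformat}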


> Bad placement policy in FairScheduler causes the RM to crash
> 
>
> Key: YARN-4911
> URL: https://issues.apache.org/jira/browse/YARN-4911
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: supportability
> Attachments: YARN-4911.001.patch
>
>
> When you have a fair-scheduler.xml with the rule:
> [queue placement rule XML elided]
> and the queue okay1 doesn't exist, the following exception occurs in the RM:
> 2016-04-01 16:56:33,383 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ADDED to the scheduler
> java.lang.IllegalStateException: Should have applied a rule before reaching 
> here
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:173)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:728)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:634)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1224)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:691)
> at java.lang.Thread.run(Thread.java:745)
> which causes the RM to crash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4911) Bad placement policy in FairScheduler causes the RM to crash

2016-04-05 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-4911:
-
Labels: supportability  (was: )

> Bad placement policy in FairScheduler causes the RM to crash
> 
>
> Key: YARN-4911
> URL: https://issues.apache.org/jira/browse/YARN-4911
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: supportability
>
> When you have a fair-scheduler.xml with the rule:
> [queue placement rule XML elided]
> and the queue okay1 doesn't exist, the following exception occurs in the RM:
> 2016-04-01 16:56:33,383 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ADDED to the scheduler
> java.lang.IllegalStateException: Should have applied a rule before reaching 
> here
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:173)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:728)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:634)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1224)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:691)
> at java.lang.Thread.run(Thread.java:745)
> which causes the RM to crash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4917) Fix typos in documentation of Capacity Scheduler.

2016-04-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226975#comment-15226975
 ] 

Hudson commented on YARN-4917:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9562 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9562/])
YARN-4917. Fix typos in documentation of Capacity Scheduler. (Takashi 
(iwasakims: rev 500e5a5952f8f34bf0e1e2653fa01b357d68cc8f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md


> Fix typos in documentation of Capacity Scheduler.
> -
>
> Key: YARN-4917
> URL: https://issues.apache.org/jira/browse/YARN-4917
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.2
>Reporter: Takashi Ohnishi
>Assignee: Takashi Ohnishi
>Priority: Minor
> Fix For: 2.7.3
>
> Attachments: YARN-4917.1.patch
>
>
> There are some typos.
> For example,
> 'Adminstrators' should be 'Administrators',
> 'artifical' should be 'artificial', and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4876) [Phase 1] Decoupled Init / Destroy of Containers from Start / Stop

2016-04-05 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4876:
--
Attachment: YARN-4876-design-doc.pdf

Attaching an initial proposal to solicit feedback. [~vinodkv], [~vvasudev], do 
let us know what you think.

> [Phase 1] Decoupled Init / Destroy of Containers from Start / Stop
> --
>
> Key: YARN-4876
> URL: https://issues.apache.org/jira/browse/YARN-4876
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-4876-design-doc.pdf
>
>
> Introduce *initialize* and *destroy* container API into the 
> *ContainerManagementProtocol* and decouple the actual start of a container 
> from the initialization. This will allow AMs to re-start a container without 
> having to lose the allocation.
> Additionally, if the localization of the container is associated to the 
> initialize (and the cleanup with the destroy), This can also be used by 
> applications to upgrade a Container by *re-initializing* with a new 
> *ContainerLaunchContext*
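
For readers skimming the thread, a hypothetical shape of such an API (illustration only; the concrete ContainerManagementProtocol changes are in the attached design doc):

{noformat}
// Hypothetical shape of the proposed API, for discussion only; the real
// ContainerManagementProtocol changes are described in the design doc.
public interface ContainerLifecycle {
  /** Localize resources and prepare the container without starting it. */
  void initializeContainer(String containerId, Object containerLaunchContext);

  /** Start a previously initialized container; restartable without
   *  losing the allocation. */
  void startContainer(String containerId);

  /** Stop the running process but keep the initialized state. */
  void stopContainer(String containerId);

  /** Clean up localized resources and release the container. */
  void destroyContainer(String containerId);
}
{noformat}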



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4876) [Phase 1] Decoupled Init / Destroy of Containers from Start / Stop

2016-04-05 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4876:
--
Description: 
Introduce *initialize* and *destroy* container API into the 
*ContainerManagementProtocol* and decouple the actual start of a container from 
the initialization. This will allow AMs to re-start a container without having 
to lose the allocation.

Additionally, if the localization of the container is associated to the 
initialize (and the cleanup with the destroy), This can also be used by 
applications to upgrade a Container by *re-initializing* with a new 
*ContainerLaunchContext*

  was:
Introduce *initialize* and *destroy* container API into the 
*ContainerManagementProtocol* to decouple the actual start of a container from 
the initialization. This will allow AMs to re-start a container without having 
to lose the allocation.

Additionally, if the localization of the container is associated to the 
initialize (and the cleanup with the destroy), This can also be used by 
applications to upgrade a Container by *re-initializing* with a new 
*ContainerLaunchContext*


> [Phase 1] Decoupled Init / Destroy of Containers from Start / Stop
> --
>
> Key: YARN-4876
> URL: https://issues.apache.org/jira/browse/YARN-4876
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> Introduce *initialize* and *destroy* container API into the 
> *ContainerManagementProtocol* and decouple the actual start of a container 
> from the initialization. This will allow AMs to re-start a container without 
> having to lose the allocation.
> Additionally, if the localization of the container is associated to the 
> initialize (and the cleanup with the destroy), This can also be used by 
> applications to upgrade a Container by *re-initializing* with a new 
> *ContainerLaunchContext*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4876) [Phase 1] Decouple Container Init / Destroy from Start / Stop

2016-04-05 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4876:
--
Description: 
Introduce *initialize* and *destroy* container API into the 
*ContainerManagementProtocol* so decouple the actual start of a container from 
the initialization. This will allow AMs to re-initialize and re-start a 
container without having to lose the allocation.

Additionally, if the localization of the container is assicoated to the 
initialize (and the cleanup with the destroy), This can also be used by 
applications to upgrade a Container by re-initializing with a new 
*ContainerLaunchContext*

> [Phase 1] Decouple Container Init / Destroy from Start / Stop
> -
>
> Key: YARN-4876
> URL: https://issues.apache.org/jira/browse/YARN-4876
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> Introduce *initialize* and *destroy* container API into the 
> *ContainerManagementProtocol* so decouple the actual start of a container 
> from the initialization. This will allow AMs to re-initialize and re-start a 
> container without having to lose the allocation.
> Additionally, if the localization of the container is assicoated to the 
> initialize (and the cleanup with the destroy), This can also be used by 
> applications to upgrade a Container by re-initializing with a new 
> *ContainerLaunchContext*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4876) [Phase 1] Decoupled Init / Destroy of Containers from Start / Stop

2016-04-05 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4876:
--
Summary: [Phase 1] Decoupled Init / Destroy of Containers from Start / Stop 
 (was: [Phase 1] Add support for explicit Init / Destroy of Containers 
decoupled from Start / Stop)

> [Phase 1] Decoupled Init / Destroy of Containers from Start / Stop
> --
>
> Key: YARN-4876
> URL: https://issues.apache.org/jira/browse/YARN-4876
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> Introduce *initialize* and *destroy* container API into the 
> *ContainerManagementProtocol* to decouple the actual start of a container 
> from the initialization. This will allow AMs to re-start a container without 
> having to lose the allocation.
> Additionally, if the localization of the container is associated to the 
> initialize (and the cleanup with the destroy), This can also be used by 
> applications to upgrade a Container by *re-initializing* with a new 
> *ContainerLaunchContext*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4924) NM recovery race can lead to container not cleaned up

2016-04-05 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226956#comment-15226956
 ] 

Nathan Roberts commented on YARN-4924:
--

Observed the following race with NM recovery.

1) ContainerManager handles a FINISH_APPS event causing 
storeFinishedApplication() to be recorded in state store (e.g. if RM kills 
application)
2) Prior to cleaning up the containers associated with this application, the NM 
dies
3) When NM restarts it attempts to recover the Application, Containers, and 
FinishedApplication events all associated with this application, in that order
4) This leads to a NEW to DONE transition for the containers, which will not 
try to clean up the actual container since this is supposed to be a pre-LAUNCHED 
transition

iiuc, this happens because when the application transitions from NEW to INITING 
during Application recovery, the containerInitEvents aren't actually dispatched 
yet. They are delayed until the AppInitDoneTransition. However, the 
AppInitDoneTransition may not occur until after the recovery code has handled 
the FinishedApplicationEvent and queued up KILL_CONTAINER events. So, in 
effect, the containerKillEvents overtake the containerInitEvents, leading to 
the NEW to DONE transition. 

{noformat}
2016-04-04 18:20:45,513 [main] INFO application.ApplicationImpl: Application 
application_1458666253602_2367938 transitioned from NEW to INITING
2016-04-04 18:20:56,437 [AsyncDispatcher event handler] INFO 
application.ApplicationImpl: Adding 
container_e08_1458666253602_2367938_01_04 to application 
application_1458666253602_2367938
2016-04-04 18:20:57,062 [AsyncDispatcher event handler] INFO 
application.ApplicationImpl: Application application_1458666253602_2367938 
transitioned from INITING to FINISHING_CONTAINERS_WAIT
2016-04-04 18:20:57,095 [AsyncDispatcher event handler] INFO 
container.ContainerImpl: Container 
container_e08_1458666253602_2367938_01_04 transitioned from NEW to DONE
2016-04-04 18:20:57,120 [AsyncDispatcher event handler] INFO 
application.ApplicationImpl: Removing 
container_e08_1458666253602_2367938_01_04 from application 
application_1458666253602_2367938
2016-04-04 18:20:57,120 [AsyncDispatcher event handler] INFO 
application.ApplicationImpl: Application application_1458666253602_2367938 
transitioned from FINISHING_CONTAINERS_WAIT to APPLICATION_RESOURCES_CLEANINGUP
{noformat}
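
A self-contained toy (not NM code) that reproduces the ordering problem: when the kill event is queued before the delayed init event, a container in NEW goes straight to DONE and the real process is never reaped.

{noformat}
// Toy simulation of the recovery ordering; names are invented.
import java.util.ArrayDeque;
import java.util.Queue;

public class RecoveryOrderToy {
  enum State { NEW, LOCALIZING, DONE }

  public static void main(String[] args) {
    Queue<String> dispatcher = new ArrayDeque<>();
    // Recovery queues the app's kill before the delayed container init,
    // because init events wait for AppInitDoneTransition.
    dispatcher.add("KILL_CONTAINER");
    dispatcher.add("INIT_CONTAINER");

    State container = State.NEW;
    for (String event : dispatcher) {
      if (event.equals("KILL_CONTAINER") && container == State.NEW) {
        container = State.DONE; // NEW -> DONE: no launched process to clean up
      } else if (event.equals("INIT_CONTAINER") && container == State.NEW) {
        container = State.LOCALIZING;
      }
    }
    System.out.println(container); // DONE, yet the real process was never reaped
  }
}
{noformat}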



> NM recovery race can lead to container not cleaned up
> -
>
> Key: YARN-4924
> URL: https://issues.apache.org/jira/browse/YARN-4924
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>
> It's probably a small window but we observed a case where the NM crashed and 
> then a container was not properly cleaned up during recovery.
> I will add details in first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4924) NM recovery race can lead to container not cleaned up

2016-04-05 Thread Nathan Roberts (JIRA)
Nathan Roberts created YARN-4924:


 Summary: NM recovery race can lead to container not cleaned up
 Key: YARN-4924
 URL: https://issues.apache.org/jira/browse/YARN-4924
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.2, 3.0.0
Reporter: Nathan Roberts


It's probably a small window but we observed a case where the NM crashed and 
then a container was not properly cleaned up during recovery.

I will add details in first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4876) [Phase 1] Add support for explicit Init / Destroy of Containers decoupled from Start / Stop

2016-04-05 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4876:
--
Summary: [Phase 1] Add support for explicit Init / Destroy of Containers 
decoupled from Start / Stop  (was: [Phase 1] Decouple Container Init / Destroy 
from Start / Stop)

> [Phase 1] Add support for explicit Init / Destroy of Containers decoupled 
> from Start / Stop
> ---
>
> Key: YARN-4876
> URL: https://issues.apache.org/jira/browse/YARN-4876
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> Introduce *initialize* and *destroy* container API into the 
> *ContainerManagementProtocol* to decouple the actual start of a container 
> from the initialization. This will allow AMs to re-start a container without 
> having to lose the allocation.
> Additionally, if the localization of the container is associated to the 
> initialize (and the cleanup with the destroy), This can also be used by 
> applications to upgrade a Container by *re-initializing* with a new 
> *ContainerLaunchContext*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4876) [Phase 1] Decouple Container Init / Destroy from Start / Stop

2016-04-05 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4876:
--
Description: 
Introduce *initialize* and *destroy* container API into the 
*ContainerManagementProtocol* to decouple the actual start of a container from 
the initialization. This will allow AMs to re-start a container without having 
to lose the allocation.

Additionally, if the localization of the container is associated to the 
initialize (and the cleanup with the destroy), This can also be used by 
applications to upgrade a Container by *re-initializing* with a new 
*ContainerLaunchContext*

  was:
Introduce *initialize* and *destroy* container API into the 
*ContainerManagementProtocol* so decouple the actual start of a container from 
the initialization. This will allow AMs to re-initialize and re-start a 
container without having to lose the allocation.

Additionally, if the localization of the container is associated to the 
initialize (and the cleanup with the destroy), This can also be used by 
applications to upgrade a Container by *re-initializing* with a new 
*ContainerLaunchContext*


> [Phase 1] Decouple Container Init / Destroy from Start / Stop
> -
>
> Key: YARN-4876
> URL: https://issues.apache.org/jira/browse/YARN-4876
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> Introduce *initialize* and *destroy* container API into the 
> *ContainerManagementProtocol* to decouple the actual start of a container 
> from the initialization. This will allow AMs to re-start a container without 
> having to lose the allocation.
> Additionally, if the localization of the container is associated to the 
> initialize (and the cleanup with the destroy), This can also be used by 
> applications to upgrade a Container by *re-initializing* with a new 
> *ContainerLaunchContext*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4876) [Phase 1] Decouple Container Init / Destroy from Start / Stop

2016-04-05 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4876:
--
Description: 
Introduce *initialize* and *destroy* container API into the 
*ContainerManagementProtocol* so decouple the actual start of a container from 
the initialization. This will allow AMs to re-initialize and re-start a 
container without having to lose the allocation.

Additionally, if the localization of the container is associated to the 
initialize (and the cleanup with the destroy), This can also be used by 
applications to upgrade a Container by *re-initializing* with a new 
*ContainerLaunchContext*

  was:
Introduce *initialize* and *destroy* container API into the 
*ContainerManagementProtocol* so decouple the actual start of a container from 
the initialization. This will allow AMs to re-initialize and re-start a 
container without having to lose the allocation.

Additionally, if the localization of the container is associated to the 
initialize (and the cleanup with the destroy), This can also be used by 
applications to upgrade a Container by re-initializing with a new 
*ContainerLaunchContext*


> [Phase 1] Decouple Container Init / Destroy from Start / Stop
> -
>
> Key: YARN-4876
> URL: https://issues.apache.org/jira/browse/YARN-4876
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> Introduce *initialize* and *destroy* container API into the 
> *ContainerManagementProtocol* so decouple the actual start of a container 
> from the initialization. This will allow AMs to re-initialize and re-start a 
> container without having to lose the allocation.
> Additionally, if the localization of the container is associated to the 
> initialize (and the cleanup with the destroy), This can also be used by 
> applications to upgrade a Container by *re-initializing* with a new 
> *ContainerLaunchContext*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4876) [Phase 1] Decouple Container Init / Destroy from Start / Stop

2016-04-05 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4876:
--
Description: 
Introduce *initialize* and *destroy* container API into the 
*ContainerManagementProtocol* so decouple the actual start of a container from 
the initialization. This will allow AMs to re-initialize and re-start a 
container without having to lose the allocation.

Additionally, if the localization of the container is associated to the 
initialize (and the cleanup with the destroy), This can also be used by 
applications to upgrade a Container by re-initializing with a new 
*ContainerLaunchContext*

  was:
Introduce *initialize* and *destroy* container API into the 
*ContainerManagementProtocol* so decouple the actual start of a container from 
the initialization. This will allow AMs to re-initialize and re-start a 
container without having to lose the allocation.

Additionally, if the localization of the container is assicoated to the 
initialize (and the cleanup with the destroy), This can also be used by 
applications to upgrade a Container by re-initializing with a new 
*ContainerLaunchContext*


> [Phase 1] Decouple Container Init / Destroy from Start / Stop
> -
>
> Key: YARN-4876
> URL: https://issues.apache.org/jira/browse/YARN-4876
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> Introduce *initialize* and *destroy* container API into the 
> *ContainerManagementProtocol* so decouple the actual start of a container 
> from the initialization. This will allow AMs to re-initialize and re-start a 
> container without having to lose the allocation.
> Additionally, if the localization of the container is associated to the 
> initialize (and the cleanup with the destroy), This can also be used by 
> applications to upgrade a Container by re-initializing with a new 
> *ContainerLaunchContext*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4876) [Phase 1] Decouple Container Init / Destroy from Start / Stop

2016-04-05 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4876:
--
Summary: [Phase 1] Decouple Container Init / Destroy from Start / Stop  
(was: Introduce Single-use Allocation for Backward compatibility)

> [Phase 1] Decouple Container Init / Destroy from Start / Stop
> -
>
> Key: YARN-4876
> URL: https://issues.apache.org/jira/browse/YARN-4876
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4915) Fix typo in YARN Secure Containers documentation

2016-04-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226932#comment-15226932
 ] 

Hudson commented on YARN-4915:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9561 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9561/])
YARN-4915. Fix typo in YARN Secure Containers documentation (Takashi 
(iwasakims: rev 30206346cf13fe1b7267f86e7c210b77c86b88c9)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/SecureContainer.md


> Fix typo in YARN Secure Containers documentation
> 
>
> Key: YARN-4915
> URL: https://issues.apache.org/jira/browse/YARN-4915
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, yarn
>Affects Versions: 2.7.2
>Reporter: Takashi Ohnishi
>Assignee: Takashi Ohnishi
>Priority: Trivial
> Fix For: 2.7.3
>
> Attachments: YARN-4915.1.patch
>
>
> `explictly forbiden` should be `explicitly forbidden`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4917) Fix typos in documentation of Capacity Scheduler.

2016-04-05 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226929#comment-15226929
 ] 

Masatake Iwasaki commented on YARN-4917:


+1

> Fix typos in documentation of Capacity Scheduler.
> -
>
> Key: YARN-4917
> URL: https://issues.apache.org/jira/browse/YARN-4917
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.2
>Reporter: Takashi Ohnishi
>Assignee: Takashi Ohnishi
>Priority: Minor
> Attachments: YARN-4917.1.patch
>
>
> There are some typos.
> For example,
> 'Adminstrators' should be 'Administrators',
> 'artifical' should be 'artificial', and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4562) YARN WebApp ignores the configuration passed to it for keystore settings

2016-04-05 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4562:

Assignee: Sergey Shelukhin

> YARN WebApp ignores the configuration passed to it for keystore settings
> 
>
> Key: YARN-4562
> URL: https://issues.apache.org/jira/browse/YARN-4562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: YARN-4562.patch
>
>
> The conf can be passed to WebApps builder, however the following code in 
> WebApps.java that builds the HttpServer2 object:
> {noformat}
> if (httpScheme.equals(WebAppUtils.HTTPS_PREFIX)) {
>   WebAppUtils.loadSslConfiguration(builder);
> }
> {noformat}
> ...results in loadSslConfiguration creating a new Configuration object; the 
> one that is passed in is ignored, as far as the keystore/etc. settings are 
> concerned.  loadSslConfiguration has another overload with Configuration 
> parameter that should be used instead.
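
The shape of the suggested fix, mirroring the snippet above (the two-argument overload is asserted by the description; its exact signature is assumed here):

{noformat}
if (httpScheme.equals(WebAppUtils.HTTPS_PREFIX)) {
  // pass the caller's conf instead of letting a fresh Configuration be built
  WebAppUtils.loadSslConfiguration(builder, conf);
}
{noformat}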



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4562) YARN WebApp ignores the configuration passed to it for keystore settings

2016-04-05 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226905#comment-15226905
 ] 

Varun Vasudev commented on YARN-4562:
-

[~sershe] - just to clarify - we add ssl-server.xml to the configuration object 
so any settings you provide in your conf object will be overridden by 
corresponding settings in ssl-server.xml. Is that ok with you?

> YARN WebApp ignores the configuration passed to it for keystore settings
> 
>
> Key: YARN-4562
> URL: https://issues.apache.org/jira/browse/YARN-4562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: YARN-4562.patch
>
>
> The conf can be passed to WebApps builder, however the following code in 
> WebApps.java that builds the HttpServer2 object:
> {noformat}
> if (httpScheme.equals(WebAppUtils.HTTPS_PREFIX)) {
>   WebAppUtils.loadSslConfiguration(builder);
> }
> {noformat}
> ...results in loadSslConfiguration creating a new Configuration object; the 
> one that is passed in is ignored, as far as the keystore/etc. settings are 
> concerned.  loadSslConfiguration has another overload with Configuration 
> parameter that should be used instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4916) TestNMProxy.tesNMProxyRPCRetry fails.

2016-04-05 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226898#comment-15226898
 ] 

Junping Du commented on YARN-4916:
--

Hi Vinod, this one works best together with HADOOP-11212, which is in 2.9 so 
far. Before HADOOP-11212, the test wouldn't fail on Linux, only on Windows/Mac, 
for the reason in the description. HADOOP-11212 wraps a better exception 
message, but the test still checks the old message. If needed, I can backport 
both patches to branch-2.8.
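
The pitfall is easy to demonstrate outside the test (invented demo, not TestNMProxy code): a type check accepts BindException because it subclasses SocketException, while a message-string check does not.

{noformat}
// Demo only: compare string-based vs type-based exception checks.
import java.net.BindException;
import java.net.SocketException;

public class ExceptionCheckDemo {
  public static void main(String[] args) {
    Exception thrown = new BindException("Can't assign requested address");

    // Fragile: fails on platforms that throw the subclass.
    boolean byMessage = thrown.toString().contains("java.net.SocketException");

    // Robust: any SocketException subtype passes.
    boolean byType = thrown instanceof SocketException;

    System.out.println("string compare: " + byMessage); // false
    System.out.println("instanceof:     " + byType);    // true
  }
}
{noformat}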

> TestNMProxy.tesNMProxyRPCRetry fails.
> -
>
> Key: YARN-4916
> URL: https://issues.apache.org/jira/browse/YARN-4916
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.8.0, 2.7.3, 2.6.4
> Environment: OS X 10.11 with Oracle JDK 1.7.0_79
> Windows Server 2012 with Oracle JDK 1.7.0_79
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-4916.01.patch, YARN-4916.02-WiP.patch
>
>
> The test ensures that java.net.SocketException is thrown from
> NMProxy.startContainers() without the RPC Request being retried.
> With Oracle JDK 1.7 on OS X & Windows BindException is thrown from 
> startContainers().
> The testcase expects that SocketException is thrown - which is 
> BindException's superclass.
> The exception type check is implemented using string compare and not 
> reflection, therefore the thrown BindException is not accepted.
> {noformat}
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.149 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy
> testNMProxyRPCRetry(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy)
>   Time elapsed: 0.211 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:191)
> {noformat}
> Actual exception:
> {noformat}
> 2016-04-02 21:25:13,311 WARN  [Thread-93] ipc.Client 
> (Client.java:handleConnectionFailure(880)) - Failed to connect to server: 
> 1234/0.0.4.210:0: retries get failed due to exceeded maximum allowed retries
> java.net.BindException: Can't assign requested address
> at sun.nio.ch.Net.connect0(Native Method)
> at sun.nio.ch.Net.connect(Net.java:465)
> at sun.nio.ch.Net.connect(Net.java:457)
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:634)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1413)
> at org.apache.hadoop.ipc.Client.call(Client.java:1328)
> at org.apache.hadoop.ipc.Client.call(Client.java:1306)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy10.startContainers(Unknown Source)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4325) purge app state from NM state-store should be independent of log aggregation

2016-04-05 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4325:
-
Target Version/s:   (was: 2.7.3, 2.6.5)

> purge app state from NM state-store should be independent of log aggregation
> 
>
> Key: YARN-4325
> URL: https://issues.apache.org/jira/browse/YARN-4325
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>
> From a long running cluster, we found tens of thousands of stale apps still 
> be recovered in NM restart recovery. The reason is some wrong configuration 
> setting to log aggregation so the end of log aggregation events are not 
> received so stale apps are not purged properly. We should make sure the 
> removal of app state to be independent of log aggregation life cycle. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4325) purge app state from NM state-store should be independent of log aggregation

2016-04-05 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226861#comment-15226861
 ] 

Junping Du commented on YARN-4325:
--

bq. If set yarn.log-aggregation-enable = false, does NM recovery work well?
It is supposed to. YARN-2079 addresses the non-aggregation log cases.

bq. Junping Du, any update on your debugging?
No. Unfortunately, I didn't get to track the original cluster again. Let's move 
it out of 2.7.3 and 2.6.5 and fix it the next time we see this again...

> purge app state from NM state-store should be independent of log aggregation
> 
>
> Key: YARN-4325
> URL: https://issues.apache.org/jira/browse/YARN-4325
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>
> From a long running cluster, we found tens of thousands of stale apps still 
> be recovered in NM restart recovery. The reason is some wrong configuration 
> setting to log aggregation so the end of log aggregation events are not 
> received so stale apps are not purged properly. We should make sure the 
> removal of app state to be independent of log aggregation life cycle. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4916) TestNMProxy.tesNMProxyRPCRetry fails.

2016-04-05 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226858#comment-15226858
 ] 

Vinod Kumar Vavilapalli commented on YARN-4916:
---

Does this need to be on any of the older branches?

> TestNMProxy.tesNMProxyRPCRetry fails.
> -
>
> Key: YARN-4916
> URL: https://issues.apache.org/jira/browse/YARN-4916
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.8.0, 2.7.3, 2.6.4
> Environment: OS X 10.11 with Oracle JDK 1.7.0_79
> Windows Server 2012 with Oracle JDK 1.7.0_79
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-4916.01.patch, YARN-4916.02-WiP.patch
>
>
> The test ensures that java.net.SocketException is thrown from
> NMProxy.startContainers() without the RPC Request being retried.
> With Oracle JDK 1.7 on OS X & Windows BindException is thrown from 
> startContainers().
> The testcase expects that SocketException is thrown - which is 
> BindException's superclass.
> The exception type check is implemented using string compare and not 
> reflection, therefore the thrown BindException is not accepted.
> {noformat}
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.149 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy
> testNMProxyRPCRetry(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy)
>   Time elapsed: 0.211 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:191)
> {noformat}
> Actual exception:
> {noformat}
> 2016-04-02 21:25:13,311 WARN  [Thread-93] ipc.Client 
> (Client.java:handleConnectionFailure(880)) - Failed to connect to server: 
> 1234/0.0.4.210:0: retries get failed due to exceeded maximum allowed retries
> java.net.BindException: Can't assign requested address
> at sun.nio.ch.Net.connect0(Native Method)
> at sun.nio.ch.Net.connect(Net.java:465)
> at sun.nio.ch.Net.connect(Net.java:457)
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:634)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1413)
> at org.apache.hadoop.ipc.Client.call(Client.java:1328)
> at org.apache.hadoop.ipc.Client.call(Client.java:1306)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy10.startContainers(Unknown Source)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4562) YARN WebApp ignores the configuration passed to it for keystore settings

2016-04-05 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226854#comment-15226854
 ] 

Sergey Shelukhin commented on YARN-4562:


Hmm... ping?

> YARN WebApp ignores the configuration passed to it for keystore settings
> 
>
> Key: YARN-4562
> URL: https://issues.apache.org/jira/browse/YARN-4562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: YARN-4562.patch
>
>
> The conf can be passed to WebApps builder, however the following code in 
> WebApps.java that builds the HttpServer2 object:
> {noformat}
> if (httpScheme.equals(WebAppUtils.HTTPS_PREFIX)) {
>   WebAppUtils.loadSslConfiguration(builder);
> }
> {noformat}
> ...results in loadSslConfiguration creating a new Configuration object; the 
> one that is passed in is ignored, as far as the keystore/etc. settings are 
> concerned.  loadSslConfiguration has another overload with Configuration 
> parameter that should be used instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4552) NM ResourceLocalizationService should check and initialize local filecache dir (and log dir) even if NM recover is enabled.

2016-04-05 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226850#comment-15226850
 ] 

Junping Du commented on YARN-4552:
--

Hi [~vinodkv], I will rebase the patch soon. I think we are getting close on 
this.

> NM ResourceLocalizationService should check and initialize local filecache 
> dir (and log dir) even if NM recover is enabled.
> ---
>
> Key: YARN-4552
> URL: https://issues.apache.org/jira/browse/YARN-4552
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4552-v2.patch, YARN-4552.patch
>
>
> In some cases, user are cleanup localized file cache for debugging/trouble 
> shooting purpose during NM down time. However, after bring back NM (with 
> recovery enabled), the job submission could be failed for exception like 
> below:
> {noformat}
> Diagnostics: java.io.FileNotFoundException: File 
> /disk/12/yarn/local/filecache does not exist.
> {noformat}
> This is due to we only create filecache dir when recover is not enabled 
> during ResourceLocalizationService get initialized/started.
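
A minimal sketch of the idea (hypothetical helper, not the actual ResourceLocalizationService change): verify and create the local dirs on every service start, regardless of whether recovery is enabled.

{noformat}
// Sketch only: ensure local dirs exist on startup, recovery or not.
import java.io.File;

public class EnsureLocalDirs {
  static void ensureDir(File dir) {
    if (!dir.exists() && !dir.mkdirs()) {
      throw new IllegalStateException("Cannot create " + dir);
    }
  }

  public static void main(String[] args) {
    boolean recoveryEnabled = true; // must make no difference to dir setup
    ensureDir(new File("/tmp/yarn-local/filecache"));
    ensureDir(new File("/tmp/yarn-local/nmPrivate"));
  }
}
{noformat}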



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms

2016-04-05 Thread Jonathan Maron (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226722#comment-15226722
 ] 

Jonathan Maron commented on YARN-4757:
--

That's great information.  Thanks!  I'll start digging into zone transfer 
support.

> [Umbrella] Simplified discovery of services via DNS mechanisms
> --
>
> Key: YARN-4757
> URL: https://issues.apache.org/jira/browse/YARN-4757
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jonathan Maron
> Attachments: YARN-4757- Simplified discovery of services via DNS 
> mechanisms.pdf
>
>
> [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track 
> all related efforts.]
> In addition to completing the present story of service­-registry (YARN-913), 
> we also need to simplify the access to the registry entries. The existing 
> read mechanisms of the YARN Service Registry are currently limited to a 
> registry specific (java) API and a REST interface. In practice, this makes it 
> very difficult for wiring up existing clients and services. For e.g, dynamic 
> configuration of dependent end­points of a service is not easy to implement 
> using the present registry­-read mechanisms, *without* code-changes to 
> existing services.
> A good solution to this is to expose the registry information through a more 
> generic and widely used discovery mechanism: DNS. Service Discovery via DNS 
> uses the well-­known DNS interfaces to browse the network for services. 
> YARN-913 in fact talked about such a DNS based mechanism but left it as a 
> future task. (Task) Having the registry information exposed via DNS 
> simplifies the life of services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses

2016-04-05 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4514:
--
Attachment: YARN-4514-YARN-3368.3.patch

Attaching a new patch addressing the comments from [~varun_saxena].

[~varun_saxena]/[~leftnoteasy], could you please take a look?

> [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
> --
>
> Key: YARN-4514
> URL: https://issues.apache.org/jira/browse/YARN-4514
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: YARN-4514-YARN-3368.1.patch, 
> YARN-4514-YARN-3368.2.patch, YARN-4514-YARN-3368.3.patch
>
>
> We have several configurations are hard-coded, for example, RM/ATS addresses, 
> we should make them configurable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4893) Fix some intermittent test failures in TestRMAdminService

2016-04-05 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226670#comment-15226670
 ] 

Brahma Reddy Battula commented on YARN-4893:


[~djp], thanks for the review and commit. And thanks to the others for the 
additional reviews.

> Fix some intermittent test failures in TestRMAdminService
> -
>
> Key: YARN-4893
> URL: https://issues.apache.org/jira/browse/YARN-4893
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: YARN-4893-002.patch, YARN-4893-003.patch, YARN-4893.patch
>
>
> As discussion in YARN-998, we need to add rm.drainEvents() after 
> rm.registerNode() or some of test could get failed intermittently. Also, we 
> can consider to add rm.drainEvents() within rm.registerNode() that could be 
> more convenient.
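
In test form, the pattern the description asks for looks roughly like this (MockRM/MockNM usage sketched from the description, illustrative only):

{noformat}
MockRM rm = new MockRM(conf);
rm.start();
MockNM nm = rm.registerNode("127.0.0.1:1234", 8192);
rm.drainEvents();  // flush the async dispatcher before asserting on node state
{noformat}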



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4898) Avoid sending NODE_LABELS_UPDATE event to scheduler when node label is not configured

2016-04-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226620#comment-15226620
 ] 

Naganarasimha G R commented on YARN-4898:
-

Thanks [~rohithsharma] for raising this issue.
It seems we do not require most of the functionality performed by 
{{RMNodeLabelsManager}} when node labels are not enabled. So I will try to 
evaluate the impact of setting {{RMNodeLabelsManager}} to null, or of having an 
overridden implementation that avoids the redundant work when labels are 
disabled.
I would like to take this up if you have not already started.

> Avoid sending NODE_LABELS_UPDATE event to scheduler when node label is not 
> configured
> -
>
> Key: YARN-4898
> URL: https://issues.apache.org/jira/browse/YARN-4898
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>
> It is observed that whenever a new NodeManager is registered or expired with 
> the ResourceManager, an additional NODE_LABELS_UPDATE event is triggered 
> even though *node labels are not enabled*. This makes the dispatcher process 
> an event without any real use. 
> Logs: all 100 node-labels-update events processed by the dispatcher. 
> {noformat}
> 2016-03-30 15:42:24,461 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(175)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeLabelsUpdateSchedulerEvent.EventType:
>  NODE_LABELS_UPDATE
> 2016-03-30 15:42:24,461 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(175)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeLabelsUpdateSchedulerEvent.EventType:
>  NODE_LABELS_UPDATE
> 2016-03-30 15:42:24,462 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(175)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeLabelsUpdateSchedulerEvent.EventType:
>  NODE_LABELS_UPDATE
> 2016-03-30 15:42:24,462 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(175)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeLabelsUpdateSchedulerEvent.EventType:
>  NODE_LABELS_UPDATE
> {noformat}
> The point of concern: in a large cluster, registering thousands of nodes 
> at once with a running cluster could delay the processing of other 
> events.
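One possible shape of the fix, shown purely as an illustration (the guard placement and the updatedNodeToLabels map are assumptions; the configuration constants are from yarn-api):

{code}
// Illustrative guard: only dispatch label updates when labels are enabled.
boolean labelsEnabled = conf.getBoolean(
    YarnConfiguration.NODE_LABELS_ENABLED,
    YarnConfiguration.DEFAULT_NODE_LABELS_ENABLED);
if (labelsEnabled) {
  rmContext.getDispatcher().getEventHandler().handle(
      new NodeLabelsUpdateSchedulerEvent(updatedNodeToLabels));
}
{code}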



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses

2016-04-05 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226600#comment-15226600
 ] 

Sunil G commented on YARN-4514:
---

bq.we expect this configuration to be configured by user during deployment. 
Right ?
I would say admin, but more or less it's a user.

bq.namespaces like ws/v1/cluster, etc. move to constants.js
I still say these are configs rather than constants, because some of the REST 
namespace endpoints can be added in the future and may get changed too, so it 
will help with upgrades etc. I looked at the TEZ code, and they use this as a 
config too. Thoughts?

bq.maybe we can have a separate config for scheme(http/https)
Yes, I will address the same.

bq.I think we wont require localBaseUrl(which is corsproxy URL) in normal 
production cluster
We can use localBaseUrl and keep it empty for now. And for tests, we can 
configure the {{corsproxy}} URL in the testbed.

I will also change the config name from default to address, as discussed offline.

> [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
> --
>
> Key: YARN-4514
> URL: https://issues.apache.org/jira/browse/YARN-4514
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: YARN-4514-YARN-3368.1.patch, YARN-4514-YARN-3368.2.patch
>
>
> We have several configurations that are hard-coded, for example RM/ATS 
> addresses; we should make them configurable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

2016-04-05 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226581#comment-15226581
 ] 

Kuhu Shukla commented on YARN-4311:
---

Thank you so much [~jlowe] for the detailed help and review! Appreciate the 
help.

> Removing nodes from include and exclude lists will not remove them from 
> decommissioned nodes list
> -
>
> Key: YARN-4311
> URL: https://issues.apache.org/jira/browse/YARN-4311
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 2.8.0
>
> Attachments: YARN-4311-v1.patch, YARN-4311-v10.patch, 
> YARN-4311-v11.patch, YARN-4311-v11.patch, YARN-4311-v12.patch, 
> YARN-4311-v13.patch, YARN-4311-v13.patch, YARN-4311-v14.patch, 
> YARN-4311-v2.patch, YARN-4311-v3.patch, YARN-4311-v4.patch, 
> YARN-4311-v5.patch, YARN-4311-v6.patch, YARN-4311-v7.patch, 
> YARN-4311-v8.patch, YARN-4311-v9.patch
>
>
> In order to fully forget about a node, removing the node from the include and 
> exclude lists is not sufficient: the RM still lists it under decommissioned 
> nodes. The tricky part that [~jlowe] pointed out was the case when include 
> lists are not used; in that case we don't want the nodes to fall off if they 
> are not active.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4916) TestNMProxy.tesNMProxyRPCRetry fails.

2016-04-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226552#comment-15226552
 ] 

Hudson commented on YARN-4916:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9560 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9560/])
YARN-4916. TestNMProxy.tesNMProxyRPCRetry fails. Contributed by Tibor 
(junping_du: rev 00058167431475c6e63c80207424f1d365569e3a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java


> TestNMProxy.tesNMProxyRPCRetry fails.
> -
>
> Key: YARN-4916
> URL: https://issues.apache.org/jira/browse/YARN-4916
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.8.0, 2.7.3, 2.6.4
> Environment: OS X 10.11 with Oracle JDK 1.7.0_79
> Windows Server 2012 with Oracle JDK 1.7.0_79
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-4916.01.patch, YARN-4916.02-WiP.patch
>
>
> The test ensures that java.net.SocketException is thrown from
> NMProxy.startContainers() without the RPC request being retried.
> With Oracle JDK 1.7 on OS X & Windows, BindException is thrown from 
> startContainers().
> The test case expects SocketException - which is BindException's 
> superclass - to be thrown.
> The exception type check is implemented using string comparison rather than 
> reflection, so the thrown BindException is not accepted.
> {noformat}
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.149 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy
> testNMProxyRPCRetry(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy)
>   Time elapsed: 0.211 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:191)
> {noformat}
> Actual exception:
> {noformat}
> 2016-04-02 21:25:13,311 WARN  [Thread-93] ipc.Client 
> (Client.java:handleConnectionFailure(880)) - Failed to connect to server: 
> 1234/0.0.4.210:0: retries get failed due to exceeded maximum allowed retries
> java.net.BindException: Can't assign requested address
> at sun.nio.ch.Net.connect0(Native Method)
> at sun.nio.ch.Net.connect(Net.java:465)
> at sun.nio.ch.Net.connect(Net.java:457)
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:634)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1413)
> at org.apache.hadoop.ipc.Client.call(Client.java:1328)
> at org.apache.hadoop.ipc.Client.call(Client.java:1306)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy10.startContainers(Unknown Source)
> {noformat}
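To make the failure mode concrete, a small standalone demonstration (not the patch itself) of why a name-based comparison rejects the subclass while a type check accepts it:

{code}
import java.net.BindException;
import java.net.SocketException;

public class ExceptionCheckDemo {
  public static void main(String[] args) {
    Throwable t = new BindException("Can't assign requested address");
    // Comparing class names rejects subclasses of the expected type...
    boolean byName =
        t.getClass().getName().equals(SocketException.class.getName());
    // ...while instanceof accepts BindException as a SocketException.
    boolean byType = t instanceof SocketException;
    System.out.println("byName=" + byName + ", byType=" + byType); // false, true
  }
}
{code}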



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4916) TestNMProxy.tesNMProxyRPCRetry fails.

2016-04-05 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4916:
-
Summary: TestNMProxy.tesNMProxyRPCRetry fails.  (was: 
TestNMProxy.tesNMProxyRPCRetry fails on OS X & Windows)

> TestNMProxy.tesNMProxyRPCRetry fails.
> -
>
> Key: YARN-4916
> URL: https://issues.apache.org/jira/browse/YARN-4916
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.8.0, 2.7.3, 2.6.4
> Environment: OS X 10.11 with Oracle JDK 1.7.0_79
> Windows Server 2012 with Oracle JDK 1.7.0_79
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
>Priority: Minor
> Attachments: YARN-4916.01.patch, YARN-4916.02-WiP.patch
>
>
> The test ensures that java.net.SocketException is thrown from
> NMProxy.startContainers() without the RPC request being retried.
> With Oracle JDK 1.7 on OS X & Windows, BindException is thrown from 
> startContainers().
> The test case expects SocketException - which is BindException's 
> superclass - to be thrown.
> The exception type check is implemented using string comparison rather than 
> reflection, so the thrown BindException is not accepted.
> {noformat}
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.149 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy
> testNMProxyRPCRetry(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy)
>   Time elapsed: 0.211 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:191)
> {noformat}
> Actual exception:
> {noformat}
> 2016-04-02 21:25:13,311 WARN  [Thread-93] ipc.Client 
> (Client.java:handleConnectionFailure(880)) - Failed to connect to server: 
> 1234/0.0.4.210:0: retries get failed due to exceeded maximum allowed retries
> java.net.BindException: Can't assign requested address
> at sun.nio.ch.Net.connect0(Native Method)
> at sun.nio.ch.Net.connect(Net.java:465)
> at sun.nio.ch.Net.connect(Net.java:457)
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:634)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1413)
> at org.apache.hadoop.ipc.Client.call(Client.java:1328)
> at org.apache.hadoop.ipc.Client.call(Client.java:1306)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy10.startContainers(Unknown Source)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses

2016-04-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226521#comment-15226521
 ] 

Varun Saxena commented on YARN-4514:


IIUC, we expect this configuration to be configured by the user during 
deployment. Right?

If that's the case, shouldn't the namespaces like ws/v1/cluster, etc. move to 
constants.js, the file I had created in YARN-4517?
I am not too sure it would be required for users to configure namespaces. These 
sound more like constants. 
Maybe the only config they may require is whether they are connecting to 
ATSv1 or ATSv2, which I think can be added in some ATSv2 JIRA.
[~sunilg], what do you think?

Also, would we want to put config.js in a separate folder named something like 
config? I had placed it outside (under the app folder) in YARN-4517 because it 
was mainly added for convenience and we had this JIRA open too. Basically we 
need to decide where we want to keep it while packaging.

Correct me if I am wrong, but I think we won't require localBaseUrl (which is 
the corsproxy URL) in a normal production cluster. 
So maybe we can have a separate config for the scheme (http/https), and not 
consider the corsproxy URL for production. I think maybe we can use ENV.baseUrl 
in config/environment.js for this, and set ENV.baseUrl to "" for production.

> [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
> --
>
> Key: YARN-4514
> URL: https://issues.apache.org/jira/browse/YARN-4514
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: YARN-4514-YARN-3368.1.patch, YARN-4514-YARN-3368.2.patch
>
>
> We have several configurations that are hard-coded, for example RM/ATS 
> addresses; we should make them configurable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4834) ProcfsBasedProcessTree doesn't track daemonized processes

2016-04-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226481#comment-15226481
 ] 

Hadoop QA commented on YARN-4834:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 18s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: patch 
generated 6 new + 35 unchanged - 0 fixed = 41 total (was 35) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 1s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 17s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 20m 39s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12797089/YARN-4834.001.patch |
| JIRA Issue | YARN-4834 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux bcba00a3970f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 

[jira] [Commented] (YARN-4834) ProcfsBasedProcessTree doesn't track daemonized processes

2016-04-05 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226424#comment-15226424
 ] 

Nathan Roberts commented on YARN-4834:
--

As a note, we were seeing this with slider applications. I didn't investigate 
far enough to know whether all slider applications escape or whether this was 
just a characteristic of this particular application.

> ProcfsBasedProcessTree doesn't track daemonized processes
> -
>
> Key: YARN-4834
> URL: https://issues.apache.org/jira/browse/YARN-4834
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4834.001.patch
>
>
> Currently the algorithm uses the ppid from /proc/<pid>/stat, which can be 1 
> if a child process has daemonized itself. This can cause potentially large 
> processes to escape monitoring. 
> The session id might be a better choice, since that's what we use to signal 
> the container during teardown. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4834) ProcfsBasedProcessTree doesn't track daemonized processes

2016-04-05 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-4834:
-
Attachment: YARN-4834.001.patch

Simple fix that falls back to the session ID if the process has become owned 
by init. This seemed like the safest, lowest-risk change. (A sketch of the 
/proc parsing involved follows the list below.)

Other options might be:
- Only use the session ID to build the process tree
- Use the container cgroup (cgroup.procs) if available/configured.
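A standalone sketch of the /proc parsing in question, assuming Linux procfs (illustrative only, not the attached patch):

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ProcStatSketch {
  public static void main(String[] args) throws IOException {
    String pid = args.length > 0 ? args[0] : "self";
    String stat =
        new String(Files.readAllBytes(Paths.get("/proc", pid, "stat")));
    // The comm field is parenthesized and may contain spaces, so split on
    // the last ')': the fields after it are state, ppid, pgrp, session, ...
    String[] fields = stat.substring(stat.lastIndexOf(')') + 2).split(" ");
    long ppid = Long.parseLong(fields[1]);    // field 4 in proc(5)
    long session = Long.parseLong(fields[3]); // field 6 in proc(5)
    // A daemonized child is reparented to init (ppid == 1); keying the
    // tree on the session id keeps such processes visible to the monitor.
    long treeKey = (ppid == 1) ? session : ppid;
    System.out.println("ppid=" + ppid + " session=" + session
        + " treeKey=" + treeKey);
  }
}
{code}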

> ProcfsBasedProcessTree doesn't track daemonized processes
> -
>
> Key: YARN-4834
> URL: https://issues.apache.org/jira/browse/YARN-4834
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4834.001.patch
>
>
> Currently the algorithm uses the ppid from /proc/<pid>/stat, which can be 1 
> if a child process has daemonized itself. This can cause potentially large 
> processes to escape monitoring. 
> The session id might be a better choice, since that's what we use to signal 
> the container during teardown. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms

2016-04-05 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226395#comment-15226395
 ] 

Allen Wittenauer commented on YARN-4757:


bq. May I ask why?

In the vast majority of cases I've seen, there have been primarily two reasons:

* performance.  A specialty, high-end DNS server (e.g., Infoblox, PowerDNS, 
etc.) will typically blow the doors off of home-grown solutions... especially 
if Java GC pauses are involved. ;)  Never mind that having forced, local 
resolution is a HUGE win when dealing with heavy loads or high latency 
connections.  Why should I blow out my local DNS cache when I can just AXFR it 
to begin with?  Latency in DNS queries matters; it's one of the reasons why 
servers at places like Y! ran (and probably still do) with full-grown DNS 
caches using BIND on the local nodes.

* limiting exposure.  It's a much simpler network and security monitoring 
design if the DNS pathways are limited.  If I know that DNS server X can 
authoritatively answer questions for my entire corporate network, then I know 
that the only DNS traffic I should be seeing are zone transfers, especially if 
I also close off access to the Internet (think data center).  If I see any 
extra traffic, something bad is likely going on.

bq.  If a zone key is created and properly configured, isn't the authenticity 
of the records assured? 

As you can see, that's mostly an irrelevant question.

You're also assuming that group A trusts group B. The folks who run the 
corporate DNS are not the same people who run the data center DNS, who are not 
the same people who run the Hadoop systems, who are not the same people who 
actually use the Hadoop systems as end users.  ANY sort of insurance policy 
(in this case, AXFR, with bonus points for IXFR) enables a lot more trust.

bq. Do we need to require the support of secure zone transfers?

Absolutely.  It's trivial to run out of fingers counting the big-name companies 
that will dismiss this totally if they can't do resolution from their own 
servers.


> [Umbrella] Simplified discovery of services via DNS mechanisms
> --
>
> Key: YARN-4757
> URL: https://issues.apache.org/jira/browse/YARN-4757
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jonathan Maron
> Attachments: YARN-4757- Simplified discovery of services via DNS 
> mechanisms.pdf
>
>
> [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track 
> all related efforts.]
> In addition to completing the present story of service­-registry (YARN-913), 
> we also need to simplify the access to the registry entries. The existing 
> read mechanisms of the YARN Service Registry are currently limited to a 
> registry specific (java) API and a REST interface. In practice, this makes it 
> very difficult for wiring up existing clients and services. For e.g, dynamic 
> configuration of dependent end­points of a service is not easy to implement 
> using the present registry­-read mechanisms, *without* code-changes to 
> existing services.
> A good solution to this is to expose the registry information through a more 
> generic and widely used discovery mechanism: DNS. Service Discovery via DNS 
> uses the well-­known DNS interfaces to browse the network for services. 
> YARN-913 in fact talked about such a DNS based mechanism but left it as a 
> future task. (Task) Having the registry information exposed via DNS 
> simplifies the life of services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4905) Improve Yarn log Command line option to show log metadata

2016-04-05 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226389#comment-15226389
 ] 

Varun Vasudev commented on YARN-4905:
-

Thanks for the patch [~xgong]. A few things -
# If I don't have AHS enabled and restart the RM, the yarn logs command will 
give me the logs of an older app, but -ls doesn't work. We should change 
this to return valid information when it's present on the file system.
# {code} yarn logs -applicationId application_1459856011417_0002 -ls 
-nodeAddress ubuntu_60321 -containerId container_1459856011417_0002_01_02
Can not find log meta for container: container_1459856011417_0002_01_02 on 
ubuntu_60321
{code} 
In this case, container_1459856011417_0002_01_02 ran on another node. From 
the containerId we can get the node it ran on, and we should let the user know 
which node that was. We don't need to publish the information, just a helpful 
message like "The container couldn't be found on the node specified. From the 
logs, it appears the container ran on ".
# {code}yarn logs -applicationId application_1459856011417_0002 -ls 
-nodeAddress ubuntu_60322 -containerId blah
Can not find log meta for container: blah on ubuntu_60322
{code}
Similar to (2), in this case we should list all the containers that did run on 
the node and let the user know.
# {code} +opts.addOption(LS_OPTION, false, "List Container log meta," {code}
Change "meta" to "metadata". There are other places where "meta" is used - it 
should be changed to "metadata".
# {code}+public static int listContainerLogs(DataInputStream valueStream,
+PrintStream out) throws IOException {
{code}
It would be better if this function returned a formatted String instead of 
printing the data directly to the PrintStream (see the sketch after this 
list). Also, can you add a test case for this function to make sure that any 
future changes don't break this piece?
# {code}+  public void listContainerLogInfo(ApplicationId appId, String 
containerIdStr,
+  String nodeId, String appOwner, PrintStream out) throws IOException {
{code}
Rename to printContainerLogMetadata
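For comment 5, something along these lines would make the listing unit-testable. This is a guess at the shape (the class and method names are invented), assuming the per-file name and length are stored as the two UTF strings the aggregated log format uses:

{code}
import java.io.DataInputStream;
import java.io.IOException;

public final class ContainerLogMetaFormatter {
  // Return the formatted listing instead of writing to a PrintStream,
  // so unit tests can assert on the output directly.
  public static String formatEntry(DataInputStream valueStream)
      throws IOException {
    String fileName = valueStream.readUTF();   // log file name
    String fileLength = valueStream.readUTF(); // length, stored as a string
    return String.format("%-40s %15s bytes%n", fileName, fileLength);
  }
}
{code}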

> Improve Yarn log Command line option to show log metadata
> -
>
> Key: YARN-4905
> URL: https://issues.apache.org/jira/browse/YARN-4905
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4905.1.patch
>
>
> Improve the YARN log command line to have an "ls" command which can list the 
> containers for which we have logs and the files within each container, along 
> with file sizes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4893) Fix some intermittent test failures in TestRMAdminService

2016-04-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226325#comment-15226325
 ] 

Hudson commented on YARN-4893:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9559 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9559/])
YARN-4893. Fix some intermittent test failures in TestRMAdminService. 
(junping_du: rev 6be28bcc461292b24589dae17a235b3eaadc07ed)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestNodesListManager.java


> Fix some intermittent test failures in TestRMAdminService
> -
>
> Key: YARN-4893
> URL: https://issues.apache.org/jira/browse/YARN-4893
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: YARN-4893-002.patch, YARN-4893-003.patch, YARN-4893.patch
>
>
> As discussed in YARN-998, we need to add rm.drainEvents() after 
> rm.registerNode(), or some of the tests can fail intermittently. Also, we 
> can consider adding rm.drainEvents() within rm.registerNode(), which could be 
> more convenient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms

2016-04-05 Thread Jonathan Maron (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226326#comment-15226326
 ] 

Jonathan Maron commented on YARN-4757:
--

two questions:

1)  May I ask why?  If a zone key is created and properly configured, isn't the 
authenticity of the records assured?  
2)  If this truly a non-starter for many environments - what is the 
alternative?  Do we need to require the support of secure zone transfers?


> [Umbrella] Simplified discovery of services via DNS mechanisms
> --
>
> Key: YARN-4757
> URL: https://issues.apache.org/jira/browse/YARN-4757
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jonathan Maron
> Attachments: YARN-4757- Simplified discovery of services via DNS 
> mechanisms.pdf
>
>
> [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track 
> all related efforts.]
> In addition to completing the present story of service-registry (YARN-913), 
> we also need to simplify access to the registry entries. The existing 
> read mechanisms of the YARN Service Registry are currently limited to a 
> registry-specific (Java) API and a REST interface. In practice, this makes it 
> very difficult to wire up existing clients and services. For example, dynamic 
> configuration of a service's dependent endpoints is not easy to implement 
> using the present registry-read mechanisms *without* code changes to 
> existing services.
> A good solution is to expose the registry information through a more 
> generic and widely used discovery mechanism: DNS. Service discovery via DNS 
> uses the well-known DNS interfaces to browse the network for services. 
> YARN-913 in fact talked about such a DNS-based mechanism but left it as a 
> future task. Having the registry information exposed via DNS simplifies the 
> life of services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4773) Log aggregation performs extraneous filesystem operations when rolling log aggregation is disabled

2016-04-05 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-4773:
-
Fix Version/s: 2.6.5
   2.7.3

Apologies for the long delay.

+1 for the branch-2.6 patch.  Thanks, Jun!  I committed it to branch-2.7 and 
branch-2.6.  


> Log aggregation performs extraneous filesystem operations when rolling log 
> aggregation is disabled
> --
>
> Key: YARN-4773
> URL: https://issues.apache.org/jira/browse/YARN-4773
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jun Gong
>Priority: Minor
> Fix For: 2.8.0, 2.7.3, 2.6.5
>
> Attachments: YARN-4773-branch-2.6.patch, YARN-4773.01.patch, 
> YARN-4773.02.patch, YARN-4773.03.patch
>
>
> I noticed that when log aggregation occurs for an application, the 
> nodemanager lists the application's log directory in HDFS.  Apparently this 
> is for removing old logs before uploading new ones.  This is a wasteful 
> operation when rolling log aggregation is disabled, since there will be no 
> prior logs in HDFS -- aggregation only occurs once when rolling log 
> aggregation is disabled.
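A hedged sketch of the guard the description implies (the configuration key is the rolling-aggregation interval from yarn-default.xml; the cleanOldLogs() call site is an assumption for illustration):

{code}
// Only rolling log aggregation can leave prior uploads behind, so skip the
// remote listing/cleanup entirely when it is disabled (interval <= 0).
long rollingInterval = conf.getLong(
    "yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds", -1);
if (rollingInterval > 0) {
  cleanOldLogs();  // lists the app's remote log dir and removes stale files
}
{code}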



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms

2016-04-05 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226278#comment-15226278
 ] 

Allen Wittenauer commented on YARN-4757:


bq. 2. Configured a BIND server with a forwarding zone configuration

This is not allowed in MANY corporate environments.  DNS server access is 
strictly firewalled off, with load balancing done via zone propagation instead.

> [Umbrella] Simplified discovery of services via DNS mechanisms
> --
>
> Key: YARN-4757
> URL: https://issues.apache.org/jira/browse/YARN-4757
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jonathan Maron
> Attachments: YARN-4757- Simplified discovery of services via DNS 
> mechanisms.pdf
>
>
> [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track 
> all related efforts.]
> In addition to completing the present story of service-registry (YARN-913), 
> we also need to simplify access to the registry entries. The existing 
> read mechanisms of the YARN Service Registry are currently limited to a 
> registry-specific (Java) API and a REST interface. In practice, this makes it 
> very difficult to wire up existing clients and services. For example, dynamic 
> configuration of a service's dependent endpoints is not easy to implement 
> using the present registry-read mechanisms *without* code changes to 
> existing services.
> A good solution is to expose the registry information through a more 
> generic and widely used discovery mechanism: DNS. Service discovery via DNS 
> uses the well-known DNS interfaces to browse the network for services. 
> YARN-913 in fact talked about such a DNS-based mechanism but left it as a 
> future task. Having the registry information exposed via DNS simplifies the 
> life of services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

2016-04-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226284#comment-15226284
 ] 

Hudson commented on YARN-4311:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9558 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9558/])
YARN-4311. Removing nodes from include and exclude lists will not remove 
(jlowe: rev 1cbcd4a491e6a57d466c2897335614dc6770b475)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java


> Removing nodes from include and exclude lists will not remove them from 
> decommissioned nodes list
> -
>
> Key: YARN-4311
> URL: https://issues.apache.org/jira/browse/YARN-4311
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 2.8.0
>
> Attachments: YARN-4311-v1.patch, YARN-4311-v10.patch, 
> YARN-4311-v11.patch, YARN-4311-v11.patch, YARN-4311-v12.patch, 
> YARN-4311-v13.patch, YARN-4311-v13.patch, YARN-4311-v14.patch, 
> YARN-4311-v2.patch, YARN-4311-v3.patch, YARN-4311-v4.patch, 
> YARN-4311-v5.patch, YARN-4311-v6.patch, YARN-4311-v7.patch, 
> YARN-4311-v8.patch, YARN-4311-v9.patch
>
>
> In order to fully forget about a node, removing the node from the include and 
> exclude lists is not sufficient: the RM still lists it under decommissioned 
> nodes. The tricky part that [~jlowe] pointed out was the case when include 
> lists are not used; in that case we don't want the nodes to fall off if they 
> are not active.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4916) TestNMProxy.tesNMProxyRPCRetry fails on OS X & Windows

2016-04-05 Thread Tibor Kiss (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226268#comment-15226268
 ] 

Tibor Kiss commented on YARN-4916:
--

Thanks a lot Junping!

> TestNMProxy.tesNMProxyRPCRetry fails on OS X & Windows
> --
>
> Key: YARN-4916
> URL: https://issues.apache.org/jira/browse/YARN-4916
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.8.0, 2.7.3, 2.6.4
> Environment: OS X 10.11 with Oracle JDK 1.7.0_79
> Windows Server 2012 with Oracle JDK 1.7.0_79
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
>Priority: Minor
> Attachments: YARN-4916.01.patch, YARN-4916.02-WiP.patch
>
>
> The test ensures that java.net.SocketException is thrown from
> NMProxy.startContainers() without the RPC request being retried.
> With Oracle JDK 1.7 on OS X & Windows, BindException is thrown from 
> startContainers().
> The test case expects SocketException - which is BindException's 
> superclass - to be thrown.
> The exception type check is implemented using string comparison rather than 
> reflection, so the thrown BindException is not accepted.
> {noformat}
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.149 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy
> testNMProxyRPCRetry(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy)
>   Time elapsed: 0.211 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:191)
> {noformat}
> Actual exception:
> {noformat}
> 2016-04-02 21:25:13,311 WARN  [Thread-93] ipc.Client 
> (Client.java:handleConnectionFailure(880)) - Failed to connect to server: 
> 1234/0.0.4.210:0: retries get failed due to exceeded maximum allowed retries
> java.net.BindException: Can't assign requested address
> at sun.nio.ch.Net.connect0(Native Method)
> at sun.nio.ch.Net.connect(Net.java:465)
> at sun.nio.ch.Net.connect(Net.java:457)
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:634)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1413)
> at org.apache.hadoop.ipc.Client.call(Client.java:1328)
> at org.apache.hadoop.ipc.Client.call(Client.java:1306)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy10.startContainers(Unknown Source)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms

2016-04-05 Thread Jonathan Maron (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226267#comment-15226267
 ] 

Jonathan Maron commented on YARN-4757:
--

So in my internal testing I set up the following (this is the approach 
described in the "High Availability" section):

1)  Ran an instance of the DNS service serving up records for the hadoop 
cluster, configured with a specific zone name
2)  Configured a BIND server with a forwarding zone configuration, forwarding 
all requests for the hadoop-configured zone to the DNS service instance
3)  Queries to the BIND server for cluster-related records are returned 
successfully

This also assumes that the DNSSEC setup is correct (zone key created and 
configured on both ends). 

Assuming that a corporate server is configured appropriately to serve external 
requests, the hadoop cluster zone records should be available externally, I 
believe (this is one thing I haven't had the ability to test).

> [Umbrella] Simplified discovery of services via DNS mechanisms
> --
>
> Key: YARN-4757
> URL: https://issues.apache.org/jira/browse/YARN-4757
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jonathan Maron
> Attachments: YARN-4757- Simplified discovery of services via DNS 
> mechanisms.pdf
>
>
> [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track 
> all related efforts.]
> In addition to completing the present story of service-registry (YARN-913), 
> we also need to simplify access to the registry entries. The existing 
> read mechanisms of the YARN Service Registry are currently limited to a 
> registry-specific (Java) API and a REST interface. In practice, this makes it 
> very difficult to wire up existing clients and services. For example, dynamic 
> configuration of a service's dependent endpoints is not easy to implement 
> using the present registry-read mechanisms *without* code changes to 
> existing services.
> A good solution is to expose the registry information through a more 
> generic and widely used discovery mechanism: DNS. Service discovery via DNS 
> uses the well-known DNS interfaces to browse the network for services. 
> YARN-913 in fact talked about such a DNS-based mechanism but left it as a 
> future task. Having the registry information exposed via DNS simplifies the 
> life of services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4916) TestNMProxy.tesNMProxyRPCRetry fails on OS X & Windows

2016-04-05 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226266#comment-15226266
 ] 

Junping Du commented on YARN-4916:
--

Thanks [~tibor.k...@gmail.com] for the clarification here. I just added you as 
a YARN contributor and assigned this JIRA to you. :)
The 01 patch LGTM. +1. Will commit it shortly.


> TestNMProxy.tesNMProxyRPCRetry fails on OS X & Windows
> --
>
> Key: YARN-4916
> URL: https://issues.apache.org/jira/browse/YARN-4916
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.8.0, 2.7.3, 2.6.4
> Environment: OS X 10.11 with Oracle JDK 1.7.0_79
> Windows Server 2012 with Oracle JDK 1.7.0_79
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
>Priority: Minor
> Attachments: YARN-4916.01.patch, YARN-4916.02-WiP.patch
>
>
> The test ensures that java.net.SocketException is thrown from
> NMProxy.startContainers() without the RPC request being retried.
> With Oracle JDK 1.7 on OS X & Windows, BindException is thrown from 
> startContainers().
> The test case expects SocketException - which is BindException's 
> superclass - to be thrown.
> The exception type check is implemented using string comparison rather than 
> reflection, so the thrown BindException is not accepted.
> {noformat}
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.149 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy
> testNMProxyRPCRetry(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy)
>   Time elapsed: 0.211 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:191)
> {noformat}
> Actual exception:
> {noformat}
> 2016-04-02 21:25:13,311 WARN  [Thread-93] ipc.Client 
> (Client.java:handleConnectionFailure(880)) - Failed to connect to server: 
> 1234/0.0.4.210:0: retries get failed due to exceeded maximum allowed retries
> java.net.BindException: Can't assign requested address
> at sun.nio.ch.Net.connect0(Native Method)
> at sun.nio.ch.Net.connect(Net.java:465)
> at sun.nio.ch.Net.connect(Net.java:457)
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:634)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1413)
> at org.apache.hadoop.ipc.Client.call(Client.java:1328)
> at org.apache.hadoop.ipc.Client.call(Client.java:1306)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy10.startContainers(Unknown Source)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4916) TestNMProxy.tesNMProxyRPCRetry fails on OS X & Windows

2016-04-05 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4916:
-
Assignee: Tibor Kiss

> TestNMProxy.tesNMProxyRPCRetry fails on OS X & Windows
> --
>
> Key: YARN-4916
> URL: https://issues.apache.org/jira/browse/YARN-4916
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.8.0, 2.7.3, 2.6.4
> Environment: OS X 10.11 with Oracle JDK 1.7.0_79
> Windows Server 2012 with Oracle JDK 1.7.0_79
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
>Priority: Minor
> Attachments: YARN-4916.01.patch, YARN-4916.02-WiP.patch
>
>
> The test ensures that java.net.SocketException is thrown from
> NMProxy.startContainers() without the RPC request being retried.
> With Oracle JDK 1.7 on OS X & Windows, BindException is thrown from 
> startContainers().
> The test case expects SocketException - which is BindException's 
> superclass - to be thrown.
> The exception type check is implemented using string comparison rather than 
> reflection, so the thrown BindException is not accepted.
> {noformat}
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.149 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy
> testNMProxyRPCRetry(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy)
>   Time elapsed: 0.211 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:191)
> {noformat}
> Actual exception:
> {noformat}
> 2016-04-02 21:25:13,311 WARN  [Thread-93] ipc.Client 
> (Client.java:handleConnectionFailure(880)) - Failed to connect to server: 
> 1234/0.0.4.210:0: retries get failed due to exceeded maximum allowed retries
> java.net.BindException: Can't assign requested address
> at sun.nio.ch.Net.connect0(Native Method)
> at sun.nio.ch.Net.connect(Net.java:465)
> at sun.nio.ch.Net.connect(Net.java:457)
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:634)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1413)
> at org.apache.hadoop.ipc.Client.call(Client.java:1328)
> at org.apache.hadoop.ipc.Client.call(Client.java:1306)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy10.startContainers(Unknown Source)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

2016-04-05 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226259#comment-15226259
 ] 

Jason Lowe commented on YARN-4311:
--

+1 latest patch lgtm.  Committing this.

> Removing nodes from include and exclude lists will not remove them from 
> decommissioned nodes list
> -
>
> Key: YARN-4311
> URL: https://issues.apache.org/jira/browse/YARN-4311
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-4311-v1.patch, YARN-4311-v10.patch, 
> YARN-4311-v11.patch, YARN-4311-v11.patch, YARN-4311-v12.patch, 
> YARN-4311-v13.patch, YARN-4311-v13.patch, YARN-4311-v14.patch, 
> YARN-4311-v2.patch, YARN-4311-v3.patch, YARN-4311-v4.patch, 
> YARN-4311-v5.patch, YARN-4311-v6.patch, YARN-4311-v7.patch, 
> YARN-4311-v8.patch, YARN-4311-v9.patch
>
>
> In order to fully forget about a node, removing the node from the include and 
> exclude lists is not sufficient: the RM still lists it under decommissioned 
> nodes. The tricky part that [~jlowe] pointed out was the case when include 
> lists are not used; in that case we don't want the nodes to fall off if they 
> are not active.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms

2016-04-05 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226212#comment-15226212
 ] 

Allen Wittenauer commented on YARN-4757:


... that only works if this DNS server supports zone transfers. 

Since this DNS server doesn't support the vast majority of DNS operations, it's 
pretty much impossible to use this in, e.g., /etc/resolv.conf.  There's also 
the problem of remote latency, firewalls, etc, etc, that come about when 
dealing with corporate installations.

All of which means ...

bq. I think that is through the naming convention, and the DNS configuration on 
the desktop in a foreign county.

... isn't viable.  The naming convention only works if I can resolve it.  I 
can't resolve it if I can only ask this DNS server for the answer!

> [Umbrella] Simplified discovery of services via DNS mechanisms
> --
>
> Key: YARN-4757
> URL: https://issues.apache.org/jira/browse/YARN-4757
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jonathan Maron
> Attachments: YARN-4757- Simplified discovery of services via DNS 
> mechanisms.pdf
>
>
> [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track 
> all related efforts.]
> In addition to completing the present story of service-registry (YARN-913), 
> we also need to simplify access to the registry entries. The existing 
> read mechanisms of the YARN Service Registry are currently limited to a 
> registry-specific (Java) API and a REST interface. In practice, this makes it 
> very difficult to wire up existing clients and services. For example, dynamic 
> configuration of a service's dependent endpoints is not easy to implement 
> using the present registry-read mechanisms *without* code changes to 
> existing services.
> A good solution is to expose the registry information through a more 
> generic and widely used discovery mechanism: DNS. Service discovery via DNS 
> uses the well-known DNS interfaces to browse the network for services. 
> YARN-913 in fact talked about such a DNS-based mechanism but left it as a 
> future task. Having the registry information exposed via DNS simplifies the 
> life of services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4893) Fix some intermittent test failures in TestRMAdminService

2016-04-05 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226214#comment-15226214
 ] 

Junping Du commented on YARN-4893:
--

Latest patch LGTM. +1. Will commit it shortly.

> Fix some intermittent test failures in TestRMAdminService
> -
>
> Key: YARN-4893
> URL: https://issues.apache.org/jira/browse/YARN-4893
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: YARN-4893-002.patch, YARN-4893-003.patch, YARN-4893.patch
>
>
> As discussed in YARN-998, we need to add rm.drainEvents() after 
> rm.registerNode(), or some of the tests can fail intermittently. Also, we 
> can consider adding rm.drainEvents() within rm.registerNode(), which could be 
> more convenient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms

2016-04-05 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226204#comment-15226204
 ] 

Robert Joseph Evans commented on YARN-4757:
---

I am not suggesting there is a DNS-based solution.  I am not a DNS expert and 
was hopeful that at least a DNS-based mitigation would be possible, but that 
hope has now faded.

I wanted to bring this up for discussion as part of the design so that we go 
into this with our eyes wide open, and so that, at a minimum, documenting it 
with examples of how to "fix" it becomes part of the final product.  That did 
not happen for the initial registry service, but probably should have.

> [Umbrella] Simplified discovery of services via DNS mechanisms
> --
>
> Key: YARN-4757
> URL: https://issues.apache.org/jira/browse/YARN-4757
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jonathan Maron
> Attachments: YARN-4757- Simplified discovery of services via DNS 
> mechanisms.pdf
>
>
> [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track 
> all related efforts.]
> In addition to completing the present story of the service registry 
> (YARN-913), we also need to simplify access to the registry entries. The 
> existing read mechanisms of the YARN Service Registry are currently limited 
> to a registry-specific (Java) API and a REST interface. In practice, this 
> makes it very difficult to wire up existing clients and services. For 
> example, dynamic configuration of a service's dependent end-points is not 
> easy to implement using the present registry-read mechanisms *without* 
> code changes to existing services.
> A good solution to this is to expose the registry information through a more 
> generic and widely used discovery mechanism: DNS. Service discovery via DNS 
> uses the well-known DNS interfaces to browse the network for services. 
> YARN-913 in fact talked about such a DNS-based mechanism but left it as a 
> future task. Having the registry information exposed via DNS simplifies the 
> life of services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4916) TestNMProxy.tesNMProxyRPCRetry fails on OS X & Windows

2016-04-05 Thread Tibor Kiss (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226201#comment-15226201
 ] 

Tibor Kiss commented on YARN-4916:
--

[~djp]: The difference between the first and the second patch is the 
printStackTrace() call. That's why I suggested pushing the first version (w/o 
the stack trace): YARN-4916.01.patch
Thanks!

> TestNMProxy.tesNMProxyRPCRetry fails on OS X & Windows
> --
>
> Key: YARN-4916
> URL: https://issues.apache.org/jira/browse/YARN-4916
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.8.0, 2.7.3, 2.6.4
> Environment: OS X 10.11 with Oracle JDK 1.7.0_79
> Windows Server 2012 with Oracle JDK 1.7.0_79
>Reporter: Tibor Kiss
>Priority: Minor
> Attachments: YARN-4916.01.patch, YARN-4916.02-WiP.patch
>
>
> The test ensures that java.net.SocketException is thrown from
> NMProxy.startContainers() without the RPC request being retried.
> With Oracle JDK 1.7 on OS X & Windows, BindException is thrown from 
> startContainers().
> The testcase expects SocketException - BindException's superclass - to be 
> thrown.
> The exception type check is implemented using string comparison rather than 
> reflection; therefore the thrown BindException is not accepted.
> {noformat}
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.149 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy
> testNMProxyRPCRetry(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy)
>   Time elapsed: 0.211 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:191)
> {noformat}
> Actual exception:
> {noformat}
> 2016-04-02 21:25:13,311 WARN  [Thread-93] ipc.Client 
> (Client.java:handleConnectionFailure(880)) - Failed to connect to server: 
> 1234/0.0.4.210:0: retries get failed due to exceeded maximum allowed retries
> java.net.BindException: Can't assign requested address
> at sun.nio.ch.Net.connect0(Native Method)
> at sun.nio.ch.Net.connect(Net.java:465)
> at sun.nio.ch.Net.connect(Net.java:457)
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:634)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1413)
> at org.apache.hadoop.ipc.Client.call(Client.java:1328)
> at org.apache.hadoop.ipc.Client.call(Client.java:1306)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy10.startContainers(Unknown Source)
> {noformat}
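
To make the failure mode concrete, here is a minimal, self-contained sketch 
(not the actual test code) contrasting a class-name string check with a type 
check; only the latter accepts BindException as a SocketException:

{code:java}
import java.net.BindException;
import java.net.SocketException;

public class ExceptionCheckSketch {
  public static void main(String[] args) {
    Exception thrown = new BindException("Can't assign requested address");

    // Comparing class names, roughly what the description says the test does:
    // a BindException is rejected even though it IS-A SocketException.
    boolean byName =
        thrown.getClass().getName().equals(SocketException.class.getName()); // false

    // A type check accepts SocketException and all of its subclasses.
    boolean byType = thrown instanceof SocketException; // true

    System.out.println("byName=" + byName + ", byType=" + byType);
  }
}
{code}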



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms

2016-04-05 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226196#comment-15226196
 ] 

Robert Joseph Evans commented on YARN-4757:
---

I think that is through the naming convention, and the DNS configuration on the 
desktop in a foreign country.

I imagine that if I were doing this, I would set it up so that the Hadoop DNS 
server handles a set of sub-domains in my company's internal DNS setup.  Then, 
when my desktop is set up, or when my laptop connects to the VPN, the DNS 
server it talks to would be configured to include one that also knows about the 
Hadoop setup.

But that is just my guess.

> [Umbrella] Simplified discovery of services via DNS mechanisms
> --
>
> Key: YARN-4757
> URL: https://issues.apache.org/jira/browse/YARN-4757
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jonathan Maron
> Attachments: YARN-4757- Simplified discovery of services via DNS 
> mechanisms.pdf
>
>
> [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track 
> all related efforts.]
> In addition to completing the present story of the service registry 
> (YARN-913), we also need to simplify access to the registry entries. The 
> existing read mechanisms of the YARN Service Registry are currently limited 
> to a registry-specific (Java) API and a REST interface. In practice, this 
> makes it very difficult to wire up existing clients and services. For 
> example, dynamic configuration of a service's dependent end-points is not 
> easy to implement using the present registry-read mechanisms *without* 
> code changes to existing services.
> A good solution to this is to expose the registry information through a more 
> generic and widely used discovery mechanism: DNS. Service discovery via DNS 
> uses the well-known DNS interfaces to browse the network for services. 
> YARN-913 in fact talked about such a DNS-based mechanism but left it as a 
> future task. Having the registry information exposed via DNS simplifies the 
> life of services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4916) TestNMProxy.tesNMProxyRPCRetry fails on OS X & Windows

2016-04-05 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226173#comment-15226173
 ] 

Junping Du commented on YARN-4916:
--

Thanks [~brahmareddy] for the verification on this.
About the 02 patch:
{{e.printStackTrace(System.out);}} seems unnecessary? 
Otherwise it looks good to me.

> TestNMProxy.tesNMProxyRPCRetry fails on OS X & Windows
> --
>
> Key: YARN-4916
> URL: https://issues.apache.org/jira/browse/YARN-4916
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.8.0, 2.7.3, 2.6.4
> Environment: OS X 10.11 with Oracle JDK 1.7.0_79
> Windows Server 2012 with Oracle JDK 1.7.0_79
>Reporter: Tibor Kiss
>Priority: Minor
> Attachments: YARN-4916.01.patch, YARN-4916.02-WiP.patch
>
>
> The test ensures that java.net.SocketException is thrown from
> NMProxy.startContainers() without the RPC request being retried.
> With Oracle JDK 1.7 on OS X & Windows, BindException is thrown from 
> startContainers().
> The testcase expects SocketException - BindException's superclass - to be 
> thrown.
> The exception type check is implemented using string comparison rather than 
> reflection; therefore the thrown BindException is not accepted.
> {noformat}
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.149 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy
> testNMProxyRPCRetry(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy)
>   Time elapsed: 0.211 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:191)
> {noformat}
> Actual exception:
> {noformat}
> 2016-04-02 21:25:13,311 WARN  [Thread-93] ipc.Client 
> (Client.java:handleConnectionFailure(880)) - Failed to connect to server: 
> 1234/0.0.4.210:0: retries get failed due to exceeded maximum allowed retries
> java.net.BindException: Can't assign requested address
> at sun.nio.ch.Net.connect0(Native Method)
> at sun.nio.ch.Net.connect(Net.java:465)
> at sun.nio.ch.Net.connect(Net.java:457)
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:634)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1413)
> at org.apache.hadoop.ipc.Client.call(Client.java:1328)
> at org.apache.hadoop.ipc.Client.call(Client.java:1306)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy10.startContainers(Unknown Source)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4484) Available Resource calculation for a queue is not correct when used with labels

2016-04-05 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226162#comment-15226162
 ] 

Sunil G commented on YARN-4484:
---

Apart from the known test failures, the other failures are tracked via 
YARN-4846 and YARN-4893. I will also analyze YARN-4846, which seems to me to 
be a problem.

> Available Resource calculation for a queue is not correct when used with 
> labels
> ---
>
> Key: YARN-4484
> URL: https://issues.apache.org/jira/browse/YARN-4484
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4484.patch, 0002-YARN-4484.patch, 
> 0003-YARN-4484-v2.patch, 0003-YARN-4484.patch, 0004-YARN-4484.patch, 
> 0005-YARN-4484.patch
>
>
> To calculate the available resource for a queue, we have to get the total 
> resource allocated for all labels in the queue and compare it to the queue's 
> usage. 
> Also address the comments given in 
> [YARN-4304-comments|https://issues.apache.org/jira/browse/YARN-4304?focusedCommentId=15064874&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15064874]
> by [~leftnoteasy] on the same.
> ClusterMetrics-related issues will also get handled once we fix this.
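
In outline, the intended calculation is a per-label sum. A minimal sketch, 
with the per-label accessors left as assumed parameters rather than the real 
scheduler internals:

{code:java}
import java.util.Set;
import java.util.function.Function;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

final class QueueAvailableSketch {
  // available(queue) = sum over the queue's accessible labels of
  // (resource allocated to the queue for the label) - (resource used for the label).
  static Resource available(Set<String> labels,
                            Function<String, Resource> allocatedByLabel, // assumed accessor
                            Function<String, Resource> usedByLabel) {    // assumed accessor
    Resource available = Resources.createResource(0, 0);
    for (String label : labels) {
      Resources.addTo(available,
          Resources.subtract(allocatedByLabel.apply(label), usedByLabel.apply(label)));
    }
    return available;
  }
}
{code}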



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-321) [Umbrella] Generic application history service

2016-04-05 Thread chenlu1990js (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenlu1990js updated YARN-321:
--
Priority: Minor  (was: Major)

> [Umbrella] Generic application history service
> --
>
> Key: YARN-321
> URL: https://issues.apache.org/jira/browse/YARN-321
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Luke Lu
>Priority: Minor
> Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
> Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java
>
>
> The mapreduce job history server currently needs to be deployed as a trusted 
> server in sync with the mapreduce runtime. Every new application would need a 
> similar application history server. Having to deploy O(T*V) trusted servers 
> (where T is the number of application types and V is the number of 
> application versions) is clearly not scalable.
> Job history storage handling itself is pretty generic: move the logs and 
> history data into a particular directory for later serving. Job history data 
> is already stored as JSON (or binary Avro). I propose that we create only one 
> trusted application history server, which can have a generic UI (displaying 
> JSON as a tree of strings) as well. Each specific application/version can 
> deploy untrusted webapps (a la AMs) to query the application history server 
> and interpret the JSON for its specific UI and/or analytics.
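
The "generic UI" part is easy to prototype. A minimal sketch, assuming a 
Jackson 2.x ObjectMapper, that renders arbitrary history JSON as an indented 
tree of strings:

{code:java}
import java.util.Iterator;
import java.util.Map;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonTreeSketch {
  public static void main(String[] args) throws Exception {
    String history = "{\"app\":{\"id\":\"application_1\",\"attempts\":[1,2]}}";
    print(new ObjectMapper().readTree(history), 0);
  }

  // Recursively print any JSON document as an indented tree of strings.
  static void print(JsonNode node, int depth) {
    StringBuilder pad = new StringBuilder();
    for (int i = 0; i < depth; i++) {
      pad.append("  ");
    }
    if (node.isObject()) {
      for (Iterator<Map.Entry<String, JsonNode>> it = node.fields(); it.hasNext();) {
        Map.Entry<String, JsonNode> field = it.next();
        System.out.println(pad + field.getKey());
        print(field.getValue(), depth + 1);
      }
    } else if (node.isArray()) {
      for (JsonNode element : node) {
        print(element, depth + 1);
      }
    } else {
      System.out.println(pad + node.asText());
    }
  }
}
{code}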



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4916) TestNMProxy.tesNMProxyRPCRetry fails on OS X & Windows

2016-04-05 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226089#comment-15226089
 ] 

Brahma Reddy Battula commented on YARN-4916:


Yes, this patch is also required. {{TestNMProxy.tesNMProxyRPCRetry}} fails 
without this patch. It started failing after HADOOP-11212 in the Jenkins 
precommit builds. Hence YARN-4922 and YARN-4923 were raised to track this.

 *Precommit- Builds* 
https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/1257/testReport/
https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/1261/testReport/
https://builds.apache.org/job/Hadoop-Yarn-trunk/1968/testReport/
https://builds.apache.org/job/Hadoop-Yarn-trunk/1971/testReport/
https://builds.apache.org/job/Hadoop-Yarn-trunk/1972/testReport/

> TestNMProxy.tesNMProxyRPCRetry fails on OS X & Windows
> --
>
> Key: YARN-4916
> URL: https://issues.apache.org/jira/browse/YARN-4916
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.8.0, 2.7.3, 2.6.4
> Environment: OS X 10.11 with Oracle JDK 1.7.0_79
> Windows Server 2012 with Oracle JDK 1.7.0_79
>Reporter: Tibor Kiss
>Priority: Minor
> Attachments: YARN-4916.01.patch, YARN-4916.02-WiP.patch
>
>
> The test ensures that java.net.SocketException is thrown from
> NMProxy.startContainers() without the RPC request being retried.
> With Oracle JDK 1.7 on OS X & Windows, BindException is thrown from 
> startContainers().
> The testcase expects SocketException - BindException's superclass - to be 
> thrown.
> The exception type check is implemented using string comparison rather than 
> reflection; therefore the thrown BindException is not accepted.
> {noformat}
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.149 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy
> testNMProxyRPCRetry(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy)
>   Time elapsed: 0.211 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:191)
> {noformat}
> Actual exception:
> {noformat}
> 2016-04-02 21:25:13,311 WARN  [Thread-93] ipc.Client 
> (Client.java:handleConnectionFailure(880)) - Failed to connect to server: 
> 1234/0.0.4.210:0: retries get failed due to exceeded maximum allowed retries
> java.net.BindException: Can't assign requested address
> at sun.nio.ch.Net.connect0(Native Method)
> at sun.nio.ch.Net.connect(Net.java:465)
> at sun.nio.ch.Net.connect(Net.java:457)
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:634)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1413)
> at org.apache.hadoop.ipc.Client.call(Client.java:1328)
> at org.apache.hadoop.ipc.Client.call(Client.java:1306)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy10.startContainers(Unknown Source)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4685) AM blacklisting result in application to get hanged

2016-04-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4685:

Description: 
AM blacklist additions and removals are updated only when the RMAppAttempt is 
scheduled, i.e. in {{RMAppAttemptImpl#ScheduleTransition#transition}}. But once 
the attempt is scheduled, any removeNode/addNode in the cluster is not 
propagated to {{BlackListManager#refreshNodeHostCount}}. This leads the 
BlackListManager to operate on a stale NM count, and the application stays in 
the ACCEPTED state and waits forever, even if blacklisted nodes reconnect after 
clearing disk space.


  was:
AM blacklist additions and removals are updated only when the RMAppAttempt is 
scheduled, i.e. in {{RMAppAttemptImpl#ScheduleTransition#transition}}. But once 
the attempt is scheduled, any removeNode/addNode in the cluster is not 
propagated to {{BlackListManager#refreshNodeHostCount}}. This leads the 
BlackListManager to operate on a stale NM count, and the application stays in 
the ACCEPTED state and waits forever, even if we add more nodes to the cluster.

The solution is to update the BlacklistManager on every 
{{RMAppAttemptImpl#AMContainerAllocatedTransition#transition}} call. This 
ensures that any addition/removal of nodes is propagated to the 
BlacklistManager.
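
A minimal sketch of that proposed wiring (method placement and the scheduler 
accessor are assumptions; refreshNodeHostCount is the method named above):

{code:java}
import org.apache.hadoop.yarn.server.resourcemanager.blacklist.BlacklistManager;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.YarnScheduler;

final class BlacklistRefreshSketch {
  // To be invoked from AMContainerAllocatedTransition#transition so that
  // nodes added or removed after the attempt was scheduled are still
  // reflected in the blacklist threshold computation.
  static void onAMContainerAllocated(BlacklistManager blacklistManager,
                                     YarnScheduler scheduler) {
    blacklistManager.refreshNodeHostCount(scheduler.getNumClusterNodes());
  }
}
{code}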


> AM blacklisting result in application to get hanged
> ---
>
> Key: YARN-4685
> URL: https://issues.apache.org/jira/browse/YARN-4685
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Critical
>
> AM blacklist additions and removals are updated only when the RMAppAttempt 
> is scheduled, i.e. in {{RMAppAttemptImpl#ScheduleTransition#transition}}. 
> But once the attempt is scheduled, any removeNode/addNode in the cluster is 
> not propagated to {{BlackListManager#refreshNodeHostCount}}. This leads the 
> BlackListManager to operate on a stale NM count, and the application stays 
> in the ACCEPTED state and waits forever, even if blacklisted nodes reconnect 
> after clearing disk space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4916) TestNMProxy.tesNMProxyRPCRetry fails on OS X & Windows

2016-04-05 Thread Tibor Kiss (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226079#comment-15226079
 ] 

Tibor Kiss commented on YARN-4916:
--

[~ste...@apache.org]: This patch builds on top of HADOOP-11212; both are needed 
to resolve this issue.

YARN-4916.01.patch resolves the problem on Windows, OS X, and Linux.

Thanks

> TestNMProxy.tesNMProxyRPCRetry fails on OS X & Windows
> --
>
> Key: YARN-4916
> URL: https://issues.apache.org/jira/browse/YARN-4916
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.8.0, 2.7.3, 2.6.4
> Environment: OS X 10.11 with Oracle JDK 1.7.0_79
> Windows Server 2012 with Oracle JDK 1.7.0_79
>Reporter: Tibor Kiss
>Priority: Minor
> Attachments: YARN-4916.01.patch, YARN-4916.02-WiP.patch
>
>
> The test ensures that java.net.SocketException is thrown from
> NMProxy.startContainers() without the RPC request being retried.
> With Oracle JDK 1.7 on OS X & Windows, BindException is thrown from 
> startContainers().
> The testcase expects SocketException - BindException's superclass - to be 
> thrown.
> The exception type check is implemented using string comparison rather than 
> reflection; therefore the thrown BindException is not accepted.
> {noformat}
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.149 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy
> testNMProxyRPCRetry(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy)
>   Time elapsed: 0.211 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:191)
> {noformat}
> Actual exception:
> {noformat}
> 2016-04-02 21:25:13,311 WARN  [Thread-93] ipc.Client 
> (Client.java:handleConnectionFailure(880)) - Failed to connect to server: 
> 1234/0.0.4.210:0: retries get failed due to exceeded maximum allowed retries
> java.net.BindException: Can't assign requested address
> at sun.nio.ch.Net.connect0(Native Method)
> at sun.nio.ch.Net.connect(Net.java:465)
> at sun.nio.ch.Net.connect(Net.java:457)
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:634)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1413)
> at org.apache.hadoop.ipc.Client.call(Client.java:1328)
> at org.apache.hadoop.ipc.Client.call(Client.java:1306)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy10.startContainers(Unknown Source)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4916) TestNMProxy.tesNMProxyRPCRetry fails on OS X & Windows

2016-04-05 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226067#comment-15226067
 ] 

Steve Loughran commented on YARN-4916:
--

Tibor: to confirm, with HADOOP-11212 in, you don't need a patch to this test? 
Or is the patch here needed as well?

> TestNMProxy.tesNMProxyRPCRetry fails on OS X & Windows
> --
>
> Key: YARN-4916
> URL: https://issues.apache.org/jira/browse/YARN-4916
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.8.0, 2.7.3, 2.6.4
> Environment: OS X 10.11 with Oracle JDK 1.7.0_79
> Windows Server 2012 with Oracle JDK 1.7.0_79
>Reporter: Tibor Kiss
>Priority: Minor
> Attachments: YARN-4916.01.patch, YARN-4916.02-WiP.patch
>
>
> The test ensures that java.net.SocketException is thrown from
> NMProxy.startContainers() without the RPC request being retried.
> With Oracle JDK 1.7 on OS X & Windows, BindException is thrown from 
> startContainers().
> The testcase expects SocketException - BindException's superclass - to be 
> thrown.
> The exception type check is implemented using string comparison rather than 
> reflection; therefore the thrown BindException is not accepted.
> {noformat}
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.149 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy
> testNMProxyRPCRetry(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy)
>   Time elapsed: 0.211 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:191)
> {noformat}
> Actual exception:
> {noformat}
> 2016-04-02 21:25:13,311 WARN  [Thread-93] ipc.Client 
> (Client.java:handleConnectionFailure(880)) - Failed to connect to server: 
> 1234/0.0.4.210:0: retries get failed due to exceeded maximum allowed retries
> java.net.BindException: Can't assign requested address
> at sun.nio.ch.Net.connect0(Native Method)
> at sun.nio.ch.Net.connect(Net.java:465)
> at sun.nio.ch.Net.connect(Net.java:457)
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:634)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1413)
> at org.apache.hadoop.ipc.Client.call(Client.java:1328)
> at org.apache.hadoop.ipc.Client.call(Client.java:1306)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy10.startContainers(Unknown Source)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4923) Test Failure TestNMProxy#testNMProxyRPCRetry

2016-04-05 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-4923:
-
Affects Version/s: 2.8.0
  Description: 
{noformat}
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:191)
{noformat}

  was:

{noformat}
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:191)
{noformat}

  Component/s: (was: yarn)
   test

> Test Failure TestNMProxy#testNMProxyRPCRetry
> 
>
> Key: YARN-4923
> URL: https://issues.apache.org/jira/browse/YARN-4923
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Bibin A Chundatt
>
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:191)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4922) TestNMProxy#testNMProxyRPCRetry fails

2016-04-05 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-4922:
-
Affects Version/s: 2.8.0
  Component/s: test

> TestNMProxy#testNMProxyRPCRetry fails
> -
>
> Key: YARN-4922
> URL: https://issues.apache.org/jira/browse/YARN-4922
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Jian He
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4923) Test Failure TestNMProxy#testNMProxyRPCRetry

2016-04-05 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved YARN-4923.
--
Resolution: Duplicate

Closing as a duplicate of YARN-4916.

> Test Failure TestNMProxy#testNMProxyRPCRetry
> 
>
> Key: YARN-4923
> URL: https://issues.apache.org/jira/browse/YARN-4923
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Bibin A Chundatt
>
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:191)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

