[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-03-03 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344707#comment-14344707
 ] 

Tsuyoshi Ozawa commented on YARN-3249:
--

One more minor comment about indentation:

{code}
+  html.div()
+.button()
+  .$onclick(String.format("confirmAction('%s')",
+url(String.format("/killapp/%s", aid))))
+.b("Kill Application")
+  ._()
+  ._();
{code}

The indentation of the lines above should match that of the following lines:
{code}
+  html.script().$type("text/javascript")
+  ._("function confirmAction(href) { "
+  + "b = confirm(\"Are you sure?\");"
+  + "if (b == true) {"
+  + "  location.href = href;"
+  + "}"
++ "}")
+  ._();
{code}

> Add the kill application to the Resource Manager Web UI
> ---
>
> Key: YARN-3249
> URL: https://issues.apache.org/jira/browse/YARN-3249
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, 
> YARN-3249.4.patch, YARN-3249.5.patch, YARN-3249.patch, killapp-failed.log, 
> killapp-failed2.log, screenshot.png, screenshot2.png
>
>
> We want to be able to kill applications from the web UI, similar to the JobTracker web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344739#comment-14344739
 ] 

Rohith commented on YARN-3222:
--

Had a mail chat with [~jianhe] regarding the issues observed in this JIRA's 
discussion, and we decided to split the work into two separate JIRAs. The issues 
observed in ReconnectNodeTransition are:
# As per the defect description, the order of the node_resource_update and 
node_added events sent to the schedulers. If a node_added event is sent to the 
scheduler, there is no need to send a node_resource_update event from RMNode to 
the scheduler again.
# If the RMNode state is RUNNING, the node_usable event does not need to be sent.
# If a node is reconnected with a different capability, RMNode#totalCapability 
remains at the old capability and has to be updated with the new capability.

Points 1 and 2 will be handled in this JIRA; point 3 will be done in a separate 
JIRA. A rough sketch of the intended event ordering for points 1 and 2 follows.
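Here is that sketch: a minimal, self-contained model with illustrative class and 
method names only, not the code from the attached patches.
{code}
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the reconnect event ordering discussed above.
public class ReconnectEventOrder {
  static List<String> eventsOnReconnect(boolean nodeReplaced, boolean wasRunning) {
    List<String> events = new ArrayList<String>();
    if (nodeReplaced) {
      // Point 1: NODE_REMOVED followed by NODE_ADDED already carries the new
      // capability, so no separate NODE_RESOURCE_UPDATE is sent.
      events.add("NODE_REMOVED");
      events.add("NODE_ADDED");
    } else {
      events.add("NODE_RESOURCE_UPDATE");
    }
    // Point 2: NODE_USABLE is only needed when the node was not already RUNNING.
    if (!wasRunning) {
      events.add("NODE_USABLE");
    }
    return events;
  }

  public static void main(String[] args) {
    System.out.println(eventsOnReconnect(true, true));   // [NODE_REMOVED, NODE_ADDED]
    System.out.println(eventsOnReconnect(false, false)); // [NODE_RESOURCE_UPDATE, NODE_USABLE]
  }
}
{code}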

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler with node_added, node_removed or node_resource_update events. These 
> events should be sent in sequential order, i.e. the node_added event followed 
> by the node_resource_update event.
> But if the node is reconnected with a different http port, the order of the 
> scheduler events is node_removed --> node_resource_update --> node_added, which 
> means the scheduler cannot find the node, throws an NPE, and the RM exits.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344766#comment-14344766
 ] 

Hadoop QA commented on YARN-3249:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702071/YARN-3249.5.patch
  against trunk revision 742f9d9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6817//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6817//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6817//console

This message is automatically generated.

> Add the kill application to the Resource Manager Web UI
> ---
>
> Key: YARN-3249
> URL: https://issues.apache.org/jira/browse/YARN-3249
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, 
> YARN-3249.4.patch, YARN-3249.5.patch, YARN-3249.patch, killapp-failed.log, 
> killapp-failed2.log, screenshot.png, screenshot2.png
>
>
> We want to be able to kill applications from the web UI, similar to the JobTracker web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3286) RMNode#totalCapability has stale capability after NM is reconnected.

2015-03-03 Thread Rohith (JIRA)
Rohith created YARN-3286:


 Summary: RMNode#totalCapability has stale capability after NM is 
reconnected.
 Key: YARN-3286
 URL: https://issues.apache.org/jira/browse/YARN-3286
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith


This was found while fixing YARN-3222, as mentioned in the comments 
[link1|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14339799&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14339799]
 and 
[link2|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14344739&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14344739].

Also, ReconnectNodeTransition cleanup: it always removes the old node and adds a 
new node. Whether this is really required needs to be examined.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344790#comment-14344790
 ] 

Rohith commented on YARN-3222:
--

For handling the 3rd point, I have raised YARN-3286.

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler with node_added, node_removed or node_resource_update events. These 
> events should be sent in sequential order, i.e. the node_added event followed 
> by the node_resource_update event.
> But if the node is reconnected with a different http port, the order of the 
> scheduler events is node_removed --> node_resource_update --> node_added, which 
> means the scheduler cannot find the node, throws an NPE, and the RM exits.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3222:
-
Attachment: 0004-YARN-3222.patch

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch, 0004-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler with node_added, node_removed or node_resource_update events. These 
> events should be sent in sequential order, i.e. the node_added event followed 
> by the node_resource_update event.
> But if the node is reconnected with a different http port, the order of the 
> scheduler events is node_removed --> node_resource_update --> node_added, which 
> means the scheduler cannot find the node, throws an NPE, and the RM exits.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3286) RMNode#totalCapability has stale capability after NM is reconnected.

2015-03-03 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3286:
-
Attachment: YARN-3286-test-only.patch

Attached a test patch that simulates the issue.

> RMNode#totalCapability has stale capability after NM is reconnected.
> 
>
> Key: YARN-3286
> URL: https://issues.apache.org/jira/browse/YARN-3286
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
> Attachments: YARN-3286-test-only.patch
>
>
> This was found while fixing YARN-3222, as mentioned in the comments 
> [link1|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14339799&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14339799]
>  and 
> [link2|https://issues.apache.org/jira/browse/YARN-3222?focusedCommentId=14344739&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14344739].
> Also, ReconnectNodeTransition cleanup: it always removes the old node and adds 
> a new node. Whether this is really required needs to be examined.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344802#comment-14344802
 ] 

Rohith commented on YARN-3222:
--

Kindly review the updated patch, which fixes points 1 and 2 as mentioned in my 
earlier comment.

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch, 0004-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler with node_added, node_removed or node_resource_update events. These 
> events should be sent in sequential order, i.e. the node_added event followed 
> by the node_resource_update event.
> But if the node is reconnected with a different http port, the order of the 
> scheduler events is node_removed --> node_resource_update --> node_added, which 
> means the scheduler cannot find the node, throws an NPE, and the RM exits.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-03-03 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344848#comment-14344848
 ] 

Varun Vasudev commented on YARN-3248:
-

Thanks for the feedback [~ozawa], [~vinodkv]. 

{quote}
The blacklist is an instance of HashSet, so it can throw 
ConcurrentModificationException when blacklist is modified in another thread. 
One alternative is to use Collections.newSetFromMap(new 
ConcurrentHashMap()) instead of HashSet.
{quote}

Good catch. Collections.newSetFromMap won't work here because the blacklist itself 
is already a set. I created a copy of the structure in the latest patch.
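A rough sketch of that defensive-copy idea; the lock object and the helper class 
below are stand-ins for illustration, not the actual apache-yarn-3248 code:
{code}
import java.util.HashSet;
import java.util.Set;

// Sketch only: snapshot the live blacklist under the same lock its writers use,
// so the web UI can iterate/size the copy without risking
// ConcurrentModificationException. The lock parameter is a placeholder.
public final class BlacklistView {
  private BlacklistView() { }

  public static Set<String> snapshot(Object lock, Set<String> liveBlacklist) {
    synchronized (lock) {
      return new HashSet<String>(liveBlacklist);
    }
  }
}
{code}
The UI would then report the size of the returned copy as the blacklisted-node 
count for that attempt.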

bq. If AbstractYarnScheduler#getApplicationAttempt() can be used, I think it's 
more straightforward and simple. What do you think?

Agreed. Changed the code.

bq. Could you add tests to TestRMWebServicesApps?

I'm not sure what tests to add. I'm not adding any new web services.

{quote}
The blacklist information is per application-attempt, and scheduler will forget 
previous application-attempts today. I think this is a general behaviour with 
the way blacklisting is done today - each AM is expected to explicitly 
blacklist all the nodes it wants to blacklist even if the previous attempt 
already informed about some of them before. That is how all of resource 
requests work. Given the above, we should make it clear that blacklists are 
really for this app-attempt.
{quote}

I was under this impression as well, but the information is maintained on a 
per-app basis in AbstractYarnScheduler.
{noformat}
protected Map<ApplicationId, SchedulerApplication<T>> applications;
{noformat}

bq. W.r.t UI, showing the list of all the nodes is going to be a UI scalability 
problem - how about we move this list to the per-app page? That is the place 
where this is useful the most.

Agreed. Made the change.

bq. We should also add this information to the web-services.

You mean the app information web service?




> Display count of nodes blacklisted by apps in the web UI
> 
>
> Key: YARN-3248
> URL: https://issues.apache.org/jira/browse/YARN-3248
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Screenshot.jpg, apache-yarn-3248.0.patch
>
>
> It would be really useful when debugging app performance and failure issues 
> to get a count of the nodes blacklisted by individual apps displayed in the 
> web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-03-03 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-3248:

Attachment: App page.png
All applications.png

Uploaded screen shots from the latest patch.

> Display count of nodes blacklisted by apps in the web UI
> 
>
> Key: YARN-3248
> URL: https://issues.apache.org/jira/browse/YARN-3248
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: All applications.png, App page.png, Screenshot.jpg, 
> apache-yarn-3248.0.patch
>
>
> It would be really useful when debugging app performance and failure issues 
> to get a count of the nodes blacklisted by individual apps displayed in the 
> web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-03-03 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-3248:

Attachment: apache-yarn-3248.1.patch

Uploaded patch with changes.

> Display count of nodes blacklisted by apps in the web UI
> 
>
> Key: YARN-3248
> URL: https://issues.apache.org/jira/browse/YARN-3248
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: All applications.png, App page.png, Screenshot.jpg, 
> apache-yarn-3248.0.patch, apache-yarn-3248.1.patch
>
>
> It would be really useful when debugging app performance and failure issues 
> to get a count of the nodes blacklisted by individual apps displayed in the 
> web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344853#comment-14344853
 ] 

Hadoop QA commented on YARN-3248:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12702133/apache-yarn-3248.1.patch
  against trunk revision 4228de9.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6819//console

This message is automatically generated.

> Display count of nodes blacklisted by apps in the web UI
> 
>
> Key: YARN-3248
> URL: https://issues.apache.org/jira/browse/YARN-3248
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: All applications.png, App page.png, Screenshot.jpg, 
> apache-yarn-3248.0.patch, apache-yarn-3248.1.patch
>
>
> It would be really useful when debugging app performance and failure issues 
> to get a count of the nodes blacklisted by individual apps displayed in the 
> web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-03-03 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-3248:

Attachment: apache-yarn-3248.2.patch

Uploaded new patch fixing conflict.

> Display count of nodes blacklisted by apps in the web UI
> 
>
> Key: YARN-3248
> URL: https://issues.apache.org/jira/browse/YARN-3248
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: All applications.png, App page.png, Screenshot.jpg, 
> apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch
>
>
> It would be really useful when debugging app performance and failure issues 
> to get a count of the nodes blacklisted by individual apps displayed in the 
> web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3281) Add RMStateStore to StateMachine visualization list

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344905#comment-14344905
 ] 

Hudson commented on YARN-3281:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #121 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/121/])
YARN-3281. Added RMStateStore to StateMachine visualization list. Contributed 
by Chengbing Liu (jianhe: rev 5d0bae550f5b9a6005aa1d373cfe1ec80513dbd9)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* hadoop-yarn-project/CHANGES.txt


> Add RMStateStore to StateMachine visualization list
> ---
>
> Key: YARN-3281
> URL: https://issues.apache.org/jira/browse/YARN-3281
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-3281.01.patch
>
>
> The command "mvn compile -Pvisualize" should generate graph representations 
> for all state machines in the project. We are still missing 
> {{org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore}} for 
> resourcemanager project.
> Another class 
> {{org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl.StatefulContainer}}
>  also has a state machine. However this one is a protected inner class, hence 
> cannot be seen by class {{VisualizeStateMachine}}. Any suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344903#comment-14344903
 ] 

Hudson commented on YARN-3265:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #121 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/121/])
YARN-3265. Fixed a deadlock in CapacityScheduler by always passing a queue's 
available resource-limit from the parent queue. Contributed by Wangda Tan. 
(vinodkv: rev 14dd647c556016d351f425ee956ccf800ccb9ce2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCSQueueUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java


> CapacityScheduler deadlock when computing absolute max avail capacity (fix 
> for trunk/branch-2)
> --
>
> Key: YARN-3265
> URL: https://issues.apache.org/jira/browse/YARN-3265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-3265.1.patch, YARN-3265.2.patch, YARN-3265.3.patch, 
> YARN-3265.5.patch, YARN-3265.6.patch, YARN-3265.7.patch
>
>
> This patch is trying to solve the same problem described in YARN-3251, but 
> this is a longer term fix for trunk and branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344899#comment-14344899
 ] 

Hudson commented on YARN-3270:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #121 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/121/])
YARN-3270. Fix node label expression not getting set in 
ApplicationSubmissionContext (Rohit Agarwal via wangda) (wangda: rev 
abac6eb9d530bb1e6ff58ec3c75b17d840a0ee3f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java
* hadoop-yarn-project/CHANGES.txt


> node label expression not getting set in ApplicationSubmissionContext
> -
>
> Key: YARN-3270
> URL: https://issues.apache.org/jira/browse/YARN-3270
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Rohit Agarwal
>Assignee: Rohit Agarwal
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-3270.patch
>
>
> One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not 
> setting the {{appLabelExpression}} passed to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery

2015-03-03 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344914#comment-14344914
 ] 

Junping Du commented on YARN-3039:
--

Thanks for comments, [~Naganarasimha]!
bq. +1 for this approach. Also if NM uses this new blocking call in AMRMClient 
to get aggregator address then there might not be any race conditions for 
posting AM container's life cycle events by NM immediately after creation of 
appAggregator through Aux service.
Discussed this again offline with [~vinodkv] and [~zjshen]. It looks heavyweight to 
make TimelineClient wrap AMRMClient, especially since, for security reasons, it 
would require the NM to hold AMRMTokens in order to use TimelineClient in the 
future, which makes little sense. To get rid of the race condition you mentioned 
above, we propose to use the observer pattern so that TimelineClient can listen for 
aggregator address updates in the AM or NM (wrapped with retry logic to tolerate 
connection failures).
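For illustration only, a tiny sketch of that observer idea with hypothetical names 
(this is not an existing or proposed YARN API):
{code}
// Hypothetical interface: the AM/NM side would invoke it whenever the
// per-application aggregator address changes, and a TimelineClient wrapper would
// re-establish its connection with retry/back-off on each update.
public interface AggregatorAddressListener {
  void onAggregatorAddressUpdated(String applicationId, String newAggregatorAddress);
}
{code}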

bq. Are we just adding a method to get the aggregator address, or what other APIs 
are planned?
Per the above comments, we have no plan to add an API to TimelineClient for 
talking to the RM directly.

bq. I believe the idea of using the AUX service was to decouple the NM and the 
Timeline service. If the NM notifies the RM about new appAggregator creation (based 
on the AUX service), then basically the NM has to be aware that 
PerNodeAggregatorServer is configured as an AUX service, and if it supports 
rebinding the appAggregator on failure, it should be able to communicate with this 
AUX service too; would this be a clean approach?
I agree we want to decouple things here. However, the AUX service is not the only 
way to deploy app aggregators. There are other ways (see the diagram in YARN-3033): 
app aggregators could be deployed in a separate process or in an independent 
container, which makes a protocol between the AUX service and the RM less sensible. 
I think we should instead plan to add a protocol between the aggregator and the NM, 
and then notify the RM through the NM-RM heartbeat when an aggregator is registered 
or rebound.

bq. I also feel we need to support starting a per-app aggregator only if the app 
requests it (Zhijie also mentioned this). If not, we can make use of one default 
aggregator for all such apps launched on the NM, which is just used to post 
container entities from the different NMs for these apps.
My 2 cents: the app aggregator should have the logic to consolidate all messages 
(events and metrics) for one application into the more complex and flexible new 
data model. If each NM does aggregation separately, then it is still a *writer* 
(like the old timeline service), not an *aggregator*. Thoughts?

bq. Have there been any discussions w.r.t. the RM having its own aggregator? I feel 
it would be better for the RM to have one, as it then need not depend on any NMs to 
post entities.
Agreed. I think we are on the same page now.
I will update the proposal to reflect all these discussions (on the JIRAs and offline).

> [Aggregator wireup] Implement ATS app-appgregator service discovery
> ---
>
> Key: YARN-3039
> URL: https://issues.apache.org/jira/browse/YARN-3039
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Junping Du
> Attachments: Service Binding for applicationaggregator of ATS 
> (draft).pdf, YARN-3039-no-test.patch
>
>
> Per design in YARN-2928, implement ATS writer service discovery. This is 
> essential for off-node clients to send writes to the right ATS writer. This 
> should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344919#comment-14344919
 ] 

Hadoop QA commented on YARN-3222:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702122/0004-YARN-3222.patch
  against trunk revision 9ae7f9e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6818//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6818//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6818//console

This message is automatically generated.

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch, 0004-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler with node_added, node_removed or node_resource_update events. These 
> events should be sent in sequential order, i.e. the node_added event followed 
> by the node_resource_update event.
> But if the node is reconnected with a different http port, the order of the 
> scheduler events is node_removed --> node_resource_update --> node_added, which 
> means the scheduler cannot find the node, throws an NPE, and the RM exits.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344954#comment-14344954
 ] 

Hudson commented on YARN-3265:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #855 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/855/])
YARN-3265. Fixed a deadlock in CapacityScheduler by always passing a queue's 
available resource-limit from the parent queue. Contributed by Wangda Tan. 
(vinodkv: rev 14dd647c556016d351f425ee956ccf800ccb9ce2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCSQueueUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


> CapacityScheduler deadlock when computing absolute max avail capacity (fix 
> for trunk/branch-2)
> --
>
> Key: YARN-3265
> URL: https://issues.apache.org/jira/browse/YARN-3265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-3265.1.patch, YARN-3265.2.patch, YARN-3265.3.patch, 
> YARN-3265.5.patch, YARN-3265.6.patch, YARN-3265.7.patch
>
>
> This patch is trying to solve the same problem described in YARN-3251, but 
> this is a longer term fix for trunk and branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344950#comment-14344950
 ] 

Hudson commented on YARN-3270:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #855 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/855/])
YARN-3270. Fix node label expression not getting set in 
ApplicationSubmissionContext (Rohit Agarwal via wangda) (wangda: rev 
abac6eb9d530bb1e6ff58ec3c75b17d840a0ee3f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java


> node label expression not getting set in ApplicationSubmissionContext
> -
>
> Key: YARN-3270
> URL: https://issues.apache.org/jira/browse/YARN-3270
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Rohit Agarwal
>Assignee: Rohit Agarwal
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-3270.patch
>
>
> One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not 
> setting the {{appLabelExpression}} passed to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3281) Add RMStateStore to StateMachine visualization list

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344956#comment-14344956
 ] 

Hudson commented on YARN-3281:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #855 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/855/])
YARN-3281. Added RMStateStore to StateMachine visualization list. Contributed 
by Chengbing Liu (jianhe: rev 5d0bae550f5b9a6005aa1d373cfe1ec80513dbd9)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml


> Add RMStateStore to StateMachine visualization list
> ---
>
> Key: YARN-3281
> URL: https://issues.apache.org/jira/browse/YARN-3281
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-3281.01.patch
>
>
> The command "mvn compile -Pvisualize" should generate graph representations 
> for all state machines in the project. We are still missing 
> {{org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore}} for 
> resourcemanager project.
> Another class 
> {{org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl.StatefulContainer}}
>  also has a state machine. However this one is a protected inner class, hence 
> cannot be seen by class {{VisualizeStateMachine}}. Any suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344966#comment-14344966
 ] 

Hadoop QA commented on YARN-3248:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12702135/apache-yarn-3248.2.patch
  against trunk revision 4228de9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6820//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6820//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6820//console

This message is automatically generated.

> Display count of nodes blacklisted by apps in the web UI
> 
>
> Key: YARN-3248
> URL: https://issues.apache.org/jira/browse/YARN-3248
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: All applications.png, App page.png, Screenshot.jpg, 
> apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch
>
>
> It would be really useful when debugging app performance and failure issues 
> to get a count of the nodes blacklisted by individual apps displayed in the 
> web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345122#comment-14345122
 ] 

Hudson commented on YARN-3265:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2053 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2053/])
YARN-3265. Fixed a deadlock in CapacityScheduler by always passing a queue's 
available resource-limit from the parent queue. Contributed by Wangda Tan. 
(vinodkv: rev 14dd647c556016d351f425ee956ccf800ccb9ce2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCSQueueUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java


> CapacityScheduler deadlock when computing absolute max avail capacity (fix 
> for trunk/branch-2)
> --
>
> Key: YARN-3265
> URL: https://issues.apache.org/jira/browse/YARN-3265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-3265.1.patch, YARN-3265.2.patch, YARN-3265.3.patch, 
> YARN-3265.5.patch, YARN-3265.6.patch, YARN-3265.7.patch
>
>
> This patch is trying to solve the same problem described in YARN-3251, but 
> this is a longer term fix for trunk and branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345118#comment-14345118
 ] 

Hudson commented on YARN-3270:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2053 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2053/])
YARN-3270. Fix node label expression not getting set in 
ApplicationSubmissionContext (Rohit Agarwal via wangda) (wangda: rev 
abac6eb9d530bb1e6ff58ec3c75b17d840a0ee3f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java


> node label expression not getting set in ApplicationSubmissionContext
> -
>
> Key: YARN-3270
> URL: https://issues.apache.org/jira/browse/YARN-3270
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Rohit Agarwal
>Assignee: Rohit Agarwal
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-3270.patch
>
>
> One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not 
> setting the {{appLabelExpression}} passed to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3281) Add RMStateStore to StateMachine visualization list

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345124#comment-14345124
 ] 

Hudson commented on YARN-3281:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2053 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2053/])
YARN-3281. Added RMStateStore to StateMachine visualization list. Contributed 
by Chengbing Liu (jianhe: rev 5d0bae550f5b9a6005aa1d373cfe1ec80513dbd9)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* hadoop-yarn-project/CHANGES.txt


> Add RMStateStore to StateMachine visualization list
> ---
>
> Key: YARN-3281
> URL: https://issues.apache.org/jira/browse/YARN-3281
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-3281.01.patch
>
>
> The command "mvn compile -Pvisualize" should generate graph representations 
> for all state machines in the project. We are still missing 
> {{org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore}} for 
> resourcemanager project.
> Another class 
> {{org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl.StatefulContainer}}
>  also has a state machine. However this one is a protected inner class, hence 
> cannot be seen by class {{VisualizeStateMachine}}. Any suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3281) Add RMStateStore to StateMachine visualization list

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345143#comment-14345143
 ] 

Hudson commented on YARN-3281:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #112 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/112/])
YARN-3281. Added RMStateStore to StateMachine visualization list. Contributed 
by Chengbing Liu (jianhe: rev 5d0bae550f5b9a6005aa1d373cfe1ec80513dbd9)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml


> Add RMStateStore to StateMachine visualization list
> ---
>
> Key: YARN-3281
> URL: https://issues.apache.org/jira/browse/YARN-3281
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-3281.01.patch
>
>
> The command "mvn compile -Pvisualize" should generate graph representations 
> for all state machines in the project. We are still missing 
> {{org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore}} for 
> resourcemanager project.
> Another class 
> {{org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl.StatefulContainer}}
>  also has a state machine. However this one is a protected inner class, hence 
> cannot be seen by class {{VisualizeStateMachine}}. Any suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345141#comment-14345141
 ] 

Hudson commented on YARN-3265:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #112 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/112/])
YARN-3265. Fixed a deadlock in CapacityScheduler by always passing a queue's 
available resource-limit from the parent queue. Contributed by Wangda Tan. 
(vinodkv: rev 14dd647c556016d351f425ee956ccf800ccb9ce2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCSQueueUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java


> CapacityScheduler deadlock when computing absolute max avail capacity (fix 
> for trunk/branch-2)
> --
>
> Key: YARN-3265
> URL: https://issues.apache.org/jira/browse/YARN-3265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-3265.1.patch, YARN-3265.2.patch, YARN-3265.3.patch, 
> YARN-3265.5.patch, YARN-3265.6.patch, YARN-3265.7.patch
>
>
> This patch is trying to solve the same problem described in YARN-3251, but 
> this is a longer term fix for trunk and branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345136#comment-14345136
 ] 

Hudson commented on YARN-3270:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #112 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/112/])
YARN-3270. Fix node label expression not getting set in 
ApplicationSubmissionContext (Rohit Agarwal via wangda) (wangda: rev 
abac6eb9d530bb1e6ff58ec3c75b17d840a0ee3f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java


> node label expression not getting set in ApplicationSubmissionContext
> -
>
> Key: YARN-3270
> URL: https://issues.apache.org/jira/browse/YARN-3270
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Rohit Agarwal
>Assignee: Rohit Agarwal
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-3270.patch
>
>
> One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not 
> setting the {{appLabelExpression}} passed to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.

2015-03-03 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345161#comment-14345161
 ] 

Anubhav Dhoot commented on YARN-3242:
-

[~zxu] patch looks good overall.
Instead of blindly switching in zkClient on a connect and removing it on a
disconnect, we now verify that activeZkClient is the one receiving the event.
It then makes sense to get rid of the oldZkClient logic and keep a single ZK
client, activeZkClient, that receives events and, on a connection event, gets
activated as zkClient to do the actual processing.

Verified that the updated unit test fails if I remove the
{{if (zk != activeZkClient)}} check.
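
For context, a minimal sketch of the guard being discussed (the method
signature and field names are assumed from the patch under review, so treat
this as illustrative rather than the final code):
{code}
// Illustrative only: ignore watcher events from anything other than the
// currently active ZooKeeper handle, so a stale session's Disconnected event
// can no longer null out the live client.
public synchronized void processWatchEvent(ZooKeeper zk, WatchedEvent event)
    throws Exception {
  if (zk != activeZkClient) {
    LOG.info("Ignoring watch event from an old ZooKeeper client session");
    return;
  }
  // ... existing handling of SyncConnected / Disconnected / Expired ...
}
{code}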

The only minor nits:
a) Could we add comments noting that activeZkClient is not used for the actual
processing (that is still zkClient) but only to process watched events, and
that on a connection event it gets activated into zkClient?
b) Will CountdownWatcher#setWatchedClient ever be called more than once? If
not, rename it to initializeWatchedClient and let it throw if the client is
already non-null.

LGTM otherwise

> Old ZK client session watcher event causes ZKRMStateStore out of sync with 
> current ZK client session due to ZooKeeper asynchronously closing client 
> session.
> 
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch
>
>
> Old ZK client session watcher event messed up new ZK client session due to 
> ZooKeeper asynchronously closing client session.
> The watcher event from old ZK client session can still be sent to 
> ZKRMStateStore after the old  ZK client session is closed.
> This will cause seriously problem:ZKRMStateStore out of sync with ZooKeeper 
> session.
> We only have one ZKRMStateStore but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether this watcher 
> event is from current session. So the watcher event from old ZK client 
> session which just is closed will still be processed.
> For example, If a Disconnected event received from old session after new 
> session is connected, the zkClient will be set to null
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> Then ZKRMStateStore won't receive SyncConnected event from new session 
> because new session is already in SyncConnected state and it won't send 
> SyncConnected event until it is disconnected and connected again.
> Then we will see all the ZKRMStateStore operations fail with IOException 
> "Wait for ZKClient creation timed out" until  RM shutdown.
> The following code from zookeeper(ClientCnxn#EventThread) show even after 
> receive eventOfDeath, EventThread will still process all the events until  
> waitingEvents queue is empty.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
>   break;
>}
> }
>   }
>   private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
>   } else {
> public void disconnect() {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Disconnecting client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> sendThread.close();
> eventThread.queueEventOfDeath();
> }
> public void close() throws IOException {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Closing clien

[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-03-03 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345160#comment-14345160
 ] 

Junping Du commented on YARN-3031:
--

Discussed offline with [~vrushalic] and [~zjshen] last week, and we agreed to
consolidate the APIs here. [~vrushalic], mind giving a quick update? Thanks!

> [Storage abstraction] Create backing storage write interface for ATS writers
> 
>
> Key: YARN-3031
> URL: https://issues.apache.org/jira/browse/YARN-3031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
> Attachments: Sequence_diagram_write_interaction.2.png, 
> Sequence_diagram_write_interaction.png, YARN-3031.01.patch, 
> YARN-3031.02.patch, YARN-3031.03.patch
>
>
> Per design in YARN-2928, come up with the interface for the ATS writer to 
> write to various backing storages. The interface should be created to capture 
> the right level of abstractions so that it will enable all backing storage 
> implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345176#comment-14345176
 ] 

Hudson commented on YARN-3270:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #121 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/121/])
YARN-3270. Fix node label expression not getting set in 
ApplicationSubmissionContext (Rohit Agarwal via wangda) (wangda: rev 
abac6eb9d530bb1e6ff58ec3c75b17d840a0ee3f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java


> node label expression not getting set in ApplicationSubmissionContext
> -
>
> Key: YARN-3270
> URL: https://issues.apache.org/jira/browse/YARN-3270
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Rohit Agarwal
>Assignee: Rohit Agarwal
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-3270.patch
>
>
> One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not 
> setting the {{appLabelExpression}} passed to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345180#comment-14345180
 ] 

Hudson commented on YARN-3265:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #121 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/121/])
YARN-3265. Fixed a deadlock in CapacityScheduler by always passing a queue's 
available resource-limit from the parent queue. Contributed by Wangda Tan. 
(vinodkv: rev 14dd647c556016d351f425ee956ccf800ccb9ce2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCSQueueUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java


> CapacityScheduler deadlock when computing absolute max avail capacity (fix 
> for trunk/branch-2)
> --
>
> Key: YARN-3265
> URL: https://issues.apache.org/jira/browse/YARN-3265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-3265.1.patch, YARN-3265.2.patch, YARN-3265.3.patch, 
> YARN-3265.5.patch, YARN-3265.6.patch, YARN-3265.7.patch
>
>
> This patch is trying to solve the same problem described in YARN-3251, but 
> this is a longer term fix for trunk and branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3281) Add RMStateStore to StateMachine visualization list

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345182#comment-14345182
 ] 

Hudson commented on YARN-3281:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #121 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/121/])
YARN-3281. Added RMStateStore to StateMachine visualization list. Contributed 
by Chengbing Liu (jianhe: rev 5d0bae550f5b9a6005aa1d373cfe1ec80513dbd9)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml


> Add RMStateStore to StateMachine visualization list
> ---
>
> Key: YARN-3281
> URL: https://issues.apache.org/jira/browse/YARN-3281
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-3281.01.patch
>
>
> The command "mvn compile -Pvisualize" should generate graph representations 
> for all state machines in the project. We are still missing 
> {{org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore}} for 
> resourcemanager project.
> Another class 
> {{org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl.StatefulContainer}}
>  also has a state machine. However this one is a protected inner class, hence 
> cannot be seen by class {{VisualizeStateMachine}}. Any suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1963) Support priorities across applications within the same queue

2015-03-03 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-1963:
--
Attachment: 0001-YARN-1963-prototype.patch

Uploading a prototype version based on a configuration file.


> Support priorities across applications within the same queue 
> -
>
> Key: YARN-1963
> URL: https://issues.apache.org/jira/browse/YARN-1963
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Reporter: Arun C Murthy
>Assignee: Sunil G
> Attachments: 0001-YARN-1963-prototype.patch, YARN Application 
> Priorities Design.pdf, YARN Application Priorities Design_01.pdf
>
>
> It will be very useful to support priorities among applications within the 
> same queue, particularly in production scenarios. It allows for finer-grained 
> controls without having to force admins to create a multitude of queues, plus 
> allows existing applications to continue using existing queues which are 
> usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345211#comment-14345211
 ] 

Hudson commented on YARN-3265:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2071 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2071/])
YARN-3265. Fixed a deadlock in CapacityScheduler by always passing a queue's 
available resource-limit from the parent queue. Contributed by Wangda Tan. 
(vinodkv: rev 14dd647c556016d351f425ee956ccf800ccb9ce2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCSQueueUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java


> CapacityScheduler deadlock when computing absolute max avail capacity (fix 
> for trunk/branch-2)
> --
>
> Key: YARN-3265
> URL: https://issues.apache.org/jira/browse/YARN-3265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-3265.1.patch, YARN-3265.2.patch, YARN-3265.3.patch, 
> YARN-3265.5.patch, YARN-3265.6.patch, YARN-3265.7.patch
>
>
> This patch is trying to solve the same problem described in YARN-3251, but 
> this is a longer term fix for trunk and branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3281) Add RMStateStore to StateMachine visualization list

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345213#comment-14345213
 ] 

Hudson commented on YARN-3281:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2071 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2071/])
YARN-3281. Added RMStateStore to StateMachine visualization list. Contributed 
by Chengbing Liu (jianhe: rev 5d0bae550f5b9a6005aa1d373cfe1ec80513dbd9)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml


> Add RMStateStore to StateMachine visualization list
> ---
>
> Key: YARN-3281
> URL: https://issues.apache.org/jira/browse/YARN-3281
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-3281.01.patch
>
>
> The command "mvn compile -Pvisualize" should generate graph representations 
> for all state machines in the project. We are still missing 
> {{org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore}} for 
> resourcemanager project.
> Another class 
> {{org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl.StatefulContainer}}
>  also has a state machine. However this one is a protected inner class, hence 
> cannot be seen by class {{VisualizeStateMachine}}. Any suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345207#comment-14345207
 ] 

Hudson commented on YARN-3270:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2071 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2071/])
YARN-3270. Fix node label expression not getting set in 
ApplicationSubmissionContext (Rohit Agarwal via wangda) (wangda: rev 
abac6eb9d530bb1e6ff58ec3c75b17d840a0ee3f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java
* hadoop-yarn-project/CHANGES.txt


> node label expression not getting set in ApplicationSubmissionContext
> -
>
> Key: YARN-3270
> URL: https://issues.apache.org/jira/browse/YARN-3270
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Rohit Agarwal
>Assignee: Rohit Agarwal
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-3270.patch
>
>
> One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not 
> setting the {{appLabelExpression}} passed to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3275) CapacityScheduler: Preemption happening on non-preemptable queues

2015-03-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345290#comment-14345290
 ] 

Jason Lowe commented on YARN-3275:
--

Thanks for the patch, Eric!

bq. the expectation of our users is that if they are running a job on a 
non-preemptable queue, their containers should never be preempted.

I completely agree with this.  IMHO the whole point of the preemption disable 
feature is to guarantee a queue marked as such will never be preempted.  It's 
as if the entire preemption feature was turned off from that queue's 
perspective.

Looking at the patch, I'm a bit worried about this part:

{code}
+  Resource absMaxCapIdealAssignedDelta = Resource.newInstance(0, 0);
+  if (Resources.greaterThanOrEqual(
+rc, clusterResource, maxCapacity, idealAssigned)) {
+absMaxCapIdealAssignedDelta = Resources.subtract(maxCapacity, 
idealAssigned);
+  }
{code}

If the intent of this calculation is to guarantee that none of the components
of absMaxCapIdealAssignedDelta are negative, then I don't believe it
accomplishes that goal.  It's possible for Resources.greaterThanOrEqual to
return true and yet have the subtraction leave one of the components negative.
For example, what if both resources are memory dominant, maxCapacity has more
memory than idealAssigned, but the opposite is true for vcores?  Subtracting
idealAssigned from maxCapacity will then yield a positive memory component but
a negative vcore component.  If we need to make sure neither component goes
negative, I think we need a component-wise max against the zero resource
rather than a comparison, as sketched below.
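
A minimal sketch of that alternative (the explicit clamping here is an
assumption; the Resources utility class may already offer an equivalent
helper):
{code}
// Illustrative only: clamp each component of (maxCapacity - idealAssigned) at
// zero instead of gating the whole subtraction on a single comparison.
Resource delta = Resources.subtract(maxCapacity, idealAssigned);
Resource absMaxCapIdealAssignedDelta = Resource.newInstance(
    Math.max(delta.getMemory(), 0),
    Math.max(delta.getVirtualCores(), 0));
{code}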

Also one style nit: we normally don't do one-liner conditionals without braces, 
so I'd like to see the continue explicitly put in a block.  It might be useful 
to put a debug log statement with the continue to note that we wanted to 
preempt this queue for some reason (and by how much) but it was marked with 
preemption disabled.
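
Something along these lines, for instance (the names here are hypothetical,
just to show the shape):
{code}
// Illustrative only: give the continue its own block and record why the queue
// was skipped and how much we would otherwise have preempted.
if (queue.preemptionDisabled) {
  if (LOG.isDebugEnabled()) {
    LOG.debug("Skipping preemption for queue " + queue.queueName
        + ": preemption is disabled, wanted to preempt " + wantedToPreempt);
  }
  continue;
}
{code}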

> CapacityScheduler: Preemption happening on non-preemptable queues
> -
>
> Key: YARN-3275
> URL: https://issues.apache.org/jira/browse/YARN-3275
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>  Labels: capacity-scheduler
> Attachments: YARN-3275.v1.txt
>
>
> YARN-2056 introduced the ability to turn preemption on and off at the queue 
> level. In cases where a queue goes over its absolute max capacity (YARN-3243, 
> for example), containers can be preempted from that queue, even though the 
> queue is marked as non-preemptable.
> We are using this feature in large, busy clusters and seeing this behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-03-03 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345321#comment-14345321
 ] 

Vrushali C commented on YARN-3031:
--

Yes, we've decided to have only a write and an aggregate API in the writer
interface. The addEvent and updateMetrics calls are not needed; we can use the
write API for those.
Also, I have the distributed shell test case working end to end for timeline
v2. I have some feedback from a chat with Zhijie on that. I am updating the
code and should be posting a patch for YARN-3167, YARN-3031 and YARN-3264
today.
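
For illustration only, the reduced surface might look roughly like the
following (all names here are hypothetical placeholders, not the actual
YARN-3031 API):
{code}
// Hypothetical sketch: a writer interface limited to write and aggregate,
// where events and metric updates flow through write() instead of dedicated
// addEvent()/updateMetrics() methods.
public interface TimelineWriter {
  void write(String clusterId, String userId, String flowId, String appId,
      TimelineEntities entities) throws IOException;
  void aggregate(TimelineEntities entities) throws IOException;
}
{code}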

> [Storage abstraction] Create backing storage write interface for ATS writers
> 
>
> Key: YARN-3031
> URL: https://issues.apache.org/jira/browse/YARN-3031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
> Attachments: Sequence_diagram_write_interaction.2.png, 
> Sequence_diagram_write_interaction.png, YARN-3031.01.patch, 
> YARN-3031.02.patch, YARN-3031.03.patch
>
>
> Per design in YARN-2928, come up with the interface for the ATS writer to 
> write to various backing storages. The interface should be created to capture 
> the right level of abstractions so that it will enable all backing storage 
> implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-03-03 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345345#comment-14345345
 ] 

Junping Du commented on YARN-3031:
--

Hi [~vrushalic], awesome! I will help review the patch here when it is ready.
Thanks for the update.

> [Storage abstraction] Create backing storage write interface for ATS writers
> 
>
> Key: YARN-3031
> URL: https://issues.apache.org/jira/browse/YARN-3031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
> Attachments: Sequence_diagram_write_interaction.2.png, 
> Sequence_diagram_write_interaction.png, YARN-3031.01.patch, 
> YARN-3031.02.patch, YARN-3031.03.patch
>
>
> Per design in YARN-2928, come up with the interface for the ATS writer to 
> write to various backing storages. The interface should be created to capture 
> the right level of abstractions so that it will enable all backing storage 
> implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records

2015-03-03 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-3267:
---
Attachment: YARN_3267_WIP2.patch

> Timelineserver applies the ACL rules after applying the limit on the number 
> of records
> --
>
> Key: YARN-3267
> URL: https://issues.apache.org/jira/browse/YARN-3267
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Prakash Ramachandran
>Assignee: Chang Li
> Attachments: YARN_3267_WIP.patch, YARN_3267_WIP1.patch, 
> YARN_3267_WIP2.patch
>
>
> While fetching the entities from timelineserver, the limit is applied on the 
> entities to be fetched from leveldb, the ACL filters are applied after this 
> (TimelineDataManager.java::getEntities). 
> this could mean that even if there are entities available which match the 
> query criteria, we could end up not getting any results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records

2015-03-03 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345366#comment-14345366
 ] 

Prakash Ramachandran commented on YARN-3267:


[~lichangleo] I am not very familiar with the YARN code, but I can test the
patch.

> Timelineserver applies the ACL rules after applying the limit on the number 
> of records
> --
>
> Key: YARN-3267
> URL: https://issues.apache.org/jira/browse/YARN-3267
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Prakash Ramachandran
>Assignee: Chang Li
> Attachments: YARN_3267_WIP.patch, YARN_3267_WIP1.patch, 
> YARN_3267_WIP2.patch
>
>
> While fetching the entities from timelineserver, the limit is applied on the 
> entities to be fetched from leveldb, the ACL filters are applied after this 
> (TimelineDataManager.java::getEntities). 
> this could mean that even if there are entities available which match the 
> query criteria, we could end up not getting any results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records

2015-03-03 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345370#comment-14345370
 ] 

Chang Li commented on YARN-3267:


Hi [~pramachandran], could you please help test this patch? I developed it
against branch-2. I couldn't reproduce this scenario on my single-node
machine, but I am trying to write a unit test for it. It would be great if you
could also test it in a real scenario. Thanks a lot.

> Timelineserver applies the ACL rules after applying the limit on the number 
> of records
> --
>
> Key: YARN-3267
> URL: https://issues.apache.org/jira/browse/YARN-3267
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Prakash Ramachandran
>Assignee: Chang Li
> Attachments: YARN_3267_WIP.patch, YARN_3267_WIP1.patch, 
> YARN_3267_WIP2.patch
>
>
> While fetching the entities from timelineserver, the limit is applied on the 
> entities to be fetched from leveldb, the ACL filters are applied after this 
> (TimelineDataManager.java::getEntities). 
> this could mean that even if there are entities available which match the 
> query criteria, we could end up not getting any results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image

2015-03-03 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345406#comment-14345406
 ] 

Ravi Prakash commented on YARN-2981:


Hi Abin! The patch doesn't apply because the documentation has been converted
from apt to markdown. Could you please update it?
Could you please limit lines to 80 chars?
Could you please also split the functionality you are proposing for limiting
CPU shares and memory out into another JIRA? Likewise for choosing the user
the container is run as.



> DockerContainerExecutor must support a Cluster-wide default Docker image
> 
>
> Key: YARN-2981
> URL: https://issues.apache.org/jira/browse/YARN-2981
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abin Shahab
>Assignee: Abin Shahab
> Attachments: YARN-2981.patch
>
>
> This allows the yarn administrator to add a cluster-wide default docker image 
> that will be used when there are no per-job override of docker images. With 
> this features, it would be convenient for newer applications like slider to 
> launch inside a cluster-default docker container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3272) Surface container locality info

2015-03-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345424#comment-14345424
 ] 

Wangda Tan commented on YARN-3272:
--

LGTM, +1. Will commit today if there are no opposing opinions.

> Surface container locality info 
> 
>
> Key: YARN-3272
> URL: https://issues.apache.org/jira/browse/YARN-3272
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3272.1.patch, YARN-3272.2.patch, YARN-3272.3.patch, 
> YARN-3272.4.patch, YARN-3272.5.patch, YARN-3272.5.patch, YARN-3272.6.patch, 
> YARN-3272.6.patch, container locality table.png
>
>
> We can surface the container locality info on the web UI. This is useful to 
> debug "why my applications are progressing slow", especially when locality is 
> bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.

2015-03-03 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created YARN-3287:
-

 Summary: TimelineClient kerberos authentication failure uses wrong 
login context.
 Key: YARN-3287
 URL: https://issues.apache.org/jira/browse/YARN-3287
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Daryn Sharp


TimelineClientImpl#doPosting is not wrapped in a doAs, which can cause YARN
clients to fail to create timeline domains during job submission.
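
As a rough sketch of the kind of wrapping being suggested (the UGI to use, the
parameters, and the exact call are assumptions for illustration, not the final
fix):
{code}
// Illustrative only: run the posting under the intended login context so the
// Kerberos credentials used for authentication come from the right subject.
UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
ClientResponse resp = ugi.doAs(new PrivilegedExceptionAction<ClientResponse>() {
  @Override
  public ClientResponse run() throws Exception {
    return doPosting(entity, domainId);  // the call described above, wrapped
  }
});
{code}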



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3288) Document and fix indentation in the DockerContainerExecutor code

2015-03-03 Thread Ravi Prakash (JIRA)
Ravi Prakash created YARN-3288:
--

 Summary: Document and fix indentation in the 
DockerContainerExecutor code
 Key: YARN-3288
 URL: https://issues.apache.org/jira/browse/YARN-3288
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Trivial


The DockerContainerExecutor has several lines over 80 chars and could use some 
more documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345513#comment-14345513
 ] 

Jian He commented on YARN-3222:
---

Thanks Rohith!
I think the condition check you added earlier, sending the NodeResourceUpdate
event only if the node resource is different, is useful since it saves some
traffic. Would you mind adding that too?
{code}
if (rmNode.getState().equals(NodeState.RUNNING)) {
  // Update scheduler node's capacity for reconnect node.
  rmNode.context
  .getDispatcher()
  .getEventHandler()
  .handle(
  new NodeResourceUpdateSchedulerEvent(rmNode, ResourceOption
  .newInstance(newNode.getTotalCapability(), -1)));
}
{code}
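
Something like the following, for instance (a sketch only; the exact equality
check is up to the patch):
{code}
if (rmNode.getState().equals(NodeState.RUNNING)
    && !rmNode.getTotalCapability().equals(newNode.getTotalCapability())) {
  // Only notify the scheduler when the reconnected node actually changed size.
  rmNode.context
      .getDispatcher()
      .getEventHandler()
      .handle(
          new NodeResourceUpdateSchedulerEvent(rmNode, ResourceOption
              .newInstance(newNode.getTotalCapability(), -1)));
}
{code}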

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch, 0004-YARN-3222.patch
>
>
> When a node is reconnected,RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler in a events node_added,node_removed or node_resource_update. These 
> events should be notified in an sequential order i.e node_added event and 
> next node_resource_update events.
> But if the node is reconnected with different http port, the oder of 
> scheduler events are node_removed --> node_resource_update --> node_added 
> which causes scheduler does not find the node and throw NPE and RM exit.
> Node_Resource_update event should be always should be triggered via 
> RMNodeEventType.RESOURCE_UPDATE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2015-03-03 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu updated YARN-2190:

Attachment: YARN-2190.9.patch

Attaching a new patch that adds the following two options to the new Windows
container executor, controlling whether memory and CPU limits are set on the
backing job object. By default, the memory limit is enabled and the CPU limit
is disabled.

{noformat}
yarn.nodemanager.windows-container-executor.memory-limit.enabled
yarn.nodemanager.windows-container-executor.cpu-limit.enabled
{noformat}
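
As a rough sketch of how these switches could be consumed (the surrounding
code is illustrative; the defaults mirror the behavior stated above):
{code}
// Illustrative only: read the two switches with their stated defaults.
Configuration conf = new YarnConfiguration();
boolean memoryLimitEnabled = conf.getBoolean(
    "yarn.nodemanager.windows-container-executor.memory-limit.enabled", true);
boolean cpuLimitEnabled = conf.getBoolean(
    "yarn.nodemanager.windows-container-executor.cpu-limit.enabled", false);
{code}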

> Provide a Windows container executor that can limit memory and CPU
> --
>
> Key: YARN-2190
> URL: https://issues.apache.org/jira/browse/YARN-2190
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
> YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, YARN-2190.5.patch, 
> YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, YARN-2190.9.patch
>
>
> Yarn default container executor on Windows does not set the resource limit on 
> the containers currently. The memory limit is enforced by a separate 
> monitoring thread. The container implementation on Windows uses Job Object 
> right now. The latest Windows (8 or later) API allows CPU and memory limits 
> on the job objects. We want to create a Windows container executor that sets 
> the limits on job objects thus provides resource enforcement at OS level.
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3289) Docker images should be downloaded during localization

2015-03-03 Thread Ravi Prakash (JIRA)
Ravi Prakash created YARN-3289:
--

 Summary: Docker images should be downloaded during localization
 Key: YARN-3289
 URL: https://issues.apache.org/jira/browse/YARN-3289
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Ravi Prakash


We currently call docker run on images while launching containers. If the
image size is sufficiently big, the task will time out. We should download the
image we want to run during localization (if possible) to prevent this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3168) Convert site documentation from apt to markdown

2015-03-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345577#comment-14345577
 ] 

Jian He commented on YARN-3168:
---

Thanks [~iwasakims]! I'll do the review and commit.

> Convert site documentation from apt to markdown
> ---
>
> Key: YARN-3168
> URL: https://issues.apache.org/jira/browse/YARN-3168
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: Gururaj Shetty
> Fix For: 3.0.0
>
> Attachments: YARN-3168-00.patch, YARN-3168.20150224.1.patch, 
> YARN-3168.20150225.2.patch, YARN-3168.20150227.3.patch
>
>
> YARN analog to HADOOP-11495



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3289) Docker images should be downloaded during localization

2015-03-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345578#comment-14345578
 ] 

Jason Lowe commented on YARN-3289:
--

There is no application-level (e.g. MapReduce) task heartbeat during
localization because the application code isn't running yet.  Downloading a
large docker image during localization will therefore still time out, since
the task can't heartbeat back to the AM to say it's making progress.

> Docker images should be downloaded during localization
> --
>
> Key: YARN-3289
> URL: https://issues.apache.org/jira/browse/YARN-3289
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ravi Prakash
>
> We currently call docker run on images while launching containers. If the 
> image size if sufficiently big, the task will timeout. We should download the 
> image we want to run during localization (if possible) to prevent this



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3210) [Source organization] Refactor timeline aggregator according to new code organization

2015-03-03 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3210.
---
   Resolution: Fixed
Fix Version/s: YARN-2928
 Hadoop Flags: Reviewed

Committed the patch to branch YARN-2928. Thanks for the patch, Li! And thanks 
for review, Vinod!

> [Source organization] Refactor timeline aggregator according to new code 
> organization
> -
>
> Key: YARN-3210
> URL: https://issues.apache.org/jira/browse/YARN-3210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
>  Labels: refactor
> Fix For: YARN-2928
>
> Attachments: YARN-3210-022715.patch, YARN-3210-030215.patch, 
> YARN-3210-030215_1.patch, YARN-3210-030215_2.patch
>
>
> We may want to refactor the code of timeline aggregator according to the 
> discussion of YARN-3166, the code organization for timeline service v2. We 
> need to refactor the code after we reach an agreement on the aggregator part 
> of YARN-3166. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3285) Convert branch-2 .apt.vm files of YARN to markdown

2015-03-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345607#comment-14345607
 ] 

Jian He commented on YARN-3285:
---

Some comments which may be applicable to trunk too; we may want a separate
jira to fix them.

- In the ResourceManagerRestart page, inside the Notes, the *e{epoch}* / *e17*
was highlighted before but is not now.

- yarn container command
 {code}
list ApplicationId (should be Application Attempt ID ?)
Lists containers for the application attempt.
{code}

- yarn application attempt command 
{code}
list ApplicationId
Lists applications attempts from the RM (should be Lists applications attempts 
for the given application)
{code}

I'll commit this into branch-2 later today. Thanks [~iwasakims]!

> Convert branch-2 .apt.vm files of YARN to markdown
> --
>
> Key: YARN-3285
> URL: https://issues.apache.org/jira/browse/YARN-3285
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
> Attachments: YARN-3285.001.patch
>
>
> Backport the conversion to markdown done in YARN-3168.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3272) Surface container locality info in RM web UI

2015-03-03 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3272:
-
Summary: Surface container locality info in RM web UI  (was: Surface 
container locality info )

> Surface container locality info in RM web UI
> 
>
> Key: YARN-3272
> URL: https://issues.apache.org/jira/browse/YARN-3272
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3272.1.patch, YARN-3272.2.patch, YARN-3272.3.patch, 
> YARN-3272.4.patch, YARN-3272.5.patch, YARN-3272.5.patch, YARN-3272.6.patch, 
> YARN-3272.6.patch, container locality table.png
>
>
> We can surface the container locality info on the web UI. This is useful to 
> debug "why my applications are progressing slow", especially when locality is 
> bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records

2015-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345638#comment-14345638
 ] 

Hadoop QA commented on YARN-3267:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702200/YARN_3267_WIP2.patch
  against trunk revision 4228de9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice:

  org.apache.hadoop.mapred.TestReduceFetch
  org.apache.hadoop.yarn.server.timeline.TestTimelineDataManager
  
org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServices

  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice:

org.apache.hadoop.mapred.TestMRIntermediateDataEncryption

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6821//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6821//console

This message is automatically generated.

> Timelineserver applies the ACL rules after applying the limit on the number 
> of records
> --
>
> Key: YARN-3267
> URL: https://issues.apache.org/jira/browse/YARN-3267
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Prakash Ramachandran
>Assignee: Chang Li
> Attachments: YARN_3267_WIP.patch, YARN_3267_WIP1.patch, 
> YARN_3267_WIP2.patch
>
>
> While fetching the entities from timelineserver, the limit is applied on the 
> entities to be fetched from leveldb, the ACL filters are applied after this 
> (TimelineDataManager.java::getEntities). 
> this could mean that even if there are entities available which match the 
> query criteria, we could end up not getting any results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2015-03-03 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345649#comment-14345649
 ] 

Varun Saxena commented on YARN-2962:


[~kasha] / [~jianhe] / [~ozawa], kindly review

> ZKRMStateStore: Limit the number of znodes under a znode
> 
>
> Key: YARN-2962
> URL: https://issues.apache.org/jira/browse/YARN-2962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: YARN-2962.01.patch
>
>
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes even though 
> they individually they were all small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.

2015-03-03 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345658#comment-14345658
 ] 

zhihai xu commented on YARN-3242:
-

[~adhoot], thanks for the review; both suggestions sound good to me. I
uploaded a new patch, YARN-3242.004.patch, which addresses both comments.
Please review it.
Thanks, zhihai

> Old ZK client session watcher event causes ZKRMStateStore out of sync with 
> current ZK client session due to ZooKeeper asynchronously closing client 
> session.
> 
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch
>
>
> Old ZK client session watcher event messed up new ZK client session due to 
> ZooKeeper asynchronously closing client session.
> The watcher event from old ZK client session can still be sent to 
> ZKRMStateStore after the old  ZK client session is closed.
> This will cause seriously problem:ZKRMStateStore out of sync with ZooKeeper 
> session.
> We only have one ZKRMStateStore but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether this watcher 
> event is from current session. So the watcher event from old ZK client 
> session which just is closed will still be processed.
> For example, If a Disconnected event received from old session after new 
> session is connected, the zkClient will be set to null
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> Then ZKRMStateStore won't receive SyncConnected event from new session 
> because new session is already in SyncConnected state and it won't send 
> SyncConnected event until it is disconnected and connected again.
> Then we will see all the ZKRMStateStore operations fail with IOException 
> "Wait for ZKClient creation timed out" until  RM shutdown.
> The following code from zookeeper(ClientCnxn#EventThread) show even after 
> receive eventOfDeath, EventThread will still process all the events until  
> waitingEvents queue is empty.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
>   break;
>}
> }
>   }
>   private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
>   } else {
> public void disconnect() {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Disconnecting client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> sendThread.close();
> eventThread.queueEventOfDeath();
> }
> public void close() throws IOException {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Closing client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> try {
> RequestHeader h = new RequestHeader();
> h.setType(ZooDefs.OpCode.closeSession);
> submitRequest(h, null, null, null);
> } catch (InterruptedException e) {
> // ignore, close the send/event threads
> } finally {
> disconnect();
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.

2015-03-03 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3242:

Attachment: YARN-3242.004.patch

> Old ZK client session watcher event causes ZKRMStateStore out of sync with 
> current ZK client session due to ZooKeeper asynchronously closing client 
> session.
> 
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch
>
>
> A watcher event from an old ZK client session can mess up the new ZK client 
> session because ZooKeeper closes client sessions asynchronously.
> A watcher event from the old ZK client session can still be sent to 
> ZKRMStateStore after the old ZK client session is closed.
> This causes a serious problem: ZKRMStateStore goes out of sync with the 
> ZooKeeper session.
> We only have one ZKRMStateStore, but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether the watcher 
> event is from the current session, so a watcher event from an old ZK client 
> session that has just been closed will still be processed.
> For example, if a Disconnected event from the old session is received after 
> the new session is connected, zkClient will be set to null:
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> Then ZKRMStateStore won't receive a SyncConnected event from the new session, 
> because the new session is already in the SyncConnected state and will not 
> send a SyncConnected event until it is disconnected and connected again.
> Then all ZKRMStateStore operations will fail with the IOException "Wait for 
> ZKClient creation timed out" until the RM shuts down.
> The following code from ZooKeeper (ClientCnxn#EventThread) shows that even 
> after receiving eventOfDeath, the EventThread will still process all the 
> events until the waitingEvents queue is empty.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
>   break;
>}
> }
>   }
>   private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
>   } else {
> public void disconnect() {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Disconnecting client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> sendThread.close();
> eventThread.queueEventOfDeath();
> }
> public void close() throws IOException {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Closing client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> try {
> RequestHeader h = new RequestHeader();
> h.setType(ZooDefs.OpCode.closeSession);
> submitRequest(h, null, null, null);
> } catch (InterruptedException e) {
> // ignore, close the send/event threads
> } finally {
> disconnect();
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image

2015-03-03 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated YARN-2981:
--
Attachment: YARN-2981.patch

Removed as [~raviprak] suggested.

> DockerContainerExecutor must support a Cluster-wide default Docker image
> 
>
> Key: YARN-2981
> URL: https://issues.apache.org/jira/browse/YARN-2981
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abin Shahab
>Assignee: Abin Shahab
> Attachments: YARN-2981.patch, YARN-2981.patch
>
>
> This allows the YARN administrator to add a cluster-wide default Docker image 
> that will be used when there is no per-job override of the Docker image. With 
> this feature, it would be convenient for newer applications like Slider to 
> launch inside a cluster-default Docker container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image

2015-03-03 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated YARN-2981:
--
Attachment: YARN-2981.patch

Fixed docs.

> DockerContainerExecutor must support a Cluster-wide default Docker image
> 
>
> Key: YARN-2981
> URL: https://issues.apache.org/jira/browse/YARN-2981
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abin Shahab
>Assignee: Abin Shahab
> Attachments: YARN-2981.patch, YARN-2981.patch, YARN-2981.patch
>
>
> This allows the YARN administrator to add a cluster-wide default Docker image 
> that will be used when there is no per-job override of the Docker image. With 
> this feature, it would be convenient for newer applications like Slider to 
> launch inside a cluster-default Docker container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3272) Surface container locality info in RM web UI

2015-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345737#comment-14345737
 ] 

Hudson commented on YARN-3272:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7246 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7246/])
YARN-3272. Surface container locality info in RM web UI (Jian He via wangda) 
(wangda: rev e17e5ba9d7e2bd45ba6884f59f8045817594b284)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeType.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* hadoop-yarn-project/CHANGES.txt


> Surface container locality info in RM web UI
> 
>
> Key: YARN-3272
> URL: https://issues.apache.org/jira/browse/YARN-3272
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.7.0
>
> Attachments: YARN-3272.1.patch, YARN-3272.2.patch, YARN-3272.3.patch, 
> YARN-3272.4.patch, YARN-3272.5.patch, YARN-3272.5.patch, YARN-3272.6.patch, 
> YARN-3272.6.patch, container locality table.png
>
>
> We can surface the container locality info on the web UI. This is useful for 
> debugging "why are my applications progressing slowly", especially when 
> locality is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2015-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345734#comment-14345734
 ] 

Hadoop QA commented on YARN-2190:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702229/YARN-2190.9.patch
  against trunk revision e17e5ba.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6823//console

This message is automatically generated.

> Provide a Windows container executor that can limit memory and CPU
> --
>
> Key: YARN-2190
> URL: https://issues.apache.org/jira/browse/YARN-2190
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
> YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, YARN-2190.5.patch, 
> YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, YARN-2190.9.patch
>
>
> The default YARN container executor on Windows currently does not set resource 
> limits on the containers. The memory limit is enforced by a separate 
> monitoring thread. The container implementation on Windows uses Job Objects 
> right now. The latest Windows (8 or later) API allows CPU and memory limits 
> on the job objects. We want to create a Windows container executor that sets 
> the limits on job objects, thus providing resource enforcement at the OS level.
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-03-03 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345747#comment-14345747
 ] 

zhihai xu commented on YARN-2893:
-

I found another possibility which can also cause this exception in the 
non-secure case: the JobClient corrupted the tokens buffer.
The RM code only checks the tokens buffer in RMAppManager#submitApplication in 
the secure case.
{code}
if (UserGroupInformation.isSecurityEnabled()) {
  try {
this.rmContext.getDelegationTokenRenewer().addApplicationAsync(appId,
parseCredentials(submissionContext),
submissionContext.getCancelTokensWhenComplete(),
application.getUser());
  } catch (Exception e) {
LOG.warn("Unable to parse credentials.", e);
// Sending APP_REJECTED is fine, since we assume that the
// RMApp is in NEW state and thus we haven't yet informed the
// scheduler about the existence of the application
assert application.getState() == RMAppState.NEW;
this.rmContext.getDispatcher().getEventHandler()
  .handle(new RMAppRejectedEvent(applicationId, e.getMessage()));
throw RPCUtil.getRemoteException(e);
  }

  protected Credentials parseCredentials(
  ApplicationSubmissionContext application) throws IOException {
Credentials credentials = new Credentials();
DataInputByteBuffer dibb = new DataInputByteBuffer();
ByteBuffer tokens = application.getAMContainerSpec().getTokens();
if (tokens != null) {
  dibb.reset(tokens);
  credentials.readTokenStorageStream(dibb);
  tokens.rewind();
}
return credentials;
  }
{code}
I think we should do the same in the non-secure case, so we can fail the 
application earlier and avoid confusion.

Also, I found a Cascading patch that fixes the credentials corruption on the 
JobClient side:
https://github.com/Cascading/cascading/commit/45b33bb864172486ac43782a4d13329312d01c0e

I will update the patch to check the tokens buffer in the non-secure case in 
RMAppManager#submitApplication.
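
A minimal sketch of the check described above: parse the tokens buffer 
unconditionally at submission time so a corrupted buffer is rejected up front, 
instead of surfacing later as an EOFException in the AM launcher. This is the 
idea only, not the actual patch; the class and method names are placeholders.

{code}
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.io.DataInputByteBuffer;
import org.apache.hadoop.security.Credentials;

// Sketch: validate the tokens buffer for both secure and non-secure
// submissions, so a corrupted buffer fails fast at submit time.
final class TokenBufferCheckSketch {
  static Credentials parseAndValidate(ByteBuffer tokens) throws IOException {
    Credentials credentials = new Credentials();
    if (tokens != null) {
      DataInputByteBuffer dibb = new DataInputByteBuffer();
      dibb.reset(tokens);
      // Throws an IOException (e.g. EOFException) here, at submission time,
      // if the client handed over a truncated or corrupted buffer.
      credentials.readTokenStorageStream(dibb);
      tokens.rewind();
    }
    return credentials;
  }
}
{code}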

> AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
> --
>
> Key: YARN-2893
> URL: https://issues.apache.org/jira/browse/YARN-2893
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: zhihai xu
> Attachments: YARN-2893.000.patch
>
>
> MapReduce jobs on our clusters experience sporadic failures due to corrupt 
> tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345755#comment-14345755
 ] 

Jian He commented on YARN-3021:
---

bq. Overall I think "automatic token renewal" has always been an "auxiliary 
service" provided by YARN's RM.
I think this raises the point that delegation-token renewal is just an 
auxiliary service, not a fundamental service required by YARN. The RM today 
happens to be the renewer; in the long-term solution, we can point the renewer 
to a real centralized renewal service to support such cross-platform trust 
setups. Instead of explicitly adding a user-facing API and deprecating it in 
the future, we may choose to add a server-side config that does not fail the 
application if renewal fails. Thoughts?
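
A rough, hypothetical sketch of such a server-side switch; the configuration 
key below does not exist in YARN and is only meant to illustrate the shape of 
the change:

{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical config-gated handling of a failed submission-time renewal.
// The key name is made up for illustration; it is not a real YARN property.
final class TolerantRenewalSketch {
  static final String FAIL_APP_ON_RENEWAL_FAILURE =
      "yarn.resourcemanager.delegation-token.fail-app-on-renewal-failure";

  static void onRenewalFailure(Configuration conf, Exception cause)
      throws Exception {
    if (conf.getBoolean(FAIL_APP_ON_RENEWAL_FAILURE, true)) {
      throw cause;  // today's behaviour: reject the application
    }
    // Otherwise: log the failure and skip scheduling further renewals for
    // this token, mirroring the 1.x JobTracker behaviour described above.
  }
}
{code}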

> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one-way trusts in both cases), and both A and B run 
> HDFS + YARN clusters.
> Now if one logs in with a COMMON credential and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails because B's realm will not 
> trust A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously, 
> and once the renewal attempt failed we simply ceased to schedule any further 
> renewal attempts, rather than failing the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2015-03-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345774#comment-14345774
 ] 

Karthik Kambatla commented on YARN-2423:


I propose we get this in. 

I understand the compatibility concern. My understanding is we would like to 
support the current APIs in TimelineClient with the new implementation as well. 
We can handle the new APIs added here along with them. To be on the safer side, 
we could annotate these methods as evolving and graduate them to stable if we 
continue to support them with the new implementation. 

Coming to the patch itself, I feel getEntity is a special case of getEntities. 
To limit the number of new APIs being added, can we get rid of it? 

[~rkanter], [~vinodkv] - thoughts? 
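
For illustration, the kind of annotation being discussed could look like the 
following; the method shown is a placeholder, not the actual TimelineClient 
signature from the patch:

{code}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;

// Placeholder showing how the new GET wrappers could be annotated so callers
// know they may change once ATS v2 lands.
abstract class TimelineGetApiSketch {
  @InterfaceAudience.Public
  @InterfaceStability.Unstable  // could graduate to Evolving/Stable if kept
  public abstract List<TimelineEntity> getEntities(String entityType)
      throws IOException;
}
{code}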

> TimelineClient should wrap all GET APIs to facilitate Java users
> 
>
> Key: YARN-2423
> URL: https://issues.apache.org/jira/browse/YARN-2423
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Robert Kanter
> Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
> YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, 
> YARN-2423.patch
>
>
> TimelineClient provides the Java method to put timeline entities. It's also 
> good to wrap over all GET APIs (both entity and domain), and deserialize the 
> json response into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2380) The normalizeRequests method in SchedulerUtils always resets the vCore to 1

2015-03-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345776#comment-14345776
 ] 

Wangda Tan commented on YARN-2380:
--

[~kj-ki],
Thinking about it more, similar problems exist in other methods like 
roundUp/divideAndCeil, etc.; we should probably make their behavior consistent.

My proposal is: if we keep vcores when normalizing, we need to do the math on 
vcores when calling roundUp/divideAndCeil, etc. We will only ignore vcores when 
doing compare operations.

Does that make sense?
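
To make the proposal concrete, a small illustrative sketch of rounding both 
dimensions instead of memory only (not the actual scheduler code, and it 
assumes the step resource has positive components):

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Sketch: apply the round-up math to vcores as well as memory, so the
// behavior stays consistent with a normalize step that preserves vcores.
final class RoundUpBothSketch {
  static int roundUp(int value, int step) {
    return ((value + step - 1) / step) * step;   // assumes step > 0
  }

  static Resource roundUp(Resource r, Resource step) {
    return Resources.createResource(
        roundUp(r.getMemory(), step.getMemory()),
        roundUp(r.getVirtualCores(), step.getVirtualCores()));
  }
}
{code}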

> The normalizeRequests method in SchedulerUtils always resets the vCore to 1
> ---
>
> Key: YARN-2380
> URL: https://issues.apache.org/jira/browse/YARN-2380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Jian Fang
>Priority: Critical
> Attachments: YARN-2380.patch
>
>
> I added some log info to the method normalizeRequest() as follows.
> {code}
>   public static void normalizeRequest(
>       ResourceRequest ask, 
>       ResourceCalculator resourceCalculator, 
>       Resource clusterResource,
>       Resource minimumResource,
>       Resource maximumResource,
>       Resource incrementResource) {
>     LOG.info("Before request normalization, the ask capacity: "
>         + ask.getCapability());
>     Resource normalized = 
>         Resources.normalize(
>             resourceCalculator, ask.getCapability(), minimumResource,
>             maximumResource, incrementResource);
>     LOG.info("After request normalization, the ask capacity: " + normalized);
>     ask.setCapability(normalized);
>   }
> {code}
> The resulting log showed that the vcores in the ask were changed from 2 to 1.
> 2014-08-01 20:54:15,537 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils (IPC 
> Server handler 4 on 9024): Before request normalization, the ask capacity: 
> 
> 2014-08-01 20:54:15,537 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils (IPC 
> Server handler 4 on 9024): After request normalization, the ask capacity: 
> 
> The root cause is that DefaultResourceCalculator calls 
> Resources.createResource(normalizedMemory), which regenerates a new resource 
> with vcores = 1.
> This bug is critical: it leads to a mismatch between the requested resource 
> and the container resource, and to many other potential issues, if the user 
> requests containers with vcores > 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3290) DockerContainerExecutor should optionally limit memory and cpu

2015-03-03 Thread Abin Shahab (JIRA)
Abin Shahab created YARN-3290:
-

 Summary: DockerContainerExecutor should optionally limit memory 
and cpu
 Key: YARN-3290
 URL: https://issues.apache.org/jira/browse/YARN-3290
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Abin Shahab


Currently, DockerContainerExecutor does not set cgroup limits on memory and 
CPU. It should follow LCE's example and set cgroup limits.
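
Since docker run already exposes cgroup-backed limits, one possible direction 
is to map the container's Resource onto those flags. A hedged sketch (the 
mapping and method are illustrative, not DockerContainerExecutor code; 
--memory and --cpu-shares are standard docker options):

{code}
import org.apache.hadoop.yarn.api.records.Resource;

// Sketch: translate a YARN container Resource into docker run limit flags.
final class DockerLimitFlagsSketch {
  static String limitFlags(Resource r) {
    long memBytes = (long) r.getMemory() * 1024L * 1024L;  // MB -> bytes
    int cpuShares = r.getVirtualCores() * 1024;            // 1024 shares/vcore
    return String.format("--memory=%d --cpu-shares=%d", memBytes, cpuShares);
  }
}
{code}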



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2015-03-03 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: Non-exclusive-Node-Partition-Design.pdf

Attached the same design doc as YARN-3214 (non-exclusive node label) to the 
umbrella ticket.

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, 
> Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.13.patch, 
> YARN-796.node-label.consolidate.14.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image

2015-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345798#comment-14345798
 ] 

Hadoop QA commented on YARN-2981:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702254/YARN-2981.patch
  against trunk revision e17e5ba.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestDockerContainerExecutorWithMocks

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6822//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6822//console

This message is automatically generated.

> DockerContainerExecutor must support a Cluster-wide default Docker image
> 
>
> Key: YARN-2981
> URL: https://issues.apache.org/jira/browse/YARN-2981
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abin Shahab
>Assignee: Abin Shahab
> Attachments: YARN-2981.patch, YARN-2981.patch, YARN-2981.patch
>
>
> This allows the YARN administrator to add a cluster-wide default Docker image 
> that will be used when there is no per-job override of the Docker image. With 
> this feature, it would be convenient for newer applications like Slider to 
> launch inside a cluster-default Docker container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3291) DockerContainerExecutor should run as a non-root user

2015-03-03 Thread Abin Shahab (JIRA)
Abin Shahab created YARN-3291:
-

 Summary: DockerContainerExecutor should run as a non-root user
 Key: YARN-3291
 URL: https://issues.apache.org/jira/browse/YARN-3291
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Abin Shahab


Currently DockerContainerExecutor runs containers as root. It should be able 
to run them as a non-root user instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2015-03-03 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu updated YARN-2190:

Attachment: YARN-2190.10.patch

bq. -1 patch. The patch command could not apply the patch.

Not sure why it failed to apply. Attaching a new patch generated on Linux.

> Provide a Windows container executor that can limit memory and CPU
> --
>
> Key: YARN-2190
> URL: https://issues.apache.org/jira/browse/YARN-2190
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
> YARN-2190.10.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, 
> YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, 
> YARN-2190.9.patch
>
>
> The default YARN container executor on Windows currently does not set resource 
> limits on the containers. The memory limit is enforced by a separate 
> monitoring thread. The container implementation on Windows uses Job Objects 
> right now. The latest Windows (8 or later) API allows CPU and memory limits 
> on the job objects. We want to create a Windows container executor that sets 
> the limits on job objects, thus providing resource enforcement at the OS level.
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2015-03-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345828#comment-14345828
 ] 

Zhijie Shen commented on YARN-2423:
---

It seems that Spark wants stable APIs. But we know that, based on the new data 
model, the APIs will change accordingly. Taking a step back, even if the APIs 
are not changed, we are usually a bit conservative and mark them \@Unstable in 
the release where they're pushed out.

bq. Coming to the patch itself, I feel getEntity is a special case of 
getEntities. To limit the number of new APIs being added, can we get rid of it?

They're wrapping different REST APIs, one get and one search, though we can 
narrow the result set down to one entity. If we want to move on with it, I 
suggest keeping both.

> TimelineClient should wrap all GET APIs to facilitate Java users
> 
>
> Key: YARN-2423
> URL: https://issues.apache.org/jira/browse/YARN-2423
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Robert Kanter
> Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
> YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, 
> YARN-2423.patch
>
>
> TimelineClient provides the Java method to put timeline entities. It's also 
> good to wrap over all GET APIs (both entity and domain), and deserialize the 
> json response into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image

2015-03-03 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated YARN-2981:
--
Attachment: YARN-2981.patch

Another fix

> DockerContainerExecutor must support a Cluster-wide default Docker image
> 
>
> Key: YARN-2981
> URL: https://issues.apache.org/jira/browse/YARN-2981
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abin Shahab
>Assignee: Abin Shahab
> Attachments: YARN-2981.patch, YARN-2981.patch, YARN-2981.patch, 
> YARN-2981.patch
>
>
> This allows the YARN administrator to add a cluster-wide default Docker image 
> that will be used when there is no per-job override of the Docker image. With 
> this feature, it would be convenient for newer applications like Slider to 
> launch inside a cluster-default Docker container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3285) Convert branch-2 .apt.vm files of YARN to markdown

2015-03-03 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345821#comment-14345821
 ] 

Masatake Iwasaki commented on YARN-3285:


Thanks, [~jianhe]. I agree to keep the focus only on the conversion here and 
address your comments in follow-ups.

> Convert branch-2 .apt.vm files of YARN to markdown
> --
>
> Key: YARN-3285
> URL: https://issues.apache.org/jira/browse/YARN-3285
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
> Attachments: YARN-3285.001.patch
>
>
> Backport the conversion to markdown done in YARN-3168.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2015-03-03 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345836#comment-14345836
 ] 

Hitesh Shah commented on YARN-2423:
---

[~kasha] [~vinodkv] Will this API be backported all the way to 2.4 and to the 
Hadoop maintenance releases for each line? Or will it only be available from 
2.7.0 onwards? If the latter, is there really a need to publish an obsolete API 
for just one release, assuming that Timeline v2 will be ready in time for 2.8?

> TimelineClient should wrap all GET APIs to facilitate Java users
> 
>
> Key: YARN-2423
> URL: https://issues.apache.org/jira/browse/YARN-2423
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Robert Kanter
> Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
> YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, 
> YARN-2423.patch
>
>
> TimelineClient provides the Java method to put timeline entities. It's also 
> good to wrap over all GET APIs (both entity and domain), and deserialize the 
> json response into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.

2015-03-03 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345852#comment-14345852
 ] 

Anubhav Dhoot commented on YARN-3242:
-

LGTM

> Old ZK client session watcher event causes ZKRMStateStore out of sync with 
> current ZK client session due to ZooKeeper asynchronously closing client 
> session.
> 
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch
>
>
> A watcher event from an old ZK client session can mess up the new ZK client 
> session because ZooKeeper closes client sessions asynchronously.
> A watcher event from the old ZK client session can still be sent to 
> ZKRMStateStore after the old ZK client session is closed.
> This causes a serious problem: ZKRMStateStore goes out of sync with the 
> ZooKeeper session.
> We only have one ZKRMStateStore, but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether the watcher 
> event is from the current session, so a watcher event from an old ZK client 
> session that has just been closed will still be processed.
> For example, if a Disconnected event from the old session is received after 
> the new session is connected, zkClient will be set to null:
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> Then ZKRMStateStore won't receive a SyncConnected event from the new session, 
> because the new session is already in the SyncConnected state and will not 
> send a SyncConnected event until it is disconnected and connected again.
> Then all ZKRMStateStore operations will fail with the IOException "Wait for 
> ZKClient creation timed out" until the RM shuts down.
> The following code from ZooKeeper (ClientCnxn#EventThread) shows that even 
> after receiving eventOfDeath, the EventThread will still process all the 
> events until the waitingEvents queue is empty.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
>   break;
>}
> }
>   }
>   private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
>   } else {
> public void disconnect() {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Disconnecting client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> sendThread.close();
> eventThread.queueEventOfDeath();
> }
> public void close() throws IOException {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Closing client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> try {
> RequestHeader h = new RequestHeader();
> h.setType(ZooDefs.OpCode.closeSession);
> submitRequest(h, null, null, null);
> } catch (InterruptedException e) {
> // ignore, close the send/event threads
> } finally {
> disconnect();
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.

2015-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345847#comment-14345847
 ] 

Hadoop QA commented on YARN-3242:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702245/YARN-3242.004.patch
  against trunk revision e17e5ba.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1236 javac 
compiler warnings (more than the trunk's current 1199 warnings).

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
34 warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/6824//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws:

  org.apache.hadoop.ha.TestZKFailoverController
  org.apache.hadoop.ha.TestZKFailoverControllerStress

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6824//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6824//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6824//console

This message is automatically generated.

> Old ZK client session watcher event causes ZKRMStateStore out of sync with 
> current ZK client session due to ZooKeeper asynchronously closing client 
> session.
> 
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch
>
>
> A watcher event from an old ZK client session can mess up the new ZK client 
> session because ZooKeeper closes client sessions asynchronously.
> A watcher event from the old ZK client session can still be sent to 
> ZKRMStateStore after the old ZK client session is closed.
> This causes a serious problem: ZKRMStateStore goes out of sync with the 
> ZooKeeper session.
> We only have one ZKRMStateStore, but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether the watcher 
> event is from the current session, so a watcher event from an old ZK client 
> session that has just been closed will still be processed.
> For example, if a Disconnected event from the old session is received after 
> the new session is connected, zkClient will be set to null:
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> Then ZKRMStateStore won't receive a SyncConnected event from the new session, 
> because the new session is already in the SyncConnected state and will not 
> send a SyncConnected event until it is disconnected and connected again.
> Then all ZKRMStateStore operations will fail with the IOException "Wait for 
> ZKClient creation timed out" until the RM shuts down.
> The following code from ZooKeeper (ClientCnxn#EventThread) shows that even 
> after receiving eventOfDeath, the EventThread will still process all the 
> events until the waitingEvents queue is empty.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
>   break;
>}
> }
>   }
>   private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>  

[jira] [Updated] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.

2015-03-03 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3242:

Attachment: (was: YARN-3242.004.patch)

> Old ZK client session watcher event causes ZKRMStateStore out of sync with 
> current ZK client session due to ZooKeeper asynchronously closing client 
> session.
> 
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch
>
>
> A watcher event from an old ZK client session can mess up the new ZK client 
> session because ZooKeeper closes client sessions asynchronously.
> A watcher event from the old ZK client session can still be sent to 
> ZKRMStateStore after the old ZK client session is closed.
> This causes a serious problem: ZKRMStateStore goes out of sync with the 
> ZooKeeper session.
> We only have one ZKRMStateStore, but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether the watcher 
> event is from the current session, so a watcher event from an old ZK client 
> session that has just been closed will still be processed.
> For example, if a Disconnected event from the old session is received after 
> the new session is connected, zkClient will be set to null:
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> Then ZKRMStateStore won't receive a SyncConnected event from the new session, 
> because the new session is already in the SyncConnected state and will not 
> send a SyncConnected event until it is disconnected and connected again.
> Then all ZKRMStateStore operations will fail with the IOException "Wait for 
> ZKClient creation timed out" until the RM shuts down.
> The following code from ZooKeeper (ClientCnxn#EventThread) shows that even 
> after receiving eventOfDeath, the EventThread will still process all the 
> events until the waitingEvents queue is empty.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
>   break;
>}
> }
>   }
>   private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
>   } else {
> public void disconnect() {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Disconnecting client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> sendThread.close();
> eventThread.queueEventOfDeath();
> }
> public void close() throws IOException {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Closing client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> try {
> RequestHeader h = new RequestHeader();
> h.setType(ZooDefs.OpCode.closeSession);
> submitRequest(h, null, null, null);
> } catch (InterruptedException e) {
> // ignore, close the send/event threads
> } finally {
> disconnect();
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.

2015-03-03 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3242:

Attachment: YARN-3242.004.patch

> Old ZK client session watcher event causes ZKRMStateStore out of sync with 
> current ZK client session due to ZooKeeper asynchronously closing client 
> session.
> 
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch
>
>
> A watcher event from an old ZK client session can mess up the new ZK client 
> session because ZooKeeper closes client sessions asynchronously.
> A watcher event from the old ZK client session can still be sent to 
> ZKRMStateStore after the old ZK client session is closed.
> This causes a serious problem: ZKRMStateStore goes out of sync with the 
> ZooKeeper session.
> We only have one ZKRMStateStore, but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether the watcher 
> event is from the current session, so a watcher event from an old ZK client 
> session that has just been closed will still be processed.
> For example, if a Disconnected event from the old session is received after 
> the new session is connected, zkClient will be set to null:
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> Then ZKRMStateStore won't receive SyncConnected event from new session 
> because new session is already in SyncConnected state and it won't send 
> SyncConnected event until it is disconnected and connected again.
> Then we will see all the ZKRMStateStore operations fail with IOException 
> "Wait for ZKClient creation timed out" until  RM shutdown.
> The following code from zookeeper(ClientCnxn#EventThread) show even after 
> receive eventOfDeath, EventThread will still process all the events until  
> waitingEvents queue is empty.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
>   break;
>}
> }
>   }
>   private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
>   } else {
> public void disconnect() {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Disconnecting client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> sendThread.close();
> eventThread.queueEventOfDeath();
> }
> public void close() throws IOException {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Closing client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> try {
> RequestHeader h = new RequestHeader();
> h.setType(ZooDefs.OpCode.closeSession);
> submitRequest(h, null, null, null);
> } catch (InterruptedException e) {
> // ignore, close the send/event threads
> } finally {
> disconnect();
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2015-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345858#comment-14345858
 ] 

Hadoop QA commented on YARN-2190:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702264/YARN-2190.10.patch
  against trunk revision e17e5ba.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6826//console

This message is automatically generated.

> Provide a Windows container executor that can limit memory and CPU
> --
>
> Key: YARN-2190
> URL: https://issues.apache.org/jira/browse/YARN-2190
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
> YARN-2190.10.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, 
> YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, 
> YARN-2190.9.patch
>
>
> The default YARN container executor on Windows currently does not set resource 
> limits on the containers. The memory limit is enforced by a separate 
> monitoring thread. The container implementation on Windows uses Job Objects 
> right now. The latest Windows (8 or later) API allows CPU and memory limits 
> on the job objects. We want to create a Windows container executor that sets 
> the limits on job objects, thus providing resource enforcement at the OS level.
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2015-03-03 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345878#comment-14345878
 ] 

Robert Kanter commented on YARN-2423:
-

Even if ATS v2 makes it for 2.8, I imagine it won't be fully stable; I'm 
assuming we're not going to suddenly throw away the old ATS in 2.8, right?  We 
have to mark it deprecated and leave it in for a few releases, especially if 
the new ATS isn't 100% ready yet.  

We seem to be assuming that this API won't be compatible with the new ATS.  I 
agree; it likely won't.  That also means that the REST API won't either.  So, 
regardless of whether or not we add this Java API, users will have to rewrite 
their code to use a new API.  Given that, it's a lot easier and cleaner for 
Java users to rewrite their code from one Java API to another Java API, than it 
is to rewrite their custom wrapped REST API and JSON handling code (which they 
also have to write themselves) to a new Java API (and that's assuming we ship a 
new Java API in 2.8; what if we only have another REST API?  That would be even 
harder for users).
And if the API turns out to be compatible, then we're already ahead of the 
game.  Either way, it seems like it would be easier for everyone if we put in 
this API, even if it's going to be "obsolete".

Keep in mind that even if it only lasts one release, not everyone updates their 
cluster every time a new Hadoop release is out. There are many users who will 
stay on 2.7 (or distributions derived therefrom) for quite a while and would 
benefit from this API in the meantime.

> TimelineClient should wrap all GET APIs to facilitate Java users
> 
>
> Key: YARN-2423
> URL: https://issues.apache.org/jira/browse/YARN-2423
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Robert Kanter
> Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
> YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, 
> YARN-2423.patch
>
>
> TimelineClient provides the Java method to put timeline entities. It's also 
> good to wrap over all GET APIs (both entity and domain), and deserialize the 
> json response into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2015-03-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345887#comment-14345887
 ] 

Karthik Kambatla commented on YARN-2423:


[~hitesh] - in my optimistic opinion, Timeline v2 is at least 6 months out. 
Ideally, I would like for 2.8 to have come out before then, but that is beside 
the point. And, as Robert mentioned, I suspect it will take us at least another 
3 months to stabilize it enough to recommend it over Timeline v1. I feel it is 
only reasonable to provide a way for downstream apps to use the existing ATS 
until then.

Zhijie's suggestion of marking it Unstable and adding comments to capture the 
reason seems like a good approach to me. 

> TimelineClient should wrap all GET APIs to facilitate Java users
> 
>
> Key: YARN-2423
> URL: https://issues.apache.org/jira/browse/YARN-2423
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Robert Kanter
> Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
> YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, 
> YARN-2423.patch
>
>
> TimelineClient provides the Java method to put timeline entities. It's also 
> good to wrap over all GET APIs (both entity and domain), and deserialize the 
> json response into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2015-03-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345890#comment-14345890
 ] 

Karthik Kambatla commented on YARN-2423:


I am open to including this in point releases based on 2.4, 2.5 and 2.6 when 
they come out. 

> TimelineClient should wrap all GET APIs to facilitate Java users
> 
>
> Key: YARN-2423
> URL: https://issues.apache.org/jira/browse/YARN-2423
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Robert Kanter
> Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
> YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, 
> YARN-2423.patch
>
>
> TimelineClient provides the Java method to put timeline entities. It's also 
> good to wrap over all GET APIs (both entity and domain), and deserialize the 
> json response into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3222:
-
Attachment: 0005-YARN-3222.patch

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler with node_added, node_removed, or node_resource_update events. These 
> events should be sent in sequential order, i.e. the node_added event and then 
> the node_resource_update event.
> But if the node is reconnected with a different HTTP port, the order of 
> scheduler events is node_removed --> node_resource_update --> node_added, 
> which causes the scheduler to not find the node, throw an NPE, and the RM to 
> exit.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345909#comment-14345909
 ] 

Rohith commented on YARN-3222:
--

bq. check you added earlier about sending NodeResourceUpdate event only if the 
node resource is different
Agreed.

Updated the patch addressing the above comment. Kindly review it.
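
The shape of the check being referred to, as an illustrative sketch (the real 
logic lives in RMNodeImpl#ReconnectNodeTransition; this is not the attached 
patch):

{code}
import org.apache.hadoop.yarn.api.records.Resource;

// Sketch: only emit a resource-update event when the reconnecting node
// actually reports a capability different from the one already recorded.
final class ReconnectUpdateGuardSketch {
  static boolean needsResourceUpdate(Resource current, Resource reported) {
    return reported != null && !reported.equals(current);
  }
}
{code}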

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler with node_added, node_removed, or node_resource_update events. These 
> events should be sent in sequential order, i.e. the node_added event and then 
> the node_resource_update event.
> But if the node is reconnected with a different HTTP port, the order of 
> scheduler events is node_removed --> node_resource_update --> node_added, 
> which causes the scheduler to not find the node, throw an NPE, and the RM to 
> exit.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records

2015-03-03 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-3267:
---
Attachment: YARN_3267_WIP3.patch

> Timelineserver applies the ACL rules after applying the limit on the number 
> of records
> --
>
> Key: YARN-3267
> URL: https://issues.apache.org/jira/browse/YARN-3267
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Prakash Ramachandran
>Assignee: Chang Li
> Attachments: YARN_3267_WIP.patch, YARN_3267_WIP1.patch, 
> YARN_3267_WIP2.patch, YARN_3267_WIP3.patch
>
>
> While fetching entities from the timeline server, the limit is applied to 
> the entities fetched from leveldb, and the ACL filters are applied after 
> this (TimelineDataManager.java::getEntities).
> This can mean that even if there are entities available that match the query 
> criteria, we could end up not getting any results.
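
A minimal, self-contained sketch of the ordering problem described above. The Entity record, the owner-based checkAccess rule, and the sample data are invented for illustration; this is not the actual TimelineDataManager code.

{code}
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only -- not the real TimelineDataManager logic.
public class LimitVsAclSketch {

  // Stand-in for a timeline entity with an owner used for the ACL decision.
  record Entity(String id, String owner) {}

  // Toy ACL rule: the caller may only read entities it owns.
  static boolean checkAccess(String caller, Entity e) {
    return e.owner().equals(caller);
  }

  public static void main(String[] args) {
    List<Entity> fromStore = List.of(
        new Entity("e1", "other"), new Entity("e2", "other"),
        new Entity("e3", "me"),    new Entity("e4", "me"));
    int limit = 2;

    // Reported ordering: truncate first, ACL-filter second -> can return nothing.
    List<Entity> buggy = fromStore.stream()
        .limit(limit)
        .filter(e -> checkAccess("me", e))
        .collect(Collectors.toList());          // empty, although e3/e4 are visible

    // Safer ordering: ACL-filter first, then apply the limit.
    List<Entity> fixed = fromStore.stream()
        .filter(e -> checkAccess("me", e))
        .limit(limit)
        .collect(Collectors.toList());          // [e3, e4]

    System.out.println("buggy=" + buggy + ", fixed=" + fixed);
  }
}
{code}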



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image

2015-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345925#comment-14345925
 ] 

Hadoop QA commented on YARN-2981:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702266/YARN-2981.patch
  against trunk revision e17e5ba.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6825//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6825//console

This message is automatically generated.

> DockerContainerExecutor must support a Cluster-wide default Docker image
> 
>
> Key: YARN-2981
> URL: https://issues.apache.org/jira/browse/YARN-2981
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abin Shahab
>Assignee: Abin Shahab
> Attachments: YARN-2981.patch, YARN-2981.patch, YARN-2981.patch, 
> YARN-2981.patch
>
>
> This allows the YARN administrator to set a cluster-wide default Docker 
> image that will be used when there is no per-job override of the Docker 
> image. With this feature, it would be convenient for newer applications like 
> Slider to launch inside a cluster-default Docker container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-03-03 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345950#comment-14345950
 ] 

Gera Shegalov commented on YARN-2893:
-

Hi [~zxu], it's great that you are making progress on this JIRA. Any chance 
you could capture the failure scenarios in a unit test so we can relate them 
better to the real failures we are seeing?

> AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
> --
>
> Key: YARN-2893
> URL: https://issues.apache.org/jira/browse/YARN-2893
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: zhihai xu
> Attachments: YARN-2893.000.patch
>
>
> MapReduce jobs on our clusters experience sporadic failures due to corrupt 
> tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2015-03-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345963#comment-14345963
 ] 

Vinod Kumar Vavilapalli commented on YARN-2423:
---

bq. Zhijie's suggestion of marking it Unstable and adding comments to capture 
the reason seems like a good approach to me.
I thought the problem for Spark, per [~vanzin], was that they cannot depend on 
non-public or public-Unstable stuff, no?

> TimelineClient should wrap all GET APIs to facilitate Java users
> 
>
> Key: YARN-2423
> URL: https://issues.apache.org/jira/browse/YARN-2423
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Robert Kanter
> Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
> YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, 
> YARN-2423.patch
>
>
> TimelineClient provides the Java method to put timeline entities. It would 
> also be good to wrap all GET APIs (both entity and domain) and deserialize 
> the JSON responses into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-03-03 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345966#comment-14345966
 ] 

zhihai xu commented on YARN-2893:
-

Hi [~jira.shegalov],
That is a very good suggestion. Yes, I will think about writing a test case 
for this failure.
Thanks, zhihai

> AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
> --
>
> Key: YARN-2893
> URL: https://issues.apache.org/jira/browse/YARN-2893
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: zhihai xu
> Attachments: YARN-2893.000.patch
>
>
> MapReduce jobs on our clusters experience sporadic failures due to corrupt 
> tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2015-03-03 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345976#comment-14345976
 ] 

Marcelo Vanzin commented on YARN-2423:
--

We'd rather not depend on unstable APIs. But in this context, what does 
"Unstable" mean? When ATS v2 is released, will all support for ATS v1 be 
removed? Are you going to change all the APIs to work against v2, making code 
built against v1 effectively broken?

I'd imagine that if v2 is really incompatible, you'd add a new set of APIs and 
then deprecate v1 instead. The v1 APIs would be public, stable, and deprecated 
at that point.

> TimelineClient should wrap all GET APIs to facilitate Java users
> 
>
> Key: YARN-2423
> URL: https://issues.apache.org/jira/browse/YARN-2423
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Robert Kanter
> Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
> YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, 
> YARN-2423.patch
>
>
> TimelineClient provides the Java method to put timeline entities. It would 
> also be good to wrap all GET APIs (both entity and domain) and deserialize 
> the JSON responses into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345987#comment-14345987
 ] 

Hadoop QA commented on YARN-3222:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702276/0005-YARN-3222.patch
  against trunk revision e17e5ba.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1151 javac 
compiler warnings (more than the trunk's current 185 warnings).

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
43 warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/6828//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-distcp.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6828//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6828//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6828//console

This message is automatically generated.

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler via the events node_added, node_removed, or node_resource_update. 
> These events should be sent in sequential order, i.e. the node_added event 
> followed by the node_resource_update event.
> But if the node is reconnected with a different http port, the order of 
> scheduler events is node_removed --> node_resource_update --> node_added, 
> which means the scheduler cannot find the node, throws an NPE, and the RM 
> exits.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-03-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345998#comment-14345998
 ] 

Vinod Kumar Vavilapalli commented on YARN-2893:
---

Great progress, [~zxu]! Your explanation sounds like this error should always 
happen. Do you know why we are only seeing it sporadically? Are there special 
conditions when this happens?

> AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
> --
>
> Key: YARN-2893
> URL: https://issues.apache.org/jira/browse/YARN-2893
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: zhihai xu
> Attachments: YARN-2893.000.patch
>
>
> MapReduce jobs on our clusters experience sporadic failures due to corrupt 
> tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2015-03-03 Thread Chuan Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345996#comment-14345996
 ] 

Chuan Liu commented on YARN-2190:
-

bq. -1 patch. The patch command could not apply the patch.

Not sure what the problem is. I can apply the patch on both Windows and Linux 
with '{{patch -p0 < YARN-2190.10.patch}}'.

> Provide a Windows container executor that can limit memory and CPU
> --
>
> Key: YARN-2190
> URL: https://issues.apache.org/jira/browse/YARN-2190
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
> YARN-2190.10.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, 
> YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, 
> YARN-2190.9.patch
>
>
> The YARN default container executor on Windows does not currently set 
> resource limits on containers. The memory limit is enforced by a separate 
> monitoring thread. The container implementation on Windows uses Job Objects 
> right now. The latest Windows (8 or later) API allows CPU and memory limits 
> on job objects. We want to create a Windows container executor that sets the 
> limits on job objects and thus provides resource enforcement at the OS level.
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage

2015-03-03 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346006#comment-14346006
 ] 

Anubhav Dhoot commented on YARN-3122:
-

Those changes look good to me.
As a sample, I ran stress -c 3 using distributed shell on a 4-core machine 
with 8 vcores configured. This means 3 cores are consumed, which maps to 
6 vcores. Thus PCpu is approximately 300% (similar to top) and MilliVcoresUsed 
is approximately 6000, which is what we see below in the actual metrics on the 
NodeManager (a rough conversion sketch follows the metrics output).

{noformat}
hadoop org.apache.hadoop.yarn.applications.distributedshell.Client -debug 
-shell_command "stress -c 3" -jar 
../share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar
 -container_memory 350 -master_memory 350
{noformat}

{noformat}
  }, {
"name" : 
"Hadoop:service=NodeManager,name=ContainerResource_container_1425421474415_0003_01_02",
"modelerType" : "ContainerResource_container_1425421474415_0003_01_02",
"tag.ContainerResource" : "container_1425421474415_0003_01_02",
"tag.Context" : "container",
"tag.ContainerPid" : "10095",
"tag.Hostname" : "anuonebox.ent.cloudera.com",
"PMemUsageMBsNumUsage" : 23,
"PMemUsageMBsAvgMBs" : 2.0,
"PMemUsageMBsStdevMBs" : 0.0,
"PMemUsageMBsIMinMBs" : 2.0,
"PMemUsageMBsIMaxMBs" : 2.0,
"PMemUsageMBsMinMBs" : 2.0,
"PMemUsageMBsMaxMBs" : 2.0,
"PCpuUsagePercentNumUsage" : 23,
"PCpuUsagePercentAvgPercents" : 284.304347826087,
"PCpuUsagePercentStdevPercents" : 62.196488341829514,
"PCpuUsagePercentIMinPercents" : -1.0,
"PCpuUsagePercentIMaxPercents" : 298.0,
"PCpuUsagePercentMinPercents" : -1.0,
"PCpuUsagePercentMaxPercents" : 298.0,
"MilliVcoreUsageNumUsage" : 23,
"MilliVcoreUsageAvgMilliVcores" : 5694.782608695651,
"MilliVcoreUsageStdevMilliVcores" : 1245.8097752255082,
"MilliVcoreUsageIMinMilliVcores" : -20.0,
"MilliVcoreUsageIMaxMilliVcores" : 5971.0,
"MilliVcoreUsageMinMilliVcores" : -20.0,
"MilliVcoreUsageMaxMilliVcores" : 5971.0,
"pMemLimitMBs" : 512,
"vMemLimitMBs" : 1075,
"vCoreLimit" : 1
  } ]
}
{noformat}
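
For readers cross-checking the figures above, here is a rough sketch of the percent-to-milli-vcore arithmetic. The formula is an assumption reverse-engineered from the numbers quoted in this comment, not code taken from the YARN-3122 patch.

{code}
// Back-of-the-envelope check of the quoted numbers; the conversion formula is
// an assumption, not the actual NodeManager implementation.
public class MilliVcoreEstimate {

  static long milliVcores(double cpuUsagePercent, int physicalCores, int configuredVcores) {
    // 100% means one fully busy physical core (top-style percentage);
    // scale by the configured vcore-to-core ratio and express in milli-vcores.
    double coresUsed = cpuUsagePercent / 100.0;
    double vcoresUsed = coresUsed * ((double) configuredVcores / physicalCores);
    return Math.round(vcoresUsed * 1000.0);
  }

  public static void main(String[] args) {
    // stress -c 3 on a 4-core box with 8 configured vcores, ~300% CPU in top.
    System.out.println(milliVcores(300.0, 4, 8));   // 6000, matching the expectation above
    System.out.println(milliVcores(284.3, 4, 8));   // 5686, close to the reported average
  }
}
{code}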

> Metrics for container's actual CPU usage
> 
>
> Key: YARN-3122
> URL: https://issues.apache.org/jira/browse/YARN-3122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
> YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, 
> YARN-3122.prelim.patch, YARN-3122.prelim.patch
>
>
> It would be nice to capture resource usage per container, for a variety of 
> reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations 
> should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346013#comment-14346013
 ] 

Rohith commented on YARN-3222:
--

Had a glance at the javac and javadoc warnings; they look unrelated to the patch.

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler via the events node_added, node_removed, or node_resource_update. 
> These events should be sent in sequential order, i.e. the node_added event 
> followed by the node_resource_update event.
> But if the node is reconnected with a different http port, the order of 
> scheduler events is node_removed --> node_resource_update --> node_added, 
> which means the scheduler cannot find the node, throws an NPE, and the RM 
> exits.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3292) [Umbrella] Tests and/or tools for YARN backwards compatibility verification

2015-03-03 Thread Li Lu (JIRA)
Li Lu created YARN-3292:
---

 Summary: [Umbrella] Tests and/or tools for YARN backwards 
compatibility verification
 Key: YARN-3292
 URL: https://issues.apache.org/jira/browse/YARN-3292
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Li Lu
Assignee: Li Lu


YARN-666 added support for YARN rolling upgrades. In order to support this 
feature, we made changes from many perspectives, and many assumptions were 
made alongside these changes. Future code changes may break these assumptions 
by accident, and hence break the YARN rolling upgrades feature.

To simplify YARN RU regression testing, we may want to create a set of 
tools/tests that can verify YARN RU backward compatibility.

As a very first step, we may want to have a compatibility checker for 
important protocols and APIs (a sketch of what such a check might look like 
follows below). We may also want to incorporate these tools into our Jenkins 
test runs, if necessary.
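
A purely illustrative sketch of the kind of check such a tool might perform on wire protocols. The before/after field tables and the message name are invented; a real checker would read them from the generated protobuf descriptors rather than hard-code them.

{code}
import java.util.Map;

// Hypothetical example: flag protocol fields that were removed or retyped
// between two releases, which would break rolling-upgrade compatibility.
public class ProtoCompatCheck {

  static void check(String proto, Map<Integer, String> oldFields, Map<Integer, String> newFields) {
    oldFields.forEach((number, type) -> {
      String newType = newFields.get(number);
      if (newType == null) {
        System.out.println(proto + ": field #" + number + " (" + type + ") was removed");
      } else if (!newType.equals(type)) {
        System.out.println(proto + ": field #" + number + " changed type " + type + " -> " + newType);
      }
    });
  }

  public static void main(String[] args) {
    // Invented before/after field tables for a made-up message.
    Map<Integer, String> v1 = Map.of(1, "string", 2, "int32", 3, "bool");
    Map<Integer, String> v2 = Map.of(1, "string", 2, "int64");
    check("ExampleRequestProto", v1, v2);   // reports field #2 retyped and field #3 removed
  }
}
{code}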



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.

2015-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346048#comment-14346048
 ] 

Hadoop QA commented on YARN-3242:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702269/YARN-3242.004.patch
  against trunk revision e17e5ba.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 7 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections

  The following test timeouts occurred in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6827//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6827//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6827//console

This message is automatically generated.

> Old ZK client session watcher event causes ZKRMStateStore out of sync with 
> current ZK client session due to ZooKeeper asynchronously closing client 
> session.
> 
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch
>
>
> An old ZK client session's watcher events can mess up the new ZK client 
> session because ZooKeeper closes client sessions asynchronously.
> A watcher event from the old ZK client session can still be delivered to 
> ZKRMStateStore after the old session is closed.
> This causes a serious problem: ZKRMStateStore gets out of sync with the 
> ZooKeeper session.
> We only have one ZKRMStateStore, but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether the watcher 
> event is from the current session, so a watcher event from an old ZK client 
> session that has just been closed will still be processed.
> For example, if a Disconnected event is received from the old session after 
> the new session is connected, the zkClient will be set to null:
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> Then ZKRMStateStore won't receive a SyncConnected event from the new 
> session, because the new session is already in the SyncConnected state and 
> won't send a SyncConnected event until it is disconnected and connected 
> again.
> Then we will see all ZKRMStateStore operations fail with IOException 
> "Wait for ZKClient creation timed out" until RM shutdown.
> The following code from ZooKeeper (ClientCnxn#EventThread) shows that even 
> after receiving eventOfDeath, the EventThread will still process all events 
> until the waitingEvents queue is empty.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
> 

[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-03-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346058#comment-14346058
 ] 

Jian He commented on YARN-3249:
---

[~ryu_kobayashi], thanks for your work!
Here, can this be directly routed to the RMWebService, i.e. 
"/ws/v1/cluster/apps/{appid}/state"?
{code}
  .$onclick(String.format("confirmAction('%s')",
url(String.format("/killapp/%s", aid
{code}
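
If the button were routed that way, the request would be an ordinary REST call against the application-state endpoint referenced above. A minimal sketch in plain Java follows; the RM address and application id are placeholders, and the exact success response code may vary by release.

{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch only: kill an application by PUTting the desired state to the RM
// web services endpoint referenced above. Host and app id are placeholders.
public class KillAppViaRest {
  public static void main(String[] args) throws Exception {
    String rm = "http://rm-host:8088";                        // placeholder RM web address
    String appId = "application_0000000000000_0001";          // placeholder application id
    URL url = new URL(rm + "/ws/v1/cluster/apps/" + appId + "/state");

    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    try (OutputStream out = conn.getOutputStream()) {
      // Desired target state; the RM performs the actual kill asynchronously.
      out.write("{\"state\":\"KILLED\"}".getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
{code}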

> Add the kill application to the Resource Manager Web UI
> ---
>
> Key: YARN-3249
> URL: https://issues.apache.org/jira/browse/YARN-3249
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, 
> YARN-3249.4.patch, YARN-3249.5.patch, YARN-3249.patch, killapp-failed.log, 
> killapp-failed2.log, screenshot.png, screenshot2.png
>
>
> We want to be able to kill an application from the Web UI, similar to the 
> JobTracker Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

