[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer

2015-01-13 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276275#comment-14276275
 ] 

Yi Liu commented on YARN-3055:
--

Is it possible that the launcher job finishes first while the sub-jobs are 
still running? If so, the issue exists; if not, the issue is invalid.

 The token is not renewed properly if it's shared by jobs (oozie) in 
 DelegationTokenRenewer
 --

 Key: YARN-3055
 URL: https://issues.apache.org/jira/browse/YARN-3055
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-3055.001.patch, YARN-3055.002.patch


 After YARN-2964, there is only one timer to renew a token if it's shared by 
 jobs. 
 In {{removeApplicationFromRenewal}}, when we go to remove a token that is 
 still shared by other jobs, we will not cancel the token. 
 Meanwhile, we should not cancel the _timerTask_ either, and we should not 
 remove it from {{allTokens}}. Otherwise the already submitted applications 
 that share this token will not get it renewed any more, and for newly 
 submitted applications that share this token, the token will be renewed 
 immediately.
 For example, we have 3 applications: app1, app2, app3, and they share 
 token1. See the following scenario:
 *1).* app1 is submitted first, then app2, and then app3. In this case, 
 there is only one renewal timer for token1, and it is scheduled when app1 
 is submitted.
 *2).* app1 finishes, and the renewal timer is cancelled. token1 will not 
 be renewed any more, but app2 and app3 still use it, so there is a problem.
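
 A minimal sketch (not the attached patches) of the idea described above: keep 
 the renewal timer and the {{allTokens}} entry alive while any running 
 application still references the shared token. The {{referencingAppIds}} 
 field and the helper names are assumptions for illustration.
 {code}
 private void removeApplicationFromRenewal(ApplicationId appId,
     DelegationTokenToRenew dtr) {
   dtr.referencingAppIds.remove(appId);
   if (!dtr.referencingAppIds.isEmpty()) {
     // Other running apps still share this token: keep the timerTask
     // scheduled and keep the token in allTokens so renewal continues and
     // newly submitted apps reuse the same timer.
     return;
   }
   // The last referencing app is gone: now it is safe to stop renewal.
   dtr.cancelTimer();
   allTokens.remove(dtr.token);
   cancelToken(dtr);
 }
 {code}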



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.

2015-01-13 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276287#comment-14276287
 ] 

Jian He commented on YARN-2637:
---

lgtm too, thanks [~cwelch] and [~leftnoteasy]!

 maximum-am-resource-percent could be respected for both LeafQueue/User when 
 trying to activate applications.
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
 YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, 
 YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, 
 YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, 
 YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, 
 YARN-2637.31.patch, YARN-2637.32.patch, YARN-2637.36.patch, 
 YARN-2637.38.patch, YARN-2637.39.patch, YARN-2637.40.patch, 
 YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following 
 way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it checks whether the app 
 can be activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator();
     i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, 
 the maximum resource that AMs can use is 200M. Assuming 
 minimum_allocation=1M, the number of AMs that can be launched is 200, and if 
 a user uses 5M for each AM (> minimum_allocation), all apps can still be 
 activated, and they will occupy all the resources of the queue instead of 
 only a max_am_resource_percent share of the queue.
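
 A small worked example of the calculation above (the numbers come from the 
 description; this is illustrative arithmetic, not scheduler code):
 {code}
 public class AmLimitExample {
   public static void main(String[] args) {
     int queueCapacityMB = 1000;         // ~1G queue capacity
     double maxAmResourcePercent = 0.2;
     int minimumAllocationMB = 1;
     int actualAmSizeMB = 5;             // each AM really uses 5M

     int maxAmResourceMB = (int) (queueCapacityMB * maxAmResourcePercent); // 200M
     int maxAmNumber = maxAmResourceMB / minimumAllocationMB;              // 200 AMs

     // The activation limit is counted in minimum allocations, so all 200 AMs
     // are activated, but together they use 200 * 5M = 1000M, i.e. the whole
     // queue instead of the intended 20% AM share.
     System.out.println(maxAmNumber + " AMs activated, using "
         + (maxAmNumber * actualAmSizeMB) + "MB of a "
         + queueCapacityMB + "MB queue");
   }
 }
 {code}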



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized

2015-01-13 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276301#comment-14276301
 ] 

Xuan Gong commented on YARN-3024:
-

[~chengbing.liu] Thanks for working on this ticket. I am starting to look at 
the patch. Overall it looks good, but:

* On the latest patch, it looks like you changed the logic for
{code}
  case FETCH_PENDING:
    break;
{code}

Originally, we would directly return the response with LocalizerAction.LIVE.
But now we have to do:
{code}
      LocalResource next = findNextResource();
      if (next != null) {
        try {
          ResourceLocalizationSpec resource =
              NodeManagerBuilderUtils.newResourceLocalizationSpec(next,
                  getPathForLocalization(next));
          rsrcs.add(resource);
        } catch (IOException e) {
          LOG.error("local path for PRIVATE localization could not be " +
              "found. Disks might have failed.", e);
        } catch (URISyntaxException e) {
          //TODO fail? Already translated several times...
        }
      } else if (pending.isEmpty()) {
        // TODO: Synchronization
        action = LocalizerAction.DIE;
      }

      response.setLocalizerAction(action);
      response.setResourceSpecs(rsrcs);
      return response;
{code}

* Could you fix this format
{code}
+  if (action == LocalizerAction.DIE) {
+   response.setLocalizerAction(action);
+   return response;
+  }
{code}


 LocalizerRunner should give DIE action when all resources are localized
 ---

 Key: YARN-3024
 URL: https://issues.apache.org/jira/browse/YARN-3024
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-3024.01.patch, YARN-3024.02.patch, 
 YARN-3024.03.patch


 We have observed that {{LocalizerRunner}} always gives a LIVE action at the 
 end of the localization process.
 The problem is that {{findNextResource()}} can return null even when 
 {{pending}} was not empty prior to the call. This method removes localized 
 resources from {{pending}}, therefore we should check the return value and 
 give a DIE action when it returns null.
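
 A minimal sketch (not the attached patches) of the check the description 
 calls for, reusing the names from the patch snippet quoted above:
 {code}
 LocalizerAction action = LocalizerAction.LIVE;
 LocalResource next = findNextResource();
 if (next == null && pending.isEmpty()) {
   // Nothing left to hand out and nothing still downloading: tell the
   // ContainerLocalizer to exit instead of keeping it alive forever.
   action = LocalizerAction.DIE;
 }
 response.setLocalizerAction(action);
 {code}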



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3033) implement NM starting the ATS writer companion

2015-01-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276306#comment-14276306
 ] 

Vinod Kumar Vavilapalli commented on YARN-3033:
---

Thanks for filing this [~sjlee0]!

We should try to fit this together with YARN-2141 so that we have one source of 
the cluster stats.

 implement NM starting the ATS writer companion
 --

 Key: YARN-3033
 URL: https://issues.apache.org/jira/browse/YARN-3033
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R

 Per design in YARN-2928, implement node managers starting the ATS writer 
 companion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-01-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276303#comment-14276303
 ] 

Vinod Kumar Vavilapalli commented on YARN-2928:
---

Just created origin/YARN-2928 based on origin/branch-2. Let's try keeping it up 
to date at a pace that suits the branch.

[~Naganarasimha], [~varun_saxena], I see you are willing to help with this 
feature work, tx! We will have to coordinate a little on how we all move 
together on this. This may involve some readjustments to the order of the 
tasks and the assignees; please bear with me. Thanks a bunch again!

 Application Timeline Server (ATS) next gen: phase 1
 ---

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2217) Shared cache client side changes

2015-01-13 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2217:
---
Attachment: YARN-2217-trunk-v7.patch

[~kasha] V7 attached. Added error test cases and coverage around checksum 
method.

 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, 
 YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch


 Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-01-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276379#comment-14276379
 ] 

Zhijie Shen commented on YARN-2928:
---

Sangjin, some quick thoughts about the second point. Currently, ATS 
work-preserving restart only involves recovery of the token information in the 
secured scenario, thanks to its almost stateless nature (YARN-2837). Going 
forward, depending on how the writer is implemented, we may want to preserve 
the outstanding timeline data that has been received by the ATS companion but 
has not yet been persisted into the storage backend. In any case, it seems to 
be a common requirement whether it's per-node (e.g., restarting) or per-app 
(e.g., crashing).

 Application Timeline Server (ATS) next gen: phase 1
 ---

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2984) Metrics for container's actual memory usage

2015-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276392#comment-14276392
 ] 

Hadoop QA commented on YARN-2984:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12692116/yarn-2984-2.patch
  against trunk revision c53420f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainerMetrics

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6326//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6326//console

This message is automatically generated.

 Metrics for container's actual memory usage
 ---

 Key: YARN-2984
 URL: https://issues.apache.org/jira/browse/YARN-2984
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2984-1.patch, yarn-2984-2.patch, 
 yarn-2984-prelim.patch


 It would be nice to capture resource usage per container, for a variety of 
 reasons. This JIRA is to track memory usage. 
 YARN-2965 tracks the resource usage on the node, and the two implementations 
 should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3019) Make work-preserving-recovery the default mechanism for RM recovery

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276394#comment-14276394
 ] 

Hudson commented on YARN-3019:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6857 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6857/])
YARN-3019. Make work-preserving-recovery the default mechanism for RM recovery. 
(Contributed by Jian He) (junping_du: rev 
f92e5038000a012229c304bc6e5281411eff2883)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


 Make work-preserving-recovery the default mechanism for RM recovery
 ---

 Key: YARN-3019
 URL: https://issues.apache.org/jira/browse/YARN-3019
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-3019.1.patch


 The proposal is to set 
 yarn.resourcemanager.work-preserving-recovery.enabled to true by default   
 to flip recovery mode to work-preserving recovery from non-work-preserving 
 recovery. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2984) Metrics for container's actual memory usage

2015-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276476#comment-14276476
 ] 

Hadoop QA commented on YARN-2984:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12692147/yarn-2984-3.patch
  against trunk revision f92e503.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6329//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6329//console

This message is automatically generated.

 Metrics for container's actual memory usage
 ---

 Key: YARN-2984
 URL: https://issues.apache.org/jira/browse/YARN-2984
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2984-1.patch, yarn-2984-2.patch, yarn-2984-3.patch, 
 yarn-2984-prelim.patch


 It would be nice to capture resource usage per container, for a variety of 
 reasons. This JIRA is to track memory usage. 
 YARN-2965 tracks the resource usage on the node, and the two implementations 
 should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-01-13 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276490#comment-14276490
 ] 

Sangjin Lee commented on YARN-2928:
---

bq. One additional issue for developing the new feature. We may either create a 
new sub-module or reuse the current one, applicationhistoryservice, but put 
it into a blah.blah.v2 package.

My vote is to start from a clean slate with a new source project (e.g. 
applicationtimelineservice or some other distinct name) and new packages. 
There is a cost of having to copy source into the new project, but it might not 
be so bad. That way, it can start clean and small and doesn't have to carry 
code that is not relevant. Also, it won't be affected by rebasing.

What do you think?

 Application Timeline Server (ATS) next gen: phase 1
 ---

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2217) Shared cache client side changes

2015-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276417#comment-14276417
 ] 

Hadoop QA commented on YARN-2217:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12692107/YARN-2217-trunk-v7.patch
  against trunk revision f92e503.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

  org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
  
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
  
org.apache.hadoop.yarn.client.api.impl.TestSharedCacheClientImpl

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6328//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6328//console

This message is automatically generated.

 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, 
 YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch


 Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3058) Fix error msg of tokens activation delay configuration

2015-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276437#comment-14276437
 ] 

Hadoop QA commented on YARN-3058:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12692120/YARN-3058.001.patch
  against trunk revision c53420f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6327//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6327//console

This message is automatically generated.

 Fix error msg of tokens activation delay configuration
 --

 Key: YARN-3058
 URL: https://issues.apache.org/jira/browse/YARN-3058
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor
 Attachments: YARN-3058.001.patch


 {code}
 this.rollingInterval = conf.getLong(
     YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS,
     YarnConfiguration.DEFAULT_RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS)
     * 1000;
 ...
 this.activationDelay =
     (long) (conf.getLong(YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS,
         YarnConfiguration.DEFAULT_RM_NM_EXPIRY_INTERVAL_MS) * 1.5);
 ...
 if (rollingInterval <= activationDelay * 2) {
   throw new IllegalArgumentException(
       YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS
       + " should be more than 2 X "
       + YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS);
 }
 {code}
 The error msg should be 
 {code}
 YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS
   + " should be more than 3 X "
   + YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS);
 {code}
 Also, it should be {{3 X}} instead of {{2 X}}, since the expiry interval is 
 multiplied by *1.5*.
 There are a few other places with the same issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized

2015-01-13 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276470#comment-14276470
 ] 

Chengbing Liu commented on YARN-3024:
-

[~xgong] Thanks for reviewing. 
{quote}
on the latest patch, looks like you change the logic for
{quote}
The logic for handing out resources to be localized is indeed changed.

Previously, {{LocalizerRunner}} did not give the next resource to 
{{ContainerLocalizer}} until the previous one had been downloaded.

In this patch, {{LocalizerRunner}} will not wait for the previous resource to 
be downloaded. {{ContainerLocalizer}} can handle that by submitting the 
download task to its CompletionService, which is able to queue those tasks 
before executing them. The download thread pool of the CompletionService 
remains a single-thread executor.

Therefore, it is possible that {{ContainerLocalizer}} sends multiple 
{{LocalResourceStatus}} reports to {{LocalizerRunner}} in one heartbeat. In 
this case, I think we should try to find the next resources to be localized 
even when getting FETCH_PENDING.

I have tested it on a real cluster. I specified a large archive which should 
take a long time to localize. The results show the resources were localized 
serially, and one heartbeat contained multiple statuses for small files (thus 
reducing the number of heartbeats).
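
A minimal, self-contained sketch of the queueing behaviour described above 
(not the actual {{ContainerLocalizer}} code; the resource names and timings 
are made up): an ExecutorCompletionService backed by a single-thread executor 
accepts many submitted download tasks at once but still executes them 
serially.
{code}
import java.util.concurrent.*;

public class SerialDownloadSketch {
  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CompletionService<String> downloads = new ExecutorCompletionService<>(pool);

    // Several pending resources can be handed over immediately...
    for (String rsrc : new String[] {"jobA.jar", "dict.tar.gz", "conf.xml"}) {
      downloads.submit(() -> {
        Thread.sleep(100);             // stand-in for the actual download
        return rsrc + " localized";
      });
    }

    // ...but they complete one by one, in submission order, because the
    // backing executor has a single worker thread.
    for (int i = 0; i < 3; i++) {
      System.out.println(downloads.take().get());
    }
    pool.shutdown();
  }
}
{code}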

{quote}
Could you fix this format
{quote}
My bad, I will fix this.

 LocalizerRunner should give DIE action when all resources are localized
 ---

 Key: YARN-3024
 URL: https://issues.apache.org/jira/browse/YARN-3024
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-3024.01.patch, YARN-3024.02.patch, 
 YARN-3024.03.patch


 We have observed that {{LocalizerRunner}} always gives a LIVE action at the 
 end of the localization process.
 The problem is that {{findNextResource()}} can return null even when 
 {{pending}} was not empty prior to the call. This method removes localized 
 resources from {{pending}}, therefore we should check the return value and 
 give a DIE action when it returns null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-13 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276438#comment-14276438
 ] 

Yongjun Zhang commented on YARN-3021:
-

Thanks a lot [~adhoot] and [~vinodkv]!

{quote}
Having said that, if RM cannot validate the token as valid why would the job 
itself work? Would not the containers themselves face the same issue using the 
tokens?
{quote}
Based on the scenario [~qwertymaniac] described in the jira description, the 
token is from realm B, which cannot be validated by realm A's YARN since A and 
B don't trust each other. However, the token can be used by the distcp job 
running in realm A to access B's files (B is the distcp source).

For the scenario described in the jira, I think we are aligned that it would be 
better to add an additional parameter at the time of job submission, so the 
client can tell YARN explicitly that it should not try to renew the token. 

What I wanted to clarify with my earlier question was: if we support this 
scenario by having YARN not validate the token, do we open any security 
hole? Anyone could submit a job and ask YARN not to renew the token, right?

Thanks.






 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails because realm B will not 
 trust A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously, 
 and once the renewal attempt failed we simply ceased to schedule any further 
 renewal attempts, rather than failing the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.
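
 A minimal sketch (not the attached patch) of the behaviour proposed above: 
 attempt the renewal once, but on failure log and skip scheduling further 
 renewals instead of bubbling the error back and failing the app submission. 
 The method and field names here are illustrative assumptions.
 {code}
 private void handleAppSubmitToken(ApplicationId appId, Token<?> token,
     Configuration conf) {
   try {
     token.renew(conf);              // validate the token by renewing it once
     scheduleRenewal(appId, token);  // only schedule renewals if that worked
   } catch (Exception e) {
     // Renewal failed (e.g. an untrusted realm); keep the token for the
     // containers to use, but neither fail the submission nor schedule
     // automatic renewals for it.
     LOG.warn("Could not renew token for " + appId
         + "; skipping automatic renewal", e);
   }
 }
 {code}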



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2984) Metrics for container's actual memory usage

2015-01-13 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2984:
---
Attachment: yarn-2984-3.patch

Looks like a timing issue with the test; increased the test timer to fire every 
100 ms instead of 50. 

 Metrics for container's actual memory usage
 ---

 Key: YARN-2984
 URL: https://issues.apache.org/jira/browse/YARN-2984
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2984-1.patch, yarn-2984-2.patch, yarn-2984-3.patch, 
 yarn-2984-prelim.patch


 It would be nice to capture resource usage per container, for a variety of 
 reasons. This JIRA is to track memory usage. 
 YARN-2965 tracks the resource usage on the node, and the two implementations 
 should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2217) Shared cache client side changes

2015-01-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276472#comment-14276472
 ] 

Karthik Kambatla commented on YARN-2217:


Is it a classpath issue? 

 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, 
 YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch


 Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2861) Timeline DT secret manager should not reuse the RM's configs.

2015-01-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2861:
--
Attachment: YARN-2861.2.patch

Thanks for the review, Jian! I updated the patch.

 Timeline DT secret manager should not reuse the RM's configs.
 -

 Key: YARN-2861
 URL: https://issues.apache.org/jira/browse/YARN-2861
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2861.1.patch, YARN-2861.2.patch


 These are the configs for the RM DT secret manager. We should create separate 
 ones for the timeline DT only.
 {code}
   @Override
   protected void serviceInit(Configuration conf) throws Exception {
     long secretKeyInterval =
         conf.getLong(YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_KEY,
             YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_DEFAULT);
     long tokenMaxLifetime =
         conf.getLong(YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_KEY,
             YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_DEFAULT);
     long tokenRenewInterval =
         conf.getLong(YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_KEY,
             YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_DEFAULT);
     secretManager = new TimelineDelegationTokenSecretManager(secretKeyInterval,
         tokenMaxLifetime, tokenRenewInterval,
         360);
     secretManager.startThreads();
     serviceAddr = TimelineUtils.getTimelineTokenServiceAddress(getConfig());
     super.init(conf);
   }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2015-01-13 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276599#comment-14276599
 ] 

Chen He commented on YARN-1680:
---

To address the double-blacklist issue (a node is blacklisted by the app and 
later by the cluster), I propose two steps:
1. Every time the app asks for blacklist additions, we check whether the nodes 
in the addition are in the cluster blacklist (O(m), where m is the number of 
nodes in the blacklist addition). If so, we remove those nodes from the 
addition. 
2. It is possible that the app un-blacklists a node (puts it in the blacklist 
removal) while the cluster still blacklists it. In this situation, 
clusterResource does not contain this node's resources, so we need to remove 
this node from the app's blacklist removal set in the headroom calculation.  
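
A minimal sketch of the two steps above using plain java.util Sets; the 
variable names are assumptions for illustration, not the actual scheduler 
code.
{code}
// Step 1: drop nodes the cluster already blacklists from the app's
// blacklist additions (O(m) over the addition list).
for (Iterator<String> it = blacklistAdditions.iterator(); it.hasNext();) {
  if (clusterBlacklist.contains(it.next())) {
    it.remove();
  }
}

// Step 2: if the app un-blacklists a node that the cluster still blacklists,
// ignore that removal when computing headroom, since clusterResource does
// not include that node's resources anyway.
Set<String> effectiveRemovals = new HashSet<>(blacklistRemovals);
effectiveRemovals.removeAll(clusterBlacklist);
// ... use effectiveRemovals (not blacklistRemovals) in the headroom calculation
{code}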

 availableResources sent to applicationMaster in heartbeat should exclude 
 blacklistedNodes free memory.
 --

 Key: YARN-1680
 URL: https://issues.apache.org/jira/browse/YARN-1680
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0, 2.3.0
 Environment: SuSE 11 SP2 + Hadoop-2.3 
Reporter: Rohith
Assignee: Chen He
 Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
 YARN-1680-v2.patch, YARN-1680.patch


 There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. 
 Cluster slow start is set to 1.
 A job is running whose reducer tasks occupy 29GB of the cluster. One 
 NodeManager (NM-4) becomes unstable (3 maps got killed), so MRAppMaster 
 blacklists the unstable NodeManager (NM-4). All reducer tasks are now running 
 in the cluster.
 MRAppMaster does not preempt the reducers because, for the reducer preemption 
 calculation, headRoom considers blacklisted nodes' memory. This makes jobs 
 hang forever (ResourceManager does not assign any new containers on 
 blacklisted nodes but returns an availableResource that considers the 
 cluster's free memory). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.

2015-01-13 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275700#comment-14275700
 ] 

Craig Welch commented on YARN-2637:
---

Regarding the findbugs report for LeafQueue.lastClusterResource - access to 
lastClusterResource appears to be synchronized everywhere except 
getAbsActualCapacity, which I don't actually see being used anywhere. I'm 
going to add a findbugs exception and a comment on the method so that if it is 
used in the future, synchronization can be addressed.

-re [~leftnoteasy] 's latest:

-re 1 - actually, user limits are based on absolute queue capacity rather than 
max capacity - this is apparently intentional because, although a queue can 
exceed its absolute capacity, an individual user is not supposed to, hence my 
basing the user amlimit on the absolute capacity. The approach I use fits with 
the original logic in CSQueueUtils, which allows a user the greater of the 
userlimit share of the absolute capacity or 1/# active users (so if there are 
fewer users active than would reach the userlimit, they can use the full queue 
absolute capacity), the only correction being that we are using the actual 
value of resources used by application masters instead of one based on minalloc.

-re 2 - Actually, the snippet provided is not quite correct; some schedulers 
provide a cpu value as well. In any case, for encapsulation reasons it's 
better to use the scheduler's value in case its means of determining this 
changes in the future. 

-re 3 - I can't see this making the slightest difference in understandability - 
since these tests' paths don't populate the rmapps, I would simply be 
individually putting mocked ones into the map instead of the single mock + 
matcher for all the apps. The way it is seems clearer to me, as all of the 
mocking is together instead of distributing the (mock activity, if not mock 
framework...) process of putting mock rmapps into the collection throughout 
the test.

-re 4 - interesting, those were already there, but I also couldn't see why. 
The test passes fine without them, so I removed them.

-re 5 - removed

uploading updated patch in a few

 maximum-am-resource-percent could be respected for both LeafQueue/User when 
 trying to activate applications.
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
 YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, 
 YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, 
 YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, 
 YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, 
 YARN-2637.31.patch, YARN-2637.32.patch, YARN-2637.36.patch, 
 YARN-2637.38.patch, YARN-2637.39.patch, YARN-2637.6.patch, YARN-2637.7.patch, 
 YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following 
 way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it checks whether the app 
 can be activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator();
     i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, 
 the maximum resource that AMs can use is 200M. Assuming 
 minimum_allocation=1M, the number of AMs that can be launched is 200, and if 
 a user uses 5M for each AM (> minimum_allocation), all apps can still be 
 activated, and they will occupy all the resources of the queue instead of 
 only a max_am_resource_percent share of the queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.

2015-01-13 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2637:
--
Attachment: YARN-2637.40.patch

 maximum-am-resource-percent could be respected for both LeafQueue/User when 
 trying to activate applications.
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
 YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, 
 YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, 
 YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, 
 YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, 
 YARN-2637.31.patch, YARN-2637.32.patch, YARN-2637.36.patch, 
 YARN-2637.38.patch, YARN-2637.39.patch, YARN-2637.40.patch, 
 YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following 
 way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it checks whether the app 
 can be activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator();
     i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, 
 the maximum resource that AMs can use is 200M. Assuming 
 minimum_allocation=1M, the number of AMs that can be launched is 200, and if 
 a user uses 5M for each AM (> minimum_allocation), all apps can still be 
 activated, and they will occupy all the resources of the queue instead of 
 only a max_am_resource_percent share of the queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2015-01-13 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-2933:

Attachment: YARN-2933-7.patch

Thanks [~wangda], [~jianhe] and [~sunilg] for the reviews

Updated the patch

Thanks,
Mayank


 Capacity Scheduler preemption policy should only consider capacity without 
 labels temporarily
 -

 Key: YARN-2933
 URL: https://issues.apache.org/jira/browse/YARN-2933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
 Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, 
 YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch


 Currently, we have capacity enforcement on each queue for each label in 
 CapacityScheduler, but we don't have a preemption policy to support that. 
 YARN-2498 is targeting preemption that respects node labels, but we have 
 some gaps in the code base, like queues/FiCaScheduler should be able to get 
 usedResource/pendingResource, etc. by label. These items potentially require 
 refactoring CS, which we need to spend some time thinking about carefully.
 For now, what we can do immediately is calculate ideal_allocation and 
 preempt containers only for resources on nodes without labels, to avoid 
 regressions like: a cluster has some nodes with labels and some without; 
 assume queueA isn't satisfied for resources without labels, but for now the 
 preemption policy may preempt resources from nodes with labels for queueA, 
 which is not correct.
 Again, it is just a short-term enhancement; YARN-2498 will consider 
 preemption respecting node labels for the Capacity Scheduler, which is our 
 final target. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3020) n similar addContainerRequest()s produce n*(n+1)/2 containers

2015-01-13 Thread Peter D Kirchner (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275825#comment-14275825
 ] 

Peter D Kirchner commented on YARN-3020:


I investigated the rates in the third paragraph of my comment immediately 
above, and found that an application is able to make addContainerRequest()s 
much faster than this.  Bear in mind that the elapsed time for making the 
client-api call to addContainerRequest() is not a measurement of the 
performance impact of the reported over-requests sent to the server and the 
resulting over-allocation of containers. It turns out my application has some 
extrinsic delay in issuing addContainerRequests which predominated in limiting 
the rate I measured and reported in the third paragraph of the comment 
immediately above.

To follow up, I measured addContainerRequest() timing with System.nanoTime().  
The first call to addContainerRequest() takes around 5 milliseconds.  The rest 
take around half a millisecond on average.  Here are some statistics for 
calling addContainerRequest():  microseconds average=433 count=914 max=11202 
min=223 .  I measure similar times for consecutive calls (without additional 
application delays in between addContainerRequest()s).

When the over-request bug is fixed, I will still think it tedious to call 1000x 
for 1000 identical containers, but many applications can probably afford the 
half second to do so. Arguably, the bug exists in part because of the 
tediousness of the bookkeeping on the yarn-client-api side for these requests. 
If, in the process of bug-fixing or cleanup, a change re-introduces an integer 
quantity with the request, that would be welcome.
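
A small illustration of the worst case reported in this issue: if each of the 
n addContainerRequest() calls lands in its own heartbeat, the ask sent to the 
RM grows 1, 2, 3, ..., n, and the total number of containers allocated can 
reach n*(n+1)/2. This is purely illustrative arithmetic, not AMRMClient code.
{code}
int n = 1000;               // identical requests at the same priority
long total = 0;
for (int ask = 1; ask <= n; ask++) {
  total += ask;             // each heartbeat carries the cumulative count
}
System.out.println(total);  // 500500 == n * (n + 1) / 2
{code}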

 n similar addContainerRequest()s produce n*(n+1)/2 containers
 -

 Key: YARN-3020
 URL: https://issues.apache.org/jira/browse/YARN-3020
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2
Reporter: Peter D Kirchner
   Original Estimate: 24h
  Remaining Estimate: 24h

 BUG: If the application master calls addContainerRequest() n times, but with 
 the same priority, I get up to 1+2+3+...+n containers = n*(n+1)/2 .  The most 
 containers are requested when the interval between calls to 
 addContainerRequest() exceeds the heartbeat interval of calls to allocate() 
 (in AMRMClientImpl's run() method).
 If the application master calls addContainerRequest() n times, but with a 
 unique priority each time, I get n containers (as I intended).
 Analysis:
 There is a logic problem in AMRMClientImpl.java.
 Although AMRMClientImpl.java, allocate() does an ask.clear() , on subsequent 
 calls to addContainerRequest(), addResourceRequest() finds the previous 
 matching remoteRequest and increments the container count rather than 
 starting anew, and does an addResourceRequestToAsk() which defeats the 
 ask.clear().
 From documentation and code comments, it was hard for me to discern the 
 intended behavior of the API, but the inconsistency reported in this issue 
 suggests one case or the other is implemented incorrectly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2932) Add entry for preemption setting to queue status screen and startup/refresh logging

2015-01-13 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275686#comment-14275686
 ] 

Eric Payne commented on YARN-2932:
--

[~leftnoteasy], thanks very much for your review and comments:

bq. 1. Rename {{isQueuePreemptable}} to {{getQueuePreemptable}} for 
getter/setter consistency in {{CapacitySchedulerConfiguration}}
Renamed.

bq. 2. Should consider queue reinitialization when the queue's preemptable 
setting in the configuration changes (see {{TestQueueParsing}}). And it's best 
to add a test to verify that.
I'm sorry, I don't understand what you mean by the use of the word 'consider'. 
Calling {{CapacityScheduler.reinitialize}} will follow the queue hierarchy down 
and eventually call {{AbstractCSQueue#setupQueueConfigs}} for every queue, so I 
don't think there is any additional code needed, unless I'm missing something. 
Were you just saying that I need to add a test case for that?

{quote}
3. It's better to remove the {{defaultVal}} parameter in 
{{CapacitySchedulerConfiguration.isPreemptable}}:
{code}
public boolean isQueuePreemptable(String queue, boolean defaultVal) 
{code}
And the default_value should be placed in {{CapacitySchedulerConfiguration}}, 
like other queue configuration options.
I understand that what you are trying to do is move some logic from the queue 
to {{CapacitySchedulerConfiguration}}, but I still think it's better to have 
{{CapacitySchedulerConfiguration}} simply get values from the configuration 
file.
{quote}
The problem is that without the {{defaultVal}} parameter, 
{{AbstractCSQueue#isQueuePathHierarchyPreemptable}} can't tell whether the 
queue has explicitly set its preemptability or is just returning the default. 
For example:
{code}
root: disable_preemption = true
root.A: disable_preemption (the property is not set)
root.B: disable_preemption = false (the property is explicitly set to false)
{code}
Let's say the {{getQueuePreemptable}} interface is changed to remove the 
{{defaultVal}} parameter, and that when {{getQueuePreemptable}} calls 
{{getBoolean}}, it uses {{false}} as the default.

# {{getQueuePreemptable}} calls {{getBoolean}} on {{root}}
## {{getBoolean}} returns {{true}} because the {{disable_preemption}} property 
is set to {{true}}
## {{getQueuePreemptable}} inverts {{true}} and returns {{false}} (That is, 
{{root}} has preemption disabled, so it is not preemptable).
# {{getQueuePreemptable}} calls {{getBoolean}} on {{root.A}}
## {{getBoolean}} returns {{false}} because there is no {{disable_preemption}} 
property set for this queue, so {{getBoolean}} returns the default.
## {{getQueuePreemptable}} inverts {{false}} and returns {{true}}
# {{getQueuePreemptable}} calls {{getBoolean}} on {{root.B}}
## {{getBoolean}} returns {{false}} because {{disable_preemption}} property is 
set to {{false}} for this queue
## {{getQueuePreemptable}} inverts {{false}} and returns {{true}}

At this point, {{isQueuePathHierarchyPreemptable}} needs to know if it should 
use the default preemption from {{root}} or if it should use the value from 
each child queue. In the case of {{root.A}}, the value from {{root}} 
({{false}}) should be used because {{root.A}} does not have the property set. 
In the case of {{root.B}}, the value should be the one returned for {{root.B}} 
({{true}}) because it is explicitly set. But since {{root.A}} and {{root.B}} 
both returned {{true}}, {{isQueuePathHierarchyPreemptable}} can't tell the 
difference. Does that make sense?
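
A minimal sketch of why the {{defaultVal}} parameter matters in the 
walkthrough above: passing the parent's effective value as the default lets 
the hierarchy check distinguish "not set" from "explicitly set". The names 
follow the discussion but are illustrative, not the actual patch.
{code}
boolean isQueuePathHierarchyPreemptable(String queuePath,
    CapacitySchedulerConfiguration csConf, boolean parentPreemptable) {
  // Ask for this queue's setting, falling back to the parent's effective
  // value when the property is absent. With a hard-coded false default,
  // root.A (unset, should inherit from root) and root.B (explicitly
  // enabled for preemption) would be indistinguishable here.
  return csConf.getQueuePreemptable(queuePath, parentPreemptable);
}
{code}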

 Add entry for preemption setting to queue status screen and startup/refresh 
 logging
 ---

 Key: YARN-2932
 URL: https://issues.apache.org/jira/browse/YARN-2932
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.7.0
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: YARN-2932.v1.txt, YARN-2932.v2.txt, YARN-2932.v3.txt


 YARN-2056 enables the ability to turn preemption on or off on a per-queue 
 level. This JIRA will provide the preemption status for each queue in the 
 {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue 
 refresh.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3035) create a test-only backing storage implementation for ATS writes

2015-01-13 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee reassigned YARN-3035:
-

Assignee: Sangjin Lee

 create a test-only backing storage implementation for ATS writes
 

 Key: YARN-3035
 URL: https://issues.apache.org/jira/browse/YARN-3035
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee

 Per design in YARN-2928, create a test-only bare bone backing storage 
 implementation for ATS writes.
 We could consider something like a no-op or in-memory storage strictly for 
 development and testing purposes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3052) provide a very simple POC html ATS UI

2015-01-13 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee reassigned YARN-3052:
-

Assignee: Sangjin Lee

 provide a very simple POC html ATS UI
 -

 Key: YARN-3052
 URL: https://issues.apache.org/jira/browse/YARN-3052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee

 As part of accomplishing a minimum viable product, we want to be able to show 
 some UI in html (however crude it is). This subtask calls for creating a 
 barebones UI to do that.
 This should be replaced later with a better-designed and implemented proper 
 UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3030) set up ATS writer with basic request serving structure and lifecycle

2015-01-13 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee reassigned YARN-3030:
-

Assignee: Sangjin Lee

 set up ATS writer with basic request serving structure and lifecycle
 

 Key: YARN-3030
 URL: https://issues.apache.org/jira/browse/YARN-3030
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee

 Per design in YARN-2928, create an ATS writer as a service, and implement the 
 basic service structure including the lifecycle management.
 Also, as part of this JIRA, we should come up with the ATS client API for 
 sending requests to this ATS writer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-13 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275797#comment-14275797
 ] 

Anubhav Dhoot commented on YARN-3021:
-

Looking at the patch itself, we seem to suppress an error that would earlier be 
visible to the user. That's going to make it harder to detect genuine failures. 
MR1 seems to be worse than YARN in this respect, and we don't need to make YARN 
match that behavior. If we really need to skip validation, as you said, adding 
a feature into YARN where the application could opt in would be better. 

Having said that, if the RM cannot validate the token as valid, why would the 
job itself work? Would not the containers themselves face the same issue using 
the tokens? 

 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails because the B realm will not 
 trust A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously, 
 and once the renewal attempt failed we simply ceased to schedule any further 
 renewal attempts, rather than failing the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.
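A hedged sketch of the proposed behaviour (not the attached patch; the helper 
names renewToken and setTimerForTokenRenewal are assumptions about 
DelegationTokenRenewer's internals):

{code}
try {
  renewToken(dttr);
  setTimerForTokenRenewal(dttr);
} catch (IOException e) {
  // Renewal is impossible, e.g. the remote realm does not trust the RM's
  // principal. Log and skip scheduling further renewals for this token
  // instead of bubbling the error back and failing the app submission.
  LOG.warn("Unable to renew token " + dttr.token
      + "; skipping automatic renewal", e);
}
{code}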



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276352#comment-14276352
 ] 

Hudson commented on YARN-2637:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6856 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6856/])
YARN-2637. Fixed max-am-resource-percent calculation in CapacityScheduler when 
activating applications. Contributed by Craig Welch (jianhe: rev 
c53420f58364b11fbda1dace7679d45534533382)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java


 maximum-am-resource-percent could be respected for both LeafQueue/User when 
 trying to activate applications.
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Fix For: 2.7.0

 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
 YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, 
 YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, 
 YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, 
 YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, 
 YARN-2637.31.patch, 

[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-01-13 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276358#comment-14276358
 ] 

Sangjin Lee commented on YARN-2928:
---

Regarding the per-node approach, I do have some questions (and observations) on 
the approach in addition to the aspect of losing the isolation/attribution as 
already discussed.

(1)
While it may be faster to allocate with the per-node companions, capacity-wise 
you would end up spending more with the per-node approach, since these per-node 
companions are always up even though they may be idle for large amounts of 
time. So if capacity is a concern you may lose out. Under what circumstances 
would per-node companions be more advantageous in terms of capacity?

(2)
I do have a question about the work-preserving aspect of the per-node ATS 
companion. One implication of making this a per-node thing (i.e. long-running) 
is that we need to handle the work-preserving restart. What if we need to 
restart the ATS companion? Since other YARN daemons (RM and NM) allow for 
work-preserving restarts, we cannot have the ATS companion break that. So that 
seems to be a requirement?

(3)
We still need to handle the lifecycle management aspects of it. Previously we 
said that when RM allocates an AM it would tell the NM so the NM could spawn 
the special container. With the per-node approach, the RM would *still* need to 
tell the NM so that the NM can talk to the per-node ATS companion to initialize 
the data structure for the given app.

These are quick observations. While I do see value in the per-node approach, 
it's not totally clear how much work it would save over the per-app approach 
given these observations. What do you think?


 Application Timeline Server (ATS) next gen: phase 1
 ---

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3058) Fix error msg of tokens activation delay configuration

2015-01-13 Thread Yi Liu (JIRA)
Yi Liu created YARN-3058:


 Summary: Fix error msg of tokens activation delay configuration
 Key: YARN-3058
 URL: https://issues.apache.org/jira/browse/YARN-3058
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor


{code}
this.rollingInterval = conf.getLong(
    YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS,
    YarnConfiguration.DEFAULT_RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS) * 1000;
...
this.activationDelay =
    (long) (conf.getLong(YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS,
        YarnConfiguration.DEFAULT_RM_NM_EXPIRY_INTERVAL_MS) * 1.5);
...
if (rollingInterval <= activationDelay * 2) {
  throw new IllegalArgumentException(
      YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS
          + " should be more than 2 X "
          + YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS);
}
{code}

The error msg should be 
{code}
YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS
    + " should be more than 3 X "
    + YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS);
{code}
Also, it should be {{3 X}} instead of {{2 X}}, since the expiry interval is 
multiplied by *1.5* (the rolling interval must exceed 2 x 1.5 = 3 times the 
expiry interval).
There are a few other places with the same issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2217) Shared cache client side changes

2015-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276364#comment-14276364
 ] 

Hadoop QA commented on YARN-2217:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12692107/YARN-2217-trunk-v7.patch
  against trunk revision 85aec75.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

  org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
  
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
  
org.apache.hadoop.yarn.client.api.impl.TestSharedCacheClientImpl

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6325//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6325//console

This message is automatically generated.

 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, 
 YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch


 Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3058) Fix error msg of tokens activation delay configuration

2015-01-13 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated YARN-3058:
-
Attachment: YARN-3058.001.patch

 Fix error msg of tokens activation delay configuration
 --

 Key: YARN-3058
 URL: https://issues.apache.org/jira/browse/YARN-3058
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor
 Attachments: YARN-3058.001.patch


 {code}
 this.rollingInterval = conf.getLong(
     YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS,
     YarnConfiguration.DEFAULT_RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS) * 1000;
 ...
 this.activationDelay =
     (long) (conf.getLong(YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS,
         YarnConfiguration.DEFAULT_RM_NM_EXPIRY_INTERVAL_MS) * 1.5);
 ...
 if (rollingInterval <= activationDelay * 2) {
   throw new IllegalArgumentException(
       YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS
           + " should be more than 2 X "
           + YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS);
 }
 {code}
 The error msg should be 
 {code}
 YarnConfiguration.RM_CONTAINER_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS
     + " should be more than 3 X "
     + YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS);
 {code}
 Also, it should be {{3 X}} instead of {{2 X}}, since the expiry interval is 
 multiplied by *1.5* (the rolling interval must exceed 2 x 1.5 = 3 times the 
 expiry interval).
 There are a few other places with the same issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-01-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276367#comment-14276367
 ] 

Zhijie Shen commented on YARN-2928:
---

Thanks for creating the branch, Vinod! One additional issue for developing the 
new feature: we may either create a new sub-module or reuse the current one, 
applicationhistoryservice, but put it into a blah.blah.v2 package. The latter 
way might make project organization a bit easier given we reuse the existing TS 
code. But in that case, one step back, we need to correct the sub-module and 
package naming first, to prevent further propagating the confusing terminology. 
Thoughts?

 Application Timeline Server (ATS) next gen: phase 1
 ---

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3059) RM web page can not display NM's health report which is healthy

2015-01-13 Thread Wang Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Hao updated YARN-3059:
---
Description: 
If the NM is healthy, it's health report can not display in RM web page.
In function reportHealthStatus of NodeHealthMonitorExecutor, I found that if 
the HealthCheckerExitStatus is successful, the output was set to a empty string 
in function setHealthStatus.
so,I change the code “setHealthStatus(true, , now)” to 
“setHealthStatus(true,shexec.getOutput(), now)” . Then, the RM web page can 
display the NM's health report.
Maybe set the output to a empty string can decrease the data that transfered 
between RM and NM. But I think we want to see the health report of NM in some 
cases.

  was:
If the NM is healthy, it's health report can not display in RM web page.
In function reportHealthStatus of NodeHealthMonitorExecutor, I found that if 
the HealthCheckerExitStatus is successful, the output was set to a empty string 
in function setHealthStatus.
so,I change the code “setHealthStatus(true, "", now)” to 
“setHealthStatus(true,shexec.getOutput(), now)” . Then, the RM web page can 
display the NM's health report.
Maybe set the output to a empty string can decrease the data that transfer
ed between RM and NM. But I think we want to see the health report of NM in 
some cases.


 RM web page can not display NM's health report which is healthy
 ---

 Key: YARN-3059
 URL: https://issues.apache.org/jira/browse/YARN-3059
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Wang Hao

 If the NM is healthy, its health report cannot be displayed on the RM web page.
 In function reportHealthStatus of NodeHealthMonitorExecutor, I found that if 
 the HealthCheckerExitStatus is successful, the output is set to an empty 
 string in function setHealthStatus.
 So I changed the code “setHealthStatus(true, "", now)” to 
 “setHealthStatus(true, shexec.getOutput(), now)”. Then the RM web page can 
 display the NM's health report.
 Maybe setting the output to an empty string can decrease the data transferred 
 between RM and NM, but I think we want to see the health report of the NM in 
 some cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3059) RM web page can not display NM's health report which is healthy

2015-01-13 Thread Wang Hao (JIRA)
Wang Hao created YARN-3059:
--

 Summary: RM web page can not display NM's health report which is 
healthy
 Key: YARN-3059
 URL: https://issues.apache.org/jira/browse/YARN-3059
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Wang Hao


If the NM is healthy, its health report cannot be displayed on the RM web page.
In function reportHealthStatus of NodeHealthMonitorExecutor, I found that if 
the HealthCheckerExitStatus is successful, the output is set to an empty string 
in function setHealthStatus.
So I changed the code “setHealthStatus(true, "", now)” to 
“setHealthStatus(true, shexec.getOutput(), now)”. Then the RM web page can 
display the NM's health report.
Maybe setting the output to an empty string can decrease the data transferred 
between RM and NM, but I think we want to see the health report of the NM in 
some cases.
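For reference, a minimal sketch of the proposed one-line change; the 
surrounding switch is reconstructed from memory of NodeHealthScriptRunner, so 
treat the structure as an assumption (only the setHealthStatus line comes from 
this report):

{code}
// Inside NodeHealthMonitorExecutor.reportHealthStatus(...) -- illustrative only
switch (status) {
  case SUCCESS:
    // was: setHealthStatus(true, "", now);  -- an empty report reaches the RM UI
    setHealthStatus(true, shexec.getOutput(), now);
    break;
  // other cases (TIMED_OUT, FAILED_WITH_EXIT_CODE, ...) unchanged
}
{code}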



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer

2015-01-13 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276280#comment-14276280
 ] 

Jian He commented on YARN-3055:
---

bq. Is it possible the launcher job finishes firstly, but sub-jobs are still 
running? 
This is an existing issue as discussed in 
https://issues.apache.org/jira/browse/YARN-2964?focusedCommentId=14252218page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14252218.
  And a long-term solution is to have a group Id for a group of applications so 
that the token lifetime is tied to a group of applications instead of a single 
application.

 The token is not renewed properly if it's shared by jobs (oozie) in 
 DelegationTokenRenewer
 --

 Key: YARN-3055
 URL: https://issues.apache.org/jira/browse/YARN-3055
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-3055.001.patch, YARN-3055.002.patch


 After YARN-2964, there is only one timer to renew the token if it's shared by 
 jobs. 
 In {{removeApplicationFromRenewal}}, when going to remove a token, and the 
 token is shared by other jobs, we will not cancel the token. 
 Meanwhile, we should not cancel the _timerTask_, also we should not remove it 
 from {{allTokens}}. Otherwise for the existing submitted applications which 
 share this token will not get renew any more, and for new submitted 
 applications which share this token, the token will be renew immediately.
 For example, we have 3 applications: app1, app2, app3. And they share the 
 token1. See following scenario:
 *1).* app1 is submitted firstly, then app2, and then app3. In this case, 
 there is only one token renewal timer for token1, and is scheduled when app1 
 is submitted
 *2).* app1 is finished, then the renewal timer is cancelled. token1 will not 
 be renewed any more, but app2 and app3 still use it, so there is problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-01-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276289#comment-14276289
 ] 

Vinod Kumar Vavilapalli commented on YARN-2928:
---

bq. On the process side, I propose we do work on a branch with a goal to borrow 
whatever code is possible to from current Timeline service.
Don't see any concerns on this. Creating a branch now and will get people 
participating in this branch to be branch committers if they aren't already 
committers. Irrespective of that, I think we should simply see it as RTC on 
branch - a JIRA for every task, patches uploaded to JIRA and reviewed/committed 
by someone else etc.

 Application Timeline Server (ATS) next gen: phase 1
 ---

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on machines

2015-01-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276324#comment-14276324
 ] 

Karthik Kambatla commented on YARN-2965:


Look forward to the diagram. I have been thinking about it, but don't have 
anything concrete in my mind yet. :)

 Enhance Node Managers to monitor and report the resource usage on machines
 --

 Key: YARN-2965
 URL: https://issues.apache.org/jira/browse/YARN-2965
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Robert Grandl
Assignee: Robert Grandl
 Attachments: ddoc_RT.docx


 This JIRA is about augmenting Node Managers to monitor the resource usage on 
 the machine, aggregate these reports, and expose them to the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2984) Metrics for container's actual memory usage

2015-01-13 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2984:
---
Attachment: yarn-2984-2.patch

Updated patch marks the configs Private, improves the test a tad bit and cleans 
up ContainersMonitorImpl a little more. 

 Metrics for container's actual memory usage
 ---

 Key: YARN-2984
 URL: https://issues.apache.org/jira/browse/YARN-2984
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2984-1.patch, yarn-2984-2.patch, 
 yarn-2984-prelim.patch


 It would be nice to capture resource usage per container, for a variety of 
 reasons. This JIRA is to track memory usage. 
 YARN-2965 tracks the resource usage on the node, and the two implementations 
 should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-01-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276296#comment-14276296
 ] 

Karthik Kambatla commented on YARN-2928:


+1 to work on a branch. Developing features on branches seems to be working 
very well for HDFS folks. I would like for us to adopt the same model; that 
becomes easier if *all* features are developed on a branch. 

 Application Timeline Server (ATS) next gen: phase 1
 ---

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on machines

2015-01-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276313#comment-14276313
 ] 

Vinod Kumar Vavilapalli commented on YARN-2965:
---

The actual ticket that needs some collaboration is YARN-3033.

Agreed, this feature work doesn't need everything that YARN-2928 needs, but 
they are all similar to me - responsibility of obtaining stats at that node 
level. After they are collected on a single node, the stats information gets 
forwarded to RM, per-app agent etc. I'll make a short diagram to illustrate how 
all of this can be unified.

 Enhance Node Managers to monitor and report the resource usage on machines
 --

 Key: YARN-2965
 URL: https://issues.apache.org/jira/browse/YARN-2965
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Robert Grandl
Assignee: Robert Grandl
 Attachments: ddoc_RT.docx


 This JIRA is about augmenting Node Managers to monitor the resource usage on 
 the machine, aggregate these reports, and expose them to the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-01-13 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276318#comment-14276318
 ] 

Sangjin Lee commented on YARN-2928:
---

We have an unofficial IRC chatroom open for quick dev discussions on this. It's 
##hadoop-ats (note 2 #'s) on irc.freenode.net.

 Application Timeline Server (ATS) next gen: phase 1
 ---

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-01-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276340#comment-14276340
 ] 

Karthik Kambatla commented on YARN-2928:


It would be nice to create a branch based on trunk instead of branch-2, so we 
can merge into trunk before branch-2. 

 Application Timeline Server (ATS) next gen: phase 1
 ---

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-01-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276344#comment-14276344
 ] 

Vinod Kumar Vavilapalli commented on YARN-2928:
---

Makes sense, recreated the branch off trunk..

 Application Timeline Server (ATS) next gen: phase 1
 ---

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2217) Shared cache client side changes

2015-01-13 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276389#comment-14276389
 ] 

Chris Trezzo commented on YARN-2217:


Re-kicking QA build to confirm the test failures in 
org.apache.hadoop.yarn.client.api.impl.TestSharedCacheClientImpl (they passed 
on my local machine).

 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, 
 YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch


 Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized

2015-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14274915#comment-14274915
 ] 

Hadoop QA commented on YARN-3024:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12691878/YARN-3024.03.patch
  against trunk revision c4cba61.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6318//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6318//console

This message is automatically generated.

 LocalizerRunner should give DIE action when all resources are localized
 ---

 Key: YARN-3024
 URL: https://issues.apache.org/jira/browse/YARN-3024
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-3024.01.patch, YARN-3024.02.patch, 
 YARN-3024.03.patch


 We have observed that {{LocalizerRunner}} always gives a LIVE action at the 
 end of the localization process.
 The problem is that {{findNextResource()}} can return null even when 
 {{pending}} was not empty prior to the call. This method removes localized 
 resources from {{pending}}, therefore we should check the return value and 
 give a DIE action when it returns null.
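A hedged sketch of the described check (the names pending, findNextResource(), 
LocalizerAction and setLocalizerAction follow the description and memory of 
LocalizerRunner; treat it as illustrative, not the attached patch):

{code}
LocalizerResourceRequestEvent next = findNextResource();
if (next == null) {
  // Nothing left to localize for this localizer: tell it to exit rather than
  // keeping it alive with a LIVE action.
  response.setLocalizerAction(LocalizerAction.DIE);
} else {
  response.setLocalizerAction(LocalizerAction.LIVE);
  // ... hand the next resource to the localizer ...
}
{code}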



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3055) Fix allTokens issue in DelegationTokenRenewer

2015-01-13 Thread Yi Liu (JIRA)
Yi Liu created YARN-3055:


 Summary: Fix allTokens issue in DelegationTokenRenewer
 Key: YARN-3055
 URL: https://issues.apache.org/jira/browse/YARN-3055
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Reporter: Yi Liu
Assignee: Yi Liu


In {{removeApplicationFromRenewal}}, when going to remove a token, and the 
token is shared by other jobs, we will not cancel the token. 
Meanwhile, we should not cancel the _timerTask_, also we should not remove it 
from {{allTokens}}. Otherwise for the existing submitted applications which 
share this token will not get renew any more, and for new submitted 
applications which share this token, the token will be renew immediately.

For example, we have 3 applications: app1, app2, app3. And they share the 
token1. See following scenario:
*1).* app1 is submitted firstly, then app2, and then app3. In this case, there 
is only one token renewal timer for token1, and is scheduled when app1 is 
submitted
*2).* app1 is finished, then the renewal timer is cancelled. token1 will not be 
renewed any more, but app2 and app3 still use it, so there is problem.
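A hedged sketch of that idea (field and method names such as referringAppIds, 
cancelTimer and allTokens are assumptions based on this description, not 
necessarily the attached patch):

{code}
// In removeApplicationFromRenewal(...): only tear down renewal state when no
// other running application still shares the token.
dttr.referringAppIds.remove(applicationId);
if (dttr.referringAppIds.isEmpty()) {
  cancelToken(dttr);            // safe: nobody else shares it
  dttr.cancelTimer();           // stop the renewal timer
  allTokens.remove(dttr.token); // forget the token
}
// Otherwise keep the timer and the allTokens entry so renewal continues for
// the applications that still share this token.
{code}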





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3054) Preempt policy in FairScheduler may cause mapreduce job never finish

2015-01-13 Thread Peng Zhang (JIRA)
Peng Zhang created YARN-3054:


 Summary: Preempt policy in FairScheduler may cause mapreduce job 
never finish
 Key: YARN-3054
 URL: https://issues.apache.org/jira/browse/YARN-3054
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Peng Zhang


Preemption policy is tied to the schedule policy now. Using the schedule 
policy's comparator to find preemption candidates cannot guarantee that a 
subset of containers is never preempted, and this may cause tasks to be 
preempted repeatedly before they finish, so the job cannot make any progress. 

I think preemption in YARN should provide the assurances below:
1. Mapreduce jobs can get additional resources when others are idle;
2. Mapreduce jobs for one user in one queue can still progress with their min 
share when others preempt resources back.

Maybe always preempting the latest app and container can achieve this? 
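A minimal sketch of the "preempt the most recently launched first" idea, using 
hypothetical types rather than the real FairScheduler classes:

{code}
import java.util.Comparator;
import java.util.List;

// Hypothetical candidate record; not a YARN class.
class CandidateContainer {
  final String id;
  final long launchTimeMillis;
  CandidateContainer(String id, long launchTimeMillis) {
    this.id = id;
    this.launchTimeMillis = launchTimeMillis;
  }
}

class NewestFirstPreemption {
  // Newest first: preempting these wastes the least completed work and gives
  // long-running tasks a chance to finish, avoiding the livelock described
  // above.
  static void sortForPreemption(List<CandidateContainer> candidates) {
    candidates.sort(Comparator
        .comparingLong((CandidateContainer c) -> c.launchTimeMillis)
        .reversed());
  }
}
{code}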



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3055) Fix allTokens issue in DelegationTokenRenewer

2015-01-13 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated YARN-3055:
-
Attachment: YARN-3055.001.patch

[~jianhe], [~kasha] and [~jlowe], can you help to take a look?

 Fix allTokens issue in DelegationTokenRenewer
 -

 Key: YARN-3055
 URL: https://issues.apache.org/jira/browse/YARN-3055
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-3055.001.patch


 In {{removeApplicationFromRenewal}}, when going to remove a token, and the 
 token is shared by other jobs, we will not cancel the token. 
 Meanwhile, we should not cancel the _timerTask_, also we should not remove it 
 from {{allTokens}}. Otherwise for the existing submitted applications which 
 share this token will not get renew any more, and for new submitted 
 applications which share this token, the token will be renew immediately.
 For example, we have 3 applications: app1, app2, app3. And they share the 
 token1. See following scenario:
 *1).* app1 is submitted firstly, then app2, and then app3. In this case, 
 there is only one token renewal timer for token1, and is scheduled when app1 
 is submitted
 *2).* app1 is finished, then the renewal timer is cancelled. token1 will not 
 be renewed any more, but app2 and app3 still use it, so there is problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3027) Scheduler should use totalAvailable resource from node instead of availableResource for maxAllocation

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275252#comment-14275252
 ] 

Hudson commented on YARN-3027:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #69 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/69/])
YARN-3027. Scheduler should use totalAvailable resource from node instead of 
availableResource for maxAllocation. (adhoot via rkanter) (rkanter: rev 
ae7bf31fe1c63f323ba5271e50fd0e4425a7510f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java


 Scheduler should use totalAvailable resource from node instead of 
 availableResource for maxAllocation
 -

 Key: YARN-3027
 URL: https://issues.apache.org/jira/browse/YARN-3027
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.7.0

 Attachments: YARN-3027.001.patch, YARN-3027.002.patch


 YARN-2604 added support for updating maxiumum allocation resource size based 
 on nodes. But it incorrectly uses available resource instead of maximum 
 resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2957) Create unit test to automatically compare YarnConfiguration and yarn-default.xml

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275251#comment-14275251
 ] 

Hudson commented on YARN-2957:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #69 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/69/])
YARN-2957. Create unit test to automatically compare YarnConfiguration and 
yarn-default.xml. (rchiang via rkanter) (rkanter: rev 
f45163191583eadcfbe0df233a3185fd1b2b78f3)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java


 Create unit test to automatically compare YarnConfiguration and 
 yarn-default.xml
 

 Key: YARN-2957
 URL: https://issues.apache.org/jira/browse/YARN-2957
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: supportability
 Fix For: 2.7.0

 Attachments: YARN-2957.001.patch


 Create a unit test that will automatically compare the fields in 
 YarnConfiguration and yarn-default.xml.  It should throw an error if a 
 property is missing in either the class or the file.
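A self-contained sketch of the comparison idea (this is not the committed 
TestYarnConfigurationFields; it simply reflects over the public String 
constants of YarnConfiguration and diffs them against the keys loaded from 
yarn-default.xml):

{code}
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnDefaultXmlCheck {
  public static void main(String[] args) throws Exception {
    // Load only yarn-default.xml, without the usual default resources.
    Configuration conf = new Configuration(false);
    conf.addResource("yarn-default.xml");
    Set<String> xmlKeys = new HashSet<String>();
    for (Map.Entry<String, String> e : conf) {
      xmlKeys.add(e.getKey());
    }

    // Collect public static String constants that look like property names.
    Set<String> classKeys = new HashSet<String>();
    for (Field f : YarnConfiguration.class.getFields()) {
      if (Modifier.isStatic(f.getModifiers()) && f.getType() == String.class) {
        String value = (String) f.get(null);
        if (value != null && value.startsWith("yarn.")) {
          classKeys.add(value);
        }
      }
    }

    for (String key : classKeys) {
      if (!xmlKeys.contains(key)) {
        System.out.println("Missing in yarn-default.xml: " + key);
      }
    }
    for (String key : xmlKeys) {
      if (!classKeys.contains(key)) {
        System.out.println("Missing in YarnConfiguration: " + key);
      }
    }
  }
}
{code}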



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3027) Scheduler should use totalAvailable resource from node instead of availableResource for maxAllocation

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275270#comment-14275270
 ] 

Hudson commented on YARN-3027:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2004 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2004/])
YARN-3027. Scheduler should use totalAvailable resource from node instead of 
availableResource for maxAllocation. (adhoot via rkanter) (rkanter: rev 
ae7bf31fe1c63f323ba5271e50fd0e4425a7510f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java
* hadoop-yarn-project/CHANGES.txt


 Scheduler should use totalAvailable resource from node instead of 
 availableResource for maxAllocation
 -

 Key: YARN-3027
 URL: https://issues.apache.org/jira/browse/YARN-3027
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.7.0

 Attachments: YARN-3027.001.patch, YARN-3027.002.patch


 YARN-2604 added support for updating maxiumum allocation resource size based 
 on nodes. But it incorrectly uses available resource instead of maximum 
 resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2643) Don't create a new DominantResourceCalculator on every FairScheduler.allocate call

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275265#comment-14275265
 ] 

Hudson commented on YARN-2643:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2004 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2004/])
YARN-2643. Don't create a new DominantResourceCalculator on every 
FairScheduler.allocate call. (kasha via rkanter) (rkanter: rev 
51881535e659940b1b332d0c5952ee1f9958cc7f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt


 Don't create a new DominantResourceCalculator on every FairScheduler.allocate 
 call
 --

 Key: YARN-2643
 URL: https://issues.apache.org/jira/browse/YARN-2643
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Sandy Ryza
Assignee: Karthik Kambatla
Priority: Trivial
 Fix For: 2.7.0

 Attachments: yarn-2643-1.patch, yarn-2643.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2957) Create unit test to automatically compare YarnConfiguration and yarn-default.xml

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275269#comment-14275269
 ] 

Hudson commented on YARN-2957:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2004 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2004/])
YARN-2957. Create unit test to automatically compare YarnConfiguration and 
yarn-default.xml. (rchiang via rkanter) (rkanter: rev 
f45163191583eadcfbe0df233a3185fd1b2b78f3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/CHANGES.txt


 Create unit test to automatically compare YarnConfiguration and 
 yarn-default.xml
 

 Key: YARN-2957
 URL: https://issues.apache.org/jira/browse/YARN-2957
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: supportability
 Fix For: 2.7.0

 Attachments: YARN-2957.001.patch


 Create a unit test that will automatically compare the fields in 
 YarnConfiguration and yarn-default.xml.  It should throw an error if a 
 property is missing in either the class or the file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3019) Enable RM work-preserving restart by default

2015-01-13 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275278#comment-14275278
 ] 

Junping Du commented on YARN-3019:
--

bq. The final goal is to support work-preserving recovery only. So the config 
yarn.resourcemanager.work-preserving-recovery.enabled is not needed any more.
Sounds good. Thanks [~jianhe] for the explanation. We can mark the unnecessary 
configuration as deprecated later. [~aw], if you don't have further comments, I 
will commit this simple patch soon.

 Enable RM work-preserving restart by default 
 -

 Key: YARN-3019
 URL: https://issues.apache.org/jira/browse/YARN-3019
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-3019.1.patch


 The proposal is to set 
 yarn.resourcemanager.work-preserving-recovery.enabled to true by default   
 to flip recovery mode to work-preserving recovery from non-work-preserving 
 recovery. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer

2015-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275291#comment-14275291
 ] 

Hadoop QA commented on YARN-3055:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12691941/YARN-3055.002.patch
  against trunk revision 08ac062.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6322//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6322//console

This message is automatically generated.

 The token is not renewed properly if it's shared by jobs (oozie) in 
 DelegationTokenRenewer
 --

 Key: YARN-3055
 URL: https://issues.apache.org/jira/browse/YARN-3055
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-3055.001.patch, YARN-3055.002.patch


 After YARN-2964, there is only one timer to renew the token if it's shared by 
 jobs. 
 In {{removeApplicationFromRenewal}}, when going to remove a token, and the 
 token is shared by other jobs, we will not cancel the token. 
 Meanwhile, we should not cancel the _timerTask_, also we should not remove it 
 from {{allTokens}}. Otherwise for the existing submitted applications which 
 share this token will not get renew any more, and for new submitted 
 applications which share this token, the token will be renew immediately.
 For example, we have 3 applications: app1, app2, app3. And they share the 
 token1. See following scenario:
 *1).* app1 is submitted firstly, then app2, and then app3. In this case, 
 there is only one token renewal timer for token1, and is scheduled when app1 
 is submitted
 *2).* app1 is finished, then the renewal timer is cancelled. token1 will not 
 be renewed any more, but app2 and app3 still use it, so there is problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2643) Don't create a new DominantResourceCalculator on every FairScheduler.allocate call

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275247#comment-14275247
 ] 

Hudson commented on YARN-2643:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #69 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/69/])
YARN-2643. Don't create a new DominantResourceCalculator on every 
FairScheduler.allocate call. (kasha via rkanter) (rkanter: rev 
51881535e659940b1b332d0c5952ee1f9958cc7f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


 Don't create a new DominantResourceCalculator on every FairScheduler.allocate 
 call
 --

 Key: YARN-2643
 URL: https://issues.apache.org/jira/browse/YARN-2643
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Sandy Ryza
Assignee: Karthik Kambatla
Priority: Trivial
 Fix For: 2.7.0

 Attachments: yarn-2643-1.patch, yarn-2643.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3056) add verification for containerLaunchDuration in TestNodeManagerMetrics.

2015-01-13 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu reassigned YARN-3056:
---

Assignee: zhihai xu

 add verification for containerLaunchDuration in TestNodeManagerMetrics.
 ---

 Key: YARN-3056
 URL: https://issues.apache.org/jira/browse/YARN-3056
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Affects Versions: 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial

 add verification for containerLaunchDuration in TestNodeManagerMetrics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2679) Add metric for container launch duration

2015-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275041#comment-14275041
 ] 

Hadoop QA commented on YARN-2679:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12691910/YARN-2679.addendum.1.patch
  against trunk revision 08ac062.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6320//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6320//console

This message is automatically generated.

 Add metric for container launch duration
 

 Key: YARN-2679
 URL: https://issues.apache.org/jira/browse/YARN-2679
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Fix For: 2.7.0

 Attachments: YARN-2679.000.patch, YARN-2679.001.patch, 
 YARN-2679.002.patch


 add metrics in NodeManagerMetrics to get prepare time to launch container.
 The prepare time is the duration between sending 
 ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving  
 ContainerEventType.CONTAINER_LAUNCHED event.
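A rough sketch of the measurement idea: the timestamp bookkeeping below is 
hypothetical, and the metric-recording call is named after the 
containerLaunchDuration metric referenced in YARN-3056, so treat both as 
assumptions rather than the committed code:

{code}
// When handling ContainersLauncherEventType.LAUNCH_CONTAINER:
long launchStartTime = System.currentTimeMillis();   // remember hand-off time

// ... when ContainerEventType.CONTAINER_LAUNCHED is later received:
long durationMs = System.currentTimeMillis() - launchStartTime;
metrics.addContainerLaunchDuration(durationMs);       // record in NodeManagerMetrics
{code}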



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2679) Add metric for container launch duration

2015-01-13 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2679:

Attachment: YARN-2679.addendum.1.patch

 Add metric for container launch duration
 

 Key: YARN-2679
 URL: https://issues.apache.org/jira/browse/YARN-2679
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Fix For: 2.7.0

 Attachments: YARN-2679.000.patch, YARN-2679.001.patch, 
 YARN-2679.002.patch, YARN-2679.addendum.1.patch


 add metrics in NodeManagerMetrics to get prepare time to launch container.
 The prepare time is the duration between sending 
 ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving  
 ContainerEventType.CONTAINER_LAUNCHED event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2679) Add metric for container launch duration

2015-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275033#comment-14275033
 ] 

Hadoop QA commented on YARN-2679:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12691909/YARN-2679.addendum.1.patch
  against trunk revision 08ac062.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6319//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6319//console

This message is automatically generated.

 Add metric for container launch duration
 

 Key: YARN-2679
 URL: https://issues.apache.org/jira/browse/YARN-2679
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Fix For: 2.7.0

 Attachments: YARN-2679.000.patch, YARN-2679.001.patch, 
 YARN-2679.002.patch


 add metrics in NodeManagerMetrics to get prepare time to launch container.
 The prepare time is the duration between sending 
 ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving  
 ContainerEventType.CONTAINER_LAUNCHED event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3056) add verification for containerLaunchDuration in TestNodeManagerMetrics.

2015-01-13 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3056:

Attachment: YARN-3056.000.patch

 add verification for containerLaunchDuration in TestNodeManagerMetrics.
 ---

 Key: YARN-3056
 URL: https://issues.apache.org/jira/browse/YARN-3056
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Affects Versions: 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial
 Attachments: YARN-3056.000.patch


 add verification for containerLaunchDuration in TestNodeManagerMetrics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2679) Add metric for container launch duration

2015-01-13 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2679:

Attachment: (was: YARN-2679.addendum.1.patch)

 Add metric for container launch duration
 

 Key: YARN-2679
 URL: https://issues.apache.org/jira/browse/YARN-2679
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Fix For: 2.7.0

 Attachments: YARN-2679.000.patch, YARN-2679.001.patch, 
 YARN-2679.002.patch


 add metrics in NodeManagerMetrics to get prepare time to launch container.
 The prepare time is the duration between sending 
 ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving  
 ContainerEventType.CONTAINER_LAUNCHED event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2679) Add metric for container launch duration

2015-01-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275001#comment-14275001
 ] 

zhihai xu commented on YARN-2679:
-

Sorry, I forgot to add verification in the test. I attached an addendum patch 
which adds verification in the test.

 Add metric for container launch duration
 

 Key: YARN-2679
 URL: https://issues.apache.org/jira/browse/YARN-2679
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Fix For: 2.7.0

 Attachments: YARN-2679.000.patch, YARN-2679.001.patch, 
 YARN-2679.002.patch, YARN-2679.addendum.1.patch


 add metrics in NodeManagerMetrics to get prepare time to launch container.
 The prepare time is the duration between sending 
 ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving  
 ContainerEventType.CONTAINER_LAUNCHED event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3056) add verification for containerLaunchDuration in TestNodeManagerMetrics.

2015-01-13 Thread zhihai xu (JIRA)
zhihai xu created YARN-3056:
---

 Summary: add verification for containerLaunchDuration in 
TestNodeManagerMetrics.
 Key: YARN-3056
 URL: https://issues.apache.org/jira/browse/YARN-3056
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Affects Versions: 2.6.0
Reporter: zhihai xu
Priority: Trivial


add verification for containerLaunchDuration in TestNodeManagerMetrics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-2679) Add metric for container launch duration

2015-01-13 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu reopened YARN-2679:
-

 Add metric for container launch duration
 

 Key: YARN-2679
 URL: https://issues.apache.org/jira/browse/YARN-2679
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Fix For: 2.7.0

 Attachments: YARN-2679.000.patch, YARN-2679.001.patch, 
 YARN-2679.002.patch


 add metrics in NodeManagerMetrics to get prepare time to launch container.
 The prepare time is the duration between sending 
 ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving  
 ContainerEventType.CONTAINER_LAUNCHED event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2679) Add metric for container launch duration

2015-01-13 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2679:

Attachment: YARN-2679.addendum.1.patch

 Add metric for container launch duration
 

 Key: YARN-2679
 URL: https://issues.apache.org/jira/browse/YARN-2679
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Fix For: 2.7.0

 Attachments: YARN-2679.000.patch, YARN-2679.001.patch, 
 YARN-2679.002.patch, YARN-2679.addendum.1.patch


 add metrics in NodeManagerMetrics to get prepare time to launch container.
 The prepare time is the duration between sending 
 ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving  
 ContainerEventType.CONTAINER_LAUNCHED event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2679) Add metric for container launch duration

2015-01-13 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2679:

Attachment: (was: YARN-2679.addendum.1.patch)

 Add metric for container launch duration
 

 Key: YARN-2679
 URL: https://issues.apache.org/jira/browse/YARN-2679
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Fix For: 2.7.0

 Attachments: YARN-2679.000.patch, YARN-2679.001.patch, 
 YARN-2679.002.patch


 add metrics in NodeManagerMetrics to get prepare time to launch container.
 The prepare time is the duration between sending 
 ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving  
 ContainerEventType.CONTAINER_LAUNCHED event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2679) Add metric for container launch duration

2015-01-13 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu resolved YARN-2679.
-
Resolution: Fixed

 Add metric for container launch duration
 

 Key: YARN-2679
 URL: https://issues.apache.org/jira/browse/YARN-2679
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Fix For: 2.7.0

 Attachments: YARN-2679.000.patch, YARN-2679.001.patch, 
 YARN-2679.002.patch


 add metrics in NodeManagerMetrics to get prepare time to launch container.
 The prepare time is the duration between sending 
 ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving  
 ContainerEventType.CONTAINER_LAUNCHED event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2679) Add metric for container launch duration

2015-01-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275038#comment-14275038
 ] 

zhihai xu commented on YARN-2679:
-

I created YARN-3056 to add verification in the test.

 Add metric for container launch duration
 

 Key: YARN-2679
 URL: https://issues.apache.org/jira/browse/YARN-2679
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Fix For: 2.7.0

 Attachments: YARN-2679.000.patch, YARN-2679.001.patch, 
 YARN-2679.002.patch


 add metrics in NodeManagerMetrics to get prepare time to launch container.
 The prepare time is the duration between sending 
 ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving  
 ContainerEventType.CONTAINER_LAUNCHED event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3057) Need update apps' runnability when reloading allocation files for FairScheduler

2015-01-13 Thread Jun Gong (JIRA)
Jun Gong created YARN-3057:
--

 Summary: Need update apps' runnability when reloading allocation 
files for FairScheduler
 Key: YARN-3057
 URL: https://issues.apache.org/jira/browse/YARN-3057
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong


If we submit an app and the number of running apps in its corresponding leaf 
queue has reached its max limit, the app will be put into 'nonRunnableApps'. 
Its runnability will only be updated when an app attempt is removed 
(FairScheduler calls `updateRunnabilityOnAppRemoval` at that time).

Suppose there are only service apps running; they will not finish, so the 
submitted app will not be scheduled even if we change the leaf queue's max 
limit. I think we need to update apps' runnability when reloading allocation 
files for FairScheduler.
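
A hedged sketch of the proposed behaviour (not the actual FairScheduler code; {{runnableApps}}, {{nonRunnableApps}} and {{updateRunnabilityOnReload}} are illustrative stand-ins): after the allocation file is reloaded, walk each queue's non-runnable apps and promote those that now fit under the new limit.

{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch of re-checking runnability after a config reload.
class QueueRunnabilitySketch {
  int maxRunningApps;                       // limit reloaded from the allocation file
  final List<String> runnableApps = new ArrayList<String>();
  final List<String> nonRunnableApps = new ArrayList<String>();

  // Called after the allocation file has been reloaded with a (possibly) new limit.
  void updateRunnabilityOnReload(int newMaxRunningApps) {
    maxRunningApps = newMaxRunningApps;
    Iterator<String> it = nonRunnableApps.iterator();
    while (it.hasNext() && runnableApps.size() < maxRunningApps) {
      String app = it.next();
      it.remove();
      runnableApps.add(app);   // app becomes schedulable without waiting for a removal
    }
  }
}
{code}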



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3056) add verification for containerLaunchDuration in TestNodeManagerMetrics.

2015-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275058#comment-14275058
 ] 

Hadoop QA commented on YARN-3056:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12691921/YARN-3056.000.patch
  against trunk revision 08ac062.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6321//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6321//console

This message is automatically generated.

 add verification for containerLaunchDuration in TestNodeManagerMetrics.
 ---

 Key: YARN-3056
 URL: https://issues.apache.org/jira/browse/YARN-3056
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Affects Versions: 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial
 Attachments: YARN-3056.000.patch


 add verification for containerLaunchDuration in TestNodeManagerMetrics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2957) Create unit test to automatically compare YarnConfiguration and yarn-default.xml

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275103#comment-14275103
 ] 

Hudson commented on YARN-2957:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #806 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/806/])
YARN-2957. Create unit test to automatically compare YarnConfiguration and 
yarn-default.xml. (rchiang via rkanter) (rkanter: rev 
f45163191583eadcfbe0df233a3185fd1b2b78f3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/CHANGES.txt


 Create unit test to automatically compare YarnConfiguration and 
 yarn-default.xml
 

 Key: YARN-2957
 URL: https://issues.apache.org/jira/browse/YARN-2957
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: supportability
 Fix For: 2.7.0

 Attachments: YARN-2957.001.patch


 Create a unit test that will automatically compare the fields in 
 YarnConfiguration and yarn-default.xml.  It should throw an error if a 
 property is missing in either the class or the file.
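
As a hedged illustration of the idea (this is not the actual TestYarnConfigurationFields implementation), one could reflect over the public {{String}} constants of {{YarnConfiguration}} and compare them against the property names loaded from {{yarn-default.xml}}:

{code}
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ConfigFieldCheckSketch {
  public static void main(String[] args) throws Exception {
    // Property names declared as public static String constants in YarnConfiguration.
    Set<String> classProps = new HashSet<String>();
    for (Field f : YarnConfiguration.class.getDeclaredFields()) {
      int mods = f.getModifiers();
      if (Modifier.isPublic(mods) && Modifier.isStatic(mods)
          && f.getType() == String.class) {
        String value = (String) f.get(null);
        if (value != null && value.startsWith("yarn.")) {
          classProps.add(value);
        }
      }
    }

    // Property names present in yarn-default.xml (must be on the classpath).
    Configuration conf = new Configuration(false);
    conf.addResource("yarn-default.xml");
    Set<String> xmlProps = new HashSet<String>();
    for (Map.Entry<String, String> e : conf) {
      xmlProps.add(e.getKey());
    }

    // Report mismatches in both directions; a real test would fail on non-empty sets.
    Set<String> missingInXml = new HashSet<String>(classProps);
    missingInXml.removeAll(xmlProps);
    Set<String> missingInClass = new HashSet<String>(xmlProps);
    missingInClass.removeAll(classProps);
    System.out.println("In YarnConfiguration but not in yarn-default.xml: " + missingInXml);
    System.out.println("In yarn-default.xml but not in YarnConfiguration: " + missingInClass);
  }
}
{code}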



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3056) add verification for containerLaunchDuration in TestNodeManagerMetrics.

2015-01-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275111#comment-14275111
 ] 

Karthik Kambatla commented on YARN-3056:


Sorry for missing this in my review of YARN-2679, and thanks for following up. 

The patch looks good. +1. 

 add verification for containerLaunchDuration in TestNodeManagerMetrics.
 ---

 Key: YARN-3056
 URL: https://issues.apache.org/jira/browse/YARN-3056
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Affects Versions: 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial
 Attachments: YARN-3056.000.patch


 add verification for containerLaunchDuration in TestNodeManagerMetrics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3055) Fix allTokens issue in DelegationTokenRenewer

2015-01-13 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated YARN-3055:
-
Attachment: YARN-3055.002.patch

 Fix allTokens issue in DelegationTokenRenewer
 -

 Key: YARN-3055
 URL: https://issues.apache.org/jira/browse/YARN-3055
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-3055.001.patch, YARN-3055.002.patch


 In {{removeApplicationFromRenewal}}, when going to remove a token, and the 
 token is shared by other jobs, we will not cancel the token. 
 Meanwhile, we should not cancel the _timerTask_, also we should not remove it 
 from {{allTokens}}. Otherwise for the existing submitted applications which 
 share this token will not get renew any more, and for new submitted 
 applications which share this token, the token will be renew immediately.
 For example, we have 3 applications: app1, app2, app3. And they share the 
 token1. See following scenario:
 *1).* app1 is submitted firstly, then app2, and then app3. In this case, 
 there is only one token renewal timer for token1, and is scheduled when app1 
 is submitted
 *2).* app1 is finished, then the renewal timer is cancelled. token1 will not 
 be renewed any more, but app2 and app3 still use it, so there is problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3055) Fix allTokens issue in DelegationTokenRenewer

2015-01-13 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated YARN-3055:
-
Attachment: YARN-3055.002.patch

 Fix allTokens issue in DelegationTokenRenewer
 -

 Key: YARN-3055
 URL: https://issues.apache.org/jira/browse/YARN-3055
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-3055.001.patch, YARN-3055.002.patch


 In {{removeApplicationFromRenewal}}, when going to remove a token, and the 
 token is shared by other jobs, we will not cancel the token. 
 Meanwhile, we should not cancel the _timerTask_, also we should not remove it 
 from {{allTokens}}. Otherwise for the existing submitted applications which 
 share this token will not get renew any more, and for new submitted 
 applications which share this token, the token will be renew immediately.
 For example, we have 3 applications: app1, app2, app3. And they share the 
 token1. See following scenario:
 *1).* app1 is submitted firstly, then app2, and then app3. In this case, 
 there is only one token renewal timer for token1, and is scheduled when app1 
 is submitted
 *2).* app1 is finished, then the renewal timer is cancelled. token1 will not 
 be renewed any more, but app2 and app3 still use it, so there is problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3055) Fix allTokens issue in DelegationTokenRenewer

2015-01-13 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated YARN-3055:
-
Attachment: (was: YARN-3055.002.patch)

 Fix allTokens issue in DelegationTokenRenewer
 -

 Key: YARN-3055
 URL: https://issues.apache.org/jira/browse/YARN-3055
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-3055.001.patch, YARN-3055.002.patch


 In {{removeApplicationFromRenewal}}, when going to remove a token, and the 
 token is shared by other jobs, we will not cancel the token. 
 Meanwhile, we should not cancel the _timerTask_, also we should not remove it 
 from {{allTokens}}. Otherwise for the existing submitted applications which 
 share this token will not get renew any more, and for new submitted 
 applications which share this token, the token will be renew immediately.
 For example, we have 3 applications: app1, app2, app3. And they share the 
 token1. See following scenario:
 *1).* app1 is submitted firstly, then app2, and then app3. In this case, 
 there is only one token renewal timer for token1, and is scheduled when app1 
 is submitted
 *2).* app1 is finished, then the renewal timer is cancelled. token1 will not 
 be renewed any more, but app2 and app3 still use it, so there is problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3055) Fix allTokens issue in DelegationTokenRenewer

2015-01-13 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275172#comment-14275172
 ] 

Yi Liu commented on YARN-3055:
--

Uploaded a new patch.

 Fix allTokens issue in DelegationTokenRenewer
 -

 Key: YARN-3055
 URL: https://issues.apache.org/jira/browse/YARN-3055
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-3055.001.patch, YARN-3055.002.patch


 In {{removeApplicationFromRenewal}}, when going to remove a token, and the 
 token is shared by other jobs, we will not cancel the token. 
 Meanwhile, we should not cancel the _timerTask_, also we should not remove it 
 from {{allTokens}}. Otherwise for the existing submitted applications which 
 share this token will not get renew any more, and for new submitted 
 applications which share this token, the token will be renew immediately.
 For example, we have 3 applications: app1, app2, app3. And they share the 
 token1. See following scenario:
 *1).* app1 is submitted firstly, then app2, and then app3. In this case, 
 there is only one token renewal timer for token1, and is scheduled when app1 
 is submitted
 *2).* app1 is finished, then the renewal timer is cancelled. token1 will not 
 be renewed any more, but app2 and app3 still use it, so there is problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2643) Don't create a new DominantResourceCalculator on every FairScheduler.allocate call

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275099#comment-14275099
 ] 

Hudson commented on YARN-2643:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #806 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/806/])
YARN-2643. Don't create a new DominantResourceCalculator on every 
FairScheduler.allocate call. (kasha via rkanter) (rkanter: rev 
51881535e659940b1b332d0c5952ee1f9958cc7f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt


 Don't create a new DominantResourceCalculator on every FairScheduler.allocate 
 call
 --

 Key: YARN-2643
 URL: https://issues.apache.org/jira/browse/YARN-2643
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Sandy Ryza
Assignee: Karthik Kambatla
Priority: Trivial
 Fix For: 2.7.0

 Attachments: yarn-2643-1.patch, yarn-2643.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3027) Scheduler should use totalAvailable resource from node instead of availableResource for maxAllocation

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275104#comment-14275104
 ] 

Hudson commented on YARN-3027:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #806 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/806/])
YARN-3027. Scheduler should use totalAvailable resource from node instead of 
availableResource for maxAllocation. (adhoot via rkanter) (rkanter: rev 
ae7bf31fe1c63f323ba5271e50fd0e4425a7510f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java


 Scheduler should use totalAvailable resource from node instead of 
 availableResource for maxAllocation
 -

 Key: YARN-3027
 URL: https://issues.apache.org/jira/browse/YARN-3027
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.7.0

 Attachments: YARN-3027.001.patch, YARN-3027.002.patch


 YARN-2604 added support for updating maximum allocation resource size based 
 on nodes. But it incorrectly uses available resource instead of maximum 
 resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3055) Fix allTokens issue in DelegationTokenRenewer

2015-01-13 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275115#comment-14275115
 ] 

Yi Liu commented on YARN-3055:
--

The token is still not being renewed; I will update the patch later.

 Fix allTokens issue in DelegationTokenRenewer
 -

 Key: YARN-3055
 URL: https://issues.apache.org/jira/browse/YARN-3055
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-3055.001.patch


 In {{removeApplicationFromRenewal}}, when going to remove a token, and the 
 token is shared by other jobs, we will not cancel the token. 
 Meanwhile, we should not cancel the _timerTask_, also we should not remove it 
 from {{allTokens}}. Otherwise for the existing submitted applications which 
 share this token will not get renew any more, and for new submitted 
 applications which share this token, the token will be renew immediately.
 For example, we have 3 applications: app1, app2, app3. And they share the 
 token1. See following scenario:
 *1).* app1 is submitted firstly, then app2, and then app3. In this case, 
 there is only one token renewal timer for token1, and is scheduled when app1 
 is submitted
 *2).* app1 is finished, then the renewal timer is cancelled. token1 will not 
 be renewed any more, but app2 and app3 still use it, so there is problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer

2015-01-13 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated YARN-3055:
-
Summary: The token is not renewed properly if it's shared by jobs (oozie) 
in DelegationTokenRenewer  (was: Fix allTokens issue in DelegationTokenRenewer)

 The token is not renewed properly if it's shared by jobs (oozie) in 
 DelegationTokenRenewer
 --

 Key: YARN-3055
 URL: https://issues.apache.org/jira/browse/YARN-3055
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-3055.001.patch, YARN-3055.002.patch


 In {{removeApplicationFromRenewal}}, when going to remove a token, and the 
 token is shared by other jobs, we will not cancel the token. 
 Meanwhile, we should not cancel the _timerTask_, also we should not remove it 
 from {{allTokens}}. Otherwise for the existing submitted applications which 
 share this token will not get renew any more, and for new submitted 
 applications which share this token, the token will be renew immediately.
 For example, we have 3 applications: app1, app2, app3. And they share the 
 token1. See following scenario:
 *1).* app1 is submitted firstly, then app2, and then app3. In this case, 
 there is only one token renewal timer for token1, and is scheduled when app1 
 is submitted
 *2).* app1 is finished, then the renewal timer is cancelled. token1 will not 
 be renewed any more, but app2 and app3 still use it, so there is problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2015-01-13 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275183#comment-14275183
 ] 

Yi Liu commented on YARN-2964:
--

It seems this JIRA causes the token not to be renewed properly if it's shared 
by jobs (oozie). I filed YARN-3055, please take a look.

 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (NM liveness interval) after log aggregation completes.  The result is that 
 an oozie job, e.g. pig, that launches many sub-jobs over time will fail if 
 any sub-job is launched 10 min after any sub-job completes.  If all other 
 sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2643) Don't create a new DominantResourceCalculator on every FairScheduler.allocate call

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275081#comment-14275081
 ] 

Hudson commented on YARN-2643:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #72 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/72/])
YARN-2643. Don't create a new DominantResourceCalculator on every 
FairScheduler.allocate call. (kasha via rkanter) (rkanter: rev 
51881535e659940b1b332d0c5952ee1f9958cc7f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt


 Don't create a new DominantResourceCalculator on every FairScheduler.allocate 
 call
 --

 Key: YARN-2643
 URL: https://issues.apache.org/jira/browse/YARN-2643
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Sandy Ryza
Assignee: Karthik Kambatla
Priority: Trivial
 Fix For: 2.7.0

 Attachments: yarn-2643-1.patch, yarn-2643.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3027) Scheduler should use totalAvailable resource from node instead of availableResource for maxAllocation

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275086#comment-14275086
 ] 

Hudson commented on YARN-3027:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #72 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/72/])
YARN-3027. Scheduler should use totalAvailable resource from node instead of 
availableResource for maxAllocation. (adhoot via rkanter) (rkanter: rev 
ae7bf31fe1c63f323ba5271e50fd0e4425a7510f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java


 Scheduler should use totalAvailable resource from node instead of 
 availableResource for maxAllocation
 -

 Key: YARN-3027
 URL: https://issues.apache.org/jira/browse/YARN-3027
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.7.0

 Attachments: YARN-3027.001.patch, YARN-3027.002.patch


 YARN-2604 added support for updating maximum allocation resource size based 
 on nodes. But it incorrectly uses available resource instead of maximum 
 resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer

2015-01-13 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated YARN-3055:
-
Description: 
After YARN-2964, there is only one timer to renew the token if it's shared by 
jobs. 
In {{removeApplicationFromRenewal}}, when going to remove a token, and the 
token is shared by other jobs, we will not cancel the token. 
Meanwhile, we should not cancel the _timerTask_, also we should not remove it 
from {{allTokens}}. Otherwise for the existing submitted applications which 
share this token will not get renew any more, and for new submitted 
applications which share this token, the token will be renew immediately.

For example, we have 3 applications: app1, app2, app3. And they share the 
token1. See following scenario:
*1).* app1 is submitted firstly, then app2, and then app3. In this case, there 
is only one token renewal timer for token1, and is scheduled when app1 is 
submitted
*2).* app1 is finished, then the renewal timer is cancelled. token1 will not be 
renewed any more, but app2 and app3 still use it, so there is problem.



  was:
In {{removeApplicationFromRenewal}}, when going to remove a token, and the 
token is shared by other jobs, we will not cancel the token. 
Meanwhile, we should not cancel the _timerTask_, also we should not remove it 
from {{allTokens}}. Otherwise for the existing submitted applications which 
share this token will not get renew any more, and for new submitted 
applications which share this token, the token will be renew immediately.

For example, we have 3 applications: app1, app2, app3. And they share the 
token1. See following scenario:
*1).* app1 is submitted firstly, then app2, and then app3. In this case, there 
is only one token renewal timer for token1, and is scheduled when app1 is 
submitted
*2).* app1 is finished, then the renewal timer is cancelled. token1 will not be 
renewed any more, but app2 and app3 still use it, so there is problem.




 The token is not renewed properly if it's shared by jobs (oozie) in 
 DelegationTokenRenewer
 --

 Key: YARN-3055
 URL: https://issues.apache.org/jira/browse/YARN-3055
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-3055.001.patch, YARN-3055.002.patch


 After YARN-2964, there is only one timer to renew the token if it's shared by 
 jobs. 
 In {{removeApplicationFromRenewal}}, when going to remove a token, and the 
 token is shared by other jobs, we will not cancel the token. 
 Meanwhile, we should not cancel the _timerTask_, also we should not remove it 
 from {{allTokens}}. Otherwise for the existing submitted applications which 
 share this token will not get renew any more, and for new submitted 
 applications which share this token, the token will be renew immediately.
 For example, we have 3 applications: app1, app2, app3. And they share the 
 token1. See following scenario:
 *1).* app1 is submitted firstly, then app2, and then app3. In this case, 
 there is only one token renewal timer for token1, and is scheduled when app1 
 is submitted
 *2).* app1 is finished, then the renewal timer is cancelled. token1 will not 
 be renewed any more, but app2 and app3 still use it, so there is problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3027) Scheduler should use totalAvailable resource from node instead of availableResource for maxAllocation

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275327#comment-14275327
 ] 

Hudson commented on YARN-3027:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #73 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/73/])
YARN-3027. Scheduler should use totalAvailable resource from node instead of 
availableResource for maxAllocation. (adhoot via rkanter) (rkanter: rev 
ae7bf31fe1c63f323ba5271e50fd0e4425a7510f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* hadoop-yarn-project/CHANGES.txt


 Scheduler should use totalAvailable resource from node instead of 
 availableResource for maxAllocation
 -

 Key: YARN-3027
 URL: https://issues.apache.org/jira/browse/YARN-3027
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.7.0

 Attachments: YARN-3027.001.patch, YARN-3027.002.patch


 YARN-2604 added support for updating maximum allocation resource size based 
 on nodes. But it incorrectly uses available resource instead of maximum 
 resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2957) Create unit test to automatically compare YarnConfiguration and yarn-default.xml

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275326#comment-14275326
 ] 

Hudson commented on YARN-2957:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #73 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/73/])
YARN-2957. Create unit test to automatically compare YarnConfiguration and 
yarn-default.xml. (rchiang via rkanter) (rkanter: rev 
f45163191583eadcfbe0df233a3185fd1b2b78f3)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java


 Create unit test to automatically compare YarnConfiguration and 
 yarn-default.xml
 

 Key: YARN-2957
 URL: https://issues.apache.org/jira/browse/YARN-2957
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: supportability
 Fix For: 2.7.0

 Attachments: YARN-2957.001.patch


 Create a unit test that will automatically compare the fields in 
 YarnConfiguration and yarn-default.xml.  It should throw an error if a 
 property is missing in either the class or the file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2643) Don't create a new DominantResourceCalculator on every FairScheduler.allocate call

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275322#comment-14275322
 ] 

Hudson commented on YARN-2643:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #73 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/73/])
YARN-2643. Don't create a new DominantResourceCalculator on every 
FairScheduler.allocate call. (kasha via rkanter) (rkanter: rev 
51881535e659940b1b332d0c5952ee1f9958cc7f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt


 Don't create a new DominantResourceCalculator on every FairScheduler.allocate 
 call
 --

 Key: YARN-2643
 URL: https://issues.apache.org/jira/browse/YARN-2643
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Sandy Ryza
Assignee: Karthik Kambatla
Priority: Trivial
 Fix For: 2.7.0

 Attachments: yarn-2643-1.patch, yarn-2643.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2957) Create unit test to automatically compare YarnConfiguration and yarn-default.xml

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275397#comment-14275397
 ] 

Hudson commented on YARN-2957:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2023 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2023/])
YARN-2957. Create unit test to automatically compare YarnConfiguration and 
yarn-default.xml. (rchiang via rkanter) (rkanter: rev 
f45163191583eadcfbe0df233a3185fd1b2b78f3)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java


 Create unit test to automatically compare YarnConfiguration and 
 yarn-default.xml
 

 Key: YARN-2957
 URL: https://issues.apache.org/jira/browse/YARN-2957
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: supportability
 Fix For: 2.7.0

 Attachments: YARN-2957.001.patch


 Create a unit test that will automatically compare the fields in 
 YarnConfiguration and yarn-default.xml.  It should throw an error if a 
 property is missing in either the class or the file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3027) Scheduler should use totalAvailable resource from node instead of availableResource for maxAllocation

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275398#comment-14275398
 ] 

Hudson commented on YARN-3027:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2023 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2023/])
YARN-3027. Scheduler should use totalAvailable resource from node instead of 
availableResource for maxAllocation. (adhoot via rkanter) (rkanter: rev 
ae7bf31fe1c63f323ba5271e50fd0e4425a7510f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* hadoop-yarn-project/CHANGES.txt


 Scheduler should use totalAvailable resource from node instead of 
 availableResource for maxAllocation
 -

 Key: YARN-3027
 URL: https://issues.apache.org/jira/browse/YARN-3027
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.7.0

 Attachments: YARN-3027.001.patch, YARN-3027.002.patch


 YARN-2604 added support for updating maximum allocation resource size based 
 on nodes. But it incorrectly uses available resource instead of maximum 
 resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2643) Don't create a new DominantResourceCalculator on every FairScheduler.allocate call

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275393#comment-14275393
 ] 

Hudson commented on YARN-2643:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2023 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2023/])
YARN-2643. Don't create a new DominantResourceCalculator on every 
FairScheduler.allocate call. (kasha via rkanter) (rkanter: rev 
51881535e659940b1b332d0c5952ee1f9958cc7f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt


 Don't create a new DominantResourceCalculator on every FairScheduler.allocate 
 call
 --

 Key: YARN-2643
 URL: https://issues.apache.org/jira/browse/YARN-2643
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Sandy Ryza
Assignee: Karthik Kambatla
Priority: Trivial
 Fix For: 2.7.0

 Attachments: yarn-2643-1.patch, yarn-2643.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1871) We should eliminate writing *PBImpl code in YARN

2015-01-13 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1871:
-
Assignee: (was: Wangda Tan)

 We should eliminate writing *PBImpl code in YARN
 

 Key: YARN-1871
 URL: https://issues.apache.org/jira/browse/YARN-1871
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.4.0
Reporter: Wangda Tan
 Attachments: YARN-1871.demo.patch


 Currently, we need to write PBImpl classes one by one. After running find . 
 -name *PBImpl*.java | xargs wc -l under the hadoop source code directory, we 
 can see there are more than 25,000 LOC. I think we should improve this, which 
 will be very helpful for YARN developers making changes to YARN protocols.
 There are only some limited patterns in the current *PBImpl classes:
 * Simple types, like string, int32, float.
 * List<?> types
 * Map<?, ?> types
 * Enum types
 Code generation should be enough to generate such PBImpl classes.
 Some other requirements are:
 * Leave other related code alone, like service implementations (e.g. 
 ContainerManagerImpl).
 * (If possible) Forward compatibility: developers can write their own PBImpl 
 or generate them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1871) We should eliminate writing *PBImpl code in YARN

2015-01-13 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276086#comment-14276086
 ] 

Wangda Tan commented on YARN-1871:
--

Making it unassigned since I don't have the bandwidth to do this now.

 We should eliminate writing *PBImpl code in YARN
 

 Key: YARN-1871
 URL: https://issues.apache.org/jira/browse/YARN-1871
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.4.0
Reporter: Wangda Tan
 Attachments: YARN-1871.demo.patch


 Currently, we need to write PBImpl classes one by one. After running find . 
 -name *PBImpl*.java | xargs wc -l under the hadoop source code directory, we 
 can see there are more than 25,000 LOC. I think we should improve this, which 
 will be very helpful for YARN developers making changes to YARN protocols.
 There are only some limited patterns in the current *PBImpl classes:
 * Simple types, like string, int32, float.
 * List<?> types
 * Map<?, ?> types
 * Enum types
 Code generation should be enough to generate such PBImpl classes.
 Some other requirements are:
 * Leave other related code alone, like service implementations (e.g. 
 ContainerManagerImpl).
 * (If possible) Forward compatibility: developers can write their own PBImpl 
 or generate them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2932) Add entry for preemption setting to queue status screen and startup/refresh logging

2015-01-13 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276101#comment-14276101
 ] 

Wangda Tan commented on YARN-2932:
--

[~eepayne],
Thanks for response,
*Re 2:*
You're partially correct: the queue eventually calls setupQueueConfigs when 
reinitialize is invoked. 
The CapacityScheduler reinitialization creates a new set of queues and copies 
the new parameters to your old queues via
{code}
setupQueueConfigs(
clusterResource,
newlyParsedLeafQueue.capacity, newlyParsedLeafQueue.absoluteCapacity, 
newlyParsedLeafQueue.maximumCapacity, 
newlyParsedLeafQueue.absoluteMaxCapacity,
...
{code}
So you need to put the parameter you want to update into setupQueueConfigs as 
well. Without that, the queue will not be refreshed. I didn't find any change to 
the parameters of setupQueueConfigs, so I guess that is the case; it's better to 
add a test to verify it.

*Re 3:*
You can take a look at how AbstractCSQueue initialize labels,
{code}
// get labels
this.accessibleLabels = 
cs.getConfiguration().getAccessibleNodeLabels(getQueuePath());
// inherit from parent if labels not set
if (this.accessibleLabels == null && parent != null) {
  this.accessibleLabels = parent.getAccessibleNodeLabels();
}
{code}
I think they have similar logic -- for node labels, it tries to get the value 
from configuration and, if not set, inherits it from the parent. With this, you 
can make the getPreemptable interface without a defaultVal in 
CapacitySchedulerConfiguration.
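
A hedged sketch of the inheritance pattern being suggested here; {{getPreemptionDisabled}} is an illustrative, hypothetical getter rather than the final YARN-2932 API:

{code}
// Illustrative sketch: resolve a per-queue "preemption disabled" flag the same
// way accessible node labels are resolved -- explicit config wins, otherwise
// the value is inherited from the parent queue.
Boolean preemptionDisabled =
    cs.getConfiguration().getPreemptionDisabled(getQueuePath());   // hypothetical getter
if (preemptionDisabled == null && parent != null) {
  preemptionDisabled = parent.getPreemptionDisabled();             // hypothetical getter
}
if (preemptionDisabled == null) {
  preemptionDisabled = Boolean.FALSE;   // cluster-wide default when nothing is set
}
{code}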

 Add entry for preemption setting to queue status screen and startup/refresh 
 logging
 ---

 Key: YARN-2932
 URL: https://issues.apache.org/jira/browse/YARN-2932
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.7.0
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: YARN-2932.v1.txt, YARN-2932.v2.txt, YARN-2932.v3.txt


 YARN-2056 enables the ability to turn preemption on or off on a per-queue 
 level. This JIRA will provide the preemption status for each queue in the 
 {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue 
 refresh.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2015-01-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276107#comment-14276107
 ] 

Vinod Kumar Vavilapalli commented on YARN-2791:
---

Okay folks, I've read the design docs on both YARN-2791 (this JIRA) and 
YARN-2139. This is indeed part of YARN-2139, and a direct dup of YARN-2618 and 
other tickets.

Yes YARN-2139 is a much larger effort but it encompasses both scheduling and 
isolation. The important tickets of YARN-2139 already were created before this 
JIRA. I am going to close this as a duplicate in a day unless I see specific 
tasks that are not covered under YARN-2139. If there are things that are not 
covered indeed, I urge Swapnil Daingade, Santosh Marella and Yuliya Feldman 
to file sub-tasks under YARN-2791.

As Karthik appealed before, let's have the design discussion over at YARN-2139, 
merging things that are only here and missing in that JIRA. Due credit will be 
given to all contributors to the design and implementation there.

I am oblivious to who contributes code, but let's work together, please!

 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman
 Attachments: DiskDriveAsResourceInYARN.pdf


 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having a large amount of memory on a 
 node can lead to a high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.

2015-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275905#comment-14275905
 ] 

Hadoop QA commented on YARN-2637:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12692009/YARN-2637.40.patch
  against trunk revision 10ac5ab.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6323//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6323//console

This message is automatically generated.

 maximum-am-resource-percent could be respected for both LeafQueue/User when 
 trying to activate applications.
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
 YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, 
 YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, 
 YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, 
 YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, 
 YARN-2637.31.patch, YARN-2637.32.patch, YARN-2637.36.patch, 
 YARN-2637.38.patch, YARN-2637.39.patch, YARN-2637.40.patch, 
 YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch


 Currently, number of AM in leaf queue will be calculated in following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when submit new application to RM, it will check if an app can be 
 activated in following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); 
  i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();
   
   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }
   
   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < 
       getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() + 
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 An example is,
 If a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum 
 resource that AMs can use is 200M. Assuming minimum_allocation=1M, up to 200 AMs 
 can be launched; if the user actually uses 5M for each AM (> minimum_allocation), 
 all apps can still be activated, and they will occupy all the resources of the 
 queue instead of only max_am_resource_percent of the queue.
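
Working through that example numerically (a hedged illustration of the arithmetic above, not code from any patch):

{code}
// 1G treated as 1000M here to match the round numbers in the example above.
public class AmLimitArithmetic {
  public static void main(String[] args) {
    int queueCapacityMb = 1000;
    double maxAmResourcePercent = 0.2;
    int maxAmResourceMb = (int) (queueCapacityMb * maxAmResourcePercent); // 200M for AMs

    // The count-based limit divides by minimum_allocation (1M), admitting ~200 AMs.
    int minAllocationMb = 1;
    int maxAmCount = maxAmResourceMb / minAllocationMb;                   // 200 AMs

    // If each AM actually uses 5M (> minimum_allocation), those AMs consume:
    int actualAmMb = 5;
    int consumedByAmsMb = maxAmCount * actualAmMb;                        // 1000M: the whole queue

    System.out.println(maxAmCount + " AMs x " + actualAmMb + "M = "
        + consumedByAmsMb + "M, far above the intended " + maxAmResourceMb + "M cap");
  }
}
{code}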



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer

2015-01-13 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275952#comment-14275952
 ] 

Jian He commented on YARN-3055:
---

bq.  Meanwhile, we should not cancel the timerTask, also we should not remove 
it from allTokens.
IIUC, this is not the case. Because if the launcher job first gets added to the 
appTokens map, DelegationTokenRenewer will not add a DelegationTokenToRenew 
instance for the sub-job. So the tokens in removeApplicationFromRenewal will 
be empty for the sub-job when the sub-job completes, and the token won't be 
removed from allTokens.

 The token is not renewed properly if it's shared by jobs (oozie) in 
 DelegationTokenRenewer
 --

 Key: YARN-3055
 URL: https://issues.apache.org/jira/browse/YARN-3055
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-3055.001.patch, YARN-3055.002.patch


 After YARN-2964, there is only one timer to renew the token if it's shared by 
 jobs. 
 In {{removeApplicationFromRenewal}}, when going to remove a token, and the 
 token is shared by other jobs, we will not cancel the token. 
 Meanwhile, we should not cancel the _timerTask_, also we should not remove it 
 from {{allTokens}}. Otherwise for the existing submitted applications which 
 share this token will not get renew any more, and for new submitted 
 applications which share this token, the token will be renew immediately.
 For example, we have 3 applications: app1, app2, app3. And they share the 
 token1. See following scenario:
 *1).* app1 is submitted firstly, then app2, and then app3. In this case, 
 there is only one token renewal timer for token1, and is scheduled when app1 
 is submitted
 *2).* app1 is finished, then the renewal timer is cancelled. token1 will not 
 be renewed any more, but app2 and app3 still use it, so there is problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2015-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275939#comment-14275939
 ] 

Hadoop QA commented on YARN-2933:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12692017/YARN-2933-7.patch
  against trunk revision 10ac5ab.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6324//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6324//console

This message is automatically generated.

 Capacity Scheduler preemption policy should only consider capacity without 
 labels temporarily
 -

 Key: YARN-2933
 URL: https://issues.apache.org/jira/browse/YARN-2933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
 Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, 
 YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch


 Currently, we have capacity enforcement on each queue for each label in 
 CapacityScheduler, but we don't have a preemption policy to support that. 
 YARN-2498 is targeting preemption that respects node labels, but we have 
 some gaps in the code base, e.g. queues/FiCaScheduler should be able to get 
 usedResource/pendingResource, etc. by label. These items potentially require 
 refactoring CS, which we need to spend some time thinking about carefully.
 For now, what we can do immediately is calculate ideal_allocation and 
 preempt containers only for resources on nodes without labels, to avoid 
 regressions like: a cluster has some nodes with labels and some without; assume 
 queueA isn't satisfied for resources without labels, but for now the preemption 
 policy may preempt resources from nodes with labels for queueA, which is not 
 correct.
 Again, it is just a short-term enhancement; YARN-2498 will consider 
 preemption respecting node labels for the Capacity Scheduler, which is our final 
 target. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

