[jira] [Created] (YARN-2470) A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception and

2014-08-29 Thread Shivaji Dutta (JIRA)
Shivaji Dutta created YARN-2470:
---

 Summary: A high value for yarn.nodemanager.delete.debug-delay-sec 
causes Nodemanager to crash. Slider needs this value to be high. Setting a very 
high value throws an exception and nodemanager does not start
 Key: YARN-2470
 URL: https://issues.apache.org/jira/browse/YARN-2470
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Shivaji Dutta
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set

2014-08-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114911#comment-14114911
 ] 

Zhijie Shen commented on YARN-2449:
---

The patch should work, but we can improve the logic a bit.

{code}
-if (!actualInitializers.equals(initializers)) {
+if (!actualInitializers.equals(initializers) || modifiedInitialiers) {
{code}

We can set modifiedInitialiers = true when 
TimelineAuthenticationFilterInitializer is added and when 
AuthenticationFilterInitializer is skipped; these are the only two possible changes.

Then we don't need to check !actualInitializers.equals(initializers), but 
only modifiedInitialiers in the aforementioned condition.
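
A minimal sketch of the suggested flow, assuming illustrative names and plain string constants (the actual patch and fully qualified class names are not shown here), could look like this:
{code}
import java.util.ArrayList;
import java.util.List;

// Hedged sketch only: the flag is set at exactly the two places where the
// initializer list can change, so the final check needs only the flag.
class FilterInitializerRewrite {
  // Illustrative constants, not the fully qualified class names.
  static final String TIMELINE_INITIALIZER = "TimelineAuthenticationFilterInitializer";
  static final String AUTH_INITIALIZER = "AuthenticationFilterInitializer";

  static List<String> rewrite(List<String> initializers) {
    boolean modifiedInitialiers = false;
    List<String> actualInitializers = new ArrayList<String>();
    for (String initializer : initializers) {
      if (AUTH_INITIALIZER.equals(initializer)) {
        modifiedInitialiers = true; // change #1: the generic filter is skipped
        continue;
      }
      actualInitializers.add(initializer);
    }
    if (!actualInitializers.contains(TIMELINE_INITIALIZER)) {
      actualInitializers.add(TIMELINE_INITIALIZER);
      modifiedInitialiers = true; // change #2: the timeline filter is added
    }
    // Equivalent to the condition in the diff above, without the list comparison.
    return modifiedInitialiers ? actualInitializers : initializers;
  }
}
{code}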

Can you add one more case:
{code}
+driver.put("", TimelineAuthenticationFilterInitializer.class.getName());
{code}

 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 ---

 Key: YARN-2449
 URL: https://issues.apache.org/jira/browse/YARN-2449
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
 Environment: Security-enabled cluster with ATS also enabled and running, but 
 no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh
Assignee: Varun Vasudev
Priority: Critical
 Attachments: apache-yarn-2449.0.patch


 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 Looks like it is a regression from YARN-2397.
 After YARN-2397, when no hadoop.http.filter.initializers is set and we try to 
 fetch a DELEGATION token from the timelineserver, it returns an invalid 
 token.
 Tried to fetch the timeline delegation token using curl commands:
 {code}
 1. curl -i -k -s -b 'timeline-cookie.txt' 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
 Or
 2. curl -i -k -s --negotiate -u : 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
 {code}
 The response for both queries is:
 {code}
 {"About":"Timeline API"}
 {code}
 Whereas before YARN-2397, or if you set hadoop.http.filter.initializers to 
 TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, 
 the first query returned a DT and the second used to fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2405) NPE in FairSchedulerAppsBlock

2014-08-29 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2405:
---

Summary: NPE in FairSchedulerAppsBlock  (was: NPE in FairSchedulerAppsBlock 
(scheduler page))

 NPE in FairSchedulerAppsBlock
 -

 Key: YARN-2405
 URL: https://issues.apache.org/jira/browse/YARN-2405
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2405.1.patch, YARN-2405.2.patch, YARN-2405.3.patch, 
 YARN-2405.4.patch


 FairSchedulerAppsBlock#render throws NPE at this line
 {code}
   int fairShare = fsinfo.getAppFairShare(attemptId);
 {code}
 This causes the scheduler page to not show the apps, since it lacks the 
 definition of appsTableData
 {code}
  Uncaught ReferenceError: appsTableData is not defined 
 {code}
 The problem is temporary, meaning that it usually resolves by itself either 
 after a retry or after a few hours.
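
A minimal defensive-lookup sketch, assuming a hypothetical map-backed helper rather than the actual FairSchedulerInfo code, is:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch only: return a sentinel instead of dereferencing null when the
// attempt is not (or no longer) tracked, which is the transient case described
// above; this is not the actual patch.
class FairShareLookup {
  static final int INVALID_FAIR_SHARE = -1;

  private final Map<String, Integer> fairShares = new ConcurrentHashMap<String, Integer>();

  int getAppFairShare(String attemptId) {
    Integer share = fairShares.get(attemptId);
    return share == null ? INVALID_FAIR_SHARE : share.intValue();
  }
}
{code}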



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2405) NPE in FairSchedulerAppsBlock (scheduler page)

2014-08-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114910#comment-14114910
 ] 

Karthik Kambatla commented on YARN-2405:


+1.

 NPE in FairSchedulerAppsBlock (scheduler page)
 --

 Key: YARN-2405
 URL: https://issues.apache.org/jira/browse/YARN-2405
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2405.1.patch, YARN-2405.2.patch, YARN-2405.3.patch, 
 YARN-2405.4.patch


 FairSchedulerAppsBlock#render throws NPE at this line
 {code}
   int fairShare = fsinfo.getAppFairShare(attemptId);
 {code}
 This causes the scheduler page to not show the apps, since it lacks the 
 definition of appsTableData
 {code}
  Uncaught ReferenceError: appsTableData is not defined 
 {code}
 The problem is temporary, meaning that it usually resolves by itself either 
 after a retry or after a few hours.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store

2014-08-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114913#comment-14114913
 ] 

Zhijie Shen commented on YARN-2033:
---

The test failure should not be related. It seems that the configuration 
resource was not read correctly on jenkins:

{code}
java.lang.RuntimeException: java.util.zip.ZipException: oversubscribed dynamic 
bit lengths tree
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:147)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:105)
at java.io.FilterInputStream.read(FilterInputStream.java:66)
at 
org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown 
Source)
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown 
Source)
at 
org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:153)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2334)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2322)
at 
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2393)
at 
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2346)
at 
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2263)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
at 
org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:605)
at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:247)
at 
org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:296)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.setup(TestRMRestart.java:119)
{code}

 Investigate merging generic-history into the Timeline Store
 ---

 Key: YARN-2033
 URL: https://issues.apache.org/jira/browse/YARN-2033
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, 
 YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, 
 YARN-2033.5.patch, YARN-2033.6.patch, YARN-2033.7.patch, 
 YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, 
 YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch


 Having two different stores isn't amenable to generic insights on what's 
 happening with applications. This is to investigate porting generic-history 
 into the Timeline Store.
 One goal is to try to keep most of the client-side interfaces as close as 
 possible to what we have today.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2405) NPE in FairSchedulerAppsBlock

2014-08-29 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114972#comment-14114972
 ] 

Tsuyoshi OZAWA commented on YARN-2405:
--

Thanks Maysam, Gera, and Karthik for review!

 NPE in FairSchedulerAppsBlock
 -

 Key: YARN-2405
 URL: https://issues.apache.org/jira/browse/YARN-2405
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Tsuyoshi OZAWA
 Fix For: 2.6.0

 Attachments: YARN-2405.1.patch, YARN-2405.2.patch, YARN-2405.3.patch, 
 YARN-2405.4.patch


 FairSchedulerAppsBlock#render throws NPE at this line
 {code}
   int fairShare = fsinfo.getAppFairShare(attemptId);
 {code}
 This causes the scheduler page to not show the apps, since it lacks the 
 definition of appsTableData
 {code}
  Uncaught ReferenceError: appsTableData is not defined 
 {code}
 The problem is temporary, meaning that it usually resolves by itself either 
 after a retry or after a few hours.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2406) Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto

2014-08-29 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114977#comment-14114977
 ] 

Tsuyoshi OZAWA commented on YARN-2406:
--

Thanks Jian for reviewing and updating!

 Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto
 

 Key: YARN-2406
 URL: https://issues.apache.org/jira/browse/YARN-2406
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
 Fix For: 2.6.0

 Attachments: YARN-2406.1.patch, YARN-2406.2.patch


 Today most recovery related proto records are defined in 
 yarn_server_resourcemanager_service_protos.proto which is inside YARN-API 
 module. Since these records are internally used by RM only, we can move them 
 to the yarn_server_resourcemanager_recovery.proto file inside RM-server module



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2470) A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception a

2014-08-29 Thread Beckham007 (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115011#comment-14115011
 ] 

Beckham007 commented on YARN-2470:
--

Could you give some logs about this?

 A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager 
 to crash. Slider needs this value to be high. Setting a very high value 
 throws an exception and nodemanager does not start
 --

 Key: YARN-2470
 URL: https://issues.apache.org/jira/browse/YARN-2470
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Shivaji Dutta
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2280) Resource manager web service fields are not accessible

2014-08-29 Thread Krisztian Horvath (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Horvath updated YARN-2280:


Attachment: (was: YARN-2280.patch)

 Resource manager web service fields are not accessible
 --

 Key: YARN-2280
 URL: https://issues.apache.org/jira/browse/YARN-2280
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0, 2.4.1
Reporter: Krisztian Horvath
Assignee: Krisztian Horvath
Priority: Minor
 Fix For: 2.6.0

 Attachments: YARN-2280.patch


 Using the resource manager's REST API 
 (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some 
 REST calls return a class whose fields cannot be accessed after 
 unmarshalling, for example SchedulerTypeInfo -> schedulerInfo. Using the same 
 classes on the client side, these fields are only accessible via reflection.
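
A minimal sketch of the accessibility gap, with a hypothetical JAXB DAO standing in for the real SchedulerTypeInfo, is:
{code}
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

// Hedged sketch only: the field is populated by JAXB on unmarshal, but a
// client cannot read it without a public accessor (or reflection).
@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
class SchedulerTypeInfoExample {
  protected String schedulerInfo; // hypothetical field, illustrative type

  // Adding a public getter is the kind of change the description asks for.
  public String getSchedulerInfo() {
    return schedulerInfo;
  }
}
{code}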



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2280) Resource manager web service fields are not accessible

2014-08-29 Thread Krisztian Horvath (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Horvath updated YARN-2280:


Attachment: YARN-2280.patch

 Resource manager web service fields are not accessible
 --

 Key: YARN-2280
 URL: https://issues.apache.org/jira/browse/YARN-2280
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0, 2.4.1
Reporter: Krisztian Horvath
Assignee: Krisztian Horvath
Priority: Minor
 Fix For: 2.6.0

 Attachments: YARN-2280.patch


 Using the resource manager's REST API 
 (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some 
 REST calls return a class whose fields cannot be accessed after 
 unmarshalling, for example SchedulerTypeInfo -> schedulerInfo. Using the same 
 classes on the client side, these fields are only accessible via reflection.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2280) Resource manager web service fields are not accessible

2014-08-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115081#comment-14115081
 ] 

Hadoop QA commented on YARN-2280:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12665305/YARN-2280.patch
  against trunk revision 4ae8178.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4769//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4769//console

This message is automatically generated.

 Resource manager web service fields are not accessible
 --

 Key: YARN-2280
 URL: https://issues.apache.org/jira/browse/YARN-2280
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0, 2.4.1
Reporter: Krisztian Horvath
Assignee: Krisztian Horvath
Priority: Minor
 Fix For: 2.6.0

 Attachments: YARN-2280.patch


 Using the resource manager's REST API 
 (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some 
 REST calls return a class whose fields cannot be accessed after 
 unmarshalling, for example SchedulerTypeInfo -> schedulerInfo. Using the same 
 classes on the client side, these fields are only accessible via reflection.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set

2014-08-29 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2449:


Attachment: apache-yarn-2449.1.patch

Uploaded a new patch addressing [~zjshen]'s comments.

 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 ---

 Key: YARN-2449
 URL: https://issues.apache.org/jira/browse/YARN-2449
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
 Environment: Security-enabled cluster with ATS also enabled and running, but 
 no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh
Assignee: Varun Vasudev
Priority: Critical
 Attachments: apache-yarn-2449.0.patch, apache-yarn-2449.1.patch


 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 Looks like it is a regression from YARN-2397.
 After YARN-2397, when no hadoop.http.filter.initializers is set and we try to 
 fetch a DELEGATION token from the timelineserver, it returns an invalid 
 token.
 Tried to fetch the timeline delegation token using curl commands:
 {code}
 1. curl -i -k -s -b 'timeline-cookie.txt' 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
 Or
 2. curl -i -k -s --negotiate -u : 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
 {code}
 The response for both queries is:
 {code}
 {"About":"Timeline API"}
 {code}
 Whereas before YARN-2397, or if you set hadoop.http.filter.initializers to 
 TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, 
 the first query returned a DT and the second used to fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2406) Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto

2014-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115142#comment-14115142
 ] 

Hudson commented on YARN-2406:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #663 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/663/])
YARN-2406. Move RM recovery related proto to 
yarn_server_resourcemanager_recovery.proto. Contributed by Tsuyoshi OZAWA 
(jianhe: rev 7b3e27ab7393214e35a575bc9093100e94dd8c89)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationStateData.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationAttemptStateDataPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/EpochPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationAttemptStateData.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationStateDataPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/Epoch.java
Add CHANGES.txt for YARN-2406. (jianhe: rev 
9d68445710feff9fda9ee69847beeaf3e99b85ef)
* hadoop-yarn-project/CHANGES.txt


 Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto
 

 Key: YARN-2406
 URL: https://issues.apache.org/jira/browse/YARN-2406
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
 Fix For: 2.6.0

 Attachments: YARN-2406.1.patch, YARN-2406.2.patch


 Today most recovery related proto records are defined in 
 yarn_server_resourcemanager_service_protos.proto which is inside YARN-API 
 module. Since these records are internally used by RM only, we can move them 
 to the yarn_server_resourcemanager_recovery.proto file inside RM-server module



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2405) NPE in FairSchedulerAppsBlock

2014-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115144#comment-14115144
 ] 

Hudson commented on YARN-2405:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #663 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/663/])
YARN-2405. NPE in FairSchedulerAppsBlock. (Tsuyoshi Ozawa via kasha) (kasha: 
rev fa80ca49bdd741823ff012ddbd7a0f1aecf26195)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebAppFairScheduler.java
* hadoop-yarn-project/CHANGES.txt


 NPE in FairSchedulerAppsBlock
 -

 Key: YARN-2405
 URL: https://issues.apache.org/jira/browse/YARN-2405
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Tsuyoshi OZAWA
 Fix For: 2.6.0

 Attachments: YARN-2405.1.patch, YARN-2405.2.patch, YARN-2405.3.patch, 
 YARN-2405.4.patch


 FairSchedulerAppsBlock#render throws NPE at this line
 {code}
   int fairShare = fsinfo.getAppFairShare(attemptId);
 {code}
 This causes the scheduler page to not show the apps, since it lacks the 
 definition of appsTableData
 {code}
  Uncaught ReferenceError: appsTableData is not defined 
 {code}
 The problem is temporary, meaning that it usually resolves by itself either 
 after a retry or after a few hours.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2405) NPE in FairSchedulerAppsBlock

2014-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115257#comment-14115257
 ] 

Hudson commented on YARN-2405:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1854 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1854/])
YARN-2405. NPE in FairSchedulerAppsBlock. (Tsuyoshi Ozawa via kasha) (kasha: 
rev fa80ca49bdd741823ff012ddbd7a0f1aecf26195)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebAppFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerInfo.java


 NPE in FairSchedulerAppsBlock
 -

 Key: YARN-2405
 URL: https://issues.apache.org/jira/browse/YARN-2405
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Tsuyoshi OZAWA
 Fix For: 2.6.0

 Attachments: YARN-2405.1.patch, YARN-2405.2.patch, YARN-2405.3.patch, 
 YARN-2405.4.patch


 FairSchedulerAppsBlock#render throws NPE at this line
 {code}
   int fairShare = fsinfo.getAppFairShare(attemptId);
 {code}
 This causes the scheduler page to not show the apps, since it lacks the 
 definition of appsTableData
 {code}
  Uncaught ReferenceError: appsTableData is not defined 
 {code}
 The problem is temporary, meaning that it usually resolves by itself either 
 after a retry or after a few hours.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2406) Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto

2014-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115255#comment-14115255
 ] 

Hudson commented on YARN-2406:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1854 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1854/])
YARN-2406. Move RM recovery related proto to 
yarn_server_resourcemanager_recovery.proto. Contributed by Tsuyoshi OZAWA 
(jianhe: rev 7b3e27ab7393214e35a575bc9093100e94dd8c89)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationAttemptStateData.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/EpochPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationStateDataPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationAttemptStateDataPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/Epoch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationStateData.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
Add CHANGES.txt for YARN-2406. (jianhe: rev 
9d68445710feff9fda9ee69847beeaf3e99b85ef)
* hadoop-yarn-project/CHANGES.txt


 Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto
 

 Key: YARN-2406
 URL: https://issues.apache.org/jira/browse/YARN-2406
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
 Fix For: 2.6.0

 Attachments: YARN-2406.1.patch, YARN-2406.2.patch


 Today most recovery related proto records are defined in 
 yarn_server_resourcemanager_service_protos.proto which is inside YARN-API 
 module. Since these records are internally used by RM only, we can move them 
 to the yarn_server_resourcemanager_recovery.proto file inside RM-server module



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Moved] (YARN-2471) DEFAULT_YARN_APPLICATION_CLASSPATH doesn't honor hadoop-layout.sh

2014-08-29 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer moved HADOOP-11024 to YARN-2471:
-

Key: YARN-2471  (was: HADOOP-11024)
Project: Hadoop YARN  (was: Hadoop Common)

 DEFAULT_YARN_APPLICATION_CLASSPATH doesn't honor hadoop-layout.sh
 -

 Key: YARN-2471
 URL: https://issues.apache.org/jira/browse/YARN-2471
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Allen Wittenauer

 In 0.21, hadoop-layout.sh was introduced to allow vendors to reorganize 
 the Hadoop distribution in a way that pleases them.  
 DEFAULT_YARN_APPLICATION_CLASSPATH hard-codes the paths that hadoop-layout.sh 
 was meant to override.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2470) A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception a

2014-08-29 Thread Shivaji Dutta (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115324#comment-14115324
 ] 

Shivaji Dutta commented on YARN-2470:
-

2014-08-27 23:37:30,566 INFO  service.AbstractService 
(AbstractService.java:noteFailure(272)) - Service 
org.apache.hadoop.yarn.server.nodemanager.DeletionService failed in state 
INITED; cause: java.lang.NumberFormatException: For input string: 36
java.lang.NumberFormatException: For input string: 36
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:495)
at java.lang.Integer.parseInt(Integer.java:527)
at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1094)
at 
org.apache.hadoop.yarn.server.nodemanager.DeletionService.serviceInit(DeletionService.java:105)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:186)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:357)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:404)
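
A minimal illustration of the failure mode in the stack trace above, using a hypothetical oversized value (the exact configured value is not shown in the log), is:
{code}
// Hedged sketch only: Configuration.getInt() ends up in Integer.parseInt(),
// so any delete.debug-delay-sec value beyond Integer.MAX_VALUE (2147483647)
// seconds fails with a NumberFormatException during serviceInit().
public class DelayParseDemo {
  public static void main(String[] args) {
    String configuredDelay = "3600000000000"; // hypothetical, larger than Integer.MAX_VALUE
    try {
      int delaySec = Integer.parseInt(configuredDelay);
      System.out.println("Parsed delay: " + delaySec);
    } catch (NumberFormatException e) {
      System.out.println("NumberFormatException: " + e.getMessage());
    }
  }
}
{code}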

 A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager 
 to crash. Slider needs this value to be high. Setting a very high value 
 throws an exception and nodemanager does not start
 --

 Key: YARN-2470
 URL: https://issues.apache.org/jira/browse/YARN-2470
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Shivaji Dutta
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2470) A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception a

2014-08-29 Thread Shivaji Dutta (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115326#comment-14115326
 ] 

Shivaji Dutta commented on YARN-2470:
-

The number is obscenely high, since I was experimenting with it. I used Ambari 
to set this value. Ambari should have at least given me a warning for this.

 A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager 
 to crash. Slider needs this value to be high. Setting a very high value 
 throws an exception and nodemanager does not start
 --

 Key: YARN-2470
 URL: https://issues.apache.org/jira/browse/YARN-2470
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Shivaji Dutta
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2470) A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception a

2014-08-29 Thread Shivaji Dutta (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115334#comment-14115334
 ] 

Shivaji Dutta commented on YARN-2470:
-

I have filed an Ambari issue for validating the field: AMBARI-7082.

 A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager 
 to crash. Slider needs this value to be high. Setting a very high value 
 throws an exception and nodemanager does not start
 --

 Key: YARN-2470
 URL: https://issues.apache.org/jira/browse/YARN-2470
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Shivaji Dutta
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-08-29 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-913:


  Component/s: resourcemanager
 Target Version/s: 2.6.0
Affects Version/s: (was: 3.0.0)
   2.5.0
   2.4.1
 Assignee: Steve Loughran  (was: Robert Joseph Evans)

 Add a way to register long-lived services in a YARN cluster
 ---

 Key: YARN-913
 URL: https://issues.apache.org/jira/browse/YARN-913
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Affects Versions: 2.5.0, 2.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: RegistrationServiceDetails.txt


 In a YARN cluster you can't predict where services will come up, or on what 
 ports. The services need to work those things out as they come up and then 
 publish them somewhere.
 Applications need to be able to find the service instance they are to bond to, 
 and not any others in the cluster.
 Some kind of service registry, in the RM or in ZK, could do this. If the RM 
 held the write access to the ZK nodes, it would be more secure than having 
 apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2472) yarn-daemons.sh should just call yarn directly

2014-08-29 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created YARN-2472:
--

 Summary: yarn-daemons.sh should just call yarn directly
 Key: YARN-2472
 URL: https://issues.apache.org/jira/browse/YARN-2472
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Allen Wittenauer


There is little-to-no need for it to go through yarn-daemon.sh anymore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set

2014-08-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115415#comment-14115415
 ] 

Zhijie Shen commented on YARN-2449:
---

+1, will commit the latter patch.

 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 ---

 Key: YARN-2449
 URL: https://issues.apache.org/jira/browse/YARN-2449
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
 Environment: Deploy security enabled cluster is ATS also enabled and 
 running, but no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh
Assignee: Varun Vasudev
Priority: Critical
 Attachments: apache-yarn-2449.0.patch, apache-yarn-2449.1.patch


 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 Looks in it is regression from YARN-2397
 After YARN-2397. when no hadoop.http.filter.initializers is set
 Now when try fetch DELEGATION token from timelineserver, it returns invalid 
 token
 Tried to fetch timeline delegation by using curl commands :
 {code}
 1. curl -i -k -s -b 'timeline-cookie.txt' 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=hrt_qa'
 Or
 2. curl -i -k -s --negotiate -u : 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=test_user'
 {code}
 Return response is for both queries: 
 {code}
 {About:Timeline API}
 {code}
 Whereas before YARN-2397 or if you set hadoop.http.filter.initializers = 
 TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer
 First query returns DT and Second used to fail



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2406) Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto

2014-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115427#comment-14115427
 ] 

Hudson commented on YARN-2406:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1880 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1880/])
YARN-2406. Move RM recovery related proto to 
yarn_server_resourcemanager_recovery.proto. Contributed by Tsuyoshi OZAWA 
(jianhe: rev 7b3e27ab7393214e35a575bc9093100e94dd8c89)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationStateDataPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationAttemptStateDataPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationAttemptStateData.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/Epoch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/EpochPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationStateData.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
Add CHANGES.txt for YARN-2406. (jianhe: rev 
9d68445710feff9fda9ee69847beeaf3e99b85ef)
* hadoop-yarn-project/CHANGES.txt


 Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto
 

 Key: YARN-2406
 URL: https://issues.apache.org/jira/browse/YARN-2406
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
 Fix For: 2.6.0

 Attachments: YARN-2406.1.patch, YARN-2406.2.patch


 Today most recovery related proto records are defined in 
 yarn_server_resourcemanager_service_protos.proto which is inside YARN-API 
 module. Since these records are internally used by RM only, we can move them 
 to the yarn_server_resourcemanager_recovery.proto file inside RM-server module



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2405) NPE in FairSchedulerAppsBlock

2014-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115429#comment-14115429
 ] 

Hudson commented on YARN-2405:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1880 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1880/])
YARN-2405. NPE in FairSchedulerAppsBlock. (Tsuyoshi Ozawa via kasha) (kasha: 
rev fa80ca49bdd741823ff012ddbd7a0f1aecf26195)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebAppFairScheduler.java


 NPE in FairSchedulerAppsBlock
 -

 Key: YARN-2405
 URL: https://issues.apache.org/jira/browse/YARN-2405
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Tsuyoshi OZAWA
 Fix For: 2.6.0

 Attachments: YARN-2405.1.patch, YARN-2405.2.patch, YARN-2405.3.patch, 
 YARN-2405.4.patch


 FairSchedulerAppsBlock#render throws NPE at this line
 {code}
   int fairShare = fsinfo.getAppFairShare(attemptId);
 {code}
 This causes the scheduler page to not show the apps, since it lacks the 
 definition of appsTableData
 {code}
  Uncaught ReferenceError: appsTableData is not defined 
 {code}
 The problem is temporary, meaning that it usually resolves by itself either 
 after a retry or after a few hours.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic

2014-08-29 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115447#comment-14115447
 ] 

Carlo Curino commented on YARN-1707:


[~jianhe] that is expected. As I was saying in one of the [early comments | 
https://issues.apache.org/jira/browse/YARN-1707?focusedCommentId=14075076&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14075076]
 we are cutting YARN-1051 into several smaller patches for ease of reviewing, 
but we are not trying to make each patch work as standalone (too many 
dependencies, and a bit of a waste of time, as they will not be valuable 
independently). So the fact that it doesn't compile is expected. We mark them 
as patch available to signal they are ready to be reviewed.

[~wangda]: we have implemented the getDisplayName alternative I mentioned 
above, and we are in the process of testing it. We will post an updated patch 
soon (again, not necessarily a stand-alone one).

Thanks again to both of you for quick rounds of review and insightful comments.


 Making the CapacityScheduler more dynamic
 -

 Key: YARN-1707
 URL: https://issues.apache.org/jira/browse/YARN-1707
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler
 Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, 
 YARN-1707.patch


 The CapacityScheduler is rather static at the moment, and refreshqueue 
 provides a rather heavy-handed way to reconfigure it. Moving towards 
 long-running services (tracked in YARN-896) and to enable more advanced 
 admission control and resource parcelling, we need to make the 
 CapacityScheduler more dynamic. This is instrumental to the umbrella jira 
 YARN-1051.
 Concretely this requires the following changes:
 * create queues dynamically
 * destroy queues dynamically
 * dynamically change queue parameters (e.g., capacity) 
 * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% 
 instead of == 100% (see the sketch after this list)
 We limit this to LeafQueues. 
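
A minimal sketch of the relaxed check, with illustrative names and tolerance (not the actual CapacityScheduler code), is:
{code}
// Hedged sketch only: allow child capacities to sum to at most 100% so queues
// can be added or removed dynamically without re-balancing every sibling.
class ChildCapacityValidator {
  static final float EPSILON = 0.001f;

  static void validate(float[] childCapacities) {
    float sum = 0f;
    for (float capacity : childCapacities) {
      sum += capacity;
    }
    if (sum > 100f + EPSILON) {
      throw new IllegalArgumentException(
          "Sum of child queue capacities exceeds 100%: " + sum);
    }
  }
}
{code}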



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2473) YARN never cleans up container directories from a full disk

2014-08-29 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-2473:


 Summary: YARN never cleans up container directories from a full 
disk
 Key: YARN-2473
 URL: https://issues.apache.org/jira/browse/YARN-2473
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Priority: Blocker


After YARN-1781 when a container ends up filling a local disk the nodemanager 
will mark it as a bad disk and remove it from the list of good local dirs.  
When the container eventually completes the files that filled the disk will not 
be removed because the NM thinks the directory is bad.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks

2014-08-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115534#comment-14115534
 ] 

Jason Lowe commented on YARN-1781:
--

We've run into situations where this new behavior results in disks that end up 
being filled by containers remaining full and never recovering.  See YARN-2473.

YARN-90 won't help much in this case because the files that filled the disk 
won't be deleted.  Prior to this change the disks would auto-recover when the 
container completed, so this is a significant regression.

 NM should allow users to specify max disk utilization for local disks
 -

 Key: YARN-1781
 URL: https://issues.apache.org/jira/browse/YARN-1781
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.4.0

 Attachments: apache-yarn-1781.0.patch, apache-yarn-1781.1.patch, 
 apache-yarn-1781.2.patch, apache-yarn-1781.3.patch, apache-yarn-1781.4.patch


 This is related to YARN-257 (it's probably a sub-task?). Currently, the NM 
 does not detect full disks and allows full disks to be used by containers, 
 leading to repeated failures. YARN-257 deals with graceful handling of full 
 disks. This ticket is only about detection of full disks by the disk health 
 checkers.
 The NM should allow users to set a maximum disk utilization for local disks 
 and mark disks as bad once they exceed that utilization. At the very least, 
 the NM should detect full disks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2014-08-29 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1506:
--

Attachment: YARN-1506-v17.patch

 Replace set resource change on RMNode/SchedulerNode directly with event 
 notification.
 -

 Key: YARN-1506
 URL: https://issues.apache.org/jira/browse/YARN-1506
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, 
 YARN-1506-v11.patch, YARN-1506-v12.patch, YARN-1506-v13.patch, 
 YARN-1506-v14.patch, YARN-1506-v15.patch, YARN-1506-v16.patch, 
 YARN-1506-v17.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, 
 YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, 
 YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch


 According to Vinod's comments on YARN-312 
 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
  we should replace RMNode.setResourceOption() with some resource change event.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2473) YARN never cleans up container directories from a full disk

2014-08-29 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115540#comment-14115540
 ] 

Varun Vasudev commented on YARN-2473:
-

[~jlowe] are you going to work on this? I can take it up if it's fine by you.

 YARN never cleans up container directories from a full disk
 ---

 Key: YARN-2473
 URL: https://issues.apache.org/jira/browse/YARN-2473
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Priority: Blocker

 After YARN-1781 when a container ends up filling a local disk the nodemanager 
 will mark it as a bad disk and remove it from the list of good local dirs.  
 When the container eventually completes the files that filled the disk will 
 not be removed because the NM thinks the directory is bad.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2014-08-29 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115543#comment-14115543
 ] 

Jian He commented on YARN-1506:
---

bq.  In AdminService: we may updateNodeResource only if node resource changes?
I think this may not be accurate, as the previous update event may still be in 
transit. Updated the patch myself with this change reverted.

 Replace set resource change on RMNode/SchedulerNode directly with event 
 notification.
 -

 Key: YARN-1506
 URL: https://issues.apache.org/jira/browse/YARN-1506
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, 
 YARN-1506-v11.patch, YARN-1506-v12.patch, YARN-1506-v13.patch, 
 YARN-1506-v14.patch, YARN-1506-v15.patch, YARN-1506-v16.patch, 
 YARN-1506-v17.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, 
 YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, 
 YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch


 According to Vinod's comments on YARN-312 
 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
  we should replace RMNode.setResourceOption() with some resource change event.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2473) YARN never cleans up container directories from a full disk

2014-08-29 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115556#comment-14115556
 ] 

Varun Vasudev commented on YARN-2473:
-

My apologies for missing this when I put up the patch for YARN-1781

 YARN never cleans up container directories from a full disk
 ---

 Key: YARN-2473
 URL: https://issues.apache.org/jira/browse/YARN-2473
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Priority: Blocker

 After YARN-1781 when a container ends up filling a local disk the nodemanager 
 will mark it as a bad disk and remove it from the list of good local dirs.  
 When the container eventually completes the files that filled the disk will 
 not be removed because the NM thinks the directory is bad.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2473) YARN never cleans up container directories from a full disk

2014-08-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115580#comment-14115580
 ] 

Jason Lowe commented on YARN-2473:
--

No worries, Varun, we all missed it. ;-)

We may need to track full disks separately from bad disks so we can know 
whether or not it's OK to try to delete a container directory from a particular 
disk that isn't a known good disk.  I'm hesitant to have the NM try to remove 
container directories even from bad disks since touching them can cause a 
very long pause for the thread that did it.
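
A minimal sketch of that separation, with hypothetical names (not NM code), is:
{code}
import java.util.HashMap;
import java.util.Map;

// Hedged sketch only: FULL disks stay eligible for cleanup of completed
// containers, while BAD disks are left untouched to avoid long pauses.
class LocalDirTracker {
  enum DirState { GOOD, FULL, BAD }

  private final Map<String, DirState> dirs = new HashMap<String, DirState>();

  void markFull(String dir) { dirs.put(dir, DirState.FULL); }
  void markBad(String dir) { dirs.put(dir, DirState.BAD); }

  boolean isDeletionAllowed(String dir) {
    DirState state = dirs.containsKey(dir) ? dirs.get(dir) : DirState.GOOD;
    return state != DirState.BAD;
  }
}
{code}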

 YARN never cleans up container directories from a full disk
 ---

 Key: YARN-2473
 URL: https://issues.apache.org/jira/browse/YARN-2473
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Priority: Blocker

 After YARN-1781 when a container ends up filling a local disk the nodemanager 
 will mark it as a bad disk and remove it from the list of good local dirs.  
 When the container eventually completes the files that filled the disk will 
 not be removed because the NM thinks the directory is bad.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2473) YARN never cleans up container directories from a full disk

2014-08-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned YARN-2473:


Assignee: Varun Vasudev

 YARN never cleans up container directories from a full disk
 ---

 Key: YARN-2473
 URL: https://issues.apache.org/jira/browse/YARN-2473
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Varun Vasudev
Priority: Blocker

 After YARN-1781 when a container ends up filling a local disk the nodemanager 
 will mark it as a bad disk and remove it from the list of good local dirs.  
 When the container eventually completes the files that filled the disk will 
 not be removed because the NM thinks the directory is bad.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2470) A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception a

2014-08-29 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115581#comment-14115581
 ] 

Chris Douglas commented on YARN-2470:
-

Failing to start is the correct behavior; that timeout is not valid. Is your 
intent to disable cleanup entirely?

 A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager 
 to crash. Slider needs this value to be high. Setting a very high value 
 throws an exception and nodemanager does not start
 --

 Key: YARN-2470
 URL: https://issues.apache.org/jira/browse/YARN-2470
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Shivaji Dutta
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-08-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115586#comment-14115586
 ] 

Hadoop QA commented on YARN-2459:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12665124/YARN-2459.3.patch
  against trunk revision 4bd0194.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4771//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4771//console

This message is automatically generated.

 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch


 If RM HA is enabled and the ZooKeeper store is used for the RM state store,
 and an app gets rejected for any reason and goes directly from NEW to FAILED,
 then the final transition adds it to the RMApps and completed-apps in-memory 
 structures, but it never makes it to the state store.
 Now when the RMApps default limit is reached, the RM starts deleting apps from 
 memory and from the store. It then tries to delete this app from the store and 
 fails, which causes the RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2450) Fix typos in log messages

2014-08-29 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115606#comment-14115606
 ] 

Hitesh Shah commented on YARN-2450:
---

+1. Committing shortly. 

 Fix typos in log messages
 -

 Key: YARN-2450
 URL: https://issues.apache.org/jira/browse/YARN-2450
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Trivial
  Labels: newbie
 Attachments: YARN-2450-01.patch


 There are a bunch of typos in log messages.  HADOOP-10946 was initially 
 created, but may have failed due to being in multiple components.  Try fixing 
 typos on a per-component basis.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2447) RM web services app submission doesn't pass secrets correctly

2014-08-29 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115630#comment-14115630
 ] 

Jian He commented on YARN-2447:
---

looks good. committing

 RM web services app submission doesn't pass secrets correctly
 -

 Key: YARN-2447
 URL: https://issues.apache.org/jira/browse/YARN-2447
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2447.0.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2014-08-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115637#comment-14115637
 ] 

Hadoop QA commented on YARN-1506:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12665373/YARN-1506-v17.patch
  against trunk revision 4bd0194.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4772//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4772//console

This message is automatically generated.

 Replace set resource change on RMNode/SchedulerNode directly with event 
 notification.
 -

 Key: YARN-1506
 URL: https://issues.apache.org/jira/browse/YARN-1506
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, 
 YARN-1506-v11.patch, YARN-1506-v12.patch, YARN-1506-v13.patch, 
 YARN-1506-v14.patch, YARN-1506-v15.patch, YARN-1506-v16.patch, 
 YARN-1506-v17.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, 
 YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, 
 YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch


 According to Vinod's comments on YARN-312 
 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
  we should replace RMNode.setResourceOption() with some resource change event.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2447) RM web services app submission doesn't pass secrets correctly

2014-08-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115663#comment-14115663
 ] 

Hadoop QA commented on YARN-2447:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12664088/apache-yarn-2447.0.patch
  against trunk revision 4bd0194.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4773//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4773//console

This message is automatically generated.

 RM web services app submission doesn't pass secrets correctly
 -

 Key: YARN-2447
 URL: https://issues.apache.org/jira/browse/YARN-2447
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.6.0

 Attachments: apache-yarn-2447.0.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2450) Fix typos in log messages

2014-08-29 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115688#comment-14115688
 ] 

Ray Chiang commented on YARN-2450:
--

Great.  Thanks!

 Fix typos in log messages
 -

 Key: YARN-2450
 URL: https://issues.apache.org/jira/browse/YARN-2450
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Trivial
  Labels: newbie
 Fix For: 2.6.0

 Attachments: YARN-2450-01.patch


 There are a bunch of typos in log messages.  HADOOP-10946 was initially 
 created, but may have failed due to being in multiple components.  Try fixing 
 typos on a per-component basis.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue

2014-08-29 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2395:
--

Attachment: YARN-2395-4.patch

Uploaded a new patch to address the backward-compatibility issue.

 FairScheduler: Preemption timeout should be configurable per queue
 --

 Key: YARN-2395
 URL: https://issues.apache.org/jira/browse/YARN-2395
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
 Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch, 
 YARN-2395-3.patch, YARN-2395-4.patch


 Currently in fair scheduler, the preemption logic considers fair share 
 starvation only at leaf queue level. This jira is created to implement it at 
 the parent queue as well.
 It involves :
 1. Making check for fair share starvation and amount of resource to 
 preempt  recursive such that they traverse the queue hierarchy from root to 
 leaf.
 2. Currently fairSharePreemptionTimeout is a global config. We could make it 
 configurable on a per queue basis,so that we can specify different timeouts 
 for parent queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2462) TestNodeManagerResync#testBlockNewContainerRequestsOnStartAndResync should have a test timeout

2014-08-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115736#comment-14115736
 ] 

Jason Lowe commented on YARN-2462:
--

+1 lgtm.  Committing this.

 TestNodeManagerResync#testBlockNewContainerRequestsOnStartAndResync should 
 have a test timeout
 --

 Key: YARN-2462
 URL: https://issues.apache.org/jira/browse/YARN-2462
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Jason Lowe
Assignee: Eric Payne
  Labels: newbie
 Attachments: YARN-2462.201408281422.txt, YARN-2462.201408281427.txt


 TestNodeManagerResync#testBlockNewContainerRequestsOnStartAndResync can hang 
 indefinitely and should have a test timeout.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-08-29 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2198:
---

Attachment: YARN-2198.3.patch

This 3.patch addresses the code review feedback. It also adds the separate 
etc/hadoop/wsce-site.xml configuration for winutils (the location and file 
name are configured from hadoop-common's pom.xml). While at it I fixed 
winutils/libwinutils to use 'target/winutils' as the intermediate build path 
and removed the hardcoded '../../../target/bin' output path and the like, 
using msbuild params passed from pom.xml instead.

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.separation.patch


 YARN-1972 introduces a Secure Windows Container Executor. However this 
 executor requires the process launching the container to be LocalSystem or 
 a member of the local Administrators group. Since the process in question 
 is the NodeManager, the requirement translates to the entire NM running as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface exposed to the high privileges. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
 be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
 specific inter-process communication channel that satisfies all requirements 
 and is easy to deploy. The privileged NT service would register and listen on 
 an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
 with libwinutils which would host the LPC client code. The client would 
 connect to the LPC port (NtConnectPort) and send a message requesting a 
 container launch (NtRequestWaitReplyPort). LPC provides authentication and 
 the privileged NT service can use authorization API (AuthZ) to validate the 
 caller.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2474) document the wsce-site.xml keys in hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm

2014-08-29 Thread Remus Rusanu (JIRA)
Remus Rusanu created YARN-2474:
--

 Summary: document the wsce-site.xml keys in 
hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm
 Key: YARN-2474
 URL: https://issues.apache.org/jira/browse/YARN-2474
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu
Priority: Critical


document the keys used to configure WSCE 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue

2014-08-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115797#comment-14115797
 ] 

Hadoop QA commented on YARN-2395:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12665397/YARN-2395-4.patch
  against trunk revision b1dce2a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4774//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4774//console

This message is automatically generated.

 FairScheduler: Preemption timeout should be configurable per queue
 --

 Key: YARN-2395
 URL: https://issues.apache.org/jira/browse/YARN-2395
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
 Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch, 
 YARN-2395-3.patch, YARN-2395-4.patch


 Currently in fair scheduler, the preemption logic considers fair share 
 starvation only at leaf queue level. This jira is created to implement it at 
 the parent queue as well.
 It involves :
 1. Making check for fair share starvation and amount of resource to 
 preempt  recursive such that they traverse the queue hierarchy from root to 
 leaf.
 2. Currently fairSharePreemptionTimeout is a global config. We could make it 
 configurable on a per queue basis,so that we can specify different timeouts 
 for parent queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem

2014-08-29 Thread Subramaniam Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subramaniam Krishnan updated YARN-1709:
---

Attachment: YARN-1709.patch

Updating the patch as a result of API changes based on [~vinodkv] [feedback 
|https://issues.apache.org/jira/browse/YARN-1708?focusedCommentId=14112669] on 
YARN-1708.

 Admission Control: Reservation subsystem
 

 Key: YARN-1709
 URL: https://issues.apache.org/jira/browse/YARN-1709
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Subramaniam Krishnan
 Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch


 This JIRA is about the key data structure used to track resources over time 
 to enable YARN-1051. The Reservation subsystem is conceptually a plan of 
 how the scheduler will allocate resources over-time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-29 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2360:
---

Attachment: yarn-2360-6.patch

Patch looks good to me. Uploading a patch with minor language changes. 
[~wei.yan] - does that look okay to you? 

 Fair Scheduler : Display dynamic fair share for queues on the scheduler page
 

 Key: YARN-2360
 URL: https://issues.apache.org/jira/browse/YARN-2360
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
 Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, 
 YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, 
 yarn-2360-6.patch


 Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
 share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue

2014-08-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115826#comment-14115826
 ] 

Karthik Kambatla commented on YARN-2395:


Good catch, Ashwin. I missed the backward incompatibility issue. 

 FairScheduler: Preemption timeout should be configurable per queue
 --

 Key: YARN-2395
 URL: https://issues.apache.org/jira/browse/YARN-2395
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
 Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch, 
 YARN-2395-3.patch, YARN-2395-4.patch


 Currently in fair scheduler, the preemption logic considers fair share 
 starvation only at leaf queue level. This jira is created to implement it at 
 the parent queue as well.
 It involves :
 1. Making check for fair share starvation and amount of resource to 
 preempt  recursive such that they traverse the queue hierarchy from root to 
 leaf.
 2. Currently fairSharePreemptionTimeout is a global config. We could make it 
 configurable on a per queue basis,so that we can specify different timeouts 
 for parent queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-29 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115830#comment-14115830
 ] 

Wei Yan commented on YARN-2360:
---

Thanks, Karthik. LGTM.

 Fair Scheduler : Display dynamic fair share for queues on the scheduler page
 

 Key: YARN-2360
 URL: https://issues.apache.org/jira/browse/YARN-2360
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
 Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, 
 YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, 
 yarn-2360-6.patch


 Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
 share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager

2014-08-29 Thread Subramaniam Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subramaniam Krishnan updated YARN-2080:
---

Attachment: YARN-2080.patch

Uploading a new patch that adds a scheduler agnostic AbstractReservationSystem 
which is extended by the CapacityReservationSystem scheduler configuration as 
suggested by [~kasha]. CapacityReservationSystem essentially just loads configs 
from capacity scheduler xml. Attempted to converge this with Fair Scheduler as 
part of YARN-2386 but figured that it was not feasible.

It has also minor changes as a result of API changes based on [~vinodkv]  
[feedback | 
https://issues.apache.org/jira/browse/YARN-1708?focusedCommentId=14112669] on 
YARN-1708.
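
A hypothetical sketch of the split described above (class and method names are illustrative, not the actual patch): the scheduler-agnostic base class owns the common wiring and delegates configuration loading to a scheduler-specific subclass such as the capacity-scheduler flavour.

{code}
// Hypothetical illustration only, not the patch itself.
abstract class AbstractReservationSystemSketch {
  protected String planQueueName;

  public void reinitialize() {
    // scheduler-agnostic wiring (plans, planner, agent, ...) would live here
    this.planQueueName = loadPlanQueueName();
  }

  /** Each scheduler flavour loads settings from its own configuration file. */
  protected abstract String loadPlanQueueName();
}

class CapacityReservationSystemSketch extends AbstractReservationSystemSketch {
  @Override
  protected String loadPlanQueueName() {
    // a CapacityScheduler-backed implementation would read capacity-scheduler.xml here
    return "root.reservations";
  }
}
{code}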

 Admission Control: Integrate Reservation subsystem with ResourceManager
 ---

 Key: YARN-2080
 URL: https://issues.apache.org/jira/browse/YARN-2080
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Subramaniam Krishnan
Assignee: Subramaniam Krishnan
 Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch


 This JIRA tracks the integration of Reservation subsystem data structures 
 introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring 
 of YARN-1051.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-08-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115851#comment-14115851
 ] 

Vinod Kumar Vavilapalli commented on YARN-2459:
---

Can we please add two more tests for future proofing this?
 - Add one in TestRMRestart to get an app rejected and make sure that the 
final-status gets recorded
 - Another one in RMStateStoreTestBase to ensure it is okay to have an 
updateApp call without a storeApp call like in this case (a minimal sketch of 
the idea follows below).
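
A minimal sketch of the second test idea, using a hypothetical store rather than the real RMStateStoreTestBase API: an update without a preceding store should behave as an insert, so a NEW-to-FAILED app can still record its final status.

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; names do not reflect the real RMStateStore API.
class UpsertingStateStoreSketch {
  private final Map<String, String> appState = new HashMap<>();

  void updateApplication(String appId, String finalStatus) {
    // treat update-without-store as an insert instead of failing
    appState.put(appId, finalStatus);
  }

  public static void main(String[] args) {
    UpsertingStateStoreSketch store = new UpsertingStateStoreSketch();
    store.updateApplication("application_1409_0001", "FAILED"); // no prior store call
    assert store.appState.containsKey("application_1409_0001");
  }
}
{code}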

 RM crashes if App gets rejected for any reason and HA is enabled
 

 Key: YARN-2459
 URL: https://issues.apache.org/jira/browse/YARN-2459
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch


 If RM HA is enabled and the ZooKeeper store is used for the RM state store,
 and an app gets rejected for any reason and goes directly from NEW to FAILED,
 then the final transition adds it to the RMApps and completed-apps in-memory 
 structures, but it never makes it to the state store.
 Now when the RMApps default limit is reached, the RM starts deleting apps from 
 memory and from the store. It then tries to delete this app from the store and 
 fails, which causes the RM to crash.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue

2014-08-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115854#comment-14115854
 ] 

Karthik Kambatla commented on YARN-2395:


Thanks for quickly updating the patch, Wei. The patch looks mostly good, a 
couple of minor comments (sorry, I should have done a more thorough review 
earlier): 
# Instead of calling updatePreemptionTimeouts() in FairScheduler multiple 
times, we should probably call it in QueueManager#updateAllocationConfiguration 
once where we call recomputeSteadyShares().
# Can we augment the test (or add a new one) to verify we are not breaking 
backward compatibility with the preemptionTimeout defaults? 

 FairScheduler: Preemption timeout should be configurable per queue
 --

 Key: YARN-2395
 URL: https://issues.apache.org/jira/browse/YARN-2395
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
 Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch, 
 YARN-2395-3.patch, YARN-2395-4.patch


 Currently in fair scheduler, the preemption logic considers fair share 
 starvation only at leaf queue level. This jira is created to implement it at 
 the parent queue as well.
 It involves :
 1. Making check for fair share starvation and amount of resource to 
 preempt  recursive such that they traverse the queue hierarchy from root to 
 leaf.
 2. Currently fairSharePreemptionTimeout is a global config. We could make it 
 configurable on a per queue basis,so that we can specify different timeouts 
 for parent queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager

2014-08-29 Thread Subramaniam Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115860#comment-14115860
 ] 

Subramaniam Krishnan commented on YARN-2080:


Typo in previous comment. Read it as:

Uploading a new patch that adds a scheduler agnostic AbstractReservationSystem 
which is extended by the CapacityReservationSystem for capacity scheduler as 
suggested by [~kasha]. CapacityReservationSystem essentially just loads configs 
from capacity scheduler xml. Attempted to converge this with Fair Scheduler as 
part of YARN-2386 but figured that it was not feasible.

It has also minor changes as a result of API changes based on [~vinodkv] 
[feedback | 
https://issues.apache.org/jira/browse/YARN-1708?focusedCommentId=14112669] on 
YARN-1708.


 Admission Control: Integrate Reservation subsystem with ResourceManager
 ---

 Key: YARN-2080
 URL: https://issues.apache.org/jira/browse/YARN-2080
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Subramaniam Krishnan
Assignee: Subramaniam Krishnan
 Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch


 This JIRA tracks the integration of Reservation subsystem data structures 
 introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring 
 of YARN-1051.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2385) Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue

2014-08-29 Thread Subramaniam Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115863#comment-14115863
 ] 

Subramaniam Krishnan commented on YARN-2385:


Thanks [~sunilg] for verifying. I am fine either way, i.e. whether you want to 
take up the splitting now or later, as currently we have ensured that the 
behavior of CS & FS is consistent for _getAppsInQueue_. [~leftnoteasy], 
[~zjshen] what do you guys feel?

 Consider splitting getAppsinQueue to getRunningAppsInQueue + 
 getPendingAppsInQueue
 --

 Key: YARN-2385
 URL: https://issues.apache.org/jira/browse/YARN-2385
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler
Reporter: Subramaniam Krishnan
  Labels: abstractyarnscheduler

 Currently getAppsinQueue returns both pending & running apps. The purpose of 
 this JIRA is to explore splitting it into getRunningAppsInQueue + 
 getPendingAppsInQueue, which will provide more flexibility to callers.
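
A hypothetical sketch of the proposed split (interface name and element type are illustrative, not the existing scheduler API): the combined view can then be derived from the two finer-grained calls.

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only.
interface QueueAppsView<T> {
  List<T> getRunningAppsInQueue(String queueName);

  List<T> getPendingAppsInQueue(String queueName);

  // the current combined behaviour expressed in terms of the two new calls
  default List<T> getAppsInQueue(String queueName) {
    List<T> all = new ArrayList<>(getRunningAppsInQueue(queueName));
    all.addAll(getPendingAppsInQueue(queueName));
    return all;
  }
}
{code}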



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue

2014-08-29 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2395:
--

Attachment: YARN-2395-5.patch

Updated the patch to address Karthik's comments.
bq. Can we augment the test (or add a new one) to verify we are not breaking 
backward compatibility with the preemptionTimeout defaults?
I have already added test cases in both TestAllocationFileLoaderService and 
TestFairScheduler.
TestAllocationFileLoaderService.testBackwardsCompatibleAllocationFileParsing().
{code}
// Set fair share preemption timeout to 5 minutes
out.println("<fairSharePreemptionTimeout>300</fairSharePreemptionTimeout>");
out.println("</allocations>");
{code}
TestFairScheduler.testBackwardsCompatiblePreemptionConfiguration().
{code}
out.print("<defaultMinSharePreemptionTimeout>15</defaultMinSharePreemptionTimeout>");
out.print("<defaultFairSharePreemptionTimeout>25</defaultFairSharePreemptionTimeout>");
out.print("<fairSharePreemptionTimeout>30</fairSharePreemptionTimeout>");
out.println("</allocations>");
{code}
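
For illustration only, and written in the same out.print style as the tests above: one plausible shape for a per-queue fairSharePreemptionTimeout in the allocation file (the element placement is an assumption, not the final format).

{code}
import java.io.PrintWriter;

// Assumption / illustration only.
public class AllocationFileSketch {
  public static void main(String[] args) {
    PrintWriter out = new PrintWriter(System.out);
    out.println("<allocations>");
    out.println("  <queue name=\"parentA\">");
    out.println("    <fairSharePreemptionTimeout>60</fairSharePreemptionTimeout>");
    out.println("    <queue name=\"childA1\" />");
    out.println("  </queue>");
    out.println("  <defaultFairSharePreemptionTimeout>600</defaultFairSharePreemptionTimeout>");
    out.println("</allocations>");
    out.flush();
  }
}
{code}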

 FairScheduler: Preemption timeout should be configurable per queue
 --

 Key: YARN-2395
 URL: https://issues.apache.org/jira/browse/YARN-2395
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
 Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch, 
 YARN-2395-3.patch, YARN-2395-4.patch, YARN-2395-5.patch


 Currently in fair scheduler, the preemption logic considers fair share 
 starvation only at leaf queue level. This jira is created to implement it at 
 the parent queue as well.
 It involves :
 1. Making check for fair share starvation and amount of resource to 
 preempt  recursive such that they traverse the queue hierarchy from root to 
 leaf.
 2. Currently fairSharePreemptionTimeout is a global config. We could make it 
 configurable on a per queue basis,so that we can specify different timeouts 
 for parent queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115914#comment-14115914
 ] 

Hadoop QA commented on YARN-2360:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12665421/yarn-2360-6.patch
  against trunk revision b03653f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4776//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4776//console

This message is automatically generated.

 Fair Scheduler : Display dynamic fair share for queues on the scheduler page
 

 Key: YARN-2360
 URL: https://issues.apache.org/jira/browse/YARN-2360
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
 Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, 
 YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, 
 yarn-2360-6.patch


 Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
 share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1709) Admission Control: Reservation subsystem

2014-08-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115913#comment-14115913
 ] 

Hadoop QA commented on YARN-1709:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12665419/YARN-1709.patch
  against trunk revision b03653f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4775//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4775//console

This message is automatically generated.

 Admission Control: Reservation subsystem
 

 Key: YARN-1709
 URL: https://issues.apache.org/jira/browse/YARN-1709
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Subramaniam Krishnan
 Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch


 This JIRA is about the key data structure used to track resources over time 
 to enable YARN-1051. The Reservation subsystem is conceptually a plan of 
 how the scheduler will allocate resources over-time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager

2014-08-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115968#comment-14115968
 ] 

Hadoop QA commented on YARN-2080:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12665427/YARN-2080.patch
  against trunk revision c60da4d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl
  
org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
  
org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4777//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4777//console

This message is automatically generated.

 Admission Control: Integrate Reservation subsystem with ResourceManager
 ---

 Key: YARN-2080
 URL: https://issues.apache.org/jira/browse/YARN-2080
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Subramaniam Krishnan
Assignee: Subramaniam Krishnan
 Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch


 This JIRA tracks the integration of Reservation subsystem data structures 
 introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring 
 of YARN-1051.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue

2014-08-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115969#comment-14115969
 ] 

Hadoop QA commented on YARN-2395:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12665433/YARN-2395-5.patch
  against trunk revision c60da4d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4778//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4778//console

This message is automatically generated.

 FairScheduler: Preemption timeout should be configurable per queue
 --

 Key: YARN-2395
 URL: https://issues.apache.org/jira/browse/YARN-2395
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
 Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch, 
 YARN-2395-3.patch, YARN-2395-4.patch, YARN-2395-5.patch


 Currently in fair scheduler, the preemption logic considers fair share 
 starvation only at leaf queue level. This jira is created to implement it at 
 the parent queue as well.
 It involves :
 1. Making check for fair share starvation and amount of resource to 
 preempt  recursive such that they traverse the queue hierarchy from root to 
 leaf.
 2. Currently fairSharePreemptionTimeout is a global config. We could make it 
 configurable on a per queue basis,so that we can specify different timeouts 
 for parent queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic

2014-08-29 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115999#comment-14115999
 ] 

Jian He commented on YARN-1707:
---

Hi Carlo, thanks for your work! I looked at the patch; some comments and 
questions:
- to simplify, we can use getNumApplications() method
{code}
disposableLeafQueue.getApplications().size() > 0
    || disposableLeafQueue.pendingApplications.size() > 0
{code}
- PlanQueue.java 80 column limit
- why “newQueue.changeCapacity(sesConf.getCapacity());” is inside the check and 
“queue.setMaxCapacity(sesConf.getMaxCapacity());” is outside the check
- CapacityScheduler#getReservationQueueNames seems to be getting the child 
reservation queues of the given plan queue. We can use the 
planQueue#childQueues directly
- DynamicQueueConf: how about calling it QueueEntitlement to be consistent?
- CapacityScheduler#parseQueue method, I think we can simplify the condition 
for isReservableQueue flag something like this:
 {code}
boolean isReservableQueue = conf.isReservableQueue(fullQueueName);
if (isReservableQueue) {
  ParentQueue parentQueue = 
  new PlanQueue(csContext, queueName, parent,
  oldQueues.get(queueName));
  queue = hook.hook(parentQueue);
} else if ((childQueueNames == null || childQueueNames.length == 0))
{code}
- just to simplify, this log msg may be put after previous “qiter.remove();” to 
avoid the removed boolean flag.
{code}
if (LOG.isDebugEnabled()) {
  LOG.debug("updateChildQueues (action: remove queue): " + removed + " "
      + getChildQueuesToPrint());
}
{code}
- we can add a new reinitialize in ReservationQueue which does all these 
initializations. 
{code}
  CSQueueUtils.updateQueueStatistics(
  schedulerContext.getResourceCalculator(), ses, this,
  schedulerContext.getClusterResource(),
  schedulerContext.getMinimumResourceCapability());
  ses.reinitialize(ses, clusterResource);
  ((ReservationQueue) ses).setMaxApplications(this
  .getMaxApplicationsForReservations());
  ((ReservationQueue) ses).setMaxApplicationsPerUser(this
  .getMaxApplicationsPerUserForReservation());
{code}
- IIUC, right now, queueName here is for the planQueue(inherits parentQueue), 
and the reservationID is for the reservationQueue(inherits from leafQueue). I 
think if we can get the proper reservationQueueName(leafQueue) upfront and pass 
it as the queueName parameter into this method, we can avoid some if/else 
condition changes inside this method and the method signature. 
{code}
  private synchronized void addApplication(ApplicationId applicationId,
String queueName, String user, boolean isAppRecovering,
ReservationId reservationID)
{code}

 Making the CapacityScheduler more dynamic
 -

 Key: YARN-1707
 URL: https://issues.apache.org/jira/browse/YARN-1707
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler
 Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, 
 YARN-1707.patch


 The CapacityScheduler is a rather static at the moment, and refreshqueue 
 provides a rather heavy-handed way to reconfigure it. Moving towards 
 long-running services (tracked in YARN-896) and to enable more advanced 
 admission control and resource parcelling we need to make the 
 CapacityScheduler more dynamic. This is instrumental to the umbrella jira 
 YARN-1051.
 Concretely this require the following changes:
 * create queues dynamically
 * destroy queues dynamically
 * dynamically change queue parameters (e.g., capacity) 
 * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% 
 instead of == 100%
 We limit this to LeafQueues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue

2014-08-29 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2395:
--

Attachment: YARN-2395-5.patch

All tests passed locally. Just re-triggering Jenkins.

 FairScheduler: Preemption timeout should be configurable per queue
 --

 Key: YARN-2395
 URL: https://issues.apache.org/jira/browse/YARN-2395
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
 Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch, 
 YARN-2395-3.patch, YARN-2395-4.patch, YARN-2395-5.patch, YARN-2395-5.patch


 Currently in fair scheduler, the preemption logic considers fair share 
 starvation only at leaf queue level. This jira is created to implement it at 
 the parent queue as well.
 It involves :
 1. Making check for fair share starvation and amount of resource to 
 preempt  recursive such that they traverse the queue hierarchy from root to 
 leaf.
 2. Currently fairSharePreemptionTimeout is a global config. We could make it 
 configurable on a per queue basis,so that we can specify different timeouts 
 for parent queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page

2014-08-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116013#comment-14116013
 ] 

Karthik Kambatla commented on YARN-2360:


The test failure should be unrelated, it passes locally. 

+1. 

 Fair Scheduler : Display dynamic fair share for queues on the scheduler page
 

 Key: YARN-2360
 URL: https://issues.apache.org/jira/browse/YARN-2360
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
 Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, 
 YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, 
 yarn-2360-6.patch


 Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
 share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1707) Making the CapacityScheduler more dynamic

2014-08-29 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-1707:
---

Attachment: YARN-1707.5.patch

 Making the CapacityScheduler more dynamic
 -

 Key: YARN-1707
 URL: https://issues.apache.org/jira/browse/YARN-1707
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler
 Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, 
 YARN-1707.5.patch, YARN-1707.patch


 The CapacityScheduler is a rather static at the moment, and refreshqueue 
 provides a rather heavy-handed way to reconfigure it. Moving towards 
 long-running services (tracked in YARN-896) and to enable more advanced 
 admission control and resource parcelling we need to make the 
 CapacityScheduler more dynamic. This is instrumental to the umbrella jira 
 YARN-1051.
 Concretely this require the following changes:
 * create queues dynamically
 * destroy queues dynamically
 * dynamically change queue parameters (e.g., capacity) 
 * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% 
 instead of == 100%
 We limit this to LeafQueues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic

2014-08-29 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116029#comment-14116029
 ] 

Carlo Curino commented on YARN-1707:


[~jianhe] Thanks for the feedback... The version I just posted contains the 
getDisplayName implementation, but does not address your last comments yet. We 
will get to those next.

 Making the CapacityScheduler more dynamic
 -

 Key: YARN-1707
 URL: https://issues.apache.org/jira/browse/YARN-1707
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler
 Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, 
 YARN-1707.5.patch, YARN-1707.patch


 The CapacityScheduler is a rather static at the moment, and refreshqueue 
 provides a rather heavy-handed way to reconfigure it. Moving towards 
 long-running services (tracked in YARN-896) and to enable more advanced 
 admission control and resource parcelling we need to make the 
 CapacityScheduler more dynamic. This is instrumental to the umbrella jira 
 YARN-1051.
 Concretely this require the following changes:
 * create queues dynamically
 * destroy queues dynamically
 * dynamically change queue parameters (e.g., capacity) 
 * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% 
 instead of == 100%
 We limit this to LeafQueues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2475) ReservationSystem: replan upon capacity reduction

2014-08-29 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-2475:
--

 Summary: ReservationSystem: replan upon capacity reduction
 Key: YARN-2475
 URL: https://issues.apache.org/jira/browse/YARN-2475
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino


In the context of YARN-1051, if capacity of the cluster drops significantly 
upon machine failures we need to trigger a reorganization of the planned 
reservations. As reservations are absolute it is possible that they will not 
all fit, and some need to be rejected a-posteriori.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2475) ReservationSystem: replan upon capacity reduction

2014-08-29 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116037#comment-14116037
 ] 

Carlo Curino commented on YARN-2475:


The first version of this is a simple greedy policy that walks the plan and, 
for every instant in time that violates the new capacity, removes reservations 
in reverse acceptance order (i.e., the reservation accepted last is the first 
to be rejected, thus protecting older reservations).
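
A rough sketch of that greedy policy with hypothetical types (not the actual Plan/Planner API): reservations are visited newest-first, and for every instant whose reserved total exceeds the new capacity the most recently accepted reservations are rejected until the plan fits.

{code}
import java.util.*;

// Hypothetical illustration only.
class GreedyReplannerSketch {
  static final class Reservation {
    final String id;
    final long acceptedAt;
    final Map<Long, Integer> demandByInstant;   // instant -> resources reserved
    Reservation(String id, long acceptedAt, Map<Long, Integer> demand) {
      this.id = id; this.acceptedAt = acceptedAt; this.demandByInstant = demand;
    }
  }

  static Set<String> replan(List<Reservation> plan, Map<Long, Integer> newCapacity) {
    // newest acceptances first, so older reservations are protected
    plan.sort(Comparator.comparingLong((Reservation r) -> r.acceptedAt).reversed());
    Set<String> rejected = new HashSet<>();
    for (long instant : newCapacity.keySet()) {
      int total = plan.stream()
          .filter(r -> !rejected.contains(r.id))
          .mapToInt(r -> r.demandByInstant.getOrDefault(instant, 0)).sum();
      for (Reservation r : plan) {
        if (total <= newCapacity.get(instant)) break;
        if (!rejected.contains(r.id) && r.demandByInstant.getOrDefault(instant, 0) > 0) {
          rejected.add(r.id);
          total -= r.demandByInstant.get(instant);
        }
      }
    }
    return rejected;
  }

  public static void main(String[] args) {
    Map<Long, Integer> demandA = new HashMap<>(); demandA.put(0L, 4);
    Map<Long, Integer> demandB = new HashMap<>(); demandB.put(0L, 4);
    List<Reservation> plan = new ArrayList<>(Arrays.asList(
        new Reservation("older", 1L, demandA), new Reservation("newer", 2L, demandB)));
    Map<Long, Integer> capacity = new HashMap<>(); capacity.put(0L, 5);
    System.out.println(replan(plan, capacity));   // prints [newer]
  }
}
{code}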

 ReservationSystem: replan upon capacity reduction
 -

 Key: YARN-2475
 URL: https://issues.apache.org/jira/browse/YARN-2475
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino

 In the context of YARN-1051, if capacity of the cluster drops significantly 
 upon machine failures we need to trigger a reorganization of the planned 
 reservations. As reservations are absolute it is possible that they will 
 not all fit, and some need to be rejected a-posteriori.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2475) ReservationSystem: replan upon capacity reduction

2014-08-29 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-2475:
---

Issue Type: Sub-task  (was: Bug)
Parent: YARN-1051

 ReservationSystem: replan upon capacity reduction
 -

 Key: YARN-2475
 URL: https://issues.apache.org/jira/browse/YARN-2475
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino

 In the context of YARN-1051, if capacity of the cluster drops significantly 
 upon machine failures we need to trigger a reorganization of the planned 
 reservations. As reservations are absolute it is possible that they will 
 not all fit, and some need to be rejected a-posteriori.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue

2014-08-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116039#comment-14116039
 ] 

Hadoop QA commented on YARN-2395:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12665475/YARN-2395-5.patch
  against trunk revision 9ad413b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4779//console

This message is automatically generated.

 FairScheduler: Preemption timeout should be configurable per queue
 --

 Key: YARN-2395
 URL: https://issues.apache.org/jira/browse/YARN-2395
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
 Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch, 
 YARN-2395-3.patch, YARN-2395-4.patch, YARN-2395-5.patch, YARN-2395-5.patch


 Currently in fair scheduler, the preemption logic considers fair share 
 starvation only at leaf queue level. This jira is created to implement it at 
 the parent queue as well.
 It involves :
 1. Making check for fair share starvation and amount of resource to 
 preempt  recursive such that they traverse the queue hierarchy from root to 
 leaf.
 2. Currently fairSharePreemptionTimeout is a global config. We could make it 
 configurable on a per queue basis,so that we can specify different timeouts 
 for parent queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic

2014-08-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116041#comment-14116041
 ] 

Hadoop QA commented on YARN-1707:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12665477/YARN-1707.5.patch
  against trunk revision 9ad413b.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4780//console

This message is automatically generated.

 Making the CapacityScheduler more dynamic
 -

 Key: YARN-1707
 URL: https://issues.apache.org/jira/browse/YARN-1707
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler
 Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, 
 YARN-1707.5.patch, YARN-1707.patch


 The CapacityScheduler is rather static at the moment, and refreshqueue 
 provides a rather heavy-handed way to reconfigure it. Moving towards 
 long-running services (tracked in YARN-896) and to enable more advanced 
 admission control and resource parcelling, we need to make the 
 CapacityScheduler more dynamic. This is instrumental to the umbrella JIRA 
 YARN-1051.
 Concretely this requires the following changes (see the sketch below):
 * create queues dynamically
 * destroy queues dynamically
 * dynamically change queue parameters (e.g., capacity)
 * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% 
 instead of == 100%
 We limit this to LeafQueues.
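 As one concrete example of the relaxed validation, a capacity-scheduler.xml 
 fragment along the following lines (queue names and capacities invented for 
 illustration) would be accepted, leaving 20% of root unassigned for queues 
 that are created dynamically later.
{code}
<!-- Hypothetical fragment: children of root sum to 80%, which passes the
     relaxed sum(child.getCapacity()) <= 100% check. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>prod,dev</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.prod.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.capacity</name>
  <value>30</value>
</property>
{code}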



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2475) ReservationSystem: replan upon capacity reduction

2014-08-29 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-2475:
---

Attachment: YARN-2475.patch

 ReservationSystem: replan upon capacity reduction
 -

 Key: YARN-2475
 URL: https://issues.apache.org/jira/browse/YARN-2475
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: YARN-2475.patch


 In the context of YARN-1051, if capacity of the cluster drops significantly 
 upon machine failures we need to trigger a reorganization of the planned 
 reservations. As reservations are absolute it is possible that they will 
 not all fit, and some need to be rejected a-posteriori.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1710) Admission Control: agents to allocate reservation

2014-08-29 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-1710:
---

Attachment: YARN-1710.1.patch

 Admission Control: agents to allocate reservation
 -

 Key: YARN-1710
 URL: https://issues.apache.org/jira/browse/YARN-1710
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: YARN-1710.1.patch, YARN-1710.patch


 This JIRA tracks the algorithms used to allocate a user ReservationRequest 
 coming in from the new reservation API (YARN-1708), in the inventory 
 subsystem (YARN-1709) maintaining the current plan for the cluster. The focus 
 of these agents is to quickly find a solution for the set of constraints 
 provided by the user, and the physical constraints of the plan.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1712) Admission Control: plan follower

2014-08-29 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-1712:
---

Attachment: YARN-1712.1.patch

 Admission Control: plan follower
 

 Key: YARN-1712
 URL: https://issues.apache.org/jira/browse/YARN-1712
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations, scheduler
 Attachments: YARN-1712.1.patch, YARN-1712.patch


 This JIRA tracks a thread that continuously propagates the current state of 
 an inventory subsystem to the scheduler. As the inventory subsystem stores the 
 plan of how the resources should be subdivided, the work we propose in this 
 JIRA realizes such a plan by dynamically instructing the CapacityScheduler to 
 add/remove/resize queues to follow the plan.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709

2014-08-29 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-1711:
---

Attachment: YARN-1711.1.patch

 CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
 --

 Key: YARN-1711
 URL: https://issues.apache.org/jira/browse/YARN-1711
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations
 Attachments: YARN-1711.1.patch, YARN-1711.patch


 This JIRA tracks the development of a policy that enforces user quotas (a 
 time-extension of the notion of capacity) in the inventory subsystem 
 discussed in YARN-1709.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1712) Admission Control: plan follower

2014-08-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116068#comment-14116068
 ] 

Hadoop QA commented on YARN-1712:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12665487/YARN-1712.1.patch
  against trunk revision 9ad413b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4782//console

This message is automatically generated.

 Admission Control: plan follower
 

 Key: YARN-1712
 URL: https://issues.apache.org/jira/browse/YARN-1712
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations, scheduler
 Attachments: YARN-1712.1.patch, YARN-1712.patch


 This JIRA tracks a thread that continuously propagates the current state of 
 an inventory subsystem to the scheduler. As the inventory subsystem stores the 
 plan of how the resources should be subdivided, the work we propose in this 
 JIRA realizes such a plan by dynamically instructing the CapacityScheduler to 
 add/remove/resize queues to follow the plan.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1710) Admission Control: agents to allocate reservation

2014-08-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116069#comment-14116069
 ] 

Hadoop QA commented on YARN-1710:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12665487/YARN-1712.1.patch
  against trunk revision 9ad413b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4781//console

This message is automatically generated.

 Admission Control: agents to allocate reservation
 -

 Key: YARN-1710
 URL: https://issues.apache.org/jira/browse/YARN-1710
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: YARN-1710.1.patch, YARN-1710.patch


 This JIRA tracks the algorithms used to allocate a user ReservationRequest 
 coming in from the new reservation API (YARN-1708), in the inventory 
 subsystem (YARN-1709) maintaining the current plan for the cluster. The focus 
 of these agents is to quickly find a solution for the set of constraints 
 provided by the user, and the physical constraints of the plan.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709

2014-08-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116079#comment-14116079
 ] 

Hadoop QA commented on YARN-1711:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12665488/YARN-1711.1.patch
  against trunk revision 9ad413b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4783//console

This message is automatically generated.

 CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
 --

 Key: YARN-1711
 URL: https://issues.apache.org/jira/browse/YARN-1711
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations
 Attachments: YARN-1711.1.patch, YARN-1711.patch


 This JIRA tracks the development of a policy that enforces user quotas (a 
 time-extension of the notion of capacity) in the inventory subsystem 
 discussed in YARN-1709.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2360) Fair Scheduler: Display dynamic fair share for queues on the scheduler page

2014-08-29 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2360:
---

Summary: Fair Scheduler: Display dynamic fair share for queues on the 
scheduler page  (was: Fair Scheduler : Display dynamic fair share for queues on 
the scheduler page)

 Fair Scheduler: Display dynamic fair share for queues on the scheduler page
 ---

 Key: YARN-2360
 URL: https://issues.apache.org/jira/browse/YARN-2360
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
 Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, 
 YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, 
 yarn-2360-6.patch


 Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
 share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic

2014-08-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116184#comment-14116184
 ] 

Wangda Tan commented on YARN-1707:
--

Carlo, thanks for updating the patch. In addition to Jian's comment, I think the 
changes for displayQueueName look good to me.
I don't have further comments about this patch for now.

Thanks,
Wangda

 Making the CapacityScheduler more dynamic
 -

 Key: YARN-1707
 URL: https://issues.apache.org/jira/browse/YARN-1707
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler
 Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, 
 YARN-1707.5.patch, YARN-1707.patch


 The CapacityScheduler is rather static at the moment, and refreshqueue 
 provides a rather heavy-handed way to reconfigure it. Moving towards 
 long-running services (tracked in YARN-896) and to enable more advanced 
 admission control and resource parcelling, we need to make the 
 CapacityScheduler more dynamic. This is instrumental to the umbrella JIRA 
 YARN-1051.
 Concretely this requires the following changes:
 * create queues dynamically
 * destroy queues dynamically
 * dynamically change queue parameters (e.g., capacity)
 * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% 
 instead of == 100%
 We limit this to LeafQueues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2476) Apps are scheduled in random order after RM failover

2014-08-29 Thread Santosh Marella (JIRA)
Santosh Marella created YARN-2476:
-

 Summary: Apps are scheduled in random order after RM failover
 Key: YARN-2476
 URL: https://issues.apache.org/jira/browse/YARN-2476
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
 Environment: Linux
Reporter: Santosh Marella


RM HA is configured with 2 RMs. Used FileSystemRMStateStore.

Fairscheduler allocation file is configured in yarn-site.xml:
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop/allocation-pools.xml</value>
</property>

FS allocation-pools.xml:
<?xml version="1.0"?>
<allocations>
   <queue name="dev">
      <minResources>1 mb,10vcores</minResources>
      <maxResources>19000 mb,100vcores</maxResources>
      <maxRunningApps>5525</maxRunningApps>
      <weight>4.5</weight>
      <schedulingPolicy>fair</schedulingPolicy>
      <fairSharePreemptionTimeout>3600</fairSharePreemptionTimeout>
   </queue>
   <queue name="default">
      <minResources>1 mb,10vcores</minResources>
      <maxResources>19000 mb,100vcores</maxResources>
      <maxRunningApps>5525</maxRunningApps>
      <weight>1.5</weight>
      <schedulingPolicy>fair</schedulingPolicy>
      <fairSharePreemptionTimeout>3600</fairSharePreemptionTimeout>
   </queue>
   <defaultMinSharePreemptionTimeout>600</defaultMinSharePreemptionTimeout>
   <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
</allocations>


Submitted 10 sleep jobs to a FS queue using the command:
hadoop jar hadoop-mapreduce-examples-2.4.1-mapr-4.0.1-SNAPSHOT.jar sleep
-Dmapreduce.job.queuename=root.dev  -m 10 -r 10 -mt 1 -rt 1

All the jobs were submitted by the same user, with the same priority and to the
same queue. No other jobs were running in the cluster. Jobs started executing
in the order in which they were submitted (jobs 6 to 10 were active, while 11
to 15 were waiting):
root@perfnode131:/opt/mapr/hadoop/hadoop-2.4.1/logs# yarn application -list
Total number of applications (application-types: [] and states: [SUBMITTED,ACCEPTED, RUNNING]):10
Application-Id                  Application-Name  Application-Type  User   Queue     State     Final-State  Progress  Tracking-URL
application_1408572781346_0012  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
application_1408572781346_0014  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
application_1408572781346_0011  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
application_1408572781346_0010  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode132:52799
application_1408572781346_0008  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode131:33766
application_1408572781346_0009  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode132:50964
application_1408572781346_0007  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode134:52966
application_1408572781346_0015  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
application_1408572781346_0006  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    9.5%      http://perfnode134:34094
application_1408572781346_0013  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A


Stopped RM1. There was a failover and RM2 became active. But the jobs seem to
have started in a different order:
root@perfnode131:~/scratch/raw_rm_logs_fs_hang# yarn application -list
14/08/21 07:26:13 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
Total number of applications (application-types: [] and states: [SUBMITTED,ACCEPTED, RUNNING]):10
Application-Id                  Application-Name  Application-Type  User   Queue     State     Final-State  Progress  Tracking-URL
application_1408572781346_0012  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode134:59351
application_1408572781346_0014  Sleep job         MAPREDUCE         userA  root.dev  RUNNING 

[jira] [Updated] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class

2014-08-29 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2404:
-

Attachment: YARN-2404.1.patch

Attached a first patch. 

 Remove ApplicationAttemptState and ApplicationState class in RMStateStore 
 class 
 

 Key: YARN-2404
 URL: https://issues.apache.org/jira/browse/YARN-2404
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2404.1.patch


 We can remove the ApplicationState and ApplicationAttemptState classes in 
 RMStateStore, given that we already have ApplicationStateData and 
 ApplicationAttemptStateData records. We may just replace ApplicationState 
 with ApplicationStateData, and similarly for ApplicationAttemptState.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2394) FairScheduler: Configure fairSharePreemptionThreshold per queue

2014-08-29 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2394:
---

Summary: FairScheduler: Configure fairSharePreemptionThreshold per queue  
(was: Fair Scheduler : ability to configure fairSharePreemptionThreshold per 
queue)

 FairScheduler: Configure fairSharePreemptionThreshold per queue
 ---

 Key: YARN-2394
 URL: https://issues.apache.org/jira/browse/YARN-2394
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
 Attachments: YARN-2394-1.patch, YARN-2394-2.patch


 Preemption based on fair share starvation happens when the usage of a queue is 
 less than 50% of its fair share. This 50% is hardcoded. We'd like to make 
 this configurable on a per-queue basis, so that we can choose the threshold 
 at which we want to preempt. Calling this config 
 fairSharePreemptionThreshold (see the sketch below).
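 A minimal sketch of how such a per-queue threshold might look in the 
 allocation file; the element name is taken from the description above, while 
 its placement and the fractional value format are assumptions pending the patch.
{code}
<?xml version="1.0"?>
<allocations>
  <queue name="analytics">
    <!-- Hypothetical: preempt on behalf of this queue once its usage drops
         below 80% of its fair share, instead of the hardcoded 50%. -->
    <fairSharePreemptionThreshold>0.8</fairSharePreemptionThreshold>
  </queue>
</allocations>
{code}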



--
This message was sent by Atlassian JIRA
(v6.2#6252)