date:20140825


[ 
https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108787#comment-14108787
 ] 

Tsuyoshi OZAWA commented on YARN-2427:
--

Hi [~vvasudev], how about calling rm.stop in testGetAppQueue after testing? 

 Add support for moving apps between queues in RM web services
 -

 Key: YARN-2427
 URL: https://issues.apache.org/jira/browse/YARN-2427
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2427.0.patch


 Support for moving apps from one queue to another is now present in 
 CapacityScheduler and FairScheduler. We should expose the functionality via 
 RM web services as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2427) Add support for moving apps between queues in RM web services


 [ 
https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2427:


Attachment: apache-yarn-2427.1.patch

[~ozawa] thanks for the suggestion! I thought the tearDown method would handle 
it. I've uploaded a new patch with your suggestion. Hopefully, it'll fix the 
issue.

 Add support for moving apps between queues in RM web services
 -

 Key: YARN-2427
 URL: https://issues.apache.org/jira/browse/YARN-2427
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch


 Support for moving apps from one queue to another is now present in 
 CapacityScheduler and FairScheduler. We should expose the functionality via 
 RM web services as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register


 [ 
https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2448:


Attachment: apache-yarn-2448.1.patch

I didn't do a clean which led to me missing an override in hadoop-tools. 
Uploaded new patch with fix. Thanks for [~leftnoteasy] for the help!

 RM should expose the name of the ResourceCalculator being used when AMs 
 register
 

 Key: YARN-2448
 URL: https://issues.apache.org/jira/browse/YARN-2448
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch


 The RM should expose the name of the ResourceCalculator being used when AMs 
 register, as part of the RegisterApplicationMasterResponse.
 This will allow applications to make better decisions when scheduling. 
 MapReduce for example, only looks at memory when deciding it's scheduling, 
 even though the RM could potentially be using the DominantResourceCalculator.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Resolved] (YARN-2435) Capacity scheduler should only allow Kill Application Requests from ADMINISTER_QUEUE users

2014-08-25 Thread Amir Mal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Mal resolved YARN-2435.


Resolution: Invalid

I was missing the following setting in my yarn-site.xml:
yarn.acl.enable = true
yarn.admin.acl  = the default is '*' which allow everyone to be admin


 Capacity scheduler should only allow Kill Application Requests from 
 ADMINISTER_QUEUE users
 --

 Key: YARN-2435
 URL: https://issues.apache.org/jira/browse/YARN-2435
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.5.0, 2.4.1
 Environment: Red Hat Enterprise Linux Server release 6.4 (Santiago);  
 Linux 2.6.32-358.el6.x86_64 GNU/Linux; 
 $JAVA_HOME/bin/java -version
 java version 1.7.0_55
 OpenJDK Runtime Environment (rhel-2.4.7.1.el6_5-x86_64 u55-b13)
 OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
Reporter: Amir Mal

 A user without ADMINISTER_QUEUE privilege can kill application from all 
 queues.
 to replicate the bug:
 1) install cluster with {{yarn.resourcemanager.scheduler.class}} set to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.*CapacityScheduler*
 2) created 2 users (user1, user2) each belong to a separate group (group1, 
 group2)
 3) set {{acl_submit_applications}} and {{acl_administer_queue}} of the 
 {{root}} and {{root.default}} queues to group1
 4) submit job to {{default}} queue by user1
 {quote}
 [user1@htc2n3 ~]$ mapred  queue -showacls
 ...
 Queue acls for user :  user1
 Queue  Operations
 =
 root  ADMINISTER_QUEUE,SUBMIT_APPLICATIONS
 default  ADMINISTER_QUEUE,SUBMIT_APPLICATIONS
 [user1@htc2n3 ~]$ yarn  jar 
 /opt/apache/hadoop-2.5.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar
  pi -Dmapreduce.job.queuename=default 4 10
 {quote}
 5) kill the application by user2
 {quote}
 [user2@htc2n4 ~]$ mapred  queue -showacls
 ...
 Queue acls for user :  user2
 Queue  Operations
 =
 root
 default
 [user2@htc2n4 ~]$ yarn application -kill application_1408540602935_0004
 ...
 Killing application application_1408540602935_0004
 14/08/21 14:37:54 INFO impl.YarnClientImpl: Killed application 
 application_1408540602935_0004
 {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register

2014-08-25 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108822#comment-14108822
 ] 

Wangda Tan commented on YARN-2448:
--

[~vvasudev],
Thanks for working on the patch, it is LGTM, +1

Wangda

 RM should expose the name of the ResourceCalculator being used when AMs 
 register
 

 Key: YARN-2448
 URL: https://issues.apache.org/jira/browse/YARN-2448
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch


 The RM should expose the name of the ResourceCalculator being used when AMs 
 register, as part of the RegisterApplicationMasterResponse.
 This will allow applications to make better decisions when scheduling. 
 MapReduce for example, only looks at memory when deciding it's scheduling, 
 even though the RM could potentially be using the DominantResourceCalculator.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services


[ 
https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108844#comment-14108844
 ] 

Hadoop QA commented on YARN-2427:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12664104/apache-yarn-2427.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils
  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4716//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4716//console

This message is automatically generated.

 Add support for moving apps between queues in RM web services
 -

 Key: YARN-2427
 URL: https://issues.apache.org/jira/browse/YARN-2427
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch


 Support for moving apps from one queue to another is now present in 
 CapacityScheduler and FairScheduler. We should expose the functionality via 
 RM web services as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register


[ 
https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108845#comment-14108845
 ] 

Hadoop QA commented on YARN-2448:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12664105/apache-yarn-2448.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.sls.TestSLSRunner
  org.apache.hadoop.yarn.sls.nodemanager.TestNMSimulator
  org.apache.hadoop.yarn.sls.appmaster.TestAMSimulator
  
org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens
  
org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens
  
org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
  
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4717//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4717//console

This message is automatically generated.

 RM should expose the name of the ResourceCalculator being used when AMs 
 register
 

 Key: YARN-2448
 URL: https://issues.apache.org/jira/browse/YARN-2448
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch


 The RM should expose the name of the ResourceCalculator being used when AMs 
 register, as part of the RegisterApplicationMasterResponse.
 This will allow applications to make better decisions when scheduling. 
 MapReduce for example, only looks at memory when deciding it's scheduling, 
 even though the RM could potentially be using the DominantResourceCalculator.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2427) Add support for moving apps between queues in RM web services


 [ 
https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2427:


Attachment: apache-yarn-2427.2.patch

That fixed some of the tests. I found a similar missing rm.stop() in 
TestFifoScheduler that was probably leading to the failing TestSchedulerUtils. 
I'm unsure why the other test is failing.

 Add support for moving apps between queues in RM web services
 -

 Key: YARN-2427
 URL: https://issues.apache.org/jira/browse/YARN-2427
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch, 
 apache-yarn-2427.2.patch


 Support for moving apps from one queue to another is now present in 
 CapacityScheduler and FairScheduler. We should expose the functionality via 
 RM web services as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set

Karam Singh created YARN-2449:
-

 Summary: Timelineserver returns invalid Delegation token in secure 
kerberos enabled cluster when hadoop.http.filter.initializers are not set
 Key: YARN-2449
 URL: https://issues.apache.org/jira/browse/YARN-2449
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
 Environment: Deploy security enabled cluster is ATS also enabled and 
running, but no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh


Timelineserver returns invalid Delegation token in secure kerberos enabled 
cluster when hadoop.http.filter.initializers are not set
Looks in it is regression from YARN-2397
After YARN-2397. when no hadoop.http.filter.initializers is set

Now when try fetch DELEGATION token from ATS, it returns invalid token
Tried to fetch timeline delegation by using curl commands :
{code}
1. curl -i -k -s -b 
'/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 
'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 
'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=test_user'
{code}
Return response is for both queries: 
{code}
{About:Timeline API}
{code}
Whereas before YARN-2397 or if you set hadoop.http.filter.initializers = 
TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer
First query returns DT and Second used to fail



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set


 [ 
https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karam Singh updated YARN-2449:
--

Priority: Critical  (was: Major)

 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 ---

 Key: YARN-2449
 URL: https://issues.apache.org/jira/browse/YARN-2449
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
 Environment: Deploy security enabled cluster is ATS also enabled and 
 running, but no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh
Priority: Critical

 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 Looks in it is regression from YARN-2397
 After YARN-2397. when no hadoop.http.filter.initializers is set
 Now when try fetch DELEGATION token from ATS, it returns invalid token
 Tried to fetch timeline delegation by using curl commands :
 {code}
 1. curl -i -k -s -b 
 '/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=hrt_qa'
 Or
 2. curl -i -k -s --negotiate -u : 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=test_user'
 {code}
 Return response is for both queries: 
 {code}
 {About:Timeline API}
 {code}
 Whereas before YARN-2397 or if you set hadoop.http.filter.initializers = 
 TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer
 First query returns DT and Second used to fail



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set


 [ 
https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karam Singh updated YARN-2449:
--

Description: 
Timelineserver returns invalid Delegation token in secure kerberos enabled 
cluster when hadoop.http.filter.initializers are not set
Looks in it is regression from YARN-2397
After YARN-2397. when no hadoop.http.filter.initializers is set

Now when try fetch DELEGATION token from timelineserver, it returns invalid 
token
Tried to fetch timeline delegation by using curl commands :
{code}
1. curl -i -k -s -b 
'/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 
'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 
'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=test_user'
{code}
Return response is for both queries: 
{code}
{About:Timeline API}
{code}
Whereas before YARN-2397 or if you set hadoop.http.filter.initializers = 
TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer
First query returns DT and Second used to fail

  was:
Timelineserver returns invalid Delegation token in secure kerberos enabled 
cluster when hadoop.http.filter.initializers are not set
Looks in it is regression from YARN-2397
After YARN-2397. when no hadoop.http.filter.initializers is set

Now when try fetch DELEGATION token from ATS, it returns invalid token
Tried to fetch timeline delegation by using curl commands :
{code}
1. curl -i -k -s -b 
'/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 
'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 
'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=test_user'
{code}
Return response is for both queries: 
{code}
{About:Timeline API}
{code}
Whereas before YARN-2397 or if you set hadoop.http.filter.initializers = 
TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer
First query returns DT and Second used to fail


 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 ---

 Key: YARN-2449
 URL: https://issues.apache.org/jira/browse/YARN-2449
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
 Environment: Deploy security enabled cluster is ATS also enabled and 
 running, but no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh
Priority: Critical

 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 Looks in it is regression from YARN-2397
 After YARN-2397. when no hadoop.http.filter.initializers is set
 Now when try fetch DELEGATION token from timelineserver, it returns invalid 
 token
 Tried to fetch timeline delegation by using curl commands :
 {code}
 1. curl -i -k -s -b 
 '/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=hrt_qa'
 Or
 2. curl -i -k -s --negotiate -u : 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=test_user'
 {code}
 Return response is for both queries: 
 {code}
 {About:Timeline API}
 {code}
 Whereas before YARN-2397 or if you set hadoop.http.filter.initializers = 
 TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer
 First query returns DT and Second used to fail



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set


[ 
https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108873#comment-14108873
 ] 

Karam Singh commented on YARN-2449:
---

Similarly If you run hadoop applications e.g. without settings 
hadoop.http.filter.initializers with timelineserver enabled
e.g :
{code}
hadoop jar 
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.5.0.2.2.0.0-532.jar pi 10 
10
{code}

Application submission fails with following type of excpetion:
{code}
org.codehaus.jackson.map.exc.UnrecognizedPropertyException: Unrecognized field 
About (Class 
org.apache.hadoop.yarn.api.records.timeline.TimelineDelegationTokenResponse), 
not marked as ignorable
 at [Source: N/A; line: -1, column: -1] (through reference chain: 
org.apache.hadoop.yarn.api.records.timeline.TimelineDelegationTokenResponse[About])
{code}

 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 ---

 Key: YARN-2449
 URL: https://issues.apache.org/jira/browse/YARN-2449
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
 Environment: Deploy security enabled cluster is ATS also enabled and 
 running, but no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh
Priority: Critical

 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 Looks in it is regression from YARN-2397
 After YARN-2397. when no hadoop.http.filter.initializers is set
 Now when try fetch DELEGATION token from ATS, it returns invalid token
 Tried to fetch timeline delegation by using curl commands :
 {code}
 1. curl -i -k -s -b 
 '/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=hrt_qa'
 Or
 2. curl -i -k -s --negotiate -u : 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=test_user'
 {code}
 Return response is for both queries: 
 {code}
 {About:Timeline API}
 {code}
 Whereas before YARN-2397 or if you set hadoop.http.filter.initializers = 
 TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer
 First query returns DT and Second used to fail



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set


 [ 
https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karam Singh updated YARN-2449:
--

Description: 
Timelineserver returns invalid Delegation token in secure kerberos enabled 
cluster when hadoop.http.filter.initializers are not set
Looks in it is regression from YARN-2397
After YARN-2397. when no hadoop.http.filter.initializers is set

Now when try fetch DELEGATION token from timelineserver, it returns invalid 
token
Tried to fetch timeline delegation by using curl commands :
{code}
1. curl -i -k -s -b 'timeline-cookie.txt' 
'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 
'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=test_user'
{code}
Return response is for both queries: 
{code}
{About:Timeline API}
{code}
Whereas before YARN-2397 or if you set hadoop.http.filter.initializers = 
TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer
First query returns DT and Second used to fail

  was:
Timelineserver returns invalid Delegation token in secure kerberos enabled 
cluster when hadoop.http.filter.initializers are not set
Looks in it is regression from YARN-2397
After YARN-2397. when no hadoop.http.filter.initializers is set

Now when try fetch DELEGATION token from timelineserver, it returns invalid 
token
Tried to fetch timeline delegation by using curl commands :
{code}
1. curl -i -k -s -b 
'/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 
'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 
'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=test_user'
{code}
Return response is for both queries: 
{code}
{About:Timeline API}
{code}
Whereas before YARN-2397 or if you set hadoop.http.filter.initializers = 
TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer
First query returns DT and Second used to fail


 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 ---

 Key: YARN-2449
 URL: https://issues.apache.org/jira/browse/YARN-2449
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
 Environment: Deploy security enabled cluster is ATS also enabled and 
 running, but no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh
Assignee: Varun Vasudev
Priority: Critical

 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 Looks in it is regression from YARN-2397
 After YARN-2397. when no hadoop.http.filter.initializers is set
 Now when try fetch DELEGATION token from timelineserver, it returns invalid 
 token
 Tried to fetch timeline delegation by using curl commands :
 {code}
 1. curl -i -k -s -b 'timeline-cookie.txt' 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=hrt_qa'
 Or
 2. curl -i -k -s --negotiate -u : 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=test_user'
 {code}
 Return response is for both queries: 
 {code}
 {About:Timeline API}
 {code}
 Whereas before YARN-2397 or if you set hadoop.http.filter.initializers = 
 TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer
 First query returns DT and Second used to fail



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Assigned] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set


 [ 
https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev reassigned YARN-2449:
---

Assignee: Varun Vasudev

 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 ---

 Key: YARN-2449
 URL: https://issues.apache.org/jira/browse/YARN-2449
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
 Environment: Deploy security enabled cluster is ATS also enabled and 
 running, but no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh
Assignee: Varun Vasudev
Priority: Critical

 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 Looks in it is regression from YARN-2397
 After YARN-2397. when no hadoop.http.filter.initializers is set
 Now when try fetch DELEGATION token from timelineserver, it returns invalid 
 token
 Tried to fetch timeline delegation by using curl commands :
 {code}
 1. curl -i -k -s -b 
 '/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=hrt_qa'
 Or
 2. curl -i -k -s --negotiate -u : 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=test_user'
 {code}
 Return response is for both queries: 
 {code}
 {About:Timeline API}
 {code}
 Whereas before YARN-2397 or if you set hadoop.http.filter.initializers = 
 TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer
 First query returns DT and Second used to fail



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services


[ 
https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108901#comment-14108901
 ] 

Hadoop QA commented on YARN-2427:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12664114/apache-yarn-2427.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4718//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4718//console

This message is automatically generated.

 Add support for moving apps between queues in RM web services
 -

 Key: YARN-2427
 URL: https://issues.apache.org/jira/browse/YARN-2427
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch, 
 apache-yarn-2427.2.patch


 Support for moving apps from one queue to another is now present in 
 CapacityScheduler and FairScheduler. We should expose the functionality via 
 RM web services as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic

2014-08-25 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108932#comment-14108932
 ] 

Wangda Tan commented on YARN-1707:
--

Hi [~curino],
Thanks for updating, I just took a look, some minor comments,

1) CapacityScheduler#removeQueue
{code}
if (disposableLeafQueue.getCapacity()  0) {
  throw new SchedulerConfigEditException(The queue  + queueName
  +  has non-zero capacity:  + disposableLeafQueue.getCapacity());
}
{code}
removeQueue check disposableLeafQueue's capacity  0, but addQueue doesn't 
check. In addition, 
After previous check, ParentQueue#removeChildQueue/addChildQueue doesn't need 
check its capacity again.
And they should throw same type of exception (both SchedulerConfigEditException 
or both IllegalArgumentException)

2) CS#addQueue
{code}
  throw new SchedulerConfigEditException(Queue  + queue.getQueueName()
  +  is not a dynamic Queue);
{code}
Should dynamic Queue should be reservation queue comparing to similar 
exception throw in removeQueue?

3) CS#setEntitlement
{code}
  if (sesConf.getCapacity()  queue.getCapacity()) {
newQueue.addCapacity((sesConf.getCapacity() - queue.getCapacity()));
  } else {
newQueue
.subtractCapacity((queue.getCapacity() - sesConf.getCapacity()));
  }
{code}
Maybe it's better to merge the add/substractCapacity to changeCapacity(delta)
Or just create a setCapacity in ReservationQueue?

4) CS#getReservableQueues
Is it better to rename it to getPlanQueues?

5) ReservationQueue#getQueueName
{code}
  @Override
  public String getQueueName() {
return this.getParent().getQueueName();
  }
{code}
I'm not sure why doing this, could you please elaborate? This makes 
this.queueName and this.getQueueName has different semantic.

6) ReservationQueue#substractCapacity
{code}
this.setCapacity(this.getCapacity() - capacity);
{code}
With EPSILON, it is possible this.capacity  0 set substract, its better to cap 
this.capacity in range of [0,1]. Also addCapacity

7) DynamicQueueConf
I think unfold it to two float as parameter for setEntitlement maybe more 
straigtforward, is it possible more fields will be add to DynamicQueueConf?

8) ParentQueue#setChildQueues
Since only PlanQueue need sum of capacity = 1, I would suggest make this 
method protected, and PlanQueue can overwrite this method. Or add a check in 
ParentQueue#setChildQueues.

Wangda

 Making the CapacityScheduler more dynamic
 -

 Key: YARN-1707
 URL: https://issues.apache.org/jira/browse/YARN-1707
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler
 Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.patch


 The CapacityScheduler is a rather static at the moment, and refreshqueue 
 provides a rather heavy-handed way to reconfigure it. Moving towards 
 long-running services (tracked in YARN-896) and to enable more advanced 
 admission control and resource parcelling we need to make the 
 CapacityScheduler more dynamic. This is instrumental to the umbrella jira 
 YARN-1051.
 Concretely this require the following changes:
 * create queues dynamically
 * destroy queues dynamically
 * dynamically change queue parameters (e.g., capacity) 
 * modify refreshqueue validation to enforce sum(child.getCapacity())= 100% 
 instead of ==100%
 We limit this to LeafQueues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic


[ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108948#comment-14108948
 ] 

Hadoop QA commented on YARN-1707:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663571/YARN-1707.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4719//console

This message is automatically generated.

 Making the CapacityScheduler more dynamic
 -

 Key: YARN-1707
 URL: https://issues.apache.org/jira/browse/YARN-1707
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler
 Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.patch


 The CapacityScheduler is a rather static at the moment, and refreshqueue 
 provides a rather heavy-handed way to reconfigure it. Moving towards 
 long-running services (tracked in YARN-896) and to enable more advanced 
 admission control and resource parcelling we need to make the 
 CapacityScheduler more dynamic. This is instrumental to the umbrella jira 
 YARN-1051.
 Concretely this require the following changes:
 * create queues dynamically
 * destroy queues dynamically
 * dynamically change queue parameters (e.g., capacity) 
 * modify refreshqueue validation to enforce sum(child.getCapacity())= 100% 
 instead of ==100%
 We limit this to LeafQueues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services


[ 
https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108975#comment-14108975
 ] 

Varun Vasudev commented on YARN-2427:
-

The TestAMRestart failure is unrelated.

 Add support for moving apps between queues in RM web services
 -

 Key: YARN-2427
 URL: https://issues.apache.org/jira/browse/YARN-2427
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch, 
 apache-yarn-2427.2.patch


 Support for moving apps from one queue to another is now present in 
 CapacityScheduler and FairScheduler. We should expose the functionality via 
 RM web services as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set


 [ 
https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2449:


Attachment: apache-yarn-2449.0.patch

Uploaded patch with fix.

 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 ---

 Key: YARN-2449
 URL: https://issues.apache.org/jira/browse/YARN-2449
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
 Environment: Deploy security enabled cluster is ATS also enabled and 
 running, but no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh
Assignee: Varun Vasudev
Priority: Critical
 Attachments: apache-yarn-2449.0.patch


 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 Looks in it is regression from YARN-2397
 After YARN-2397. when no hadoop.http.filter.initializers is set
 Now when try fetch DELEGATION token from timelineserver, it returns invalid 
 token
 Tried to fetch timeline delegation by using curl commands :
 {code}
 1. curl -i -k -s -b 'timeline-cookie.txt' 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=hrt_qa'
 Or
 2. curl -i -k -s --negotiate -u : 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=test_user'
 {code}
 Return response is for both queries: 
 {code}
 {About:Timeline API}
 {code}
 Whereas before YARN-2397 or if you set hadoop.http.filter.initializers = 
 TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer
 First query returns DT and Second used to fail



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-08-25 Thread Wangda Tan (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108991#comment-14108991
]

Wangda Tan commented on YARN-1198:
--

Hi [~cwelch],
Thanks for updating, I went through your patch just now.

I think the current approach makes more sense to me comparing to patch#4, it
avoids iterating all apps when computing headroom. But currently,
CapacityHeadroomProvider#getHeadroom will recompute headroom for each
application heartbeat. Assume we have #application #user in a queue (the
most possible case), it's still a little costly.

I agree with the method which mentioned by Jason more: Specifically, we can
create a map of user, headroom for each queue, when we need update headroom,
we can update the all headroom in the map. And each SchedulerApplicationAttempt
will hold a reference to headroom. The headroom in the map maybe as same as
the {{HeadroomProvider}} in your patch. I would suggest to rename the
{{HeadroomProvider}} to {{HeadroomReference}}, because we don't need do any
computation in it anymore.

Another benefit is, we don't need write HeadroomProvider for each scheduler. A
simple HeadroomReference with getter/setter should be enough.

Two more things we should take care with previous method:
1) As mentioned by Jason, currently, fair/capacity scheduler all support moving
app between queues, we should recompute and change the reference after finished
moving app.
2) In LeafQueue#assignContainers, we don't need call
{code}
Resource userLimit =
computeUserLimitAndSetHeadroom(application, clusterResource,
required);
{code}
For each application, and in LeafQueue#updateClusterResource iterate and update
the map of user, headroom should be enough

Wangda

Capacity Scheduler headroom calculation does not work as expected
-

Key: YARN-1198
URL: https://issues.apache.org/jira/browse/YARN-1198
Project: Hadoop YARN
Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch,
YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch

Today headroom calculation (for the app) takes place only when
* New node is added/removed from the cluster
* New container is getting assigned to the application.
However there are potentially lot of situations which are not considered for
this calculation
* If a container finishes then headroom for that application will change and
should be notified to the AM accordingly.
* If a single user has submitted multiple applications (app1 and app2) to the
same queue then
** If app1's container finishes then not only app1's but also app2's AM
should be notified about the change in headroom.
** Similarly if a container is assigned to any applications app1/app2 then
both AM should be notified about their headroom.
** To simplify the whole communication process it is ideal to keep headroom
per User per LeafQueue so that everyone gets the same picture (apps belonging
to same user and submitted in same queue).
* If a new user submits an application to the queue then all applications
submitted by all users in that queue should be notified of the headroom
change.
* Also today headroom is an absolute number ( I think it should be normalized
but then this is going to be not backward compatible..)
* Also when admin user refreshes queue headroom has to be updated.
These all are the potential bugs in headroom calculations

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set


[ 
https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109005#comment-14109005
 ] 

Hadoop QA commented on YARN-2449:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12664132/apache-yarn-2449.0.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4720//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4720//console

This message is automatically generated.

 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 ---

 Key: YARN-2449
 URL: https://issues.apache.org/jira/browse/YARN-2449
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
 Environment: Deploy security enabled cluster is ATS also enabled and 
 running, but no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh
Assignee: Varun Vasudev
Priority: Critical
 Attachments: apache-yarn-2449.0.patch


 Timelineserver returns invalid Delegation token in secure kerberos enabled 
 cluster when hadoop.http.filter.initializers are not set
 Looks in it is regression from YARN-2397
 After YARN-2397. when no hadoop.http.filter.initializers is set
 Now when try fetch DELEGATION token from timelineserver, it returns invalid 
 token
 Tried to fetch timeline delegation by using curl commands :
 {code}
 1. curl -i -k -s -b 'timeline-cookie.txt' 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=hrt_qa'
 Or
 2. curl -i -k -s --negotiate -u : 
 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKENrenewer=test_user'
 {code}
 Return response is for both queries: 
 {code}
 {About:Timeline API}
 {code}
 Whereas before YARN-2397 or if you set hadoop.http.filter.initializers = 
 TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer
 First query returns DT and Second used to fail



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS

[
https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Varun Vasudev updated YARN-160:
---

Attachment: apache-yarn-160.2.patch

Comments from [~jlowe] in YARN-2440 about this feature led to some more
changes. The latest patch introduces some new config variables
1. yarn.nodemanager.containers-cpu-cores - the number of cores to be used for
yarn containers. By default we use all cores.
2. yarn.nodemanager.containers-cpu-percentage - the percentage of overall cpu
to be used for yarn containers. By default we use all CPU.
3. yarn.nodemanager.pcores-vcores-multiplier - a multiplier to convert pcores
to vcores. By default it is 1. This can be used on clusters with heterogeneous
hardware to have more containers run on faster CPUs.
4. yarn.nodemanager.count-logical-processors-as-cores - flag to determine if
hperthreads should be counted as cores. By default it is true.

There's a some code between YARN-2440 and this patch. Depending on which one
gets committed first, I'll change the patch appropriately.

nodemanagers should obtain cpu/memory values from underlying OS
---

Key: YARN-160
URL: https://issues.apache.org/jira/browse/YARN-160
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Varun Vasudev
Fix For: 2.6.0

Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch,
apache-yarn-160.2.patch

As mentioned in YARN-2
*NM memory and CPU configs*
Currently these values are coming from the config of the NM, we should be
able to obtain those values from the OS (ie, in the case of Linux from
/proc/meminfo /proc/cpuinfo). As this is highly OS dependent we should have
an interface that obtains this information. In addition implementations of
this interface should be able to specify a mem/cpu offset (amount of mem/cpu
not to be avail as YARN resource), this would allow to reserve mem/cpu for
the OS and other services outside of YARN containers.

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS


[ 
https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109075#comment-14109075
 ] 

Hadoop QA commented on YARN-160:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12664137/apache-yarn-160.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-gridmix hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4721//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4721//console

This message is automatically generated.

 nodemanagers should obtain cpu/memory values from underlying OS
 ---

 Key: YARN-160
 URL: https://issues.apache.org/jira/browse/YARN-160
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Varun Vasudev
 Fix For: 2.6.0

 Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch, 
 apache-yarn-160.2.patch


 As mentioned in YARN-2
 *NM memory and CPU configs*
 Currently these values are coming from the config of the NM, we should be 
 able to obtain those values from the OS (ie, in the case of Linux from 
 /proc/meminfo  /proc/cpuinfo). As this is highly OS dependent we should have 
 an interface that obtains this information. In addition implementations of 
 this interface should be able to specify a mem/cpu offset (amount of mem/cpu 
 not to be avail as YARN resource), this would allow to reserve mem/cpu for 
 the OS and other services outside of YARN containers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2035) FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode

2014-08-25 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109340#comment-14109340
 ] 

Zhijie Shen commented on YARN-2035:
---

+1 for the latest patch. Will commit it.

 FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode
 ---

 Key: YARN-2035
 URL: https://issues.apache.org/jira/browse/YARN-2035
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2035-v2.patch, YARN-2035-v3.patch, YARN-2035.patch


 Small bug that prevents ResourceManager and ApplicationHistoryService from 
 coming up while Namenode is in safemode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Assigned] (YARN-2450) Fix typos in log messages


 [ 
https://issues.apache.org/jira/browse/YARN-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang reassigned YARN-2450:


Assignee: Ray Chiang

 Fix typos in log messages
 -

 Key: YARN-2450
 URL: https://issues.apache.org/jira/browse/YARN-2450
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Trivial
  Labels: newbie

 There are a bunch of typos in log messages.  HADOOP-10946 was initially 
 created, but may have failed due to being in multiple components.  Try fixing 
 typos on a per-component basis.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (YARN-2450) Fix typos in log messages

Ray Chiang created YARN-2450:


 Summary: Fix typos in log messages
 Key: YARN-2450
 URL: https://issues.apache.org/jira/browse/YARN-2450
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Priority: Trivial


There are a bunch of typos in log messages.  HADOOP-10946 was initially 
created, but may have failed due to being in multiple components.  Try fixing 
typos on a per-component basis.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2450) Fix typos in log messages


 [ 
https://issues.apache.org/jira/browse/YARN-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-2450:
-

Attachment: YARN-2450-01.patch

First attempt for YARN-only log fixes.

 Fix typos in log messages
 -

 Key: YARN-2450
 URL: https://issues.apache.org/jira/browse/YARN-2450
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Trivial
  Labels: newbie
 Attachments: YARN-2450-01.patch


 There are a bunch of typos in log messages.  HADOOP-10946 was initially 
 created, but may have failed due to being in multiple components.  Try fixing 
 typos on a per-component basis.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Comment Edited] (YARN-2035) FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode

2014-08-25 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109340#comment-14109340
 ] 

Zhijie Shen edited comment on YARN-2035 at 8/25/14 6:01 PM:


+1 for the latest patch. Hold on commit until figure out the proper way to 
commit via git.


was (Author: zjshen):
+1 for the latest patch. Will commit it.

 FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode
 ---

 Key: YARN-2035
 URL: https://issues.apache.org/jira/browse/YARN-2035
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2035-v2.patch, YARN-2035-v3.patch, YARN-2035.patch


 Small bug that prevents ResourceManager and ApplicationHistoryService from 
 coming up while Namenode is in safemode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-810) Support CGroup ceiling enforcement on CPU

2014-08-25 Thread Wei Yan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109442#comment-14109442
 ] 

Wei Yan commented on YARN-810:
--

[~vvasudev], for the cfs_quota_us and cfs_period_us settings problem, as we 
need to get the number of physical cores used by YARN, I'll update a patch here 
once your YARN-2440 committed.

 Support CGroup ceiling enforcement on CPU
 -

 Key: YARN-810
 URL: https://issues.apache.org/jira/browse/YARN-810
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta, 2.0.5-alpha
Reporter: Chris Riccomini
Assignee: Sandy Ryza
 Attachments: YARN-810.patch, YARN-810.patch


 Problem statement:
 YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. 
 Containers are then allowed to request vcores between the minimum and maximum 
 defined in the yarn-site.xml.
 In the case where a single-threaded container requests 1 vcore, with a 
 pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of 
 the core it's using, provided that no other container is also using it. This 
 happens, even though the only guarantee that YARN/CGroups is making is that 
 the container will get at least 1/4th of the core.
 If a second container then comes along, the second container can take 
 resources from the first, provided that the first container is still getting 
 at least its fair share (1/4th).
 There are certain cases where this is desirable. There are also certain cases 
 where it might be desirable to have a hard limit on CPU usage, and not allow 
 the process to go above the specified resource requirement, even if it's 
 available.
 Here's an RFC that describes the problem in more detail:
 http://lwn.net/Articles/336127/
 Solution:
 As it happens, when CFS is used in combination with CGroups, you can enforce 
 a ceiling using two files in cgroups:
 {noformat}
 cpu.cfs_quota_us
 cpu.cfs_period_us
 {noformat}
 The usage of these two files is documented in more detail here:
 https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
 Testing:
 I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, 
 it behaves as described above (it is a soft cap, and allows containers to use 
 more than they asked for). I then tested CFS CPU quotas manually with YARN.
 First, you can see that CFS is in use in the CGroup, based on the file names:
 {noformat}
 [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/
 total 0
 -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs
 drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares
 -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat
 -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release
 -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks
 [criccomi@eat1-qa464 ~]$ sudo -u app cat
 /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us
 10
 [criccomi@eat1-qa464 ~]$ sudo -u app cat
 /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us
 -1
 {noformat}
 Oddly, it appears that the cfs_period_us is set to .1s, not 1s.
 We can place processes in hard limits. I have process 4370 running YARN 
 container container_1371141151815_0003_01_03 on a host. By default, it's 
 running at ~300% cpu usage.
 {noformat}
 CPU
 4370 criccomi  20   0 1157m 551m  14m S 240.3  0.8  87:10.91 ...
 {noformat}
 When I set the CFS quote:
 {noformat}
 echo 1000  
 /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
  CPU
 4370 criccomi  20   0 1157m 563m  14m S  1.0  0.8  90:08.39 ...
 {noformat}
 It drops to 1% usage, and you can see the box has room to spare:
 {noformat}
 Cpu(s):  2.4%us,  1.0%sy,  0.0%ni, 92.2%id,  4.2%wa,  0.0%hi,  0.1%si, 
 0.0%st
 {noformat}
 Turning the quota back to -1:
 {noformat}
 echo -1  
 /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
 {noformat}
 Burns the cores again:
 {noformat}
 Cpu(s): 11.1%us,  1.7%sy,  0.0%ni, 83.9%id,  3.1%wa,  0.0%hi,  0.2%si, 
 0.0%st
 CPU
 4370 criccomi  20   0 1157m 563m  14m S 253.9  0.8  89:32.31 ...
 {noformat}
 On my dev box, I was testing CGroups by running a python process eight times, 
 to burn through all the cores, since it was doing as described above (giving 
 extra CPU to the process, even with

[jira] [Commented] (YARN-2377) Localization exception stack traces are not passed as diagnostic info


[ 
https://issues.apache.org/jira/browse/YARN-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109479#comment-14109479
 ] 

Karthik Kambatla commented on YARN-2377:


+1 to improving the debuggability here. 

Can we re-use StringUtils.stringifyException, preferably in 
ResourceLocalizationService? 


 Localization exception stack traces are not passed as diagnostic info
 -

 Key: YARN-2377
 URL: https://issues.apache.org/jira/browse/YARN-2377
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: YARN-2377.v01.patch


 In the Localizer log one can only see this kind of message
 {code}
 14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { 
 hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar,
  1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHos 
 tException: ha-nn-uri-0
 {code}
 And then only {{ java.net.UnknownHostException: ha-nn-uri-0}} message is 
 propagated as diagnostics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2450) Fix typos in log messages


[ 
https://issues.apache.org/jira/browse/YARN-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109488#comment-14109488
 ] 

Hadoop QA commented on YARN-2450:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12664177/YARN-2450-01.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4722//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4722//console

This message is automatically generated.

 Fix typos in log messages
 -

 Key: YARN-2450
 URL: https://issues.apache.org/jira/browse/YARN-2450
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Trivial
  Labels: newbie
 Attachments: YARN-2450-01.patch


 There are a bunch of typos in log messages.  HADOOP-10946 was initially 
 created, but may have failed due to being in multiple components.  Try fixing 
 typos on a per-component basis.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1326) RM should log using RMStore at startup time


[ 
https://issues.apache.org/jira/browse/YARN-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109490#comment-14109490
 ] 

Karthik Kambatla commented on YARN-1326:


Minor nit that I can fix at commit time: Change 
{code}
this.rmStateStoreName = rm.getRMContext().getStateStore().getClass()
.getName();
{code}
to 
{code}
this.rmStateStoreName = 
rm.getRMContext().getStateStore().getClass().getName();
{code}

Otherwise, +1. Will commit this when the repo becomes writable. 

 RM should log using RMStore at startup time
 ---

 Key: YARN-1326
 URL: https://issues.apache.org/jira/browse/YARN-1326
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.5.0
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1326.1.patch, YARN-1326.2.patch, YARN-1326.3.patch, 
 YARN-1326.4.patch, demo.png

   Original Estimate: 3h
  Remaining Estimate: 3h

 Currently there are no way to know which RMStore RM uses. It's useful to log 
 the information at RM's startup time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2450) Fix typos in log messages


[ 
https://issues.apache.org/jira/browse/YARN-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109504#comment-14109504
 ] 

Ray Chiang commented on YARN-2450:
--

Changes restricted to log messages only.  Will not write tests specific to log 
messages.

 Fix typos in log messages
 -

 Key: YARN-2450
 URL: https://issues.apache.org/jira/browse/YARN-2450
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Trivial
  Labels: newbie
 Attachments: YARN-2450-01.patch


 There are a bunch of typos in log messages.  HADOOP-10946 was initially 
 created, but may have failed due to being in multiple components.  Try fixing 
 typos on a per-component basis.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (YARN-2451) Delete .orig files

Karthik Kambatla created YARN-2451:
--

 Summary: Delete .orig files
 Key: YARN-2451
 URL: https://issues.apache.org/jira/browse/YARN-2451
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


Looks like we checked in a few .orig files. We should delete them.
{noformat}
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java.orig
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java.orig
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java.orig
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java.orig
{noformat} 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2450) Fix typos in log messages

2014-08-25 Thread Akira AJISAKA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109524#comment-14109524
 ] 

Akira AJISAKA commented on YARN-2450:
-

Thanks [~rchiang] for splitting the patch. LGTM, +1 (non-binding).

 Fix typos in log messages
 -

 Key: YARN-2450
 URL: https://issues.apache.org/jira/browse/YARN-2450
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Trivial
  Labels: newbie
 Attachments: YARN-2450-01.patch


 There are a bunch of typos in log messages.  HADOOP-10946 was initially 
 created, but may have failed due to being in multiple components.  Try fixing 
 typos on a per-component basis.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register


[ 
https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109577#comment-14109577
 ] 

Karthik Kambatla commented on YARN-2448:


I am not sure I understand the usecase very well. The AM's requirements 
shouldn't change based on what the RM does internally. 

Shouldn't the application ask for all the resources that YARN supports? It is 
upto the scheduler (queue, user, app type etc.) to decide on what resources it 
would consider for scheduling. If the app doesn't specify any resources at all 
for a type, we can assume zero for that type (e.g. in clusters not configured 
to use a particular type). 


 RM should expose the name of the ResourceCalculator being used when AMs 
 register
 

 Key: YARN-2448
 URL: https://issues.apache.org/jira/browse/YARN-2448
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch


 The RM should expose the name of the ResourceCalculator being used when AMs 
 register, as part of the RegisterApplicationMasterResponse.
 This will allow applications to make better decisions when scheduling. 
 MapReduce for example, only looks at memory when deciding it's scheduling, 
 even though the RM could potentially be using the DominantResourceCalculator.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register


[ 
https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109589#comment-14109589
 ] 

Varun Vasudev commented on YARN-2448:
-

The use case that springs to mind is adding support for cpu to map-reduce. 
Currently the map-reduce AM only looks at memory when it is deciding things 
like pre-empting reducers. If we wish to add support for cpu as a resource to 
map-reduce, it needs to consider vcores as well. However, if the YARN scheduler 
if using the DefaultResourceCalculator, which ignores cpu, and the map-reduce 
AM doesn't know this, it leads to inefficient asks and allocations. The aim is 
just to let the AM know which calculator is being used and let the AM go from 
there. Does that make sense?

 RM should expose the name of the ResourceCalculator being used when AMs 
 register
 

 Key: YARN-2448
 URL: https://issues.apache.org/jira/browse/YARN-2448
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch


 The RM should expose the name of the ResourceCalculator being used when AMs 
 register, as part of the RegisterApplicationMasterResponse.
 This will allow applications to make better decisions when scheduling. 
 MapReduce for example, only looks at memory when deciding it's scheduling, 
 even though the RM could potentially be using the DominantResourceCalculator.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register


[ 
https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109603#comment-14109603
 ] 

Karthik Kambatla commented on YARN-2448:


May be I am missing your point. Why not have the MR AM request both CPU and 
memory? If the RM/scheduler doesn't consider CPU, it will just ignore it. 

Related, but orthogonal point: In the case of FairScheduler, the policy depends 
on the queue the app is submitted to. So, some queues might consider only CPU, 
some only memory, and some both. So, exposing the ResourceCalculator doesn't 
really tell the AM anything, it has to look at the queue configuration. 

 RM should expose the name of the ResourceCalculator being used when AMs 
 register
 

 Key: YARN-2448
 URL: https://issues.apache.org/jira/browse/YARN-2448
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch


 The RM should expose the name of the ResourceCalculator being used when AMs 
 register, as part of the RegisterApplicationMasterResponse.
 This will allow applications to make better decisions when scheduling. 
 MapReduce for example, only looks at memory when deciding it's scheduling, 
 even though the RM could potentially be using the DominantResourceCalculator.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-08-25 Thread Craig Welch (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109738#comment-14109738
 ] 

Craig Welch commented on YARN-1198:
---

I initially considered an approach like this one, but did not go in that 
direction for a couple of reasons.  One is that, to avoid introducing a 
calculation during the heartbeat, you do end up iterating all the users in the 
queue with every headroom calculation.  While this may generally be less than 
iterating all of the applications in a queue it may still be fairly significant 
in some usage patterns, and in a worst case (different user for each 
application) it is exactly equivalent to what we are trying to avoid.  The 
other is the Resource required which is application specific and included in 
the userlimit calculation - the comments indicate this  

ensures that jobs in queues
//   with miniscule capacity ( 1 slot) make progress

- I notice that updateClusterResource just provides the .none for this value 
- so it is not being honored in all cases, but I'm concerned about breaking the 
case it is meant to handle by detaching it generally from the headroom 
calculation.  Handling this value as we do today requires an application 
specific calculation - hence placing it in the application path and handling it 
as I do in the .7 patch during heartbeat/using an application specific value.  
If we move to calculating it at the user level then we would have to choose one 
value for the required from one of the user's applications to avoid iterating 
them otherwise we are back to iterating all applications at each go.  In a 
practical sense that might be fine, unless different applications for the same 
user are passing significantly different values for required - I suppose we 
could use a max for that value, but then an unusually large value for 
required could be carried forward indefinitely (for as long as a user has 
active applications) - or we could just use the last one provided for that user 
and understand that it changes the results a bit, possibly in an undesired way.

Couple of other points:

-re we don't need write HeadroomProvider for each scheduler - we already 
don't need one - the base implementation I've provided maintains the current 
behavior for other schedulers, and it appears that other schedulers may not 
require the same treatment as they do not necessarily vary their headroom as 
dynamically/in the interrelated way that the capacity scheduler does - in any 
case, the pattern I'm introducing here can be reused by them - but they would, 
in any case, require their own logic to effect this kind of update if they 
require it.

-re As mentioned by Jason, currently, fair/capacity scheduler all support 
moving app between queues, we should recompute and change the reference after 
finished moving app 
I take this to properly be a task to take on when providing support for moving 
between queues - not having the location in code at present where this will 
happen prevents me from really addressing it, it's not part of the current 
effort, and in any case this change is not making that any more difficult (it 
may be making it easier... hard to be sure until we're ready to do it... but I 
am sure it is not making it more difficult - The first time an application 
calls computUserLimit... after it is moved it will automatically update to the 
proper configuration to provide headroom from then onward, with no other 
changes so far as I can see.  We could also effect this by simply setting the 
headroom provider during the move.)

Provider vs Reference - I went with a more general term as I'm not sure that in 
all cases it will be simple reference/will have no logic of it's own - Provider 
is a superset/more generic term :-)

-re the cost of the calculation - if you look through the code, it's factored 
such that everything is referring to local members of a relatively small object 
graph - basically, it's just doing a few member lookups and a little math (I 
know, you could say that about anything - but in this case, it really isn't 
very much) - no significant data structures have to be accessed and while it's 
hidden behind calls to Resources it really is just a bit of calculation...

That said, I can see benefits to avoiding some of the work being done in the 
heartbeat - the one hard limit is the impact to how the Resource required 
value is handled, possibly not a significant tradeoff.  I also had some 
concurrency concerns - by moving this out to the heartbeat we are accessing 
some shared Resource values concurrently which are not at present, and I ran 
into some concurrency issues with LeafQueue when making the change (all 
resolved, but caused some alarm/required some workaround) - there could be 
other latent concurrency issues there which will be corner cases, where if we 
have all calculation happening in the calculate... call in

[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-08-25 Thread Craig Welch (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109798#comment-14109798
]

Craig Welch commented on YARN-1857:
---

[~jianhe] [~wangda] could you have a look at this patch?

CapacityScheduler headroom doesn't account for other AM's running
-

Key: YARN-1857
URL: https://issues.apache.org/jira/browse/YARN-1857
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
Attachments: YARN-1857.1.patch, YARN-1857.patch, YARN-1857.patch,
YARN-1857.patch

Its possible to get an application to hang forever (or a long time) in a
cluster with multiple users. The reason why is that the headroom sent to the
application is based on the user limit but it doesn't account for other
Application masters using space in that queue. So the headroom (user limit -
user consumed) can be 0 even though the cluster is 100% full because the
other space is being used by application masters from other users.
For instance if you have a cluster with 1 queue, user limit is 100%, you have
multiple users submitting applications. One very large application by user 1
starts up, runs most of its maps and starts running reducers. other users try
to start applications and get their application masters started but not
tasks. The very large application then gets to the point where it has
consumed the rest of the cluster resources with all reduces. But at this
point it needs to still finish a few maps. The headroom being sent to this
application is only based on the user limit (which is 100% of the cluster
capacity) its using lets say 95% of the cluster for reduces and then other 5%
is being used by other users running application masters. The MRAppMaster
thinks it still has 5% so it doesn't know that it should kill a reduce in
order to run a map.
This can happen in other scenarios also. Generally in a large cluster with
multiple queues this shouldn't cause a hang forever but it could cause the
application to take much longer.

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2440) Cgroups should allow YARN containers to be limited to allocated cores

2014-08-25 Thread Jason Lowe (JIRA)

[
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109801#comment-14109801
]

Jason Lowe commented on YARN-2440:
--

Thanks for updating the patch, Varun.

I don't see why we need both a containers-cpu-cores and
containers-cpu-percentage, and I think it leads to confusion when both exist.
At first I did not realize that one overrode the other. Instead I assumed that
if you set cpu-cores to X and cpu-percentage to Y then you were requesting Y%
of X cores. Then there's the additional question of whether container usage is
pinned to those cores, etc. Only having cpu-percentage is a simpler model that
still allows the user to specify cores indirectly (e.g.: 25% of an 8 core
system is 2 cores). Maybe I'm missing the use case where we really need
containers-cpu-cores and the confusing (to me at least) override behavior
between the two properties.

Other comments on the patch:

- I'm not thrilled about the name template containers-cpu-* since it could
easily be misinterpreted as a per-container thing as well, but I'm currently at
a loss for a better prefix. Suggestions welcome.
- Does getOverallLimits need to check for a quotaUS that's too low as well?
- I think minimally we need to log a warning if we're going to ignore setting
up cgroups to limit CPU usage across all containers if the user specified to do
so.
- Related to the previous comment, I think it would be nice if we didn't try to
setup any limits if none were specified. That way if there's some issue with
correctly determining the number of cores on a particular system it can still
work in the default, use everything scenario.
- NodeManagerHardwareUtils.getContainerCores should be getContainersCores (the
per-container vs. all-containers confusion again)

Cgroups should allow YARN containers to be limited to allocated cores
-

Key: YARN-2440
URL: https://issues.apache.org/jira/browse/YARN-2440
Project: Hadoop YARN
Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch,
apache-yarn-2440.2.patch, screenshot-current-implementation.jpg

The current cgroups implementation does not limit YARN containers to the
cores allocated in yarn-site.xml.

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-08-25 Thread Chen He (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Chen He updated YARN-1857:
--

Target Version/s: 2.6.0 (was: 2.4.1)

CapacityScheduler headroom doesn't account for other AM's running
-

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1442) change yarn minicluster base directory via system property

2014-08-25 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109849#comment-14109849
 ] 

Ken Krugler commented on YARN-1442:
---

I'm curious why we wouldn't use the existing yarn.nodemanager.xxx conf settings 
for controlling where to put files. That's what I had originally done, and 
would seem like the most consistent approach.

Related, I was assuming dfs.data.dir would control where to put HDFS blocks, 
but instead there's an undocumented MiniDFSCluster.HDFS_MINIDFS_BASEDIR 
property...why?

 change yarn minicluster base directory via system property
 --

 Key: YARN-1442
 URL: https://issues.apache.org/jira/browse/YARN-1442
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.2.0
Reporter: André Kelpe
Priority: Minor
 Attachments: HADOOP-10122.patch


 The yarn minicluster used for testing uses the target directory by default. 
 We use gradle for building our projects and we would like to see it using a 
 different directory. This patch makes it possible to use a different 
 directory by setting the yarn.minicluster.directory system property.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2440) Cgroups should allow YARN containers to be limited to allocated cores

2014-08-25 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109856#comment-14109856
 ] 

Sangjin Lee commented on YARN-2440:
---

It might be good to use several fairly representative scenarios and see how we 
can satisfy them with clear configuration.

One scenario I can see pretty common is this (just for illustration):
- 8-core system
- want to use only 6 cores for containers (reserving 2 for NM and DN, etc.)
- want to allocate 1/2 core per container by default

IMO, the simplest config is
{panel}
yarn.nodemanager.resource.cpu-vcores = 60
yarn.nodemanager.containers-cores-to-vcores = 10
each container asks 5 vcores
{panel}

Or I could have
{panel}
yarn.nodemanager.resource.cpu-vcores = 60
yarn.nodemanager.containers-cpu-cores = 6 (core-to-vcore ratio understood as 
the ratio of these two)
each container asks 5 vcores
{panel}

I'm not sure how I can use containers-cpu-percentage to describe this 
scenario...

Does this help? Are there other types of use cases that we should review this 
with?

 Cgroups should allow YARN containers to be limited to allocated cores
 -

 Key: YARN-2440
 URL: https://issues.apache.org/jira/browse/YARN-2440
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, 
 apache-yarn-2440.2.patch, screenshot-current-implementation.jpg


 The current cgroups implementation does not limit YARN containers to the 
 cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2182) Update ContainerId#toString() to avoid conflicts before and after RM restart

2014-08-25 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109943#comment-14109943
 ] 

Jian He commented on YARN-2182:
---

looks good, +1

 Update ContainerId#toString() to avoid conflicts before and after RM restart
 

 Key: YARN-2182
 URL: https://issues.apache.org/jira/browse/YARN-2182
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2182.1.patch, YARN-2182.2.patch


 ContainerId#toString() doesn't include any information about current cluster 
 id. This leads conflict between container ids. We can avoid the conflicts 
 without breaking backward compatibility by using epoch introduced on 
 YARN-2052.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue


[ 
https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110007#comment-14110007
 ] 

Karthik Kambatla commented on YARN-2395:


Comments on the latest patch:
# Typo - should be If and not In
{code}
  // Fair share preemption timeout for each queue in seconds. In a job in the
{code}
# Documentation typos - s/will inherits/will inherit in two places.
# In TestAllocationFileLoaderService, can we make some changes to minShare as 
well and verify them. 

 FairScheduler: Preemption timeout should be configurable per queue
 --

 Key: YARN-2395
 URL: https://issues.apache.org/jira/browse/YARN-2395
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
 Attachments: YARN-2395-1.patch, YARN-2395-2.patch


 Currently in fair scheduler, the preemption logic considers fair share 
 starvation only at leaf queue level. This jira is created to implement it at 
 the parent queue as well.
 It involves :
 1. Making check for fair share starvation and amount of resource to 
 preempt  recursive such that they traverse the queue hierarchy from root to 
 leaf.
 2. Currently fairSharePreemptionTimeout is a global config. We could make it 
 configurable on a per queue basis,so that we can specify different timeouts 
 for parent queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue

2014-08-25 Thread Wei Yan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2395:
--

Attachment: YARN-2395-3.patch

Update a new patch to address Karthik's comments.

 FairScheduler: Preemption timeout should be configurable per queue
 --

 Key: YARN-2395
 URL: https://issues.apache.org/jira/browse/YARN-2395
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
 Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch


 Currently in fair scheduler, the preemption logic considers fair share 
 starvation only at leaf queue level. This jira is created to implement it at 
 the parent queue as well.
 It involves :
 1. Making check for fair share starvation and amount of resource to 
 preempt  recursive such that they traverse the queue hierarchy from root to 
 leaf.
 2. Currently fairSharePreemptionTimeout is a global config. We could make it 
 configurable on a per queue basis,so that we can specify different timeouts 
 for parent queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue

2014-08-25 Thread Ashwin Shankar (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110059#comment-14110059
 ] 

Ashwin Shankar commented on YARN-2395:
--

[~ywskycn], I'll post my comments soon. But quick comment on skimming through 
the patch - 
I see you have NOT made FairScheduler#isStarvedForMinShare and 
isStarvedForFairShare recursive. Which means starvation at
parent queues would not be detected and preemption at parent will not happen. 
Am I missing something ?

 FairScheduler: Preemption timeout should be configurable per queue
 --

 Key: YARN-2395
 URL: https://issues.apache.org/jira/browse/YARN-2395
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
 Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch


 Currently in fair scheduler, the preemption logic considers fair share 
 starvation only at leaf queue level. This jira is created to implement it at 
 the parent queue as well.
 It involves :
 1. Making check for fair share starvation and amount of resource to 
 preempt  recursive such that they traverse the queue hierarchy from root to 
 leaf.
 2. Currently fairSharePreemptionTimeout is a global config. We could make it 
 configurable on a per queue basis,so that we can specify different timeouts 
 for parent queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (YARN-2452) TestRMApplicationHistoryWriter is failed for FairScheduler

zhihai xu created YARN-2452:
---

 Summary: TestRMApplicationHistoryWriter is failed for FairScheduler
 Key: YARN-2452
 URL: https://issues.apache.org/jira/browse/YARN-2452
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu


TestRMApplicationHistoryWriter is failed for FairScheduler. The failure is the 
following:
T E S T S
---
Running 
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 69.311 sec  
FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter)
  Time elapsed: 66.261 sec   FAILURE!
java.lang.AssertionError: expected:1 but was:200
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430)
at 
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391)





--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (YARN-2453) TestProportionalCapacityPreemptionPolicy is failed for FairScheduler

zhihai xu created YARN-2453:
---

 Summary: TestProportionalCapacityPreemptionPolicy is failed for 
FairScheduler
 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu


TestProportionalCapacityPreemptionPolicy is failed for FairScheduler.
The following is error message:
Running 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
  Time elapsed: 1.61 sec   FAILURE!
java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
check what happened
at org.junit.Assert.fail(Assert.java:88)
at 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)

This test should only work for capacity scheduler because the following source 
code in ResourceManager.java prove it will only work for capacity scheduler.
{code}
if (scheduler instanceof PreemptableResourceScheduler
   conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
  YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
{code}

Because CapacityScheduler is instance of PreemptableResourceScheduler and 
FairScheduler is not  instance of PreemptableResourceScheduler.
I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-08-25 Thread Craig Welch (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1198:
--

Attachment: YARN-1198.8.patch

Patch based on my last comment which iterates/calculates headroom at the user 
level - which is (I believe) favored by [~jlowe] and [~wangda] (I'm comfortable 
with it, too...)

 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
 Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch, 
 YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, 
 YARN-1198.8.patch


 Today headroom calculation (for the app) takes place only when
 * New node is added/removed from the cluster
 * New container is getting assigned to the application.
 However there are potentially lot of situations which are not considered for 
 this calculation
 * If a container finishes then headroom for that application will change and 
 should be notified to the AM accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue then
 ** If app1's container finishes then not only app1's but also app2's AM 
 should be notified about the change in headroom.
 ** Similarly if a container is assigned to any applications app1/app2 then 
 both AM should be notified about their headroom.
 ** To simplify the whole communication process it is ideal to keep headroom 
 per User per LeafQueue so that everyone gets the same picture (apps belonging 
 to same user and submitted in same queue).
 * If a new user submits an application to the queue then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also today headroom is an absolute number ( I think it should be normalized 
 but then this is going to be not backward compatible..)
 * Also  when admin user refreshes queue headroom has to be updated.
 These all are the potential bugs in headroom calculations



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class

2014-08-25 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2404:
--

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-128

 Remove ApplicationAttemptState and ApplicationState class in RMStateStore 
 class 
 

 Key: YARN-2404
 URL: https://issues.apache.org/jira/browse/YARN-2404
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He

 We can remove ApplicationState and ApplicationAttemptState class in 
 RMStateStore, given that we already have ApplicationStateData and 
 ApplicationAttemptStateData records. we may just replace ApplicationState 
 with ApplicationStateData, similarly for ApplicationAttemptState.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue


[ 
https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110191#comment-14110191
 ] 

Hadoop QA commented on YARN-2395:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12664263/YARN-2395-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4724//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4724//console

This message is automatically generated.

 FairScheduler: Preemption timeout should be configurable per queue
 --

 Key: YARN-2395
 URL: https://issues.apache.org/jira/browse/YARN-2395
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
 Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch


 Currently in fair scheduler, the preemption logic considers fair share 
 starvation only at leaf queue level. This jira is created to implement it at 
 the parent queue as well.
 It involves :
 1. Making check for fair share starvation and amount of resource to 
 preempt  recursive such that they traverse the queue hierarchy from root to 
 leaf.
 2. Currently fairSharePreemptionTimeout is a global config. We could make it 
 configurable on a per queue basis,so that we can specify different timeouts 
 for parent queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2056) Disable preemption at Queue level


[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110204#comment-14110204
 ] 

Hadoop QA commented on YARN-2056:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12664283/YARN-2056.201408260128.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4723//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4723//console

This message is automatically generated.

 Disable preemption at Queue level
 -

 Key: YARN-2056
 URL: https://issues.apache.org/jira/browse/YARN-2056
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne
 Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt


 We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2377) Localization exception stack traces are not passed as diagnostic info

2014-08-25 Thread Gera Shegalov (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110221#comment-14110221
 ] 

Gera Shegalov commented on YARN-2377:
-

Hi [~kasha], I considered {{StringUtils#stringifyException}} but discarded it 
due to the following disadvantages: 
# redundant deserialization of the exception object just for the sake of 
serializing it right away
# as a consequence, hypothetically,  when localization service runs as a 
separate process with a dedicated classpath, we can encounter a 
{{ClassNotFoundException}} during deserialization

 Localization exception stack traces are not passed as diagnostic info
 -

 Key: YARN-2377
 URL: https://issues.apache.org/jira/browse/YARN-2377
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: YARN-2377.v01.patch


 In the Localizer log one can only see this kind of message
 {code}
 14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { 
 hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar,
  1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHos 
 tException: ha-nn-uri-0
 {code}
 And then only {{ java.net.UnknownHostException: ha-nn-uri-0}} message is 
 propagated as diagnostics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2440) Cgroups should allow YARN containers to be limited to allocated cores

2014-08-25 Thread Beckham007 (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110227#comment-14110227
 ] 

Beckham007 commented on YARN-2440:
--

Hi, all
Why not use the cpuset subsystem of cgroups? 
The cpuset could make container to run on allocated cores, and reserving some 
cores for system.


 Cgroups should allow YARN containers to be limited to allocated cores
 -

 Key: YARN-2440
 URL: https://issues.apache.org/jira/browse/YARN-2440
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, 
 apache-yarn-2440.2.patch, screenshot-current-implementation.jpg


 The current cgroups implementation does not limit YARN containers to the 
 cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2406) Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto


 [ 
https://issues.apache.org/jira/browse/YARN-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2406:
-

Attachment: YARN-2406.1.patch

 Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto
 

 Key: YARN-2406
 URL: https://issues.apache.org/jira/browse/YARN-2406
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2406.1.patch


 Today most recovery related proto records are defined in 
 yarn_server_resourcemanager_service_protos.proto which is inside YARN-API 
 module. Since these records are internally used by RM only, we can move them 
 to the yarn_server_resourcemanager_recovery.proto file inside RM-server module



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected


[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110230#comment-14110230
 ] 

Hadoop QA commented on YARN-1198:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12664285/YARN-1198.8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4726//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4726//console

This message is automatically generated.

 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
 Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch, 
 YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, 
 YARN-1198.8.patch


 Today headroom calculation (for the app) takes place only when
 * New node is added/removed from the cluster
 * New container is getting assigned to the application.
 However there are potentially lot of situations which are not considered for 
 this calculation
 * If a container finishes then headroom for that application will change and 
 should be notified to the AM accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue then
 ** If app1's container finishes then not only app1's but also app2's AM 
 should be notified about the change in headroom.
 ** Similarly if a container is assigned to any applications app1/app2 then 
 both AM should be notified about their headroom.
 ** To simplify the whole communication process it is ideal to keep headroom 
 per User per LeafQueue so that everyone gets the same picture (apps belonging 
 to same user and submitted in same queue).
 * If a new user submits an application to the queue then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also today headroom is an absolute number ( I think it should be normalized 
 but then this is going to be not backward compatible..)
 * Also  when admin user refreshes queue headroom has to be updated.
 These all are the potential bugs in headroom calculations



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Assigned] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class


 [ 
https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA reassigned YARN-2404:


Assignee: Tsuyoshi OZAWA

 Remove ApplicationAttemptState and ApplicationState class in RMStateStore 
 class 
 

 Key: YARN-2404
 URL: https://issues.apache.org/jira/browse/YARN-2404
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Tsuyoshi OZAWA

 We can remove ApplicationState and ApplicationAttemptState class in 
 RMStateStore, given that we already have ApplicationStateData and 
 ApplicationAttemptStateData records. we may just replace ApplicationState 
 with ApplicationStateData, similarly for ApplicationAttemptState.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2453) TestProportionalCapacityPreemptionPolicy is failed for FairScheduler


 [ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2453:


Attachment: YARN-2453.000.patch

 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler
 

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2453.000.patch


 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler.
 The following is error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test should only work for capacity scheduler because the following 
 source code in ResourceManager.java prove it will only work for capacity 
 scheduler.
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
   YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 Because CapacityScheduler is instance of PreemptableResourceScheduler and 
 FairScheduler is not  instance of PreemptableResourceScheduler.
 I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2406) Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto


[ 
https://issues.apache.org/jira/browse/YARN-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110262#comment-14110262
 ] 

Hadoop QA commented on YARN-2406:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12664302/YARN-2406.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4727//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4727//console

This message is automatically generated.

 Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto
 

 Key: YARN-2406
 URL: https://issues.apache.org/jira/browse/YARN-2406
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2406.1.patch


 Today most recovery related proto records are defined in 
 yarn_server_resourcemanager_service_protos.proto which is inside YARN-API 
 module. Since these records are internally used by RM only, we can move them 
 to the yarn_server_resourcemanager_recovery.proto file inside RM-server module



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2452) TestRMApplicationHistoryWriter is failed for FairScheduler


 [ 
https://issues.apache.org/jira/browse/YARN-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2452:


Attachment: YARN-2452.000.patch

 TestRMApplicationHistoryWriter is failed for FairScheduler
 --

 Key: YARN-2452
 URL: https://issues.apache.org/jira/browse/YARN-2452
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2452.000.patch


 TestRMApplicationHistoryWriter is failed for FairScheduler. The failure is 
 the following:
 T E S T S
 ---
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
 Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 69.311 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
 testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter)
   Time elapsed: 66.261 sec   FAILURE!
 java.lang.AssertionError: expected:1 but was:200
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2406) Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto


[ 
https://issues.apache.org/jira/browse/YARN-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110269#comment-14110269
 ] 

Tsuyoshi OZAWA commented on YARN-2406:
--

The test failure looks not related to a patch. [~jianhe], could you take a look?

 Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto
 

 Key: YARN-2406
 URL: https://issues.apache.org/jira/browse/YARN-2406
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2406.1.patch


 Today most recovery related proto records are defined in 
 yarn_server_resourcemanager_service_protos.proto which is inside YARN-API 
 module. Since these records are internally used by RM only, we can move them 
 to the yarn_server_resourcemanager_recovery.proto file inside RM-server module



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2453) TestProportionalCapacityPreemptionPolicy is failed for FairScheduler


[ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110272#comment-14110272
 ] 

zhihai xu commented on YARN-2453:
-

I uploaded a patch YARN-2453.000.patch for review.
This patch is to skip the test testPolicyInitializeAfterSchedulerInitialized 
for FairScheduler.

 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler
 

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2453.000.patch


 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler.
 The following is error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test should only work for capacity scheduler because the following 
 source code in ResourceManager.java prove it will only work for capacity 
 scheduler.
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
   YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 Because CapacityScheduler is instance of PreemptableResourceScheduler and 
 FairScheduler is not  instance of PreemptableResourceScheduler.
 I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2452) TestRMApplicationHistoryWriter is failed for FairScheduler


[ 
https://issues.apache.org/jira/browse/YARN-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110276#comment-14110276
 ] 

zhihai xu commented on YARN-2452:
-

I uploaded a patch YARN-2452.000.patch for review.
This patch is to enable assignmultiple, so FairScheduler can assign multiple 
containers on each Node HeartBeat otherwise by default FairScheduler can only 
assign one container on each Node HeartBeat.

 TestRMApplicationHistoryWriter is failed for FairScheduler
 --

 Key: YARN-2452
 URL: https://issues.apache.org/jira/browse/YARN-2452
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2452.000.patch


 TestRMApplicationHistoryWriter is failed for FairScheduler. The failure is 
 the following:
 T E S T S
 ---
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
 Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 69.311 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
 testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter)
   Time elapsed: 66.261 sec   FAILURE!
 java.lang.AssertionError: expected:1 but was:200
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2453) TestProportionalCapacityPreemptionPolicy is failed for FairScheduler


[ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110283#comment-14110283
 ] 

Hadoop QA commented on YARN-2453:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12664307/YARN-2453.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
  
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4728//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4728//console

This message is automatically generated.

 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler
 

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2453.000.patch


 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler.
 The following is error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test should only work for capacity scheduler because the following 
 source code in ResourceManager.java prove it will only work for capacity 
 scheduler.
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
   YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 Because CapacityScheduler is instance of PreemptableResourceScheduler and 
 FairScheduler is not  instance of PreemptableResourceScheduler.
 I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register


[ 
https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110319#comment-14110319
 ] 

Hadoop QA commented on YARN-2448:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12664105/apache-yarn-2448.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-tools/hadoop-sls 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4729//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4729//console

This message is automatically generated.

 RM should expose the name of the ResourceCalculator being used when AMs 
 register
 

 Key: YARN-2448
 URL: https://issues.apache.org/jira/browse/YARN-2448
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch


 The RM should expose the name of the ResourceCalculator being used when AMs 
 register, as part of the RegisterApplicationMasterResponse.
 This will allow applications to make better decisions when scheduling. 
 MapReduce for example, only looks at memory when deciding it's scheduling, 
 even though the RM could potentially be using the DominantResourceCalculator.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2452) TestRMApplicationHistoryWriter is failed for FairScheduler