[jira] [Commented] (YARN-2397) RM web interface sometimes returns request is a replay error in secure mode

2014-08-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090368#comment-14090368
 ] 

Hadoop QA commented on YARN-2397:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12660551/apache-yarn-2397.0.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4559//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4559//console

This message is automatically generated.

 RM web interface sometimes returns request is a replay error in secure mode
 ---

 Key: YARN-2397
 URL: https://issues.apache.org/jira/browse/YARN-2397
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2397.0.patch


 The RM web interface sometimes returns a "request is a replay" error if the 
 default Kerberos HTTP filter is enabled. This is because it uses the new 
 RMAuthenticationFilter in addition to the AuthenticationFilter. There is a 
 workaround: set 
 yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false. 
 This bug is to fix the code to use only the RMAuthenticationFilter and not 
 both.
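
For reference, the workaround above amounts to setting a single boolean property. A minimal sketch, assuming it is set programmatically on a Hadoop Configuration (in practice it would normally go into yarn-site.xml; the class here is only to make the key and value concrete):

{code}
import org.apache.hadoop.conf.Configuration;

public class ReplayWorkaroundExample {
  public static void main(String[] args) {
    // Illustrative only: disable the RM delegation-token auth filter so that
    // only the default AuthenticationFilter handles web requests (the
    // workaround described in the issue).
    Configuration conf = new Configuration();
    conf.setBoolean(
        "yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled",
        false);
    System.out.println("delegation-token-auth-filter enabled? " + conf.getBoolean(
        "yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled",
        true));
  }
}
{code}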



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2249) RM may receive container release request on AM resync before container is actually recovered

2014-08-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090406#comment-14090406
 ] 

Hadoop QA commented on YARN-2249:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660542/YARN-2249.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-tools/hadoop-sls 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4560//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4560//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4560//console

This message is automatically generated.

 RM may receive container release request on AM resync before container is 
 actually recovered
 

 Key: YARN-2249
 URL: https://issues.apache.org/jira/browse/YARN-2249
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch, 
 YARN-2249.2.patch, YARN-2249.3.patch


 AM resync on RM restart will send outstanding container release requests back 
 to the new RM. In the meantime, NMs report the container statuses back to the RM 
 to recover the containers. If the RM receives a container release request 
 before the container is actually recovered in the scheduler, the container won't 
 be released and the release request will be lost.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting

2014-08-08 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090420#comment-14090420
 ] 

Sandy Ryza commented on YARN-807:
-

bq. If you think it's a bug, we can resolve it in YARN-2385. 

bq. We may need to create a Map<queue-name, app-id> in RMContext.
It's also worth considering only holding this map for completed applications, 
so we don't need to keep two maps for running applications.

 When querying apps by queue, iterating over all apps is inefficient and 
 limiting 
 -

 Key: YARN-807
 URL: https://issues.apache.org/jira/browse/YARN-807
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.3.0

 Attachments: YARN-807-1.patch, YARN-807-2.patch, YARN-807-3.patch, 
 YARN-807-4.patch, YARN-807.patch


 The question "which apps are in queue x" can be asked via the RM REST APIs, 
 through the ClientRMService, and through the command line. In all these 
 cases, the question is answered by scanning through every RMApp and filtering 
 by the app's queue name.
 All schedulers maintain a mapping of queues to applications. I think it 
 would make more sense to ask the schedulers which applications are in a given 
 queue. This is what was done in MR1. This would also have the advantage of 
 allowing a parent queue to return all the applications in the leaf queues under 
 it, and of allowing queue name aliases, as in the way that "root.default" and 
 "default" refer to the same queue in the fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2014-08-08 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1372:


Attachment: YARN-1372.prelim2.patch

Second patch uploaded; it adds expiration to the entries in the NM.
getContainersToCleanup is currently used to remove containers. I'm not sure how we 
can reuse it for acking that the containers have been notified to the AM. Are you saying 
that the first time a containerId is in that list it's for removing the container, and the 
next time it's used to ack that the AM has received it?
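
Not part of the patch — just a rough sketch of the expiring-entries idea being discussed. All names here (e.g. RecentlyCompletedContainers) are hypothetical; the real patch may structure this differently:

{code}
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: NM-side tracking of completed containers that were
// reported to the RM but not yet acknowledged as pulled by the AM. Entries
// expire after a timeout so the map cannot grow without bound.
public class RecentlyCompletedContainers {
  private final Map<String, Long> reportedAtMillis = new ConcurrentHashMap<>();
  private final long expiryMillis;

  public RecentlyCompletedContainers(long expiryMillis) {
    this.expiryMillis = expiryMillis;
  }

  public void containerReported(String containerId) {
    reportedAtMillis.putIfAbsent(containerId, System.currentTimeMillis());
  }

  // Called when the RM confirms the AM has pulled the container status.
  public void ackReceived(String containerId) {
    reportedAtMillis.remove(containerId);
  }

  // Drop entries older than the expiry, e.g. from a periodic NM task.
  public void expireOldEntries() {
    long now = System.currentTimeMillis();
    Iterator<Map.Entry<String, Long>> it = reportedAtMillis.entrySet().iterator();
    while (it.hasNext()) {
      if (now - it.next().getValue() > expiryMillis) {
        it.remove();
      }
    }
  }
}
{code}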


 Ensure all completed containers are reported to the AMs across RM restart
 -

 Key: YARN-1372
 URL: https://issues.apache.org/jira/browse/YARN-1372
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1372.prelim.patch, YARN-1372.prelim2.patch


 Currently the NM informs the RM about completed containers and then removes 
 those containers from the RM notification list. The RM passes on that 
 completed container information to the AM and the AM pulls this data. If the 
 RM dies before the AM pulls this data then the AM may not be able to get this 
 information again. To fix this, NM should maintain a separate list of such 
 completed container notifications sent to the RM. After the AM has pulled the 
 containers from the RM then the RM will inform the NM about it and the NM can 
 remove the completed container from the new list. Upon re-register with the 
 RM (after RM restart) the NM should send the entire list of completed 
 containers to the RM along with any other containers that completed while the 
 RM was dead. This ensures that the RM can inform the AMs about all completed 
 containers. Some container completions may be reported more than once since 
 the AM may have pulled the container but the RM may die before notifying the 
 NM about the pull.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting

2014-08-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090440#comment-14090440
 ] 

Wangda Tan commented on YARN-807:
-

bq. It's also worth considering only holding this map for completed 
applications, so we don't need to keep two maps for running applications.

I suggest we do it this way:
1) Rename the scheduler-side getAppsInQueue to getRunningAppsInQueue.
2) Create a Map<Queue-name, Set<App-ID>> in RMContext; it will contain both 
completed and running apps. The benefit of storing them this way is that we don't need 
to query two places when a client wants to get applications. The scheduler-side 
getRunningAppsInQueue will be used when we need to query the running apps in a queue, 
as in YARN-2378.

Thanks,
Wangda
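
(Illustrative only.) A rough sketch of the kind of Map<Queue-name, Set<App-ID>> index proposed in 2) above; the class and method names are invented for the example:

{code}
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a queue-name to app-id index kept in RMContext,
// covering both running and completed applications.
public class QueueAppIndex {
  private final Map<String, Set<String>> appsByQueue = new ConcurrentHashMap<>();

  public void addApp(String queueName, String appId) {
    appsByQueue
        .computeIfAbsent(queueName, q -> ConcurrentHashMap.<String>newKeySet())
        .add(appId);
  }

  public void removeApp(String queueName, String appId) {
    Set<String> apps = appsByQueue.get(queueName);
    if (apps != null) {
      apps.remove(appId);
    }
  }

  // A single lookup answers "which apps are in queue x" without scanning
  // every RMApp.
  public Set<String> getAppsInQueue(String queueName) {
    return appsByQueue.getOrDefault(queueName, Collections.<String>emptySet());
  }
}
{code}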

 When querying apps by queue, iterating over all apps is inefficient and 
 limiting 
 -

 Key: YARN-807
 URL: https://issues.apache.org/jira/browse/YARN-807
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.3.0

 Attachments: YARN-807-1.patch, YARN-807-2.patch, YARN-807-3.patch, 
 YARN-807-4.patch, YARN-807.patch


 The question "which apps are in queue x" can be asked via the RM REST APIs, 
 through the ClientRMService, and through the command line. In all these 
 cases, the question is answered by scanning through every RMApp and filtering 
 by the app's queue name.
 All schedulers maintain a mapping of queues to applications. I think it 
 would make more sense to ask the schedulers which applications are in a given 
 queue. This is what was done in MR1. This would also have the advantage of 
 allowing a parent queue to return all the applications in the leaf queues under 
 it, and of allowing queue name aliases, as in the way that "root.default" and 
 "default" refer to the same queue in the fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2397) RM web interface sometimes returns request is a replay error in secure mode

2014-08-08 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2397:


Attachment: apache-yarn-2397.1.patch

Patch to address [~zjshen]'s comments and fix the test case.

 RM web interface sometimes returns request is a replay error in secure mode
 ---

 Key: YARN-2397
 URL: https://issues.apache.org/jira/browse/YARN-2397
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2397.0.patch, apache-yarn-2397.1.patch


 The RM web interface sometimes returns a "request is a replay" error if the 
 default Kerberos HTTP filter is enabled. This is because it uses the new 
 RMAuthenticationFilter in addition to the AuthenticationFilter. There is a 
 workaround: set 
 yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false. 
 This bug is to fix the code to use only the RMAuthenticationFilter and not 
 both.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting

2014-08-08 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090447#comment-14090447
 ] 

Sandy Ryza commented on YARN-807:
-

I just remembered a couple of reasons why it's important that we go through the 
scheduler:
* *Getting all the apps underneath a parent queue* - the scheduler holds queue 
hierarchy information that allows us to return the applications in all leaf queues 
underneath a parent queue.
* *Aliases* - in the Fair Scheduler, "default" is shorthand for 
"root.default", so querying on either of these names should return the applications 
in that queue.

I'm open to approaches that don't require going through the scheduler, but I 
think we should make sure they keep supporting these capabilities.
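
A small illustrative sketch of the two capabilities above. The Queue interface, method names, and alias rule shown here are invented for the example, not the scheduler's real API:

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical queue node, standing in for the scheduler's real queue classes.
interface Queue {
  String getName();
  List<Queue> getChildren();     // empty for leaf queues
  List<String> getLeafApps();    // app ids; only meaningful for leaf queues
}

public class QueueQueries {
  // Collect apps from every leaf queue underneath the given queue, which is
  // what a parent-queue query should return.
  public static List<String> appsUnder(Queue queue) {
    List<String> result = new ArrayList<>();
    if (queue.getChildren().isEmpty()) {
      result.addAll(queue.getLeafApps());
    } else {
      for (Queue child : queue.getChildren()) {
        result.addAll(appsUnder(child));
      }
    }
    return result;
  }

  // Alias handling in the spirit of the Fair Scheduler's "default" ->
  // "root.default"; the real rule may differ.
  public static String canonicalName(String queueName) {
    return queueName.contains(".") ? queueName : "root." + queueName;
  }
}
{code}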

 When querying apps by queue, iterating over all apps is inefficient and 
 limiting 
 -

 Key: YARN-807
 URL: https://issues.apache.org/jira/browse/YARN-807
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.3.0

 Attachments: YARN-807-1.patch, YARN-807-2.patch, YARN-807-3.patch, 
 YARN-807-4.patch, YARN-807.patch


 The question "which apps are in queue x" can be asked via the RM REST APIs, 
 through the ClientRMService, and through the command line. In all these 
 cases, the question is answered by scanning through every RMApp and filtering 
 by the app's queue name.
 All schedulers maintain a mapping of queues to applications. I think it 
 would make more sense to ask the schedulers which applications are in a given 
 queue. This is what was done in MR1. This would also have the advantage of 
 allowing a parent queue to return all the applications in the leaf queues under 
 it, and of allowing queue name aliases, as in the way that "root.default" and 
 "default" refer to the same queue in the fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting

2014-08-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090455#comment-14090455
 ] 

Wangda Tan commented on YARN-807:
-

Hi Sandy, 
Thanks for your elaboration. As you said, I agree we need to go through the 
scheduler because of the two capabilities you mentioned.
A possible way is to save completed apps in the leaf queue, as you mentioned. I 
remember that YARN now evicts some apps when the total number of apps exceeds a 
configured number (like 10,000); we should do the same eviction for completed 
apps in the leaf queue as well.

 When querying apps by queue, iterating over all apps is inefficient and 
 limiting 
 -

 Key: YARN-807
 URL: https://issues.apache.org/jira/browse/YARN-807
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.3.0

 Attachments: YARN-807-1.patch, YARN-807-2.patch, YARN-807-3.patch, 
 YARN-807-4.patch, YARN-807.patch


 The question "which apps are in queue x" can be asked via the RM REST APIs, 
 through the ClientRMService, and through the command line. In all these 
 cases, the question is answered by scanning through every RMApp and filtering 
 by the app's queue name.
 All schedulers maintain a mapping of queues to applications. I think it 
 would make more sense to ask the schedulers which applications are in a given 
 queue. This is what was done in MR1. This would also have the advantage of 
 allowing a parent queue to return all the applications in the leaf queues under 
 it, and of allowing queue name aliases, as in the way that "root.default" and 
 "default" refer to the same queue in the fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2248) Capacity Scheduler changes for moving apps between queues

2014-08-08 Thread Janos Matyas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090476#comment-14090476
 ] 

Janos Matyas commented on YARN-2248:


Sounds good - let us know if we can help in any way. We use this feature 
internally, so once you submit a patch we can check/test it on our side as well. 

 Capacity Scheduler changes for moving apps between queues
 -

 Key: YARN-2248
 URL: https://issues.apache.org/jira/browse/YARN-2248
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Janos Matyas
Assignee: Janos Matyas
 Fix For: 2.6.0

 Attachments: YARN-2248-1.patch, YARN-2248-2.patch, YARN-2248-3.patch


 We would like to have the capability (same as the Fair Scheduler has) to move 
 applications between queues. 
 We have made a baseline implementation and tests to start with - and we would 
 like the community to review, come up with suggestions and finally have this 
 contributed. 
 The current implementation is available for 2.4.1 - so the first thing is 
 that we'd need to identify the target version as there are differences 
 between 2.4.* and 3.* interfaces.
 The story behind it is available at 
 http://blog.sequenceiq.com/blog/2014/07/02/move-applications-between-queues/ 
 and the baseline implementation and test at:
 https://github.com/sequenceiq/hadoop-common/blob/branch-2.4.1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/a/ExtendedCapacityScheduler.java#L924
 https://github.com/sequenceiq/hadoop-common/blob/branch-2.4.1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/a/TestExtendedCapacitySchedulerAppMove.java



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore

2014-08-08 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090495#comment-14090495
 ] 

Varun Saxena commented on YARN-2138:


Thanks [~jianhe] for the review. I will make the necessary changes and upload a 
new patch.
Sure [~kkambatl], let me know if any further changes are required.

 Cleanup notifyDone* methods in RMStateStore
 ---

 Key: YARN-2138
 URL: https://issues.apache.org/jira/browse/YARN-2138
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Attachments: YARN-2138.patch


 The storedException passed into notifyDoneStoringApplication is always null. 
 Similarly for other notifyDone* methods. We can clean up these methods as 
 this control flow path is not used anymore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-675) In YarnClient, pull AM logs on AM container failure

2014-08-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090501#comment-14090501
 ] 

Steve Loughran commented on YARN-675:
-

This is dangerous if the logs are more than a few gigabytes

 In YarnClient, pull AM logs on AM container failure
 ---

 Key: YARN-675
 URL: https://issues.apache.org/jira/browse/YARN-675
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Li Lu

 Similar to MAPREDUCE-4362, when an AM container fails, it would be helpful to 
 pull its logs from the NM to the client so that they can be displayed 
 immediately to the user.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures

2014-08-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090502#comment-14090502
 ] 

Steve Loughran commented on YARN-2392:
--

...there are no tests here, as there is nothing to test except by visual review 
of the message.

A lot of existing tests do look for the "Failed the application" string at the 
end of the message, with that string hard-coded into the test methods. Those 
should really be reworked to use a constant string, as otherwise they are very 
brittle. This patch leaves the relevant text alone to avoid breaking anything.
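
A tiny sketch of the constant-string approach suggested above; the class name AMDiagnostics and the message format are invented for illustration:

{code}
// Hypothetical example of replacing a hard-coded diagnostic prefix with a
// shared constant, so production code and tests cannot drift apart.
public final class AMDiagnostics {
  public static final String FAILED_PREFIX = "Failed the application";

  private AMDiagnostics() {
  }

  public static String failureMessage(int attempts, int maxAttempts, String url) {
    return FAILED_PREFIX + " after " + attempts + " of " + maxAttempts
        + " attempts. See: " + url;   // note the whitespace before the URL
  }
}

// In a test, instead of a hard-coded literal:
//   assertTrue(diagnostics.contains(AMDiagnostics.FAILED_PREFIX));
{code}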

 add more diags about app retry limits on AM failures
 

 Key: YARN-2392
 URL: https://issues.apache.org/jira/browse/YARN-2392
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Steve Loughran
 Attachments: YARN-2392-001.patch


 # when an app fails the failure count is shown, but not what the global + 
 local limits are. If the two are different, they should both be printed. 
 # the YARN-2242 strings don't have enough whitespace between text and the URL



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2397) RM web interface sometimes returns request is a replay error in secure mode

2014-08-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090566#comment-14090566
 ] 

Hadoop QA commented on YARN-2397:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12660578/apache-yarn-2397.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4561//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4561//console

This message is automatically generated.

 RM web interface sometimes returns request is a replay error in secure mode
 ---

 Key: YARN-2397
 URL: https://issues.apache.org/jira/browse/YARN-2397
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2397.0.patch, apache-yarn-2397.1.patch


 The RM web interface sometimes returns a "request is a replay" error if the 
 default Kerberos HTTP filter is enabled. This is because it uses the new 
 RMAuthenticationFilter in addition to the AuthenticationFilter. There is a 
 workaround: set 
 yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false. 
 This bug is to fix the code to use only the RMAuthenticationFilter and not 
 both.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2138) Cleanup notifyDone* methods in RMStateStore

2014-08-08 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2138:
---

Attachment: YARN-2138.002.patch

 Cleanup notifyDone* methods in RMStateStore
 ---

 Key: YARN-2138
 URL: https://issues.apache.org/jira/browse/YARN-2138
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Attachments: YARN-2138.002.patch, YARN-2138.patch


 The storedException passed into notifyDoneStoringApplication is always null. 
 Similarly for other notifyDone* methods. We can clean up these methods as 
 this control flow path is not used anymore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords

2014-08-08 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090659#comment-14090659
 ] 

Varun Vasudev commented on YARN-2373:
-

[~lmccay] thanks for the patch! Some general questions (since this is part of a 
larger effort):
1. For the null case (where WebAppUtils.getPassword() returns null), should 
we add a warning or an audit log entry that someone was trying to get a password that 
was null?
2. Will you update the documentation in another ticket (just to let users know that 
they can use a CredentialProvider instead of plain text)?

Other than that, it looks good to me.

 WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
 

 Key: YARN-2373
 URL: https://issues.apache.org/jira/browse/YARN-2373
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Larry McCay
 Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch


 As part of HADOOP-10904, this jira represents a change to WebAppUtils to 
 adopt the credential provider API through the new getPassword method on 
 Configuration.
 This provides an alternative to storing the passwords in clear text within 
 the ssl-server.xml file while maintaining backward compatibility with that 
 behavior.
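
For context, a minimal sketch of the getPassword-plus-fallback pattern described above, assuming a plain Hadoop Configuration; this is an illustration, not the actual WebAppUtils change:

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

public class SslPasswordLookup {
  // Illustrative only: resolve a password via the credential provider API,
  // falling back to the clear-text property for backward compatibility.
  public static String getPassword(Configuration conf, String alias) {
    String password = null;
    try {
      char[] pw = conf.getPassword(alias);
      if (pw != null) {
        password = new String(pw);
      }
    } catch (IOException e) {
      // Credential provider could not be reached; fall back below.
    }
    if (password == null) {
      password = conf.get(alias);   // legacy clear-text behavior
    }
    return password;
  }
}
{code}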



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords

2014-08-08 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090662#comment-14090662
 ] 

Varun Vasudev commented on YARN-2373:
-

Missed one more question - are you taking care of changes to ssl-client.xml as 
well?

 WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
 

 Key: YARN-2373
 URL: https://issues.apache.org/jira/browse/YARN-2373
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Larry McCay
 Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch


 As part of HADOOP-10904, this jira represents a change to WebAppUtils to 
 adopt the credential provider API through the new getPassword method on 
 Configuration.
 This provides an alternative to storing the passwords in clear text within 
 the ssl-server.xml file while maintaining backward compatibility with that 
 behavior.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure

2014-08-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090679#comment-14090679
 ] 

Hudson commented on YARN-2008:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #638 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/638/])
YARN-2008. Fixed CapacityScheduler to calculate headroom based on max available 
capacity instead of configured max capacity. Contributed by Craig Welch 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616580)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DefaultResourceCalculator.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceCalculator.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/Resources.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCSQueueUtils.java


 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy 
 queue structure 
 -

 Key: YARN-2008
 URL: https://issues.apache.org/jira/browse/YARN-2008
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Craig Welch
 Fix For: 2.6.0

 Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, 
 YARN-2008.4.patch, YARN-2008.5.patch, YARN-2008.6.patch, YARN-2008.7.patch, 
 YARN-2008.8.patch, YARN-2008.9.patch


 Suppose there are two queues, Q1 and Q2, both allowed to use 100% of the actual 
 resources in the cluster, and each currently using 50% of the actual cluster 
 resources, so there is no actual space available. With the current method of 
 getting headroom, the CapacityScheduler thinks there are still resources 
 available for users in Q1, but they have already been used by Q2.
 If the CapacityScheduler has a hierarchical queue structure, it may report an 
 incorrect queueMaxCap. Here is an example:
  rootQueue
  ├── L1ParentQueue1 (allowed to use up to 80% of its parent)
  │     ├── L2LeafQueue1 (50% of its parent)
  │     └── L2LeafQueue2 (50% of its parent in minimum)
  └── L1ParentQueue2 (allowed to use 20% in minimum of its parent)
 When we calculate the headroom of a user in L2LeafQueue2, the current method will 
 think L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. 
 However, without checking L1ParentQueue1, we cannot be sure. It is possible 
 that L1ParentQueue2 has used 40% of the rootQueue resources right now; in that 
 case, L2LeafQueue2 can only use 30% (60% * 50%).
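
To spell out the arithmetic in the example, a small illustrative snippet (the numbers mirror the example above; this is not the CapacityScheduler code):

{code}
public class HeadroomExample {
  public static void main(String[] args) {
    double l1Parent1ConfiguredMax = 0.80;  // 80% of rootQueue
    double l2Leaf2Share = 0.50;            // 50% of L1ParentQueue1

    // Configured-capacity view: 80% * 50% = 40% of the cluster.
    double configuredMax = l1Parent1ConfiguredMax * l2Leaf2Share;

    // If L1ParentQueue2 is already using 40% of the cluster, only 60% is
    // actually available to L1ParentQueue1, so the real limit is 60% * 50% = 30%.
    double usedByL1Parent2 = 0.40;
    double availableToL1Parent1 =
        Math.min(l1Parent1ConfiguredMax, 1.0 - usedByL1Parent2);
    double actualMax = availableToL1Parent1 * l2Leaf2Share;

    System.out.printf("configured max = %.0f%%, actual max = %.0f%%%n",
        configuredMax * 100, actualMax * 100);  // 40% vs 30%
  }
}
{code}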



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2288) Data persistent in timelinestore should be versioned

2014-08-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090685#comment-14090685
 ] 

Hudson commented on YARN-2288:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #638 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/638/])
YARN-2288. Made persisted data in LevelDB timeline store be versioned. 
Contributed by Junping Du. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616540)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java


 Data persistent in timelinestore should be versioned
 

 Key: YARN-2288
 URL: https://issues.apache.org/jira/browse/YARN-2288
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.4.1
Reporter: Junping Du
Assignee: Junping Du
 Fix For: 2.6.0

 Attachments: YARN-2288-v2.patch, YARN-2288-v3.patch, 
 YARN-2288-v4.patch, YARN-2288-v5.patch, YARN-2288.patch


 We have a LevelDB-backed TimelineStore; it should have a schema version to 
 accommodate future schema changes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2302) Refactor TimelineWebServices

2014-08-08 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090707#comment-14090707
 ] 

Junping Du commented on YARN-2302:
--

Thanks for the patch, [~zjshen]! A couple of comments so far:

In ApplicationHistoryServer.java,

{code}
+  protected TimelineDataManager timelineDataManager;
{code}
Better to make it private, as it is only consumed within ApplicationHistoryServer.

{code}
timelineACLsManager = createTimelineACLsManager(conf);
{code}
Looks like we don't need timelineACLsManager anymore except for initializing 
TimelineDataManager. We can completely remove it (both the method and the variable) 
after merging the code below
{code}
  protected TimelineACLsManager createTimelineACLsManager(Configuration conf) {
    return new TimelineACLsManager(conf);
  }

  protected TimelineDataManager createTimelineDataManager(Configuration conf) {
    return new TimelineDataManager(timelineStore, timelineACLsManager);
  }
{code}

to:

{code}
  private TimelineDataManager createTimelineDataManager(Configuration conf) {
    return new TimelineDataManager(timelineStore, new TimelineACLsManager(conf));
  }
{code}
The visibility of the method should be private, as it is not accessed outside of the 
class. There are also some similar, unnecessarily protected methods in this class; 
see whether you want to update them here as well, or we can do it separately later.

In TimelineDataManager.java,
{code}
+  try {
+    if (existingEntity == null) {
+      injectOwnerInfo(entity, callerUGI.getShortUserName());
+    }
+  } catch (YarnException e) {
+    // Skip the entity which messes up the primary filter and record the
+    // error
+    LOG.warn("Skip the timeline entity: " + entityID + ", because "
+        + e.getMessage());
{code}
This exception sounds more serious than just a warning, so LOG.error here may make 
more sense?

 Refactor TimelineWebServices
 

 Key: YARN-2302
 URL: https://issues.apache.org/jira/browse/YARN-2302
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2302.1.patch


 Now TimelineWebServices contains non-trivial logic to process the HTTP 
 requests, manipulate the data, check the access, and interact with the 
 timeline store.
 I propose moving the data-oriented logic to a middle layer (the so-called 
 TimelineDataManager), so that TimelineWebServices only processes the requests 
 and calls TimelineDataManager to complete the remaining tasks.
 By doing this, we let the generic history module reuse TimelineDataManager 
 internally (YARN-2033), invoking the put/get methods directly. 
 Otherwise, we would have to send HTTP requests to TimelineWebServices to query 
 the generic history data, which is not efficient.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2398) TestResourceTrackerOnHA crashes

2014-08-08 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090771#comment-14090771
 ] 

Jason Lowe commented on YARN-2398:
--

System.exit is being called from the test, which is known to make surefire 
upset and fail the build.  From the test output, it looks like a scheduler 
event is being dispatched and the test didn't set up a handler for it:

{noformat}
2014-08-08 13:48:28,867 INFO  [AsyncDispatcher event handler] rmnode.RMNodeImpl 
(RMNodeImpl.java:handle(387)) - localhost:0 Node Transitioned from NEW to 
RUNNING
2014-08-08 13:48:28,867 FATAL [AsyncDispatcher event handler] 
event.AsyncDispatcher (AsyncDispatcher.java:dispatch(179)) - Error in 
dispatcher thread
java.lang.Exception: No handler for registered for class 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEventType
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:724)
2014-08-08 13:48:28,868 INFO  [AsyncDispatcher event handler] 
event.AsyncDispatcher (AsyncDispatcher.java:dispatch(184)) - Exiting, bbye..
{noformat}
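
A sketch of the kind of fix implied above: register a no-op handler for SchedulerEventType on the test's dispatcher so AsyncDispatcher never hits the fatal no-handler path. The wrapper class here is invented; the dispatcher and register calls are the standard YARN event APIs, but the real test may wire this differently:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.event.AsyncDispatcher;
import org.apache.hadoop.yarn.event.EventHandler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEvent;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEventType;

public class DispatcherTestSetup {
  public static AsyncDispatcher newDispatcherWithNoopSchedulerHandler() {
    AsyncDispatcher dispatcher = new AsyncDispatcher();
    // Swallow scheduler events so the dispatcher never reaches the
    // "no handler registered" error path during the test.
    dispatcher.register(SchedulerEventType.class,
        new EventHandler<SchedulerEvent>() {
          @Override
          public void handle(SchedulerEvent event) {
            // no-op for this test
          }
        });
    dispatcher.init(new Configuration());
    dispatcher.start();
    return dispatcher;
  }
}
{code}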

 TestResourceTrackerOnHA crashes
 ---

 Key: YARN-2398
 URL: https://issues.apache.org/jira/browse/YARN-2398
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jason Lowe

 TestResourceTrackerOnHA is currently crashing and failing trunk builds.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance

2014-08-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090790#comment-14090790
 ] 

Karthik Kambatla commented on YARN-2352:


Both tests pass locally for me, and the failures seen here are unrelated to the 
patch. 

Committing this. 

 FairScheduler: Collect metrics on duration of critical methods that affect 
 performance
 --

 Key: YARN-2352
 URL: https://issues.apache.org/jira/browse/YARN-2352
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: fs-perf-metrics.png, yarn-2352-1.patch, 
 yarn-2352-2.patch, yarn-2352-2.patch, yarn-2352-3.patch, yarn-2352-4.patch, 
 yarn-2352-5.patch, yarn-2352-6.patch


 We need more metrics for better visibility into FairScheduler performance. At 
 the least, we need to do this for (1) handling node events, (2) update, (3) 
 computing fair shares, and (4) preemption.
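
As an illustration of the measurement pattern only (the actual patch does this in FSOpDurations via the metrics2 library), a hypothetical duration recorder might look like:

{code}
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical recorder for timing critical scheduler methods (node update,
// the update thread, preemption) and keeping simple aggregate stats.
public class OpDurationRecorder {
  private final LongAdder totalNanos = new LongAdder();
  private final LongAdder samples = new LongAdder();
  private final AtomicLong maxNanos = new AtomicLong();

  public void record(long elapsedNanos) {
    totalNanos.add(elapsedNanos);
    samples.increment();
    maxNanos.accumulateAndGet(elapsedNanos, Math::max);
  }

  public double avgMillis() {
    long n = samples.sum();
    return n == 0 ? 0.0 : (totalNanos.sum() / (double) n) / 1_000_000.0;
  }

  // Usage around a critical section, e.g. a node update (handleNodeUpdate is
  // a stand-in name):
  //   long start = System.nanoTime();
  //   handleNodeUpdate(node);
  //   recorder.record(System.nanoTime() - start);
}
{code}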



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure

2014-08-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090821#comment-14090821
 ] 

Hudson commented on YARN-2008:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1831 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1831/])
YARN-2008. Fixed CapacityScheduler to calculate headroom based on max available 
capacity instead of configured max capacity. Contributed by Craig Welch 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616580)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DefaultResourceCalculator.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceCalculator.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/Resources.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCSQueueUtils.java


 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy 
 queue structure 
 -

 Key: YARN-2008
 URL: https://issues.apache.org/jira/browse/YARN-2008
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Craig Welch
 Fix For: 2.6.0

 Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, 
 YARN-2008.4.patch, YARN-2008.5.patch, YARN-2008.6.patch, YARN-2008.7.patch, 
 YARN-2008.8.patch, YARN-2008.9.patch


 Suppose there are two queues, Q1 and Q2, both allowed to use 100% of the actual 
 resources in the cluster, and each currently using 50% of the actual cluster 
 resources, so there is no actual space available. With the current method of 
 getting headroom, the CapacityScheduler thinks there are still resources 
 available for users in Q1, but they have already been used by Q2.
 If the CapacityScheduler has a hierarchical queue structure, it may report an 
 incorrect queueMaxCap. Here is an example:
  rootQueue
  ├── L1ParentQueue1 (allowed to use up to 80% of its parent)
  │     ├── L2LeafQueue1 (50% of its parent)
  │     └── L2LeafQueue2 (50% of its parent in minimum)
  └── L1ParentQueue2 (allowed to use 20% in minimum of its parent)
 When we calculate the headroom of a user in L2LeafQueue2, the current method will 
 think L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. 
 However, without checking L1ParentQueue1, we cannot be sure. It is possible 
 that L1ParentQueue2 has used 40% of the rootQueue resources right now; in that 
 case, L2LeafQueue2 can only use 30% (60% * 50%).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2288) Data persistent in timelinestore should be versioned

2014-08-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090827#comment-14090827
 ] 

Hudson commented on YARN-2288:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1831 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1831/])
YARN-2288. Made persisted data in LevelDB timeline store be versioned. 
Contributed by Junping Du. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616540)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java


 Data persistent in timelinestore should be versioned
 

 Key: YARN-2288
 URL: https://issues.apache.org/jira/browse/YARN-2288
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.4.1
Reporter: Junping Du
Assignee: Junping Du
 Fix For: 2.6.0

 Attachments: YARN-2288-v2.patch, YARN-2288-v3.patch, 
 YARN-2288-v4.patch, YARN-2288-v5.patch, YARN-2288.patch


 We have a LevelDB-backed TimelineStore; it should have a schema version to 
 accommodate future schema changes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords

2014-08-08 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090840#comment-14090840
 ] 

Larry McCay commented on YARN-2373:
---

Hi [~vvasudev] - thanks for the review and the good questions:

bq. 1. For the null case(where the WebAppUtils.getPassword() returns null), 
should we add a warning or an audit log that someone was trying to get a 
password that was null?

There was no such log or audit record in that case before adding the additional 
check for an alias in credential provider - so I didn't add anything new for 
it. It probably would be a good idea to do so - I don't know that this change 
makes it any more necessary though. Your question raises an interesting point 
for the Configuration.getPassword implementation though. I think that it would 
make sense to log a failure to get a password if there is no provisioned alias 
and it is configured to not allow fallback to config. We don't currently do 
that - it will just return null. I think we should file a separate jira for 
that.

bq. 2. Will you update documentation in another ticket(just to let users know 
that they can use a CredentialProvider instead of using plain text)?

We could do that. There is already a jira for adding credential provider API 
documentation - are you thinking that it needs to have YARN-specific 
documentation as well?

bq. Missed one more question - are you taking care of changes to ssl-client.xml 
as well?

This is a good point. I will have to track down those usages as well and file 
separate jiras.


Are any of these questions/answers blockers for this patch?

Thanks again for the review!

 WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
 

 Key: YARN-2373
 URL: https://issues.apache.org/jira/browse/YARN-2373
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Larry McCay
 Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch


 As part of HADOOP-10904, this jira represents a change to WebAppUtils to 
 adopt the credential provider API through the new getPassword method on 
 Configuration.
 This provides an alternative to storing the passwords in clear text within 
 the ssl-server.xml file while maintaining backward compatibility with that 
 behavior.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance

2014-08-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090846#comment-14090846
 ] 

Hudson commented on YARN-2352:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6037 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6037/])
YARN-2352. Add missing file. FairScheduler: Collect metrics on duration of 
critical methods that affect performance. (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616784)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSOpDurations.java
YARN-2352. FairScheduler: Collect metrics on duration of critical methods that 
affect performance. (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616769)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsCollectorImpl.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/lib/MutableStat.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 FairScheduler: Collect metrics on duration of critical methods that affect 
 performance
 --

 Key: YARN-2352
 URL: https://issues.apache.org/jira/browse/YARN-2352
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: fs-perf-metrics.png, yarn-2352-1.patch, 
 yarn-2352-2.patch, yarn-2352-2.patch, yarn-2352-3.patch, yarn-2352-4.patch, 
 yarn-2352-5.patch, yarn-2352-6.patch


 We need more metrics for better visibility into FairScheduler performance. At 
 the least, we need to do this for (1) handling node events, (2) update, (3) 
 computing fair shares, and (4) preemption.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2397) RM web interface sometimes returns request is a replay error in secure mode

2014-08-08 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2397:
---

Priority: Critical  (was: Major)
Target Version/s: 2.6.0

 RM web interface sometimes returns request is a replay error in secure mode
 ---

 Key: YARN-2397
 URL: https://issues.apache.org/jira/browse/YARN-2397
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Critical
 Attachments: apache-yarn-2397.0.patch, apache-yarn-2397.1.patch


 The RM web interface sometimes returns a "request is a replay" error if the 
 default Kerberos HTTP filter is enabled. This is because it uses the new 
 RMAuthenticationFilter in addition to the AuthenticationFilter. There is a 
 workaround: set 
 yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false. 
 This bug is to fix the code to use only the RMAuthenticationFilter and not 
 both.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2288) Data persistent in timelinestore should be versioned

2014-08-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090864#comment-14090864
 ] 

Hudson commented on YARN-2288:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1857 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1857/])
YARN-2288. Made persisted data in LevelDB timeline store be versioned. 
Contributed by Junping Du. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616540)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java


 Data persistent in timelinestore should be versioned
 

 Key: YARN-2288
 URL: https://issues.apache.org/jira/browse/YARN-2288
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.4.1
Reporter: Junping Du
Assignee: Junping Du
 Fix For: 2.6.0

 Attachments: YARN-2288-v2.patch, YARN-2288-v3.patch, 
 YARN-2288-v4.patch, YARN-2288-v5.patch, YARN-2288.patch


 We have a LevelDB-backed TimelineStore; it should have a schema version to 
 accommodate future schema changes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance

2014-08-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090861#comment-14090861
 ] 

Hudson commented on YARN-2352:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1857 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1857/])
YARN-2352. Add missing file. FairScheduler: Collect metrics on duration of 
critical methods that affect performance. (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616784)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSOpDurations.java
YARN-2352. FairScheduler: Collect metrics on duration of critical methods that 
affect performance. (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616769)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsCollectorImpl.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/lib/MutableStat.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 FairScheduler: Collect metrics on duration of critical methods that affect 
 performance
 --

 Key: YARN-2352
 URL: https://issues.apache.org/jira/browse/YARN-2352
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: fs-perf-metrics.png, yarn-2352-1.patch, 
 yarn-2352-2.patch, yarn-2352-2.patch, yarn-2352-3.patch, yarn-2352-4.patch, 
 yarn-2352-5.patch, yarn-2352-6.patch


 We need more metrics for better visibility into FairScheduler performance. At 
 the least, we need to do this for (1) handling node events, (2) update, (3) 
 computing fair shares, and (4) preemption.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure

2014-08-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090857#comment-14090857
 ] 

Hudson commented on YARN-2008:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1857 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1857/])
YARN-2008. Fixed CapacityScheduler to calculate headroom based on max available 
capacity instead of configured max capacity. Contributed by Craig Welch 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616580)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DefaultResourceCalculator.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceCalculator.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/Resources.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCSQueueUtils.java


 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy 
 queue structure 
 -

 Key: YARN-2008
 URL: https://issues.apache.org/jira/browse/YARN-2008
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Craig Welch
 Fix For: 2.6.0

 Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, 
 YARN-2008.4.patch, YARN-2008.5.patch, YARN-2008.6.patch, YARN-2008.7.patch, 
 YARN-2008.8.patch, YARN-2008.9.patch


 Suppose there are two queues, Q1 and Q2, each allowed to use 100% of the actual 
 resources in the cluster. Both currently use 50% of the actual cluster resources, 
 so there is no actual space available. With the current method of computing 
 headroom, the CapacityScheduler thinks there are still resources available for 
 users in Q1, even though they have already been used by Q2. 
 If the CapacityScheduler has a hierarchical queue structure, it may report an 
 incorrect queueMaxCap. Here is an example:
                            rootQueue
                           /         \
            L1ParentQueue1             L1ParentQueue2
   (allowed up to 80% of parent)   (allowed at least 20% of parent)
          /            \
    L2LeafQueue1     L2LeafQueue2
   (50% of parent)  (50% of parent, at minimum)
 When we calculate the headroom of a user in L2LeafQueue2, the current method 
 thinks L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. 
 However, without checking L1ParentQueue2's usage, we cannot be sure: L1ParentQueue2 
 may already be using 40% of the rootQueue resources, in which case L2LeafQueue2 
 can actually use only 30% (60% * 50%). 
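
 A back-of-the-envelope sketch of the arithmetic above (plain Java, not the
 CapacityScheduler code) shows why headroom should be derived from what the parent
 actually has left rather than from its configured maximum:
{code}
// Back-of-the-envelope illustration of the example above (not the
// CapacityScheduler code): headroom from the configured max vs. from the
// capacity the parent actually has left.
public class QueueMaxCapSketch {
  public static void main(String[] args) {
    double parentConfiguredMax = 0.80; // L1ParentQueue1 may use up to 80% of root
    double leafShareOfParent   = 0.50; // L2LeafQueue2 gets 50% of its parent
    double siblingParentUsed   = 0.40; // L1ParentQueue2 already holds 40% of root

    // Old calculation: ignores what the sibling parent is already using.
    double maxCapFromConfig = parentConfiguredMax * leafShareOfParent;               // 0.40

    // Corrected idea: cap the parent by the capacity actually still available.
    double parentAvailable = Math.min(parentConfiguredMax, 1.0 - siblingParentUsed); // 0.60
    double maxCapFromAvailable = parentAvailable * leafShareOfParent;                // 0.30

    System.out.printf("configured-max headroom = %.0f%%, available-based headroom = %.0f%%%n",
        maxCapFromConfig * 100, maxCapFromAvailable * 100);
  }
}
{code}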



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore

2014-08-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090876#comment-14090876
 ] 

Hadoop QA commented on YARN-2138:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660614/YARN-2138.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4562//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4562//console

This message is automatically generated.

 Cleanup notifyDone* methods in RMStateStore
 ---

 Key: YARN-2138
 URL: https://issues.apache.org/jira/browse/YARN-2138
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Attachments: YARN-2138.002.patch, YARN-2138.patch


 The storedException passed into notifyDoneStoringApplication is always null. 
 Similarly for other notifyDone* methods. We can clean up these methods as 
 this control flow path is not used anymore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2396) RpcClientFactoryPBImpl.stopClient always throws due to missing close method

2014-08-08 Thread chang li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chang li updated YARN-2396:
---

Attachment: yarn2396.patch

 RpcClientFactoryPBImpl.stopClient always throws due to missing close method
 ---

 Key: YARN-2396
 URL: https://issues.apache.org/jira/browse/YARN-2396
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 2.4.1
Reporter: Jason Lowe
Assignee: chang li
 Attachments: yarn2396.patch


 RpcClientFactoryPBImpl.stopClient will throw a YarnRuntimeException if the 
 protocol does not have a close method, despite the log message indicating it 
 is ignoring errors.  It's interesting to note that none of the YARN protocol 
 classes currently have a close method.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically

2014-08-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090991#comment-14090991
 ] 

Jian He commented on YARN-2212:
---

looks good, +1

 ApplicationMaster needs to find a way to update the AMRMToken periodically
 --

 Key: YARN-2212
 URL: https://issues.apache.org/jira/browse/YARN-2212
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2212.1.patch, YARN-2212.2.patch, 
 YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, 
 YARN-2212.5.patch, YARN-2212.5.rebase.patch, YARN-2212.6.patch, 
 YARN-2212.6.patch, YARN-2212.7.patch, YARN-2212.7.patch, YARN-2212.8.patch, 
 YARN-2212.9.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore

2014-08-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091002#comment-14091002
 ] 

Karthik Kambatla commented on YARN-2138:


Looks good to me. 

Thanks Varun for looking into this. 

 Cleanup notifyDone* methods in RMStateStore
 ---

 Key: YARN-2138
 URL: https://issues.apache.org/jira/browse/YARN-2138
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Attachments: YARN-2138.002.patch, YARN-2138.patch


 The storedException passed into notifyDoneStoringApplication is always null. 
 Similarly for other notifyDone* methods. We can clean up these methods as 
 this control flow path is not used anymore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-08-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091037#comment-14091037
 ] 

Karthik Kambatla edited comment on YARN-2026 at 8/8/14 5:54 PM:


Thanks for bearing with us on this JIRA, Ashwin. That patch looks mostly good. 
Minor comments:
# This is a very subjective opinion. In ComputeFairShares, would it be 
cleaner/simpler to rename existing {{public computeShares}} to {{private 
computeSharesInternal}}, and add a new {{public computeShares}} that calls the 
internal version only with active queues? 
# Thanks for adding a bunch of tests in TestFairSchedulerFairShare. Post 
YARN-1474, 
## setup() need not call 
{{scheduler.setRMContext(resourceManager.getRMContext());}}
## configureClusterWithQueuesAndOneNode need not call the following:
{code}
scheduler.init(conf);
scheduler.start();
scheduler.reinitialize(conf, resourceManager.getRMContext());
{code}


was (Author: kkambatl):
Thanks for bearing with us on this JIRA, Ashwin. That patch looks mostly good. 
Minor comments:
# This is a very subjective opinion. In ComputeFairShares, would it be 
cleaner/simpler to rename existing {{public computeShares}} to {{private 
computeSharesInternal}}, and add a new {{public computeShares}} that takes 
calls the internal version only with active queues? 
# Thanks for adding a bunch of tests in TestFairSchedulerFairShare. Post 
YARN-1474, 
## setup() need not call 
{{scheduler.setRMContext(resourceManager.getRMContext());}}
## configureClusterWithQueuesAndOneNode need not call the following:
{code}
scheduler.init(conf);
scheduler.start();
scheduler.reinitialize(conf, resourceManager.getRMContext());
{code}

 Fair scheduler : Fair share for inactive queues causes unfair allocation in 
 some scenarios
 --

 Key: YARN-2026
 URL: https://issues.apache.org/jira/browse/YARN-2026
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Attachments: YARN-2026-v1.txt, YARN-2026-v2.txt, YARN-2026-v3.txt, 
 YARN-2026-v4.txt


 Problem1- While using hierarchical queues in fair scheduler,there are few 
 scenarios where we have seen a leaf queue with least fair share can take 
 majority of the cluster and starve a sibling parent queue which has greater 
 weight/fair share and preemption doesn’t kick in to reclaim resources.
 The root cause seems to be that fair share of a parent queue is distributed 
 to all its children irrespective of whether its an active or an inactive(no 
 apps running) queue. Preemption based on fair share kicks in only if the 
 usage of a queue is less than 50% of its fair share and if it has demands 
 greater than that. When there are many queues under a parent queue(with high 
 fair share),the child queue’s fair share becomes really low. As a result when 
 only few of these child queues have apps running,they reach their *tiny* fair 
 share quickly and preemption doesn’t happen even if other leaf 
 queues(non-sibling) are hogging the cluster.
 This can be solved by dividing fair share of parent queue only to active 
 child queues.
 Here is an example describing the problem and proposed solution:
 root.lowPriorityQueue is a leaf queue with weight 2
 root.HighPriorityQueue is parent queue with weight 8
 root.HighPriorityQueue has 10 child leaf queues : 
 root.HighPriorityQueue.childQ(1..10)
 Above config,results in root.HighPriorityQueue having 80% fair share
 and each of its ten child queue would have 8% fair share. Preemption would 
 happen only if the child queue is 4% (0.5*8=4). 
 Lets say at the moment no apps are running in any of the 
 root.HighPriorityQueue.childQ(1..10) and few apps are running in 
 root.lowPriorityQueue which is taking up 95% of the cluster.
 Up till this point,the behavior of FS is correct.
 Now,lets say root.HighPriorityQueue.childQ1 got a big job which requires 30% 
 of the cluster. It would get only the available 5% in the cluster and 
 preemption wouldn't kick in since its above 4%(half fair share).This is bad 
 considering childQ1 is under a highPriority parent queue which has *80% fair 
 share*.
 Until root.lowPriorityQueue starts relinquishing containers,we would see the 
 following allocation on the scheduler page:
 *root.lowPriorityQueue = 95%*
 *root.HighPriorityQueue.childQ1=5%*
 This can be solved by distributing a parent’s fair share only to active 
 queues.
 So in the example above,since childQ1 is the only active queue
 under root.HighPriorityQueue, it would get all its parent’s fair share i.e. 
 80%.
 This would cause preemption to reclaim the 30% needed by childQ1 from 
 root.lowPriorityQueue after 

[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-08-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091037#comment-14091037
 ] 

Karthik Kambatla commented on YARN-2026:


Thanks for bearing with us on this JIRA, Ashwin. That patch looks mostly good. 
Minor comments:
# This is a very subjective opinion. In ComputeFairShares, would it be 
cleaner/simpler to rename existing {{public computeShares}} to {{private 
computeSharesInternal}}, and add a new {{public computeShares}} that takes 
calls the internal version only with active queues? 
# Thanks for adding a bunch of tests in TestFairSchedulerFairShare. Post 
YARN-1474, 
## setup() need not call 
{{scheduler.setRMContext(resourceManager.getRMContext());}}
## configureClusterWithQueuesAndOneNode need not call the following:
{code}
scheduler.init(conf);
scheduler.start();
scheduler.reinitialize(conf, resourceManager.getRMContext());
{code}
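
For illustration, a rough sketch of the suggested reshuffle; the types and the share
math below are simplified stand-ins (the real ComputeFairShares works on Schedulable
and uses a more involved algorithm), so treat this only as the shape of the change:
{code}
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for the suggested rename; Queue and the share math
// below are simplified placeholders, not the real Schedulable/ComputeFairShares.
public class ComputeFairSharesSketch {

  static class Queue {
    final double weight;
    final boolean active;   // active = has running or pending apps
    double fairShare;
    Queue(double weight, boolean active) { this.weight = weight; this.active = active; }
  }

  // New public entry point: filter down to active queues, then delegate.
  public static void computeShares(List<Queue> queues, double totalResource) {
    List<Queue> active = new ArrayList<Queue>();
    for (Queue q : queues) {
      if (q.active) {
        active.add(q);
      }
    }
    computeSharesInternal(active, totalResource);
  }

  // The former public computeShares, now private and otherwise unchanged
  // (a plain weighted split here, standing in for the real algorithm).
  private static void computeSharesInternal(List<Queue> queues, double totalResource) {
    double totalWeight = 0;
    for (Queue q : queues) {
      totalWeight += q.weight;
    }
    for (Queue q : queues) {
      q.fairShare = totalWeight == 0 ? 0 : totalResource * q.weight / totalWeight;
    }
  }
}
{code}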

 Fair scheduler : Fair share for inactive queues causes unfair allocation in 
 some scenarios
 --

 Key: YARN-2026
 URL: https://issues.apache.org/jira/browse/YARN-2026
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Attachments: YARN-2026-v1.txt, YARN-2026-v2.txt, YARN-2026-v3.txt, 
 YARN-2026-v4.txt


 Problem1- While using hierarchical queues in fair scheduler,there are few 
 scenarios where we have seen a leaf queue with least fair share can take 
 majority of the cluster and starve a sibling parent queue which has greater 
 weight/fair share and preemption doesn’t kick in to reclaim resources.
 The root cause seems to be that fair share of a parent queue is distributed 
 to all its children irrespective of whether its an active or an inactive(no 
 apps running) queue. Preemption based on fair share kicks in only if the 
 usage of a queue is less than 50% of its fair share and if it has demands 
 greater than that. When there are many queues under a parent queue(with high 
 fair share),the child queue’s fair share becomes really low. As a result when 
 only few of these child queues have apps running,they reach their *tiny* fair 
 share quickly and preemption doesn’t happen even if other leaf 
 queues(non-sibling) are hogging the cluster.
 This can be solved by dividing fair share of parent queue only to active 
 child queues.
 Here is an example describing the problem and proposed solution:
 root.lowPriorityQueue is a leaf queue with weight 2
 root.HighPriorityQueue is parent queue with weight 8
 root.HighPriorityQueue has 10 child leaf queues : 
 root.HighPriorityQueue.childQ(1..10)
 Above config,results in root.HighPriorityQueue having 80% fair share
 and each of its ten child queue would have 8% fair share. Preemption would 
 happen only if the child queue is 4% (0.5*8=4). 
 Lets say at the moment no apps are running in any of the 
 root.HighPriorityQueue.childQ(1..10) and few apps are running in 
 root.lowPriorityQueue which is taking up 95% of the cluster.
 Up till this point,the behavior of FS is correct.
 Now,lets say root.HighPriorityQueue.childQ1 got a big job which requires 30% 
 of the cluster. It would get only the available 5% in the cluster and 
 preemption wouldn't kick in since its above 4%(half fair share).This is bad 
 considering childQ1 is under a highPriority parent queue which has *80% fair 
 share*.
 Until root.lowPriorityQueue starts relinquishing containers,we would see the 
 following allocation on the scheduler page:
 *root.lowPriorityQueue = 95%*
 *root.HighPriorityQueue.childQ1=5%*
 This can be solved by distributing a parent’s fair share only to active 
 queues.
 So in the example above,since childQ1 is the only active queue
 under root.HighPriorityQueue, it would get all its parent’s fair share i.e. 
 80%.
 This would cause preemption to reclaim the 30% needed by childQ1 from 
 root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
 Problem2 - Also note that similar situation can happen between 
 root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2,if childQ2 
 hogs the cluster. childQ2 can take up 95% cluster and childQ1 would be stuck 
 at 5%,until childQ2 starts relinquishing containers. We would like each of 
 childQ1 and childQ2 to get half of root.HighPriorityQueue  fair share ie 
 40%,which would ensure childQ1 gets upto 40% resource if needed through 
 preemption.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2393) Fair Scheduler : Implement static fair share

2014-08-08 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091152#comment-14091152
 ] 

Wei Yan commented on YARN-2393:
---

Hey, [~ashwinshankar77], would you mind if I take this one?

 Fair Scheduler : Implement static fair share
 

 Key: YARN-2393
 URL: https://issues.apache.org/jira/browse/YARN-2393
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Ashwin Shankar

 Static fair share is a fair share allocation considering all (active/inactive) 
 queues. It would be shown on the UI for better predictability of the finish time 
 of applications.
 We would compute static fair share only when needed, like on queue creation, 
 node added/removed. Please see YARN-2026 for discussions on this. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2317) Update documentation about how to write YARN applications

2014-08-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091154#comment-14091154
 ] 

Zhijie Shen commented on YARN-2317:
---

[~gtCarrera9], thanks for updating this document, which I think will be really 
helpful to the new YARN app developers. I read through the updated document, 
and it looks good to me in general. I've some minor comments so far:

0. Resource Manager -> ResourceManager, Node Manager -> NodeManager, 
Application Master -> ApplicationMaster

1. application submission context
{code}
+  client can then set up application context, prepare the very first container 
of
{code}

2. We should not single out Unix; it is expected to work on Windows as well.
{code}
Unix environment settings
{code}

3. YARN cluster
{code}
+  YARN platform, and handles application execution. It performs operations in 
an
{code}

4. object
{code}
+  AMRMClientAsync objects, with event handling methods specified in a
{code}

5. Don't say event. Users don't need to know the internal, 4 callback methods?
{code}
+  NMClientAsync. Typical container events include start, stop, status
{code}

6. "Use Runnable objects to launch containers." can be removed, because it is 
not necessary to launch them on a separate thread.
{code}
+Use Runnable objects to launch containers. Communicate with node managers
{code}

7. ContainerManagerProtocol
{code}
+  ApplicationMasterProtocol and ContainerManager) are still preserved. The
{code}

8. Is this still valid?
{code}
+  // Set the necessary security tokens as needed
+  //amContainer.setContainerTokens(containerToken);
{code}

9. Perhaps you want to mention unregistration after the AM determines the work 
is done.

10. In the Useful Links section, how about linking to the analogous webpages on this 
web site: YARN Architecture and Capacity Scheduler?

11. In the Sample Code section, maybe we don't want to talk about the IDE. And 
maybe call it a sample application?

BTW, I looked at the updated webpage directly instead of doing a side-by-side 
comparison between the old and the new webpages. It would be great if you could 
list the significant changes in your patch item by item, so that the community 
can be aware of them.

 Update documentation about how to write YARN applications
 -

 Key: YARN-2317
 URL: https://issues.apache.org/jira/browse/YARN-2317
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Li Lu
Assignee: Li Lu
 Fix For: 2.6.0

 Attachments: YARN-2317-071714.patch, YARN-2317-073014-1.patch, 
 YARN-2317-073014.patch


 Some information in WritingYarnApplications webpage is out-dated. Need some 
 refresh work on this document to reflect the most recent changes in YARN 
 APIs. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2393) Fair Scheduler : Implement static fair share

2014-08-08 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091177#comment-14091177
 ] 

Ashwin Shankar commented on YARN-2393:
--

hey [~ywskycn], please go ahead.

 Fair Scheduler : Implement static fair share
 

 Key: YARN-2393
 URL: https://issues.apache.org/jira/browse/YARN-2393
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Ashwin Shankar

 Static fair share is a fair share allocation considering all (active/inactive) 
 queues. It would be shown on the UI for better predictability of the finish time 
 of applications.
 We would compute static fair share only when needed, like on queue creation, 
 node added/removed. Please see YARN-2026 for discussions on this. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2393) Fair Scheduler : Implement static fair share

2014-08-08 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan reassigned YARN-2393:
-

Assignee: Wei Yan

 Fair Scheduler : Implement static fair share
 

 Key: YARN-2393
 URL: https://issues.apache.org/jira/browse/YARN-2393
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan

 Static fair share is a fair share allocation considering all (active/inactive) 
 queues. It would be shown on the UI for better predictability of the finish time 
 of applications.
 We would compute static fair share only when needed, like on queue creation, 
 node added/removed. Please see YARN-2026 for discussions on this. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2396) RpcClientFactoryPBImpl.stopClient always throws due to missing close method

2014-08-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091250#comment-14091250
 ] 

Hadoop QA commented on YARN-2396:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660649/yarn2396.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4563//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4563//console

This message is automatically generated.

 RpcClientFactoryPBImpl.stopClient always throws due to missing close method
 ---

 Key: YARN-2396
 URL: https://issues.apache.org/jira/browse/YARN-2396
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 2.4.1
Reporter: Jason Lowe
Assignee: chang li
 Attachments: yarn2396.patch


 RpcClientFactoryPBImpl.stopClient will throw a YarnRuntimeException if the 
 protocol does not have a close method, despite the log message indicating it 
 is ignoring errors.  It's interesting to note that none of the YARN protocol 
 classes currently have a close method.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2396) RpcClientFactoryPBImpl.stopClient always throws due to missing close method

2014-08-08 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091255#comment-14091255
 ] 

Mit Desai commented on YARN-2396:
-

lgtm +1 non-binding
This is a one-line change that removes the throw, since the method was 
supposed to ignore the exception in the first place.
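
A simplified sketch of the shape of that change (not the actual RpcClientFactoryPBImpl 
code): look up an optional close() reflectively and genuinely ignore any failure 
instead of rethrowing it.
{code}
import java.lang.reflect.Method;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Simplified sketch, not the real RpcClientFactoryPBImpl: invoke close() if the
// proxy has one, and log-and-ignore any failure instead of rethrowing it.
public class StopClientSketch {
  private static final Log LOG = LogFactory.getLog(StopClientSketch.class);

  public static void stopClient(Object proxy) {
    try {
      Method close = proxy.getClass().getMethod("close");
      close.invoke(proxy);
    } catch (Exception e) {
      // Previously this path wrapped and rethrew a runtime exception even
      // though the log message said the error was being ignored.
      LOG.error("Cannot call close method due to Exception. Ignoring.", e);
    }
  }
}
{code}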

 RpcClientFactoryPBImpl.stopClient always throws due to missing close method
 ---

 Key: YARN-2396
 URL: https://issues.apache.org/jira/browse/YARN-2396
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 2.4.1
Reporter: Jason Lowe
Assignee: chang li
 Attachments: yarn2396.patch


 RpcClientFactoryPBImpl.stopClient will throw a YarnRuntimeException if the 
 protocol does not have a close method, despite the log message indicating it 
 is ignoring errors.  It's interesting to note that none of the YARN protocol 
 classes currently have a close method.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2067) FairScheduler update/continuous-scheduling threads should start only when after the scheduler is started

2014-08-08 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved YARN-2067.


Resolution: Invalid

This has been addressed by other JIRAs already. 

 FairScheduler update/continuous-scheduling threads should start only when 
 after the scheduler is started
 

 Key: YARN-2067
 URL: https://issues.apache.org/jira/browse/YARN-2067
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2277) Add Cross-Origin support to the ATS REST API

2014-08-08 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2277:
--

Attachment: YARN-2277-v3.patch

 Add Cross-Origin support to the ATS REST API
 

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, 
 YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch


 As the Application Timeline Server is not provided with a built-in UI, it may 
 make sense to enable JSONP or CORS REST API capabilities to allow a remote 
 UI to access the data directly via JavaScript without the browser's cross-site 
 restrictions coming into play.
 An example client might look like
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API

2014-08-08 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091304#comment-14091304
 ] 

Jonathan Eagles commented on YARN-2277:
---

[~zjshen], thank you for your feedback, as this is going to have a much bigger 
impact on Hadoop as a whole. I have provided a minimal CORS filter that will 
give us an idea of whether this is the direction to go. Based on the direction of 
this patch, the scope has widened to creating a general CrossOriginFilter for use 
within all Hadoop REST APIs. We will probably want to split the different 
pieces up across JIRAs: an umbrella, the Filter and FilterInitializer, additional 
configuration, and the individual REST servers. This way we can focus on the end 
goal of getting the Tez UI done in a timely manner without losing sight of the 
completeness of CORS support.
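
For reference, the rough shape of a minimal CORS servlet filter is sketched below; it 
is illustrative only and is not the filter attached to this JIRA (a real filter would 
validate the Origin against a configured whitelist rather than echoing it back):
{code}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Rough illustrative shape of a minimal CORS filter: echo an allowed Origin
// back and advertise permitted methods/headers. A production filter would
// validate the Origin against configuration instead of echoing it blindly.
public class MinimalCrossOriginFilter implements Filter {

  @Override
  public void init(FilterConfig conf) throws ServletException {
    // Allowed origins, methods, and headers would come from init parameters.
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    String origin = ((HttpServletRequest) req).getHeader("Origin");
    if (origin != null) {
      HttpServletResponse httpRes = (HttpServletResponse) res;
      httpRes.setHeader("Access-Control-Allow-Origin", origin);
      httpRes.setHeader("Access-Control-Allow-Methods", "GET, HEAD, OPTIONS");
      httpRes.setHeader("Access-Control-Allow-Headers",
          "X-Requested-With, Content-Type, Accept, Origin");
    }
    chain.doFilter(req, res);
  }

  @Override
  public void destroy() {
  }
}
{code}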

 Add Cross-Origin support to the ATS REST API
 

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, 
 YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch


 As the Application Timeline Server is not provided with a built-in UI, it may 
 make sense to enable JSONP or CORS REST API capabilities to allow a remote 
 UI to access the data directly via JavaScript without the browser's cross-site 
 restrictions coming into play.
 An example client might look like
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-08-08 Thread Ashwin Shankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Shankar updated YARN-2026:
-

Attachment: YARN-2026-v5.txt

 Fair scheduler : Fair share for inactive queues causes unfair allocation in 
 some scenarios
 --

 Key: YARN-2026
 URL: https://issues.apache.org/jira/browse/YARN-2026
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Attachments: YARN-2026-v1.txt, YARN-2026-v2.txt, YARN-2026-v3.txt, 
 YARN-2026-v4.txt, YARN-2026-v5.txt


 Problem1- While using hierarchical queues in fair scheduler,there are few 
 scenarios where we have seen a leaf queue with least fair share can take 
 majority of the cluster and starve a sibling parent queue which has greater 
 weight/fair share and preemption doesn’t kick in to reclaim resources.
 The root cause seems to be that fair share of a parent queue is distributed 
 to all its children irrespective of whether its an active or an inactive(no 
 apps running) queue. Preemption based on fair share kicks in only if the 
 usage of a queue is less than 50% of its fair share and if it has demands 
 greater than that. When there are many queues under a parent queue(with high 
 fair share),the child queue’s fair share becomes really low. As a result when 
 only few of these child queues have apps running,they reach their *tiny* fair 
 share quickly and preemption doesn’t happen even if other leaf 
 queues(non-sibling) are hogging the cluster.
 This can be solved by dividing fair share of parent queue only to active 
 child queues.
 Here is an example describing the problem and proposed solution:
 root.lowPriorityQueue is a leaf queue with weight 2
 root.HighPriorityQueue is parent queue with weight 8
 root.HighPriorityQueue has 10 child leaf queues : 
 root.HighPriorityQueue.childQ(1..10)
 Above config,results in root.HighPriorityQueue having 80% fair share
 and each of its ten child queue would have 8% fair share. Preemption would 
 happen only if the child queue is 4% (0.5*8=4). 
 Lets say at the moment no apps are running in any of the 
 root.HighPriorityQueue.childQ(1..10) and few apps are running in 
 root.lowPriorityQueue which is taking up 95% of the cluster.
 Up till this point,the behavior of FS is correct.
 Now,lets say root.HighPriorityQueue.childQ1 got a big job which requires 30% 
 of the cluster. It would get only the available 5% in the cluster and 
 preemption wouldn't kick in since its above 4%(half fair share).This is bad 
 considering childQ1 is under a highPriority parent queue which has *80% fair 
 share*.
 Until root.lowPriorityQueue starts relinquishing containers,we would see the 
 following allocation on the scheduler page:
 *root.lowPriorityQueue = 95%*
 *root.HighPriorityQueue.childQ1=5%*
 This can be solved by distributing a parent’s fair share only to active 
 queues.
 So in the example above,since childQ1 is the only active queue
 under root.HighPriorityQueue, it would get all its parent’s fair share i.e. 
 80%.
 This would cause preemption to reclaim the 30% needed by childQ1 from 
 root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
 Problem2 - Also note that similar situation can happen between 
 root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2,if childQ2 
 hogs the cluster. childQ2 can take up 95% cluster and childQ1 would be stuck 
 at 5%,until childQ2 starts relinquishing containers. We would like each of 
 childQ1 and childQ2 to get half of root.HighPriorityQueue  fair share ie 
 40%,which would ensure childQ1 gets upto 40% resource if needed through 
 preemption.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-08-08 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091308#comment-14091308
 ] 

Ashwin Shankar commented on YARN-2026:
--

Thanks [~kasha]. All comments addressed in v5 patch.

 Fair scheduler : Fair share for inactive queues causes unfair allocation in 
 some scenarios
 --

 Key: YARN-2026
 URL: https://issues.apache.org/jira/browse/YARN-2026
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Attachments: YARN-2026-v1.txt, YARN-2026-v2.txt, YARN-2026-v3.txt, 
 YARN-2026-v4.txt, YARN-2026-v5.txt


 Problem1- While using hierarchical queues in fair scheduler,there are few 
 scenarios where we have seen a leaf queue with least fair share can take 
 majority of the cluster and starve a sibling parent queue which has greater 
 weight/fair share and preemption doesn’t kick in to reclaim resources.
 The root cause seems to be that fair share of a parent queue is distributed 
 to all its children irrespective of whether its an active or an inactive(no 
 apps running) queue. Preemption based on fair share kicks in only if the 
 usage of a queue is less than 50% of its fair share and if it has demands 
 greater than that. When there are many queues under a parent queue(with high 
 fair share),the child queue’s fair share becomes really low. As a result when 
 only few of these child queues have apps running,they reach their *tiny* fair 
 share quickly and preemption doesn’t happen even if other leaf 
 queues(non-sibling) are hogging the cluster.
 This can be solved by dividing fair share of parent queue only to active 
 child queues.
 Here is an example describing the problem and proposed solution:
 root.lowPriorityQueue is a leaf queue with weight 2
 root.HighPriorityQueue is parent queue with weight 8
 root.HighPriorityQueue has 10 child leaf queues : 
 root.HighPriorityQueue.childQ(1..10)
 Above config,results in root.HighPriorityQueue having 80% fair share
 and each of its ten child queue would have 8% fair share. Preemption would 
 happen only if the child queue is 4% (0.5*8=4). 
 Lets say at the moment no apps are running in any of the 
 root.HighPriorityQueue.childQ(1..10) and few apps are running in 
 root.lowPriorityQueue which is taking up 95% of the cluster.
 Up till this point,the behavior of FS is correct.
 Now,lets say root.HighPriorityQueue.childQ1 got a big job which requires 30% 
 of the cluster. It would get only the available 5% in the cluster and 
 preemption wouldn't kick in since its above 4%(half fair share).This is bad 
 considering childQ1 is under a highPriority parent queue which has *80% fair 
 share*.
 Until root.lowPriorityQueue starts relinquishing containers,we would see the 
 following allocation on the scheduler page:
 *root.lowPriorityQueue = 95%*
 *root.HighPriorityQueue.childQ1=5%*
 This can be solved by distributing a parent’s fair share only to active 
 queues.
 So in the example above,since childQ1 is the only active queue
 under root.HighPriorityQueue, it would get all its parent’s fair share i.e. 
 80%.
 This would cause preemption to reclaim the 30% needed by childQ1 from 
 root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
 Problem2 - Also note that similar situation can happen between 
 root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2,if childQ2 
 hogs the cluster. childQ2 can take up 95% cluster and childQ1 would be stuck 
 at 5%,until childQ2 starts relinquishing containers. We would like each of 
 childQ1 and childQ2 to get half of root.HighPriorityQueue  fair share ie 
 40%,which would ensure childQ1 gets upto 40% resource if needed through 
 preemption.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically

2014-08-08 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2212:


Attachment: YARN-2212-branch-2.patch

 ApplicationMaster needs to find a way to update the AMRMToken periodically
 --

 Key: YARN-2212
 URL: https://issues.apache.org/jira/browse/YARN-2212
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2212-branch-2.patch, YARN-2212.1.patch, 
 YARN-2212.2.patch, YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, 
 YARN-2212.5.patch, YARN-2212.5.patch, YARN-2212.5.rebase.patch, 
 YARN-2212.6.patch, YARN-2212.6.patch, YARN-2212.7.patch, YARN-2212.7.patch, 
 YARN-2212.8.patch, YARN-2212.9.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2237) MRAppMaster changes for AMRMToken roll-up

2014-08-08 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong resolved YARN-2237.
-

   Resolution: Fixed
Fix Version/s: 2.6.0

Fixed and committed with YARN-2212

 MRAppMaster changes for AMRMToken roll-up
 -

 Key: YARN-2237
 URL: https://issues.apache.org/jira/browse/YARN-2237
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2237.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2207) Add ability to roll over AMRMToken

2014-08-08 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong resolved YARN-2207.
-

   Resolution: Fixed
Fix Version/s: 2.6.0

 Add ability to roll over AMRMToken
 --

 Key: YARN-2207
 URL: https://issues.apache.org/jira/browse/YARN-2207
 Project: Hadoop YARN
  Issue Type: Task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.6.0


 Currently, the master key is fixed after it is created, which is not ideal. We 
 need to add the ability to roll over the AMRMToken. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically

2014-08-08 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091330#comment-14091330
 ] 

Xuan Gong commented on YARN-2212:
-

Committed into trunk and branch-2. Thanks Jian for review.

 ApplicationMaster needs to find a way to update the AMRMToken periodically
 --

 Key: YARN-2212
 URL: https://issues.apache.org/jira/browse/YARN-2212
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2212-branch-2.patch, YARN-2212.1.patch, 
 YARN-2212.2.patch, YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, 
 YARN-2212.5.patch, YARN-2212.5.patch, YARN-2212.5.rebase.patch, 
 YARN-2212.6.patch, YARN-2212.6.patch, YARN-2212.7.patch, YARN-2212.7.patch, 
 YARN-2212.8.patch, YARN-2212.9.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-356) Add YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS to yarn.env

2014-08-08 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-356.
---

Resolution: Duplicate

 Add YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS to yarn.env
 ---

 Key: YARN-356
 URL: https://issues.apache.org/jira/browse/YARN-356
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Affects Versions: 2.0.2-alpha
Reporter: Lohit Vijayarenu

 At present it is difficult to set different Xmx values for RM and NM without 
 having different yarn-env.sh. Like HDFS, it would be good to have 
 YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2302) Refactor TimelineWebServices

2014-08-08 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2302:
--

Attachment: YARN-2302.2.patch

[~djp], thanks for your review. The general response to your comments on 
ApplicationHistoryServer is that protected vars/methods are the legacy things. 
Anyway, before it grows worse, I did some more refactoring for this class in the 
new patch. In addition, I address the Log level issue in the new patch as well.

 Refactor TimelineWebServices
 

 Key: YARN-2302
 URL: https://issues.apache.org/jira/browse/YARN-2302
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2302.1.patch, YARN-2302.2.patch


 Now TimelineWebServices contains non-trivial logic to process the HTTP 
 requests, manipulate the data, check the access, and interact with the 
 timeline store.
 I propose to move the data-oriented logic to a middle layer (the so-called 
 TimelineDataManager), so that TimelineWebServices only processes the requests 
 and calls TimelineDataManager to complete the remaining tasks.
 By doing this, we make the generic history module reuse TimelineDataManager 
 internally (YARN-2033), invoking the putting/getting methods directly. 
 Otherwise, we have to send the HTTP requests to TimelineWebServices to query 
 the generic history data, which is not an efficient way.
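
 A minimal sketch of the proposed layering, with placeholder names and no real store,
 ACL, or JAX-RS code, just to show the delegation:
{code}
// Placeholder sketch of the layering only; no real store, ACL, or JAX-RS code.
public class TimelineLayeringSketch {

  static class TimelineDataManager {
    // Owns access checks and timeline-store interaction in the proposal.
    Object getEntities(String entityType, String callerUser) {
      // check access, query the store, post-process the result ...
      return null;
    }
  }

  static class TimelineWebServices {
    private final TimelineDataManager dataManager = new TimelineDataManager();

    // HTTP-facing method: parse/validate request parameters, then delegate.
    public Object getEntities(String entityType, String callerUser) {
      return dataManager.getEntities(entityType, callerUser);
    }
  }
}
{code}
 The generic history module (YARN-2033) could then call TimelineDataManager directly
 instead of going through HTTP.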



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (YARN-2302) Refactor TimelineWebServices

2014-08-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091364#comment-14091364
 ] 

Zhijie Shen edited comment on YARN-2302 at 8/8/14 10:12 PM:


[~djp], thanks for your review. The general response to your comments on 
ApplicationHistoryServer is that protected vars/methods are the legacy things. 
Anyway before it grows worse, I did some more refactoring for this class in the 
new patch. In addition, I address the Log level issue in the new patch as well.


was (Author: zjshen):
[~djp], thanks for your review. The general response to your comments on 
ApplicationHistoryServer is that protected vars/methods are the legacy things. 
Anyway before it grows worth, I did some more refactoring for this class in the 
new patch. In addition, I address the Log level issue in the new patch as well.

 Refactor TimelineWebServices
 

 Key: YARN-2302
 URL: https://issues.apache.org/jira/browse/YARN-2302
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2302.1.patch, YARN-2302.2.patch


 Now TimelineWebServices contains non-trivial logic to process the HTTP 
 requests, manipulate the data, check the access, and interact with the 
 timeline store.
 I propose to move the data-oriented logic to a middle layer (the so-called 
 TimelineDataManager), so that TimelineWebServices only processes the requests 
 and calls TimelineDataManager to complete the remaining tasks.
 By doing this, we make the generic history module reuse TimelineDataManager 
 internally (YARN-2033), invoking the putting/getting methods directly. 
 Otherwise, we have to send the HTTP requests to TimelineWebServices to query 
 the generic history data, which is not an efficient way.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore

2014-08-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091388#comment-14091388
 ] 

Jian He commented on YARN-2138:
---

thanks Karthik for the review !

Varun, the patch is not applying on trunk any more. Mind updating the patch, please? 
Minor comment: I noticed there seems to be a tab before 
RMAppAttemptEventType.ATTEMPT_UPDATE_SAVED));
{code}
+  new RMAppAttemptEvent(applicationAttempt.getAppAttemptId(), 
+ RMAppAttemptEventType.ATTEMPT_UPDATE_SAVED));
{code}

 Cleanup notifyDone* methods in RMStateStore
 ---

 Key: YARN-2138
 URL: https://issues.apache.org/jira/browse/YARN-2138
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Attachments: YARN-2138.002.patch, YARN-2138.patch


 The storedException passed into notifyDoneStoringApplication is always null. 
 Similarly for other notifyDone* methods. We can clean up these methods as 
 this control flow path is not used anymore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically

2014-08-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091415#comment-14091415
 ] 

Hudson commented on YARN-2212:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6039 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6039/])
YARN-2212: ApplicationMaster needs to find a way to update the AMRMToken 
periodically. Contributed by Xuan Gong (xgong: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616892)
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/AllocateResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/AllocateResponsePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/AMRMTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRMWithCustomAMLauncher.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java


 ApplicationMaster needs to find a way to update the AMRMToken periodically
 --

 Key: YARN-2212
 URL: https://issues.apache.org/jira/browse/YARN-2212
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2212-branch-2.patch, YARN-2212.1.patch, 
 YARN-2212.2.patch, YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, 
 YARN-2212.5.patch, YARN-2212.5.patch, YARN-2212.5.rebase.patch, 
 YARN-2212.6.patch, YARN-2212.6.patch, YARN-2212.7.patch, 

[jira] [Updated] (YARN-2249) RM may receive container release request on AM resync before container is actually recovered

2014-08-08 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2249:
--

Attachment: YARN-2249.4.patch

New patch addresses the comments from Wangda.

 RM may receive container release request on AM resync before container is 
 actually recovered
 

 Key: YARN-2249
 URL: https://issues.apache.org/jira/browse/YARN-2249
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch, 
 YARN-2249.2.patch, YARN-2249.3.patch, YARN-2249.4.patch


 AM resync on RM restart will send outstanding container release requests back 
 to the new RM. In the meantime, NMs report the container statuses back to RM 
 to recover the containers. If RM receives the container release request  
 before the container is actually recovered in scheduler, the container won't 
 be released and the release request will be lost.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2138) Cleanup notifyDone* methods in RMStateStore

2014-08-08 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2138:
---

Attachment: YARN-2138.003.patch

 Cleanup notifyDone* methods in RMStateStore
 ---

 Key: YARN-2138
 URL: https://issues.apache.org/jira/browse/YARN-2138
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Attachments: YARN-2138.002.patch, YARN-2138.003.patch, YARN-2138.patch


 The storedException passed into notifyDoneStoringApplication is always null. 
 Similarly for other notifyDone* methods. We can clean up these methods as 
 this control flow path is not used anymore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore

2014-08-08 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091438#comment-14091438
 ] 

Varun Saxena commented on YARN-2138:


Thanks [~jianhe] and [~kasha] for the review. I have uploaded a new patch which 
should apply to trunk.

 Cleanup notifyDone* methods in RMStateStore
 ---

 Key: YARN-2138
 URL: https://issues.apache.org/jira/browse/YARN-2138
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Attachments: YARN-2138.002.patch, YARN-2138.003.patch, YARN-2138.patch


 The storedException passed into notifyDoneStoringApplication is always null. 
 Similarly for other notifyDone* methods. We can clean up these methods as 
 this control flow path is not used anymore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1954) Add waitFor to AMRMClient(Async)

2014-08-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091447#comment-14091447
 ] 

Zhijie Shen commented on YARN-1954:
---

+1 except some nits:

1. The abstract method can actually be part of AMRMClient(Async) directly, 
instead of putting it into the impl, right? Just need an additional LOG in 
AMRMClient(Async). 
{code}
  public abstract void waitFor(Supplier<Boolean> check, int checkEveryMillis,
  int logInterval) throws InterruptedException, IllegalArgumentException;
{code}

2. IllegalArgumentException doesn't need to be declared
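
For illustration, a minimal sketch of what such a waitFor could look like (assuming 
Guava's Supplier, as in the snippet above; this is not the committed implementation, 
and the logInterval semantics here are simplified to "every N polls"):
{code}
import com.google.common.base.Supplier;

// Illustrative-only shape of a waitFor helper: poll the supplied check, sleep
// between polls, and log periodically so a stuck AM is visible in the logs.
public class WaitForSketch {
  public void waitFor(Supplier<Boolean> check, int checkEveryMillis, int logInterval)
      throws InterruptedException {
    int polls = 0;
    while (!check.get()) {
      if (++polls % logInterval == 0) {
        System.out.println("Still waiting for the supplied check; polls so far: " + polls);
      }
      Thread.sleep(checkEveryMillis);
    }
  }
}
{code}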

 Add waitFor to AMRMClient(Async)
 

 Key: YARN-1954
 URL: https://issues.apache.org/jira/browse/YARN-1954
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: client
Affects Versions: 3.0.0, 2.4.0
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch, 
 YARN-1954.4.patch, YARN-1954.4.patch, YARN-1954.5.patch, YARN-1954.6.patch


 Recently, I saw some use cases of AMRMClient(Async). The painful thing is 
 that the main non-daemon thread has to sit in a dummy loop to prevent AM 
 process exiting before all the tasks are done, while unregistration is 
 triggered on a separate daemon thread by callback methods (in 
 particular when using AMRMClientAsync). IMHO, it should be beneficial to add 
 a waitFor method to AMRMClient(Async) to block the AM until unregistration or 
 user supplied check point, such that users don't need to write the loop 
 themselves.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-08-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091516#comment-14091516
 ] 

Hadoop QA commented on YARN-2026:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660716/YARN-2026-v5.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4564//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4564//console

This message is automatically generated.

 Fair scheduler : Fair share for inactive queues causes unfair allocation in 
 some scenarios
 --

 Key: YARN-2026
 URL: https://issues.apache.org/jira/browse/YARN-2026
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Attachments: YARN-2026-v1.txt, YARN-2026-v2.txt, YARN-2026-v3.txt, 
 YARN-2026-v4.txt, YARN-2026-v5.txt


 Problem1 - While using hierarchical queues in the fair scheduler, there are a 
 few scenarios where we have seen a leaf queue with the least fair share take 
 the majority of the cluster and starve a sibling parent queue that has a 
 greater weight/fair share, while preemption doesn’t kick in to reclaim resources.
 The root cause seems to be that fair share of a parent queue is distributed 
 to all its children irrespective of whether its an active or an inactive(no 
 apps running) queue. Preemption based on fair share kicks in only if the 
 usage of a queue is less than 50% of its fair share and if it has demands 
 greater than that. When there are many queues under a parent queue(with high 
 fair share),the child queue’s fair share becomes really low. As a result when 
 only few of these child queues have apps running,they reach their *tiny* fair 
 share quickly and preemption doesn’t happen even if other leaf 
 queues(non-sibling) are hogging the cluster.
 This can be solved by dividing fair share of parent queue only to active 
 child queues.
 Here is an example describing the problem and proposed solution:
 root.lowPriorityQueue is a leaf queue with weight 2
 root.HighPriorityQueue is parent queue with weight 8
 root.HighPriorityQueue has 10 child leaf queues : 
 root.HighPriorityQueue.childQ(1..10)
 Above config,results in root.HighPriorityQueue having 80% fair share
 and each of its ten child queues would have 8% fair share. Preemption would 
 happen only if the child queue's usage is below 4% (0.5*8=4). 
 Lets say at the moment no apps are running in any of the 
 root.HighPriorityQueue.childQ(1..10) and few apps are running in 
 root.lowPriorityQueue which is taking up 95% of the cluster.
 Up till this point,the behavior of FS is correct.
 Now,lets say root.HighPriorityQueue.childQ1 got a big job which requires 30% 
 of the cluster. It would get only the available 5% in the cluster and 
 preemption wouldn't kick in since its above 4%(half fair share).This is bad 
 considering childQ1 is under a highPriority parent queue which has *80% fair 
 share*.
 Until root.lowPriorityQueue starts relinquishing containers,we would see the 
 following allocation on the scheduler page:
 *root.lowPriorityQueue = 95%*
 *root.HighPriorityQueue.childQ1=5%*
 This can be solved by distributing a parent’s fair share only to active 
 queues.
 So in the example above,since childQ1 is the only active queue
 under root.HighPriorityQueue, it would get all its parent’s fair share i.e. 
 80%.
 This would cause preemption to reclaim the 30% needed by childQ1 from 
 root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
 Problem2 - Also note that similar situation can happen between 
 root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2,if childQ2 
 hogs the cluster. childQ2 can take up 95% cluster and childQ1 would be stuck 
 at 5%,until childQ2 starts relinquishing containers. We would like each of 
 childQ1 and childQ2 to get half of root.HighPriorityQueue fair share ie 
 40%,which would ensure childQ1 gets upto 40% resource if needed through 
 preemption.
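As a rough illustration of the proposed fix (a toy sketch, not the actual ComputeFairShares change in the attached patches): the parent's fair share is split only across children that currently have runnable apps, so a single active child can receive the whole parent share.

{code}
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the "active queues only" idea, not the real scheduler code.
public class ActiveFairShareSketch {

  static class Queue {
    final String name;
    final double weight;
    final boolean active;     // active == has at least one runnable app
    double fairShare;

    Queue(String name, double weight, boolean active) {
      this.name = name;
      this.weight = weight;
      this.active = active;
    }
  }

  // Split the parent's fair share across active children, by weight.
  static void computeShares(List<Queue> children, double parentShare) {
    double activeWeight = 0;
    for (Queue q : children) {
      if (q.active) {
        activeWeight += q.weight;
      }
    }
    for (Queue q : children) {
      q.fairShare = (q.active && activeWeight > 0)
          ? parentShare * q.weight / activeWeight
          : 0;
    }
  }

  public static void main(String[] args) {
    List<Queue> children = new ArrayList<Queue>();
    children.add(new Queue("childQ1", 1, true));        // the only active child
    for (int i = 2; i <= 10; i++) {
      children.add(new Queue("childQ" + i, 1, false));  // idle siblings
    }
    computeShares(children, 0.80);   // parent holds 80% of the cluster
    for (Queue q : children) {
      System.out.println(q.name + " fairShare=" + q.fairShare);
    }
    // childQ1 ends up with the full 80%, so fair-share preemption can
    // reclaim up to that amount from root.lowPriorityQueue.
  }
}
{code}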

[jira] [Commented] (YARN-2302) Refactor TimelineWebServices

2014-08-08 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091522#comment-14091522
 ] 

Junping Du commented on YARN-2302:
--

Thanks for updating the patch, [~zjshen]! The patch looks good overall now. 
Some minor comments:
{code}
+  public TimelinePutResponse postEntities(
+  TimelineEntities entities,
+  UserGroupInformation callerUGI) throws YarnException, IOException {
+if (entities == null) {
{code}
Shall we rename this method to putEntities? There is a slight difference between 
the put and post operations (from a REST perspective): post creates a resource 
(the first time), while put updates it. The internal behavior of this method is 
update-like and it actually calls the store's put operation, so putEntities 
would be more appropriate.
In addition, I think we should have javadoc for the public methods in 
TimelineDataManager.java. 
The rest looks fine.

 Refactor TimelineWebServices
 

 Key: YARN-2302
 URL: https://issues.apache.org/jira/browse/YARN-2302
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2302.1.patch, YARN-2302.2.patch


 Now TimelineWebServices contains non-trivial logic to process the HTTP 
 requests, manipulate the data, check the access, and interact with the 
 timeline store.
 I propose to move the data-oriented logic to a middle layer (a so-called 
 TimelineDataManager), so that TimelineWebServices only processes the requests 
 and calls TimelineDataManager to complete the remaining tasks.
 By doing this, we let the generic history module reuse TimelineDataManager 
 internally (YARN-2033), invoking the putting/getting methods directly. 
 Otherwise, we would have to send HTTP requests to TimelineWebServices to query 
 the generic history data, which is not efficient.
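A minimal sketch of the proposed layering, with illustrative names and signatures only (the real TimelineDataManager in the patch differs): the web layer stays thin and delegates all data handling to the middle layer.

{code}
// Illustrative sketch of the layering, not the actual YARN classes.
public class TimelineLayeringSketch {

  /** Stand-in for the real timeline store interface. */
  interface Store {
    void put(String entity);
    String get(String entityId);
  }

  /** Middle, data-oriented layer: owns access checks and store interaction. */
  static class TimelineDataManager {
    private final Store store;

    TimelineDataManager(Store store) {
      this.store = store;
    }

    void putEntities(String entities, String callerUgi) {
      // access checks and field filtering would live here
      store.put(entities);
    }

    String getEntity(String entityId, String callerUgi) {
      return store.get(entityId);
    }
  }

  /** Thin HTTP-facing layer: parses the request, then delegates. */
  static class TimelineWebServices {
    private final TimelineDataManager manager;

    TimelineWebServices(TimelineDataManager manager) {
      this.manager = manager;
    }

    void handlePost(String body, String user) {
      manager.putEntities(body, user);   // no store logic in the web layer
    }

    String handleGet(String entityId, String user) {
      return manager.getEntity(entityId, user);
    }
  }
}
{code}

With such a layer, the generic history module (YARN-2033) could call TimelineDataManager directly instead of going through HTTP.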



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API

2014-08-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091527#comment-14091527
 ] 

Hadoop QA commented on YARN-2277:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660715/YARN-2277-v3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4565//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4565//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4565//console

This message is automatically generated.

 Add Cross-Origin support to the ATS REST API
 

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, 
 YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch


 As the Application Timeline Server does not provide a built-in UI, it may 
 make sense to enable JSONP or CORS REST API capabilities to allow a remote 
 UI to access the data directly via JavaScript without the browser's 
 cross-site restrictions coming into play.
 An example client may look like
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.
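For context, server-side CORS support usually comes down to a response filter that adds the Access-Control-* headers. The sketch below is a generic servlet-filter illustration under that assumption, not the filter in the attached patch; the class name and allowed values are made up.

{code}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

// Generic illustration of the CORS approach, not the code in the patch:
// add Access-Control-* headers so a remote UI can call the REST API from
// JavaScript in the browser.
public class SimpleCorsFilter implements Filter {

  @Override
  public void init(FilterConfig filterConfig) throws ServletException {
    // A real filter would read allowed origins/methods from configuration.
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse res,
      FilterChain chain) throws IOException, ServletException {
    HttpServletResponse httpRes = (HttpServletResponse) res;
    httpRes.setHeader("Access-Control-Allow-Origin", "*");
    httpRes.setHeader("Access-Control-Allow-Methods", "GET, OPTIONS");
    httpRes.setHeader("Access-Control-Allow-Headers", "Content-Type, Accept");
    chain.doFilter(req, res);
  }

  @Override
  public void destroy() {
  }
}
{code}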



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2302) Refactor TimelineWebServices

2014-08-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091539#comment-14091539
 ] 

Hadoop QA commented on YARN-2302:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660727/YARN-2302.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4566//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4566//console

This message is automatically generated.

 Refactor TimelineWebServices
 

 Key: YARN-2302
 URL: https://issues.apache.org/jira/browse/YARN-2302
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2302.1.patch, YARN-2302.2.patch


 Now TimelineWebServices contains non-trivial logic to process the HTTP 
 requests, manipulate the data, check the access, and interact with the 
 timeline store.
 I propose to move the data-oriented logic to a middle layer (a so-called 
 TimelineDataManager), so that TimelineWebServices only processes the requests 
 and calls TimelineDataManager to complete the remaining tasks.
 By doing this, we let the generic history module reuse TimelineDataManager 
 internally (YARN-2033), invoking the putting/getting methods directly. 
 Otherwise, we would have to send HTTP requests to TimelineWebServices to query 
 the generic history data, which is not efficient.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt

2014-08-08 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-2399:
--

 Summary: FairScheduler: Merge AppSchedulable and FSSchedulerApp 
into FSAppAttempt
 Key: YARN-2399
 URL: https://issues.apache.org/jira/browse/YARN-2399
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.5.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


FairScheduler has two data structures for an application, making the code hard 
to track. We should merge these for better maintainability in the long-term. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-08-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091577#comment-14091577
 ] 

Karthik Kambatla commented on YARN-2026:


+1. Checking this in..

 Fair scheduler : Fair share for inactive queues causes unfair allocation in 
 some scenarios
 --

 Key: YARN-2026
 URL: https://issues.apache.org/jira/browse/YARN-2026
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Attachments: YARN-2026-v1.txt, YARN-2026-v2.txt, YARN-2026-v3.txt, 
 YARN-2026-v4.txt, YARN-2026-v5.txt


 Problem1 - While using hierarchical queues in the fair scheduler, there are a 
 few scenarios where we have seen a leaf queue with the least fair share take 
 the majority of the cluster and starve a sibling parent queue that has a 
 greater weight/fair share, while preemption doesn’t kick in to reclaim resources.
 The root cause seems to be that fair share of a parent queue is distributed 
 to all its children irrespective of whether its an active or an inactive(no 
 apps running) queue. Preemption based on fair share kicks in only if the 
 usage of a queue is less than 50% of its fair share and if it has demands 
 greater than that. When there are many queues under a parent queue(with high 
 fair share),the child queue’s fair share becomes really low. As a result when 
 only few of these child queues have apps running,they reach their *tiny* fair 
 share quickly and preemption doesn’t happen even if other leaf 
 queues(non-sibling) are hogging the cluster.
 This can be solved by dividing fair share of parent queue only to active 
 child queues.
 Here is an example describing the problem and proposed solution:
 root.lowPriorityQueue is a leaf queue with weight 2
 root.HighPriorityQueue is parent queue with weight 8
 root.HighPriorityQueue has 10 child leaf queues : 
 root.HighPriorityQueue.childQ(1..10)
 Above config,results in root.HighPriorityQueue having 80% fair share
 and each of its ten child queues would have 8% fair share. Preemption would 
 happen only if the child queue's usage is below 4% (0.5*8=4). 
 Lets say at the moment no apps are running in any of the 
 root.HighPriorityQueue.childQ(1..10) and few apps are running in 
 root.lowPriorityQueue which is taking up 95% of the cluster.
 Up till this point,the behavior of FS is correct.
 Now,lets say root.HighPriorityQueue.childQ1 got a big job which requires 30% 
 of the cluster. It would get only the available 5% in the cluster and 
 preemption wouldn't kick in since its above 4%(half fair share).This is bad 
 considering childQ1 is under a highPriority parent queue which has *80% fair 
 share*.
 Until root.lowPriorityQueue starts relinquishing containers,we would see the 
 following allocation on the scheduler page:
 *root.lowPriorityQueue = 95%*
 *root.HighPriorityQueue.childQ1=5%*
 This can be solved by distributing a parent’s fair share only to active 
 queues.
 So in the example above,since childQ1 is the only active queue
 under root.HighPriorityQueue, it would get all its parent’s fair share i.e. 
 80%.
 This would cause preemption to reclaim the 30% needed by childQ1 from 
 root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
 Problem2 - Also note that similar situation can happen between 
 root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2,if childQ2 
 hogs the cluster. childQ2 can take up 95% cluster and childQ1 would be stuck 
 at 5%,until childQ2 starts relinquishing containers. We would like each of 
 childQ1 and childQ2 to get half of root.HighPriorityQueue  fair share ie 
 40%,which would ensure childQ1 gets upto 40% resource if needed through 
 preemption.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2249) RM may receive container release request on AM resync before container is actually recovered

2014-08-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091583#comment-14091583
 ] 

Hadoop QA commented on YARN-2249:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660744/YARN-2249.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-sls 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4567//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4567//console

This message is automatically generated.

 RM may receive container release request on AM resync before container is 
 actually recovered
 

 Key: YARN-2249
 URL: https://issues.apache.org/jira/browse/YARN-2249
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch, 
 YARN-2249.2.patch, YARN-2249.3.patch, YARN-2249.4.patch


 AM resync on RM restart will send outstanding container release requests back 
 to the new RM. In the meantime, NMs report the container statuses back to the 
 RM to recover the containers. If the RM receives the container release request 
 before the container is actually recovered in the scheduler, the container 
 won't be released and the release request will be lost.
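To make the ordering problem concrete, here is a small illustrative sketch (not the actual scheduler code; the class, field, and method names are hypothetical): a release request that arrives before the container is recovered is remembered and applied once the NM report brings the container back.

{code}
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; names are hypothetical, not the real RM classes.
public class PendingReleaseSketch {

  private final Object pendingReleaseLock = new Object();
  private final Set<Long> liveContainers = new HashSet<Long>();
  private final Set<Long> pendingReleases = new HashSet<Long>();

  /** Called when a resyncing AM re-sends an outstanding release request. */
  public void releaseContainer(long containerId) {
    synchronized (pendingReleaseLock) {
      if (liveContainers.remove(containerId)) {
        System.out.println("Released container " + containerId);
      } else {
        // Not recovered yet: remember the request instead of dropping it.
        pendingReleases.add(containerId);
      }
    }
  }

  /** Called when an NM report recovers a running container after RM restart. */
  public void recoverContainer(long containerId) {
    synchronized (pendingReleaseLock) {
      if (pendingReleases.remove(containerId)) {
        // A release arrived before recovery; honor it now.
        System.out.println("Released container " + containerId + " on recovery");
      } else {
        liveContainers.add(containerId);
      }
    }
  }
}
{code}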



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1954) Add waitFor to AMRMClient(Async)

2014-08-08 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1954:
-

Attachment: YARN-1954.6.patch

Thanks for your review, Zhijie. Updated:
1. Removed AMRMClient(Async)Impl#waitFor and declared waitFor directly in 
AMRMClient(Async).
2. Removed IllegalArgumentException from the method declaration.

 Add waitFor to AMRMClient(Async)
 

 Key: YARN-1954
 URL: https://issues.apache.org/jira/browse/YARN-1954
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: client
Affects Versions: 3.0.0, 2.4.0
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch, 
 YARN-1954.4.patch, YARN-1954.4.patch, YARN-1954.5.patch, YARN-1954.6.patch, 
 YARN-1954.6.patch


 Recently, I saw some use cases of AMRMClient(Async). The painful thing is 
 that the main non-daemon thread has to sit in a dummy loop to prevent AM 
 process exiting before all the tasks are done, while unregistration is 
 triggered on a separate daemon thread by callback methods (in 
 particular when using AMRMClientAsync). IMHO, it should be beneficial to add 
 a waitFor method to AMRMClient(Async) to block the AM until unregistration or 
 user supplied check point, such that users don't need to write the loop 
 themselves.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1954) Add waitFor to AMRMClient(Async)

2014-08-08 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1954:
-

Attachment: (was: YARN-1954.6.patch)

 Add waitFor to AMRMClient(Async)
 

 Key: YARN-1954
 URL: https://issues.apache.org/jira/browse/YARN-1954
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: client
Affects Versions: 3.0.0, 2.4.0
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch, 
 YARN-1954.4.patch, YARN-1954.4.patch, YARN-1954.5.patch, YARN-1954.6.patch


 Recently, I saw some use cases of AMRMClient(Async). The painful thing is 
 that the main non-daemon thread has to sit in a dummy loop to prevent AM 
 process exiting before all the tasks are done, while unregistration is 
 triggered on a separate daemon thread by callback methods (in 
 particular when using AMRMClientAsync). IMHO, it should be beneficial to add 
 a waitFor method to AMRMClient(Async) to block the AM until unregistration or 
 user supplied check point, such that users don't need to write the loop 
 themselves.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1954) Add waitFor to AMRMClient(Async)

2014-08-08 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1954:
-

Attachment: YARN-1954.7.patch

 Add waitFor to AMRMClient(Async)
 

 Key: YARN-1954
 URL: https://issues.apache.org/jira/browse/YARN-1954
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: client
Affects Versions: 3.0.0, 2.4.0
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch, 
 YARN-1954.4.patch, YARN-1954.4.patch, YARN-1954.5.patch, YARN-1954.6.patch, 
 YARN-1954.7.patch


 Recently, I saw some use cases of AMRMClient(Async). The painful thing is 
 that the main non-daemon thread has to sit in a dummy loop to prevent AM 
 process exiting before all the tasks are done, while unregistration is 
 triggered on a separate daemon thread by callback methods (in 
 particular when using AMRMClientAsync). IMHO, it should be beneficial to add 
 a waitFor method to AMRMClient(Async) to block the AM until unregistration or 
 user supplied check point, such that users don't need to write the loop 
 themselves.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1954) Add waitFor to AMRMClient(Async)

2014-08-08 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1954:
-

Attachment: (was: YARN-1954.7.patch)

 Add waitFor to AMRMClient(Async)
 

 Key: YARN-1954
 URL: https://issues.apache.org/jira/browse/YARN-1954
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: client
Affects Versions: 3.0.0, 2.4.0
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch, 
 YARN-1954.4.patch, YARN-1954.4.patch, YARN-1954.5.patch, YARN-1954.6.patch, 
 YARN-1954.7.patch


 Recently, I saw some use cases of AMRMClient(Async). The painful thing is 
 that the main non-daemon thread has to sit in a dummy loop to prevent AM 
 process exiting before all the tasks are done, while unregistration is 
 triggered on a separate daemon thread by callback methods (in 
 particular when using AMRMClientAsync). IMHO, it should be beneficial to add 
 a waitFor method to AMRMClient(Async) to block the AM until unregistration or 
 user supplied check point, such that users don't need to write the loop 
 themselves.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1954) Add waitFor to AMRMClient(Async)

2014-08-08 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1954:
-

Attachment: YARN-1954.7.patch

 Add waitFor to AMRMClient(Async)
 

 Key: YARN-1954
 URL: https://issues.apache.org/jira/browse/YARN-1954
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: client
Affects Versions: 3.0.0, 2.4.0
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch, 
 YARN-1954.4.patch, YARN-1954.4.patch, YARN-1954.5.patch, YARN-1954.6.patch, 
 YARN-1954.7.patch


 Recently, I saw some use cases of AMRMClient(Async). The painful thing is 
 that the main non-daemon thread has to sit in a dummy loop to prevent AM 
 process exiting before all the tasks are done, while unregistration is 
 triggered on a separate daemon thread by callback methods (in 
 particular when using AMRMClientAsync). IMHO, it should be beneficial to add 
 a waitFor method to AMRMClient(Async) to block the AM until unregistration or 
 user supplied check point, such that users don't need to write the loop 
 themselves.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore

2014-08-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091608#comment-14091608
 ] 

Hadoop QA commented on YARN-2138:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660749/YARN-2138.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4568//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4568//console

This message is automatically generated.

 Cleanup notifyDone* methods in RMStateStore
 ---

 Key: YARN-2138
 URL: https://issues.apache.org/jira/browse/YARN-2138
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Attachments: YARN-2138.002.patch, YARN-2138.003.patch, YARN-2138.patch


 The storedException passed into notifyDoneStoringApplication is always null. 
 Similarly for other notifyDone* methods. We can clean up these methods as 
 this control flow path is not used anymore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2026) Fair scheduler: Consider only active queues for computing fairshare

2014-08-08 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091609#comment-14091609
 ] 

Ashwin Shankar commented on YARN-2026:
--

Thanks a lot [~kasha], [~sandyr] for reviewing and committing my patch !

 Fair scheduler: Consider only active queues for computing fairshare
 ---

 Key: YARN-2026
 URL: https://issues.apache.org/jira/browse/YARN-2026
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Fix For: 2.6.0

 Attachments: YARN-2026-v1.txt, YARN-2026-v2.txt, YARN-2026-v3.txt, 
 YARN-2026-v4.txt, YARN-2026-v5.txt


 Problem1 - While using hierarchical queues in the fair scheduler, there are a 
 few scenarios where we have seen a leaf queue with the least fair share take 
 the majority of the cluster and starve a sibling parent queue that has a 
 greater weight/fair share, while preemption doesn’t kick in to reclaim resources.
 The root cause seems to be that fair share of a parent queue is distributed 
 to all its children irrespective of whether its an active or an inactive(no 
 apps running) queue. Preemption based on fair share kicks in only if the 
 usage of a queue is less than 50% of its fair share and if it has demands 
 greater than that. When there are many queues under a parent queue(with high 
 fair share),the child queue’s fair share becomes really low. As a result when 
 only few of these child queues have apps running,they reach their *tiny* fair 
 share quickly and preemption doesn’t happen even if other leaf 
 queues(non-sibling) are hogging the cluster.
 This can be solved by dividing fair share of parent queue only to active 
 child queues.
 Here is an example describing the problem and proposed solution:
 root.lowPriorityQueue is a leaf queue with weight 2
 root.HighPriorityQueue is parent queue with weight 8
 root.HighPriorityQueue has 10 child leaf queues : 
 root.HighPriorityQueue.childQ(1..10)
 Above config,results in root.HighPriorityQueue having 80% fair share
 and each of its ten child queues would have 8% fair share. Preemption would 
 happen only if the child queue's usage is below 4% (0.5*8=4). 
 Lets say at the moment no apps are running in any of the 
 root.HighPriorityQueue.childQ(1..10) and few apps are running in 
 root.lowPriorityQueue which is taking up 95% of the cluster.
 Up till this point,the behavior of FS is correct.
 Now,lets say root.HighPriorityQueue.childQ1 got a big job which requires 30% 
 of the cluster. It would get only the available 5% in the cluster and 
 preemption wouldn't kick in since its above 4%(half fair share).This is bad 
 considering childQ1 is under a highPriority parent queue which has *80% fair 
 share*.
 Until root.lowPriorityQueue starts relinquishing containers,we would see the 
 following allocation on the scheduler page:
 *root.lowPriorityQueue = 95%*
 *root.HighPriorityQueue.childQ1=5%*
 This can be solved by distributing a parent’s fair share only to active 
 queues.
 So in the example above,since childQ1 is the only active queue
 under root.HighPriorityQueue, it would get all its parent’s fair share i.e. 
 80%.
 This would cause preemption to reclaim the 30% needed by childQ1 from 
 root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
 Problem2 - Also note that similar situation can happen between 
 root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2,if childQ2 
 hogs the cluster. childQ2 can take up 95% cluster and childQ1 would be stuck 
 at 5%,until childQ2 starts relinquishing containers. We would like each of 
 childQ1 and childQ2 to get half of root.HighPriorityQueue  fair share ie 
 40%,which would ensure childQ1 gets upto 40% resource if needed through 
 preemption.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2026) Fair scheduler: Consider only active queues for computing fairshare

2014-08-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091610#comment-14091610
 ] 

Hudson commented on YARN-2026:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6041 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6041/])
YARN-2026. Fair scheduler: Consider only active queues for computing fairshare. 
(Ashwin Shankar via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616915)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/Schedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/ComputeFairShares.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerFairShare.java


 Fair scheduler: Consider only active queues for computing fairshare
 ---

 Key: YARN-2026
 URL: https://issues.apache.org/jira/browse/YARN-2026
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Fix For: 2.6.0

 Attachments: YARN-2026-v1.txt, YARN-2026-v2.txt, YARN-2026-v3.txt, 
 YARN-2026-v4.txt, YARN-2026-v5.txt


 Problem1 - While using hierarchical queues in the fair scheduler, there are a 
 few scenarios where we have seen a leaf queue with the least fair share take 
 the majority of the cluster and starve a sibling parent queue that has a 
 greater weight/fair share, while preemption doesn’t kick in to reclaim resources.
 The root cause seems to be that fair share of a parent queue is distributed 
 to all its children irrespective of whether its an active or an inactive(no 
 apps running) queue. Preemption based on fair share kicks in only if the 
 usage of a queue is less than 50% of its fair share and if it has demands 
 greater than that. When there are many queues under a parent queue(with high 
 fair share),the child queue’s fair share becomes really low. As a result when 
 only few of these child queues have apps running,they reach their *tiny* fair 
 share quickly and preemption doesn’t happen even if other leaf 
 queues(non-sibling) are hogging the cluster.
 This can be solved by dividing fair share of parent queue only to active 
 child queues.
 Here is an example describing the problem and proposed solution:
 root.lowPriorityQueue is a leaf queue with weight 2
 root.HighPriorityQueue is parent queue with weight 8
 root.HighPriorityQueue has 10 child leaf queues : 
 root.HighPriorityQueue.childQ(1..10)
 Above config,results in root.HighPriorityQueue having 80% fair share
 and each of its ten child queues would have 8% fair share. Preemption would 
 happen only if the child queue's usage is below 4% (0.5*8=4). 
 Lets say at the moment no apps are running in any of the 
 root.HighPriorityQueue.childQ(1..10) and few apps are running in 
 root.lowPriorityQueue which is taking up 95% of the cluster.
 Up till this point,the behavior of FS is correct.
 Now,lets say root.HighPriorityQueue.childQ1 got a big job which requires 30% 
 of the cluster. It would get only the available 5% in the cluster and 
 preemption wouldn't kick in since its above 4%(half fair share).This is bad 
 considering childQ1 is under a highPriority parent queue which has *80% fair 
 share*.
 Until root.lowPriorityQueue starts relinquishing containers,we would see the 
 following allocation on the scheduler page:
 *root.lowPriorityQueue = 95%*
 *root.HighPriorityQueue.childQ1=5%*
 This can be solved by distributing a parent’s fair share only to active 
 queues.
 So in the example above,since childQ1 is the only active queue
 under root.HighPriorityQueue, it would get all its parent’s fair share i.e. 
 80%.
 This would cause preemption to reclaim the 30% needed by childQ1 from 
 root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
 Problem2 - Also note that similar situation can happen between 
 root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2,if childQ2 
 hogs the cluster. childQ2 can take up 95% cluster and childQ1 would be stuck 
 at 5%,until childQ2 starts relinquishing containers. We would like each of 
 childQ1 and childQ2 to get half of root.HighPriorityQueue fair share ie 
 40%,which would ensure childQ1 gets upto 40% resource if needed through 
 preemption.

[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt

2014-08-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091612#comment-14091612
 ] 

Hadoop QA commented on YARN-2399:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660780/yarn-2399-1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4569//console

This message is automatically generated.

 FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
 

 Key: YARN-2399
 URL: https://issues.apache.org/jira/browse/YARN-2399
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.5.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2399-1.patch


 FairScheduler has two data structures for an application, making the code 
 hard to track. We should merge these for better maintainability in the 
 long-term. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1954) Add waitFor to AMRMClient(Async)

2014-08-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091621#comment-14091621
 ] 

Hadoop QA commented on YARN-1954:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660786/YARN-1954.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4570//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4570//console

This message is automatically generated.

 Add waitFor to AMRMClient(Async)
 

 Key: YARN-1954
 URL: https://issues.apache.org/jira/browse/YARN-1954
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: client
Affects Versions: 3.0.0, 2.4.0
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch, 
 YARN-1954.4.patch, YARN-1954.4.patch, YARN-1954.5.patch, YARN-1954.6.patch, 
 YARN-1954.7.patch


 Recently, I saw some use cases of AMRMClient(Async). The painful thing is 
 that the main non-daemon thread has to sit in a dummy loop to prevent AM 
 process exiting before all the tasks are done, while unregistration is 
 triggered on a separate daemon thread by callback methods (in 
 particular when using AMRMClientAsync). IMHO, it should be beneficial to add 
 a waitFor method to AMRMClient(Async) to block the AM until unregistration or 
 user supplied check point, such that users don't need to write the loop 
 themselves.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2249) RM may receive container release request on AM resync before container is actually recovered

2014-08-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091658#comment-14091658
 ] 

Wangda Tan commented on YARN-2249:
--

Jian, thanks for the update. 
My last comment: could you rename {{mutex}} to {{pendingReleaseMutex}} or 
something similar?

Wangda

 RM may receive container release request on AM resync before container is 
 actually recovered
 

 Key: YARN-2249
 URL: https://issues.apache.org/jira/browse/YARN-2249
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch, 
 YARN-2249.2.patch, YARN-2249.3.patch, YARN-2249.4.patch


 AM resync on RM restart will send outstanding container release requests back 
 to the new RM. In the meantime, NMs report the container statuses back to the 
 RM to recover the containers. If the RM receives the container release request 
 before the container is actually recovered in the scheduler, the container 
 won't be released and the release request will be lost.



--
This message was sent by Atlassian JIRA
(v6.2#6252)