[jira] [Created] (YARN-2324) Race condition in continuousScheduling for FairScheduler
zhihai xu created YARN-2324:
---
Summary: Race condition in continuousScheduling for FairScheduler
Key: YARN-2324
URL: https://issues.apache.org/jira/browse/YARN-2324
Project: Hadoop YARN
Issue Type: Bug
Reporter: zhihai xu

There is a race condition in continuousScheduling for FairScheduler: removeNode can run while continuousScheduling is executing in the schedulingThread. If the node has been removed from nodes, nodes.get(n2) and getFSSchedulerNode(nodeId) will return null, so we need to add a lock to remove the NPE/race condition.
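A minimal sketch of the kind of guard this report implies, using the field and method names it mentions; this is illustrative only, not the attached YARN-2324.000.patch:
{code}
// Sketch: re-check node existence under the scheduler lock so a concurrent
// removeNode() cannot cause an NPE mid-iteration. Method names follow the
// report; the loop shape and wiring are assumptions for illustration.
private void continuousSchedulingAttempt(List<NodeId> nodeIdList) {
  for (NodeId nodeId : nodeIdList) {
    synchronized (this) {
      FSSchedulerNode node = getFSSchedulerNode(nodeId);
      if (node == null) {
        // Node was removed between snapshotting the list and scheduling on it.
        continue;
      }
      attemptScheduling(node);
    }
  }
}
{code}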
[jira] [Commented] (YARN-2324) Race condition in continuousScheduling for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067821#comment-14067821 ]

Wei Yan commented on YARN-2324:
---
Duplicate of YARN-2273?
[jira] [Assigned] (YARN-2324) Race condition in continuousScheduling for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu reassigned YARN-2324:
---
Assignee: zhihai xu
[jira] [Updated] (YARN-2324) Race condition in continuousScheduling for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated YARN-2324:
---
Attachment: YARN-2324.000.patch
[jira] [Commented] (YARN-2324) Race condition in continuousScheduling for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067844#comment-14067844 ]

Hadoop QA commented on YARN-2324:
---
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12656778/YARN-2324.000.patch
against trunk revision.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
  org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4373//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4373//console

This message is automatically generated.
[jira] [Updated] (YARN-2013) The diagnostics is always the ExitCodeException stack when the container crashes
[ https://issues.apache.org/jira/browse/YARN-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated YARN-2013:
---
Attachment: YARN-2013.5.patch

[~djp], thank you for the review! Updated the patch to address your comment.

The diagnostics is always the ExitCodeException stack when the container crashes
---
Key: YARN-2013
URL: https://issues.apache.org/jira/browse/YARN-2013
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
Attachments: YARN-2013.1.patch, YARN-2013.2.patch, YARN-2013.3-2.patch, YARN-2013.3.patch, YARN-2013.4.patch, YARN-2013.5.patch

When a container crashes, an ExitCodeException will be thrown from Shell. Default/LinuxContainerExecutor captures the exception and puts the exception stack into the diagnostics. Therefore, the exception stack is always the same:
{code}
String diagnostics = "Exception from container-launch: \n"
    + StringUtils.stringifyException(e) + "\n" + shExec.getOutput();
container.handle(new ContainerDiagnosticsUpdateEvent(containerId, diagnostics));
{code}
In addition, it seems that the exception always has an empty message, as there's no message from stderr. Hence the diagnostics are not of much use for users to analyze the reason for the container crash.
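One plausible shape for such a fix, sketched here under the assumption that the executor can surface the exit code and the shell's captured output; this is not necessarily what YARN-2013.5.patch does:
{code}
import org.apache.hadoop.util.Shell.ExitCodeException;

// Hypothetical helper: build user-facing diagnostics from the exit code and
// the shell's captured output instead of the always-identical stack trace.
static String buildDiagnostics(ExitCodeException e, String shellOutput) {
  StringBuilder sb = new StringBuilder("Exception from container-launch.\n");
  sb.append("Exit code: ").append(e.getExitCode()).append('\n');
  if (e.getMessage() != null && !e.getMessage().isEmpty()) {
    sb.append("Exception message: ").append(e.getMessage()).append('\n');
  }
  // The shell output (stderr/stdout) usually carries the real failure reason.
  sb.append("Shell output: ").append(shellOutput).append('\n');
  return sb.toString();
}
{code}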
[jira] [Commented] (YARN-2324) Race condition in continuousScheduling for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067889#comment-14067889 ]

Tsuyoshi OZAWA commented on YARN-2324:
---
[~zxu], thank you for reporting this and for your contribution! As [~ywskycn] mentioned, we're addressing the problem on YARN-2273. Let's close this ticket as a duplicate.
[jira] [Commented] (YARN-2013) The diagnostics is always the ExitCodeException stack when the container crashes
[ https://issues.apache.org/jira/browse/YARN-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067898#comment-14067898 ]

Hadoop QA commented on YARN-2013:
---
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12656788/YARN-2013.5.patch
against trunk revision.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
  org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers
  org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices
  org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4374//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4374//console

This message is automatically generated.
[jira] [Commented] (YARN-2013) The diagnostics is always the ExitCodeException stack when the container crashes
[ https://issues.apache.org/jira/browse/YARN-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067922#comment-14067922 ]

Tsuyoshi OZAWA commented on YARN-2013:
---
The test failures are not related.
[jira] [Commented] (YARN-2309) NPE during RM-Restart test scenario
[ https://issues.apache.org/jira/browse/YARN-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067928#comment-14067928 ]

Tsuyoshi OZAWA commented on YARN-2309:
---
[~nishan], thank you for reporting. The patch to fix the problem is available on YARN-1919. Any feedback is welcome :-)

NPE during RM-Restart test scenario
---
Key: YARN-2309
URL: https://issues.apache.org/jira/browse/YARN-2309
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Nishan Shetty
Priority: Minor

During RM-Restart test scenarios, we met with the exception below. A point to note here is that ZooKeeper was also unstable during this testing; we saw many ZooKeeper exceptions before getting this NPE.
{code}
2014-07-10 10:49:46,817 WARN org.apache.hadoop.service.AbstractService: When stopping the service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService : java.lang.NullPointerException
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceStop(EmbeddedElectorService.java:108)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:125)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:232)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1039)
{code}
ZooKeeper exception:
{code}
2014-07-10 10:49:46,816 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService failed in state INITED; cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1046)
	at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1017)
	at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:632)
	at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:766)
{code}
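For context, the stack trace above points at serviceStop() running against a partially initialized elector; a guard along these lines would avoid the NPE (illustrative only, the actual fix is the YARN-1919 patch):
{code}
// Illustrative null guard: serviceStop() can be invoked after serviceInit()
// failed part-way, leaving the elector field unset. Field name assumed.
@Override
protected synchronized void serviceStop() throws Exception {
  if (elector != null) {
    elector.quitElection(false);    // leave leader election cleanly
    elector.terminateConnection();  // close the ZooKeeper connection
  }
  super.serviceStop();
}
{code}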
[jira] [Commented] (YARN-2313) Livelock can occur on FairScheduler when there are lots of entries in the queue
[ https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067998#comment-14067998 ]

Sandy Ryza commented on YARN-2313:
---
Thanks for reporting this, [~ozawa]. A couple of nits:
* The new configuration should be defined in FairSchedulerConfiguration like other fair scheduler props.
* If I understand correctly, the race described in the findbugs could never actually happen. For code readability, I think it's better to add a findbugs exclude than an unnecessary synchronization.
* In the warning, replace "use" with "using".
* Extra space after DEFAULT_RM_SCHEDULER_FS_UPDATE_INTERVAL_MS.

Eventually, I think we should try to be smarter about the work that goes on in update(). In most cases, the fair shares will stay the same, or will only change for apps in a particular queue, so we can avoid recomputation.

Livelock can occur on FairScheduler when there are lots of entries in the queue
---
Key: YARN-2313
URL: https://issues.apache.org/jira/browse/YARN-2313
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Attachments: YARN-2313.1.patch, YARN-2313.2.patch, YARN-2313.3.patch, rm-stack-trace.txt

Observed a livelock on FairScheduler when there are lots of entries in the queue. After investigating the code, the following case can occur:
1. {{update()}} called by UpdateThread takes longer than UPDATE_INTERVAL (500 ms) if there are lots of queues.
2. UpdateThread goes into a busy loop.
3. Other threads (AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever.
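A minimal sketch of the fix shape under discussion, assuming a configurable interval read from FairSchedulerConfiguration; the names and plumbing here are illustrative, not the committed patch:
{code}
// Illustrative update thread that cannot busy-loop: it always sleeps for the
// configured interval, even when update() itself runs long.
private class UpdateThread extends Thread {
  private final long updateIntervalMs; // would come from FairSchedulerConfiguration

  UpdateThread(long updateIntervalMs) {
    this.updateIntervalMs = updateIntervalMs;
    setName("FairSchedulerUpdateThread");
    setDaemon(true);
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        Thread.sleep(updateIntervalMs); // fixed pause between passes
        update();                       // recompute fair shares
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // exit on shutdown
      }
    }
  }
}
{code}
If update() routinely exceeds the interval, a warning (the one Sandy's wording nit refers to) can tell operators to raise the configured value.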
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068017#comment-14068017 ]

Sandy Ryza commented on YARN-796:
---
I'm worried that the proposal is becoming too complex. Can we try to whittle the proposal down to a minimum viable feature? I'm not necessarily opposed to the more advanced parts of it, like queue label policies and updating labels on the fly, and the design should aim to make them possible in the future, but I don't think they need to be part of the initial implementation. To me it seems like the essential requirements here are:
* A way for nodes to be tagged with labels
* A way to make scheduling requests based on these labels

I'm also skeptical about the need for adding/removing labels dynamically. Do we have concrete use cases for this?

Lastly, as BC and Sunil have pointed out, specifying the labels in the NodeManager confs greatly simplifies configuration when nodes are being added. Are there advantages to a centralized configuration?

Allow for (admin) labels on nodes and resource-requests
---
Key: YARN-796
URL: https://issues.apache.org/jira/browse/YARN-796
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch

It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture, etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels.
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is a hierarchical queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068024#comment-14068024 ]

Craig Welch commented on YARN-2008:
---
Hmm, I don't think we can change headroom to be just the guaranteed or base capacity, because I believe that will defeat the support for having a max capacity above the base capacity. As I understand it, that is in place so that busy queues can grow to use more of the cluster when other queues are underutilized - to achieve more efficient and full use of the cluster overall - and if the application gets the low baseline headroom it will not be able to effectively use that greater capacity. Assuming we keep support for the max capacity, then even with preemption I think we will need this logic, because preemption won't guarantee that all the queues have their max capacity available to them, as the total max capacity can be over 100%. Preemption will help, certainly, but I don't think it can replace this logic - I think we need both.

CapacityScheduler may report incorrect queueMaxCap if there is a hierarchical queue structure
---
Key: YARN-2008
URL: https://issues.apache.org/jira/browse/YARN-2008
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Chen He
Attachments: YARN-2008.1.patch, YARN-2008.2.patch

Suppose there are two queues, both allowed to use 100% of the actual resources in the cluster, and Q1 and Q2 each currently use 50% of the cluster's resources, so there is no actual space available. With the current method of computing headroom, the CapacityScheduler thinks there are still resources available for users in Q1, but they have already been used by Q2. If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap. Here is an example:
{code}
                        rootQueue
                       /         \
         L1ParentQueue1           L1ParentQueue2
 (allowed to use up to 80%        (allowed to use 20% at minimum
  of its parent)                   of its parent)
      /            \
L2LeafQueue1      L2LeafQueue2
(50% of its       (50% of its parent
 parent)           at minimum)
{code}
When we calculate the headroom of a user in L2LeafQueue2, the current method will think L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. However, without checking L1ParentQueue1, we cannot be sure. It is possible that L1ParentQueue2 has used 40% of rootQueue resources right now; in that case, L2LeafQueue2 can actually only use 30% (60% * 50%).
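The arithmetic from the example above, spelled out as a sketch (illustrative only, not the patch): the leaf's usable fraction must be capped by what is actually still available to each ancestor, not just by the product of configured capacities.
{code}
// Illustrative headroom arithmetic for the example above (not the patch).
static void headroomExample() {
  // Configured: L1ParentQueue1 may use up to 80% of root; L2LeafQueue2 up to
  // 50% of L1ParentQueue1. Naive max cap: 0.8 * 0.5 = 0.40 of the cluster.
  double naiveMaxCap = 0.8 * 0.5;                                // 0.40

  // But L1ParentQueue2 is already using 40% of the cluster, so only 60%
  // remains for L1ParentQueue1. The leaf's real ceiling is capped by the
  // parent's remaining share: min(0.8, 0.6) * 0.5 = 0.30 of the cluster.
  double parentRemaining = 1.0 - 0.4;                            // 0.60
  double correctedMaxCap = Math.min(0.8, parentRemaining) * 0.5; // 0.30
  System.out.println(naiveMaxCap + " vs " + correctedMaxCap);
}
{code}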
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is a hierarchical queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068026#comment-14068026 ]

Chen He commented on YARN-2008:
---
Thank you for the patch, [~cwelch].
{quote}
and if the application gets the low baseline headroom it will not be able to effectively use that greater capacity.
{quote}
If we overestimate the headroom, it will cause some AMs to hang or, in the worst case, deadlock. This JIRA is to avoid that.
[jira] [Created] (YARN-2325) Need to check whether node is null in nodeUpdate for FairScheduler
zhihai xu created YARN-2325:
---
Summary: Need to check whether node is null in nodeUpdate for FairScheduler
Key: YARN-2325
URL: https://issues.apache.org/jira/browse/YARN-2325
Project: Hadoop YARN
Issue Type: Bug
Reporter: zhihai xu

We need to check whether the node is null in nodeUpdate for FairScheduler. If nodeUpdate is called after removeNode, getFSSchedulerNode will return null. If the node is null, we should return with an error message.
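A sketch of the proposed guard (illustrative; the attached patch may differ):
{code}
// In FairScheduler.nodeUpdate(), the node may already have been removed by a
// concurrent removeNode(), so bail out with an error instead of NPE-ing.
private synchronized void nodeUpdate(RMNode nm) {
  FSSchedulerNode node = getFSSchedulerNode(nm.getNodeID());
  if (node == null) {
    LOG.error("Node not found when handling nodeUpdate: " + nm.getNodeID());
    return;
  }
  // ... continue with normal node-update processing ...
}
{code}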
[jira] [Updated] (YARN-2325) Need to check whether node is null in nodeUpdate for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated YARN-2325:
---
Attachment: YARN-2325.000.patch
[jira] [Commented] (YARN-2325) Need to check whether node is null in nodeUpdate for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068035#comment-14068035 ]

Hadoop QA commented on YARN-2325:
---
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12656795/YARN-2325.000.patch
against trunk revision.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4375//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4375//console

This message is automatically generated.
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068041#comment-14068041 ]

Jian Fang commented on YARN-796:
---
As Sandy pointed out, it seems the scope is becoming bigger and bigger. Take our use case as an example: we initially only needed to prevent application masters from being assigned to some nodes, such as spot instances in EC2. In our design, we only added the following parameters to yarn-site.xml and then modified the Hadoop code:
* yarn.label.enabled
* yarn.nodemanager.labels
* yarn.app.mapreduce.am.labels

This function works now. With the current proposal, I wonder how long it may take to finish.

I also doubt the assumption that an admin will configure labels for a cluster. Usually a cluster comes with hundreds or thousands of nodes; how could the admin possibly configure the labels manually? This type of work can easily be automated by a script or a Java process running on each node that writes the labels, such as OS, processor, and other parameters, to yarn-site.xml before the cluster is started. This is especially true for clusters in a cloud, because everything is automated there. The admin UI would only be used in some special cases that require human intervention.

One use case for dynamic labeling is that we can put a label on a node when we try to shrink a cluster, so that Hadoop will not assign tasks to that node any more, giving that node some grace time to be decommissioned. This is most likely to be implemented by a RESTful API call from a process that chooses a node to remove based on cluster metrics.
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068043#comment-14068043 ]

Allen Wittenauer commented on YARN-796:
---
I agree pretty much completely with everything Sandy said, especially on the centralized configuration. It actually makes configuration harder for heterogeneous node setups. One caveat:
{quote}
I'm also skeptical about the need for adding/removing labels dynamically. Do we have concrete use cases for this?
{quote}
If you have the nodemanager push the labels to the RM (especially if you can do this via a user-defined script or Java class...), you basically have to have dynamic labels for nodes. Use cases are pretty easy to hit if you label nodes based upon the software stack installed. A quick example for those not following:
# User writes software that depends upon a particular version of libfoo.so.2.
# Configuration management does an install of libfoo.so.2.
# The NodeManager label script picks up that it has both libfoo.so.1 and libfoo.so.2, and publishes that it now has libfoo1 and libfoo2. (Remember, this is C and not the screwed-up Java universe, so having two versions is completely legitimate.)
# The system can now do operations appropriate for either libfoo on that node.
# libfoo1 gets deprecated and removed from the system, again via configuration management.
# The label script picks up the change and removes libfoo1 from the label listing.
# The system acts appropriately and no longer does operations on the node based upon the libfoo1 label.

... and all without restarting or reconfiguring anything on the Hadoop side. If there is any sort of manual step required in configuring the nodes, short of the initial label script/class and other obviously user-provided bits, then we've failed.
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068048#comment-14068048 ]

Alejandro Abdelnur commented on YARN-796:
---
I agree with Sandy and Allen. That said, we currently don't do anything centralized on a per-NodeManager basis; if we want to do that, we should think about solving it in a more general way than just labels. And I would suggest doing that (if we decide to) in a different JIRA.
[jira] [Updated] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-2211:
---
Attachment: YARN-2211.5.1.patch

Same patch, with the release audit -1 fixed.

RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
---
Key: YARN-2211
URL: https://issues.apache.org/jira/browse/YARN-2211
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, YARN-2211.4.patch, YARN-2211.5.1.patch, YARN-2211.5.patch

After YARN-2208, the AMRMToken can be rolled over periodically. We need to save the related master keys and use them to recover the AMRMToken when RM restart/failover happens.
[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068069#comment-14068069 ]

Hadoop QA commented on YARN-2211:
---
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12656801/YARN-2211.5.1.patch
against trunk revision.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager, and hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
  org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers
  org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices
  org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4376//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4376//console

This message is automatically generated.
[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart
[ https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068074#comment-14068074 ]

Jian He commented on YARN-1372:
---
{code}
NM informs RM and holds on to the information (YARN-1336 should handle this as well)
RM informs AM
AM acks RM
RM acks NM
NM deletes the information
{code}
The approach looks reasonable to me. [~adhoot], wanna take a stab at this?

Ensure all completed containers are reported to the AMs across RM restart
---
Key: YARN-1372
URL: https://issues.apache.org/jira/browse/YARN-1372
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot

Currently the NM informs the RM about completed containers and then removes those containers from the RM notification list. The RM passes that completed-container information on to the AM, and the AM pulls this data. If the RM dies before the AM pulls this data, the AM may not be able to get this information again. To fix this, the NM should maintain a separate list of such completed-container notifications sent to the RM. After the AM has pulled the containers from the RM, the RM will inform the NM about it, and the NM can remove the completed container from the new list. Upon re-registering with the RM (after RM restart), the NM should send the entire list of completed containers to the RM, along with any other containers that completed while the RM was dead. This ensures that the RM can inform the AMs about all completed containers. Some container completions may be reported more than once, since the AM may have pulled the container but the RM may die before notifying the NM about the pull.
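A sketch of the NM-side bookkeeping the quoted handshake implies; all names here are hypothetical, not from a committed patch:
{code}
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

// Hypothetical NM-side tracking: completed containers stay in 'pendingAck'
// until the RM confirms the AM pulled them, so they survive RM restarts.
class CompletedContainerTracker {
  private final Map<ContainerId, ContainerStatus> pendingAck =
      new ConcurrentHashMap<ContainerId, ContainerStatus>();

  void onContainerCompleted(ContainerStatus status) {
    pendingAck.put(status.getContainerId(), status); // hold until RM acks
  }

  List<ContainerStatus> statusesForHeartbeat() {
    // Re-send everything unacked; on RM restart the full list goes out again.
    return new ArrayList<ContainerStatus>(pendingAck.values());
  }

  void onRMAck(Collection<ContainerId> ackedByRM) {
    // RM confirmed the AM pulled these; now it is safe to forget them.
    pendingAck.keySet().removeAll(ackedByRM);
  }
}
{code}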
[jira] [Updated] (YARN-2249) RM may receive container release request on AM resync before container is actually recovered
[ https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-2249:
---
Description: AM resync on RM restart will send outstanding container release requests back to the new RM. In the meantime, NMs report the container statuses back to the RM to recover the containers. If the RM receives a container release request before the container is actually recovered in the scheduler, the container won't be released and the release request will be lost. (was: AM resync on RM restart will send outstanding resource requests, container release list etc. back to the new RM. It is possible that RM receives the container release request before the container is actually recovered.)

RM may receive container release request on AM resync before container is actually recovered
---
Key: YARN-2249
URL: https://issues.apache.org/jira/browse/YARN-2249
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Jian He
Assignee: Jian He

AM resync on RM restart will send outstanding container release requests back to the new RM. In the meantime, NMs report the container statuses back to the RM to recover the containers. If the RM receives a container release request before the container is actually recovered in the scheduler, the container won't be released and the release request will be lost.
[jira] [Commented] (YARN-2249) RM may receive container release request on AM resync before container is actually recovered
[ https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068076#comment-14068076 ]

Jian He commented on YARN-2249:
---
One possible solution is to have the AM always send the whole set of pending release requests in every allocate call. A pending release is removed once the AM receives the completed status of the released container. Specifically, change AMRMClient to send pendingRelease instead of release in the allocate method.
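A rough sketch of that idea against the AMRMClient allocate flow; the name pendingRelease comes from the comment above, but the wiring (and the surrounding fields lastResponseId, askList, and rmClient) are assumptions, not the actual patch:
{code}
// Illustrative only: resend all pending releases on every allocate, and drop
// one only when the released container's completed status comes back.
private final Set<ContainerId> pendingRelease =
    Collections.newSetFromMap(new ConcurrentHashMap<ContainerId, Boolean>());

public void releaseAssignedContainer(ContainerId containerId) {
  pendingRelease.add(containerId); // remember until completion is confirmed
}

public AllocateResponse allocate(float progress)
    throws YarnException, IOException {
  AllocateRequest request = AllocateRequest.newInstance(
      lastResponseId, progress, askList,
      new ArrayList<ContainerId>(pendingRelease), // whole set, every heartbeat
      null);
  AllocateResponse response = rmClient.allocate(request);
  for (ContainerStatus status : response.getCompletedContainersStatuses()) {
    pendingRelease.remove(status.getContainerId()); // release confirmed
  }
  return response;
}
{code}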
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068116#comment-14068116 ]

Wangda Tan commented on YARN-796:
---
Really, thanks for all your comments above. On the concerns about centralized configuration that Sandy, Alejandro and Allen mentioned: my thinking is that node labels are more dynamic compared to the other existing options of the NM. An important use case we can see is that some customers want to mark a label on each node indicating which department/team the node belongs to; when a new team comes in and new machines are added, labels may need to be changed. It is also possible that the whole cluster is booked to run some huge batch job at 12am-2am, for example. So such labels will be changed frequently. If we only have distributed configuration on each node, it is a nightmare for admins to re-configure. I think we should have the same internal interface for distributed/centralized configuration, like what we've done for RMStateStore.

And as Jian Fang mentioned:
bq. doubt about the assumption for admin to configure labels for a cluster.
I think using a script to mark labels is a great way to save configuration work. But lots of other use cases need human intervention as well - good examples above from Allen and me.

Thanks,
Wangda
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068120#comment-14068120 ]

Alejandro Abdelnur commented on YARN-796:
---
Wangda, your use case is throwing overboard the work of the scheduler regarding matching nodes with data locality. You can solve it in a much better way using the scheduler queues configuration, which can be dynamically adjusted.
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068124#comment-14068124 ]

Wangda Tan commented on YARN-796:
---
Hi Alejandro, I totally understand that the use case I mentioned is antithetical to the design philosophy of YARN, which should be elastically sharing the resources of a multi-tenant environment. But hard partitioning has some important use cases, even if it is not strongly recommended - for example, in a performance-sensitive environment, a user may want to run HBase master/region-servers on a group of nodes and not want any other tasks running on those nodes even if they have free resources. Our current queue configuration cannot solve such a problem. Of course the user can create a separate YARN cluster in this case, but I think keeping such NMs under the same RM is easier to use and manage. Do you agree?

Thanks,
[jira] [Commented] (YARN-2323) FairShareComparator creates too many Resource objects
[ https://issues.apache.org/jira/browse/YARN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068143#comment-14068143 ]

Sandy Ryza commented on YARN-2323:
---
As it's a static final variable, ONE should be all caps. Otherwise, LGTM.

FairShareComparator creates too many Resource objects
---
Key: YARN-2323
URL: https://issues.apache.org/jira/browse/YARN-2323
Project: Hadoop YARN
Issue Type: Improvement
Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
Attachments: YARN-2323.patch

Each call of {{FairShareComparator}} creates a new Resource object, {{one}}:
{code}
Resource one = Resources.createResource(1);
{code}
At the volume of 1000 nodes and 1000 apps, the comparator will be called more than 10 million times per second, thus creating more than 10 million {{one}} objects, which is unnecessary. Since the object {{one}} is read-only and is never referenced outside of the comparator, we could make it static.
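A minimal sketch of the change being reviewed, with the comparison logic elided (the real comparator weighs shares and usage ratios):
{code}
// Sketch: hoist the constant Resource into a static final field so each
// compare() call allocates nothing.
private static class FairShareComparator
    implements Comparator<Schedulable>, Serializable {
  // Read-only and never escapes the comparator, so one shared instance is safe.
  private static final Resource ONE = Resources.createResource(1);

  @Override
  public int compare(Schedulable s1, Schedulable s2) {
    // ... use ONE wherever Resources.createResource(1) was allocated before ...
    return 0; // comparison logic elided
  }
}
{code}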
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068144#comment-14068144 ]

Alejandro Abdelnur commented on YARN-796:
---
Wangda, I'm afraid I'm lost with your last comment. I thought labels were to express desired node affinity based on a label, not to fence off nodes. I don't understand how you will achieve fencing off a node with a label unless you have a more complex annotation mechanism than just a label (i.e., "book this node only if label X is present"). Also, you would have to add ACLs to labels to avoid anybody simply asking for a label. Am I missing something?
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068145#comment-14068145 ]

Wangda Tan commented on YARN-796:
---
Alejandro, I think we've mentioned this in our design doc; you can check https://issues.apache.org/jira/secure/attachment/12654446/Node-labels-Requirements-Design-doc-V1.pdf, under top-level requirements - admin tools - "Security and access controls for managing Labels". Please let me know if you have any comments on it. Thanks :)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068147#comment-14068147 ]

Wangda Tan commented on YARN-1198:
---
I've just taken a look at all the sub-tasks of this JIRA, and I'm wondering if we should first define what the headroom is. Previously in YARN, including YARN-1198, the headroom was defined as the maximum resource an application can get. In YARN-2008, the headroom is defined as the available resource an application can get, because we already consider the used resource of sibling queues.

I wonder if we need to add a new field, like a guaranteed headroom of an application, that considers its absolute capacity (not maximum capacity), user limits, etc. We may want to keep both of them, because:
- The maximum resource is not always achievable, since the sum of the maximum resources of the leaf queues may exceed the cluster resource.
- With preemption, resource beyond the guaranteed resource will likely be preempted; it should be considered a temporary resource.

And with this, an AM can:
- Use the guaranteed headroom to allocate resources that will not be preempted.
- Use the maximum headroom to try to allocate resources beyond its guaranteed headroom.

In my humble opinion, the "available resource an application can get" doesn't make a lot of sense here, and may cause some backward-compatibility problems as well. In a dynamic cluster the number can change rapidly; it is possible that the cluster is filled by another application one second after the AM got the available headroom. Also, this field cannot solve the deadlock problem either: a malicious application can ask for much more resource than this, and a careless developer can totally ignore this field. The only valid solution in my head is putting such logic on the scheduler side and enforcing resource usage via the preemption policy. Any thoughts? [~jlowe], [~cwelch]

Thanks,
Wangda

Capacity Scheduler headroom calculation does not work as expected
---
Key: YARN-1198
URL: https://issues.apache.org/jira/browse/YARN-1198
Project: Hadoop YARN
Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
Attachments: YARN-1198.1.patch

Today, headroom calculation (for the app) takes place only when:
* A node is added to or removed from the cluster
* A new container is assigned to the application

However, there are potentially a lot of situations which are not considered in this calculation:
* If a container finishes, then the headroom for that application will change and should be notified to the AM accordingly.
* If a single user has submitted multiple applications (app1 and app2) to the same queue, then:
** If app1's container finishes, then not only app1's but also app2's AM should be notified about the change in headroom.
** Similarly, if a container is assigned to either application (app1/app2), then both AMs should be notified about their headroom.
** To simplify the whole communication process, it is ideal to keep the headroom per user per LeafQueue so that everyone gets the same picture (apps belonging to the same user and submitted to the same queue).
* If a new user submits an application to the queue, then all applications submitted by all users in that queue should be notified of the headroom change.
* Also, today the headroom is an absolute number (I think it should be normalized, but then this is not going to be backward compatible).
* Also, when an admin user refreshes the queue, the headroom has to be updated.

These are all potential bugs in the headroom calculations.
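A rough formulation of the two headroom notions distinguished in the comment above, using Hadoop's Resources helpers; every name here is an assumption for illustration, not committed code:
{code}
// Illustrative arithmetic only. 'guaranteed' derives from absolute capacity
// and user limits; 'maximum' from absolute max capacity. Callers would pass
// in the queue's configured amounts; none of these names are from a patch.
static Resource guaranteedHeadroom(ResourceCalculator calc, Resource cluster,
    Resource queueGuaranteed, Resource userLimit, Resource userConsumed) {
  // Resource the app can hold without risking preemption.
  return Resources.subtract(
      Resources.min(calc, cluster, queueGuaranteed, userLimit), userConsumed);
}

static Resource maximumHeadroom(ResourceCalculator calc, Resource cluster,
    Resource queueMaxCap, Resource userLimit, Resource userConsumed) {
  // Best-effort ceiling; may be unachievable if sibling queues are busy,
  // and anything beyond the guarantee may later be preempted.
  return Resources.subtract(
      Resources.min(calc, cluster, queueMaxCap, userLimit), userConsumed);
}
{code}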
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is a hierarchical queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068148#comment-14068148 ]

Wangda Tan commented on YARN-2008:
---
Hi [~cwelch], [~airbots], I've put my comment on YARN-1198 (https://issues.apache.org/jira/browse/YARN-1198?focusedCommentId=14068147&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14068147), because I think it is a general comment about headroom. Please share your ideas here.

Thanks,
Wangda
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068163#comment-14068163 ] Wangda Tan commented on YARN-796: - Hi [~sunilg],
bq. 2. Regarding reservations, how about introducing node-label reservations? The idea is: if an application is lacking resource on a node, it can reserve on that node as well as on the node-label. Then, when a suitable node update comes from another node in the same node-label, we can try allocating the container on the new node by unreserving from the old one.
I think this makes sense; we'd better support it. I will check how our current resource reservation/unreservation logic could support it, and will keep you posted.
bq. 3. My approach was more like having a centralized configuration, but later, if we want to add a new node to the cluster, it can start with a hardcoded label in its yarn-site. In your approach, we need to use the RESTful API or an admin command to bring this node under one label. Maybe the node could be set under a label at startup itself. Your thoughts?
A problem I can see with a mixed centralized/distributed configuration is that it will be hard to manage labels after an RM/NM restart: should we use the labels specified in the NM config or in our centralized config? I also replied to Jian Fang about this previously: https://issues.apache.org/jira/browse/YARN-796?focusedCommentId=14063316page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14063316. Maybe a workaround is to define that the centralized config always overwrites the distributed config. E.g., a user defines GPU in the NM config and an admin adds FPGA via the RESTful API; the RM will serialize both GPU and FPGA into a centralized storage system, and after an RM or NM restart, the RM will ignore the NM config whenever anything is defined on the RM side. But I still think it's better to avoid using both of them together. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture, etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
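The overwrite rule described above is simple to state in code. This is a minimal hypothetical sketch; the types, method, and store shape are illustrative and not part of YARN:
{code}
import java.util.Map;
import java.util.Set;

final class NodeLabelMerge {
  /**
   * Labels the RM has serialized centrally for a node win over whatever the
   * NodeManager reports from its own yarn-site after a restart.
   */
  static Set<String> effectiveLabels(String nodeId,
                                     Map<String, Set<String>> centralStore,
                                     Set<String> nmReported) {
    Set<String> central = centralStore.get(nodeId);
    // If anything is defined centrally for this node, ignore the NM config.
    return (central != null && !central.isEmpty()) ? central : nmReported;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> rmStore = Map.of("node1", Set.of("GPU", "FPGA"));
    // The NM reports only GPU from its local config; the central entry wins.
    System.out.println(effectiveLabels("node1", rmStore, Set.of("GPU")));
  }
}
{code}
In the GPU/FPGA example above, the central store holds both labels for the node, so the NM-reported set is ignored after either daemon restarts.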
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068169#comment-14068169 ] Allen Wittenauer commented on YARN-796: --- bq. An important use case we can see is that some customers want to mark a label on each node indicating which department/team the node belongs to; when a new team comes in and new machines are added, labels may need to be changed. You can solve this problem today by just running separate RMs. In practice, however, marking nodes for specific teams in queue systems doesn't work, because doing so assumes that the capacity never changes... i.e., that nodes never fail. That happens all the time, of course, which is why percentages make a lot more sense. If you absolutely want a fixed amount of capacity, you still wouldn't mark specific nodes: you'd say queue x gets y machines, with no specification of which nodes. bq. And also, it is possible that the whole cluster is booked to run some huge batch job at 12am-2am, for example. So such labels will be changed frequently. Well, no, they won't. They'll change exactly twice a day. But it doesn't matter: you can solve this problem today too, by setting up something that changes the queue ACLs at 12am and 2am via a cron job. bq. For example, a user may want to run HBase master/region servers on a group of nodes, and not want any other tasks running on those nodes even if they have free resources. Our current queue configuration cannot solve such a problem. ... except, you guessed it: this is a solved problem today too. You just need to make sure the requested container sizes consume the whole node. bq. If we only have distributed configuration on each node, it is a nightmare for admins to re-configure. Hi. My name is Allen and I'm an admin. Even if using labels for this type of scheduling were sane, it still wouldn't be a nightmare, because any competent admin would use configuration management to roll out changes to the nodes in a controlled manner. But more importantly: these use cases are *solved problems* and have been in YARN for a very long time. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture, etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2323) FairShareComparator creates too much Resource object
[ https://issues.apache.org/jira/browse/YARN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2323: -- Attachment: YARN-2323-2.patch Patch revised according to [~sandyr]'s comments. FairShareComparator creates too much Resource object Key: YARN-2323 URL: https://issues.apache.org/jira/browse/YARN-2323 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2323-2.patch, YARN-2323.patch Each call of {{FairShareComparator}} creates a new Resource object, {{one}}: {code} Resource one = Resources.createResource(1); {code} At the scale of 1,000 nodes and 1,000 apps, the comparator will be called more than 10 million times per second, creating more than 10 million {{one}} objects, which is unnecessary. Since {{one}} is read-only and never referenced outside the comparator, we could make it static. -- This message was sent by Atlassian JIRA (v6.2#6252)
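The fix is essentially a one-line hoist. Here is a simplified sketch of the shape of the change; the surrounding class and the comparison logic are elided, so treat the details as illustrative rather than the exact patch:
{code}
import java.util.Comparator;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class FairShareComparatorSketch implements Comparator<Object> {
  // Before: compare() allocated a fresh Resource on every call.
  // After: one shared, read-only instance for the whole JVM.
  private static final Resource ONE = Resources.createResource(1);

  @Override
  public int compare(Object s1, Object s2) {
    // ... the fair-share logic would use ONE here instead of calling
    // Resources.createResource(1) on every comparison ...
    return 0; // comparison logic elided
  }
}
{code}
At 10 million comparisons per second, this removes 10 million short-lived allocations per second and the corresponding GC pressure.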
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068184#comment-14068184 ] Wangda Tan commented on YARN-796: - bq. You can solve this problem today by just running separate RMs. I don't think that's good from a configuration standpoint: users would need to maintain several configuration folders on their nodes for job submission. bq. In practice, however, marking nodes for specific teams in queue systems doesn't work because doing so assumes that the capacity never changes... i.e. In a heterogeneous cluster, you may not be able to replace a failed node with a random node. E.g., only some nodes have GPUs, and those nodes are dedicated to the data-science team. A percentage of queue capacity doesn't make a lot of sense here. bq. ... except, you guessed it: this is a solved problem today too. You just need to make sure the container sizes that are requested consume the whole node. Assume an HBase master wants to run on a node that has 64G of memory and InfiniBand. You can ask for a 64G container, but it may well be allocated on a 128G node that doesn't have InfiniBand. Again, it's another heterogeneity issue. And asking for such a big container may take a great amount of time, waiting for resource reservation, etc. bq. it still wouldn't be a nightmare because any competent admin would use configuration management to roll out changes to the nodes in a controlled manner. Very likely not every admin has scripts like yours, especially new YARN users; we'd better make this feature usable out of the box. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture, etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2323) FairShareComparator creates too much Resource object
[ https://issues.apache.org/jira/browse/YARN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068207#comment-14068207 ] Hadoop QA commented on YARN-2323: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656814/YARN-2323-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4377//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4377//console This message is automatically generated. FairShareComparator creates too much Resource object Key: YARN-2323 URL: https://issues.apache.org/jira/browse/YARN-2323 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2323-2.patch, YARN-2323.patch Each call of {{FairShareComparator}} creates a new Resource object, {{one}}: {code} Resource one = Resources.createResource(1); {code} At the scale of 1,000 nodes and 1,000 apps, the comparator will be called more than 10 million times per second, creating more than 10 million {{one}} objects, which is unnecessary. Since {{one}} is read-only and never referenced outside the comparator, we could make it static. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2323) FairShareComparator creates too many Resource objects
[ https://issues.apache.org/jira/browse/YARN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-2323: - Summary: FairShareComparator creates too many Resource objects (was: FairShareComparator creates too much Resource object) FairShareComparator creates too many Resource objects - Key: YARN-2323 URL: https://issues.apache.org/jira/browse/YARN-2323 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2323-2.patch, YARN-2323.patch Each call of {{FairShareComparator}} creates a new Resource object, {{one}}: {code} Resource one = Resources.createResource(1); {code} At the scale of 1,000 nodes and 1,000 apps, the comparator will be called more than 10 million times per second, creating more than 10 million {{one}} objects, which is unnecessary. Since {{one}} is read-only and never referenced outside the comparator, we could make it static. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2323) FairShareComparator creates too many Resource objects
[ https://issues.apache.org/jira/browse/YARN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068210#comment-14068210 ] Hudson commented on YARN-2323: -- FAILURE: Integrated in Hadoop-trunk-Commit #5921 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5921/]) YARN-2323. FairShareComparator creates too many Resource objects (Hong Zhiguo via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1612187) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java FairShareComparator creates too many Resource objects - Key: YARN-2323 URL: https://issues.apache.org/jira/browse/YARN-2323 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Fix For: 2.6.0 Attachments: YARN-2323-2.patch, YARN-2323.patch Each call of {{FairShareComparator}} creates a new Resource object, {{one}}: {code} Resource one = Resources.createResource(1); {code} At the scale of 1,000 nodes and 1,000 apps, the comparator will be called more than 10 million times per second, creating more than 10 million {{one}} objects, which is unnecessary. Since {{one}} is read-only and never referenced outside the comparator, we could make it static. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2033: -- Attachment: YARN-2033_ALL.2.patch Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try to keep most of the client-side interfaces as close to what we have today as possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2033: -- Attachment: YARN-2033.2.patch The TestFSDownload failure is not related; uploading a patch to fix the remaining issues. Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try to keep most of the client-side interfaces as close to what we have today as possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2284) Find missing config options in YarnConfiguration and yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-2284: - Attachment: YARN2284-02.patch Second attempt:
- Make minimal changes to Configuration so it can read in XML files and retain keys with no values, for comparison.
- Pull common functions out into a utility class.
- Create two unit tests: one for YarnConfiguration/yarn-default.xml and another for MRJobConfig/mapred-default.xml.
The unit tests generate output in surefire-reports that compares the Configuration class against the XML file and reports which keys exist in one but not the other. Find missing config options in YarnConfiguration and yarn-default.xml - Key: YARN-2284 URL: https://issues.apache.org/jira/browse/YARN-2284 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.4.1 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Attachments: YARN2284-01.patch, YARN2284-02.patch YarnConfiguration has one set of properties. yarn-default.xml has another set of properties. Ideally, there should be an automatic way to find missing properties in either location. This is analogous to MAPREDUCE-5130, but for yarn-default.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
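The core of the comparison can be sketched in a few lines. This is a hypothetical standalone version, not the patch's actual utility code: the class name, the "yarn." prefix heuristic, and the reflection filter are assumptions, and it does not handle the keys-with-no-values case that the patch changes Configuration to retain:
{code}
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class MissingConfigKeys {
  public static void main(String[] args) throws Exception {
    // Keys defined in yarn-default.xml. Configuration is iterable over its
    // key/value entries; keys without values would need the patch's tweak.
    Configuration conf = new Configuration(false);
    conf.addResource("yarn-default.xml");
    Set<String> xmlKeys = new HashSet<>();
    for (Map.Entry<String, String> e : conf) {
      xmlKeys.add(e.getKey());
    }

    // Key names held by public static String constants in YarnConfiguration.
    Set<String> classKeys = new HashSet<>();
    for (Field f : YarnConfiguration.class.getFields()) {
      if (Modifier.isStatic(f.getModifiers()) && f.getType() == String.class) {
        String v = (String) f.get(null);
        if (v != null && v.startsWith("yarn.")) {
          classKeys.add(v);
        }
      }
    }

    // Report the set differences in both directions.
    Set<String> onlyInXml = new HashSet<>(xmlKeys);
    onlyInXml.removeAll(classKeys);
    Set<String> onlyInClass = new HashSet<>(classKeys);
    onlyInClass.removeAll(xmlKeys);
    System.out.println("In yarn-default.xml only: " + onlyInXml);
    System.out.println("In YarnConfiguration only: " + onlyInClass);
  }
}
{code}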
[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068217#comment-14068217 ] Hadoop QA commented on YARN-2033: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656819/YARN-2033_ALL.2.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4378//console This message is automatically generated. Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try to keep most of the client-side interfaces as close to what we have today as possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2284) Find missing config options in YarnConfiguration and yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068218#comment-14068218 ] Hadoop QA commented on YARN-2284: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656820/YARN2284-02.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4379//console This message is automatically generated. Find missing config options in YarnConfiguration and yarn-default.xml - Key: YARN-2284 URL: https://issues.apache.org/jira/browse/YARN-2284 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.4.1 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Attachments: YARN2284-01.patch, YARN2284-02.patch YarnConfiguration has one set of properties. yarn-default.xml has another set of properties. Ideally, there should be an automatic way to find missing properties in either location. This is analogous to MAPREDUCE-5130, but for yarn-default.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068230#comment-14068230 ] Allen Wittenauer commented on YARN-796: --- Then let me be more blunt about it: I'm -1 on this patch if I can't do dynamic labels from the node manager via a script. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture, etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)