[jira] [Commented] (YARN-3296) yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private
[ https://issues.apache.org/jira/browse/YARN-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352835#comment-14352835 ] Hudson commented on YARN-3296: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #127 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/127/]) YARN-3296. Mark ResourceCalculatorProcessTree class as Public for configurable resource monitoring. Contributed by Hitesh Shah (junping_du: rev 7ce3c7635392c32f0504191ddd8417fb20509caa) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private --- Key: YARN-3296 URL: https://issues.apache.org/jira/browse/YARN-3296 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Fix For: 2.7.0 Attachments: YARN-3296.1.patch, YARN-3296.2.patch Given that someone can implement their custom plugin for resource monitoring and configure the NM to use it, this class should be marked public. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352944#comment-14352944 ] Junping Du commented on YARN-3225: -- Thanks [~devaraj.k] for delivering the patch, which is the first one in the graceful decommission effort! A couple of comments: In RefreshNodesRequestPBImpl.java,
{code}
+  @Override
+  public long getTimeout() {
+    return getProto().getTimeout();
+  }
+
+  @Override
+  public void setTimeout(long timeout) {
+    builder.setTimeout(timeout);
+  }
{code}
The setTimeout() has a problem: because we didn't set viaProto to false, a subsequent getTimeout() will return the old value from the old proto. I suggest adding a maybeInitBuilder() method just like the other PBImpls, and also adding a unit test to verify the PBImpl works as expected. In NodeState.java,
{code}
DECOMMISSION_IN_PROGRESS
{code}
[~jlowe] suggested in the umbrella JIRA that it is better to name this DECOMMISSIONING. I had the same feeling, so I reflected that name in the latest proposal. Do you think we should incorporate that comment here? In RMAdminCLI.java,
{code}
+ .put("-refreshNodes", new UsageInfo("[-g [timeout in ms]]", "Refresh the hosts information at the ResourceManager."))
{code}
I think we should add more info to the description message "Refresh the hosts information at the ResourceManager." to explain what the -g option does. Also, per my suggestion above, it is better to specify the timeout in seconds. Milliseconds are more precise, but leave more room for (manual) operation mistakes. Also, it is better to name the patch consistently with the JIRA number (YARN-3225). New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-914.patch A new CLI (or an existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
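To illustrate the fix being requested here, a minimal sketch of the usual PBImpl pattern follows; the member names (proto, builder, viaProto) follow the common PBImpl convention and are assumed rather than copied from the patch:
{code}
private void maybeInitBuilder() {
  if (viaProto || builder == null) {
    // rebuild the builder from the current proto before mutating it
    builder = RefreshNodesRequestProto.newBuilder(proto);
  }
  // from now on, getters must read from the builder, not the stale proto
  viaProto = false;
}

@Override
public void setTimeout(long timeout) {
  maybeInitBuilder(); // without this, getTimeout() keeps returning the old proto value
  builder.setTimeout(timeout);
}
{code}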
[jira] [Commented] (YARN-3296) yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private
[ https://issues.apache.org/jira/browse/YARN-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352857#comment-14352857 ] Hudson commented on YARN-3296: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #861 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/861/]) YARN-3296. Mark ResourceCalculatorProcessTree class as Public for configurable resource monitoring. Contributed by Hitesh Shah (junping_du: rev 7ce3c7635392c32f0504191ddd8417fb20509caa) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java * hadoop-yarn-project/CHANGES.txt yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private --- Key: YARN-3296 URL: https://issues.apache.org/jira/browse/YARN-3296 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Fix For: 2.7.0 Attachments: YARN-3296.1.patch, YARN-3296.2.patch Given that someone can implement their custom plugin for resource monitoring and configure the NM to use it, this class should be marked public. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-3225: Attachment: YARN-3225.patch New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225.patch, YARN-914.patch A new CLI (or an existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352962#comment-14352962 ] Junping Du commented on YARN-3304: -- Agreed that a negative value sounds very odd. However, if we really failed to get CPU usage info, wouldn't a value of 0 confuse the user into thinking the metric works fine and CPU usage is simply very low? ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters Key: YARN-3304 URL: https://issues.apache.org/jira/browse/YARN-3304 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Karthik Kambatla Priority: Blocker Per discussions in YARN-3296, getCpuUsagePercent() will return -1 in the unavailable case while other resource metrics return 0 in the same case, which sounds inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3296) yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private
[ https://issues.apache.org/jira/browse/YARN-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352863#comment-14352863 ] Hudson commented on YARN-3296: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2059 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2059/]) YARN-3296. Mark ResourceCalculatorProcessTree class as Public for configurable resource monitoring. Contributed by Hitesh Shah (junping_du: rev 7ce3c7635392c32f0504191ddd8417fb20509caa) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private --- Key: YARN-3296 URL: https://issues.apache.org/jira/browse/YARN-3296 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Fix For: 2.7.0 Attachments: YARN-3296.1.patch, YARN-3296.2.patch Given that someone can implement their custom plugin for resource monitoring and configure the NM to use it, this class should be marked public. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3296) yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private
[ https://issues.apache.org/jira/browse/YARN-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353071#comment-14353071 ] Hudson commented on YARN-3296: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #118 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/118/]) YARN-3296. Mark ResourceCalculatorProcessTree class as Public for configurable resource monitoring. Contributed by Hitesh Shah (junping_du: rev 7ce3c7635392c32f0504191ddd8417fb20509caa) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private --- Key: YARN-3296 URL: https://issues.apache.org/jira/browse/YARN-3296 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Fix For: 2.7.0 Attachments: YARN-3296.1.patch, YARN-3296.2.patch Given that someone can implement their custom plugin for resource monitoring and configure the NM to use it, this class should be marked public. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2495: Attachment: YARN-2495.20150309-1.patch Hi [~wangda], attaching the updated patch; here is the status of the comments:
1,2,3,4) All have been rectified. A method name with "set" at both the beginning and the end did not sound appropriate, hence I modified it to areNodeLabelsSetInReq and setAreNodeLabelsSetInReq in both heartbeat and register.
5) ??I think we may not need to check centralized/distributed configuration here, centralized/distributed is a config in RM side.?? Earlier I had moved this configuration-type check into NM.serviceInit before calling {{getNodeLabelsProviderService}}, but now I have changed the method name to {{createNodeLabelsProviderService}} and moved the check back inside this method itself. As part of YARN-2729, the script-based node label provider will be returned from the createNodeLabelsProviderService method. ??In NM side, it should be how to get node labels, if user doesn't configure any script file for it, it should be null and no instance of NodeLabelProviderService will be added to NM.?? In the current patch, null will be set only if the configuration type is set as centralized in the NM. Based on earlier feedback from Vinod (in another JIRA), I think we need to fail fast and let the user know about the error at the earliest, so the script node label provider will throw an exception on erroneous conditions (script not configured, no rights to execute, etc.) and ensure the NM fails to start. ??So back to code, you can just leave getNodeLabelsProviderService(..), which will be implemented in YARN-2729. If you agree, we need change the name isDistributedNodeLabelsConf to?? Actually I didn't get the intent of these 2 lines and felt the comment was incomplete... Do you want to avoid the configuration-type check in the NM and move it to the script node label provider, or something else?
6) Has been rectified; it was added while analyzing a test case failure.
7) ??isDistributedNodeLabels seems not so necessary here, and if you agree with 5), it's better to remove the field?? IIUC point 5 was related to the NM initializing the provider and point 7 is related to NodeStatusUpdaterImpl; if so, I didn't get the relation. Can you please clarify these 2 points?
8) ??Add null check or comment (provider returned node labels will always be not-null, for areNodeLabelsUpdated in NodeStatusUpdaterImpl?? Before calling areNodeLabelsUpdated, I had already checked for null and set empty labels at line 626 (startStatusUpdater method).
9) ??Since we already have TestNodeStatusUpdater, it's better to merge TestNodeStatusUpdaterForLabels to it.?? There were already too many internal classes extending NodeStatusUpdaterImpl and ResourceTrackerService, and I personally found it very difficult to walk through the test class and reuse it; it had already crossed 1666 lines of code and was losing readability, so I added a new class. Please let me know if required and I will merge it into the existing class.
10) I have modified ResourceTrackerService based on your comments and pushed some common code in register and heartbeat into a common method.
None of the findbugs issues are related to my modifications, and the following test case failure is also unrelated: TestRMRestart.testRMRestartGetApplicationList.
I will also be uploading a patch for YARN-2729 to give a view of the complete flow and make it testable. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2729: Attachment: YARN-2729.20150309-1.patch Hi [~wangda], rebasing the patch: removed the dependency on YARN-2923, changed the configuration names to suit the current conf suggestions, and made it fail fast on invalid configurations. If the above modifications are fine, I will start looking into the changes required to make Hadoop Common's NodeHealthScriptRunner reusable (in a separate JIRA). Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup --- Key: YARN-2729 URL: https://issues.apache.org/jira/browse/YARN-2729 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch Support a script-based NodeLabelsProvider interface in the distributed node label configuration setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353053#comment-14353053 ] Nathan Roberts commented on YARN-3298: -- Thanks [~leftnoteasy] for the additional detail. Maybe I should just wait for the patch, but here's the case I'm worried about: queue.used is just under queue.capacity, so current-capacity = queue.capacity. There are two users in the queue, both with the same used resources, so user-limit will be slightly less than (queue-capacity/2), and user-limit can therefore be extremely close to user.usage. user.usage + required might now be slightly greater than user-limit. If that happens, it seems like we'll be unable to cross the capacity threshold. Once above capacity, I think it will work, but crossing that threshold might be hard. Seems like current-capacity should be calculated as:
{code}
current-capacity = max(queue.used, queue.capacity) + now-required;
{code}
User-limit should be enforced in CapacityScheduler -- Key: YARN-3298 URL: https://issues.apache.org/jira/browse/YARN-3298 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, yarn Reporter: Wangda Tan Assignee: Wangda Tan User-limit is not treated as a hard limit for now: it does not consider the required resource (the resource of the being-allocated resource request), and when a user's used resource equals the user-limit, allocation will still continue. This will generate jitter issues when we have YARN-2069 (the preemption policy kills a container under a user, and the scheduler allocates a container under the same user soon after). The expected behavior should be the same as for the queue's capacity: only when user.usage + required <= user-limit (1) will the queue continue to allocate containers. (1): the user-limit mentioned here is determined by the following computation
{code}
current-capacity = queue.used + now-required  (when queue.used > queue.capacity)
                   queue.capacity             (when queue.used <= queue.capacity)
user-limit = min(max(current-capacity / #active-users,
                     current-capacity * user-limit / 100),
                 queue-capacity * user-limit-factor)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
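To make the threshold concern above concrete, here is a hypothetical worked example; all numbers are assumed (minimum-user-limit-percent = 50, user-limit-factor = 1, two active users):
{code}
queue.capacity = 100, queue.used = 99, user1.usage = user2.usage = 49.5, required = 1

current-capacity = queue.capacity = 100              (queue.used <= queue.capacity)
user-limit = min(max(100 / 2, 100 * 50 / 100), 100 * 1) = 50
user.usage + required = 50.5 > 50                    => allocation blocked just below capacity

with current-capacity = max(queue.used, queue.capacity) + now-required = 101:
user-limit = min(max(101 / 2, 101 * 50 / 100), 100 * 1) = 50.5
user.usage + required = 50.5 <= 50.5                 => allocation can proceed
{code}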
[jira] [Commented] (YARN-3296) yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private
[ https://issues.apache.org/jira/browse/YARN-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353127#comment-14353127 ] Hudson commented on YARN-3296: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2077 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2077/]) YARN-3296. Mark ResourceCalculatorProcessTree class as Public for configurable resource monitoring. Contributed by Hitesh Shah (junping_du: rev 7ce3c7635392c32f0504191ddd8417fb20509caa) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private --- Key: YARN-3296 URL: https://issues.apache.org/jira/browse/YARN-3296 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Fix For: 2.7.0 Attachments: YARN-3296.1.patch, YARN-3296.2.patch Given that someone can implement their custom plugin for resource monitoring and configure the NM to use it, this class should be marked public. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3296) yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private
[ https://issues.apache.org/jira/browse/YARN-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353095#comment-14353095 ] Hudson commented on YARN-3296: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #127 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/127/]) YARN-3296. Mark ResourceCalculatorProcessTree class as Public for configurable resource monitoring. Contributed by Hitesh Shah (junping_du: rev 7ce3c7635392c32f0504191ddd8417fb20509caa) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private --- Key: YARN-3296 URL: https://issues.apache.org/jira/browse/YARN-3296 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Fix For: 2.7.0 Attachments: YARN-3296.1.patch, YARN-3296.2.patch Given that someone can implement their custom plugin for resource monitoring and configure the NM to use it, this class should be marked public. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3136: -- Attachment: 0006-YARN-3136.patch getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353143#comment-14353143 ] Sunil G commented on YARN-3136: --- bq. createReleaseCache schedules a timer task that Sorry, I also missed that. Agreed on making 'applications' a concurrent map. As it's private and unstable, it's fine to make it concurrent, but any scheduler which uses it will have to change. Do we need to document that? Also attaching a patch for the same. getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
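For context, a minimal sketch of the change being agreed on here: swapping the schedulers' shared map for a concurrent one so that getTransferredContainers() can read it without taking the scheduler lock. The field and type names mirror the common scheduler base class but are assumed:
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// was: protected Map<ApplicationId, SchedulerApplication<T>> applications;
protected ConcurrentMap<ApplicationId, SchedulerApplication<T>> applications =
    new ConcurrentHashMap<ApplicationId, SchedulerApplication<T>>();

// Readers such as getTransferredContainers() can then do a lock-free lookup:
SchedulerApplication<T> app = applications.get(appAttemptId.getApplicationId());
{code}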
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353241#comment-14353241 ] Naganarasimha G R commented on YARN-2495: - Most of the findbugs warnings reported are from the Fair scheduler and have nothing to do with the changes in this patch, and the failed tests are due to timeouts and are also not related to the patch. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3306) [Umbrella] Proposing per-queue Policy driven scheduling in YARN
Vinod Kumar Vavilapalli created YARN-3306: - Summary: [Umbrella] Proposing per-queue Policy driven scheduling in YARN Key: YARN-3306 URL: https://issues.apache.org/jira/browse/YARN-3306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Scheduling layout in Apache Hadoop YARN today is very coarse grained. This proposal aims at converting today’s rigid scheduling in YARN to a per-queue policy driven architecture. We propose the creation of a common policy framework and the implementation of a common set of policies that administrators can pick and choose per queue - Make scheduling policies configurable per queue - Initially, we limit ourselves to a new type of scheduling policy that determines the ordering of applications within the leaf queue - In the near future, we will also pursue parent-queue-level policies and potential algorithm reuse through a separate type of policies that control resource limits per queue, user, application etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353277#comment-14353277 ] Zhijie Shen commented on YARN-3287: --- Sure, I'll take a look again. TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Attachments: YARN-3287.1.patch, YARN-3287.2.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353259#comment-14353259 ] Jonathan Eagles commented on YARN-3287: --- [~zjshen], can you have another look now that I have up-merged and added sufficient tests for this change? TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Attachments: YARN-3287.1.patch, YARN-3287.2.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353217#comment-14353217 ] Hadoop QA commented on YARN-2495: -
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703435/YARN-2495.20150309-1.patch against trunk revision 5578e22.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate org.apache.hadoop.yarn.server.resourcemanager.TestRM
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6894//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6894//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6894//console
This message is automatically generated. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3306) [Umbrella] Proposing per-queue Policy driven scheduling in YARN
[ https://issues.apache.org/jira/browse/YARN-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3306: -- Attachment: PerQueuePolicydrivenschedulinginYARN.pdf Here's a detailed proposal doc. It's light on details of the leaf-queue policy interface - we will fill those in in one of the sub-tasks. [~cwelch] is helping with most of the implementation, thanks Craig. [Umbrella] Proposing per-queue Policy driven scheduling in YARN --- Key: YARN-3306 URL: https://issues.apache.org/jira/browse/YARN-3306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: PerQueuePolicydrivenschedulinginYARN.pdf Scheduling layout in Apache Hadoop YARN today is very coarse grained. This proposal aims at converting today’s rigid scheduling in YARN to a per-queue policy driven architecture. We propose the creation of a common policy framework and the implementation of a common set of policies that administrators can pick and choose per queue - Make scheduling policies configurable per queue - Initially, we limit ourselves to a new type of scheduling policy that determines the ordering of applications within the leaf queue - In the near future, we will also pursue parent-queue-level policies and potential algorithm reuse through a separate type of policies that control resource limits per queue, user, application etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353272#comment-14353272 ] Anubhav Dhoot commented on YARN-3304: - The intention of setting -1 was exactly this issue (distinguishing unavailable from actually zero). Ideally we should prevent adding the metrics to the collection until they are available. One possibility is doing it at ContainerMetrics#recordCpuUsage. I suggest investigating whether this ideal case is achievable; if not, I am fine with making these 0 to be consistent. ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters Key: YARN-3304 URL: https://issues.apache.org/jira/browse/YARN-3304 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Karthik Kambatla Priority: Blocker Per discussions in YARN-3296, getCpuUsagePercent() will return -1 in the unavailable case while other resource metrics return 0 in the same case, which sounds inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
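If the "don't record until available" route is taken, a minimal sketch of what the guard could look like follows; the method shape mirrors the ContainerMetrics#recordCpuUsage mentioned above, but the body is an assumption, not the committed fix:
{code}
public void recordCpuUsage(int totalPhysicalCpuPercent, int milliVcoresUsed) {
  // Only publish once the process tree reports a real value; the -1
  // "unavailable" sentinel is filtered out and never reaches the metrics
  // system, so consumers cannot mistake it for genuinely low usage.
  if (totalPhysicalCpuPercent >= 0) {
    this.cpuCoreUsagePercent.incr(totalPhysicalCpuPercent);
  }
  if (milliVcoresUsed >= 0) {
    this.milliVcoresUsed.incr(milliVcoresUsed);
  }
}
{code}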
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353232#comment-14353232 ] Hadoop QA commented on YARN-3136: -
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703438/0006-YARN-3136.patch against trunk revision 5578e22.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 7 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6895//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6895//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6895//console
This message is automatically generated. getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353597#comment-14353597 ] Hudson commented on YARN-3287: -- FAILURE: Integrated in Hadoop-trunk-Commit #7291 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7291/]) YARN-3287. Made TimelineClient put methods do as the correct login context. Contributed by Daryn Sharp and Jonathan Eagles. (zjshen: rev d6e05c5ee26feefc17267b7c9db1e2a3dbdef117) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/security/TestTimelineAuthenticationFilter.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Fix For: 2.7.0 Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3315) Fix -list-blacklisted-trackers to print the blacklisted NMs
[ https://issues.apache.org/jira/browse/YARN-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3305 to YARN-3315: --- Component/s: (was: mrv2) Affects Version/s: (was: 0.23.0) Key: YARN-3315 (was: MAPREDUCE-3305) Project: Hadoop YARN (was: Hadoop Map/Reduce) Fix -list-blacklisted-trackers to print the blacklisted NMs --- Key: YARN-3315 URL: https://issues.apache.org/jira/browse/YARN-3315 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil bin/mapred job -list-blacklisted-trackers currently prints "getBlacklistedTrackers - Not implemented yet". This is a long-pending issue. Could not find a tracking ticket, hence opening one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3311) add location to web UI so you know where you are - cluster, node, AM, job history
[ https://issues.apache.org/jira/browse/YARN-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3074 to YARN-3311: --- Component/s: (was: mrv2) Affects Version/s: (was: 3.0.0) (was: 0.23.0) 3.0.0 Key: YARN-3311 (was: MAPREDUCE-3074) Project: Hadoop YARN (was: Hadoop Map/Reduce) add location to web UI so you know where you are - cluster, node, AM, job history - Key: YARN-3311 URL: https://issues.apache.org/jira/browse/YARN-3311 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Thomas Graves Right now if you go to any of the web UIs for the resource manager, node manager, app master, or job history, they look very similar but sometimes it is hard to tell which page you are on. Adding a title or something that lets you know would be helpful. Or somehow make them more seamless so one doesn't have to know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353704#comment-14353704 ] Jian He commented on YARN-3243: --- thanks Wangda ! - ParentQueue#canAssignToThisQueue,
{code}
if (totalUsedCapacityRatio >= maxAvailCapacity) {
  canAssign = false;
  break;
}
{code}
instead of comparing ratios, I think it might be simpler to compare resource values. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocated a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example:
{code}
        A (usage=54, max=55)
       /                    \
A1 (usage=1, max=55)   A2 (usage=53, max=53)
{code}
Queue A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue; *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means *the maximum resource its children can allocate*. - ParentQueue will set its children's headroom to be (say the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep within their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
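A rough sketch of what comparing resource values (rather than ratios) could look like in ParentQueue#canAssignToThisQueue, using the existing Resources helpers; the variable names are assumed:
{code}
// Convert the available-capacity ratio into an absolute Resource once,
// then compare Resources directly instead of floating-point ratios.
Resource maxAvailResource = Resources.multiply(clusterResource, maxAvailCapacity);
if (!Resources.lessThan(resourceCalculator, clusterResource,
    queueUsage.getUsed(), maxAvailResource)) {
  canAssign = false;
  break;
}
{code}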
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353353#comment-14353353 ] Wangda Tan commented on YARN-2495: -- For your comments:
1) For the name, do you think setAreNodeLabelsUpdated is a better name, since it avoids "set" occurring twice :) (I understand this needs lots of refactoring; if you have any suggestions, we can finalize the name before renaming.)
5) I made a mistake and sent an incomplete comment :-p. What I wanted to say is: it will be problematic to ask admins to keep NM/RM configuration synchronized, so I don't want (and it is also not necessary for) the NM to depend on the RM's configuration. So I suggest these changes:
- In NodeManager.java: when the user doesn't configure a provider, it should be null. In your patch, you can return null directly, and YARN-2729 will implement the logic of instantiating the provider from config.
- In NodeStatusUpdaterImpl: avoid using {{isDistributedNodeLabelsConf}}; if you agree with the previous point, we will not have a distributed node label conf on the NM side, and it will check whether the provider is null instead.
Regarding your fail-fast concern, it shouldn't be a problem if you agree with the comment I just made. (I know there has been some back-and-forth from my side on this; I feel sorry about that since this feature is evolving, so please feel free to let me know your ideas.)
7) My answer to 5) should address your question.
8) You can add an additional comment at line 626 for this.
9) Took a look at TestNodeStatusUpdater; your comment makes sense to me, it's a very complex class, so you can just leave this comment alone.
10) A few comments on your added code:
- updateNodeLabelsInNodeLabelsManager -> updateNodeLabelsFromNMReport
- {{LOG.info(... accepted from RM}}: use LOG.debug and check {{isDebugEnabled}}.
- Make errorMessage clear: indicate 1# these are node labels reported from the NM, and 2# they failed to be put to the RM, rather than being not properly configured.
In addition: another thing we should do is, when distributed node label configuration is set, any direct modification of node-to-labels mappings from RMAdminCLI should be rejected (like -replaceNodeToLabels). This can be done in a separate JIRA. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
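To sketch the NM-side wiring described in 5): instantiate a provider only when one is configured, otherwise return null so NodeStatusUpdater simply skips label reporting. The config key and provider class below are assumed placeholders; YARN-2729 supplies the real script-based provider:
{code}
protected NodeLabelsProvider createNodeLabelsProvider(Configuration conf)
    throws IOException {
  // Assumed config key; nothing configured => no provider, NM reports no labels.
  String script = conf.get("yarn.nodemanager.node-labels.provider.script.path");
  if (script == null) {
    return null;
  }
  // Fail fast on bad configuration (missing script, no execute permission, ...)
  // so the NM refuses to start rather than silently reporting wrong labels.
  return new ScriptBasedNodeLabelsProvider(conf); // hypothetical class from YARN-2729
}
{code}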
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353553#comment-14353553 ] Wangda Tan commented on YARN-3298: -- Hi [~nroberts], if I understand what you meant correctly, maybe we can just relax the check to user.used < user-limit (instead of user.used + now_required <= user-limit), which can solve the problem you mentioned. User-limit should be enforced in CapacityScheduler -- Key: YARN-3298 URL: https://issues.apache.org/jira/browse/YARN-3298 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, yarn Reporter: Wangda Tan Assignee: Wangda Tan User-limit is not treated as a hard limit for now: it does not consider the required resource (the resource of the being-allocated resource request), and when a user's used resource equals the user-limit, allocation will still continue. This will generate jitter issues when we have YARN-2069 (the preemption policy kills a container under a user, and the scheduler allocates a container under the same user soon after). The expected behavior should be the same as for the queue's capacity: only when user.usage + required <= user-limit (1) will the queue continue to allocate containers. (1): the user-limit mentioned here is determined by the following computation
{code}
current-capacity = queue.used + now-required  (when queue.used > queue.capacity)
                   queue.capacity             (when queue.used <= queue.capacity)
user-limit = min(max(current-capacity / #active-users,
                     current-capacity * user-limit / 100),
                 queue-capacity * user-limit-factor)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
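In pseudocode, the relaxation amounts to the following (a sketch of the idea, not the eventual patch):
{code}
// proposed hard limit: block unless the new request still fits entirely
canAssign = user.used + now_required <= user_limit;

// relaxed check: allow as long as the user is strictly below the limit
// beforehand; usage may overshoot by at most one container, which lets a
// user sitting exactly at the threshold keep making progress
canAssign = user.used < user_limit;
{code}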
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353558#comment-14353558 ] Zhijie Shen commented on YARN-1884: --- [~xgong], thanks for the patch. Here're some comments: 1. No need to change application_history_server.proto, ApplicationHistoryManagerImpl.java, FileSystemApplicationHistoryStore.java, MemoryApplicationHistoryStore.java, ContainerFinishData.java, ContainerHistoryData.java, ContainerStartData.java, ContainerFinishDataPBImpl.java, ContainerStartDataPBImpl.java, ApplicationHistoryStoreTestUtils.java, TestFileSystemApplicationHistoryStore.java, TestMemoryApplicationHistoryStore.java, RMApplicationHistoryWriter.java, TestRMApplicationHistoryWriter.java. It's the deprecated code. 2. Why do we need conf here to compute http or https? Doesn't getNodeHttpAddress() come with the prefix? If so, we need to fix it in the other blocks, CLI and web service too, for consistency. For example, when generating the report, we should already append the http prefix.
{code}
114     container.getNodeHttpAddress() == null ? "#" : WebAppUtils
115         .getHttpSchemePrefix(conf) + container.getNodeHttpAddress(),
{code}
3. Is it possible that getContainer() returns null? If so, it will result in an NPE. Another way is to make getNodeHttpAddress a method of RMContainer. See how we do it for getContainerExitStatus and so on.
{code}
createdTime, container.getContainer().getNodeHttpAddress()));
{code}
ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Attachments: YARN-1884.1.patch In the web UI, we're going to show the node, which used to link to the NM web page. However, on the AHS web UI, and the RM web UI after YARN-1809, the node field has to be set to the nodeID where the container is allocated. We need to add nodeHttpAddress to the ContainerReport to link users to the NM web page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3308) Improvements to CapacityScheduler documentation
[ https://issues.apache.org/jira/browse/YARN-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3308: --- Attachment: YARN-3308-02.patch 02: * rebased for trunk * took in Arun's comments Improvements to CapacityScheduler documentation --- Key: YARN-3308 URL: https://issues.apache.org/jira/browse/YARN-3308 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 3.0.0 Reporter: Yoram Arnon Priority: Minor Labels: documentation Attachments: MAPREDUCE-3658, MAPREDUCE-3658, YARN-3308-02.patch Original Estimate: 3h Remaining Estimate: 3h There are some typos and some cases of incorrect English. Also, the descriptions of yarn.scheduler.capacity.<queue-path>.capacity, yarn.scheduler.capacity.<queue-path>.maximum-capacity, yarn.scheduler.capacity.<queue-path>.user-limit-factor, and yarn.scheduler.capacity.maximum-applications are not very clear to the uninitiated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3316) Make the ResourceManager, NodeManager and HistoryServer run from Eclipse.
[ https://issues.apache.org/jira/browse/YARN-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3316: --- Component/s: resourcemanager nodemanager Make the ResourceManager, NodeManager and HistoryServer run from Eclipse. - Key: YARN-3316 URL: https://issues.apache.org/jira/browse/YARN-3316 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 3.0.0 Reporter: praveen sripati Priority: Minor Make the ResourceManager, NodeManager and HistoryServer run from Eclipse, so that development is easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353574#comment-14353574 ] Zhijie Shen commented on YARN-3287: --- +1 for the last patch. Will commit it. TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353560#comment-14353560 ] Hadoop QA commented on YARN-3287: -
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703485/YARN-3287.3.patch against trunk revision 3241fc2.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6896//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6896//console
This message is automatically generated. TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3310) MR-279: Log info about the location of dist cache
[ https://issues.apache.org/jira/browse/YARN-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-2758 to YARN-3310: --- Component/s: (was: mrv2) Affects Version/s: (was: 0.23.0) Issue Type: Improvement (was: Bug) Key: YARN-3310 (was: MAPREDUCE-2758) Project: Hadoop YARN (was: Hadoop Map/Reduce) MR-279: Log info about the location of dist cache - Key: YARN-3310 URL: https://issues.apache.org/jira/browse/YARN-3310 Project: Hadoop YARN Issue Type: Improvement Reporter: Ramya Sunil Assignee: Siddharth Seth Priority: Minor Currently, there is no log info available about the actual location of the file/archive in the dist cache being used by the task, except for the ln command in task.sh. We need to log this information to help in debugging, especially in those cases where there is more than one archive with the same name. In 0.20.x, in the task logs, one could find log info such as the following: INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: <distcache location>/archive <- <mapred.local.dir>/archive -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3313) Write additional tests for data locality in MRv2.
[ https://issues.apache.org/jira/browse/YARN-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3093 to YARN-3313: --- Component/s: (was: mrv2) (was: test) test Assignee: (was: Mahadev konar) Affects Version/s: (was: 0.23.0) 3.0.0 Key: YARN-3313 (was: MAPREDUCE-3093) Project: Hadoop YARN (was: Hadoop Map/Reduce) Write additional tests for data locality in MRv2. - Key: YARN-3313 URL: https://issues.apache.org/jira/browse/YARN-3313 Project: Hadoop YARN Issue Type: Test Components: test Affects Versions: 3.0.0 Reporter: Mahadev konar We should add tests to make sure data locality is in place in MRv2 (with respect to the capacity scheduler and also the matching/ask of containers in the MR AM). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3312) Web UI menu inconsistencies
[ https://issues.apache.org/jira/browse/YARN-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3075 to YARN-3312: --- Component/s: (was: mrv2) Affects Version/s: (was: 0.23.0) 3.0.0 Key: YARN-3312 (was: MAPREDUCE-3075) Project: Hadoop YARN (was: Hadoop Map/Reduce) Web UI menu inconsistencies --- Key: YARN-3312 URL: https://issues.apache.org/jira/browse/YARN-3312 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Thomas Graves When you go to the various web UIs, the menus on the left are inconsistent and (at least to me) sometimes confusing. For instance, if you go to the application master UI, one of the menus is Cluster. If you click on one of the Cluster links it takes you back to the RM UI and you lose the app master UI altogether. Maybe it's just me, but that is confusing. I like having a link back to the cluster from the AM, but the way the UI is set up I would have expected it to just open that page in the middle div/frame and leave the AM menus there. Perhaps a different type of link or menu could indicate this is going to take you away from the AM page. Also, the nodes and job history UIs don't have the Cluster menus at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3309) Capacity scheduler can wait a very long time for node locality
Nathan Roberts created YARN-3309: Summary: Capacity scheduler can wait a very long time for node locality Key: YARN-3309 URL: https://issues.apache.org/jira/browse/YARN-3309 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Nathan Roberts The capacity scheduler will delay scheduling a container on a rack-local node in hopes that a node-local opportunity will come along (YARN-80). It does this by counting the number of missed scheduling opportunities the application has had. When the count reaches a certain threshold, the app will accept the rack-local node. The documented recommendation is to set this threshold to the #nodes in the cluster. However, there are some early-out optimizations that can lead to this delay being a very long time. Example in allocateContainersToNode():
{code}
// Try to schedule more if there are no reservations to fulfill
if (node.getReservedContainer() == null) {
  if (calculator.computeAvailableContainers(node.getAvailableResource(),
      minimumAllocation) > 0) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Trying to schedule on node: " + node.getNodeName() +
          ", available: " + node.getAvailableResource());
    }
    root.assignContainers(clusterResource, node, false);
  }
{code}
So, in a large cluster that is completely full (AvailableResource on each node is 0), SchedulingOpportunities will only increase at the container-completion rate, not the heartbeat rate, which I think was the original assumption of YARN-80. On a large cluster, this can lead to an hour+ of skipped scheduling opportunities, meaning the fifo'ness of a queue is ignored for a very long time. Maybe there should be a time-based limit on this delay as well as a count of missed-scheduling opportunities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
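To illustrate that last suggestion, a minimal sketch of a combined count-plus-time check; the method and parameter names here are hypothetical, not from the actual CapacityScheduler code:
{code}
// Hypothetical sketch: relax locality once EITHER enough scheduling
// opportunities have been missed OR too much wall-clock time has passed,
// so a full cluster (where opportunities only accrue at the
// container-completion rate) cannot stall an app indefinitely.
private boolean canRelaxLocality(long missedOpportunities,
    long nodeLocalityDelay, long requestCreationTimeMs, long maxDelayMs) {
  if (missedOpportunities >= nodeLocalityDelay) {
    return true; // existing count-based behavior
  }
  // time-based escape hatch proposed above
  return System.currentTimeMillis() - requestCreationTimeMs > maxDelayMs;
}
{code}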
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353639#comment-14353639 ] Vinod Kumar Vavilapalli commented on YARN-2495: --- Ah, right. Forgot about that. Given that, it seems that we have the following # Node reports with invalid labels during registration - we reject it right away # Node gets successfully registered, but then the labels script starts generating invalid labels midway through I think in case (2), we are better off ignoring the newly reported invalid labels, reporting this in the UI/NodeReport, and letting the node continue running. Thoughts? Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3314) Write an integration test for validating MR AM restart and recovery
[ https://issues.apache.org/jira/browse/YARN-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3245 to YARN-3314: --- Component/s: (was: mrv2) (was: test) test Affects Version/s: (was: 0.23.0) Key: YARN-3314 (was: MAPREDUCE-3245) Project: Hadoop YARN (was: Hadoop Map/Reduce) Write an integration test for validating MR AM restart and recovery --- Key: YARN-3314 URL: https://issues.apache.org/jira/browse/YARN-3314 Project: Hadoop YARN Issue Type: Test Components: test Reporter: Vinod Kumar Vavilapalli This is so that we can catch bugs like MAPREDUCE-3233. We need one test with recovery disabled (i.e., restart only) and one for restart+recovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353677#comment-14353677 ] Nathan Roberts commented on YARN-1963: -- {quote} Without some sort of labels, it will be very hard for users to reason about the definition and relative importance of priorities across queues and clusters. We must support the notion of priority-labels to make this feature usable in practice. {quote} Maybe I'm missing something... Isn't it relatively easy to reason about 2 < 4 and therefore that 2 is lower priority than 4? Unix/Linux hasn't had labels for priorities and it seems to be working pretty well there. Even if I have labels, I have to make sure that all queues and clusters define them precisely the same way or I wind up just as confused, if not even more. Just my $0.02 Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: 0001-YARN-1963-prototype.patch, YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3316) Make the ResourceManager, NodeManager and HistoryServer run from Eclipse.
[ https://issues.apache.org/jira/browse/YARN-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-2798 to YARN-3316: --- Component/s: (was: mrv2) Affects Version/s: (was: 0.23.0) 3.0.0 Key: YARN-3316 (was: MAPREDUCE-2798) Project: Hadoop YARN (was: Hadoop Map/Reduce) Make the ResourceManager, NodeManager and HistoryServer run from Eclipse. - Key: YARN-3316 URL: https://issues.apache.org/jira/browse/YARN-3316 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 3.0.0 Reporter: praveen sripati Priority: Minor Make the ResourceManager, NodeManager and HistoryServer run from Eclipse, so that development would be easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353397#comment-14353397 ] Craig Welch commented on YARN-2495: --- -re bq. How about we simplify things? Instead of accepting labels on both registration and heartbeat, why not restrict it to be just during registration? As I understand the requirements, it's necessary to handle the case where the derived set of labels changes during the lifetime of the nodemanager, e.g. external libraries might be installed or some other condition may change which affects the labels; no nodemanager re-registration is involved, and yet the changed labels need to be reflected. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353383#comment-14353383 ] Vinod Kumar Vavilapalli commented on YARN-2495: --- Quick comments - configuration.type -> configuration-type - Should RegisterNodeManagerRequestProto.nodeLabels be a set instead? - Do we really need NodeHeartbeatRequest.areNodeLabelsSetInReq()? Why not just look at the set as mentioned in the previous comment? - RegisterNodeManagerRequest is getting changed. It will be interesting to reason about rolling-upgrades in this scenario. - How about we simplify things? Instead of accepting labels on both registration and heartbeat, why not restrict it to be just during registration? - Should we even accept a node's registration when it reports invalid labels? Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353418#comment-14353418 ] Zhijie Shen commented on YARN-3287: --- I double checked the oozie use case. It seems that for each individual job, the oozie server will create a separate client to start the MR job. The change should be safe then. Thanks for the patch, Jon! It's almost fine to me. Just one nit. 1. In private ClientResponse doPosting(Object obj, String path), the doAs op will throw UndeclaredThrowableException; shall we catch and unwrap it as before?
{code}
} catch (InterruptedException ie) {
  throw new IOException(ie);
}
{code}
TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Attachments: YARN-3287.1.patch, YARN-3287.2.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
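For reference, a minimal sketch of the kind of doAs wrapping and unwrapping being discussed; doPostingObject is a hypothetical inner method, and the surrounding fields (ugi, obj, path) are assumed from context rather than taken from the actual patch:
{code}
// Run the posting under the client's login UserGroupInformation (ugi).
// UGI.doAs rethrows IOException/InterruptedException directly but wraps
// other checked exceptions in UndeclaredThrowableException, so unwrap
// that too before surfacing it to callers as an IOException.
try {
  return ugi.doAs(new PrivilegedExceptionAction<ClientResponse>() {
    @Override
    public ClientResponse run() throws Exception {
      return doPostingObject(obj, path); // hypothetical inner method
    }
  });
} catch (UndeclaredThrowableException ue) {
  throw new IOException(ue.getCause());
} catch (InterruptedException ie) {
  throw new IOException(ie);
}
{code}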
[jira] [Updated] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3287: -- Attachment: YARN-3287.3.patch [~zjshen], trying to unwrap as before. Let me know if this is what you are intending. TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3307) Master-Worker Application on YARN
[ https://issues.apache.org/jira/browse/YARN-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3315 to YARN-3307: --- Affects Version/s: (was: 3.0.0) 3.0.0 Key: YARN-3307 (was: MAPREDUCE-3315) Project: Hadoop YARN (was: Hadoop Map/Reduce) Master-Worker Application on YARN - Key: YARN-3307 URL: https://issues.apache.org/jira/browse/YARN-3307 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Sharad Agarwal Assignee: Sharad Agarwal Attachments: MAPREDUCE-3315-1.patch, MAPREDUCE-3315-2.patch, MAPREDUCE-3315-3.patch, MAPREDUCE-3315.patch Currently master-worker scenarios are force-fit into Map-Reduce. Now with YARN, these can be first-class and would benefit real-time/near-real-time workloads and make more effective use of cluster resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3308) Improvements to CapacityScheduler documentation
[ https://issues.apache.org/jira/browse/YARN-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3658 to YARN-3308: --- Component/s: (was: mrv2) documentation Assignee: (was: Yoram Arnon) Target Version/s: (was: 2.0.0-alpha, 3.0.0) Affects Version/s: (was: 0.23.0) Key: YARN-3308 (was: MAPREDUCE-3658) Project: Hadoop YARN (was: Hadoop Map/Reduce) Improvements to CapacityScheduler documentation --- Key: YARN-3308 URL: https://issues.apache.org/jira/browse/YARN-3308 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Yoram Arnon Priority: Minor Labels: documentation Attachments: MAPREDUCE-3658, MAPREDUCE-3658 Original Estimate: 3h Remaining Estimate: 3h There are some typos and some cases of incorrect English. Also, the descriptions of yarn.scheduler.capacity.queue-path.capacity, yarn.scheduler.capacity.queue-path.maximum-capacity, yarn.scheduler.capacity.queue-path.user-limit-factor, yarn.scheduler.capacity.maximum-applications are not very clear to the uninitiated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3308) Improvements to CapacityScheduler documentation
[ https://issues.apache.org/jira/browse/YARN-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3308: --- Release Note: (was: documentation change only) Improvements to CapacityScheduler documentation --- Key: YARN-3308 URL: https://issues.apache.org/jira/browse/YARN-3308 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Yoram Arnon Priority: Minor Labels: documentation Attachments: MAPREDUCE-3658, MAPREDUCE-3658 Original Estimate: 3h Remaining Estimate: 3h There are some typos and some cases of incorrect English. Also, the descriptions of yarn.scheduler.capacity.queue-path.capacity, yarn.scheduler.capacity.queue-path.maximum-capacity, yarn.scheduler.capacity.queue-path.user-limit-factor, yarn.scheduler.capacity.maximum-applications are not very clear to the uninitiated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353390#comment-14353390 ] Vinod Kumar Vavilapalli commented on YARN-1963: --- {quote} As per the discussion that happened in YARN-2896 with Eric Payne and Wangda Tan, there is a proposal to use an Integer alone as the priority from the client as well as in the server. As per the design doc, a priority label was used as a wrapper for the user, and internally the server was using the corresponding integer for the same. We can continue the discussion on this here in the parent JIRA. Looping Vinod Kumar Vavilapalli. Current idea: yarn.priority-labels = low:2, medium:4, high:6 Proposed: yarn.application.priority = 2, 3, 4 {quote} Without some sort of labels, it will be very hard for users to reason about the definition and relative importance of priorities across queues and clusters. We must support the notion of priority-labels to make this feature usable in practice. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: 0001-YARN-1963-prototype.patch, YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3308) Improvements to CapacityScheduler documentation
[ https://issues.apache.org/jira/browse/YARN-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3308: --- Affects Version/s: 3.0.0 Improvements to CapacityScheduler documentation --- Key: YARN-3308 URL: https://issues.apache.org/jira/browse/YARN-3308 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 3.0.0 Reporter: Yoram Arnon Priority: Minor Labels: documentation Attachments: MAPREDUCE-3658, MAPREDUCE-3658 Original Estimate: 3h Remaining Estimate: 3h There are some typos and some cases of incorrect English. Also, the descriptions of yarn.scheduler.capacity.queue-path.capacity, yarn.scheduler.capacity.queue-path.maximum-capacity, yarn.scheduler.capacity.queue-path.user-limit-factor, yarn.scheduler.capacity.maximum-applications are not very clear to the uninitiated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353825#comment-14353825 ] Allen Wittenauer commented on YARN-321: --- Looks like this should get closed out w/a fix ver of 2.4.0? Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is the number of application types and V is the number of application versions) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. A specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353882#comment-14353882 ] Robert Kanter commented on YARN-2928: - I agree; we're using aggregator for too many things. For TimelineAggregator, IIRC, [~kasha] had suggested TimelineCollector at one point, and that sounded good. TimelineReceiver also sounds fine. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3215) Respect labels in CapacityScheduler when computing headroom
[ https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3215: - Description: In existing CapacityScheduler, when computing headroom of an application, it will only consider non-labeled nodes of this application. But it is possible the application is asking for labeled resources, so headroom-by-label (like 5G resource available under node-label=red) is required to get better resource allocation and avoid deadlocks such as MAPREDUCE-5928. This JIRA could involve both API changes (such as adding a label-to-available-resource map in AllocateResponse) and also internal changes in CapacityScheduler. Respect labels in CapacityScheduler when computing headroom --- Key: YARN-3215 URL: https://issues.apache.org/jira/browse/YARN-3215 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan In existing CapacityScheduler, when computing headroom of an application, it will only consider non-labeled nodes of this application. But it is possible the application is asking for labeled resources, so headroom-by-label (like 5G resource available under node-label=red) is required to get better resource allocation and avoid deadlocks such as MAPREDUCE-5928. This JIRA could involve both API changes (such as adding a label-to-available-resource map in AllocateResponse) and also internal changes in CapacityScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom
[ https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353891#comment-14353891 ] Wangda Tan commented on YARN-3215: -- Yes, it works for non-labeled environments only. I added some details in the description; please feel free to let me know your ideas. Thanks, Respect labels in CapacityScheduler when computing headroom --- Key: YARN-3215 URL: https://issues.apache.org/jira/browse/YARN-3215 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan In existing CapacityScheduler, when computing headroom of an application, it will only consider non-labeled nodes of this application. But it is possible the application is asking for labeled resources, so headroom-by-label (like 5G resource available under node-label=red) is required to get better resource allocation and avoid deadlocks such as MAPREDUCE-5928. This JIRA could involve both API changes (such as adding a label-to-available-resource map in AllocateResponse) and also internal changes in CapacityScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
Craig Welch created YARN-3318: - Summary: Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch reassigned YARN-3318: - Assignee: Craig Welch Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
Craig Welch created YARN-3319: - Summary: Implement a Fair SchedulerOrderingPolicy Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Implement a Fair SchedulerOrderingPolicy which prefers to allocate to SchedulerProcesses with the least current usage, very similar to the FairScheduler's FairSharePolicy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353787#comment-14353787 ] Jian He commented on YARN-3273: --- Looks good. To distinguish scenarios like one user belonging to two queues, we probably need to add a separate queue tag too? For the Active Users: field in the CS queue page, it may also be useful to change that to simply user names which link back to the user page filtered by user name. Just for implementation reference, the existing Node Labels page has some similar functionality. Thanks again for taking this on, Rohith! Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3273-v1.patch, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353803#comment-14353803 ] Zhijie Shen commented on YARN-3287: --- Merged it into branch-2.7 too. TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Fix For: 2.7.0 Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom
[ https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353800#comment-14353800 ] Nathan Roberts commented on YARN-3215: -- Hi [~leftnoteasy]. Can you provide a summary of what this is about? Basic testing seems to show this works at least to some degree, e.g. jobs running on nodes without labels don't appear to include labeled nodes as part of headroom (as expected). Respect labels in CapacityScheduler when computing headroom --- Key: YARN-3215 URL: https://issues.apache.org/jira/browse/YARN-3215 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353864#comment-14353864 ] Xuan Gong commented on YARN-1884: - The new patch addressed all the comments. ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Attachments: YARN-1884.1.patch, YARN-1884.2.patch In the web UI, we're going to show the node, which used to be a link to the NM web page. However, on the AHS web UI, and the RM web UI after YARN-1809, the node field has to be set to the nodeID where the container is allocated. We need to add nodeHttpAddress to the ContainerReport to link users to the NM web page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.13.patch Initial, incomplete patch with the overall framework implementation of the SchedulerComparatorPolicy and FifoComparator; the major remaining TODO is integrating with the capacity scheduler configuration. Also includes a CompoundComparator for chaining comparator-based policies where desired. Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353815#comment-14353815 ] Jonathan Eagles commented on YARN-3287: --- Thanks, [~zjshen] TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Fix For: 2.7.0 Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3317) MR-279: Modularize web framework and webapps
[ https://issues.apache.org/jira/browse/YARN-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-2435 to YARN-3317: --- Tags: (was: mrv2, hamlet, module) Component/s: (was: mrv2) Key: YARN-3317 (was: MAPREDUCE-2435) Project: Hadoop YARN (was: Hadoop Map/Reduce) MR-279: Modularize web framework and webapps Key: YARN-3317 URL: https://issues.apache.org/jira/browse/YARN-3317 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Luke Lu The patch moves the web framework out of yarn-common into a separate module: yarn-web. It also decouples webapps into separate modules/jars from their respective server modules/jars to allow webapp updates independent of servers. Servers use ServiceLoader to discover their webapp modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
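As a rough illustration of the ServiceLoader-based discovery described above; the WebAppModule interface and its method are hypothetical names, not taken from the actual patch:
{code}
import java.util.ServiceLoader;

// Hypothetical SPI: each webapp jar ships a
// META-INF/services/<fully.qualified.WebAppModule> file naming its
// implementation class, making webapps swappable without touching servers.
interface WebAppModule {
  void setup(); // register routes/pages with the hosting server
}

// In the server: discover and register every webapp module on the classpath.
for (WebAppModule module : ServiceLoader.load(WebAppModule.class)) {
  module.setup();
}
{code}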
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353853#comment-14353853 ] Sangjin Lee commented on YARN-2928: --- I suppose the ApplicationMaster events refer to the ones that are written by the distributed shell AM. Correct? Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353965#comment-14353965 ] Craig Welch commented on YARN-3318: --- The proposed initial implementation of the framework to support FIFO SchedulerApplicationAttempt ordering for the CapacityScheduler: A SchedulerComparatorPolicy which implements OrderingPolicy above. This implementation will take care of the common logic required for cases where the policy can be effectively implemented as a comparator (which is expected to be the case for several potential policies, including FIFO). A SchedulerComparator which is used by the SchedulerComparatorPolicy above. This is an extension of the Java Comparator interface with additional logic required by the SchedulerComparatorPolicy, initially a method to accept SchedulerProcessEvents and indicate whether they require re-ordering of the associated SchedulerProcess. Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
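A minimal sketch of the comparator-based pieces described above; the interface shapes, method names, and the getStartTime accessor are guesses from this description, not code from the attached patch:
{code}
import java.util.Comparator;

// Comparator extended with the event hook described above: it can inspect
// a SchedulerProcessEvent and report whether the associated
// SchedulerProcess needs to be re-ordered in the policy's collection.
interface SchedulerComparator extends Comparator<SchedulerProcess> {
  boolean needsReordering(SchedulerProcess process, SchedulerProcessEvent event);
}

// FIFO: earlier attempts sort first, and order never changes after admission.
class FifoComparator implements SchedulerComparator {
  @Override
  public int compare(SchedulerProcess p1, SchedulerProcess p2) {
    return Long.compare(p1.getStartTime(), p2.getStartTime()); // assumed accessor
  }

  @Override
  public boolean needsReordering(SchedulerProcess p, SchedulerProcessEvent e) {
    return false; // FIFO order is static
  }
}
{code}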
[jira] [Updated] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1884: Attachment: YARN-1884.2.patch ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Attachments: YARN-1884.1.patch, YARN-1884.2.patch In the web UI, we're going to show the node, which used to be a link to the NM web page. However, on the AHS web UI, and the RM web UI after YARN-1809, the node field has to be set to the nodeID where the container is allocated. We need to add nodeHttpAddress to the ContainerReport to link users to the NM web page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353852#comment-14353852 ] Wangda Tan commented on YARN-3298: -- [~nroberts], As you mentioned, it is mostly the same as what we have today, and I think it cannot solve the jitter problem. What I really want to say is: enforce the limit. To solve the problem that a small amount of resource cannot be used in a queue, which you mentioned in https://issues.apache.org/jira/browse/YARN-3298?focusedCommentId=14353053&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14353053, setting the user-limit a little bit higher (like from 50 to 51) should also solve the problem. Sounds like a plan? User-limit should be enforced in CapacityScheduler -- Key: YARN-3298 URL: https://issues.apache.org/jira/browse/YARN-3298 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, yarn Reporter: Wangda Tan Assignee: Wangda Tan User-limit is not treated as a hard limit for now; it does not consider required-resource (the resource of the resource request being allocated). Also, when a user's used resource equals the user-limit, it will still continue. This will generate jitter issues when we have YARN-2069 (the preemption policy kills a container under a user, and the scheduler allocates a container under the same user soon after). The expected behavior should be the same as the queue's capacity: only when user.usage + required <= user-limit (1) will the queue continue to allocate containers. (1), the user-limit mentioned here is determined by the following computation:
{code}
current-capacity = queue.used + now-required (when queue.used < queue.capacity)
                   queue.capacity            (when queue.used >= queue.capacity)
user-limit = min(max(current-capacity / #active-users,
                     current-capacity * user-limit / 100),
                 queue-capacity * user-limit-factor)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
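In other words, a sketch of the enforced check being proposed; variable names follow the scheduler snippet quoted elsewhere in this thread (required being the resource of the request under allocation), but the exact patch may differ:
{code}
// Proposed hard limit: include the resource being requested in the check,
// so an allocation that would push the user past the limit is refused.
Resource usagePlusRequired = Resources.add(
    user.getConsumedResourceByLabel(label), required);
if (Resources.greaterThan(resourceCalculator, clusterResource,
    usagePlusRequired, limit)) {
  return false; // user.usage + required > user-limit: do not allocate
}
{code}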
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353897#comment-14353897 ] Wangda Tan commented on YARN-2495: -- I think the two issues are identical, and we should have a consistent way to handle them. If we stop a node when it reports invalid labels during registration, we should also stop the node when the same issue happens on a heartbeat after registration. I think we can either allow them to keep running or stop them in both cases; I'm fine with either approach. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353895#comment-14353895 ] Vrushali C commented on YARN-2928: -- +1 to renaming TimelineAggregator. TimelineReceiver is good. Some other suggestions are TimelineAccumulator or TimelineCollector. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3300) outstanding_resource_requests table should not be shown in AHS
[ https://issues.apache.org/jira/browse/YARN-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353990#comment-14353990 ] Jian He commented on YARN-3300: --- lgtm, +1 outstanding_resource_requests table should not be shown in AHS -- Key: YARN-3300 URL: https://issues.apache.org/jira/browse/YARN-3300 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3300.1.patch, YARN-3300.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353833#comment-14353833 ] Nathan Roberts commented on YARN-3298: -- [~leftnoteasy], won't that be extremely close to what it is today? If so, then does it really solve the jitter issue you originally cited? Just to make sure I'm in sync with your proposed direction, this is the code you're thinking about modifying, correct?
{code}
// Note: We aren't considering the current request since there is a fixed
// overhead of the AM, but it's a > check, not a >= check, so...
if (Resources
    .greaterThan(resourceCalculator, clusterResource,
        user.getConsumedResourceByLabel(label), limit)) {
{code}
User-limit should be enforced in CapacityScheduler -- Key: YARN-3298 URL: https://issues.apache.org/jira/browse/YARN-3298 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, yarn Reporter: Wangda Tan Assignee: Wangda Tan User-limit is not treated as a hard limit for now; it does not consider required-resource (the resource of the resource request being allocated). Also, when a user's used resource equals the user-limit, it will still continue. This will generate jitter issues when we have YARN-2069 (the preemption policy kills a container under a user, and the scheduler allocates a container under the same user soon after). The expected behavior should be the same as the queue's capacity: only when user.usage + required <= user-limit (1) will the queue continue to allocate containers. (1), the user-limit mentioned here is determined by the following computation:
{code}
current-capacity = queue.used + now-required (when queue.used < queue.capacity)
                   queue.capacity            (when queue.used >= queue.capacity)
user-limit = min(max(current-capacity / #active-users,
                     current-capacity * user-limit / 100),
                 queue-capacity * user-limit-factor)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353872#comment-14353872 ] Sangjin Lee commented on YARN-2928: --- A couple more comments on the plan: - I think the metrics API should be part of phase 2, since that is when we will handle aggregation - It's a small item, but we should make the per-node aggregator a standalone daemon as part of phase 2 Speaking of aggregator, the word aggregation/aggregator is now getting quite overloaded. Originally it meant rolling up metrics to parent entities. Now it's really used in two quite different contexts. For example, the TimelineAggregator classes have little to do with that original meaning. I'm not quite sure what aggregation means in that context, although, I know, I know, I said +1 to the name TimelineAggregator. :) Should we clear up this confusion? IMO, we should stick with the original meaning of aggregation when we talk about aggregation. For TimelineAggregator, perhaps we could rename it to TimelineReceiver or another name? Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353953#comment-14353953 ] Craig Welch commented on YARN-3318: --- Proposed elements of the framework: A SchedulerProcess interface which generalizes processes to be managed by the OrderingPolicy (initially; potentially in the future by other Policies as well). Initial implementer will be the SchedulerApplicationAttempt. An OrderingPolicy interface which exposes a collection of scheduler processes which will be ordered by the policy for container assignment and preemption. The ordering policy will provide one Iterator which presents processes in the policy-specific order for container assignment and another Iterator which presents them in the proper order for preemption. It will also accept SchedulerProcessEvents which may indicate a need to re-order the associated SchedulerProcess (for example, after container completion, preemption, assignment, etc.). Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
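A rough sketch of the two interfaces described above; since this comment precedes any posted code, the exact shapes here are assumptions, including the SchedulerProcessEvent enum:
{code}
import java.util.Iterator;

// Generalization of a schedulable entity; SchedulerApplicationAttempt
// would be the initial implementer.
interface SchedulerProcess {
  // accessors the policies need (usage, demand, start time, ...) go here
}

// Hypothetical events that may require re-ordering a process.
enum SchedulerProcessEvent {
  CONTAINER_ALLOCATED, CONTAINER_COMPLETED, CONTAINER_PREEMPTED
}

// Orders SchedulerProcesses for assignment and preemption, and reacts to
// events that may invalidate the current ordering.
interface OrderingPolicy<S extends SchedulerProcess> {
  Iterator<S> getAssignmentIterator(); // policy order for container assignment
  Iterator<S> getPreemptionIterator(); // policy order for preemption
  void handle(S process, SchedulerProcessEvent event);
}
{code}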
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352766#comment-14352766 ] Hadoop QA commented on YARN-3225: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703395/YARN-914.patch against trunk revision 5578e22. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.api.TestPBImplRecords org.apache.hadoop.yarn.server.resourcemanager.webapp.TestNodesPage org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6893//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6893//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6893//console This message is automatically generated. New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-914.patch New CLI (or existing CLI with parameters) should put each node on decommission list to decommissioning status and track timeout to terminate the nodes that haven't get finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-160: -- Fix Version/s: (was: 2.7.0) nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM; we should be able to obtain those values from the OS (i.e., in the case of Linux, from /proc/meminfo and /proc/cpuinfo). As this is highly OS dependent, we should have an interface that obtains this information. In addition, implementations of this interface should be able to specify a mem/cpu offset (the amount of mem/cpu not to be available as YARN resources); this would allow reserving mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
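For illustration, a minimal sketch of the Linux side of such an interface, reading total physical memory from /proc/meminfo; a real plugin would also read /proc/cpuinfo and apply the configured offsets:
{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Parse the "MemTotal:   16330632 kB" line of /proc/meminfo into bytes.
static long getPhysicalMemoryBytes() throws IOException {
  try (BufferedReader r =
      new BufferedReader(new FileReader("/proc/meminfo"))) {
    String line;
    while ((line = r.readLine()) != null) {
      if (line.startsWith("MemTotal:")) {
        String[] parts = line.trim().split("\\s+");
        return Long.parseLong(parts[1]) * 1024L; // value is reported in kB
      }
    }
  }
  return -1; // could not determine
}
{code}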
[jira] [Commented] (YARN-3300) outstanding_resource_requests table should not be shown in AHS
[ https://issues.apache.org/jira/browse/YARN-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354227#comment-14354227 ] Jian He commented on YARN-3300: --- sounds good. committing outstanding_resource_requests table should not be shown in AHS -- Key: YARN-3300 URL: https://issues.apache.org/jira/browse/YARN-3300 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3300.1.patch, YARN-3300.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1142) MiniYARNCluster web ui does not work properly
[ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-1142: --- Fix Version/s: (was: 2.7.0) MiniYARNCluster web ui does not work properly - Key: YARN-1142 URL: https://issues.apache.org/jira/browse/YARN-1142 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur When going to the RM http port, the NM web ui is displayed. It seems there is a singleton somewhere that breaks things when RM and NMs run in the same process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2890: --- Fix Version/s: (was: 2.7.0) MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
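A sketch of the guard being asked for, using the standard YarnConfiguration keys; where exactly it lives inside the mini cluster, and the wrapper service name, are assumptions rather than the attached patch:
{code}
// Only wire up the timeline service when the passed-in configuration
// actually enables it, instead of starting it unconditionally.
if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
    YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) {
  addService(new ApplicationHistoryServerWrapper()); // assumed wrapper service
}
{code}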
[jira] [Updated] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location
[ https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-314: -- Fix Version/s: (was: 2.7.0) Schedulers should allow resource requests of different sizes at the same priority and location -- Key: YARN-314 URL: https://issues.apache.org/jira/browse/YARN-314 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Attachments: yarn-314-prelim.patch Currently, resource requests for the same container and locality are expected to all be the same size. While it doesn't look like it's needed for apps currently, and can be circumvented by specifying different priorities if absolutely necessary, it seems to me that the ability to request containers with different resource requirements at the same priority level should be there for the future and for completeness' sake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2172) Suspend/Resume Hadoop Jobs
[ https://issues.apache.org/jira/browse/YARN-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2172: --- Fix Version/s: (was: 2.2.0) Suspend/Resume Hadoop Jobs -- Key: YARN-2172 URL: https://issues.apache.org/jira/browse/YARN-2172 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager, webapp Affects Versions: 2.2.0 Environment: CentOS 6.5, Hadoop 2.2.0 Reporter: Richard Chen Labels: hadoop, jobs, resume, suspend Attachments: Hadoop Job Suspend Resume Design.docx, hadoop_job_suspend_resume.patch Original Estimate: 336h Remaining Estimate: 336h In a multi-application cluster environment, jobs running inside Hadoop YARN may be of lower priority than jobs running outside Hadoop YARN, such as HBase. To give way to other higher-priority jobs inside Hadoop, a user or some cluster-level resource scheduling service should be able to suspend and/or resume particular jobs within Hadoop YARN. When target jobs inside Hadoop are suspended, already allocated and running task containers will continue to run until their completion or active preemption by other means, but no new containers would be allocated to the target jobs. In contrast, when suspended jobs are put into resume mode, they will continue to run from their previous job progress and have new task containers allocated to complete the rest of the job. My team has completed its implementation, and our tests showed it works in a rather solid and convenient way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-965) NodeManager metrics containersRunning is not correct when the localizing container process fails or is killed
[ https://issues.apache.org/jira/browse/YARN-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-965: -- Fix Version/s: (was: 2.7.0) NodeManager metrics containersRunning is not correct when the localizing container process fails or is killed -- Key: YARN-965 URL: https://issues.apache.org/jira/browse/YARN-965 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.4-alpha Environment: suse linux Reporter: Li Yuan When a container is successfully launched, its state moves from LOCALIZED to RUNNING and containersRunning is incremented. When the state moves from EXITED_WITH_FAILURE or KILLING to DONE, containersRunning is decremented. However, EXITED_WITH_FAILURE or KILLING can be reached from LOCALIZING (LOCALIZED), not only from RUNNING, which causes containersRunning to be less than the actual number. Furthermore, the metrics no longer add up: containersLaunched != containersCompleted + containersFailed + containersKilled + containersRunning + containersIniting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
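For illustration only, a rough sketch of the fix direction implied by the description; wasLaunched() and the metrics method names are assumed placeholders, not taken from any attached patch:
{code}
// On the transition to DONE, decrement the metric that was actually
// incremented. A container that failed or was killed during localization
// never reached RUNNING, so containersRunning must not be decremented.
if (container.wasLaunched()) {   // reached LOCALIZED -> RUNNING
  metrics.endRunningContainer(); // containersRunning--
} else {
  metrics.endInitingContainer(); // containersIniting--
}
{code}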
[jira] [Updated] (YARN-2784) YARN project module names in POM need to be consistent across the Hadoop project
[ https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2784: --- Component/s: (was: test) build YARN project module names in POM need to be consistent across the Hadoop project - Key: YARN-2784 URL: https://issues.apache.org/jira/browse/YARN-2784 Project: Hadoop YARN Issue Type: Improvement Components: build Reporter: Rohith Assignee: Rohith Priority: Minor Attachments: YARN-2784.patch All YARN and MapReduce pom.xml files have project names of the form hadoop-mapreduce/hadoop-yarn. These can be made consistent across the Hadoop project build, like 'Apache Hadoop YARN module-name' and 'Apache Hadoop MapReduce module-name'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354279#comment-14354279 ] Rohith commented on YARN-3273: -- Thanks Jian He for your suggestion :-) The overall summary seems to be heading in the right direction. I am assuming that all scheduler changes are only for CS. Are there any common scheduler changes to be done? # Headroom will be displayed on the application attempt page. This will be set to 0 once the attempt is finished. # For each leaf queue in CS, UsedAMResource, UsedUserAMResource, and 'User Limit for User' will be displayed. # In Active Users, a link will be provided for each user which redirects to an additional filtered user page containing userInfo in a table like the sample table above. This is also applicable only for CS. # The full active-users table won't be rendered. Instead, only a link will be provided for each user, i.e. step 3 above. Is my understanding correct? Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3273-v1.patch, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom.PNG A job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit - hitting AM-resource-percentage The first, queueCapacity, is already shown on the UI. We may surface things like: - what the user's current usage and user-limit are; - what the AM resource usage and limit are; - what the application's current HeadRoom is; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-745) Move UnmanagedAMLauncher to yarn client package
[ https://issues.apache.org/jira/browse/YARN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-745: -- Fix Version/s: (was: 2.7.0) Move UnmanagedAMLauncher to yarn client package --- Key: YARN-745 URL: https://issues.apache.org/jira/browse/YARN-745 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Bikas Saha It's currently sitting in the yarn applications project, which sounds wrong. The client project sounds better, since it contains the utilities/libraries that clients use to write and debug YARN applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2172) Suspend/Resume Hadoop Jobs
[ https://issues.apache.org/jira/browse/YARN-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354306#comment-14354306 ] Hadoop QA commented on YARN-2172: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658578/hadoop_job_suspend_resume.patch against trunk revision 47f7f18. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6899//console This message is automatically generated. Suspend/Resume Hadoop Jobs -- Key: YARN-2172 URL: https://issues.apache.org/jira/browse/YARN-2172 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager, webapp Affects Versions: 2.2.0 Environment: CentOS 6.5, Hadoop 2.2.0 Reporter: Richard Chen Labels: hadoop, jobs, resume, suspend Attachments: Hadoop Job Suspend Resume Design.docx, hadoop_job_suspend_resume.patch Original Estimate: 336h Remaining Estimate: 336h In a multi-application cluster environment, jobs running inside Hadoop YARN may be of lower priority than jobs running outside Hadoop YARN, like HBase. To give way to other higher-priority jobs inside Hadoop, a user or some cluster-level resource scheduling service should be able to suspend and/or resume particular jobs within Hadoop YARN. When target jobs inside Hadoop are suspended, the already allocated and running task containers will continue to run until their completion or active preemption by other means, but no new containers would be allocated to the target jobs. In contrast, when suspended jobs are put into resume mode, they will continue to run from their previous progress and have new task containers allocated to complete the rest of the job. My team has completed its implementation, and our tests showed it works in a rather solid and convenient way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3323) Task UI, sort by name doesn't work
[ https://issues.apache.org/jira/browse/YARN-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3323: Summary: Task UI, sort by name doesn't work (was: MR Task UI, sort by name doesn't work) Moving to YARN project. Task UI, sort by name doesn't work -- Key: YARN-3323 URL: https://issues.apache.org/jira/browse/YARN-3323 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.5.1 Reporter: Thomas Graves Assignee: Brahma Reddy Battula If you go to the MapReduce ApplicationMaster or HistoryServer UI and open the list of tasks, then try to sort by the task name/id, it does nothing. Note that if you go to the task attempts, those seem to sort fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354362#comment-14354362 ] Naganarasimha G R commented on YARN-2495: - Hi [~wangda], 1) IMO the method name was not readable when it was {{setAreNodeLabelsSet}}, but I have changed it to {{setAreNodeLabelsSetInReq}}, which I feel is sufficient. setAreNodeLabelsUpdated is the same as earlier, for which Craig had commented (which I also feel is valid): {quote} I would go with areNodeLablesSet (all isNodeLabels = areNodeLabels wherever it appears, actually) - wrt Set vs Updated - this is primarily a workaround for the null/empty ambiguity and I think this name better reflects what is really going on (am I sending a value to act on or not), but I also think that this is a better contract, the receiver (rm) shouldn't really care about the logic the nm side is using to decide whether or not to set it's labels (freshness, updatedness, whatever), so all that should be communicated in the api is whether or not the value is set, not whether it's an update/whether it's checking freshness, etc. that's a nit, but I think it's a clearer name. {quote} Yes, true; let's finalize the name this time, after which I will start working on the patch, otherwise it will be wasted effort. 5) {quote} It will be problematic to ask admins make NM/RM configuration keep synchronized, so I don't want (and also not necessary) NM depends on RM's configuration. So I suggest to make a changes: In NodeManager.java: when user doesn't configure provider, it should be null. In your patch, you can return a null directly, and YARN-2729 will implement the logic of instancing provider from config. In NodeStatusUpdaterImpl: avoid using isDistributedNodeLabelsConf, since we will not have distributedNodeLabelConf in NM side if you agree on previously comment, instead, it will check null of provider. {quote} Well, the modification side is clear to me, but is it good to allow the configurations to differ between the NM and the RM? In fact, I wanted to discuss whether to send a shutdown during register if the NM is configured differently from the RM, but I waited for the base changes to go in before discussing new stuff. 8) ??You can add an additional comments in line 626 for this.?? OK, I will add a comment in LabelProvider.getLabels; the idea is that LabelProvider is expected to give the same labels continuously until there is a change, and if null or empty is returned then no label is assumed. 10) {{updateNodeLabelsInNodeLabelsManager - updateNodeLabelsFromNMReport}}: will take care of this in the next patch. {{LOG.info(... accepted from RM, use LOG.debug and check isDebugEnabled.}}: I feel it is better to log this as an error, since we send the labels only when they change, there has to be some way to identify the labels for a given NM, and currently we also send out a shutdown signal. ??Make errorMessage clear: indicate 1# this is node labels reported from NM, and 2# it's failed to be put to RM instead of not properly configured.?? I think I have captured the first point, but anyway I will reframe it as {{Node labels reported from the NM with id nodeID were rejected by the RM with exception message exceptionMsg.}} ??Another thing we should do is, when distributed node label configuration is set, any direct modify node to labels mapping from RMAdminCLI should be rejected (like -replaceNodeToLabels).?? Will work on this once 2495 and 2729 are done. Thanks [~vinodkv] [~cwelch] for reviewing it. ??configuration.type - configuration-type??
Will take care of this in the next patch. {quote} Should RegisterNodeManagerRequestProto.nodeLabels be a set instead? Do we really need NodeHeartbeatRequest.areNodeLabelsSetInReq()? Why not just look at the set as mentioned in the previous comment? {quote} Well, as Craig informed, RegisterNodeManagerRequestProto.nodeLabels is already a set, but since protoc provides an empty set by default, something is required to indicate whether labels were set as part of the request; hence areNodeLabelsSetInReq is needed. ??RegisterNodeManagerRequest is getting changed. It will be interesting to reason about rolling-upgrades in this scenario.?? Well, though I am not very familiar with rolling upgrades, I don't see any problem in the normal case, because the RM tries to read the labels from the NM's request only under distributed configuration, and {{areNodeLabelsSetInReq}} is false by default. But I do have questions about an existing setup that is to be converted to the distributed configuration setup: # Do we need to send a shutdown during register if the NM is configured differently from the RM? # Will the new configurations be added to the NM and RM and then the rolling upgrade done, or do we do the rolling upgrade first and then reconfigure and restart the RMs and NMs? ??How about we simplify things? Instead of accepting labels on both registration and heartbeat, why not restrict it to be just during registration?? Well, I have
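To make point 5 above concrete, a rough sketch of the provider null-check being discussed; nodeLabelsProvider and the request setters follow the names used in the comment, and this is not the committed patch:
{code}
// NodeStatusUpdaterImpl side: key distributed-label behavior off a null
// provider instead of re-reading RM-side configuration.
if (nodeLabelsProvider != null) {
  Set<String> labels = nodeLabelsProvider.getNodeLabels();
  request.setNodeLabels(labels);
  request.setAreNodeLabelsSetInReq(true); // distinguish "unset" from "empty"
}
{code}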
[jira] [Updated] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3305: - Attachment: 0001-YARN-3305.patch AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation, but the AM-used resource is updated with the actual ResourceRequest made by the user. This results in AM container allocation of more than the Max ApplicationMaster Resource. This happens because AM-Used is updated with the actual ResourceRequest made by the user while activating the applications, whereas during container allocation the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
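A rough sketch of the fix direction implied by the description; queueUsage.incAMUsed() and the surrounding variable names are illustrative assumptions, not the attached patch, though Resources.normalize() is the standard utility:
{code}
// Account AM-used with the normalized resource so activation-time
// accounting matches what CS#allocate will actually hand out.
Resource amAsk = application.getAMResource();
Resource normalized = Resources.normalize(resourceCalculator, amAsk,
    minimumAllocation, maximumAllocation, minimumAllocation);
queueUsage.incAMUsed(normalized); // previously incremented with the raw amAsk
{code}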
[jira] [Commented] (YARN-3300) outstanding_resource_requests table should not be shown in AHS
[ https://issues.apache.org/jira/browse/YARN-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354236#comment-14354236 ] Hudson commented on YARN-3300: -- FAILURE: Integrated in Hadoop-trunk-Commit #7293 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7293/]) YARN-3300. Outstanding_resource_requests table should not be shown in AHS. Contributed by Xuan Gong (jianhe: rev c3003eba6f9802f15699564a5eb7c6e34424cb14) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppAttemptPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/CHANGES.txt outstanding_resource_requests table should not be shown in AHS -- Key: YARN-3300 URL: https://issues.apache.org/jira/browse/YARN-3300 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-3300.1.patch, YARN-3300.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2784) YARN project module names in POM need to be consistent across the Hadoop project
[ https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2784: --- Fix Version/s: (was: 2.7.0) YARN project module names in POM need to be consistent across the Hadoop project - Key: YARN-2784 URL: https://issues.apache.org/jira/browse/YARN-2784 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Rohith Assignee: Rohith Priority: Minor Attachments: YARN-2784.patch All YARN and MapReduce pom.xml files have project names of the form hadoop-mapreduce/hadoop-yarn. These can be made consistent across the Hadoop project build, like 'Apache Hadoop YARN module-name' and 'Apache Hadoop MapReduce module-name'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1147) Add end-to-end tests for HA
[ https://issues.apache.org/jira/browse/YARN-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-1147: --- Fix Version/s: (was: 2.7.0) Add end-to-end tests for HA --- Key: YARN-1147 URL: https://issues.apache.org/jira/browse/YARN-1147 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Xuan Gong While individual sub-tasks add tests for the code they include, it will be handy to write end-to-end tests for HA including some stress testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-153) PaaS on YARN: a YARN application to demonstrate that YARN can be used as a PaaS
[ https://issues.apache.org/jira/browse/YARN-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-153: -- Fix Version/s: (was: 2.7.0) PaaS on YARN: a YARN application to demonstrate that YARN can be used as a PaaS Key: YARN-153 URL: https://issues.apache.org/jira/browse/YARN-153 Project: Hadoop YARN Issue Type: New Feature Reporter: Jacob Jaigak Song Assignee: Jacob Jaigak Song Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, MAPREDUCE4393.patch Original Estimate: 336h Time Spent: 336h Remaining Estimate: 0h This application demonstrates that YARN can be used for non-MapReduce applications. As Hadoop has already been adopted and deployed widely, and its deployment will only increase in the future, we thought it has good potential to be used as a PaaS. I have implemented a proof of concept to demonstrate that YARN can be used as a PaaS (Platform as a Service). I have done a gap analysis against VMware's Cloud Foundry and tried to achieve as many PaaS functionalities as possible on YARN. I'd like to check in this POC as a YARN example application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-113) WebAppProxyServlet must use SSLFactory for the HttpClient connections
[ https://issues.apache.org/jira/browse/YARN-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-113: -- Fix Version/s: (was: 2.7.0) WebAppProxyServlet must use SSLFactory for the HttpClient connections - Key: YARN-113 URL: https://issues.apache.org/jira/browse/YARN-113 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur The HttpClient must be configured to use the SSLFactory when the web UIs are over HTTPS, otherwise the proxy servlet fails to connect to the AM because of unknown (self-signed) certificates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
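A hedged sketch of the idea using Hadoop's real SSLFactory API; the proxy servlet of this era actually used Commons HttpClient, so the HttpsURLConnection here is a simplification for illustration:
{code}
// Build the client-side SSLFactory (backed by ssl-client.xml) and hand its
// socket factory and hostname verifier to the HTTPS connection, so
// self-signed AM certificates are trusted.
SSLFactory sslFactory = new SSLFactory(SSLFactory.Mode.CLIENT, conf);
sslFactory.init();
HttpsURLConnection conn =
    (HttpsURLConnection) new URL(trackingUri).openConnection();
conn.setSSLSocketFactory(sslFactory.createSSLSocketFactory());
conn.setHostnameVerifier(sslFactory.getHostnameVerifier());
{code}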
[jira] [Moved] (YARN-3323) MR Task UI, sort by name doesn't work
[ https://issues.apache.org/jira/browse/YARN-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA moved MAPREDUCE-6102 to YARN-3323: Component/s: (was: webapps) webapp Target Version/s: (was: 2.6.0) Affects Version/s: (was: 2.5.1) 2.5.1 Key: YARN-3323 (was: MAPREDUCE-6102) Project: Hadoop YARN (was: Hadoop Map/Reduce) MR Task UI, sort by name doesn't work - Key: YARN-3323 URL: https://issues.apache.org/jira/browse/YARN-3323 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.5.1 Reporter: Thomas Graves Assignee: Brahma Reddy Battula If you go to the MapReduce ApplicationMaster or HistoryServer UI and open the list of tasks, then try to sort by the task name/id, it does nothing. Note that if you go to the task attempts, those seem to sort fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354006#comment-14354006 ] Craig Welch commented on YARN-3319: --- Initially this will be implemented for SchedulerApplicationAttempts in the CapacityScheduler LeafQueue (similar to the FIFO implementation in [YARN-3318]). The expectation is that this will implement the SchedulerComparator interface and will be used as a comparator within the SchedulerComparatorPolicy implementation to achieve the intended behavior. Implement a Fair SchedulerOrderingPolicy Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Implement a Fair SchedulerOrderingPolicy which prefers to allocate to SchedulerProcesses with the least current usage, very similar to the FairScheduler's FairSharePolicy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
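An illustrative sketch of the fair comparator described above; SchedulerProcess, getCurrentUsage(), and getId() are design names from this JIRA family rather than shipped API, and memory-only comparison is a simplification:
{code}
// Order SchedulerProcesses by least current usage, with a stable id-based
// tie-break so the ordering is deterministic.
Comparator<SchedulerProcess> fairComparator = new Comparator<SchedulerProcess>() {
  @Override
  public int compare(SchedulerProcess p1, SchedulerProcess p2) {
    int byUsage = Integer.compare(p1.getCurrentUsage().getMemory(),
                                  p2.getCurrentUsage().getMemory());
    return byUsage != 0 ? byUsage : p1.getId().compareTo(p2.getId());
  }
};
{code}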
[jira] [Moved] (YARN-3321) Health-Report column of NodePage should display more information.
[ https://issues.apache.org/jira/browse/YARN-3321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3091 to YARN-3321: --- Component/s: (was: nodemanager) (was: resourcemanager) resourcemanager nodemanager Assignee: (was: Subroto Sanyal) Affects Version/s: (was: 0.23.0) Key: YARN-3321 (was: MAPREDUCE-3091) Project: Hadoop YARN (was: Hadoop Map/Reduce) Health-Report column of NodePage should display more information. --- Key: YARN-3321 URL: https://issues.apache.org/jira/browse/YARN-3321 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Reporter: Subroto Sanyal Labels: javascript The Health-Checker script on the nodes can run and generate some output, an error, and an exit code. This information is not available in the GUI. It is possible that the Health-Checker script generates some statistics about the node; these could be displayed to the GUI user as well. I suggest we display the information in a pop-up balloon (using CSS/JavaScript). Any suggestions? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354003#comment-14354003 ] Karthik Kambatla commented on YARN-2928: +1 to renaming. I prefer TimelineCollector and TimelineReceiver, in that order. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx We have the application timeline server implemented in YARN per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3300) outstanding_resource_requests table should not be shown in AHS
[ https://issues.apache.org/jira/browse/YARN-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354005#comment-14354005 ] Jian He commented on YARN-3300: --- Actually, after looking at the UI: on the app page there's a big blank space above the resource requests table, and similarly for the attempt page. Could you fix that too? outstanding_resource_requests table should not be shown in AHS -- Key: YARN-3300 URL: https://issues.apache.org/jira/browse/YARN-3300 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3300.1.patch, YARN-3300.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)