[jira] [Updated] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2854: Attachment: YARN-2854.20150313-1.patch Patch with [~zjshen]'s 1st and 2nd comments fixed. The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, YARN-2854.20150313-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360739#comment-14360739 ] Wangda Tan commented on YARN-3243: -- YARN-3204 tracks the findbugs warning, and the test failure is not related to this change. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues to make sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocated a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example:
{code}
          A (usage=54, max=55)
         /                  \
       A1                    A2
(usage=1, max=55)   (usage=53, max=53)
{code}
Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, now we have a {{ResourceUsage}} class in each queue, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to be (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity will be enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
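The min() rule in the proposal is straightforward to express concretely. Below is a minimal, hypothetical sketch (QueueUsage and its fields are illustrative stand-ins, not the real CapacityScheduler classes):
{code}
// Sketch: a parent derives each child's headroom as
// min(parent.headroom, parent.max - parent.used), so every ancestor's
// limit is enforced transitively down the queue hierarchy.
final class QueueUsage {
  long used;      // resources currently used by this queue
  long max;       // configured maximum for this queue
  long headroom;  // limit pushed down by this queue's parent

  void pushHeadroomToChildren(Iterable<QueueUsage> children) {
    long childLimit = Math.min(headroom, max - used);
    for (QueueUsage child : children) {
      child.headroom = childLimit;
    }
  }
}
{code}
In the example above, A would push headroom = min(A.headroom, 55 - 54) = 1 down to A1 and A2, so neither child could allocate a container that pushes A past its max.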
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360669#comment-14360669 ] Tsuyoshi Ozawa commented on YARN-2890: -- [~mitdesai] thank you for updating the patch! +1 for the change itself to make it configurable. {quote} I was trying out the fix for MiniMRYarnCluster where we want to start the timeline service only if the TIMELINE_SERVICE_ENABLED == true. But as per the current implementation of the miniCluster, it takes in a boolean when its instance is created to decide whether to start or not to start the timeline server. {quote} I don't understand the context - why would you like to make the flag off by default? Could you clarify it? IMO, it would be enough to make it configurable. MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360805#comment-14360805 ] zhihai xu commented on YARN-3336: - TestRMWebServices and TestFairSchedulerQueueACLs passed in my local latest build, and both test failures are not related to my patch.
{code}
--- T E S T S ---
Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 53.871 sec - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

--- T E S T S ---
Running org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
Tests run: 19, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.004 sec - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices

Results :

Tests run: 19, Failures: 0, Errors: 0, Skipped: 0
{code}
The findbugs warnings are also not related to my patch. YARN-3341 is to fix one of the findbugs warnings. FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch, YARN-3336.002.patch, YARN-3336.003.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens = proxyUser.doAs(
      new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happens when FileSystem.get(getConfig()) is called with a new proxy user, because createProxyUser will always create a new Subject.
The calling sequence is FileSystem.get(getConfig()) -> FileSystem.get(getDefaultUri(conf), conf) -> FileSystem.CACHE.get(uri, conf) -> FileSystem.CACHE.getInternal(uri, conf, key) -> FileSystem.CACHE.map.get(key) -> createFileSystem(uri, conf)
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals will compare the ugi
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
UserGroupInformation.equals will compare subject by reference.
{code}
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
{code}
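Given the cache-key analysis above, one plausible remedy (a hedged sketch; the actual committed fix for YARN-3336 may differ) is to close the per-proxy-user FileSystem once the tokens have been obtained, since FileSystem.close() also evicts the instance from FileSystem.CACHE:
{code}
// Sketch only: obtain delegation tokens for a proxy user without leaking
// a cached FileSystem. Closing the instance removes it from FileSystem.CACHE,
// so repeated calls with fresh proxy users no longer grow the cache.
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  UserGroupInformation proxyUser = UserGroupInformation.createProxyUser(
      user, UserGroupInformation.getLoginUser());
  return proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
    @Override
    public Token<?>[] run() throws Exception {
      FileSystem fs = FileSystem.get(getConfig());
      try {
        return fs.addDelegationTokens(
            UserGroupInformation.getLoginUser().getUserName(), credentials);
      } finally {
        fs.close(); // evict this proxy user's entry from the cache
      }
    }
  });
}
{code}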
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360822#comment-14360822 ] Junping Du commented on YARN-3034: -- Hi [~vinodkv], for the DistributedShell patch, we have the assumption that the v1 and v2 services could be running at the same time (also for the TestDistributedShell cases, we test v1 and v2 on the same miniYARN cluster). The AM gets launched with a different version parameter, and then it passes the boolean value of newTimelineService to TimelineClient, which will call the related functions - that is the current flow we have. [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360797#comment-14360797 ] Vinod Kumar Vavilapalli commented on YARN-3034: --- We don't have a plan to directly put any metrics data from the NM into storage yet. So, agreed that this is an issue, but not an immediate one. When we come to it, maybe we will have a yarn.system-metrics-publisher.enabled which is used by both RM and NM and deprecates the current RM flag. +1 for a yarn.timeline-service.version. This is what we should have done for the DistributedShell patch? /cc [~djp]. Maybe for all clients when YARN-2928 is ready to go in? [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
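If the proposed yarn.timeline-service.version knob were added, client-side selection could look roughly like this (sketch only; the key name and float type are assumptions drawn from this discussion, not a shipped constant):
{code}
// Hypothetical: branch on a single version property instead of threading a
// hand-rolled newTimelineService boolean through AM launch parameters.
float tlVersion = conf.getFloat("yarn.timeline-service.version", 1.0f);
boolean useV2TimelineService = tlVersion >= 2.0f;
{code}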
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360823#comment-14360823 ] Li Lu commented on YARN-3034: - Hi [~Naganarasimha], thanks for the clarification. I think this way of organization is fine for now. [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360680#comment-14360680 ] Hudson commented on YARN-3267: -- FAILURE: Integrated in Hadoop-trunk-Commit #7316 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7316/]) YARN-3267. Timelineserver applies the ACL rules after applying the limit on the number of records (Chang Li via jeagles) (jeagles: rev 8180e676abb2bb500a48b3a0c0809d2a807ab235) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestTimelineDataManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineReader.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMRTimelineEventHandling.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineDataManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TimelineStoreTestUtils.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN-3267.4.patch, YARN-3267.5.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360938#comment-14360938 ] Mit Desai commented on YARN-2890: - bq. The above ctor was removed. If anyone is using MiniMRYARNCluster from 2.4.0 to test their jobs, this will break compatibility. My latest patch no longer removes this constructor. bq. Why use a hardcoded false instead of the DEFAULT field from YarnConfiguration? Makes sense. Thanks. I will update the patch to use the default value, which is already set as false. But I will wait for Zhijie's response before updating the patch. MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360686#comment-14360686 ] Junping Du commented on YARN-3225: -- bq. If we have a constraint that we should issue graceful decommission command from only one RMAdmin CLI then this issue will not be a problem. Can we have this assumption in our first phase (targeting 2.8)? IMO, decommissioning nodes is a very restrictive operation, and we don't expect multiple to happen at the same time on a cluster. We can improve later if we think this is not good enough. New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225.patch, YARN-914.patch New CLI (or existing CLI with parameters) should put each node on the decommission list into decommissioning status and track the timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360809#comment-14360809 ] Hitesh Shah commented on YARN-2890: --- 2 issues with the patch:
{code}
public MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS)
{code}
The above ctor was removed. If anyone is using MiniMRYARNCluster from 2.4.0 to test their jobs, this will break compatibility.
{code}
conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, false)
{code}
Why use a hardcoded false instead of the DEFAULT field from YarnConfiguration? Also, to add to Tsuyoshi's comment, what is the issue with turning on Timeline in all scenarios? If Timeline is going to be a first-class citizen of YARN going forward, why make it false by default? [~zjshen], comments on this? MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
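On the second point, the conventional pattern is to pair the key with its DEFAULT_* constant, which YarnConfiguration already defines for this key:
{code}
// Tracks any future change to the shipped default instead of pinning false.
boolean timelineEnabled = conf.getBoolean(
    YarnConfiguration.TIMELINE_SERVICE_ENABLED,
    YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
{code}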
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360704#comment-14360704 ] Mit Desai commented on YARN-2890: - That's because not everything is using the timeline server. Turning it off by default will prevent users from accidentally using the timeline server if they do not intend to. Moreover, if someone intends to use the timeline server, they are well aware of it and can turn the flag on. MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3306) [Umbrella] Proposing per-queue Policy driven scheduling in YARN
[ https://issues.apache.org/jira/browse/YARN-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360697#comment-14360697 ] Karthik Kambatla commented on YARN-3306: [~cwelch] - thanks for the clarifications. I also spoke to Vinod offline about the goals and likely path of this work. I see the benefits of having a single scheduler with pluggable policies. I feel it might be easier to implement a new scheduler and plug the FS and CS policies into it. However, I understand the iterative approach you propose will get validation from current users along the way. I am a little circumspect about the iterative approach and how we avoid regressions, but remain hopeful the code will convince me it is the right approach. I would like to be involved in the work here; can we work on a branch and merge in as and when appropriate? [Umbrella] Proposing per-queue Policy driven scheduling in YARN --- Key: YARN-3306 URL: https://issues.apache.org/jira/browse/YARN-3306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: PerQueuePolicydrivenschedulinginYARN.pdf Scheduling layout in Apache Hadoop YARN today is very coarse grained. This proposal aims at converting today’s rigid scheduling in YARN to a per-queue policy driven architecture. We propose the creation of a common policy framework and implement a common set of policies that administrators can pick and choose per queue - Make scheduling policies configurable per queue - Initially, we limit ourselves to a new type of scheduling policy that determines the ordering of applications within the leaf queue - In the near future, we will also pursue parent queue level policies and potential algorithm reuse through a separate type of policies that control resource limits per queue, user, application etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3291) DockerContainerExecutor should run as a non-root user inside the container
[ https://issues.apache.org/jira/browse/YARN-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360881#comment-14360881 ] Hadoop QA commented on YARN-3291: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704462/YARN-3291.patch against trunk revision f446669. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6957//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6957//console This message is automatically generated. DockerContainerExecutor should run as a non-root user inside the container -- Key: YARN-3291 URL: https://issues.apache.org/jira/browse/YARN-3291 Project: Hadoop YARN Issue Type: Improvement Reporter: Abin Shahab Assignee: Abin Shahab Attachments: YARN-3291.patch, YARN-3291.patch Currently DockerContainerExecutor runs the container as root (inside the container). Outside the container it runs as yarn. Inside the container it can be run as a user which is not root. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3306) [Umbrella] Proposing per-queue Policy driven scheduling in YARN
[ https://issues.apache.org/jira/browse/YARN-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360967#comment-14360967 ] Vinod Kumar Vavilapalli commented on YARN-3306: --- Yup, the discussion above captures the main bits, but to summarize:
How do we avoid fragmentation for this feature itself? - By putting the framework in the common place and using it in specific schedulers one after another.
Is this only for the leaf queue? - No, we start with the leaf queue and demonstrate viability across different existing policies, then move up to the parent queue (which should be easier than the leaf queue) and extend to limits.
Why not a new scheduler? # Getting existing users to validate our changes, and a smoother migration path. # Making sure current behaviors are completely absorbed.
[Umbrella] Proposing per-queue Policy driven scheduling in YARN --- Key: YARN-3306 URL: https://issues.apache.org/jira/browse/YARN-3306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: PerQueuePolicydrivenschedulinginYARN.pdf Scheduling layout in Apache Hadoop YARN today is very coarse grained. This proposal aims at converting today’s rigid scheduling in YARN to a per-queue policy driven architecture. We propose the creation of a common policy framework and implement a common set of policies that administrators can pick and choose per queue - Make scheduling policies configurable per queue - Initially, we limit ourselves to a new type of scheduling policy that determines the ordering of applications within the leaf queue - In the near future, we will also pursue parent queue level policies and potential algorithm reuse through a separate type of policies that control resource limits per queue, user, application etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3291) DockerContainerExecutor should run as a non-root user inside the container
[ https://issues.apache.org/jira/browse/YARN-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-3291: -- Attachment: YARN-3291.patch Removed findbugs warning DockerContainerExecutor should run as a non-root user inside the container -- Key: YARN-3291 URL: https://issues.apache.org/jira/browse/YARN-3291 Project: Hadoop YARN Issue Type: Improvement Reporter: Abin Shahab Assignee: Abin Shahab Attachments: YARN-3291.patch, YARN-3291.patch Currently DockerContainerExecutor runs the container as root (inside the container). Outside the container it runs as yarn. Inside the container it can be run as a user which is not root. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360841#comment-14360841 ] Hadoop QA commented on YARN-2854: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704461/YARN-2854.20150313-1.patch against trunk revision 8180e67. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6956//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6956//console This message is automatically generated. The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, YARN-2854.20150313-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2480) DockerContainerExecutor must support user namespaces
[ https://issues.apache.org/jira/browse/YARN-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361049#comment-14361049 ] Ravindra Kumar Naik commented on YARN-2480: --- Though this support exists in Linux containers (LXC), docker doesn't yet support such mapping. Please have a look at https://github.com/docker/docker/issues/7906 DockerContainerExecutor must support user namespaces Key: YARN-2480 URL: https://issues.apache.org/jira/browse/YARN-2480 Project: Hadoop YARN Issue Type: New Feature Reporter: Abin Shahab Labels: security When DockerContainerExecutor launches a container, the root inside that container has root privileges on the host. This is insecure in a multi-tenant environment. The uid of the container's root user must be mapped to a non-privileged user on the host. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361074#comment-14361074 ] Sangjin Lee commented on YARN-3039: --- I stand corrected [~djp]. For some strange reason I missed the null check in the while loop, which is why I mistakenly thought that every call would end up right in the Thread.sleep(). Thanks for the correction. [Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2854: -- Issue Type: Improvement (was: Bug) The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, YARN-2854.20150313-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361035#comment-14361035 ] Zhijie Shen commented on YARN-2890: --- bq. If Timeline is going to be a first class citizen of YARN going forwards, why make it false by default? I think so far it's still not assumed that the timeline service is an always-enabled component, though we'd like to propose that; maybe it will be more persuasive once ATS v2 arrives? And for MiniMRYARNCluster there's a technical issue too. Because of the singleton in Guice, only one webapp can be created per daemon. Enabling the ATS will break the other web test cases around the RM/NM (if I remember correctly, there seem to be such tests). MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361089#comment-14361089 ] Junping Du commented on YARN-3225: -- One additional comment: for RMNodeEventType, DECOMMISSION_WITH_DELAY sounds better?
{code}
RMNodeEventType.java
@@ -24,6 +24,7 @@
   // Source: AdminService
   DECOMMISSION,
+  DECOMMISSION_WITH_TIMEOUT,
{code}
New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225.patch, YARN-914.patch New CLI (or existing CLI with parameters) should put each node on the decommission list into decommissioning status and track the timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361072#comment-14361072 ] Zhijie Shen commented on YARN-2854: --- Sorry, I missed that sentence. The new patch looks good to me. Will commit it and the image. The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, YARN-2854.20150313-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361084#comment-14361084 ] Hitesh Shah commented on YARN-2890: --- Thanks [~mitdesai]. In the future, it would be good if your patches were versioned to avoid confusion. More questions on the patch:
- testTimelineServiceStartInMiniCluster() - is there a reason why a job is run when timeline is enabled but not when it is disabled?
- should a job run be needed here in the first place, given the name of the test?
- it might be better to move the testing of job runs based on the absence/presence of timeline to a separate test
- testMRTimelineEventHandling, testMapreduceJobTimelineServiceEnabled, testMapreduceJobTimelineServiceEnabled - is there a need to change all of them?
- there does not seem to be a code path that tests timeline being enabled by passing the enableAHS value in the ctor if all these are changed.
MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361101#comment-14361101 ] Hudson commented on YARN-2854: -- FAILURE: Integrated in Hadoop-trunk-Commit #7321 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7321/]) YARN-2854. Updated the documentation of the timeline service and the generic history service. Contributed by Naganarasimha G R. (zjshen: rev 6fdef76cc3e818856ddcc4d385c2899a8e6ba916) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images/timeline_structure.jpg * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/CHANGES.txt The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, YARN-2854.20150313-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361103#comment-14361103 ] Zhijie Shen commented on YARN-2854: --- [~Naganarasimha], would you please check branch-2? The patch cannot be merged to branch-2 cleanly. See if we need to create a patch for branch-2 only. Thanks! The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, YARN-2854.20150313-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361267#comment-14361267 ] Sangjin Lee commented on YARN-3039: --- [~djp], a couple of quick comments (I'll follow up after reviewing the latest patch more carefully). {quote} We could have an annotation at the class level which is the default publicity and stability for each method. However, each method could have its own annotation to override the class one. In most cases, the class level annotation is more public and stable than individual methods, as it is the first-class contract with end users or other components (or they will have concerns about using it). Take an example: if we need to add a new API which is not stable yet to a protocol class marked with stable, we shouldn't regress the whole class from stable to evolving or unstable, but we can mark the new method as unstable or evolving. Make sense? {quote} Yes, I get the reasoning for annotating individual methods. My concern is more about the *new classes*. Note that we're still evolving even the class names. This might be a fine point, but I feel we should annotate the *new classes* at least as unstable for now in addition to the method annotations. Thoughts? {quote} bq. RMAppImpl.java, Would this be backward compatible from the RM state store perspective? I don't think so. ApplicationDataProto is also a protobuf object, and the new field for aggregatorAddress is optional. {quote} So you mean it will be backward compatible, right? [Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
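For readers following the annotation discussion, Hadoop's standard classification annotations support exactly this class-default-plus-method-override convention; a small sketch (the class and method names are made up):
{code}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Class-level annotations set the default; a method-level annotation
// overrides it for that method only.
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class ExampleProtocol {       // hypothetical class

  public abstract void establishedCall();     // inherits Public/Stable

  @InterfaceStability.Unstable                // new API, not yet stable
  public abstract void newlyAddedCall();
}
{code}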
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361280#comment-14361280 ] Sangjin Lee commented on YARN-3039: --- [~zjshen], [~djp], regarding the idea of having IPC from the NM to the per-app collector, I don't think that will work with the special container use case. The special container for the per-app collector will bind to a port for RPC that will not be determined until the collector binds to it. So it's basically a chicken-and-egg problem: the NM doesn't know the RPC port for the per-app collector in the special container until... the special container tells it. This is not a problem with the current per-node collector container situation. Although it's a little roundabout, I don't see a fundamental problem with having the per-app collector (or the collection of them) sending its location to the NM once it's up. It's actually conceptually simpler, and it should work in all 3 modes (aux service, standalone per-node daemon, and special container). [Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
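A tiny sketch of the ordering constraint described above: the collector must bind first (possibly to an ephemeral port), and only then is its address known and reportable. All names here are hypothetical stand-ins, not real YARN APIs:
{code}
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical NM-side callback; not a real YARN interface.
interface NodeManagerClient {
  void reportCollectorAddress(String appId, String address);
}

class CollectorStartup {
  // Bind first (port 0 = OS-assigned ephemeral port); the address exists
  // only after the bind, so reporting to the NM must come second.
  void bindThenReport(NodeManagerClient nm, String appId) throws Exception {
    ServerSocket rpcSocket = new ServerSocket(0);
    String address = InetAddress.getLocalHost().getHostName()
        + ":" + rpcSocket.getLocalPort();
    nm.reportCollectorAddress(appId, address);
  }
}
{code}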
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361300#comment-14361300 ] Jian He commented on YARN-3305: --- looks good overall; why is the unmanagedAM check needed?
{code}
if (!submissionContext.getUnmanagedAM())
{code}
AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation. But AM-used resource is updated with the actual ResourceRequest made by the user. This results in AM container allocation of more than the Max ApplicationMaster Resource. This is because AM-Used is updated with the actual ResourceRequest made by the user while activating the applications, but during allocation of the container the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361565#comment-14361565 ] Rohith commented on YARN-3305: -- Unmanaged applications need not necessarily send an AM RR (ResourceRequest), because the RM won't allocate a container for the AM and start it. Instead, the RM expects the AM to be launched and to connect to the RM within the AM liveliness period. So for unmanaged applications the RR can be null, which would cause an NPE while normalizing the RR. AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation. But AM-used resource is updated with the actual ResourceRequest made by the user. This results in AM container allocation of more than the Max ApplicationMaster Resource. This is because AM-Used is updated with the actual ResourceRequest made by the user while activating the applications, but during allocation of the container the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
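The normalization at issue is just rounding a request up to the next multiple of the scheduler's minimum allocation; a self-contained sketch of that arithmetic (method name assumed):
{code}
// Round the requested memory up to a multiple of the minimum allocation, so
// AM-used accounting matches what the scheduler will actually hand out.
static int normalizeMemory(int requestedMb, int minAllocMb) {
  if (requestedMb <= 0) {
    return minAllocMb; // never account below the floor
  }
  return ((requestedMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
}
// e.g. normalizeMemory(300, 1024) == 1024: a sub-minimum AM request must be
// accounted at the normalized size, which is the mismatch this JIRA fixes.
{code}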
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361586#comment-14361586 ] Jian He commented on YARN-3273: --- thanks Rohith ! overall looks great !
- synchronization of userResourceLimit - I think we can use the volatile keyword for userResourceLimit and do not need the synchronized keyword
{code}
public Resource getUserResourceLimit() {
  return userResourceLimit;
}

public synchronized void setUserResourceLimit(Resource userResourceLimit) {
  this.userResourceLimit = userResourceLimit;
}
{code}
- SchedulerCommonInfo -> SchedulerInfo - I think it’s fine to store the below info with the Resource type, i.e. “private Resource minAllocResource”; similarly on the UI, we can expose it as minimum allocation resource
{code}
protected long minAllocMemory;
protected long maxAllocMemory;
protected long minAllocVirtualCores;
protected long maxAllocVirtualCores;
{code}
- just for better code readability, we may use a variable to store entry.getValue()
{code}
usersToReturn.add(new UserInfo(entry.getKey(),
    Resources.clone(entry.getValue().getUsed()),
    entry.getValue().getActiveApplications(),
    entry.getValue().getPendingApplications(),
    Resources.clone(entry.getValue().getConsumedAMResources()),
    Resources.clone(entry.getValue().getUserResourceLimit())));
{code}
- the headroom rendering should be inside the webUiType.equals(YarnWebParams.RM_WEB_UI) check.
{code}
// TODO Need to get HeadRoom from scheduler and render it web ui
{code}
bq. I think headroom can be in RMAppAttemptMetric and render only if attempt is running.
sounds good to me. Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
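A sketch of the volatile alternative suggested above (class context omitted): because the setter only publishes a reference, a volatile field is enough for readers to see the latest value without locking:
{code}
private volatile Resource userResourceLimit;

public Resource getUserResourceLimit() {
  return userResourceLimit;
}

public void setUserResourceLimit(Resource userResourceLimit) {
  this.userResourceLimit = userResourceLimit; // volatile write publishes safely
}
{code}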
[jira] [Commented] (YARN-3284) Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command
[ https://issues.apache.org/jira/browse/YARN-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361572#comment-14361572 ] Hadoop QA commented on YARN-3284: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704544/YARN-3284.2.patch against trunk revision 6fdef76. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 13 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1153 javac compiler warnings (more than the trunk's current 1152 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6960//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6960//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6960//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6960//console This message is automatically generated. Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command - Key: YARN-3284 URL: https://issues.apache.org/jira/browse/YARN-3284 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3284.1.patch, YARN-3284.1.patch, YARN-3284.2.patch Currently, we have some extra metrics about the application and current attempt in RM Web UI. We should expose that information through YARN Command, too. 1. Preemption metrics 2. application outstanding resource requests 3. container locality info -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361590#comment-14361590 ] Jian He commented on YARN-3305: --- ah, right. forgot about that. will commit this. thanks ! AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation. But AM-used resource is updated with the actual ResourceRequest made by the user. This results in AM container allocation of more than the Max ApplicationMaster Resource. This is because AM-Used is updated with the actual ResourceRequest made by the user while activating the applications, but during allocation of the container the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.17.patch With support for configuration via the scheduler's config file Implement a Fair SchedulerOrderingPolicy Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch Implement a Fair SchedulerOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
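As a rough illustration of the least-current-usage preference this policy implements (SchedulerProcess here is a stand-in interface, not the real abstraction from this patch series):
{code}
import java.util.Comparator;

interface SchedulerProcess {
  long getCurrentUsage(); // e.g. memory currently allocated, in MB
}

class FairOrdering {
  // Allocate to the process with the least current usage first, similar in
  // spirit to the FairScheduler's FairSharePolicy.
  static final Comparator<SchedulerProcess> LEAST_USAGE_FIRST =
      Comparator.comparingLong(SchedulerProcess::getCurrentUsage);
}
{code}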
[jira] [Commented] (YARN-3284) Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command
[ https://issues.apache.org/jira/browse/YARN-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361607#comment-14361607 ] Rohith commented on YARN-3284: -- Hi [~xgong], thanks for working on this Jira. For displaying the application headroom for running applications (one of the points in YARN-3273), it is required to expose an applicationHeadroom field in ApplicationAttemptMetrics.java. Would you mind adding this field in your patch to help retrieve the headroom? Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command - Key: YARN-3284 URL: https://issues.apache.org/jira/browse/YARN-3284 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3284.1.patch, YARN-3284.1.patch, YARN-3284.2.patch Currently, we have some extra metrics about the application and current attempt in RM Web UI. We should expose that information through YARN Command, too. 1. Preemption metrics 2. application outstanding resource requests 3. container locality info -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
[ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361338#comment-14361338 ] Hadoop QA commented on YARN-3212: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704518/YARN-3212-v1.patch against trunk revision 6fdef76. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestNodesPage org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6958//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6958//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6958//console This message is automatically generated. RMNode State Transition Update with DECOMMISSIONING state - Key: YARN-3212 URL: https://issues.apache.org/jira/browse/YARN-3212 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Junping Du Assignee: Junping Du Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch As proposed in YARN-914, a new state of “DECOMMISSIONING” will be added and can be transitioned from the “running” state, triggered by a new event - “decommissioning”. This new state can transition to the “decommissioned” state on Resource_Update if there are no running apps on this NM, when the NM reconnects after restart, or when it receives a DECOMMISSIONED event (after timeout from the CLI). In addition, it can go back to “running” if the user decides to cancel the previous decommission by calling recommission on the same node. The reaction to other events is similar to the RUNNING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3284) Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command
[ https://issues.apache.org/jira/browse/YARN-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3284: Attachment: YARN-3284.2.patch Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command - Key: YARN-3284 URL: https://issues.apache.org/jira/browse/YARN-3284 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3284.1.patch, YARN-3284.1.patch, YARN-3284.2.patch Currently, we have some extra metrics about the application and the current attempt in the RM Web UI. We should expose that information through the YARN command, too. 1. Preemption metrics 2. application outstanding resource requests 3. container locality info -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
[ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3212: - Attachment: YARN-3212-v1.patch Uploaded the first patch for the core state changes for decommissioning. For RMNodeEventType, I would prefer DECOMMISSION_WITH_DELAY over DECOMMISSION_WITH_TIMEOUT, as in my comments on YARN-3225. I may update this later if those comments are adopted. RMNode State Transition Update with DECOMMISSIONING state - Key: YARN-3212 URL: https://issues.apache.org/jira/browse/YARN-3212 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Junping Du Assignee: Junping Du Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch As proposed in YARN-914, a new state, “DECOMMISSIONING”, will be added; it is entered from the “running” state, triggered by a new event - “decommissioning”. This new state can transition to the “decommissioned” state on Resource_Update if there are no running apps on this NM, or when the NM reconnects after restart, or when it receives a DECOMMISSIONED event (after the timeout from the CLI). In addition, it can go back to “running” if the user decides to cancel the previous decommission by calling recommission on the same node. The reaction to other events is similar to the RUNNING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
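Since the description above enumerates the transitions in prose, here is a tiny, self-contained model of them; the event names paraphrase the JIRA text and are not the actual RMNodeEventType constants, and the real implementation lives in RMNodeImpl's state machine:
{code}
public class DecommissioningModel {
  enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }
  enum NodeEvent { GRACEFUL_DECOMMISSION, RESOURCE_UPDATE_NO_APPS,
                   RECONNECTED, DECOMMISSIONED_TIMEOUT, RECOMMISSION }

  static NodeState transition(NodeState state, NodeEvent event) {
    switch (state) {
      case RUNNING:
        return event == NodeEvent.GRACEFUL_DECOMMISSION
            ? NodeState.DECOMMISSIONING : state;
      case DECOMMISSIONING:
        switch (event) {
          case RESOURCE_UPDATE_NO_APPS:  // no running apps left on this NM
          case RECONNECTED:              // NM reconnects after restart
          case DECOMMISSIONED_TIMEOUT:   // DECOMMISSIONED event after CLI timeout
            return NodeState.DECOMMISSIONED;
          case RECOMMISSION:             // user cancels the decommission
            return NodeState.RUNNING;
          default:                       // other events behave as in RUNNING
            return state;
        }
      default:
        return state;
    }
  }

  public static void main(String[] args) {
    NodeState s = transition(NodeState.RUNNING, NodeEvent.GRACEFUL_DECOMMISSION);
    System.out.println(s);                                     // DECOMMISSIONING
    System.out.println(transition(s, NodeEvent.RECOMMISSION)); // RUNNING
  }
}
{code}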
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361409#comment-14361409 ] Vinod Kumar Vavilapalli commented on YARN-1963: --- Assuming integers are supported: - Do we have a range? Otherwise, nothing stops users from setting their priority to INTEGER_MAX and everybody scratching their heads. - If we have a range, which side is up? Is -20 > 20 like unix (which isn't intuitive at all to me), or -20 < 20 (intuitive)? - Either way, it is an implicit decision that needs to be documented and told to users explicitly. Labels convey that without any of that. - What does a negative priority even mean, anyway? - An admin comes and says "I need a new super-high priority"; now your ranges need to be dynamically size-able. I don't see a difference between, say, 10 priorities and 10 labeled priorities, other than that labels are better in the following ways: - They are more *human readable* on the UI and CLIs: "This app has priority 19" doesn't give as much feedback as "This app has HIGH priority". - Even if we don't want them now, you can let admins create new priorities between two existing ones, create a new priority lower than the lowest easily, etc. With integers, you start with 0-10, then adding one more lower than them all takes them into negative priorities' territory, making it all confusing. - Specifying restrictions is very straightforward: for a root.engineering queue, VERY_HIGH can only be used by (u1, u2, g1), HIGH by (u3, u4), and everything else by everyone. The way I see it, we will provide a predefined set of labeled priorities that should work for 80% of the clusters; the remaining can define their own set. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: 0001-YARN-1963-prototype.patch, YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
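To make the comparison concrete, a sketch of the labeled-priority idea under discussion; the label names and ordering here are illustrative assumptions, not a YARN API:
{code}
import java.util.Arrays;
import java.util.List;

public class LabeledPrioritySketch {
  // Order encodes precedence. An admin could insert a new label between two
  // existing ones (or below the lowest) without pushing anyone into
  // negative-number territory, which is the flexibility argued for above.
  static final List<String> ORDERED_LABELS =
      Arrays.asList("LOW", "NORMAL", "HIGH", "VERY_HIGH");

  static int compare(String a, String b) {
    return Integer.compare(ORDERED_LABELS.indexOf(a), ORDERED_LABELS.indexOf(b));
  }

  public static void main(String[] args) {
    System.out.println(compare("HIGH", "LOW") > 0); // true: HIGH outranks LOW
  }
}
{code}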
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.17.patch With support for configuration via the scheduler's config file Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361274#comment-14361274 ] Jian He commented on YARN-3243: --- One thing: the approach of subtracting all reserved resources in order to pass the various limit checks and dive down into the sub-queues may cause a lot of dry loops; that can be fixed separately. Patch looks good to me. +1 CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example: {code} A (usage=54, max=55) / \ A1 A2 (usage=1, max=55) (usage=53, max=53) {code} Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each class; *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
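The headroom rule from the proposal is easy to state in code. A sketch with plain longs standing in for YARN's Resource arithmetic, using the numbers from the example above:
{code}
public class HeadroomSketch {
  // child.headroom = min(parent.headroom, parent.max - parent.used),
  // applied top-down so every ancestor's limit is enforced.
  static long childHeadroom(long parentHeadroom, long parentMax, long parentUsed) {
    return Math.min(parentHeadroom, parentMax - parentUsed);
  }

  public static void main(String[] args) {
    long rootHeadroom = Long.MAX_VALUE;  // assume no tighter ancestor above A
    // A has usage=54, max=55, so a child of A may only allocate 1 more,
    // no matter how loose the child's own max is.
    System.out.println(childHeadroom(rootHeadroom, 55, 54)); // 1
  }
}
{code}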
[jira] [Commented] (YARN-3345) Add non-exclusive node label RMAdmin CLI/API
[ https://issues.apache.org/jira/browse/YARN-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361397#comment-14361397 ] Hadoop QA commented on YARN-3345: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704531/YARN-3345.1.patch against trunk revision 6fdef76. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1153 javac compiler warnings (more than the trunk's current 1152 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.nodelabels.TestFileSystemNodeLabelsStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6959//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6959//artifact/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6959//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6959//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6959//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6959//console This message is automatically generated. Add non-exclusive node label RMAdmin CLI/API Key: YARN-3345 URL: https://issues.apache.org/jira/browse/YARN-3345 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3345.1.patch As described in YARN-3214 (see the design doc attached to that JIRA), we need to add the non-exclusive node label RMAdmin API and CLI implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3345) Add non-exclusive node label RMAdmin CLI/API
[ https://issues.apache.org/jira/browse/YARN-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3345: - Attachment: YARN-3345.1.patch Attached ver.1 patch. Add non-exclusive node label RMAdmin CLI/API Key: YARN-3345 URL: https://issues.apache.org/jira/browse/YARN-3345 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3345.1.patch As described in YARN-3214 (see the design doc attached to that JIRA), we need to add the non-exclusive node label RMAdmin API and CLI implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361353#comment-14361353 ] Junping Du commented on YARN-3039: -- bq. Yes, I get the reasoning for annotating individual methods. My concern is more about the new classes. Note that we're still evolving even the class names. This might be a fine point, but I feel we should annotate the new classes at least as unstable for now in addition to the method annotations. Thoughts? Agree. I think in the v5 patch I tried to mark all interfaces (including some abstract classes; we don't need to mark implementations because they follow the same as their parent class/interface) with either Evolving or Unstable. Please let me know if I missed something there. bq. So you mean it will be backward compatible, right? Yes, that is what I mean. bq. NM doesn't know the RPC port for the per-all collector in the special container until ... the special containers tells it. This is not a problem with the current per-node collector container situation. Makes sense. That's also a good reason to keep the NM as the RPC server and the aggregator(collector)Collection as the client. [Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
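The class-level annotations being discussed are the standard Hadoop ones; for illustration (the class name here is made up):
{code}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Marking a still-evolving class as Private/Unstable signals that both its
// name and its surface may change; individual methods can carry their own
// annotations as well.
@InterfaceAudience.Private
@InterfaceStability.Unstable
public abstract class TimelineAggregatorService {
}
{code}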
[jira] [Updated] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3294: Attachment: apache-yarn-3294.1.patch Uploading patch again to kick off jenkins. Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch, apache-yarn-3294.1.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
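The idea in this issue maps onto log4j 1.x (which the RM uses) fairly directly. A hedged sketch only, with the log file name and wiring being illustrative assumptions rather than the attached patch:
{code}
import java.util.Timer;
import java.util.TimerTask;
import org.apache.log4j.FileAppender;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

public class CapacitySchedulerDebugDump {
  public static void dump(long periodMs) throws Exception {
    final Logger cs = Logger.getLogger(
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity");
    final Level previous = cs.getLevel();          // may be null (inherited)
    final FileAppender appender = new FileAppender(
        new PatternLayout("%d{ISO8601} %p %c: %m%n"),
        "yarn-cs-debug.log");                      // separate dump file
    cs.addAppender(appender);
    cs.setLevel(Level.DEBUG);                      // scheduler only, not the whole RM
    new Timer(true).schedule(new TimerTask() {
      @Override public void run() {                // restore after the window
        cs.setLevel(previous);
        cs.removeAppender(appender);
        appender.close();
      }
    }, periodMs);
  }
}
{code}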
[jira] [Updated] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3336: Attachment: YARN-3336.003.patch FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch, YARN-3336.002.patch, YARN-3336.003.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens =
      proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happens when FileSystem.get(getConfig()) is called with a new proxy user, because createProxyUser will always create a new Subject. The calling sequence is FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals will compare the ugi:
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority =
      uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
UserGroupInformation.equals will compare the subject by reference:
{code}
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
    return subject == ((UserGroupInformation) o).subject;
  }
}
{code}
So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem will be created and a new entry will be added to FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
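One way the leak could be avoided (a sketch only, not necessarily what the attached patch does): release the FileSystem instances cached for the throwaway proxy UGI once the tokens are fetched, using the existing FileSystem.closeAllForUGI API:
{code}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

public class TokenFetchSketch {
  static Token<?>[] obtainSystemTokensForUser(String user,
      final Configuration conf, final Credentials credentials) throws Exception {
    final UserGroupInformation proxyUser = UserGroupInformation
        .createProxyUser(user, UserGroupInformation.getLoginUser());
    try {
      return proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(conf).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
    } finally {
      // Evict the CACHE entries keyed on the one-off proxy UGI so they can
      // be garbage collected instead of accumulating once per call.
      FileSystem.closeAllForUGI(proxyUser);
    }
  }
}
{code}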
[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360032#comment-14360032 ] Naganarasimha G R commented on YARN-3326: - Hi [~vvasudev], consider the scenario where the user wants all labels and hence does not pass any labels. In that case the URL will be just /nodes, which I feel is not so good. Your thoughts? ReST support for getLabelsToNodes -- Key: YARN-3326 URL: https://issues.apache.org/jira/browse/YARN-3326 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-3326.20150310-1.patch REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360007#comment-14360007 ] Varun Vasudev commented on YARN-3326: - How about /nodes?labels=label1,label2 etc? If I understand it right - you want to give a list of labels and get the nodes back for those labels, so /nodes?labels= form should work? ReST support for getLabelsToNodes -- Key: YARN-3326 URL: https://issues.apache.org/jira/browse/YARN-3326 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-3326.20150310-1.patch REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360025#comment-14360025 ] zhihai xu commented on YARN-3336: - I uploaded a new patch YARN-3336.003.patch to fix the test failure due to the change in FileSystem. FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch, YARN-3336.002.patch, YARN-3336.003.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens =
      proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happens when FileSystem.get(getConfig()) is called with a new proxy user, because createProxyUser will always create a new Subject. The calling sequence is FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals will compare the ugi:
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority =
      uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
UserGroupInformation.equals will compare the subject by reference:
{code}
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
    return subject == ((UserGroupInformation) o).subject;
  }
}
{code}
So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem will be created and a new entry will be added to FileSystem.CACHE.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3263) ContainerManagerImpl#parseCredentials don't rewind the ByteBuffer after credentials.readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu resolved YARN-3263. - Resolution: Not a Problem This is not an issue. tokens.rewind() is called before credentials.readTokenStorageStream(buf), which has the same effect as rewinding after readTokenStorageStream. Also, no other place accesses the tokens except parseCredentials. ContainerManagerImpl#parseCredentials don't rewind the ByteBuffer after credentials.readTokenStorageStream -- Key: YARN-3263 URL: https://issues.apache.org/jira/browse/YARN-3263 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu ContainerManagerImpl#parseCredentials doesn't rewind the ByteBuffer after credentials.readTokenStorageStream, so the next time we access the tokens, we will get an EOFException. The following is the code for parseCredentials in ContainerManagerImpl:
{code}
private Credentials parseCredentials(ContainerLaunchContext launchContext)
    throws IOException {
  Credentials credentials = new Credentials();
  // Parse credentials
  ByteBuffer tokens = launchContext.getTokens();
  if (tokens != null) {
    DataInputByteBuffer buf = new DataInputByteBuffer();
    tokens.rewind();
    buf.reset(tokens);
    credentials.readTokenStorageStream(buf);
    if (LOG.isDebugEnabled()) {
      for (Token<? extends TokenIdentifier> tk : credentials.getAllTokens()) {
        LOG.debug(tk.getService() + " = " + tk.toString());
      }
    }
  }
  // End of parsing credentials
  return credentials;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
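A self-contained demonstration of why the resolution holds: rewinding before each read resets the position to 0, so the read never starts at end-of-buffer regardless of what a previous consumer did:
{code}
import java.nio.ByteBuffer;

public class RewindDemo {
  public static void main(String[] args) {
    ByteBuffer tokens = ByteBuffer.wrap(new byte[] {1, 2, 3});
    tokens.get();
    tokens.get();                            // simulate a previous consumer
    tokens.rewind();                         // rewind-before-read, as in parseCredentials
    System.out.println(tokens.remaining());  // 3: the full payload is visible again
  }
}
{code}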
[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360054#comment-14360054 ] Varun Vasudev commented on YARN-3326: - Will /nodes?labels=* work? ReST support for getLabelsToNodes -- Key: YARN-3326 URL: https://issues.apache.org/jira/browse/YARN-3326 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-3326.20150310-1.patch REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3261) rewrite resourcemanager restart doc to remove roadmap bits
[ https://issues.apache.org/jira/browse/YARN-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360129#comment-14360129 ] Gururaj Shetty commented on YARN-3261: -- Hi [~aw]/[~rohithsharma], Kindly review the patch attached. rewrite resourcemanager restart doc to remove roadmap bits --- Key: YARN-3261 URL: https://issues.apache.org/jira/browse/YARN-3261 Project: Hadoop YARN Issue Type: Bug Reporter: Allen Wittenauer Assignee: Gururaj Shetty Attachments: YARN-3261.01.patch Another mixture of roadmap and instruction manual that seems to be ever present in a lot of the recently written documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3273: - Attachment: 0001-YARN-3273-v2.patch Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360137#comment-14360137 ] Hadoop QA commented on YARN-3294: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704374/apache-yarn-3294.1.patch against trunk revision 387f271. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1177 javac compiler warnings (more than the trunk's current 1152 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.ha.TestActiveStandbyElectorRealZK org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.metrics2.lib.TestMutableMetrics org.apache.hadoop.yarn.server.resourcemanager.TestRMRestTestTests org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6952//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6952//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6952//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6952//console This message is automatically generated. Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch, apache-yarn-3294.1.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360148#comment-14360148 ] Rohith commented on YARN-3273: -- Attached the v2 patch for surfacing scheduler metrics, along with screenshots of the changed UI pages. YARN-3273-am-resource-used-AND-User-limit-v2.PNG shows the following metrics: # A SchedulerMetrics table is added on the front page. This table contains generic scheduler data like schedulerType, schedulerResourceType, and min/max resource allocation. This table can be used in the future to display other common scheduler metrics. # *Used Application Master Resources:* added to each leaf queue's info. # An active-users info table is added per CS#LeafQueue. This displays each user's ResourceLimit, ResourceUsed, AM Resource, AM ResourceUsed and others. Since it is specific to CS, this is added on this page. YARN-3273-application-headroom-v2.PNG: # For headroom, only the display block is added, with empty data. Since headroom is not part of RMAppAttemptMetrics, retrieving this info directly from the scheduler on the fly is tedious; the headroom needs to be stored in either RMApp or RMAttempt state. I think the headroom can be kept in RMAppAttemptMetrics and rendered only if the attempt is running. Any thoughts? Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360155#comment-14360155 ] Varun Vasudev commented on YARN-3326: - In general, REST APIs are supposed to be about resources, and labelsToNodes is not a resource. {quote} getNodeToLabels to /get-node-to-labels replaceLabelsOnNodes to /replace-node-to-labels getClusterNodeLabels to /get-node-labels addToClusterNodeLabels to /add-node-labels removeFromCluserNodeLabels to /remove-node-labels getLabelsOnNode to /nodes/\{nodeId}/get-labels replaceLabelsOnNodes to /replace-node-to-labels ... {quote} are not about resources either, but they're already in, and by adding more APIs of that form we're making things worse. ReST support for getLabelsToNodes -- Key: YARN-3326 URL: https://issues.apache.org/jira/browse/YARN-3326 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-3326.20150310-1.patch REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360146#comment-14360146 ] Naganarasimha G R commented on YARN-3326: - Well [~vvasudev], I feel /nodes would be better than having internal logic for \*, but I checked again and saw that /nodes is already used for getNodes. What about /labelsToNodes or /labels-to-nodes, or do you feel that is too long? ReST support for getLabelsToNodes -- Key: YARN-3326 URL: https://issues.apache.org/jira/browse/YARN-3326 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-3326.20150310-1.patch REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360171#comment-14360171 ] Naganarasimha G R commented on YARN-3326: - [~vvasudev] How about /label-mappings?label=label1,label2,... ? ReST support for getLabelsToNodes -- Key: YARN-3326 URL: https://issues.apache.org/jira/browse/YARN-3326 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-3326.20150310-1.patch REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
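None of the URL shapes in this thread is committed yet; for concreteness, this is how a client would hit the candidate forms (host, port, and paths are all hypothetical pending the outcome of the discussion):
{code}
import java.net.HttpURLConnection;
import java.net.URL;

public class LabelsToNodesQuery {
  public static void main(String[] args) throws Exception {
    // Candidate forms floated above:
    //   /ws/v1/cluster/nodes?labels=label1,label2
    //   /ws/v1/cluster/label-mappings?labels=label1,label2
    URL url = new URL(
        "http://rm-host:8088/ws/v1/cluster/label-mappings?labels=label1,label2");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    System.out.println(conn.getResponseCode());
  }
}
{code}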
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360140#comment-14360140 ] Hadoop QA commented on YARN-3273: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704390/YARN-3273-application-headroom-v2.PNG against trunk revision 387f271. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6953//console This message is automatically generated. Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3261) rewrite resourcemanager restart doc to remove roadmap bits
[ https://issues.apache.org/jira/browse/YARN-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gururaj Shetty updated YARN-3261: - Attachment: YARN-3261.01.patch rewrite resourcemanager restart doc to remove roadmap bits --- Key: YARN-3261 URL: https://issues.apache.org/jira/browse/YARN-3261 Project: Hadoop YARN Issue Type: Bug Reporter: Allen Wittenauer Assignee: Gururaj Shetty Attachments: YARN-3261.01.patch Another mixture of roadmap and instruction manual that seems to be ever present in a lot of the recently written documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3273: - Attachment: YARN-3273-application-headroom-v2.PNG YARN-3273-am-resource-used-AND-User-limit-v2.PNG Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360141#comment-14360141 ] Hadoop QA commented on YARN-3336: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704376/YARN-3336.003.patch against trunk revision 387f271. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6951//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6951//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6951//console This message is automatically generated. FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch, YARN-3336.002.patch, YARN-3336.003.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens =
      proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happens when FileSystem.get(getConfig()) is called with a new proxy user, because createProxyUser will always create a new Subject. The calling sequence is FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals will compare the ugi:
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority =
      uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
UserGroupInformation.equals will compare the subject by reference:
{code}
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
    return subject == ((UserGroupInformation) o).subject;
  }
}
{code}
So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem will be created and a new entry will be added to FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3261) rewrite resourcemanager restart doc to remove roadmap bits
[ https://issues.apache.org/jira/browse/YARN-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360197#comment-14360197 ] Hadoop QA commented on YARN-3261: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704385/YARN-3261.01.patch against trunk revision 387f271. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6954//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6954//console This message is automatically generated. rewrite resourcemanager restart doc to remove roadmap bits --- Key: YARN-3261 URL: https://issues.apache.org/jira/browse/YARN-3261 Project: Hadoop YARN Issue Type: Bug Reporter: Allen Wittenauer Assignee: Gururaj Shetty Attachments: YARN-3261.01.patch Another mixture of roadmap and instruction manual that seems to be ever present in a lot of the recently written documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360227#comment-14360227 ] Rohith commented on YARN-3305: -- Updated the patch, correcting the test failures. The JIRA for the TestCapacitySchedulerNodeLabelUpdate failure is YARN-3343. Kindly review the updated patch. AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation. But the AM-used resource is updated with the actual ResourceRequest made by the user. This can result in AM container allocation exceeding the Max ApplicationMaster Resource. This is because AM-Used is updated with the actual ResourceRequest made by the user while activating the applications, but during allocation of the container the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
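A sketch of the accounting mismatch in the description, with plain integers standing in for Resource and a typical round-up-to-minimum normalization (an assumption; the exact normalization in CS also considers allocation increments):
{code}
public class AmNormalizationSketch {
  // Round the request up to a multiple of the minimum allocation,
  // which is what the scheduler actually hands out.
  static int normalize(int requestedMb, int minAllocMb) {
    return Math.max(minAllocMb,
        ((requestedMb + minAllocMb - 1) / minAllocMb) * minAllocMb);
  }

  public static void main(String[] args) {
    int requested = 100, minAlloc = 1024;
    System.out.println(normalize(requested, minAlloc)); // 1024 really allocated
    // Bug pattern: amUsed += 100 (the raw request) while the cluster hands
    // out 1024, so AM usage is under-counted and more AMs can activate
    // than the Max ApplicationMaster Resource limit should allow.
  }
}
{code}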
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other "short-running" applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360250#comment-14360250 ] Hudson commented on YARN-3154: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #865 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/865/]) YARN-3154. Added additional APIs in LogAggregationContext to avoid aggregating running logs of application when rolling is enabled. Contributed by Xuan Gong. (vinodkv: rev 863079bb874ba77918ca1c0741eae10e245995c8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LogAggregationContextPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360251#comment-14360251 ] Hudson commented on YARN-3338: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #865 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/865/]) YARN-3338. Exclude jline dependency from YARN. Contributed by Zhijie (xgong: rev 06ce1d9a6cd9bec25e2f478b98264caf96a3ea44) * hadoop-project/pom.xml * hadoop-yarn-project/CHANGES.txt Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other "short-running" applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360237#comment-14360237 ] Hudson commented on YARN-3154: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #131 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/131/]) YARN-3154. Added additional APIs in LogAggregationContext to avoid aggregating running logs of application when rolling is enabled. Contributed by Xuan Gong. (vinodkv: rev 863079bb874ba77918ca1c0741eae10e245995c8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LogAggregationContextPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationContext.java Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other "short-running" applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360437#comment-14360437 ] Hudson commented on YARN-3154: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #122 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/122/]) YARN-3154. Added additional APIs in LogAggregationContext to avoid aggregating running logs of application when rolling is enabled. Contributed by Xuan Gong. (vinodkv: rev 863079bb874ba77918ca1c0741eae10e245995c8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LogAggregationContextPBImpl.java Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360415#comment-14360415 ] Mit Desai commented on YARN-2890: - [~hitesh] [~zjshen] Can you guys take a look? MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling the timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
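To illustrate the configuration-driven behavior the issue asks for, a test could opt in via {{YarnConfiguration}} instead of a constructor flag. A minimal sketch, assuming a plain single-argument {{MiniMRYarnCluster}} constructor; the test name is made up:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineMiniClusterSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Request the timeline service via configuration; with this fix the
    // mini cluster would consult this flag when deciding whether to start it.
    conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, true);

    MiniMRYarnCluster cluster = new MiniMRYarnCluster("timeline-test");
    cluster.init(conf);  // MiniMRYarnCluster is a CompositeService
    cluster.start();
    // ... run jobs against the cluster, then cluster.stop() ...
  }
}
{code}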
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360438#comment-14360438 ] Hudson commented on YARN-3338: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #122 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/122/]) YARN-3338. Exclude jline dependency from YARN. Contributed by Zhijie (xgong: rev 06ce1d9a6cd9bec25e2f478b98264caf96a3ea44) * hadoop-yarn-project/CHANGES.txt * hadoop-project/pom.xml Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other short-running applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360430#comment-14360430 ] Hudson commented on YARN-3154: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2063 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2063/]) YARN-3154. Added additional APIs in LogAggregationContext to avoid aggregating running logs of application when rolling is enabled. Contributed by Xuan Gong. (vinodkv: rev 863079bb874ba77918ca1c0741eae10e245995c8)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationContext.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LogAggregationContextPBImpl.java
Should not upload partial logs for MR jobs or other short-running applications Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running an MR job and we do not set the log interval properly, its partial logs will be uploaded and then removed from the local filesystem, which is not right. Partial logs should only be uploaded for LRS (long-running service) applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360431#comment-14360431 ] Hudson commented on YARN-3338: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2063 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2063/]) YARN-3338. Exclude jline dependency from YARN. Contributed by Zhijie (xgong: rev 06ce1d9a6cd9bec25e2f478b98264caf96a3ea44) * hadoop-yarn-project/CHANGES.txt * hadoop-project/pom.xml Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3305: - Attachment: 0002-YARN-3305.patch AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation, but the AM-used resource is updated with the actual ResourceRequest made by the user. This can result in AM container allocations exceeding the Max ApplicationMaster Resource limit: AM-used is updated with the user's original ResourceRequest while activating the application, yet during container allocation the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
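To make the mismatch concrete, here is a minimal sketch of the normalization step the description refers to, using the public {{Resources}}/{{ResourceCalculator}} utilities; the specific sizes are illustrative only:
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class NormalizationSketch {
  public static void main(String[] args) {
    ResourceCalculator rc = new DefaultResourceCalculator();
    Resource min = Resource.newInstance(1024, 1);  // minimumAllocation
    Resource max = Resource.newInstance(8192, 8);
    Resource asked = Resource.newInstance(512, 1); // AM asks below minimum

    // What the scheduler actually allocates: the request rounded up to the
    // minimum allocation (and to the increment, here equal to the minimum).
    Resource allocated = Resources.normalize(rc, asked, min, max, min);

    // The mismatch described above: AM-used is charged with `asked` (512 MB)
    // although the container really occupies `allocated` (1024 MB), so the
    // queue can activate more AMs than its AM resource limit should allow.
    System.out.println("asked=" + asked + ", allocated=" + allocated);
  }
}
{code}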
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360238#comment-14360238 ] Hudson commented on YARN-3338: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #131 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/131/]) YARN-3338. Exclude jline dependency from YARN. Contributed by Zhijie (xgong: rev 06ce1d9a6cd9bec25e2f478b98264caf96a3ea44) * hadoop-yarn-project/CHANGES.txt * hadoop-project/pom.xml Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360297#comment-14360297 ] Hadoop QA commented on YARN-3305: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704406/0002-YARN-3305.patch against trunk revision 387f271.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6955//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6955//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6955//console
This message is automatically generated. AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation, but the AM-used resource is updated with the actual ResourceRequest made by the user. This can result in AM container allocations exceeding the Max ApplicationMaster Resource limit: AM-used is updated with the user's original ResourceRequest while activating the application, yet during container allocation the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other short-running applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360498#comment-14360498 ] Hudson commented on YARN-3154: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #131 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/131/]) YARN-3154. Added additional APIs in LogAggregationContext to avoid aggregating running logs of application when rolling is enabled. Contributed by Xuan Gong. (vinodkv: rev 863079bb874ba77918ca1c0741eae10e245995c8)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationContext.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LogAggregationContextPBImpl.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
Should not upload partial logs for MR jobs or other short-running applications Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running an MR job and we do not set the log interval properly, its partial logs will be uploaded and then removed from the local filesystem, which is not right. Partial logs should only be uploaded for LRS (long-running service) applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
Varun Vasudev created YARN-3348: --- Summary: Add a 'yarn top' tool to help understand cluster usage Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, and possibly labels, and show statistics on container allocation across the cluster to find out which apps are consuming the most resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360364#comment-14360364 ] Hudson commented on YARN-3338: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2081 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2081/]) YARN-3338. Exclude jline dependency from YARN. Contributed by Zhijie (xgong: rev 06ce1d9a6cd9bec25e2f478b98264caf96a3ea44) * hadoop-yarn-project/CHANGES.txt * hadoop-project/pom.xml Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other short-running applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360363#comment-14360363 ] Hudson commented on YARN-3154: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2081 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2081/]) YARN-3154. Added additional APIs in LogAggregationContext to avoid aggregating running logs of application when rolling is enabled. Contributed by Xuan Gong. (vinodkv: rev 863079bb874ba77918ca1c0741eae10e245995c8)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationContext.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LogAggregationContextPBImpl.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
Should not upload partial logs for MR jobs or other short-running applications Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running an MR job and we do not set the log interval properly, its partial logs will be uploaded and then removed from the local filesystem, which is not right. Partial logs should only be uploaded for LRS (long-running service) applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360387#comment-14360387 ] Varun Vasudev commented on YARN-3294: - The findbugs errors are unrelated to the patch. The test failures are also unrelated as per my analysis, and I'm unsure about the javac warnings since they seem to come from files I didn't modify. [~jianhe], can you help me out and take a look at the patch? Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch, apache-yarn-3294.1.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler, for a fixed period of time (1 min, 5 min, or so), into a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the ResourceManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360591#comment-14360591 ] Devaraj K commented on YARN-3225: - In my previous comment I was describing a scenario with two RMAdmin CLIs and an increased timeout value: one CLI issues the command with a timeout of, say, x and keeps waiting for that timeout to expire; during this time another CLI issues the command with a higher timeout. If we keep the first CLI (with timeout x) running, it will issue the forceful decommission once x elapses, and the new CLI's higher timeout will not take effect. If we add the constraint that the graceful decommission command may only be issued from one RMAdmin CLI at a time, this will not be a problem. New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225.patch, YARN-914.patch A new CLI (or an existing CLI with new parameters) should put each node on the decommission list into the decommissioning status and track the timeout, terminating the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360586#comment-14360586 ] Rohith commented on YARN-3305: -- The failed test is unrelated to this patch. AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation, but the AM-used resource is updated with the actual ResourceRequest made by the user. This can result in AM container allocations exceeding the Max ApplicationMaster Resource limit: AM-used is updated with the user's original ResourceRequest while activating the application, yet during container allocation the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360605#comment-14360605 ] Zhijie Shen commented on YARN-2854: --- Hi Naga, thanks for updating the patch. Almost good to me. Some additional comments.
1. The formatting is wrong around the following sentence (on the built web page). It seems to be a problem with the quote marks. Would you please double-check?
{code}
Developers can define what information they want to record for their applications by composing `TimelineEntity and `TimelineEvent` objects, and put the entities and events to the Timeline server via `TimelineClient`. Below is an example:
{code}
2. How about changing "Publishing of per-framework data by applications" to "Publishing of application-specific data"?
3. In "Current Status", shall we also mention that we're rolling out the next-generation timeline service as a scalable solution?
The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
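For reference, the flow the quoted documentation sentence describes looks roughly like the following. A minimal sketch against the {{TimelineClient}} API; the entity and event type names are hypothetical:
{code}
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEvent;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelinePublishSketch {
  public static void main(String[] args) throws Exception {
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      // Compose an entity with one event and publish it.
      TimelineEntity entity = new TimelineEntity();
      entity.setEntityType("MY_APP_INFO");    // hypothetical entity type
      entity.setEntityId("entity_1");         // hypothetical entity id
      entity.setStartTime(System.currentTimeMillis());

      TimelineEvent event = new TimelineEvent();
      event.setEventType("MY_EVENT");         // hypothetical event type
      event.setTimestamp(System.currentTimeMillis());
      entity.addEvent(event);

      client.putEntities(entity);             // put to the timeline server
    } finally {
      client.stop();
    }
  }
}
{code}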
[jira] [Commented] (YARN-3306) [Umbrella] Proposing per-queue Policy driven scheduling in YARN
[ https://issues.apache.org/jira/browse/YARN-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360652#comment-14360652 ] Craig Welch commented on YARN-3306: --- Thanks for your thoughts, [~kasha]. The immediate proposal is to begin adding new functionality in a fashion that can easily be shared across scheduler implementations and mixed together in a single cluster. The first case is to support container-assignment and preemption orderings in addition to FIFO for applications in the capacity scheduler, and potentially the fair scheduler, using the same code; over time this is expected to expand to cover queue relationships and potentially other behaviors (limits, etc.). The hope is that this lets us iterate toward a state where the various behaviors of the schedulers can be mixed, matched, and shared across implementations, rather than trying to accomplish all of this in one go, and lets us realize the benefit of mixing and matching some of the features earlier, along the way. I suspect that at some point we'll hit a critical mass where enough of the functionality has been extracted into sharable components, and where we've established an understanding of how these compose well; we can then take that as an inflection point and go down the path you are suggesting: introduce a new scheduler to house the policies, complete the picture, and deprecate the others. That's by no means the only possible conclusion, but it seems a good and/or likely one. [Umbrella] Proposing per-queue Policy driven scheduling in YARN --- Key: YARN-3306 URL: https://issues.apache.org/jira/browse/YARN-3306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: PerQueuePolicydrivenschedulinginYARN.pdf Scheduling layout in Apache Hadoop YARN today is very coarse-grained. This proposal aims at converting today's rigid scheduling in YARN to a per-queue, policy-driven architecture. We propose the creation of a common policy framework and the implementation of a common set of policies that administrators can pick and choose from per queue:
- Make scheduling policies configurable per queue
- Initially, we limit ourselves to a new type of scheduling policy that determines the ordering of applications within the leaf queue
- In the near future, we will also pursue parent-queue-level policies and potential algorithm reuse through a separate type of policies that control resource limits per queue, user, application, etc.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
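To give a feel for the leaf-queue ordering policy the proposal describes, here is a hypothetical sketch of the kind of pluggable interface such a framework might expose. The names are illustrative only, not YARN's actual classes:
{code}
import java.util.Iterator;

// Hypothetical sketch: the policy only decides the order in which a leaf
// queue's applications are offered containers (or considered for
// preemption); all other scheduling logic stays in the scheduler itself.
interface ApplicationOrderingPolicy<T> {
  void addSchedulableEntity(T app);
  void removeSchedulableEntity(T app);
  // Order in which apps are considered for container assignment,
  // e.g. FIFO by submission time or fair-share based.
  Iterator<T> getAssignmentIterator();
  // Order in which apps are considered when preempting resources,
  // typically the reverse of the assignment order.
  Iterator<T> getPreemptionIterator();
}
{code}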
[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360650#comment-14360650 ] Naganarasimha G R commented on YARN-2854: - Thanks [~zjshen] for the review. For the 3rd point, I had mentioned the same in these words: {{In subsequent releases we will be rolling out next generation timeline service which is scalable and reliable}}. So did you want the wording as you suggested, or had you missed what I had mentioned? I will correct the others in a while ... The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)