[jira] [Updated] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM
[ https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-3443: Attachment: YARN-3443.004.patch Patch with documentation fixes. Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM - Key: YARN-3443 URL: https://issues.apache.org/jira/browse/YARN-3443 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3443.001.patch, YARN-3443.002.patch, YARN-3443.003.patch, YARN-3443.004.patch The current cgroups implementation is closely tied to supporting CPU as a resource. We need to separate out cgroups support as well as provide a simple ResourceHandler subsystem that will enable us to add support for new resource types on the NM - e.g. network, disk, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
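For context, a rough sketch of what such a pluggable handler interface could look like, inferred from the preStart/reacquireContainer/postComplete calls quoted in the review comments further down this digest; the method set and signatures here are illustrative, not the committed API:
{code}
// Illustrative sketch only: one handler per resource type (cpu, network, disk, ...),
// chained together by the container executor. Types such as PrivilegedOperation and
// ResourceHandlerException are the ones referenced elsewhere in this thread.
public interface ResourceHandler {
  /** One-time setup at NM start (e.g. mount/verify cgroup hierarchies). */
  List<PrivilegedOperation> bootstrap(Configuration configuration)
      throws ResourceHandlerException;

  /** Operations to apply before a container is launched. */
  List<PrivilegedOperation> preStart(Container container)
      throws ResourceHandlerException;

  /** Re-attach to an already-running container after an NM restart. */
  List<PrivilegedOperation> reacquireContainer(ContainerId containerId)
      throws ResourceHandlerException;

  /** Per-container cleanup once the container completes. */
  List<PrivilegedOperation> postComplete(ContainerId containerId)
      throws ResourceHandlerException;

  /** Node-level cleanup at NM shutdown. */
  List<PrivilegedOperation> teardown() throws ResourceHandlerException;
}
{code}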
[jira] [Updated] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-3366: Attachment: YARN-3366.002.patch Uploading a patch that includes changes to YarnConfiguration.java Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3366.001.patch, YARN-3366.002.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482713#comment-14482713 ] Hadoop QA commented on YARN-3021: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723545/YARN-3021.007.patch against trunk revision 3fb5abf. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.lib.input.TestLineRecordReader org.apache.hadoop.mapred.TestLineRecordReader org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore org.apache.hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStorePerf org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7233//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7233//console This message is automatically generated. 
YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Assignee: Yongjun Zhang Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
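A minimal sketch of the behavior change described above - attempt the renewal, but tolerate failure and skip scheduling further renewals rather than failing the submission. The method names (renewToken, scheduleRenewal) are illustrative, not the actual DelegationTokenRenewer API:
{code}
// Illustrative sketch, not the actual RM code: tolerate a failed renewal at
// submission time instead of rejecting the application outright.
try {
  renewToken(token, applicationId);       // validate the token by renewing it once
  scheduleRenewal(token, applicationId);  // periodic renewal, as before
} catch (IOException e) {
  // With one-way cross-realm trust (A and B each trusting COMMON only), realm B
  // will not accept the RM principal from realm A as a renewer. Log and move on,
  // skipping automatic renewal, instead of bubbling the error back to the client.
  LOG.warn("Token renewal failed for " + applicationId
      + "; skipping automatic renewal for this token", e);
}
{code}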
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482789#comment-14482789 ] Varun Vasudev commented on YARN-3366: - Thanks for the patch [~sidharta-s]! Feedback below.
# In YarnConfiguration.java
{noformat}
 /**
- * True if linux-container-executor should limit itself to one user
+ * If linux-container-executor should limit itself to one user
  * when running in non-secure mode.
  */
- public static final String NM_NONSECURE_MODE_LIMIT_USERS = NM_PREFIX +
+ public static final String NM_NONSECURE_MODE_LIMIT_USERS= NM_PREFIX +
      "linux-container-executor.nonsecure-mode.limit-users";
- public static final boolean DEFAULT_NM_NONSECURE_MODE_LIMIT_USERS = true;
+ public static final boolean DEFAULT_NM_NONSECURE_MODE_LIMIT_USERS = true;
{noformat}
It looks like these are unnecessary changes. Can you please remove them?
# In TrafficController.java
{noformat}
if (LOG.isInfoEnabled()) {
  LOG.info("NM recovery is not enabled.");
}
{noformat}
{noformat}
if (LOG.isInfoEnabled()) {
  LOG.info("TC configuration is incomplete.");
}
{noformat}
Can you change these to debug? It doesn't seem to be something that needs to be logged by the class.
# In TrafficController.java
{noformat}
else {
  if (LOG.isWarnEnabled()) {
    String logLine = new StringBuffer("Failed to match regex: ")
        .append(regex).append(" Current state: ").append(state).toString();
    LOG.warn(logLine);
    return false;
  }
}
{noformat}
Shouldn't the return be outside the warn enabled check?
# In TrafficController.java
{noformat}
//This could happen if the interface is already in its default state.
//Ignoring.
//throw new ResourceHandlerException("Failed to wipe tc state", e);
{noformat}
The comments are in a different block than the warn message. Also, the commented throw is confusing.
# Minor nit - In TrafficController.java, function parseStatsString, the continue isn't really required
# In TrafficControlBandwidthHandlerImpl.java - Unused import: com.google.common.annotations.VisibleForTesting
# In TrafficControlBandwidthHandlerImpl.java
{noformat}
LOG.info("strict mode is set to :" + strictMode);
{noformat}
{noformat}
LOG.info("Attempting to reacquire classId for container: " + containerIdStr);
{noformat}
Change levels to debug?
# In TrafficControlBandwidthHandlerImpl.java
{noformat}
String opArg = new StringBuffer(PrivilegedOperation.CGROUP_ARG_PREFIX)
    .append(tasksFile).toString();
{noformat}
You can use the String class itself instead of StringBuffer?
# In TrafficControlBandwidthHandlerImpl.java
{noformat}
if (LOG.isWarnEnabled()) {
  LOG.warn("teardown(): Nothing to do");
}
{noformat}
Why are you logging a warning?
# In TestTrafficControlBandwidthHandlerImpl.java and TestTrafficController.java
{noformat}
Assert.assertTrue("Caught unexpected ResourceHandlerException!", false);
{noformat}
Use Assert.fail? This pattern is used in multiple places.
# In LinuxContainerExecutor.java
{noformat}
} catch (ResourceHandlerException e) {
+ if (LOG.isWarnEnabled()) {
+    LOG.warn("ResourceHandlerChain.reacquireContainer failed for " +
+        "containerId: " + containerId);
+ }
{noformat}
Can you add the exception to the warn message?
# In LinuxContainerExecutor.java
{noformat}
} catch (ResourceHandlerException e) {
  if (LOG.isWarnEnabled()) {
    LOG.warn(e);
    LOG.warn("ResourceHandlerChain.postComplete failed for " +
        "containerId: " + containerId);
  }
}
{noformat}
Merge the warn messages.
# In LinuxContainerExecutor.java
{noformat}
+command.addAll(Arrays.asList(containerExecutorExe,
{noformat}
Remove the extra space added.
# In LinuxContainerExecutor.java
{noformat}
+String tcCommandFile = null;
+
+try {
+  if (resourceHandlerChain != null) {
+    List<PrivilegedOperation> ops = resourceHandlerChain
+        .preStart(container);
+
+    if (ops != null) {
+      List<PrivilegedOperation> resourceOps = new ArrayList<>();
+
+      resourceOps.add(new PrivilegedOperation
+          (PrivilegedOperation.OperationType.ADD_PID_TO_CGROUP,
+          resourcesOptions));
+
+      for (PrivilegedOperation op : ops) {
+        switch (op.getOperationType()) {
+          case ADD_PID_TO_CGROUP:
+            resourceOps.add(op);
+            break;
+          case TC_MODIFY_STATE:
+            tcCommandFile = op.getArguments().get(0);
+          default:
+            if (LOG.isWarnEnabled()) {
+              LOG.warn("PrivilegedOperation type unsupported in launch: "
+                  + op.getOperationType());
+            }
+            continue;
+        }
+      }
+
+      if (resourceOps.size() > 1) {
+        //squash resource operations
+        try {
+          PrivilegedOperation operation =
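To make a couple of the review points above concrete, here is an illustrative before/after sketch (not taken from the patch) of the Assert.fail suggestion and of a single warn call that carries the exception:
{code}
// Illustrative only - test assertion: Assert.fail replaces assertTrue(msg, false).
try {
  handler.bootstrap(conf);
} catch (ResourceHandlerException e) {
  Assert.fail("Caught unexpected ResourceHandlerException!");
}
{code}
{code}
// Illustrative only - LinuxContainerExecutor: one warn call carrying both the
// message and the exception, instead of two separate LOG.warn calls.
try {
  resourceHandlerChain.postComplete(containerId);
} catch (ResourceHandlerException e) {
  LOG.warn("ResourceHandlerChain.postComplete failed for containerId: "
      + containerId, e);
}
{code}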
[jira] [Commented] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM
[ https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482688#comment-14482688 ] Hadoop QA commented on YARN-3443: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723549/YARN-3443.004.patch against trunk revision 3fb5abf. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7234//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7234//console This message is automatically generated. Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM - Key: YARN-3443 URL: https://issues.apache.org/jira/browse/YARN-3443 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3443.001.patch, YARN-3443.002.patch, YARN-3443.003.patch, YARN-3443.004.patch The current cgroups implementation is closely tied to supporting CPU as a resource. We need to separate out cgroups support as well as provide a simple ResourceHandler subsystem that will enable us to add support for new resource types on the NM - e.g. network, disk, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3404) View the queue name to YARN Application page
[ https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482717#comment-14482717 ] Hadoop QA commented on YARN-3404: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723543/YARN-3404.2.patch against trunk revision 3fb5abf. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7232//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7232//console This message is automatically generated. View the queue name to YARN Application page Key: YARN-3404 URL: https://issues.apache.org/jira/browse/YARN-3404 Project: Hadoop YARN Issue Type: Improvement Reporter: Ryu Kobayashi Assignee: Ryu Kobayashi Priority: Minor Attachments: YARN-3404.1.patch, YARN-3404.2.patch, screenshot.png It want to display the name of the queue that is used to YARN Application page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3102) Decommisioned Nodes not listed in Web UI
[ https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482922#comment-14482922 ] Naganarasimha G R commented on YARN-3102: - Hi [~zhiguohong], I had actually started to work on this patch but was skeptical that YARN-914 (or its subjira's ) might have impact or take care of this issue. Give me couple of days time, will try the check the state of my patch and update you. Decommisioned Nodes not listed in Web UI Key: YARN-3102 URL: https://issues.apache.org/jira/browse/YARN-3102 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Environment: 2 Node Manager and 1 Resource Manager Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Minor Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to yarn.exlude file In RM1 machine Add Yarn.exclude with NM1 Host Name Start the node as listed below NM1,NM2 Resource manager Now check Nodes decommisioned in /cluster/nodes Number of decommisioned node is listed as 1 but Table is empty in /cluster/nodes/decommissioned (detail of Decommision node not shown) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2444) Primary filters added after first submission not indexed, cause exceptions in logs.
[ https://issues.apache.org/jira/browse/YARN-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran reassigned YARN-2444: Assignee: Steve Loughran Primary filters added after first submission not indexed, cause exceptions in logs. --- Key: YARN-2444 URL: https://issues.apache.org/jira/browse/YARN-2444 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.5.0 Reporter: Marcelo Vanzin Assignee: Steve Loughran Attachments: YARN-2444-001.patch, ats.java, org.apache.hadoop.yarn.server.timeline.TestTimelineClientPut-output.txt See attached code for an example. The code creates an entity with a primary filter, submits it to the ATS. After that, a new primary filter value is added and the entity is resubmitted. At that point two things can be seen: - Searching for the new primary filter value does not return the entity - The following exception shows up in the logs: {noformat} 14/08/22 11:33:42 ERROR webapp.TimelineWebServices: Error when verifying access for user dr.who (auth:SIMPLE) on the events of the timeline entity { id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test } org.apache.hadoop.yarn.exceptions.YarnException: Owner information of the timeline entity { id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test } is corrupted. at org.apache.hadoop.yarn.server.timeline.security.TimelineACLsManager.checkAccess(TimelineACLsManager.java:67) at org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:172) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3102) Decommisioned Nodes not listed in Web UI
[ https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482891#comment-14482891 ] Hong Zhiguo commented on YARN-3102: --- I met the same problem. Hi, [~Naganarasimha], can I take this issue? Decommisioned Nodes not listed in Web UI Key: YARN-3102 URL: https://issues.apache.org/jira/browse/YARN-3102 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Environment: 2 Node Manager and 1 Resource Manager Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Minor Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to yarn.exlude file In RM1 machine Add Yarn.exclude with NM1 Host Name Start the node as listed below NM1,NM2 Resource manager Now check Nodes decommisioned in /cluster/nodes Number of decommisioned node is listed as 1 but Table is empty in /cluster/nodes/decommissioned (detail of Decommision node not shown) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-1394) RM to inform AMs when a container completed due to NM going offline -planned or unplanned
[ https://issues.apache.org/jira/browse/YARN-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-1394: Assignee: Rohith RM to inform AMs when a container completed due to NM going offline -planned or unplanned - Key: YARN-1394 URL: https://issues.apache.org/jira/browse/YARN-1394 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Rohith YARN-914 proposes graceful decommission of an NM, and NMs already have the right to go offline. If AMs could be told that a container completed due to the NM going offline - offline vs decommission - the AM could use that in its future blacklisting and placement policy. This matters in long-lived services which may like to place new instances where they were placed before, and track hosts' failure rates -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2429) LCE should blacklist based upon group
[ https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482979#comment-14482979 ] Hudson commented on YARN-2429: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #156 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/156/]) YARN-2429. TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken (zxu via rkanter) (rkanter: rev 99b08a748e7b00a58b63330b353902a6da6aae27) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java LCE should blacklist based upon group - Key: YARN-2429 URL: https://issues.apache.org/jira/browse/YARN-2429 Project: Hadoop YARN Issue Type: New Feature Reporter: Allen Wittenauer It should be possible to list a group to ban, not just individual users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482977#comment-14482977 ] Hudson commented on YARN-3273: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #156 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/156/]) Move YARN-3273 from 2.8 to 2.7. (zjshen: rev 3fb5abfc87953377f86e06578518801a181d7697) * hadoop-yarn-project/CHANGES.txt Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3456) Improve handling of incomplete TimelineEntities
[ https://issues.apache.org/jira/browse/YARN-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3456: -- Assignee: Varun Saxena Improve handling of incomplete TimelineEntities --- Key: YARN-3456 URL: https://issues.apache.org/jira/browse/YARN-3456 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Priority: Minor If an incomplete TimelineEntity is posted, it isn't checked client side ... it gets all the way to the far end before triggering an NPE in the store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2429) LCE should blacklist based upon group
[ https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482986#comment-14482986 ] Hudson commented on YARN-2429: -- FAILURE: Integrated in Hadoop-Yarn-trunk #890 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/890/]) YARN-2429. TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken (zxu via rkanter) (rkanter: rev 99b08a748e7b00a58b63330b353902a6da6aae27) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java * hadoop-yarn-project/CHANGES.txt LCE should blacklist based upon group - Key: YARN-2429 URL: https://issues.apache.org/jira/browse/YARN-2429 Project: Hadoop YARN Issue Type: New Feature Reporter: Allen Wittenauer It should be possible to list a group to ban, not just individual users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482984#comment-14482984 ] Hudson commented on YARN-3273: -- FAILURE: Integrated in Hadoop-Yarn-trunk #890 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/890/]) Move YARN-3273 from 2.8 to 2.7. (zjshen: rev 3fb5abfc87953377f86e06578518801a181d7697) * hadoop-yarn-project/CHANGES.txt Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3456) Improve handling of incomplete TimelineEntities
[ https://issues.apache.org/jira/browse/YARN-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483040#comment-14483040 ] Steve Loughran commented on YARN-3456: -- Stack trace when an entity with a null type is posted by the client. Client side preflight checking could prevent some of this; REST API validation would be even stronger.
{code}
2015-04-07 12:23:40,290 [614480043@qtp-2026808370-0] INFO container.GuiceComponentProviderFactory (GuiceComponentProviderFactory.java:getComponentProvider(159)) - Binding org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices to GuiceManagedComponentProvider with the scope Singleton
2015-04-07 12:23:40,632 [614480043@qtp-2026808370-0] ERROR timeline.TimelineDataManager (TimelineDataManager.java:postEntities(275)) - Skip the timeline entity: { id: post, type: null }
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore$KeyBuilder.add(LeveldbTimelineStore.java:352)
        at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.createStartTimeLookupKey(LeveldbTimelineStore.java:1188)
        at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.getStartTimeLong(LeveldbTimelineStore.java:1081)
        at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.getEntity(LeveldbTimelineStore.java:433)
        at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.postEntities(TimelineDataManager.java:257)
        at org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.postEntities(TimelineWebServices.java:259)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:96)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:572)
        at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:269)
        at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:542)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at
[jira] [Created] (YARN-3456) Improve handling of incomplete TimelineEntities
Steve Loughran created YARN-3456: Summary: Improve handling of incomplete TimelineEntities Key: YARN-3456 URL: https://issues.apache.org/jira/browse/YARN-3456 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Affects Versions: 2.6.0 Reporter: Steve Loughran Priority: Minor If an incomplete TimelineEntity is posted, it isn't checked client side ... it gets all the way to the far end before triggering an NPE in the store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
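As a sketch of the kind of client-side preflight check suggested above (illustrative only; the method name and exact checks are not from any patch):
{code}
// Hypothetical client-side validation before posting to the timeline server:
// reject entities missing an id or type instead of letting a null field
// surface as an NPE deep inside the leveldb store.
private static void validateEntity(TimelineEntity entity) {
  if (entity == null || entity.getEntityId() == null
      || entity.getEntityType() == null) {
    throw new IllegalArgumentException(
        "Incomplete TimelineEntity: both entityId and entityType are required");
  }
}
{code}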
[jira] [Created] (YARN-3457) NPE when NodeManager.serviceInit fails and stopRecoveryStore called
Bibin A Chundatt created YARN-3457: -- Summary: NPE when NodeManager.serviceInit fails and stopRecoveryStore called Key: YARN-3457 URL: https://issues.apache.org/jira/browse/YARN-3457 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor When NodeManager serviceInit fails, a null pointer exception is thrown during stopRecoveryStore
{code}
@Override
protected void serviceInit(Configuration conf) throws Exception {
  ..
  try {
    exec.init();
  } catch (IOException e) {
    throw new YarnRuntimeException("Failed to initialize container executor", e);
  }
  this.context = createNMContext(containerTokenSecretManager,
      nmTokenSecretManager, nmStore);
{code}
context is null when service init fails
{code}
private void stopRecoveryStore() throws IOException {
  nmStore.stop();
  if (context.getDecommissioned() && nmStore.canRecover()) {
    ..
  }
}
{code}
Null pointer exception thrown
{quote}
2015-04-07 17:31:45,807 WARN org.apache.hadoop.service.AbstractService: When stopping the service NodeManager : java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:168)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:280)
at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:484)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:534)
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
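One possible shape of the guard, shown only as an illustrative sketch (not a submitted patch): check for a null context before dereferencing it in stopRecoveryStore.
{code}
// Illustrative sketch: serviceInit can fail before createNMContext() runs,
// so context may still be null when serviceStop -> stopRecoveryStore is
// invoked; guard the dereference instead of assuming initialization finished.
private void stopRecoveryStore() throws IOException {
  if (nmStore != null) {
    nmStore.stop();
  }
  if (context != null && context.getDecommissioned()
      && nmStore != null && nmStore.canRecover()) {
    // remove recovery state only for a decommissioned, recoverable NM
  }
}
{code}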
[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483093#comment-14483093 ] Bibin A Chundatt commented on YARN-2801: [~leftnoteasy] Any update on documentation for Node Labels? It's difficult to completely evaluate this feature without documentation Documentation development for Node labels requirment Key: YARN-2801 URL: https://issues.apache.org/jira/browse/YARN-2801 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gururaj Shetty Assignee: Wangda Tan Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483281#comment-14483281 ] Hadoop QA commented on YARN-3348: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723624/apache-yarn-3348.0.patch against trunk revision 75c5454. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.api.TestPBImplRecords Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7235//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7235//artifact/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7235//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7235//console This message is automatically generated. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3348: Attachment: apache-yarn-3348.0.patch Uploaded initial version of the patch. Most of the work is in a new TopCLI class. I added an application reports cache in ClientRMService with a timeout of 5 seconds as well as a boolean in GetApplicationsRequest to fetch cached versions of the reports. The tool essentially prints out the application report. The default refresh rate is 3 seconds. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
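A rough sketch of the time-bounded cache described above (field names and structure are illustrative only; the actual patch wires this into ClientRMService and GetApplicationsRequest):
{code}
// Illustrative sketch: reuse application reports for up to 5 seconds so that
// frequent 'yarn top' refreshes do not hammer the RM for full reports.
private List<ApplicationReport> cachedReports;
private long cacheTimestampMs;
private static final long CACHE_INTERVAL_MS = 5 * 1000L;

private synchronized List<ApplicationReport> getCachedApplicationReports() {
  long now = System.currentTimeMillis();
  if (cachedReports == null || now - cacheTimestampMs > CACHE_INTERVAL_MS) {
    cachedReports = buildApplicationReports();  // hypothetical uncached lookup
    cacheTimestampMs = now;
  }
  return cachedReports;
}
{code}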
[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483319#comment-14483319 ] Hadoop QA commented on YARN-3293: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723647/apache-yarn-3293.3.patch against trunk revision 75c5454. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7238//console This message is automatically generated. Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, apache-yarn-3293.3.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3293: Attachment: apache-yarn-3293.4.patch Doh! Uploaded the stat instead of the patch. Uploading the real patch. Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, apache-yarn-3293.4.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2429) LCE should blacklist based upon group
[ https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483233#comment-14483233 ] Hudson commented on YARN-2429: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #147 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/147/]) YARN-2429. TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken (zxu via rkanter) (rkanter: rev 99b08a748e7b00a58b63330b353902a6da6aae27) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java * hadoop-yarn-project/CHANGES.txt LCE should blacklist based upon group - Key: YARN-2429 URL: https://issues.apache.org/jira/browse/YARN-2429 Project: Hadoop YARN Issue Type: New Feature Reporter: Allen Wittenauer It should be possible to list a group to ban, not just individual users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483231#comment-14483231 ] Hudson commented on YARN-3273: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #147 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/147/]) Move YARN-3273 from 2.8 to 2.7. (zjshen: rev 3fb5abfc87953377f86e06578518801a181d7697) * hadoop-yarn-project/CHANGES.txt Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2003) Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side]
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2003: -- Attachment: 0006-YARN-2003.patch Hi [~leftnoteasy] Rebased the patch. This patch is independent of others. But YARN-2004 will have to depend on this to implement the abstract methods defined here. Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side] -- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483261#comment-14483261 ] Hudson commented on YARN-3273: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2088 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2088/]) Move YARN-3273 from 2.8 to 2.7. (zjshen: rev 3fb5abfc87953377f86e06578518801a181d7697) * hadoop-yarn-project/CHANGES.txt Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2429) LCE should blacklist based upon group
[ https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483263#comment-14483263 ] Hudson commented on YARN-2429: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2088 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2088/]) YARN-2429. TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken (zxu via rkanter) (rkanter: rev 99b08a748e7b00a58b63330b353902a6da6aae27) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java * hadoop-yarn-project/CHANGES.txt LCE should blacklist based upon group - Key: YARN-2429 URL: https://issues.apache.org/jira/browse/YARN-2429 Project: Hadoop YARN Issue Type: New Feature Reporter: Allen Wittenauer It should be possible to list a group to ban, not just individual users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3293: Attachment: apache-yarn-3293.3.patch {quote} General - it looks like the counters could possibly overflow and provide negative values, perhaps this is not something which could possibly happen in the lifetime of a cluster, but a large long-running cluster, is it a possibility/concern? {quote} The counters in SchedulerHealth are Long so it should be fine. The counters in AssignmentInformation(new class I added) are reset every allocation cycle. {quote} This presently looks to be capasched only, had a suggestion to make slightly more general below, Vinod Kumar Vavilapalli also mentioned not specific to scheduler, perhaps it's fine to go capasched only for the first iteration, but wanted to verify (perhaps we need a followon jira for other schedulers). {quote} Yes. That's the plan - once it's in for CapacityScheduler, I'll file a ticket to add the information for FairScheduler and point to this one as an example of the stuff we added. {quote} on the web page It's a nit, but I find I don't like the look of the / between the counter and the resource expression where that occurs, maybe - instead of / for those (allocations/reservations/releases)? {quote} Fixed. {quote} TestSchedulerHealth can we import NodeManager and get rid of package references in code {quote} Fixed. {quote} CapacitySchedulerHealthInfo looks like there is no need to keep a reference to the CapacityScheduler instance after construction, can we drop it from being a member then? {quote} Fixed. {quote} looks like line changes in info log are just whitespace, can you drop them? {quote} Fixed. {quote} LeafQueue L884 looks to be just whitespace, can you revert? {quote} Fixed. {quote} CSAssignment I think that there should be a new, sharable between schedulers class which incorporates all the new assignment info and that it should be a member of CSAssignment, instead of adding all of the details directly to CSAssignment. You would still pack the info into CSAssignment (as an instance of that type), but now would take a form that can be shared across schedulers {quote} Fixed. I created a new class called AssignmentInformation which encapsulates everything. Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, apache-yarn-3293.3.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
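For readers following along, a very rough, illustrative sketch of what a scheduler-agnostic "assignment information" holder could look like; this is not the actual class added by the patch, only the general shape the review asks for:
{code}
import java.util.EnumMap;
import java.util.Map;

// Illustrative sketch: per-allocation-cycle counters for allocations,
// reservations and releases, kept in a class that any scheduler (capacity or
// fair) could embed, rather than adding fields directly to CSAssignment.
public class AssignmentInformation {
  public enum Operation { ALLOCATION, RESERVATION, RELEASE }

  private final Map<Operation, Integer> counts =
      new EnumMap<Operation, Integer>(Operation.class);

  public void increment(Operation op) {
    Integer current = counts.get(op);
    counts.put(op, current == null ? 1 : current + 1);
  }

  public int get(Operation op) {
    Integer current = counts.get(op);
    return current == null ? 0 : current;
  }
}
{code}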
[jira] [Updated] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3293: Attachment: (was: apache-yarn-3293.3.patch) Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated YARN-3021: Attachment: YARN-3021.007.patch Upload same patch again for another test. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Assignee: Yongjun Zhang Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483424#comment-14483424 ] Hadoop QA commented on YARN-3348: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723653/apache-yarn-3348.1.patch against trunk revision 19a4fea. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7240//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7240//console This message is automatically generated. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483486#comment-14483486 ] Xuan Gong commented on YARN-3294: - +1 lgtm. Will commit Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch, apache-yarn-3294.3.patch, apache-yarn-3294.4.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3293: Attachment: apache-yarn-3293.5.patch The findbug warnings are incorrect - the fields are used by JAXB. Updated patch to exclude them. The failing test is unrelated. Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, apache-yarn-3293.4.patch, apache-yarn-3293.5.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3046) [Event producers] Implement MapReduce AM writing some MR metrics to ATS
[ https://issues.apache.org/jira/browse/YARN-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3046: - Attachment: YARN-3046-no-test-v2.patch I forgot to add the code for finding the TimelineCollectorAddress in the previous patch; it is added in v2. An end-to-end test is still missing from this patch because an existing test, TestMRTimelineEventHandling, fails locally even without this patch applied. I am still digging into the issue. [Event producers] Implement MapReduce AM writing some MR metrics to ATS --- Key: YARN-3046 URL: https://issues.apache.org/jira/browse/YARN-3046 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: YARN-3046-no-test-v2.patch, YARN-3046-no-test.patch Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes written) and have the MR AM write the framework-specific metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat
[ https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483392#comment-14483392 ] Junping Du commented on YARN-1376: -- I forgot to mention: if the RM can learn the log aggregation status from the NM side, I think we can remove getKeepAliveApplications() from the NM-RM heartbeat, because the RM can keep finished applications' tokens alive based on the log aggregation status directly. However, we don't have to address that in this JIRA and can file a separate one. For the UI changes, [~xgong], can you attach a screenshot as well? Thanks! NM need to notify the log aggregation status to RM through Node heartbeat - Key: YARN-1376 URL: https://issues.apache.org/jira/browse/YARN-1376 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, YARN-1376.3.patch, YARN-1376.4.patch Expose a client API to allow clients to figure if log aggregation is complete. The ticket is used to track the changes on NM side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3127) Apphistory url crashes when RM switches with ATS enabled
[ https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483396#comment-14483396 ] Naganarasimha G R commented on YARN-3127: - Hi [~xgong], If you have the bandwidth can you take a look at this patch too ? Apphistory url crashes when RM switches with ATS enabled Key: YARN-3127 URL: https://issues.apache.org/jira/browse/YARN-3127 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: RM HA with ATS Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Attachments: YARN-3127.20150213-1.patch, YARN-3127.20150329-1.patch 1.Start RM with HA and ATS configured and run some yarn applications 2.Once applications are finished sucessfully start timeline server 3.Now failover HA form active to standby 4.Access timeline server URL IP:PORT/applicationhistory Result: Application history URL fails with below info {quote} 2015-02-03 20:28:09,511 ERROR org.apache.hadoop.yarn.webapp.View: Failed to read the applications. java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643) at org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:80) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) ... Caused by: org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: The entity for application attempt appattempt_1422972608379_0001_01 doesn't exist in the timeline store at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplicationAttempt(ApplicationHistoryManagerOnTimelineStore.java:151) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.generateApplicationReport(ApplicationHistoryManagerOnTimelineStore.java:499) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAllApplications(ApplicationHistoryManagerOnTimelineStore.java:108) at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:84) at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:81) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) ... 51 more 2015-02-03 20:28:09,512 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /applicationhistory org.apache.hadoop.yarn.webapp.WebAppException: Error rendering block: nestLevel=6 expected 5 at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) {quote} Behaviour with AHS with file based history store -Apphistory url is working -No attempt entries are shown for each application. Based on inital analysis when RM switches ,application attempts from state store are not replayed but only applications are. So when /applicaitonhistory url is accessed it tries for all attempt id and fails -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3110) Few issues in ApplicationHistory web ui
[ https://issues.apache.org/jira/browse/YARN-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483371#comment-14483371 ] Hudson commented on YARN-3110: -- FAILURE: Integrated in Hadoop-trunk-Commit #7519 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7519/]) YARN-3110. Few issues in ApplicationHistory web ui. Contributed by Naganarasimha G R (xgong: rev 19a4feaf6fcf42ebbfe98b8a7153ade96d37fb14) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java Few issues in ApplicationHistory web ui --- Key: YARN-3110 URL: https://issues.apache.org/jira/browse/YARN-3110 Project: Hadoop YARN Issue Type: Sub-task Components: applications, timelineserver Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Minor Fix For: 2.8.0 Attachments: YARN-3110.20150209-1.patch, YARN-3110.20150315-1.patch, YARN-3110.20150406-1.patch Application state and History link wrong when Application is in unassigned state 1.Configure capacity schedular with queue size as 1 also max Absolute Max Capacity: 10.0% (Current application state is Accepted and Unassigned from resource manager side) 2.Submit application to queue and check the state and link in Application history State= null and History link shown as N/A in applicationhistory page Kill the same application . In timeline server logs the below is show when selecting application link. {quote} 2015-01-29 15:39:50,956 ERROR org.apache.hadoop.yarn.webapp.View: Failed to read the AM container of the application attempt appattempt_1422467063659_0007_01. 
java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainer(ApplicationHistoryManagerOnTimelineStore.java:162) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAMContainer(ApplicationHistoryManagerOnTimelineStore.java:184) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:160) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:156) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38) at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at
[jira] [Commented] (YARN-2003) Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side]
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483416#comment-14483416 ] Hadoop QA commented on YARN-2003: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723643/0006-YARN-2003.patch against trunk revision 75c5454. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7236//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7236//artifact/patchprocess/newPatchFindbugsWarningshadoop-sls.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7236//console This message is automatically generated. Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side] -- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483417#comment-14483417 ] Hadoop QA commented on YARN-3021: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723642/YARN-3021.007.patch against trunk revision 75c5454. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7237//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7237//console This message is automatically generated. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Assignee: Yongjun Zhang Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483429#comment-14483429 ] Allen Wittenauer commented on YARN-3348: {code} +doNotSetCols=0 +doNotSetRows=0 +for i in $@; do + if [[ $i == -cols ]]; then +doNotSetCols=1 + fi + if [[ $i == -rows ]]; then +doNotSetRows=1 + fi +done +if [[ $doNotSetCols == 0 ]]; then + cols=`tput cols` + args=( $@ ) + args=(${args[@]} -cols $cols) + set -- ${args[@]} +fi +if [[ $doNotSetRows == 0 ]]; then + rows=`tput lines` + args=( $@ ) + args=(${args[@]} -rows $rows) + set -- ${args[@]} +fi {code} * Why are we doing this manipulation here and not in the Java code? * backticks are antiquated in modern bash. Use {{$()}} construction * What happens if tput gives you zero or an error because you are on a non-addressable terminal? (You can generally simulate this by unset TERM or equivalent env var) Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3110) Few issues in ApplicationHistory web ui
[ https://issues.apache.org/jira/browse/YARN-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483353#comment-14483353 ] Xuan Gong commented on YARN-3110: - Committed into trunk/branch-2. Thanks, Naganarasimha Few issues in ApplicationHistory web ui --- Key: YARN-3110 URL: https://issues.apache.org/jira/browse/YARN-3110 Project: Hadoop YARN Issue Type: Sub-task Components: applications, timelineserver Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-3110.20150209-1.patch, YARN-3110.20150315-1.patch, YARN-3110.20150406-1.patch Application state and History link wrong when Application is in unassigned state 1.Configure capacity schedular with queue size as 1 also max Absolute Max Capacity: 10.0% (Current application state is Accepted and Unassigned from resource manager side) 2.Submit application to queue and check the state and link in Application history State= null and History link shown as N/A in applicationhistory page Kill the same application . In timeline server logs the below is show when selecting application link. {quote} 2015-01-29 15:39:50,956 ERROR org.apache.hadoop.yarn.webapp.View: Failed to read the AM container of the application attempt appattempt_1422467063659_0007_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainer(ApplicationHistoryManagerOnTimelineStore.java:162) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAMContainer(ApplicationHistoryManagerOnTimelineStore.java:184) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:160) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:156) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38) at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at
[jira] [Updated] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3348: Attachment: apache-yarn-3348.1.patch Uploaded a new patch to fix release audit warning and failing test. The findbugs warning is from another test. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483373#comment-14483373 ] Varun Vasudev commented on YARN-3348: - Sorry, that last comment should have been: the findbugs warning is from another patch - YARN-2901. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.
[ https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483432#comment-14483432 ] Junping Du commented on YARN-3431: -- Thanks [~zjshen] for the patch and [~gtCarrera9] for the review and comments. bq. However, I'm a little bit confused about the big picture of this patch. I put some content and background in the JIRA description. Hope it helps.
{code}
-putObjects("entities", params, entitiesContainer);
+for (org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity entity : entities) {
+  String path = "entities";
+  try {
+    path += "/" + TimelineEntityType.valueOf(entity.getType()).toString();
+  } catch (IllegalArgumentException e) {
+    // Do nothing, generic entity type
+  }
+  putObjects(path, params, entity);
+}
{code}
It looks like we are breaking one put operation into pieces, which does not make sense from a performance perspective. Do we have to do this? BTW, shouldn't we handle the IllegalArgumentException instead of ignoring it? Sub resources of timeline entity needs to be passed to a separate endpoint. --- Key: YARN-3431 URL: https://issues.apache.org/jira/browse/YARN-3431 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3431.1.patch, YARN-3431.2.patch We have TimelineEntity and some other entities as subclass that inherit from it. However, we only have a single endpoint, which consume TimelineEntity rather than sub-classes and this endpoint will check the incoming request body contains exactly TimelineEntity object. However, the json data which is serialized from sub-class object seems not to be treated as an TimelineEntity object, and won't be deserialized into the corresponding sub-class object which cause deserialization failure as some discussions in YARN-3334 : https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
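A self-contained sketch of the per-type routing discussed in the comment above, with the unknown-type case handled explicitly rather than swallowed; the enum and method names here are illustrative stand-ins, not the real timeline service API:
{code}
import java.util.Arrays;
import java.util.List;

public class EntityPathRoutingSketch {

  // Stand-in for the known sub-entity types.
  enum KnownEntityType { CLUSTER, APPLICATION, CONTAINER }

  // Build the endpoint path for an entity type, falling back to the generic
  // endpoint for unrecognized types and making that decision explicit.
  static String pathFor(String entityType) {
    String base = "entities";
    for (KnownEntityType t : KnownEntityType.values()) {
      if (t.name().equals(entityType)) {
        return base + "/" + t.name();
      }
    }
    System.out.println("Unrecognized entity type '" + entityType
        + "', using the generic endpoint");
    return base;
  }

  public static void main(String[] args) {
    List<String> types = Arrays.asList("APPLICATION", "MY_CUSTOM_TYPE");
    for (String t : types) {
      System.out.println(t + " -> " + pathFor(t));
    }
  }
}
{code}
Grouping entities by resolved path and issuing one put per path, rather than one per entity, would also speak to the batching concern raised above.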
[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483445#comment-14483445 ] Hadoop QA commented on YARN-3293: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723649/apache-yarn-3293.4.patch against trunk revision 75c5454. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 6 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7239//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7239//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7239//console This message is automatically generated. Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, apache-yarn-3293.4.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483498#comment-14483498 ] Junping Du commented on YARN-3391: -- Sorry for coming a little late. Thanks, everyone, for the good discussion here and [~zjshen] for updating the patch! bq. I just wanted to add my 2 cents that this is something we already see and experience with hRaven so it's not theoretical. +1, [~sjlee0]! I think that's very important feedback for improving the user experience of this new feature. Let's try to strike a good balance between addressing these concrete scenarios and keeping flexibility for possible new ones, e.g. we can provide different flow grouping policies that users can use to group applications into a flow by name or keep them as isolated flows. Anyway, as everyone has agreed so far, let's continue the discussion in a separate JIRA and figure it out later. The patch looks good overall. However, I still haven't seen us put the definitions of flow, flow run and flow version anywhere in the Javadoc. As I mentioned earlier, that would be useful for developers. The official Apache feature doc is more user oriented, and we can address it later when the feature is complete. Clearly define flow ID/ flow run / flow version in API and storage -- Key: YARN-3391 URL: https://issues.apache.org/jira/browse/YARN-3391 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3391.1.patch, YARN-3391.2.patch To continue the discussion in YARN-3040, let's figure out the best way to describe the flow. Some key issues that we need to conclude on: - How do we include the flow version in the context so that it gets passed into the collector and to the storage eventually? - Flow run id should be a number as opposed to a generic string? - Default behavior for the flow run id if it is missing (i.e. client did not set it) - How do we handle flow attributes in case of nested levels of flows? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.
[ https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3431: - Description: We have TimelineEntity and some other entities as subclass that inherit from it. However, we only have a single endpoint, which consume TimelineEntity rather than sub-classes and this endpoint will check the incoming request body contains exactly TimelineEntity object. However, the json data which is serialized from sub-class object seems not to be treated as an TimelineEntity object, and won't be deserialized into the corresponding sub-class object which cause deserialization failure as some discussions in YARN-3334 : https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059. Sub resources of timeline entity needs to be passed to a separate endpoint. --- Key: YARN-3431 URL: https://issues.apache.org/jira/browse/YARN-3431 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3431.1.patch, YARN-3431.2.patch We have TimelineEntity and some other entities as subclass that inherit from it. However, we only have a single endpoint, which consume TimelineEntity rather than sub-classes and this endpoint will check the incoming request body contains exactly TimelineEntity object. However, the json data which is serialized from sub-class object seems not to be treated as an TimelineEntity object, and won't be deserialized into the corresponding sub-class object which cause deserialization failure as some discussions in YARN-3334 : https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat
[ https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483419#comment-14483419 ] Junping Du commented on YARN-1376: -- bq. I didn't see where we remove element from logAggregationReportForApps. I think we need to remove it when log aggregation finished or it will still occupy (and may eat up gradually) NM's memory. I just synced with [~xgong] offline: we do poll logAggregationReportForApps, so elements do get removed from the ConcurrentLinkedQueue, and my previous comment is not valid here. However, there is one case we need to pay attention to: if the heartbeat request does not get a response, we shouldn't have polled the aggregation reports off the queue for good. Instead, we could move the polled elements to a temporary list, then drop them once the response is received successfully, or merge them back into the queue for the next heartbeat. NM need to notify the log aggregation status to RM through Node heartbeat - Key: YARN-1376 URL: https://issues.apache.org/jira/browse/YARN-1376 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, YARN-1376.3.patch, YARN-1376.4.patch Expose a client API to allow clients to figure if log aggregation is complete. The ticket is used to track the changes on NM side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
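A small sketch of the drain-then-restore pattern suggested in the comment above, assuming a hypothetical buffer class; it is not the NM's actual report-handling code:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class HeartbeatReportBufferSketch<R> {
  private final Queue<R> pending = new ConcurrentLinkedQueue<>();

  public void add(R report) {
    pending.offer(report);
  }

  // Drain the pending reports into a temporary list for one heartbeat request.
  public List<R> drainForHeartbeat() {
    List<R> batch = new ArrayList<>();
    R r;
    while ((r = pending.poll()) != null) {
      batch.add(r);
    }
    return batch;
  }

  // If the heartbeat response never arrives, put the batch back so the reports
  // are re-sent on the next heartbeat instead of being lost.
  public void restoreAfterFailure(List<R> batch) {
    pending.addAll(batch);
  }
}
{code}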
[jira] [Commented] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM
[ https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483489#comment-14483489 ] Varun Vasudev commented on YARN-3443: - +1, lgtm for the latest patch. Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM - Key: YARN-3443 URL: https://issues.apache.org/jira/browse/YARN-3443 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3443.001.patch, YARN-3443.002.patch, YARN-3443.003.patch, YARN-3443.004.patch The current cgroups implementation is closely tied to supporting CPU as a resource . We need to separate out CGroups support as well a provide a simple ResourceHandler subsystem that will enable us to add support for new resource types on the NM - e.g Network, Disk etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483490#comment-14483490 ] Xuan Gong commented on YARN-3294: - Committed into trunk/branch-2. Thanks, varun. Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch, apache-yarn-3294.3.patch, apache-yarn-3294.4.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483518#comment-14483518 ] Hudson commented on YARN-3294: -- FAILURE: Integrated in Hadoop-trunk-Commit #7521 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7521/]) YARN-3294. Allow dumping of Capacity Scheduler debug logs via web UI for (xgong: rev d27e9241e8676a0edb2d35453cac5f9495fcd605) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestAdHocLogDumper.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AdHocLogDumper.java Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch, apache-yarn-3294.3.patch, apache-yarn-3294.4.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483377#comment-14483377 ] Varun Vasudev commented on YARN-3348: - The attached patch applies only to trunk. Once I get a +1, I'll put a version that applies to branch-2. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3110) Few issues in ApplicationHistory web ui
[ https://issues.apache.org/jira/browse/YARN-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483346#comment-14483346 ] Xuan Gong commented on YARN-3110: - +1 LGTM. Will commit Few issues in ApplicationHistory web ui --- Key: YARN-3110 URL: https://issues.apache.org/jira/browse/YARN-3110 Project: Hadoop YARN Issue Type: Sub-task Components: applications, timelineserver Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-3110.20150209-1.patch, YARN-3110.20150315-1.patch, YARN-3110.20150406-1.patch Application state and History link wrong when Application is in unassigned state 1.Configure capacity schedular with queue size as 1 also max Absolute Max Capacity: 10.0% (Current application state is Accepted and Unassigned from resource manager side) 2.Submit application to queue and check the state and link in Application history State= null and History link shown as N/A in applicationhistory page Kill the same application . In timeline server logs the below is show when selecting application link. {quote} 2015-01-29 15:39:50,956 ERROR org.apache.hadoop.yarn.webapp.View: Failed to read the AM container of the application attempt appattempt_1422467063659_0007_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainer(ApplicationHistoryManagerOnTimelineStore.java:162) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAMContainer(ApplicationHistoryManagerOnTimelineStore.java:184) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:160) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:156) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38) at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
[jira] [Commented] (YARN-3110) Few issues in ApplicationHistory web ui
[ https://issues.apache.org/jira/browse/YARN-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483366#comment-14483366 ] Naganarasimha G R commented on YARN-3110: - Thanks for reviewing Commiting [~xgong] :) Few issues in ApplicationHistory web ui --- Key: YARN-3110 URL: https://issues.apache.org/jira/browse/YARN-3110 Project: Hadoop YARN Issue Type: Sub-task Components: applications, timelineserver Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Minor Fix For: 2.8.0 Attachments: YARN-3110.20150209-1.patch, YARN-3110.20150315-1.patch, YARN-3110.20150406-1.patch Application state and History link wrong when Application is in unassigned state 1.Configure capacity schedular with queue size as 1 also max Absolute Max Capacity: 10.0% (Current application state is Accepted and Unassigned from resource manager side) 2.Submit application to queue and check the state and link in Application history State= null and History link shown as N/A in applicationhistory page Kill the same application . In timeline server logs the below is show when selecting application link. {quote} 2015-01-29 15:39:50,956 ERROR org.apache.hadoop.yarn.webapp.View: Failed to read the AM container of the application attempt appattempt_1422467063659_0007_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainer(ApplicationHistoryManagerOnTimelineStore.java:162) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAMContainer(ApplicationHistoryManagerOnTimelineStore.java:184) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:160) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:156) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38) at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at
[jira] [Updated] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat
[ https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1376: Attachment: Screen Shot 2015-04-07 at 9.30.42 AM.png NM need to notify the log aggregation status to RM through Node heartbeat - Key: YARN-1376 URL: https://issues.apache.org/jira/browse/YARN-1376 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: Screen Shot 2015-04-07 at 9.30.42 AM.png, YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, YARN-1376.2015-04-07.patch, YARN-1376.3.patch, YARN-1376.4.patch Expose a client API to allow clients to figure if log aggregation is complete. The ticket is used to track the changes on NM side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat
[ https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1376: Attachment: YARN-1376.2015-04-07.patch Address all the latest comments. NM need to notify the log aggregation status to RM through Node heartbeat - Key: YARN-1376 URL: https://issues.apache.org/jira/browse/YARN-1376 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: Screen Shot 2015-04-07 at 9.30.42 AM.png, YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, YARN-1376.2015-04-07.patch, YARN-1376.3.patch, YARN-1376.4.patch Expose a client API to allow clients to figure if log aggregation is complete. The ticket is used to track the changes on NM side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483668#comment-14483668 ] Hadoop QA commented on YARN-3293: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723665/apache-yarn-3293.5.patch against trunk revision 0b5d7d2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 6 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7242//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7242//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7242//console This message is automatically generated. Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, apache-yarn-3293.4.patch, apache-yarn-3293.5.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2429) LCE should blacklist based upon group
[ https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483666#comment-14483666 ] Hudson commented on YARN-2429: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #157 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/157/]) YARN-2429. TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken (zxu via rkanter) (rkanter: rev 99b08a748e7b00a58b63330b353902a6da6aae27) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java LCE should blacklist based upon group - Key: YARN-2429 URL: https://issues.apache.org/jira/browse/YARN-2429 Project: Hadoop YARN Issue Type: New Feature Reporter: Allen Wittenauer It should be possible to list a group to ban, not just individual users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
[ https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483827#comment-14483827 ] Hadoop QA commented on YARN-3460: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723668/HADOOP-11810-1.patch against trunk revision d27e924. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1148 javac compiler warnings (more than the trunk's current 209 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 43 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/7245//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7245//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7245//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7245//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7245//console This message is automatically generated. Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM Key: YARN-3460 URL: https://issues.apache.org/jira/browse/YARN-3460 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.6.0 Environment: $ mvn -version Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 2014-02-14T11:37:52-06:00) Maven home: /opt/apache-maven-3.2.1 Java version: 1.7.0, vendor: IBM Corporation Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre Default locale: en_US, platform encoding: UTF-8 OS name: linux, version: 3.10.0-229.ael7b.ppc64le, arch: ppc64le, family: unix Reporter: pascal oliva Attachments: HADOOP-11810-1.patch TestSecureRMRegistryOperations failed with JBM IBM JAVA mvn test -X -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations ModuleTotal Failure Error Skipped - hadoop-yarn-registry 12 0 12 0 - Total 12 0 12 0 With javax.security.auth.login.LoginException: Bad JAAS configuration: unrecognized option: isInitiator and Bad JAAS configuration: unrecognized option: storeKey -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3458) CPU resource monitoring in Windows
Inigo Goiri created YARN-3458: - Summary: CPU resource monitoring in Windows Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
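The CpuTimeTracker approach described above reduces to percent = (delta of cumulative CPU time) / (delta of wall-clock time), with 1 jiffy counted as 1 ms on Windows. A minimal, self-contained sketch of that arithmetic (illustrative names only, not the actual WindowsBasedProcessTree code):
{code}
// Sketch of the CPU usage computation proposed in YARN-3458 for Windows,
// treating 1 jiffy as 1 ms of cumulative CPU time as stated in the JIRA.
public class CpuUsageSketch {
  private long lastCpuMs = -1;    // cumulative CPU time at the last sample
  private long lastSampleMs = -1; // wall-clock time of the last sample

  /** Returns CPU usage in percent (summed across cores), or -1 on the first sample. */
  public float update(long cumulativeCpuMs, long nowMs) {
    float percent = -1f;
    if (lastCpuMs >= 0 && nowMs > lastSampleMs) {
      long cpuDelta = cumulativeCpuMs - lastCpuMs;   // "jiffies" == ms here
      long wallDelta = nowMs - lastSampleMs;
      percent = (cpuDelta * 100f) / wallDelta;
    }
    lastCpuMs = cumulativeCpuMs;
    lastSampleMs = nowMs;
    return percent;
  }
}
{code}
A value above 100 simply means more than one core was busy during the sampling interval.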
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483664#comment-14483664 ] Hudson commented on YARN-3273: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #157 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/157/]) Move YARN-3273 from 2.8 to 2.7. (zjshen: rev 3fb5abfc87953377f86e06578518801a181d7697) * hadoop-yarn-project/CHANGES.txt Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes
[ https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483685#comment-14483685 ] Jian He commented on YARN-3439: --- IIUC, isn't this a long-standing issue that Oozie doesn't set mapreduce.job.complete.cancel.delegation.tokens to false for a standard MR job, according to [here | https://issues.apache.org/jira/browse/YARN-2964?focusedCommentId=14250926page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14250926]? Should we set it to false on the Oozie side? RM fails to renew token when Oozie launcher leaves before sub-job finishes -- Key: YARN-3439 URL: https://issues.apache.org/jira/browse/YARN-3439 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-3439.001.patch When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't linger waiting for the sub-job to finish. At that point the RM stops renewing delegation tokens for the launcher job which wreaks havoc on the sub-job if the sub-job runs long enough for the tokens to expire. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
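For context on the property Jian He mentions: it is set on the launcher job's configuration before submission. A hedged, illustrative client-side sketch (not Oozie's actual launcher code):
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class LauncherConfExample {
  // Illustrative only: ask the RM not to cancel the launcher job's
  // delegation tokens when the launcher completes, so a long-running
  // sub-job that shares them keeps working. Renewal after the launcher
  // exits is a separate problem, which is what this JIRA tracks.
  public static Job newLauncherJob() throws IOException {
    Configuration conf = new Configuration();
    conf.setBoolean("mapreduce.job.complete.cancel.delegation.tokens", false);
    return Job.getInstance(conf, "oozie-launcher");
  }
}
{code}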
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483631#comment-14483631 ] Hadoop QA commented on YARN-3458: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723671/YARN-3458-1.patch against trunk revision d27e924. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7243//console This message is automatically generated. CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Labels: containers, metrics, windows Attachments: YARN-3458-1.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3458: -- Labels: containers metrics windows (was: ) CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Labels: containers, metrics, windows Attachments: YARN-3458-1.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
[ https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran moved HADOOP-11810 to YARN-3460: --- Fix Version/s: (was: 3.0.0) Target Version/s: 2.8.0 (was: 2.6.0) Affects Version/s: (was: 2.6.0) (was: 3.0.0) 3.0.0 2.6.0 Key: YARN-3460 (was: HADOOP-11810) Project: Hadoop YARN (was: Hadoop Common) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM Key: YARN-3460 URL: https://issues.apache.org/jira/browse/YARN-3460 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.6.0, 3.0.0 Environment: $ mvn -version Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 2014-02-14T11:37:52-06:00) Maven home: /opt/apache-maven-3.2.1 Java version: 1.7.0, vendor: IBM Corporation Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre Default locale: en_US, platform encoding: UTF-8 OS name: linux, version: 3.10.0-229.ael7b.ppc64le, arch: ppc64le, family: unix Reporter: pascal oliva Attachments: HADOOP-11810-1.patch TestSecureRMRegistryOperations failed with the IBM JAVA JVM when running mvn test -X -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations Results: module hadoop-yarn-registry - Tests: 12, Failures: 0, Errors: 12, Skipped: 0; Total - Tests: 12, Failures: 0, Errors: 12, Skipped: 0. The tests fail with javax.security.auth.login.LoginException: Bad JAAS configuration: unrecognized option: isInitiator and Bad JAAS configuration: unrecognized option: storeKey -- This message was sent by Atlassian JIRA (v6.3.4#6332)
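The LoginException in this report comes from JAAS options that the Oracle/OpenJDK Krb5LoginModule accepts but the IBM JDK's Kerberos login module rejects. A hedged illustration of the kind of programmatic JAAS entry involved (module class name and option set are assumptions for illustration, not the hadoop-yarn-registry code):
{code}
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;

public class JaasEntryExample {
  // "storeKey" and "isInitiator" are options of the Sun/Oracle
  // Krb5LoginModule; an IBM JVM uses a different Kerberos login module,
  // which reports them as "Bad JAAS configuration: unrecognized option".
  public static AppConfigurationEntry sunStyleEntry(String principal, String keytab) {
    Map<String, String> options = new HashMap<>();
    options.put("principal", principal);
    options.put("keyTab", keytab);
    options.put("useKeyTab", "true");
    options.put("storeKey", "true");     // rejected under IBM_JAVA
    options.put("isInitiator", "true");  // rejected under IBM_JAVA
    return new AppConfigurationEntry(
        "com.sun.security.auth.module.Krb5LoginModule",
        LoginModuleControlFlag.REQUIRED, options);
  }
}
{code}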
[jira] [Updated] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3391: -- Attachment: YARN-3391.3.patch Clearly define flow ID/ flow run / flow version in API and storage -- Key: YARN-3391 URL: https://issues.apache.org/jira/browse/YARN-3391 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3391.1.patch, YARN-3391.2.patch, YARN-3391.3.patch To continue the discussion in YARN-3040, let's figure out the best way to describe the flow. Some key issues that we need to conclude on: - How do we include the flow version in the context so that it gets passed into the collector and to the storage eventually? - Flow run id should be a number as opposed to a generic string? - Default behavior for the flow run id if it is missing (i.e. client did not set it) - How do we handle flow attributes in case of nested levels of flows? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483822#comment-14483822 ] Hadoop QA commented on YARN-3458: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723681/YARN-3458-3.patch against trunk revision d27e924. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7246//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7246//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7246//console This message is automatically generated. CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Labels: containers, metrics, windows Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483627#comment-14483627 ] Inigo Goiri commented on YARN-3458: --- Not sure if the patch has been created properly, as I'm in between a couple of versions. I will create one based on trunk if this doesn't work. CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Attachments: YARN-3458-1.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3458: -- Attachment: YARN-3458-2.patch Patch based on trunk. Let's see if Jenkins likes it. CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Labels: containers, metrics, windows Attachments: YARN-3458-1.patch, YARN-3458-2.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes
[ https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483698#comment-14483698 ] Jason Lowe commented on YARN-3439: -- I believe it is setting that to false, as that behavior hasn't changed on the Oozie side. However this isn't an issue of the token being cancelled but rather expiring. The RM properly avoids cancelling the token when the launcher job exits, but it then forgets to keep renewing it as well. Eventually the token expires and downstream jobs fail (if they run long enough). RM fails to renew token when Oozie launcher leaves before sub-job finishes -- Key: YARN-3439 URL: https://issues.apache.org/jira/browse/YARN-3439 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-3439.001.patch When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't linger waiting for the sub-job to finish. At that point the RM stops renewing delegation tokens for the launcher job which wreaks havoc on the sub-job if the sub-job runs long enough for the tokens to expire. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3458: -- Attachment: YARN-3458-3.patch Git and I are going through a rough patch; let's see if it works now... CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Labels: containers, metrics, windows Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483605#comment-14483605 ] Sidharta Seethana commented on YARN-3366: - Thanks for the review, [~vvasudev]. Responses inline: 1. I'll fix this. This is an artifact of differences between trunk/branch-2 (repeated) 1. I think these are useful log lines that specify a change in behavior due to settings/system state etc. I'll clarify/improve the log messages. 2. Good catch, I'll fix it. Tests ran fine because WARN logging was enabled. 3. I'll fix the comments' location. The exception used to exist before but was causing bootstrapping issues. I left it in there along with an explanation for why it shouldn't be thrown. I'll remove it and modify the comments. 4. IntelliJ warns me about this too - but I had left it in there for clarity/consistency with the earlier code block - I believe it makes the code a bit more readable. I would prefer to leave it in place. 5. I'll fix this. 6. I'll fix this. 7. Why? Compiler optimization? 8. I'll fix this. 9. I'll fix this. 10. I'll fix this. 11. I'll fix this - though I don't believe the merging always helps for error/warn metrics. 12. I'll fix this. 13. Not trivially; it would require refactoring launchContainer. Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3366.001.patch, YARN-3366.002.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
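For readers following this review without the design document: one common Linux mechanism for this kind of classification and shaping (a hedged sketch of the general approach, not a description of what this patch implements) tags a container's traffic through the net_cls cgroup and caps the matching class with tc htb:
{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TrafficShapingSketch {
  // Tag every process in the container's net_cls cgroup with class 1:10.
  // net_cls.classid holds one 32-bit value, 0xMMMMmmmm (major:minor in hex),
  // so 0x00010010 corresponds to tc classid 1:10.
  public static void tagContainer(String netClsCgroupPath) throws IOException {
    Files.write(Paths.get(netClsCgroupPath, "net_cls.classid"),
        "0x00010010\n".getBytes(StandardCharsets.UTF_8));
  }

  // Commands a handler could run on the node (e.g. via the container-executor)
  // to create an htb class capped at the container's share and steer
  // cgroup-tagged packets into it.
  public static String[] shapingCommands(String device, int rateMbit) {
    return new String[] {
        "tc qdisc add dev " + device + " root handle 1: htb default 20",
        "tc class add dev " + device + " parent 1: classid 1:10 htb rate " + rateMbit + "mbit",
        "tc filter add dev " + device + " parent 1: protocol ip prio 10 handle 1: cgroup"
    };
  }
}
{code}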
[jira] [Commented] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat
[ https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483660#comment-14483660 ] Hadoop QA commented on YARN-1376: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723661/YARN-1376.2015-04-07.patch against trunk revision 0b5d7d2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7241//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7241//console This message is automatically generated. NM need to notify the log aggregation status to RM through Node heartbeat - Key: YARN-1376 URL: https://issues.apache.org/jira/browse/YARN-1376 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: Screen Shot 2015-04-07 at 9.30.42 AM.png, YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, YARN-1376.2015-04-07.patch, YARN-1376.3.patch, YARN-1376.4.patch Expose a client API to allow clients to figure if log aggregation is complete. The ticket is used to track the changes on NM side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3426) Add jdiff support to YARN
[ https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483681#comment-14483681 ] Li Lu commented on YARN-3426: - The failed unit test also breaks in trunk. Will file a blocker on this. Add jdiff support to YARN - Key: YARN-3426 URL: https://issues.apache.org/jira/browse/YARN-3426 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Priority: Blocker Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs to YARN as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483692#comment-14483692 ] Hadoop QA commented on YARN-3458: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723677/YARN-3458-2.patch against trunk revision d27e924. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7244//console This message is automatically generated. CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Labels: containers, metrics, windows Attachments: YARN-3458-1.patch, YARN-3458-2.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3459) TestLog4jWarningErrorMetricsAppender breaks in trunk
[ https://issues.apache.org/jira/browse/YARN-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483779#comment-14483779 ] Li Lu commented on YARN-3459: - Reproduced this failure on my local machine as well as in the Jenkins run for YARN-3426. Seems like the test failure was introduced by YARN-2901. [~wangda][~vvasudev] can either of you take a look at it? Thanks! TestLog4jWarningErrorMetricsAppender breaks in trunk Key: YARN-3459 URL: https://issues.apache.org/jira/browse/YARN-3459 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Priority: Blocker Fix For: 2.7.0 TestLog4jWarningErrorMetricsAppender fails with the following message: {code} Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender) Time elapsed: 2.01 sec FAILURE! java.lang.AssertionError: expected:<0> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels
[ https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483808#comment-14483808 ] Vinod Kumar Vavilapalli commented on YARN-3361: --- Review of the tests - testNonExclusiveNodeLabelsAllocationIgnoreAppSubmitOrder -- - testPreferenceOfNeedyAppsTowardsNodePartitions ? -- This doesn't really guarantee if app2 is getting preference or not. How about changing it to say app2 has enough requests to fill the entire node? - testNonExclusiveNodeLabelsAllocationIgnorePriority -- - testPreferenceOfNeedyContainersTowardsNodePartitions ? -- Actually, now that I rename it that way, this may not be the right behavior. Not respecting priorities within an app can result in scheduling deadlocks. - testLabeledResourceRequestsGetPreferrenceInHierarchyOfQueue: This is really testQueuesWithAccessGetPreferrenceInPartitionedNodes? - testNonLabeledQueueUsesLabeledResource -- - testQueuesWithoutAccessUsingPartitionedNodes -- Also validate that the wait for non-labeled requests not getting allocated on non-partitioned nodes is only for one cycle through all nodes in the cluster - Let's move all these node-label related tests into their own test-case. - More tests? -- AMs with labeled requirement not getting allocated on non-exclusive partitions -- To verify that we are not putting absolute max-capacities on the individual queues when not-respecting-partitions CapacityScheduler side changes to support non-exclusive node labels --- Key: YARN-3361 URL: https://issues.apache.org/jira/browse/YARN-3361 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3361.1.patch, YARN-3361.2.patch According to design doc attached in YARN-3214, we need implement following logic in CapacityScheduler: 1) When allocate a resource request with no node-label specified, it should get preferentially allocated to node without labels. 2) When there're some available resource in a node with label, they can be used by applications with following order: - Applications under queues which can access the label and ask for same labeled resource. - Applications under queues which can access the label and ask for non-labeled resource. - Applications under queues cannot access the label and ask for non-labeled resource. 3) Expose necessary information that can be used by preemption policy to make preemption decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3046) [Event producers] Implement MapReduce AM writing some MR metrics to ATS
[ https://issues.apache.org/jira/browse/YARN-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483588#comment-14483588 ] Junping Du commented on YARN-3046: -- Linked with MAPREDUCE-6189 - the test failure is consistently reproducible on trunk, not only on my local test bed. [Event producers] Implement MapReduce AM writing some MR metrics to ATS --- Key: YARN-3046 URL: https://issues.apache.org/jira/browse/YARN-3046 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: YARN-3046-no-test-v2.patch, YARN-3046-no-test.patch Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes written) and have the MR AM write the framework-specific metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3458: -- Attachment: YARN-3458-1.patch CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Attachments: YARN-3458-1.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels
[ https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483693#comment-14483693 ] Jian He commented on YARN-3361: --- Some comments on my side: - Should we treat each limit differently for different labeled requests? {code} // Otherwise, if any of the label of this node beyond queue limit, we // cannot allocate on this node. Consider a small epsilon here. {code} - Merge queue#needResource and application#needResource - needResource - hasPendingResourceRequest; needResource can also be simplified if we pass in partitionToAllocate - Some methods like canAssignToThisQueue where both nodeLabels and exclusiveType are passed, it may be simplified by passing the current partitionToAllocate to handle the internal if/else check. - The following may be incorrect, as the current request may not be the AM container request, though null == rmAppAttempt.getMasterContainer() {code} // AM container allocation doesn't support non-exclusive allocation to // avoid painful of preempt an AM container if {code} - the if/else below can be avoided by passing the nodePartition into queueCapacities.getAbsoluteCapacity(nodePartition), {code} if (!nodePartition.equals(RMNodeLabelsManager.NO_LABEL)) { queueCapacity = Resources .max(resourceCalculator, clusterResource, queueCapacity, Resources.multiplyAndNormalizeUp( resourceCalculator, labelManager.getResourceByLabel(nodePartition, clusterResource), queueCapacities.getAbsoluteCapacity(nodePartition), minimumAllocation)); } else { // else there's no label on request, just to use absolute capacity as // capacity for nodes without label queueCapacity = Resources.multiplyAndNormalizeUp(resourceCalculator, labelManager .getResourceByLabel(CommonNodeLabelsManager.NO_LABEL, clusterResource), queueCapacities.getAbsoluteCapacity(), minimumAllocation); } {code} - the second limit won’t be hit? {code} if (exclusiveType == ExclusiveType.EXCLUSIVE) { maxUserLimit = Resources.multiplyAndRoundDown(queueCapacity, userLimitFactor); } else if (exclusiveType == ExclusiveType.NON_EXECLUSIVE) { maxUserLimit = labelManager.getResourceByLabel(nodePartition, clusterResource); } {code} - nonExclusiveSchedulingOpportunities#setCount - add(Priority) CapacityScheduler side changes to support non-exclusive node labels --- Key: YARN-3361 URL: https://issues.apache.org/jira/browse/YARN-3361 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3361.1.patch, YARN-3361.2.patch According to design doc attached in YARN-3214, we need implement following logic in CapacityScheduler: 1) When allocate a resource request with no node-label specified, it should get preferentially allocated to node without labels. 2) When there're some available resource in a node with label, they can be used by applications with following order: - Applications under queues which can access the label and ask for same labeled resource. - Applications under queues which can access the label and ask for non-labeled resource. - Applications under queues cannot access the label and ask for non-labeled resource. 3) Expose necessary information that can be used by preemption policy to make preemption decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
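A minimal sketch of the simplification suggested in the if/else comment above, assuming queueCapacities.getAbsoluteCapacity(nodePartition) and labelManager.getResourceByLabel(nodePartition, ...) already fall back to the NO_LABEL values for the default partition (the max() against the previously computed capacity is omitted for brevity):
{code}
// Sketch only: collapse the quoted if/else by always passing the node's
// partition through, instead of branching on NO_LABEL.
Resource queueCapacity = Resources.multiplyAndNormalizeUp(
    resourceCalculator,
    labelManager.getResourceByLabel(nodePartition, clusterResource),
    queueCapacities.getAbsoluteCapacity(nodePartition),
    minimumAllocation);
{code}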
[jira] [Commented] (YARN-2429) LCE should blacklist based upon group
[ https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483716#comment-14483716 ] Hudson commented on YARN-2429: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2106 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2106/]) YARN-2429. TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken (zxu via rkanter) (rkanter: rev 99b08a748e7b00a58b63330b353902a6da6aae27) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java * hadoop-yarn-project/CHANGES.txt LCE should blacklist based upon group - Key: YARN-2429 URL: https://issues.apache.org/jira/browse/YARN-2429 Project: Hadoop YARN Issue Type: New Feature Reporter: Allen Wittenauer It should be possible to list a group to ban, not just individual users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483714#comment-14483714 ] Hudson commented on YARN-3273: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2106 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2106/]) Move YARN-3273 from 2.8 to 2.7. (zjshen: rev 3fb5abfc87953377f86e06578518801a181d7697) * hadoop-yarn-project/CHANGES.txt Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3459) TestLog4jWarningErrorMetricsAppender breaks in trunk
Li Lu created YARN-3459: --- Summary: TestLog4jWarningErrorMetricsAppender breaks in trunk Key: YARN-3459 URL: https://issues.apache.org/jira/browse/YARN-3459 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Priority: Blocker Fix For: 2.7.0 TestLog4jWarningErrorMetricsAppender fails with the following message: {code} Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender) Time elapsed: 2.01 sec FAILURE! java.lang.AssertionError: expected:<0> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3426) Add jdiff support to YARN
[ https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3426: Attachment: YARN-3426-040715.patch Added license information to the four .xml API files. Add jdiff support to YARN - Key: YARN-3426 URL: https://issues.apache.org/jira/browse/YARN-3426 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Priority: Blocker Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, YARN-3426-040715.patch Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs to YARN as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3461) Consolidate flow name/version/run defaults
Zhijie Shen created YARN-3461: - Summary: Consolidate flow name/version/run defaults Key: YARN-3461 URL: https://issues.apache.org/jira/browse/YARN-3461 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen In YARN-3391, it's not resolved what should be the defaults for flow name/version/run. Let's continue the discussion here and unblock YARN-3391 from moving forward. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483812#comment-14483812 ] Zhijie Shen commented on YARN-3391: --- bq. let's continue the discussion on a separated JIRA for figuring it out later. Agree. Let's unblock this Jira which will unblock the writer implementation consequently. I filed YARN-3461 to continue the defaults discussion there. bq. I just wanted to add my 2 cents that this is something we already see and experience with hRaven so it's not theoretical. Sangjin, thanks for sharing the use case in hRaven. It's helpful to understand the proper defaults. To generalize it, we need to consider different use cases such as adhoc applications only. Shall we continue the discussion on YARN-3461? bq. As I mentioned earlier, it should be useful for developers I make use of Sangjin's previous comments to add some inline code comments about their definitions in TimelineCollectorContext. Clearly define flow ID/ flow run / flow version in API and storage -- Key: YARN-3391 URL: https://issues.apache.org/jira/browse/YARN-3391 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3391.1.patch, YARN-3391.2.patch, YARN-3391.3.patch To continue the discussion in YARN-3040, let's figure out the best way to describe the flow. Some key issues that we need to conclude on: - How do we include the flow version in the context so that it gets passed into the collector and to the storage eventually? - Flow run id should be a number as opposed to a generic string? - Default behavior for the flow run id if it is missing (i.e. client did not set it) - How do we handle flow attributes in case of nested levels of flows? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3426) Add jdiff support to YARN
[ https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483837#comment-14483837 ] Hadoop QA commented on YARN-3426: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723684/YARN-3426-040715.patch against trunk revision d27e924. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7247//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7247//console This message is automatically generated. Add jdiff support to YARN - Key: YARN-3426 URL: https://issues.apache.org/jira/browse/YARN-3426 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Priority: Blocker Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, YARN-3426-040715.patch Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs to YARN as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes
[ https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-3439. -- Resolution: Duplicate bq. IAC, this is a dup of YARN-3055. Agreed, closing as a duplicate. RM fails to renew token when Oozie launcher leaves before sub-job finishes -- Key: YARN-3439 URL: https://issues.apache.org/jira/browse/YARN-3439 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-3439.001.patch When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't linger waiting for the sub-job to finish. At that point the RM stops renewing delegation tokens for the launcher job which wreaks havoc on the sub-job if the sub-job runs long enough for the tokens to expire. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3452) Bogus token usernames cause many invalid group lookups
[ https://issues.apache.org/jira/browse/YARN-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483861#comment-14483861 ] Jason Lowe commented on YARN-3452: -- The extra lookups started in 2.6 releases, and it appears to be caused by HADOOP-10650. However YARN really should not be using bogus users on tokens anyway in case the RPC layer (or other non-YARN systems) try to do something with those users like HADOOP-10650 did. Bogus token usernames cause many invalid group lookups -- Key: YARN-3452 URL: https://issues.apache.org/jira/browse/YARN-3452 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Jason Lowe YARN uses a number of bogus usernames for tokens, like application attempt IDs for NM tokens or even the hardcoded testing for the container localizer token. These tokens cause the RPC layer to do group lookups on these bogus usernames which will never succeed but can take a long time to perform. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
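A minimal illustration of the cost Jason describes (this is not the YARN code path; it just shows what the RPC layer's group resolution does with a made-up principal):
{code}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Groups;

public class BogusUserGroupLookup {
  public static void main(String[] args) {
    // "appattempt_..." style token usernames never exist as OS/LDAP users,
    // so the group mapping service can never answer; with shell-based
    // mapping each lookup forks a process that fails, and nothing useful
    // is ever cached for these names.
    Groups groups = Groups.getUserToGroupsMappingService(new Configuration());
    try {
      List<String> result = groups.getGroups("appattempt_1428000000000_0001_000001");
      System.out.println("groups: " + result);
    } catch (IOException e) {
      System.out.println("lookup failed: " + e.getMessage());
    }
  }
}
{code}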
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483874#comment-14483874 ] Inigo Goiri commented on YARN-3458: --- For the tests, I checked the original TestWindowsBasedProcessTree and it didn't have related to actually testing the resource monitoring; I'm open to suggestions. Regarding the two warning, I'm not able to understand what this is complaining about; it says that I have fields not accessed but the ones I added are referenced. I think ti refers to Log but I'm not able to parse the error. CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Labels: containers, metrics, windows Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-3366: Attachment: YARN-3366.003.patch Uploading patch incorporating code review feedback. Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3426) Add jdiff support to YARN
[ https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484008#comment-14484008 ] Li Lu commented on YARN-3426: - Could not reproduce the mvn eclipse:eclipse failure locally. The failure looks to be unrelated to this patch. Add jdiff support to YARN - Key: YARN-3426 URL: https://issues.apache.org/jira/browse/YARN-3426 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Priority: Blocker Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, YARN-3426-040715.patch Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs to YARN as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484117#comment-14484117 ] Daryn Sharp commented on YARN-3055: --- On cursory glance, are you sure this isn't going to leak tokens? Ie. does it remove tokens from data structures in all cases or can a token get left in allTokens? The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer -- Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Yi Liu Priority: Blocker Attachments: YARN-3055.001.patch, YARN-3055.002.patch After YARN-2964, there is only one timer to renew the token if it's shared by jobs. In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484032#comment-14484032 ] Zhijie Shen commented on YARN-3448: --- Jonathan, thanks for your contribution. It sounds like an interesting proposal. I'd like to take a look at the patch too. Add Rolling Time To Lives Level DB Plugin Capabilities -- Key: YARN-3448 URL: https://issues.apache.org/jira/browse/YARN-3448 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3448.1.patch For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities one record at a time. An exclusive write lock is held during the entire deletion phase which in practice can be hours. If we are to relax some of the consistency constraints, other performance enhancing techniques can be employed to maximize the throughput and minimize locking time. Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. This allows each database to maximize the read cache effectiveness based on the unique usage patterns of each database. With 5 separate databases each lookup is much faster. This can also help with I/O to have the entity and index databases on separate disks. Rolling DBs for entity and index DBs. 99.9% of the data is in these two sections, with at least a 4:1 ratio (index to entity) for Tez. We replace DB record removal with file system removal if we create a rolling set of databases that age out and can be efficiently removed. To do this we must place a constraint to always place an entity's events into its correct rolling db instance based on start time. This allows us to stitch the data back together at read time, with artificial paging. Relax the synchronous writes constraints. If we are willing to accept losing some records that were not flushed by the operating system during a crash, we can use async writes that can be much faster. Prefer sequential writes. Sequential writes can be several times faster than random writes. Spend some small effort arranging the writes in a way that trends towards sequential write performance over random write performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
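The central idea in the rolling-DB bullet is that an entity's start time, not its write time, selects the rolling instance, so TTL enforcement becomes dropping whole instances from the filesystem instead of deleting records under a write lock. A self-contained sketch of that bucketing (names are illustrative, not the patch's classes):
{code}
import java.util.TreeMap;

public class RollingDbSketch {
  private final long rollingPeriodMs;
  // bucket start time -> handle for that rolling instance (a real
  // implementation would hold a LevelDB handle backed by its own directory)
  private final TreeMap<Long, Object> instances = new TreeMap<>();

  public RollingDbSketch(long rollingPeriodMs) {
    this.rollingPeriodMs = rollingPeriodMs;
  }

  /** An entity's events always land in the bucket of the entity's start time. */
  public long bucketFor(long entityStartTimeMs) {
    return entityStartTimeMs - (entityStartTimeMs % rollingPeriodMs);
  }

  public Object instanceFor(long entityStartTimeMs) {
    return instances.computeIfAbsent(bucketFor(entityStartTimeMs), b -> new Object());
  }

  /** TTL enforcement: discard every instance older than the cutoff wholesale. */
  public void evictOlderThan(long cutoffMs) {
    // headMap(toKey) is exclusive of toKey, so this drops all buckets that
    // start before the cutoff's bucket; on disk this is a directory delete.
    instances.headMap(bucketFor(cutoffMs)).clear();
  }
}
{code}
Reads then stitch results back together by iterating the relevant buckets in time order.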
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484201#comment-14484201 ] Daryn Sharp commented on YARN-3055: --- This appears to go back to the really old days of renewing the token for its entire lifetime. Most unfortunate. The renewer looks like it may turn into a DOS weapon. Renewing a token returns the next expiration. The renewer uses a timer to renew 90% before expiration. After the last renewal, the same expiration (the wall) will be returned as before. 90% of the wall eventually becomes a rapid fire renewal. There's an army of 50 threads prepared to fire concurrently. My other concern is that it used to be the first job submitted with a given token that determined if the token is to be cancelled. Now any job can influence the cancelling. This patch didn't specifically break that behavior, but the original YARN-2704 did, which precipitated YARN-2964 to break it differently, and now this jira. The ramification is we used to tell users to make sure the first job set the conf correctly, and essentially don't worry after that. Now they do have to worry. Any sub-job with the default of canceling tokens will kill the overall workflow. Sub-jobs should not have jurisdiction over the tokens. The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer -- Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Yi Liu Priority: Blocker Attachments: YARN-3055.001.patch, YARN-3055.002.patch After YARN-2964, there is only one timer to renew the token if it's shared by jobs. In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
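The "rapid fire" concern above is plain arithmetic: if each renewal schedules the next attempt after 90% of the time remaining to a fixed maximum lifetime (the wall), the interval shrinks geometrically. A small self-contained simulation of that schedule (the 90% heuristic is taken from the comment; everything else is illustrative):
{code}
public class RenewalWallSimulation {
  public static void main(String[] args) {
    long now = 0;                   // minutes, for readability
    final long wall = 7 * 24 * 60;  // fixed max lifetime: 7 days
    int renewals = 0;
    // Every renewal returns the same expiration (the wall); the next
    // attempt is scheduled after 90% of whatever time remains.
    while (wall - now > 1 && renewals < 50) {
      long delay = Math.max((long) ((wall - now) * 0.9), 1);
      now += delay;
      renewals++;
      System.out.printf("renewal %d: waited %d min, %d min left%n",
          renewals, delay, wall - now);
    }
    // The tail of this schedule renews about once a minute or faster;
    // multiply that by many shared tokens and a 50-thread renewer pool and
    // the RM can hammer the token service near the end of a token's lifetime.
  }
}
{code}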
[jira] [Updated] (YARN-3426) Add jdiff support to YARN
[ https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3426: -- Target Version/s: 2.8.0 (was: 2.7.0) bq. The bigger question is the duplication of the maven code across Common, YARN and MAPREDUCE. But this may take more time to cleanup. Removing it from 2.7.0 as the effort needed for this cleanup is huge. Add jdiff support to YARN - Key: YARN-3426 URL: https://issues.apache.org/jira/browse/YARN-3426 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Priority: Blocker Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, YARN-3426-040715.patch Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs to YARN as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
[ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484058#comment-14484058 ] Sidharta Seethana commented on YARN-2424: - Here it is : https://issues.apache.org/jira/browse/YARN-3462 LCE should support non-cgroups, non-secure mode --- Key: YARN-2424 URL: https://issues.apache.org/jira/browse/YARN-2424 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Blocker Fix For: 2.6.0 Attachments: Y2424-1.patch, YARN-2424.patch After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios. This is a fairly serious regression, as turning on LCE prior to turning on full-blown security is a fairly standard procedure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2
Sidharta Seethana created YARN-3462: --- Summary: Patches applied for YARN-2424 are inconsistent between trunk and branch-2 Key: YARN-3462 URL: https://issues.apache.org/jira/browse/YARN-3462 Project: Hadoop YARN Issue Type: Bug Reporter: Sidharta Seethana It looks like the changes for YARN-2424 are not the same for trunk (commit 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning and documentation is a bit different as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484098#comment-14484098 ] Vinod Kumar Vavilapalli commented on YARN-3055: --- [~daryn]/[~jianhe], I briefly looked at the existing patch on this JIRA and it seems like it will work. Can you also take a look? [~hitliuyi], can you see if you can add a test for this in TestDelegationTokenRenewer.java? This is the last blocker on 2.7.0 as of today. Appreciate all the help I can get, thanks all. The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer -- Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Yi Liu Priority: Blocker Attachments: YARN-3055.001.patch, YARN-3055.002.patch After YARN-2964, there is only one timer to renew the token if it's shared by jobs. In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)