[jira] [Updated] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM
[ https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-3443: Attachment: YARN-3443.004.patch Patch with documentation fixes. Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM - Key: YARN-3443 URL: https://issues.apache.org/jira/browse/YARN-3443 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3443.001.patch, YARN-3443.002.patch, YARN-3443.003.patch, YARN-3443.004.patch The current cgroups implementation is closely tied to supporting CPU as a resource. We need to separate out cgroups support as well as provide a simple ResourceHandler subsystem that will enable us to add support for new resource types on the NM - e.g. network, disk, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
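For context, a rough sketch of what such a pluggable handler interface could look like, inferred from the preStart/reacquireContainer/postComplete calls quoted in the review comments further down this digest; the method set and signatures here are illustrative, not the committed API:
{code}
// Illustrative sketch only: one handler per resource type (cpu, network, disk, ...),
// chained together by the container executor. Types such as PrivilegedOperation and
// ResourceHandlerException are the ones referenced elsewhere in this thread.
public interface ResourceHandler {
  /** One-time setup at NM start (e.g. mount/verify cgroup hierarchies). */
  List<PrivilegedOperation> bootstrap(Configuration configuration)
      throws ResourceHandlerException;

  /** Operations to apply before a container is launched. */
  List<PrivilegedOperation> preStart(Container container)
      throws ResourceHandlerException;

  /** Re-attach to an already-running container after an NM restart. */
  List<PrivilegedOperation> reacquireContainer(ContainerId containerId)
      throws ResourceHandlerException;

  /** Per-container cleanup once the container completes. */
  List<PrivilegedOperation> postComplete(ContainerId containerId)
      throws ResourceHandlerException;

  /** Node-level cleanup at NM shutdown. */
  List<PrivilegedOperation> teardown() throws ResourceHandlerException;
}
{code}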
[jira] [Updated] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-3366: Attachment: YARN-3366.002.patch Uploading a patch that includes changes to YarnConfiguration.java Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3366.001.patch, YARN-3366.002.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482713#comment-14482713 ] Hadoop QA commented on YARN-3021: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723545/YARN-3021.007.patch against trunk revision 3fb5abf. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.lib.input.TestLineRecordReader org.apache.hadoop.mapred.TestLineRecordReader org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore org.apache.hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStorePerf org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7233//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7233//console This message is automatically generated. 
YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Assignee: Yongjun Zhang Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
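A minimal sketch of the behavior change described above - attempt the renewal, but tolerate failure and skip scheduling further renewals rather than failing the submission. The method names (renewToken, scheduleRenewal) are illustrative, not the actual DelegationTokenRenewer API:
{code}
// Illustrative sketch, not the actual RM code: tolerate a failed renewal at
// submission time instead of rejecting the application outright.
try {
  renewToken(token, applicationId);       // validate the token by renewing it once
  scheduleRenewal(token, applicationId);  // periodic renewal, as before
} catch (IOException e) {
  // With one-way cross-realm trust (A and B each trusting COMMON only), realm B
  // will not accept the RM principal from realm A as a renewer. Log and move on,
  // skipping automatic renewal, instead of bubbling the error back to the client.
  LOG.warn("Token renewal failed for " + applicationId
      + "; skipping automatic renewal for this token", e);
}
{code}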
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482789#comment-14482789 ] Varun Vasudev commented on YARN-3366: - Thanks for the patch [~sidharta-s]! Feedback below.
# In YarnConfiguration.java
{noformat}
 /**
- * True if linux-container-executor should limit itself to one user
+ * If linux-container-executor should limit itself to one user
  * when running in non-secure mode.
  */
- public static final String NM_NONSECURE_MODE_LIMIT_USERS = NM_PREFIX +
+ public static final String NM_NONSECURE_MODE_LIMIT_USERS= NM_PREFIX +
      "linux-container-executor.nonsecure-mode.limit-users";
- public static final boolean DEFAULT_NM_NONSECURE_MODE_LIMIT_USERS = true;
+ public static final boolean DEFAULT_NM_NONSECURE_MODE_LIMIT_USERS = true;
{noformat}
It looks like these are unnecessary changes. Can you please remove them?
# In TrafficController.java
{noformat}
if (LOG.isInfoEnabled()) {
  LOG.info("NM recovery is not enabled.");
}
{noformat}
{noformat}
if (LOG.isInfoEnabled()) {
  LOG.info("TC configuration is incomplete.");
}
{noformat}
Can you change these to debug? It doesn't seem to be something that needs to be logged by the class.
# In TrafficController.java
{noformat}
else {
  if (LOG.isWarnEnabled()) {
    String logLine = new StringBuffer("Failed to match regex: ")
        .append(regex).append(" Current state: ").append(state).toString();
    LOG.warn(logLine);
    return false;
  }
}
{noformat}
Shouldn't the return be outside the warn enabled check?
# In TrafficController.java
{noformat}
//This could happen if the interface is already in its default state.
//Ignoring.
//throw new ResourceHandlerException("Failed to wipe tc state", e);
{noformat}
The comments are in a different block than the warn message. Also, the commented throw is confusing.
# Minor nit - In TrafficController.java, function parseStatsString, the continue isn't really required
# In TrafficControlBandwidthHandlerImpl.java - Unused import: com.google.common.annotations.VisibleForTesting
# In TrafficControlBandwidthHandlerImpl.java
{noformat}
LOG.info("strict mode is set to :" + strictMode);
{noformat}
{noformat}
LOG.info("Attempting to reacquire classId for container: " + containerIdStr);
{noformat}
Change levels to debug?
# In TrafficControlBandwidthHandlerImpl.java
{noformat}
String opArg = new StringBuffer(PrivilegedOperation.CGROUP_ARG_PREFIX)
    .append(tasksFile).toString();
{noformat}
You can use the String class itself instead of StringBuffer?
# In TrafficControlBandwidthHandlerImpl.java
{noformat}
if (LOG.isWarnEnabled()) {
  LOG.warn("teardown(): Nothing to do");
}
{noformat}
Why are you logging a warning?
# In TestTrafficControlBandwidthHandlerImpl.java and TestTrafficController.java
{noformat}
Assert.assertTrue("Caught unexpected ResourceHandlerException!", false);
{noformat}
Use Assert.fail? This pattern is used in multiple places.
# In LinuxContainerExecutor.java
{noformat}
} catch (ResourceHandlerException e) {
+ if (LOG.isWarnEnabled()) {
+    LOG.warn("ResourceHandlerChain.reacquireContainer failed for " +
+        "containerId: " + containerId);
+ }
{noformat}
Can you add the exception to the warn message?
# In LinuxContainerExecutor.java
{noformat}
} catch (ResourceHandlerException e) {
  if (LOG.isWarnEnabled()) {
    LOG.warn(e);
    LOG.warn("ResourceHandlerChain.postComplete failed for " +
        "containerId: " + containerId);
  }
}
{noformat}
Merge the warn messages.
# In LinuxContainerExecutor.java
{noformat}
+command.addAll(Arrays.asList(containerExecutorExe,
{noformat}
Remove the extra space added.
# In LinuxContainerExecutor.java
{noformat}
+String tcCommandFile = null;
+
+try {
+  if (resourceHandlerChain != null) {
+    List<PrivilegedOperation> ops = resourceHandlerChain
+        .preStart(container);
+
+    if (ops != null) {
+      List<PrivilegedOperation> resourceOps = new ArrayList<>();
+
+      resourceOps.add(new PrivilegedOperation
+          (PrivilegedOperation.OperationType.ADD_PID_TO_CGROUP,
+          resourcesOptions));
+
+      for (PrivilegedOperation op : ops) {
+        switch (op.getOperationType()) {
+          case ADD_PID_TO_CGROUP:
+            resourceOps.add(op);
+            break;
+          case TC_MODIFY_STATE:
+            tcCommandFile = op.getArguments().get(0);
+          default:
+            if (LOG.isWarnEnabled()) {
+              LOG.warn("PrivilegedOperation type unsupported in launch: "
+                  + op.getOperationType());
+            }
+            continue;
+        }
+      }
+
+      if (resourceOps.size() > 1) {
+        //squash resource operations
+        try {
+          PrivilegedOperation operation =
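To make a couple of the review points above concrete, here is an illustrative before/after sketch (not taken from the patch) of the Assert.fail suggestion and of a single warn call that carries the exception:
{code}
// Illustrative only - test assertion: Assert.fail replaces assertTrue(msg, false).
try {
  handler.bootstrap(conf);
} catch (ResourceHandlerException e) {
  Assert.fail("Caught unexpected ResourceHandlerException!");
}
{code}
{code}
// Illustrative only - LinuxContainerExecutor: one warn call carrying both the
// message and the exception, instead of two separate LOG.warn calls.
try {
  resourceHandlerChain.postComplete(containerId);
} catch (ResourceHandlerException e) {
  LOG.warn("ResourceHandlerChain.postComplete failed for containerId: "
      + containerId, e);
}
{code}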
[jira] [Commented] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM
[ https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482688#comment-14482688 ] Hadoop QA commented on YARN-3443: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723549/YARN-3443.004.patch against trunk revision 3fb5abf. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7234//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7234//console This message is automatically generated. Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM - Key: YARN-3443 URL: https://issues.apache.org/jira/browse/YARN-3443 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3443.001.patch, YARN-3443.002.patch, YARN-3443.003.patch, YARN-3443.004.patch The current cgroups implementation is closely tied to supporting CPU as a resource. We need to separate out cgroups support as well as provide a simple ResourceHandler subsystem that will enable us to add support for new resource types on the NM - e.g. network, disk, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3404) View the queue name to YARN Application page
[ https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482717#comment-14482717 ] Hadoop QA commented on YARN-3404: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723543/YARN-3404.2.patch against trunk revision 3fb5abf. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7232//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7232//console This message is automatically generated. View the queue name to YARN Application page Key: YARN-3404 URL: https://issues.apache.org/jira/browse/YARN-3404 Project: Hadoop YARN Issue Type: Improvement Reporter: Ryu Kobayashi Assignee: Ryu Kobayashi Priority: Minor Attachments: YARN-3404.1.patch, YARN-3404.2.patch, screenshot.png It want to display the name of the queue that is used to YARN Application page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3102) Decommisioned Nodes not listed in Web UI
[ https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482922#comment-14482922 ] Naganarasimha G R commented on YARN-3102: - Hi [~zhiguohong], I had actually started to work on this patch but was skeptical that YARN-914 (or its subjira's ) might have impact or take care of this issue. Give me couple of days time, will try the check the state of my patch and update you. Decommisioned Nodes not listed in Web UI Key: YARN-3102 URL: https://issues.apache.org/jira/browse/YARN-3102 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Environment: 2 Node Manager and 1 Resource Manager Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Minor Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to yarn.exlude file In RM1 machine Add Yarn.exclude with NM1 Host Name Start the node as listed below NM1,NM2 Resource manager Now check Nodes decommisioned in /cluster/nodes Number of decommisioned node is listed as 1 but Table is empty in /cluster/nodes/decommissioned (detail of Decommision node not shown) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2444) Primary filters added after first submission not indexed, cause exceptions in logs.
[ https://issues.apache.org/jira/browse/YARN-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran reassigned YARN-2444: Assignee: Steve Loughran Primary filters added after first submission not indexed, cause exceptions in logs. --- Key: YARN-2444 URL: https://issues.apache.org/jira/browse/YARN-2444 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.5.0 Reporter: Marcelo Vanzin Assignee: Steve Loughran Attachments: YARN-2444-001.patch, ats.java, org.apache.hadoop.yarn.server.timeline.TestTimelineClientPut-output.txt See attached code for an example. The code creates an entity with a primary filter, submits it to the ATS. After that, a new primary filter value is added and the entity is resubmitted. At that point two things can be seen: - Searching for the new primary filter value does not return the entity - The following exception shows up in the logs: {noformat} 14/08/22 11:33:42 ERROR webapp.TimelineWebServices: Error when verifying access for user dr.who (auth:SIMPLE) on the events of the timeline entity { id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test } org.apache.hadoop.yarn.exceptions.YarnException: Owner information of the timeline entity { id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test } is corrupted. at org.apache.hadoop.yarn.server.timeline.security.TimelineACLsManager.checkAccess(TimelineACLsManager.java:67) at org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:172) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3102) Decommisioned Nodes not listed in Web UI
[ https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482891#comment-14482891 ] Hong Zhiguo commented on YARN-3102: --- I met the same problem. Hi, [~Naganarasimha], can I take this issue? Decommisioned Nodes not listed in Web UI Key: YARN-3102 URL: https://issues.apache.org/jira/browse/YARN-3102 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Environment: 2 Node Manager and 1 Resource Manager Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Minor Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to yarn.exlude file In RM1 machine Add Yarn.exclude with NM1 Host Name Start the node as listed below NM1,NM2 Resource manager Now check Nodes decommisioned in /cluster/nodes Number of decommisioned node is listed as 1 but Table is empty in /cluster/nodes/decommissioned (detail of Decommision node not shown) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-1394) RM to inform AMs when a container completed due to NM going offline -planned or unplanned
[ https://issues.apache.org/jira/browse/YARN-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-1394: Assignee: Rohith RM to inform AMs when a container completed due to NM going offline -planned or unplanned - Key: YARN-1394 URL: https://issues.apache.org/jira/browse/YARN-1394 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Rohith YARN-914 proposes graceful decommission of an NM, and NMs already have the right to go offline. If AMs could be told that a container completed due to the NM going offline - offline vs decommission - the AM could use that in its future blacklisting and placement policy. This matters in long-lived services which may like to place new instances where they were placed before, and track hosts' failure rates -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2429) LCE should blacklist based upon group
[ https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482979#comment-14482979 ] Hudson commented on YARN-2429: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #156 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/156/]) YARN-2429. TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken (zxu via rkanter) (rkanter: rev 99b08a748e7b00a58b63330b353902a6da6aae27) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java LCE should blacklist based upon group - Key: YARN-2429 URL: https://issues.apache.org/jira/browse/YARN-2429 Project: Hadoop YARN Issue Type: New Feature Reporter: Allen Wittenauer It should be possible to list a group to ban, not just individual users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482977#comment-14482977 ] Hudson commented on YARN-3273: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #156 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/156/]) Move YARN-3273 from 2.8 to 2.7. (zjshen: rev 3fb5abfc87953377f86e06578518801a181d7697) * hadoop-yarn-project/CHANGES.txt Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3456) Improve handling of incomplete TimelineEntities
[ https://issues.apache.org/jira/browse/YARN-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3456: -- Assignee: Varun Saxena Improve handling of incomplete TimelineEntities --- Key: YARN-3456 URL: https://issues.apache.org/jira/browse/YARN-3456 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Priority: Minor If an incomplete TimelineEntity is posted, it isn't checked client side ... it gets all the way to the far end before triggering an NPE in the store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2429) LCE should blacklist based upon group
[ https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482986#comment-14482986 ] Hudson commented on YARN-2429: -- FAILURE: Integrated in Hadoop-Yarn-trunk #890 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/890/]) YARN-2429. TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken (zxu via rkanter) (rkanter: rev 99b08a748e7b00a58b63330b353902a6da6aae27) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java * hadoop-yarn-project/CHANGES.txt LCE should blacklist based upon group - Key: YARN-2429 URL: https://issues.apache.org/jira/browse/YARN-2429 Project: Hadoop YARN Issue Type: New Feature Reporter: Allen Wittenauer It should be possible to list a group to ban, not just individual users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482984#comment-14482984 ] Hudson commented on YARN-3273: -- FAILURE: Integrated in Hadoop-Yarn-trunk #890 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/890/]) Move YARN-3273 from 2.8 to 2.7. (zjshen: rev 3fb5abfc87953377f86e06578518801a181d7697) * hadoop-yarn-project/CHANGES.txt Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3456) Improve handling of incomplete TimelineEntities
[ https://issues.apache.org/jira/browse/YARN-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483040#comment-14483040 ] Steve Loughran commented on YARN-3456: -- Stack trace when an entity with a null type is posted by the client. Client side preflight checking could prevent some of this; REST API validation would be even stronger.
{code}
2015-04-07 12:23:40,290 [614480043@qtp-2026808370-0] INFO container.GuiceComponentProviderFactory (GuiceComponentProviderFactory.java:getComponentProvider(159)) - Binding org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices to GuiceManagedComponentProvider with the scope Singleton
2015-04-07 12:23:40,632 [614480043@qtp-2026808370-0] ERROR timeline.TimelineDataManager (TimelineDataManager.java:postEntities(275)) - Skip the timeline entity: { id: post, type: null }
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore$KeyBuilder.add(LeveldbTimelineStore.java:352)
        at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.createStartTimeLookupKey(LeveldbTimelineStore.java:1188)
        at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.getStartTimeLong(LeveldbTimelineStore.java:1081)
        at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.getEntity(LeveldbTimelineStore.java:433)
        at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.postEntities(TimelineDataManager.java:257)
        at org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.postEntities(TimelineWebServices.java:259)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:96)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:572)
        at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:269)
        at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:542)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at
[jira] [Created] (YARN-3456) Improve handling of incomplete TimelineEntities
Steve Loughran created YARN-3456: Summary: Improve handling of incomplete TimelineEntities Key: YARN-3456 URL: https://issues.apache.org/jira/browse/YARN-3456 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Affects Versions: 2.6.0 Reporter: Steve Loughran Priority: Minor If an incomplete TimelineEntity is posted, it isn't checked client side ... it gets all the way to the far end before triggering an NPE in the store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
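As a sketch of the kind of client-side preflight check suggested above (illustrative only; the method name and exact checks are not from any patch):
{code}
// Hypothetical client-side validation before posting to the timeline server:
// reject entities missing an id or type instead of letting a null field
// surface as an NPE deep inside the leveldb store.
private static void validateEntity(TimelineEntity entity) {
  if (entity == null || entity.getEntityId() == null
      || entity.getEntityType() == null) {
    throw new IllegalArgumentException(
        "Incomplete TimelineEntity: both entityId and entityType are required");
  }
}
{code}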
[jira] [Created] (YARN-3457) NPE when NodeManager.serviceInit fails and stopRecoveryStore called
Bibin A Chundatt created YARN-3457: -- Summary: NPE when NodeManager.serviceInit fails and stopRecoveryStore called Key: YARN-3457 URL: https://issues.apache.org/jira/browse/YARN-3457 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor When NodeManager serviceInit fails, a null pointer exception is thrown during stopRecoveryStore
{code}
@Override
protected void serviceInit(Configuration conf) throws Exception {
  ..
  try {
    exec.init();
  } catch (IOException e) {
    throw new YarnRuntimeException("Failed to initialize container executor", e);
  }
  this.context = createNMContext(containerTokenSecretManager,
      nmTokenSecretManager, nmStore);
{code}
context is null when service init fails
{code}
private void stopRecoveryStore() throws IOException {
  nmStore.stop();
  if (context.getDecommissioned() && nmStore.canRecover()) {
    ..
  }
}
{code}
Null pointer exception thrown
{quote}
2015-04-07 17:31:45,807 WARN org.apache.hadoop.service.AbstractService: When stopping the service NodeManager : java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:168)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:280)
at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:484)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:534)
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
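One possible shape of the guard, shown only as an illustrative sketch (not a submitted patch): check for a null context before dereferencing it in stopRecoveryStore.
{code}
// Illustrative sketch: serviceInit can fail before createNMContext() runs,
// so context may still be null when serviceStop -> stopRecoveryStore is
// invoked; guard the dereference instead of assuming initialization finished.
private void stopRecoveryStore() throws IOException {
  if (nmStore != null) {
    nmStore.stop();
  }
  if (context != null && context.getDecommissioned()
      && nmStore != null && nmStore.canRecover()) {
    // remove recovery state only for a decommissioned, recoverable NM
  }
}
{code}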
[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483093#comment-14483093 ] Bibin A Chundatt commented on YARN-2801: [~leftnoteasy] Any update on documentation for Node Labels? It's difficult to completely evaluate this feature without documentation Documentation development for Node labels requirment Key: YARN-2801 URL: https://issues.apache.org/jira/browse/YARN-2801 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gururaj Shetty Assignee: Wangda Tan Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483281#comment-14483281 ] Hadoop QA commented on YARN-3348: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723624/apache-yarn-3348.0.patch against trunk revision 75c5454. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.api.TestPBImplRecords Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7235//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7235//artifact/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7235//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7235//console This message is automatically generated. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3348: Attachment: apache-yarn-3348.0.patch Uploaded initial version of the patch. Most of the work is in a new TopCLI class. I added an application reports cache in ClientRMService with a timeout of 5 seconds as well as a boolean in GetApplicationsRequest to fetch cached versions of the reports. The tool essentially prints out the application report. The default refresh rate is 3 seconds. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
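A rough sketch of the time-bounded cache described above (field names and structure are illustrative only; the actual patch wires this into ClientRMService and GetApplicationsRequest):
{code}
// Illustrative sketch: reuse application reports for up to 5 seconds so that
// frequent 'yarn top' refreshes do not hammer the RM for full reports.
private List<ApplicationReport> cachedReports;
private long cacheTimestampMs;
private static final long CACHE_INTERVAL_MS = 5 * 1000L;

private synchronized List<ApplicationReport> getCachedApplicationReports() {
  long now = System.currentTimeMillis();
  if (cachedReports == null || now - cacheTimestampMs > CACHE_INTERVAL_MS) {
    cachedReports = buildApplicationReports();  // hypothetical uncached lookup
    cacheTimestampMs = now;
  }
  return cachedReports;
}
{code}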
[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483319#comment-14483319 ] Hadoop QA commented on YARN-3293: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723647/apache-yarn-3293.3.patch against trunk revision 75c5454. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7238//console This message is automatically generated. Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, apache-yarn-3293.3.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3293: Attachment: apache-yarn-3293.4.patch Doh! Uploaded the stat instead of the patch. Uploading the real patch. Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, apache-yarn-3293.4.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2429) LCE should blacklist based upon group
[ https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483233#comment-14483233 ] Hudson commented on YARN-2429: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #147 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/147/]) YARN-2429. TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken (zxu via rkanter) (rkanter: rev 99b08a748e7b00a58b63330b353902a6da6aae27) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java * hadoop-yarn-project/CHANGES.txt LCE should blacklist based upon group - Key: YARN-2429 URL: https://issues.apache.org/jira/browse/YARN-2429 Project: Hadoop YARN Issue Type: New Feature Reporter: Allen Wittenauer It should be possible to list a group to ban, not just individual users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483231#comment-14483231 ] Hudson commented on YARN-3273: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #147 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/147/]) Move YARN-3273 from 2.8 to 2.7. (zjshen: rev 3fb5abfc87953377f86e06578518801a181d7697) * hadoop-yarn-project/CHANGES.txt Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2003) Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side]
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2003: -- Attachment: 0006-YARN-2003.patch Hi [~leftnoteasy] Rebased the patch. This patch is independent of others. But YARN-2004 will have to depend on this to implement the abstract methods defined here. Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side] -- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483261#comment-14483261 ] Hudson commented on YARN-3273: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2088 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2088/]) Move YARN-3273 from 2.8 to 2.7. (zjshen: rev 3fb5abfc87953377f86e06578518801a181d7697) * hadoop-yarn-project/CHANGES.txt Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2429) LCE should blacklist based upon group
[ https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483263#comment-14483263 ] Hudson commented on YARN-2429: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2088 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2088/]) YARN-2429. TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken (zxu via rkanter) (rkanter: rev 99b08a748e7b00a58b63330b353902a6da6aae27) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java * hadoop-yarn-project/CHANGES.txt LCE should blacklist based upon group - Key: YARN-2429 URL: https://issues.apache.org/jira/browse/YARN-2429 Project: Hadoop YARN Issue Type: New Feature Reporter: Allen Wittenauer It should be possible to list a group to ban, not just individual users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3293: Attachment: apache-yarn-3293.3.patch {quote} General - it looks like the counters could possibly overflow and provide negative values, perhaps this is not something which could possibly happen in the lifetime of a cluster, but a large long-running cluster, is it a possibility/concern? {quote} The counters in SchedulerHealth are Long so it should be fine. The counters in AssignmentInformation(new class I added) are reset every allocation cycle. {quote} This presently looks to be capasched only, had a suggestion to make slightly more general below, Vinod Kumar Vavilapalli also mentioned not specific to scheduler, perhaps it's fine to go capasched only for the first iteration, but wanted to verify (perhaps we need a followon jira for other schedulers). {quote} Yes. That's the plan - once it's in for CapacityScheduler, I'll file a ticket to add the information for FairScheduler and point to this one as an example of the stuff we added. {quote} on the web page It's a nit, but I find I don't like the look of the / between the counter and the resource expression where that occurs, maybe - instead of / for those (allocations/reservations/releases)? {quote} Fixed. {quote} TestSchedulerHealth can we import NodeManager and get rid of package references in code {quote} Fixed. {quote} CapacitySchedulerHealthInfo looks like there is no need to keep a reference to the CapacityScheduler instance after construction, can we drop it from being a member then? {quote} Fixed. {quote} looks like line changes in info log are just whitespace, can you drop them? {quote} Fixed. {quote} LeafQueue L884 looks to be just whitespace, can you revert? {quote} Fixed. {quote} CSAssignment I think that there should be a new, sharable between schedulers class which incorporates all the new assignment info and that it should be a member of CSAssignment, instead of adding all of the details directly to CSAssignment. You would still pack the info into CSAssignment (as an instance of that type), but now would take a form that can be shared across schedulers {quote} Fixed. I created a new class called AssignmentInformation which encapsulates everything. Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, apache-yarn-3293.3.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
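For readers following along, a very rough, illustrative sketch of what a scheduler-agnostic "assignment information" holder could look like; this is not the actual class added by the patch, only the general shape the review asks for:
{code}
import java.util.EnumMap;
import java.util.Map;

// Illustrative sketch: per-allocation-cycle counters for allocations,
// reservations and releases, kept in a class that any scheduler (capacity or
// fair) could embed, rather than adding fields directly to CSAssignment.
public class AssignmentInformation {
  public enum Operation { ALLOCATION, RESERVATION, RELEASE }

  private final Map<Operation, Integer> counts =
      new EnumMap<Operation, Integer>(Operation.class);

  public void increment(Operation op) {
    Integer current = counts.get(op);
    counts.put(op, current == null ? 1 : current + 1);
  }

  public int get(Operation op) {
    Integer current = counts.get(op);
    return current == null ? 0 : current;
  }
}
{code}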
[jira] [Updated] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3293: Attachment: (was: apache-yarn-3293.3.patch) Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated YARN-3021: Attachment: YARN-3021.007.patch Upload same patch again for another test. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Assignee: Yongjun Zhang Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483424#comment-14483424 ] Hadoop QA commented on YARN-3348: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723653/apache-yarn-3348.1.patch against trunk revision 19a4fea. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7240//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7240//console This message is automatically generated. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483486#comment-14483486 ] Xuan Gong commented on YARN-3294: - +1 lgtm. Will commit Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch, apache-yarn-3294.3.patch, apache-yarn-3294.4.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3293: Attachment: apache-yarn-3293.5.patch The findbug warnings are incorrect - the fields are used by JAXB. Updated patch to exclude them. The failing test is unrelated. Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, apache-yarn-3293.4.patch, apache-yarn-3293.5.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3046) [Event producers] Implement MapReduce AM writing some MR metrics to ATS
[ https://issues.apache.org/jira/browse/YARN-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3046: - Attachment: YARN-3046-no-test-v2.patch I forgot to add the code for finding the TimelineCollectorAddress in the previous patch; it is added in v2. An end-to-end test is still missing from this patch because an existing test, TestMRTimelineEventHandling, fails locally even without this patch applied. I am still digging into the issue. [Event producers] Implement MapReduce AM writing some MR metrics to ATS --- Key: YARN-3046 URL: https://issues.apache.org/jira/browse/YARN-3046 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: YARN-3046-no-test-v2.patch, YARN-3046-no-test.patch Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes written) and have the MR AM write the framework-specific metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat
[ https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483392#comment-14483392 ] Junping Du commented on YARN-1376: -- I forgot to mention: if the RM can learn the log aggregation status from the NM side, I think we can remove getKeepAliveApplications() from the NM-RM heartbeat, because the RM can keep finished applications' tokens alive based on the log aggregation status directly. However, we don't have to address that in this JIRA and can file a separate one. For the UI changes, [~xgong], can you attach a screenshot as well? Thanks! NM need to notify the log aggregation status to RM through Node heartbeat - Key: YARN-1376 URL: https://issues.apache.org/jira/browse/YARN-1376 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, YARN-1376.3.patch, YARN-1376.4.patch Expose a client API to allow clients to figure if log aggregation is complete. The ticket is used to track the changes on NM side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3127) Apphistory url crashes when RM switches with ATS enabled
[ https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483396#comment-14483396 ] Naganarasimha G R commented on YARN-3127: - Hi [~xgong], If you have the bandwidth can you take a look at this patch too ? Apphistory url crashes when RM switches with ATS enabled Key: YARN-3127 URL: https://issues.apache.org/jira/browse/YARN-3127 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: RM HA with ATS Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Attachments: YARN-3127.20150213-1.patch, YARN-3127.20150329-1.patch 1.Start RM with HA and ATS configured and run some yarn applications 2.Once applications are finished sucessfully start timeline server 3.Now failover HA form active to standby 4.Access timeline server URL IP:PORT/applicationhistory Result: Application history URL fails with below info {quote} 2015-02-03 20:28:09,511 ERROR org.apache.hadoop.yarn.webapp.View: Failed to read the applications. java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643) at org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:80) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) ... Caused by: org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: The entity for application attempt appattempt_1422972608379_0001_01 doesn't exist in the timeline store at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplicationAttempt(ApplicationHistoryManagerOnTimelineStore.java:151) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.generateApplicationReport(ApplicationHistoryManagerOnTimelineStore.java:499) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAllApplications(ApplicationHistoryManagerOnTimelineStore.java:108) at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:84) at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:81) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) ... 51 more 2015-02-03 20:28:09,512 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /applicationhistory org.apache.hadoop.yarn.webapp.WebAppException: Error rendering block: nestLevel=6 expected 5 at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) {quote} Behaviour with AHS with file based history store -Apphistory url is working -No attempt entries are shown for each application. Based on inital analysis when RM switches ,application attempts from state store are not replayed but only applications are. So when /applicaitonhistory url is accessed it tries for all attempt id and fails -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3110) Few issues in ApplicationHistory web ui
[ https://issues.apache.org/jira/browse/YARN-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483371#comment-14483371 ] Hudson commented on YARN-3110: -- FAILURE: Integrated in Hadoop-trunk-Commit #7519 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7519/]) YARN-3110. Few issues in ApplicationHistory web ui. Contributed by Naganarasimha G R (xgong: rev 19a4feaf6fcf42ebbfe98b8a7153ade96d37fb14) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java Few issues in ApplicationHistory web ui --- Key: YARN-3110 URL: https://issues.apache.org/jira/browse/YARN-3110 Project: Hadoop YARN Issue Type: Sub-task Components: applications, timelineserver Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Minor Fix For: 2.8.0 Attachments: YARN-3110.20150209-1.patch, YARN-3110.20150315-1.patch, YARN-3110.20150406-1.patch Application state and History link wrong when Application is in unassigned state 1.Configure capacity schedular with queue size as 1 also max Absolute Max Capacity: 10.0% (Current application state is Accepted and Unassigned from resource manager side) 2.Submit application to queue and check the state and link in Application history State= null and History link shown as N/A in applicationhistory page Kill the same application . In timeline server logs the below is show when selecting application link. {quote} 2015-01-29 15:39:50,956 ERROR org.apache.hadoop.yarn.webapp.View: Failed to read the AM container of the application attempt appattempt_1422467063659_0007_01. 
java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainer(ApplicationHistoryManagerOnTimelineStore.java:162) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAMContainer(ApplicationHistoryManagerOnTimelineStore.java:184) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:160) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:156) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38) at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at
[jira] [Commented] (YARN-2003) Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side]
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483416#comment-14483416 ] Hadoop QA commented on YARN-2003: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723643/0006-YARN-2003.patch against trunk revision 75c5454. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7236//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7236//artifact/patchprocess/newPatchFindbugsWarningshadoop-sls.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7236//console This message is automatically generated. Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side] -- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483417#comment-14483417 ] Hadoop QA commented on YARN-3021: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723642/YARN-3021.007.patch against trunk revision 75c5454. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7237//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7237//console This message is automatically generated. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Assignee: Yongjun Zhang Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483429#comment-14483429 ] Allen Wittenauer commented on YARN-3348: {code} +doNotSetCols=0 +doNotSetRows=0 +for i in $@; do + if [[ $i == -cols ]]; then +doNotSetCols=1 + fi + if [[ $i == -rows ]]; then +doNotSetRows=1 + fi +done +if [[ $doNotSetCols == 0 ]]; then + cols=`tput cols` + args=( $@ ) + args=(${args[@]} -cols $cols) + set -- ${args[@]} +fi +if [[ $doNotSetRows == 0 ]]; then + rows=`tput lines` + args=( $@ ) + args=(${args[@]} -rows $rows) + set -- ${args[@]} +fi {code} * Why are we doing this manipulation here and not in the Java code? * backticks are antiquated in modern bash. Use {{$()}} construction * What happens if tput gives you zero or an error because you are on a non-addressable terminal? (You can generally simulate this by unset TERM or equivalent env var) Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3110) Few issues in ApplicationHistory web ui
[ https://issues.apache.org/jira/browse/YARN-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483353#comment-14483353 ] Xuan Gong commented on YARN-3110: - Committed into trunk/branch-2. Thanks, Naganarasimha Few issues in ApplicationHistory web ui --- Key: YARN-3110 URL: https://issues.apache.org/jira/browse/YARN-3110 Project: Hadoop YARN Issue Type: Sub-task Components: applications, timelineserver Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-3110.20150209-1.patch, YARN-3110.20150315-1.patch, YARN-3110.20150406-1.patch Application state and History link wrong when Application is in unassigned state 1.Configure capacity schedular with queue size as 1 also max Absolute Max Capacity: 10.0% (Current application state is Accepted and Unassigned from resource manager side) 2.Submit application to queue and check the state and link in Application history State= null and History link shown as N/A in applicationhistory page Kill the same application . In timeline server logs the below is show when selecting application link. {quote} 2015-01-29 15:39:50,956 ERROR org.apache.hadoop.yarn.webapp.View: Failed to read the AM container of the application attempt appattempt_1422467063659_0007_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainer(ApplicationHistoryManagerOnTimelineStore.java:162) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAMContainer(ApplicationHistoryManagerOnTimelineStore.java:184) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:160) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:156) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38) at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at
[jira] [Updated] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3348: Attachment: apache-yarn-3348.1.patch Uploaded a new patch to fix release audit warning and failing test. The findbugs warning is from another test. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483373#comment-14483373 ] Varun Vasudev commented on YARN-3348: - Sorry, that last comment should have been: the findbugs warning is from another patch - YARN-2901. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.
[ https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483432#comment-14483432 ] Junping Du commented on YARN-3431: -- Thanks [~zjshen] for the patch and [~gtCarrera9] for the review and comments. bq. However, I'm a little bit confused about the big picture of this patch. I put some content and background in the JIRA description. Hope it helps.
{code}
-putObjects("entities", params, entitiesContainer);
+for (org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity entity : entities) {
+  String path = "entities";
+  try {
+    path += "/" + TimelineEntityType.valueOf(entity.getType()).toString();
+  } catch (IllegalArgumentException e) {
+    // Do nothing, generic entity type
+  }
+  putObjects(path, params, entity);
+}
{code}
It looks like we are breaking one put operation into pieces, which does not make sense from a performance perspective. Do we have to do this? BTW, shouldn't we handle the IllegalArgumentException instead of ignoring it? Sub resources of timeline entity needs to be passed to a separate endpoint. --- Key: YARN-3431 URL: https://issues.apache.org/jira/browse/YARN-3431 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3431.1.patch, YARN-3431.2.patch We have TimelineEntity and some other entities as subclass that inherit from it. However, we only have a single endpoint, which consume TimelineEntity rather than sub-classes and this endpoint will check the incoming request body contains exactly TimelineEntity object. However, the json data which is serialized from sub-class object seems not to be treated as an TimelineEntity object, and won't be deserialized into the corresponding sub-class object which cause deserialization failure as some discussions in YARN-3334 : https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
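A self-contained sketch of the per-type routing discussed in the comment above, with the unknown-type case handled explicitly rather than swallowed; the enum and method names here are illustrative stand-ins, not the real timeline service API:
{code}
import java.util.Arrays;
import java.util.List;

public class EntityPathRoutingSketch {

  // Stand-in for the known sub-entity types.
  enum KnownEntityType { CLUSTER, APPLICATION, CONTAINER }

  // Build the endpoint path for an entity type, falling back to the generic
  // endpoint for unrecognized types and making that decision explicit.
  static String pathFor(String entityType) {
    String base = "entities";
    for (KnownEntityType t : KnownEntityType.values()) {
      if (t.name().equals(entityType)) {
        return base + "/" + t.name();
      }
    }
    System.out.println("Unrecognized entity type '" + entityType
        + "', using the generic endpoint");
    return base;
  }

  public static void main(String[] args) {
    List<String> types = Arrays.asList("APPLICATION", "MY_CUSTOM_TYPE");
    for (String t : types) {
      System.out.println(t + " -> " + pathFor(t));
    }
  }
}
{code}
Grouping entities by resolved path and issuing one put per path, rather than one per entity, would also speak to the batching concern raised above.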
[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483445#comment-14483445 ] Hadoop QA commented on YARN-3293: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723649/apache-yarn-3293.4.patch against trunk revision 75c5454. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 6 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7239//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7239//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7239//console This message is automatically generated. Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, apache-yarn-3293.4.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483498#comment-14483498 ] Junping Du commented on YARN-3391: -- Sorry for coming a little late. Thanks, everyone, for the good discussion here and [~zjshen] for updating the patch! bq. I just wanted to add my 2 cents that this is something we already see and experience with hRaven so it's not theoretical. +1, [~sjlee0]! I think that's very important feedback for improving the user experience of this new feature. Let's try to strike a good balance between addressing these concrete scenarios and keeping flexibility for possible new ones, e.g. we can provide different flow grouping policies that users can use to group applications into a flow by name or keep them as isolated flows. Anyway, as everyone has agreed so far, let's continue the discussion in a separate JIRA and figure it out later. The patch looks good overall. However, I still haven't seen us put the definitions of flow, flow run and flow version anywhere in the Javadoc. As I mentioned earlier, that would be useful for developers. The official Apache feature doc is more user oriented, and we can address it later when the feature is complete. Clearly define flow ID/ flow run / flow version in API and storage -- Key: YARN-3391 URL: https://issues.apache.org/jira/browse/YARN-3391 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3391.1.patch, YARN-3391.2.patch To continue the discussion in YARN-3040, let's figure out the best way to describe the flow. Some key issues that we need to conclude on: - How do we include the flow version in the context so that it gets passed into the collector and to the storage eventually? - Flow run id should be a number as opposed to a generic string? - Default behavior for the flow run id if it is missing (i.e. client did not set it) - How do we handle flow attributes in case of nested levels of flows? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.
[ https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3431: - Description: We have TimelineEntity and some other entities as subclass that inherit from it. However, we only have a single endpoint, which consume TimelineEntity rather than sub-classes and this endpoint will check the incoming request body contains exactly TimelineEntity object. However, the json data which is serialized from sub-class object seems not to be treated as an TimelineEntity object, and won't be deserialized into the corresponding sub-class object which cause deserialization failure as some discussions in YARN-3334 : https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059. Sub resources of timeline entity needs to be passed to a separate endpoint. --- Key: YARN-3431 URL: https://issues.apache.org/jira/browse/YARN-3431 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3431.1.patch, YARN-3431.2.patch We have TimelineEntity and some other entities as subclass that inherit from it. However, we only have a single endpoint, which consume TimelineEntity rather than sub-classes and this endpoint will check the incoming request body contains exactly TimelineEntity object. However, the json data which is serialized from sub-class object seems not to be treated as an TimelineEntity object, and won't be deserialized into the corresponding sub-class object which cause deserialization failure as some discussions in YARN-3334 : https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat
[ https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483419#comment-14483419 ] Junping Du commented on YARN-1376: -- bq. I didn't see where we remove element from logAggregationReportForApps. I think we need to remove it when log aggregation finished or it will still occupy (and may eat up gradually) NM's memory. I just synced with [~xgong] offline: we do poll logAggregationReportForApps, so elements do get removed from the ConcurrentLinkedQueue, and my previous comment is not valid here. However, there is one case we need to pay attention to: if the heartbeat request does not get a response, we shouldn't have polled the aggregation reports off the queue for good. Instead, we could move the polled elements to a temporary list, then drop them once the response is received successfully, or merge them back into the queue for the next heartbeat. NM need to notify the log aggregation status to RM through Node heartbeat - Key: YARN-1376 URL: https://issues.apache.org/jira/browse/YARN-1376 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, YARN-1376.3.patch, YARN-1376.4.patch Expose a client API to allow clients to figure if log aggregation is complete. The ticket is used to track the changes on NM side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
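A small sketch of the drain-then-restore pattern suggested in the comment above, assuming a hypothetical buffer class; it is not the NM's actual report-handling code:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class HeartbeatReportBufferSketch<R> {
  private final Queue<R> pending = new ConcurrentLinkedQueue<>();

  public void add(R report) {
    pending.offer(report);
  }

  // Drain the pending reports into a temporary list for one heartbeat request.
  public List<R> drainForHeartbeat() {
    List<R> batch = new ArrayList<>();
    R r;
    while ((r = pending.poll()) != null) {
      batch.add(r);
    }
    return batch;
  }

  // If the heartbeat response never arrives, put the batch back so the reports
  // are re-sent on the next heartbeat instead of being lost.
  public void restoreAfterFailure(List<R> batch) {
    pending.addAll(batch);
  }
}
{code}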
[jira] [Commented] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM
[ https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483489#comment-14483489 ] Varun Vasudev commented on YARN-3443: - +1, lgtm for the latest patch. Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM - Key: YARN-3443 URL: https://issues.apache.org/jira/browse/YARN-3443 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3443.001.patch, YARN-3443.002.patch, YARN-3443.003.patch, YARN-3443.004.patch The current cgroups implementation is closely tied to supporting CPU as a resource . We need to separate out CGroups support as well a provide a simple ResourceHandler subsystem that will enable us to add support for new resource types on the NM - e.g Network, Disk etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483490#comment-14483490 ] Xuan Gong commented on YARN-3294: - Committed into trunk/branch-2. Thanks, varun. Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch, apache-yarn-3294.3.patch, apache-yarn-3294.4.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483518#comment-14483518 ] Hudson commented on YARN-3294: -- FAILURE: Integrated in Hadoop-trunk-Commit #7521 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7521/]) YARN-3294. Allow dumping of Capacity Scheduler debug logs via web UI for (xgong: rev d27e9241e8676a0edb2d35453cac5f9495fcd605) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestAdHocLogDumper.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AdHocLogDumper.java Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch, apache-yarn-3294.3.patch, apache-yarn-3294.4.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483377#comment-14483377 ] Varun Vasudev commented on YARN-3348: - The attached patch applies only to trunk. Once I get a +1, I'll put a version that applies to branch-2. Add a 'yarn top' tool to help understand cluster usage -- Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, maybe labels, etc and show you statistics on container allocation across the cluster to find out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3110) Few issues in ApplicationHistory web ui
[ https://issues.apache.org/jira/browse/YARN-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483346#comment-14483346 ] Xuan Gong commented on YARN-3110: - +1 LGTM. Will commit Few issues in ApplicationHistory web ui --- Key: YARN-3110 URL: https://issues.apache.org/jira/browse/YARN-3110 Project: Hadoop YARN Issue Type: Sub-task Components: applications, timelineserver Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-3110.20150209-1.patch, YARN-3110.20150315-1.patch, YARN-3110.20150406-1.patch Application state and History link wrong when Application is in unassigned state 1.Configure capacity schedular with queue size as 1 also max Absolute Max Capacity: 10.0% (Current application state is Accepted and Unassigned from resource manager side) 2.Submit application to queue and check the state and link in Application history State= null and History link shown as N/A in applicationhistory page Kill the same application . In timeline server logs the below is show when selecting application link. {quote} 2015-01-29 15:39:50,956 ERROR org.apache.hadoop.yarn.webapp.View: Failed to read the AM container of the application attempt appattempt_1422467063659_0007_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainer(ApplicationHistoryManagerOnTimelineStore.java:162) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAMContainer(ApplicationHistoryManagerOnTimelineStore.java:184) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:160) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:156) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38) at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
[jira] [Commented] (YARN-3110) Few issues in ApplicationHistory web ui
[ https://issues.apache.org/jira/browse/YARN-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483366#comment-14483366 ] Naganarasimha G R commented on YARN-3110: - Thanks for reviewing Commiting [~xgong] :) Few issues in ApplicationHistory web ui --- Key: YARN-3110 URL: https://issues.apache.org/jira/browse/YARN-3110 Project: Hadoop YARN Issue Type: Sub-task Components: applications, timelineserver Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Minor Fix For: 2.8.0 Attachments: YARN-3110.20150209-1.patch, YARN-3110.20150315-1.patch, YARN-3110.20150406-1.patch Application state and History link wrong when Application is in unassigned state 1.Configure capacity schedular with queue size as 1 also max Absolute Max Capacity: 10.0% (Current application state is Accepted and Unassigned from resource manager side) 2.Submit application to queue and check the state and link in Application history State= null and History link shown as N/A in applicationhistory page Kill the same application . In timeline server logs the below is show when selecting application link. {quote} 2015-01-29 15:39:50,956 ERROR org.apache.hadoop.yarn.webapp.View: Failed to read the AM container of the application attempt appattempt_1422467063659_0007_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainer(ApplicationHistoryManagerOnTimelineStore.java:162) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAMContainer(ApplicationHistoryManagerOnTimelineStore.java:184) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:160) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:156) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38) at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at
[jira] [Updated] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat
[ https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1376: Attachment: Screen Shot 2015-04-07 at 9.30.42 AM.png NM need to notify the log aggregation status to RM through Node heartbeat - Key: YARN-1376 URL: https://issues.apache.org/jira/browse/YARN-1376 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: Screen Shot 2015-04-07 at 9.30.42 AM.png, YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, YARN-1376.2015-04-07.patch, YARN-1376.3.patch, YARN-1376.4.patch Expose a client API to allow clients to figure if log aggregation is complete. The ticket is used to track the changes on NM side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat
[ https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1376: Attachment: YARN-1376.2015-04-07.patch Address all the latest comments. NM need to notify the log aggregation status to RM through Node heartbeat - Key: YARN-1376 URL: https://issues.apache.org/jira/browse/YARN-1376 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: Screen Shot 2015-04-07 at 9.30.42 AM.png, YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, YARN-1376.2015-04-07.patch, YARN-1376.3.patch, YARN-1376.4.patch Expose a client API to allow clients to figure if log aggregation is complete. The ticket is used to track the changes on NM side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483668#comment-14483668 ] Hadoop QA commented on YARN-3293: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723665/apache-yarn-3293.5.patch against trunk revision 0b5d7d2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 6 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7242//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7242//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7242//console This message is automatically generated. Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, apache-yarn-3293.4.patch, apache-yarn-3293.5.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2429) LCE should blacklist based upon group
[ https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483666#comment-14483666 ] Hudson commented on YARN-2429: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #157 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/157/]) YARN-2429. TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken (zxu via rkanter) (rkanter: rev 99b08a748e7b00a58b63330b353902a6da6aae27) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java LCE should blacklist based upon group - Key: YARN-2429 URL: https://issues.apache.org/jira/browse/YARN-2429 Project: Hadoop YARN Issue Type: New Feature Reporter: Allen Wittenauer It should be possible to list a group to ban, not just individual users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
[ https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483827#comment-14483827 ] Hadoop QA commented on YARN-3460: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723668/HADOOP-11810-1.patch against trunk revision d27e924. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1148 javac compiler warnings (more than the trunk's current 209 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 43 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/7245//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7245//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7245//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7245//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7245//console This message is automatically generated. Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM Key: YARN-3460 URL: https://issues.apache.org/jira/browse/YARN-3460 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.6.0 Environment: $ mvn -version Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 2014-02-14T11:37:52-06:00) Maven home: /opt/apache-maven-3.2.1 Java version: 1.7.0, vendor: IBM Corporation Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre Default locale: en_US, platform encoding: UTF-8 OS name: linux, version: 3.10.0-229.ael7b.ppc64le, arch: ppc64le, family: unix Reporter: pascal oliva Attachments: HADOOP-11810-1.patch TestSecureRMRegistryOperations failed with JBM IBM JAVA mvn test -X -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations ModuleTotal Failure Error Skipped - hadoop-yarn-registry 12 0 12 0 - Total 12 0 12 0 With javax.security.auth.login.LoginException: Bad JAAS configuration: unrecognized option: isInitiator and Bad JAAS configuration: unrecognized option: storeKey -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3458) CPU resource monitoring in Windows
Inigo Goiri created YARN-3458: - Summary: CPU resource monitoring in Windows Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
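The CpuTimeTracker approach described above reduces to percent = (delta of cumulative CPU time) / (delta of wall-clock time), with 1 jiffy counted as 1 ms on Windows. A minimal, self-contained sketch of that arithmetic (illustrative names only, not the actual WindowsBasedProcessTree code):
{code}
// Sketch of the CPU usage computation proposed in YARN-3458 for Windows,
// treating 1 jiffy as 1 ms of cumulative CPU time as stated in the JIRA.
public class CpuUsageSketch {
  private long lastCpuMs = -1;    // cumulative CPU time at the last sample
  private long lastSampleMs = -1; // wall-clock time of the last sample

  /** Returns CPU usage in percent (summed across cores), or -1 on the first sample. */
  public float update(long cumulativeCpuMs, long nowMs) {
    float percent = -1f;
    if (lastCpuMs >= 0 && nowMs > lastSampleMs) {
      long cpuDelta = cumulativeCpuMs - lastCpuMs;   // "jiffies" == ms here
      long wallDelta = nowMs - lastSampleMs;
      percent = (cpuDelta * 100f) / wallDelta;
    }
    lastCpuMs = cumulativeCpuMs;
    lastSampleMs = nowMs;
    return percent;
  }
}
{code}
A value above 100 simply means more than one core was busy during the sampling interval.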
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483664#comment-14483664 ] Hudson commented on YARN-3273: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #157 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/157/]) Move YARN-3273 from 2.8 to 2.7. (zjshen: rev 3fb5abfc87953377f86e06578518801a181d7697) * hadoop-yarn-project/CHANGES.txt Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes
[ https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483685#comment-14483685 ] Jian He commented on YARN-3439: --- IIUC, isn't this a long-standing issue that Oozie doesn't set mapreduce.job.complete.cancel.delegation.tokens to false for a standard MR job, according to [here | https://issues.apache.org/jira/browse/YARN-2964?focusedCommentId=14250926page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14250926]? Should we set it to false on the Oozie side? RM fails to renew token when Oozie launcher leaves before sub-job finishes -- Key: YARN-3439 URL: https://issues.apache.org/jira/browse/YARN-3439 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-3439.001.patch When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't linger waiting for the sub-job to finish. At that point the RM stops renewing delegation tokens for the launcher job which wreaks havoc on the sub-job if the sub-job runs long enough for the tokens to expire. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
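For context on the property Jian He mentions: it is set on the launcher job's configuration before submission. A hedged, illustrative client-side sketch (not Oozie's actual launcher code):
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class LauncherConfExample {
  // Illustrative only: ask the RM not to cancel the launcher job's
  // delegation tokens when the launcher completes, so a long-running
  // sub-job that shares them keeps working. Renewal after the launcher
  // exits is a separate problem, which is what this JIRA tracks.
  public static Job newLauncherJob() throws IOException {
    Configuration conf = new Configuration();
    conf.setBoolean("mapreduce.job.complete.cancel.delegation.tokens", false);
    return Job.getInstance(conf, "oozie-launcher");
  }
}
{code}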
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483631#comment-14483631 ] Hadoop QA commented on YARN-3458: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723671/YARN-3458-1.patch against trunk revision d27e924. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7243//console This message is automatically generated. CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Labels: containers, metrics, windows Attachments: YARN-3458-1.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3458: -- Labels: containers metrics windows (was: ) CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Labels: containers, metrics, windows Attachments: YARN-3458-1.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
[ https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran moved HADOOP-11810 to YARN-3460: --- Fix Version/s: (was: 3.0.0) Target Version/s: 2.8.0 (was: 2.6.0) Affects Version/s: (was: 2.6.0) (was: 3.0.0) 3.0.0 2.6.0 Key: YARN-3460 (was: HADOOP-11810) Project: Hadoop YARN (was: Hadoop Common) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM Key: YARN-3460 URL: https://issues.apache.org/jira/browse/YARN-3460 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.6.0, 3.0.0 Environment: $ mvn -version Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 2014-02-14T11:37:52-06:00) Maven home: /opt/apache-maven-3.2.1 Java version: 1.7.0, vendor: IBM Corporation Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre Default locale: en_US, platform encoding: UTF-8 OS name: linux, version: 3.10.0-229.ael7b.ppc64le, arch: ppc64le, family: unix Reporter: pascal oliva Attachments: HADOOP-11810-1.patch TestSecureRMRegistryOperations failed with the IBM JAVA JVM when running mvn test -X -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations Results: module hadoop-yarn-registry - Tests: 12, Failures: 0, Errors: 12, Skipped: 0; Total - Tests: 12, Failures: 0, Errors: 12, Skipped: 0. The tests fail with javax.security.auth.login.LoginException: Bad JAAS configuration: unrecognized option: isInitiator and Bad JAAS configuration: unrecognized option: storeKey -- This message was sent by Atlassian JIRA (v6.3.4#6332)
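The LoginException in this report comes from JAAS options that the Oracle/OpenJDK Krb5LoginModule accepts but the IBM JDK's Kerberos login module rejects. A hedged illustration of the kind of programmatic JAAS entry involved (module class name and option set are assumptions for illustration, not the hadoop-yarn-registry code):
{code}
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;

public class JaasEntryExample {
  // "storeKey" and "isInitiator" are options of the Sun/Oracle
  // Krb5LoginModule; an IBM JVM uses a different Kerberos login module,
  // which reports them as "Bad JAAS configuration: unrecognized option".
  public static AppConfigurationEntry sunStyleEntry(String principal, String keytab) {
    Map<String, String> options = new HashMap<>();
    options.put("principal", principal);
    options.put("keyTab", keytab);
    options.put("useKeyTab", "true");
    options.put("storeKey", "true");     // rejected under IBM_JAVA
    options.put("isInitiator", "true");  // rejected under IBM_JAVA
    return new AppConfigurationEntry(
        "com.sun.security.auth.module.Krb5LoginModule",
        LoginModuleControlFlag.REQUIRED, options);
  }
}
{code}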
[jira] [Updated] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3391: -- Attachment: YARN-3391.3.patch Clearly define flow ID/ flow run / flow version in API and storage -- Key: YARN-3391 URL: https://issues.apache.org/jira/browse/YARN-3391 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3391.1.patch, YARN-3391.2.patch, YARN-3391.3.patch To continue the discussion in YARN-3040, let's figure out the best way to describe the flow. Some key issues that we need to conclude on: - How do we include the flow version in the context so that it gets passed into the collector and to the storage eventually? - Flow run id should be a number as opposed to a generic string? - Default behavior for the flow run id if it is missing (i.e. client did not set it) - How do we handle flow attributes in case of nested levels of flows? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483822#comment-14483822 ] Hadoop QA commented on YARN-3458: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723681/YARN-3458-3.patch against trunk revision d27e924. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7246//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7246//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7246//console This message is automatically generated. CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Labels: containers, metrics, windows Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483627#comment-14483627 ] Inigo Goiri commented on YARN-3458: --- Not sure if the patch has been created properly, as I'm in between a couple of versions. I will create one based on trunk if this doesn't work. CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Attachments: YARN-3458-1.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3458: -- Attachment: YARN-3458-2.patch Patch based on trunk. Let's see if Jenkins likes it. CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Labels: containers, metrics, windows Attachments: YARN-3458-1.patch, YARN-3458-2.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes
[ https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483698#comment-14483698 ] Jason Lowe commented on YARN-3439: -- I believe it is setting that to false, as that behavior hasn't changed on the Oozie side. However this isn't an issue of the token being cancelled but rather expiring. The RM properly avoids cancelling the token when the launcher job exits, but it then forgets to keep renewing it as well. Eventually the token expires and downstream jobs fail (if they run long enough). RM fails to renew token when Oozie launcher leaves before sub-job finishes -- Key: YARN-3439 URL: https://issues.apache.org/jira/browse/YARN-3439 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-3439.001.patch When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't linger waiting for the sub-job to finish. At that point the RM stops renewing delegation tokens for the launcher job which wreaks havoc on the sub-job if the sub-job runs long enough for the tokens to expire. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3458: -- Attachment: YARN-3458-3.patch Git and I are going through a rough patch; let's see if it works now... CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Labels: containers, metrics, windows Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483605#comment-14483605 ] Sidharta Seethana commented on YARN-3366: - Thanks for the review, [~vvasudev]. Responses inline: 1. I'll fix this. This is an artifact of differences between trunk/branch-2 (repeated) 1. I think these are useful log lines that specify a change in behavior due to settings/system state etc. I'll clarify/improve the log messages. 2. Good catch, I'll fix it. Tests ran fine because WARN logging was enabled. 3. I'll fix the comments' location. The exception used to exist before but was causing bootstrapping issues. I left it in there along with an explanation for why it shouldn't be thrown. I'll remove it and modify the comments. 4. IntelliJ warns me about this too - but I had left it in there for clarity/consistency with the earlier code block - I believe it makes the code a bit more readable. I would prefer to leave it in place. 5. I'll fix this. 6. I'll fix this. 7. Why? Compiler optimization? 8. I'll fix this. 9. I'll fix this. 10. I'll fix this. 11. I'll fix this - though I don't believe the merging always helps for error/warn metrics. 12. I'll fix this. 13. Not trivially; it would require refactoring launchContainer. Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3366.001.patch, YARN-3366.002.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
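For readers following this review without the design document: one common Linux mechanism for this kind of classification and shaping (a hedged sketch of the general approach, not a description of what this patch implements) tags a container's traffic through the net_cls cgroup and caps the matching class with tc htb:
{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TrafficShapingSketch {
  // Tag every process in the container's net_cls cgroup with class 1:10.
  // net_cls.classid holds one 32-bit value, 0xMMMMmmmm (major:minor in hex),
  // so 0x00010010 corresponds to tc classid 1:10.
  public static void tagContainer(String netClsCgroupPath) throws IOException {
    Files.write(Paths.get(netClsCgroupPath, "net_cls.classid"),
        "0x00010010\n".getBytes(StandardCharsets.UTF_8));
  }

  // Commands a handler could run on the node (e.g. via the container-executor)
  // to create an htb class capped at the container's share and steer
  // cgroup-tagged packets into it.
  public static String[] shapingCommands(String device, int rateMbit) {
    return new String[] {
        "tc qdisc add dev " + device + " root handle 1: htb default 20",
        "tc class add dev " + device + " parent 1: classid 1:10 htb rate " + rateMbit + "mbit",
        "tc filter add dev " + device + " parent 1: protocol ip prio 10 handle 1: cgroup"
    };
  }
}
{code}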
[jira] [Commented] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat
[ https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483660#comment-14483660 ] Hadoop QA commented on YARN-1376: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723661/YARN-1376.2015-04-07.patch against trunk revision 0b5d7d2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7241//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7241//console This message is automatically generated. NM need to notify the log aggregation status to RM through Node heartbeat - Key: YARN-1376 URL: https://issues.apache.org/jira/browse/YARN-1376 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: Screen Shot 2015-04-07 at 9.30.42 AM.png, YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, YARN-1376.2015-04-07.patch, YARN-1376.3.patch, YARN-1376.4.patch Expose a client API to allow clients to figure if log aggregation is complete. The ticket is used to track the changes on NM side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3426) Add jdiff support to YARN
[ https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483681#comment-14483681 ] Li Lu commented on YARN-3426: - The failed unit test also breaks in trunk. Will file a blocker on this. Add jdiff support to YARN - Key: YARN-3426 URL: https://issues.apache.org/jira/browse/YARN-3426 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Priority: Blocker Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs to YARN as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483692#comment-14483692 ] Hadoop QA commented on YARN-3458: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723677/YARN-3458-2.patch against trunk revision d27e924. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7244//console This message is automatically generated. CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Labels: containers, metrics, windows Attachments: YARN-3458-1.patch, YARN-3458-2.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3459) TestLog4jWarningErrorMetricsAppender breaks in trunk
[ https://issues.apache.org/jira/browse/YARN-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483779#comment-14483779 ] Li Lu commented on YARN-3459: - Reproduced this failure on my local machine as well as in the Jenkins run for YARN-3426. Seems like the test failure was introduced by YARN-2901. [~wangda][~vvasudev] can either of you take a look at it? Thanks! TestLog4jWarningErrorMetricsAppender breaks in trunk Key: YARN-3459 URL: https://issues.apache.org/jira/browse/YARN-3459 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Priority: Blocker Fix For: 2.7.0 TestLog4jWarningErrorMetricsAppender fails with the following message: {code} Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender) Time elapsed: 2.01 sec FAILURE! java.lang.AssertionError: expected:<0> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels
[ https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483808#comment-14483808 ] Vinod Kumar Vavilapalli commented on YARN-3361: --- Review of the tests - testNonExclusiveNodeLabelsAllocationIgnoreAppSubmitOrder -- - testPreferenceOfNeedyAppsTowardsNodePartitions ? -- This doesn't really guarantee if app2 is getting preference or not. How about changing it to say app2 has enough requests to fill the entire node? - testNonExclusiveNodeLabelsAllocationIgnorePriority -- - testPreferenceOfNeedyContainersTowardsNodePartitions ? -- Actually, now that I rename it that way, this may not be the right behavior. Not respecting priorities within an app can result in scheduling deadlocks. - testLabeledResourceRequestsGetPreferrenceInHierarchyOfQueue: This is really testQueuesWithAccessGetPreferrenceInPartitionedNodes? - testNonLabeledQueueUsesLabeledResource -- - testQueuesWithoutAccessUsingPartitionedNodes -- Also validate that the wait for non-labeled requests not getting allocated on non-partitioned nodes is only for one cycle through all nodes in the cluster - Let's move all these node-label related tests into their own test-case. - More tests? -- AMs with labeled requirement not getting allocated on non-exclusive partitions -- To verify that we are not putting absolute max-capacities on the individual queues when not-respecting-partitions CapacityScheduler side changes to support non-exclusive node labels --- Key: YARN-3361 URL: https://issues.apache.org/jira/browse/YARN-3361 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3361.1.patch, YARN-3361.2.patch According to design doc attached in YARN-3214, we need implement following logic in CapacityScheduler: 1) When allocate a resource request with no node-label specified, it should get preferentially allocated to node without labels. 2) When there're some available resource in a node with label, they can be used by applications with following order: - Applications under queues which can access the label and ask for same labeled resource. - Applications under queues which can access the label and ask for non-labeled resource. - Applications under queues cannot access the label and ask for non-labeled resource. 3) Expose necessary information that can be used by preemption policy to make preemption decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3046) [Event producers] Implement MapReduce AM writing some MR metrics to ATS
[ https://issues.apache.org/jira/browse/YARN-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483588#comment-14483588 ] Junping Du commented on YARN-3046: -- Linked with MAPREDUCE-6189 - the test failure is consistently reproducible on trunk, not only on my local test bed. [Event producers] Implement MapReduce AM writing some MR metrics to ATS --- Key: YARN-3046 URL: https://issues.apache.org/jira/browse/YARN-3046 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: YARN-3046-no-test-v2.patch, YARN-3046-no-test.patch Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes written) and have the MR AM write the framework-specific metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3458: -- Attachment: YARN-3458-1.patch CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Attachments: YARN-3458-1.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels
[ https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483693#comment-14483693 ] Jian He commented on YARN-3361: --- Some comments on my side: - Should we treat each limit differently for different labeled requests? {code} // Otherwise, if any of the label of this node beyond queue limit, we // cannot allocate on this node. Consider a small epsilon here. {code} - Merge queue#needResource and application#needResource - needResource - hasPendingResourceRequest; needResource can also be simplified if we pass in partitionToAllocate - Some methods like canAssignToThisQueue where both nodeLabels and exclusiveType are passed, it may be simplified by passing the current partitionToAllocate to handle the internal if/else check. - The following may be incorrect, as the current request may not be the AM container request, though null == rmAppAttempt.getMasterContainer() {code} // AM container allocation doesn't support non-exclusive allocation to // avoid painful of preempt an AM container if {code} - the if/else below can be avoided by passing the nodePartition into queueCapacities.getAbsoluteCapacity(nodePartition), {code} if (!nodePartition.equals(RMNodeLabelsManager.NO_LABEL)) { queueCapacity = Resources .max(resourceCalculator, clusterResource, queueCapacity, Resources.multiplyAndNormalizeUp( resourceCalculator, labelManager.getResourceByLabel(nodePartition, clusterResource), queueCapacities.getAbsoluteCapacity(nodePartition), minimumAllocation)); } else { // else there's no label on request, just to use absolute capacity as // capacity for nodes without label queueCapacity = Resources.multiplyAndNormalizeUp(resourceCalculator, labelManager .getResourceByLabel(CommonNodeLabelsManager.NO_LABEL, clusterResource), queueCapacities.getAbsoluteCapacity(), minimumAllocation); } {code} - the second limit won’t be hit? {code} if (exclusiveType == ExclusiveType.EXCLUSIVE) { maxUserLimit = Resources.multiplyAndRoundDown(queueCapacity, userLimitFactor); } else if (exclusiveType == ExclusiveType.NON_EXECLUSIVE) { maxUserLimit = labelManager.getResourceByLabel(nodePartition, clusterResource); } {code} - nonExclusiveSchedulingOpportunities#setCount - add(Priority) CapacityScheduler side changes to support non-exclusive node labels --- Key: YARN-3361 URL: https://issues.apache.org/jira/browse/YARN-3361 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3361.1.patch, YARN-3361.2.patch According to design doc attached in YARN-3214, we need implement following logic in CapacityScheduler: 1) When allocate a resource request with no node-label specified, it should get preferentially allocated to node without labels. 2) When there're some available resource in a node with label, they can be used by applications with following order: - Applications under queues which can access the label and ask for same labeled resource. - Applications under queues which can access the label and ask for non-labeled resource. - Applications under queues cannot access the label and ask for non-labeled resource. 3) Expose necessary information that can be used by preemption policy to make preemption decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
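A minimal sketch of the simplification suggested in the if/else comment above, assuming queueCapacities.getAbsoluteCapacity(nodePartition) and labelManager.getResourceByLabel(nodePartition, ...) already fall back to the NO_LABEL values for the default partition (the max() against the previously computed capacity is omitted for brevity):
{code}
// Sketch only: collapse the quoted if/else by always passing the node's
// partition through, instead of branching on NO_LABEL.
Resource queueCapacity = Resources.multiplyAndNormalizeUp(
    resourceCalculator,
    labelManager.getResourceByLabel(nodePartition, clusterResource),
    queueCapacities.getAbsoluteCapacity(nodePartition),
    minimumAllocation);
{code}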
[jira] [Commented] (YARN-2429) LCE should blacklist based upon group
[ https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483716#comment-14483716 ] Hudson commented on YARN-2429: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2106 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2106/]) YARN-2429. TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken (zxu via rkanter) (rkanter: rev 99b08a748e7b00a58b63330b353902a6da6aae27) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java * hadoop-yarn-project/CHANGES.txt LCE should blacklist based upon group - Key: YARN-2429 URL: https://issues.apache.org/jira/browse/YARN-2429 Project: Hadoop YARN Issue Type: New Feature Reporter: Allen Wittenauer It should be possible to list a group to ban, not just individual users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483714#comment-14483714 ] Hudson commented on YARN-3273: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2106 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2106/]) Move YARN-3273 from 2.8 to 2.7. (zjshen: rev 3fb5abfc87953377f86e06578518801a181d7697) * hadoop-yarn-project/CHANGES.txt Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3459) TestLog4jWarningErrorMetricsAppender breaks in trunk
Li Lu created YARN-3459: --- Summary: TestLog4jWarningErrorMetricsAppender breaks in trunk Key: YARN-3459 URL: https://issues.apache.org/jira/browse/YARN-3459 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Priority: Blocker Fix For: 2.7.0 TestLog4jWarningErrorMetricsAppender fails with the following message: {code} Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender) Time elapsed: 2.01 sec FAILURE! java.lang.AssertionError: expected:<0> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3426) Add jdiff support to YARN
[ https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3426: Attachment: YARN-3426-040715.patch Added license information to the four .xml API files. Add jdiff support to YARN - Key: YARN-3426 URL: https://issues.apache.org/jira/browse/YARN-3426 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Priority: Blocker Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, YARN-3426-040715.patch Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs to YARN as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3461) Consolidate flow name/version/run defaults
Zhijie Shen created YARN-3461: - Summary: Consolidate flow name/version/run defaults Key: YARN-3461 URL: https://issues.apache.org/jira/browse/YARN-3461 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen In YARN-3391, it's not resolved what should be the defaults for flow name/version/run. Let's continue the discussion here and unblock YARN-3391 from moving forward. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483812#comment-14483812 ] Zhijie Shen commented on YARN-3391: --- bq. let's continue the discussion on a separated JIRA for figuring it out later. Agree. Let's unblock this Jira which will unblock the writer implementation consequently. I filed YARN-3461 to continue the defaults discussion there. bq. I just wanted to add my 2 cents that this is something we already see and experience with hRaven so it's not theoretical. Sangjin, thanks for sharing the use case in hRaven. It's helpful to understand the proper defaults. To generalize it, we need to consider different use cases such as adhoc applications only. Shall we continue the discussion on YARN-3461? bq. As I mentioned earlier, it should be useful for developers I make use of Sangjin's previous comments to add some inline code comments about their definitions in TimelineCollectorContext. Clearly define flow ID/ flow run / flow version in API and storage -- Key: YARN-3391 URL: https://issues.apache.org/jira/browse/YARN-3391 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3391.1.patch, YARN-3391.2.patch, YARN-3391.3.patch To continue the discussion in YARN-3040, let's figure out the best way to describe the flow. Some key issues that we need to conclude on: - How do we include the flow version in the context so that it gets passed into the collector and to the storage eventually? - Flow run id should be a number as opposed to a generic string? - Default behavior for the flow run id if it is missing (i.e. client did not set it) - How do we handle flow attributes in case of nested levels of flows? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3426) Add jdiff support to YARN
[ https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483837#comment-14483837 ] Hadoop QA commented on YARN-3426: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723684/YARN-3426-040715.patch against trunk revision d27e924. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7247//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7247//console This message is automatically generated. Add jdiff support to YARN - Key: YARN-3426 URL: https://issues.apache.org/jira/browse/YARN-3426 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Priority: Blocker Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, YARN-3426-040715.patch Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs to YARN as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes
[ https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-3439. -- Resolution: Duplicate bq. IAC, this is a dup of YARN-3055. Agreed, closing as a duplicate. RM fails to renew token when Oozie launcher leaves before sub-job finishes -- Key: YARN-3439 URL: https://issues.apache.org/jira/browse/YARN-3439 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-3439.001.patch When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't linger waiting for the sub-job to finish. At that point the RM stops renewing delegation tokens for the launcher job which wreaks havoc on the sub-job if the sub-job runs long enough for the tokens to expire. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3452) Bogus token usernames cause many invalid group lookups
[ https://issues.apache.org/jira/browse/YARN-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483861#comment-14483861 ] Jason Lowe commented on YARN-3452: -- The extra lookups started in 2.6 releases, and it appears to be caused by HADOOP-10650. However YARN really should not be using bogus users on tokens anyway in case the RPC layer (or other non-YARN systems) try to do something with those users like HADOOP-10650 did. Bogus token usernames cause many invalid group lookups -- Key: YARN-3452 URL: https://issues.apache.org/jira/browse/YARN-3452 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Jason Lowe YARN uses a number of bogus usernames for tokens, like application attempt IDs for NM tokens or even the hardcoded testing for the container localizer token. These tokens cause the RPC layer to do group lookups on these bogus usernames which will never succeed but can take a long time to perform. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
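A minimal illustration of the cost Jason describes (this is not the YARN code path; it just shows what the RPC layer's group resolution does with a made-up principal):
{code}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Groups;

public class BogusUserGroupLookup {
  public static void main(String[] args) {
    // "appattempt_..." style token usernames never exist as OS/LDAP users,
    // so the group mapping service can never answer; with shell-based
    // mapping each lookup forks a process that fails, and nothing useful
    // is ever cached for these names.
    Groups groups = Groups.getUserToGroupsMappingService(new Configuration());
    try {
      List<String> result = groups.getGroups("appattempt_1428000000000_0001_000001");
      System.out.println("groups: " + result);
    } catch (IOException e) {
      System.out.println("lookup failed: " + e.getMessage());
    }
  }
}
{code}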
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483874#comment-14483874 ] Inigo Goiri commented on YARN-3458: --- For the tests, I checked the original TestWindowsBasedProcessTree and it didn't have related to actually testing the resource monitoring; I'm open to suggestions. Regarding the two warning, I'm not able to understand what this is complaining about; it says that I have fields not accessed but the ones I added are referenced. I think ti refers to Log but I'm not able to parse the error. CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Labels: containers, metrics, windows Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-3366: Attachment: YARN-3366.003.patch Uploading patch incorporating code review feedback. Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3426) Add jdiff support to YARN
[ https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484008#comment-14484008 ] Li Lu commented on YARN-3426: - Could not reproduce the mvn eclipse:eclipse failure locally. The failure looks to be unrelated to this patch. Add jdiff support to YARN - Key: YARN-3426 URL: https://issues.apache.org/jira/browse/YARN-3426 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Priority: Blocker Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, YARN-3426-040715.patch Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs to YARN as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484117#comment-14484117 ] Daryn Sharp commented on YARN-3055: --- On cursory glance, are you sure this isn't going to leak tokens? Ie. does it remove tokens from data structures in all cases or can a token get left in allTokens? The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer -- Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Yi Liu Priority: Blocker Attachments: YARN-3055.001.patch, YARN-3055.002.patch After YARN-2964, there is only one timer to renew the token if it's shared by jobs. In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484032#comment-14484032 ] Zhijie Shen commented on YARN-3448: --- Jonathan, thanks for your contribution. It sounds like an interesting proposal. I'd like to take a look at the patch too. Add Rolling Time To Lives Level DB Plugin Capabilities -- Key: YARN-3448 URL: https://issues.apache.org/jira/browse/YARN-3448 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3448.1.patch For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities one record at a time. An exclusive write lock is held during the entire deletion phase which in practice can be hours. If we are to relax some of the consistency constraints, other performance enhancing techniques can be employed to maximize the throughput and minimize locking time. Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. This allows each database to maximize the read cache effectiveness based on the unique usage patterns of each database. With 5 separate databases each lookup is much faster. This can also help with I/O to have the entity and index databases on separate disks. Rolling DBs for entity and index DBs. 99.9% of the data is in these two sections, with at least a 4:1 ratio (index to entity) for Tez. We replace DB record removal with file system removal if we create a rolling set of databases that age out and can be efficiently removed. To do this we must place a constraint to always place an entity's events into its correct rolling db instance based on start time. This allows us to stitch the data back together at read time, with artificial paging. Relax the synchronous writes constraints. If we are willing to accept losing some records that were not flushed by the operating system during a crash, we can use async writes that can be much faster. Prefer sequential writes. Sequential writes can be several times faster than random writes. Spend some small effort arranging the writes in a way that trends towards sequential write performance over random write performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
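The central idea in the rolling-DB bullet is that an entity's start time, not its write time, selects the rolling instance, so TTL enforcement becomes dropping whole instances from the filesystem instead of deleting records under a write lock. A self-contained sketch of that bucketing (names are illustrative, not the patch's classes):
{code}
import java.util.TreeMap;

public class RollingDbSketch {
  private final long rollingPeriodMs;
  // bucket start time -> handle for that rolling instance (a real
  // implementation would hold a LevelDB handle backed by its own directory)
  private final TreeMap<Long, Object> instances = new TreeMap<>();

  public RollingDbSketch(long rollingPeriodMs) {
    this.rollingPeriodMs = rollingPeriodMs;
  }

  /** An entity's events always land in the bucket of the entity's start time. */
  public long bucketFor(long entityStartTimeMs) {
    return entityStartTimeMs - (entityStartTimeMs % rollingPeriodMs);
  }

  public Object instanceFor(long entityStartTimeMs) {
    return instances.computeIfAbsent(bucketFor(entityStartTimeMs), b -> new Object());
  }

  /** TTL enforcement: discard every instance older than the cutoff wholesale. */
  public void evictOlderThan(long cutoffMs) {
    // headMap(toKey) is exclusive of toKey, so this drops all buckets that
    // start before the cutoff's bucket; on disk this is a directory delete.
    instances.headMap(bucketFor(cutoffMs)).clear();
  }
}
{code}
Reads then stitch results back together by iterating the relevant buckets in time order.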
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484201#comment-14484201 ] Daryn Sharp commented on YARN-3055: --- This appears to go back to the really old days of renewing the token for its entire lifetime. Most unfortunate. The renewer looks like it may turn into a DOS weapon. Renewing a token returns the next expiration. The renewer uses a timer to renew 90% before expiration. After the last renewal, the same expiration (the wall) will be returned as before. 90% of the wall eventually becomes a rapid fire renewal. There's an army of 50 threads prepared to fire concurrently. My other concern is that it used to be the first job submitted with a given token that determined if the token is to be cancelled. Now any job can influence the cancelling. This patch didn't specifically break that behavior, but the original YARN-2704 did, which precipitated YARN-2964 to break it differently, and now this jira. The ramification is we used to tell users to make sure the first job set the conf correctly, and essentially don't worry after that. Now they do have to worry. Any sub-job with the default of canceling tokens will kill the overall workflow. Sub-jobs should not have jurisdiction over the tokens. The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer -- Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Yi Liu Priority: Blocker Attachments: YARN-3055.001.patch, YARN-3055.002.patch After YARN-2964, there is only one timer to renew the token if it's shared by jobs. In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
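The "rapid fire" concern above is plain arithmetic: if each renewal schedules the next attempt after 90% of the time remaining to a fixed maximum lifetime (the wall), the interval shrinks geometrically. A small self-contained simulation of that schedule (the 90% heuristic is taken from the comment; everything else is illustrative):
{code}
public class RenewalWallSimulation {
  public static void main(String[] args) {
    long now = 0;                   // minutes, for readability
    final long wall = 7 * 24 * 60;  // fixed max lifetime: 7 days
    int renewals = 0;
    // Every renewal returns the same expiration (the wall); the next
    // attempt is scheduled after 90% of whatever time remains.
    while (wall - now > 1 && renewals < 50) {
      long delay = Math.max((long) ((wall - now) * 0.9), 1);
      now += delay;
      renewals++;
      System.out.printf("renewal %d: waited %d min, %d min left%n",
          renewals, delay, wall - now);
    }
    // The tail of this schedule renews about once a minute or faster;
    // multiply that by many shared tokens and a 50-thread renewer pool and
    // the RM can hammer the token service near the end of a token's lifetime.
  }
}
{code}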
[jira] [Updated] (YARN-3426) Add jdiff support to YARN
[ https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3426: -- Target Version/s: 2.8.0 (was: 2.7.0) bq. The bigger question is the duplication of the maven code across Common, YARN and MAPREDUCE. But this may take more time to cleanup. Removing it from 2.7.0 as the effort needed for this cleanup is huge. Add jdiff support to YARN - Key: YARN-3426 URL: https://issues.apache.org/jira/browse/YARN-3426 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Priority: Blocker Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, YARN-3426-040715.patch Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs to YARN as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
[ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484058#comment-14484058 ] Sidharta Seethana commented on YARN-2424: - Here it is : https://issues.apache.org/jira/browse/YARN-3462 LCE should support non-cgroups, non-secure mode --- Key: YARN-2424 URL: https://issues.apache.org/jira/browse/YARN-2424 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Blocker Fix For: 2.6.0 Attachments: Y2424-1.patch, YARN-2424.patch After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios. This is a fairly serious regression, as turning on LCE prior to turning on full-blown security is a fairly standard procedure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2
Sidharta Seethana created YARN-3462: --- Summary: Patches applied for YARN-2424 are inconsistent between trunk and branch-2 Key: YARN-3462 URL: https://issues.apache.org/jira/browse/YARN-3462 Project: Hadoop YARN Issue Type: Bug Reporter: Sidharta Seethana It looks like the changes for YARN-2424 are not the same for trunk (commit 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning and documentation is a bit different as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484098#comment-14484098 ] Vinod Kumar Vavilapalli commented on YARN-3055: --- [~daryn]/[~jianhe], I briefly looked at the existing patch on this JIRA and it seems like it will work. Can you also take a look? [~hitliuyi], can you see if you can add a test for this in TestDelegationTokenRenewer.java? This is the last blocker on 2.7.0 as of today. Appreciate all the help I can get, thanks all. The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer -- Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Yi Liu Priority: Blocker Attachments: YARN-3055.001.patch, YARN-3055.002.patch After YARN-2964, there is only one timer to renew the token if it's shared by jobs. In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)