[jira] [Updated] (YARN-3874) Combine FS Reader and Writer Implementations
[ https://issues.apache.org/jira/browse/YARN-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3874: --- Issue Type: Sub-task (was: Bug) Parent: YARN-2928 Combine FS Reader and Writer Implementations Key: YARN-3874 URL: https://issues.apache.org/jira/browse/YARN-3874 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Saxena Combine FS Reader and Writer Implementations and make them consistent with each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3874) Combine FS Reader and Writer Implementations
Varun Saxena created YARN-3874: -- Summary: Combine FS Reader and Writer Implementations Key: YARN-3874 URL: https://issues.apache.org/jira/browse/YARN-3874 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Saxena Combine FS Reader and Writer Implementations and make them consistent with each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrence of SESSIONEXPIRED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609769#comment-14609769 ] zhihai xu commented on YARN-3798: - Thanks for the new patch, [~ozawa]! sync() is an asynchronous operation; the result is returned via an AsyncCallback. Should we wait for the result from the AsyncCallback to make sure the sync operation is done at the ZooKeeper server? Should we also {{createConnection}} for SessionMovedException, similarly to SessionExpiredException, to avoid a regression, since ZOOKEEPER-2219 is not fixed yet? Should we sync the RM ZK root path {{zkRootNodePath}} for safety purposes? ZKRMStateStore shouldn't create new session without occurrence of SESSIONEXPIRED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_01 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at
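Since ZooKeeper's sync() only schedules the operation, a caller has to block on the callback to know the server-side sync actually finished. A minimal sketch of that wait, assuming direct access to a ZooKeeper handle; the helper name and timeout handling are illustrative, not taken from the patch.
{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import org.apache.zookeeper.AsyncCallback.VoidCallback;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class SyncBarrier {
  /**
   * Issues an asynchronous sync() and blocks until the server-side sync
   * completes, so subsequent reads are guaranteed to see the latest state.
   */
  static void syncAndWait(ZooKeeper zk, String path, long timeoutMs)
      throws KeeperException, InterruptedException {
    final CountDownLatch latch = new CountDownLatch(1);
    final int[] resultCode = new int[1];
    zk.sync(path, new VoidCallback() {
      @Override
      public void processResult(int rc, String p, Object ctx) {
        resultCode[0] = rc;   // return code from the server
        latch.countDown();    // wake up the waiting thread
      }
    }, null);
    if (!latch.await(timeoutMs, TimeUnit.MILLISECONDS)) {
      throw new RuntimeException("Timed out waiting for sync on " + path);
    }
    KeeperException.Code code = KeeperException.Code.get(resultCode[0]);
    if (code != KeeperException.Code.OK) {
      throw KeeperException.create(code, path);
    }
  }
}
{code}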
[jira] [Assigned] (YARN-2953) TestWorkPreservingRMRestart fails on trunk
[ https://issues.apache.org/jira/browse/YARN-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel reassigned YARN-2953: --- Assignee: nijel TestWorkPreservingRMRestart fails on trunk -- Key: YARN-2953 URL: https://issues.apache.org/jira/browse/YARN-2953 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Sharma K S Assignee: nijel Running org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart Tests run: 36, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 337.034 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart testReleasedContainerNotRecovered[0](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart) Time elapsed: 30.031 sec ERROR! java.lang.Exception: test timed out after 30000 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:131) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:670) at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testReleasedContainerNotRecovered(TestWorkPreservingRMRestart.java:850) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3875) FSSchedulerNode#reserveResource() not printing applicationID
Bibin A Chundatt created YARN-3875: -- Summary: FSSchedulerNode#reserveResource() not printing applicationID Key: YARN-3875 URL: https://issues.apache.org/jira/browse/YARN-3875 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor FSSchedulerNode#reserveResource() {code} LOG.info("Updated reserved container " + container.getContainer().getId() + " on node " + this + " for application " + application); } else { LOG.info("Reserved container " + container.getContainer().getId() + " on node " + this + " for application " + application); } {code} Update the last operand to application.getApplicationId(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
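For reference, a sketch of the suggested change; only the two log statements are shown, and the surrounding guard ({{alreadyReserved}}) is assumed from context rather than copied from the source.
{code}
// Log the application id rather than the FSAppAttempt object, whose
// default toString() prints an unreadable class@hashcode string.
if (alreadyReserved) {   // hypothetical guard; the real method branches here
  LOG.info("Updated reserved container " + container.getContainer().getId()
      + " on node " + this + " for application "
      + application.getApplicationId());
} else {
  LOG.info("Reserved container " + container.getContainer().getId()
      + " on node " + this + " for application "
      + application.getApplicationId());
}
{code}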
[jira] [Assigned] (YARN-3874) Combine FS Reader and Writer Implementations
[ https://issues.apache.org/jira/browse/YARN-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3874: -- Assignee: Varun Saxena Combine FS Reader and Writer Implementations Key: YARN-3874 URL: https://issues.apache.org/jira/browse/YARN-3874 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Saxena Assignee: Varun Saxena Combine FS Reader and Writer Implementations and make them consistent with each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2953) TestWorkPreservingRMRestart fails on trunk
[ https://issues.apache.org/jira/browse/YARN-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609771#comment-14609771 ] nijel commented on YARN-2953: - Hi [~rohithsharma] This test case is passing in recent code, and I see the timeout was increased ( @Test (timeout = 5)). This happened with the following check-in: {code} Revision: 5f57b904f550515693d93a2959e663b0d0260696 Author: Jian He jia...@apache.org Date: 31-12-2014 05:05:45 Message: YARN-2492. Added node-labels page on RM web UI. Contributed by Wangda Tan {code} Could you please validate this issue? TestWorkPreservingRMRestart fails on trunk -- Key: YARN-2953 URL: https://issues.apache.org/jira/browse/YARN-2953 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Sharma K S Running org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart Tests run: 36, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 337.034 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart testReleasedContainerNotRecovered[0](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart) Time elapsed: 30.031 sec ERROR! java.lang.Exception: test timed out after 30000 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:131) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:670) at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testReleasedContainerNotRecovered(TestWorkPreservingRMRestart.java:850) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
[ https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3830: Attachment: YARN-3830_4.patch Thanks [~devaraj.k] for the suggestion. Updated the patch with a test case; please review. AbstractYarnScheduler.createReleaseCache may try to clean a null attempt Key: YARN-3830 URL: https://issues.apache.org/jira/browse/YARN-3830 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel Attachments: YARN-3830_1.patch, YARN-3830_2.patch, YARN-3830_3.patch, YARN-3830_4.patch org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache() {code} protected void createReleaseCache() { // Cleanup the cache after nm expire interval. new Timer().schedule(new TimerTask() { @Override public void run() { for (SchedulerApplication<T> app : applications.values()) { T attempt = app.getCurrentAppAttempt(); synchronized (attempt) { for (ContainerId containerId : attempt.getPendingRelease()) { RMAuditLogger.logFailure( {code} Here the attempt can be null since the attempt is created later, so a NullPointerException will occur: {code} 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] threw an Exception. | YarnUncaughtExceptionHandler.java:68 java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) {code} This will skip the other applications in this run. We can add a null check and continue with the other applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
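A sketch of the suggested guard, reusing the names from the snippet above; the logFailure arguments are elided in the original quote and stay elided here.
{code}
for (SchedulerApplication<T> app : applications.values()) {
  T attempt = app.getCurrentAppAttempt();
  if (attempt == null) {
    // The attempt is created later, so it may not exist yet; skip this
    // app so the timer task still cleans up the remaining applications.
    continue;
  }
  synchronized (attempt) {
    for (ContainerId containerId : attempt.getPendingRelease()) {
      RMAuditLogger.logFailure( // arguments unchanged from the original
    }
  }
}
{code}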
[jira] [Updated] (YARN-3875) FSSchedulerNode#reserveResource() not printing applicationID
[ https://issues.apache.org/jira/browse/YARN-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3875: --- Attachment: 0001-YARN-3875.patch Currently logs are shown as below: {code} Reserved container container_e08_1435660809935_0008_01_000670 on node host: host-10-19-92-117:64318 #containers=6 available=<memory:0, vCores:10> used=<memory:3072, vCores:6> for application org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt@1c1a7108 {code} Patch uploaded, please review. FSSchedulerNode#reserveResource() not printing applicationID Key: YARN-3875 URL: https://issues.apache.org/jira/browse/YARN-3875 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3875.patch FSSchedulerNode#reserveResource() {code} LOG.info("Updated reserved container " + container.getContainer().getId() + " on node " + this + " for application " + application); } else { LOG.info("Reserved container " + container.getContainer().getId() + " on node " + this + " for application " + application); } {code} Update the last operand to application.getApplicationId(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS
[ https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cntic updated YARN-2681: Attachment: YARN-2681.patch Support bandwidth enforcement for containers while reading from HDFS Key: YARN-2681 URL: https://issues.apache.org/jira/browse/YARN-2681 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.5.1 Environment: Linux Reporter: cntic Labels: BB2015-05-TBR Fix For: 2.7.0 Attachments: HADOOP-2681.02.patch, HADOOP-2681.patch, HADOOP-2681.patch, HdfsTrafficControl_UML.png, Traffic Control Design.png, YARN-2681.patch To read/write data from HDFS on a data node, applications establish TCP/IP connections with the datanode. The HDFS read can be controlled by configuring the Linux Traffic Control (TC) subsystem on the data node to apply filters to the appropriate connections. The current cgroups net_cls concept cannot be applied on the node where the container is launched, nor on the data node, since: - TC handles outgoing bandwidth only, so it cannot be set on the container node (HDFS read = incoming data for the container) - Since the HDFS data node is handled by only one process, it is not possible to use net_cls to separate connections from different containers to the datanode. Tasks: 1) Extend the Resource model to define a bandwidth enforcement rate 2) Monitor TCP/IP connections established by the container-handling process and its child processes 3) Set Linux Traffic Control rules on the data node based on address:port pairs in order to enforce the bandwidth of outgoing data Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf Implementation: http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS
[ https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cntic updated YARN-2681: Attachment: (was: HADOOP-2681.02.patch) Support bandwidth enforcement for containers while reading from HDFS Key: YARN-2681 URL: https://issues.apache.org/jira/browse/YARN-2681 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.5.1 Environment: Linux Reporter: cntic Labels: BB2015-05-TBR Fix For: 2.7.0 Attachments: HADOOP-2681.patch, HADOOP-2681.patch, HdfsTrafficControl_UML.png, Traffic Control Design.png, YARN-2681.patch To read/write data from HDFS on a data node, applications establish TCP/IP connections with the datanode. The HDFS read can be controlled by configuring the Linux Traffic Control (TC) subsystem on the data node to apply filters to the appropriate connections. The current cgroups net_cls concept cannot be applied on the node where the container is launched, nor on the data node, since: - TC handles outgoing bandwidth only, so it cannot be set on the container node (HDFS read = incoming data for the container) - Since the HDFS data node is handled by only one process, it is not possible to use net_cls to separate connections from different containers to the datanode. Tasks: 1) Extend the Resource model to define a bandwidth enforcement rate 2) Monitor TCP/IP connections established by the container-handling process and its child processes 3) Set Linux Traffic Control rules on the data node based on address:port pairs in order to enforce the bandwidth of outgoing data Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf Implementation: http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610773#comment-14610773 ] Varun Vasudev commented on YARN-2194: - I tested it with multiple local dirs as well. Any chance you can attach the yarn-site.xml you used (or send it to me offline)? Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Wei Yan Priority: Critical Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610821#comment-14610821 ] Sidharta Seethana commented on YARN-2194: - [~kasha], I have run into such issues when I forgot to rebuild container-executor (it requires a different maven profile to be used). So, a shot in the dark: did you rebuild the container-executor binary? :) Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Wei Yan Priority: Critical Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3827) Migrate YARN native build to new CMake framework
[ https://issues.apache.org/jira/browse/YARN-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610565#comment-14610565 ] Hudson commented on YARN-3827: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #243 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/243/]) YARN-3827. Migrate YARN native build to new CMake framework (Alan Burlison via Colin P. McCabe) (cmccabe: rev d0cc0380b57db5fdeb41775bb9ca42dac65928b8) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/CMakeLists.txt Migrate YARN native build to new CMake framework Key: YARN-3827 URL: https://issues.apache.org/jira/browse/YARN-3827 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Reporter: Alan Burlison Assignee: Alan Burlison Fix For: 2.8.0 Attachments: YARN-3827.001.patch As per HADOOP-12036, the CMake infrastructure should be refactored and made common across all Hadoop components. This bug covers the migration of YARN to the new CMake infrastructure. This change will also add support for building YARN Native components on Solaris. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3841) [Storage implementation] Create HDFS backing storage implementation for ATS writes
[ https://issues.apache.org/jira/browse/YARN-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3841: - Attachment: YARN-3841.001.patch Attaching a first patch that converts the implementation into a FileSystem-based one. [~sjlee0] [~zjshen] could you take a look? [Storage implementation] Create HDFS backing storage implementation for ATS writes -- Key: YARN-3841 URL: https://issues.apache.org/jira/browse/YARN-3841 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Attachments: YARN-3841.001.patch HDFS backing storage is useful for the following scenarios. 1. For Hadoop clusters which don't run HBase. 2. For fallback from HBase when the HBase cluster is temporarily unavailable. Quoting the ATS design document of YARN-2928: {quote} In the case the HBase storage is not available, the plugin should buffer the writes temporarily (e.g. HDFS), and flush them once the storage comes back online. Reading and writing to hdfs as the backup storage could potentially use the HDFS writer plugin unless the complexity of generalizing the HDFS writer plugin for this purpose exceeds the benefits of reusing it here. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
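As a rough illustration of the approach (not the attached patch), a FileSystem-based writer appends entity records through the Hadoop FileSystem API, which resolves to HDFS or a local filesystem depending on the configuration; the class name, path layout, and record format below are assumptions.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsEntityWriter {
  // Hypothetical root directory for per-application entity files.
  private static final String ROOT = "/tmp/timeline";

  public static void writeEntity(Configuration conf, String appId,
      String entityJson) throws IOException {
    FileSystem fs = FileSystem.get(conf);  // HDFS or local, per fs.defaultFS
    Path file = new Path(ROOT, appId + ".entities");
    // Create the file on the first write, append afterwards; note that
    // append is not supported by every FileSystem implementation.
    try (FSDataOutputStream out = fs.exists(file)
        ? fs.append(file) : fs.create(file, false)) {
      out.writeBytes(entityJson + "\n");
    }
  }
}
{code}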
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610626#comment-14610626 ] Varun Saxena commented on YARN-3051: Any reason metrics and events in TimelineEntity are stored in a set? A map would make some operations easier and more efficient in the case of the FS implementation. [Storage abstraction] Create backing storage read interface for ATS readers --- Key: YARN-3051 URL: https://issues.apache.org/jira/browse/YARN-3051 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3051-YARN-2928.003.patch, YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, YARN-3051-YARN-2928.05.patch, YARN-3051-YARN-2928.06.patch, YARN-3051.Reader_API.patch, YARN-3051.Reader_API_1.patch, YARN-3051.Reader_API_2.patch, YARN-3051.Reader_API_3.patch, YARN-3051.Reader_API_4.patch, YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch Per design in YARN-2928, create backing storage read interface that can be implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
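To make the set-vs-map point concrete, a small self-contained sketch; {{Metric}} here is a stand-in class, not the actual timeline type. With a Set, merging an update into an existing metric requires a scan, while keying by metric id makes the lookup direct.
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: "Metric" stands in for the timeline metric type.
class Metric {
  private final String id;
  Metric(String id) { this.id = id; }
  String getId() { return id; }
}

public class SetVsMap {
  public static void main(String[] args) {
    // With a Set, locating the metric to merge requires a full scan.
    Set<Metric> metricSet = new HashSet<>();
    metricSet.add(new Metric("MEMORY"));
    Metric found = null;
    for (Metric m : metricSet) {
      if (m.getId().equals("MEMORY")) { found = m; break; }
    }

    // With a Map keyed by metric id, the lookup is direct.
    Map<String, Metric> metricMap = new HashMap<>();
    metricMap.put("MEMORY", new Metric("MEMORY"));
    Metric direct = metricMap.get("MEMORY");
    System.out.println(found.getId() + " / " + direct.getId());
  }
}
{code}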
[jira] [Commented] (YARN-3823) Fix mismatch in default values for yarn.scheduler.maximum-allocation-vcores property
[ https://issues.apache.org/jira/browse/YARN-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610571#comment-14610571 ] Hudson commented on YARN-3823: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #243 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/243/]) YARN-3823. Fix mismatch in default values for (devaraj: rev 7405c59799ed1b8ad1a7c6f1b18fabf49d0b92b2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt Fix mismatch in default values for yarn.scheduler.maximum-allocation-vcores property Key: YARN-3823 URL: https://issues.apache.org/jira/browse/YARN-3823 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Fix For: 2.8.0 Attachments: YARN-3823.001.patch, YARN-3823.002.patch In yarn-default.xml, the property is defined as: XML Property: yarn.scheduler.maximum-allocation-vcores XML Value: 32 In YarnConfiguration.java the corresponding member variable is defined as: Config Name: DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES Config Value: 4 The Config value comes from YARN-193 and the default xml property comes from YARN-2. Should we keep it this way or should one of the values get updated? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3768) ArrayIndexOutOfBoundsException with empty environment variables
[ https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610569#comment-14610569 ] Hudson commented on YARN-3768: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #243 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/243/]) YARN-3768. ArrayIndexOutOfBoundsException with empty environment variables. (Zhihai Xu via gera) (gera: rev 6f2a41e37d0b36cdafcfff75125165f212c612a6) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java ArrayIndexOutOfBoundsException with empty environment variables --- Key: YARN-3768 URL: https://issues.apache.org/jira/browse/YARN-3768 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.5.0 Reporter: Joe Ferner Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3768.000.patch, YARN-3768.001.patch, YARN-3768.002.patch, YARN-3768.003.patch, YARN-3768.004.patch Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range exception occurs if an environment variable is encountered without a value. {code} java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.yarn.util.Apps.setEnvFromInputString(Apps.java:80) {code} I believe this occurs because java will not return empty strings from the split method. Similar to this http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
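The split behavior described above is easy to reproduce: by default Java drops trailing empty strings, so an environment variable with no value yields a one-element array, while a negative limit preserves the empty value. A minimal demonstration:
{code}
public class SplitDemo {
  public static void main(String[] args) {
    // "FOO=".split("=") yields ["FOO"]: the trailing empty value is
    // dropped, so accessing parts[1] throws ArrayIndexOutOfBoundsException.
    String[] parts = "FOO=".split("=");
    System.out.println(parts.length);        // 1

    // A negative limit keeps trailing empty strings: ["FOO", ""].
    String[] safe = "FOO=".split("=", -1);
    System.out.println(safe.length);         // 2
    System.out.println("value='" + safe[1] + "'");
  }
}
{code}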
[jira] [Updated] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS
[ https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cntic updated YARN-2681: Target Version/s: 2.7.2 Fix Version/s: (was: 2.7.2) 2.7.0 Support bandwidth enforcement for containers while reading from HDFS Key: YARN-2681 URL: https://issues.apache.org/jira/browse/YARN-2681 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.5.1 Environment: Linux Reporter: cntic Labels: BB2015-05-TBR Fix For: 2.7.0 Attachments: HADOOP-2681.patch, HADOOP-2681.patch, Traffic Control Design.png To read/write data from HDFS on a data node, applications establish TCP/IP connections with the datanode. The HDFS read can be controlled by configuring the Linux Traffic Control (TC) subsystem on the data node to apply filters to the appropriate connections. The current cgroups net_cls concept cannot be applied on the node where the container is launched, nor on the data node, since: - TC handles outgoing bandwidth only, so it cannot be set on the container node (HDFS read = incoming data for the container) - Since the HDFS data node is handled by only one process, it is not possible to use net_cls to separate connections from different containers to the datanode. Tasks: 1) Extend the Resource model to define a bandwidth enforcement rate 2) Monitor TCP/IP connections established by the container-handling process and its child processes 3) Set Linux Traffic Control rules on the data node based on address:port pairs in order to enforce the bandwidth of outgoing data Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3877) YarnClientImpl.submitApplication swallows exceptions
[ https://issues.apache.org/jira/browse/YARN-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3877: -- Assignee: Varun Saxena YarnClientImpl.submitApplication swallows exceptions Key: YARN-3877 URL: https://issues.apache.org/jira/browse/YARN-3877 Project: Hadoop YARN Issue Type: Improvement Components: client Affects Versions: 2.7.2 Reporter: Steve Loughran Assignee: Varun Saxena Priority: Minor When {{YarnClientImpl.submitApplication}} spins waiting for the application to be accepted, any interruption during its sleep() calls is logged and swallowed. This makes it hard to interrupt the thread during shutdown. Really it should throw some form of exception and let the caller deal with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3508) Preemption processing occurring on the main RM dispatcher
[ https://issues.apache.org/jira/browse/YARN-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610668#comment-14610668 ] Varun Saxena commented on YARN-3508: [~leftnoteasy], the timed-out test {{TestNodeLabelContainerAllocation}} is unrelated and will be handled by YARN-3848. Maybe you can review that JIRA too :) Preemption processing occurring on the main RM dispatcher Key: YARN-3508 URL: https://issues.apache.org/jira/browse/YARN-3508 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Varun Saxena Attachments: YARN-3508.002.patch, YARN-3508.01.patch, YARN-3508.03.patch, YARN-3508.04.patch, YARN-3508.05.patch, YARN-3508.06.patch We recently saw the RM for a large cluster lag far behind on the AsyncDispatcher event queue. The AsyncDispatcher thread was consistently blocked on the highly-contended CapacityScheduler lock trying to dispatch preemption-related events for RMContainerPreemptEventDispatcher. Preemption processing should occur on the scheduler event dispatcher thread or a separate thread to avoid delaying the processing of other events in the primary dispatcher queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610715#comment-14610715 ] Varun Saxena commented on YARN-3528: [~brahmareddy], I looked at your code. It's somewhat repetitive. IIUC, what you are trying to achieve here is to first try the passed port and then randomize. You can change the code as follows. Disclaimer: I haven't tested it, but it should work. {code} public static int getPort(int port, int retries) throws IOException { Random rand = new Random(); int tryPort = port; int tries = 0; while (true) { if (tries > 0) { tryPort = port + rand.nextInt(65535 - port); } LOG.info("Using port " + tryPort); try (ServerSocket s = new ServerSocket(tryPort)) { return tryPort; } catch (IOException e) { tries++; if (tries >= retries) { LOG.info("Port is already in use; giving up"); throw e; } else { LOG.info("Port is already in use; trying again"); } } } } {code} Tests with 12345 as hard-coded port break jenkins - Key: YARN-3528 URL: https://issues.apache.org/jira/browse/YARN-3528 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Environment: ASF Jenkins Reporter: Steve Loughran Assignee: Brahma Reddy Battula Priority: Blocker Labels: test Attachments: YARN-3528-002.patch, YARN-3528.patch A lot of the YARN tests have hard-coded the port 12345 for their services to come up on. This makes it impossible for scheduled or precommit tests to run consistently on the ASF jenkins hosts. Instead the tests fail regularly and appear to get ignored completely. A quick grep of 12345 shows up many places in the test suite where this practice has developed. * All {{BaseContainerManagerTest}} subclasses * {{TestNodeManagerShutdown}} * {{TestContainerManager}} + others This needs to be addressed through port scanning and dynamic port allocation. Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
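A hypothetical caller, just to show the intended contract of the sketch above; {{PortUtil}} is an assumed holder class for getPort, not part of the patch.
{code}
import java.io.IOException;
import java.net.ServerSocket;

public class GetPortDemo {
  public static void main(String[] args) throws IOException {
    // Try port 12345 first, then up to 4 randomized retries.
    int port = PortUtil.getPort(12345, 5);
    try (ServerSocket server = new ServerSocket(port)) {
      System.out.println("Bound to " + server.getLocalPort());
    }
  }
}
{code}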
[jira] [Updated] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2004: -- Attachment: 0010-YARN-2004.patch Hi [~leftnoteasy] Thank you very much for sharing the comments. I have updated the patch addressing the comments. - applicationComparator is still kept here. I raised a ticket to remove it; once that is done, I will rebase this patch. - {{FairScheduler#getAppWeight}} I understood the idea. I feel we can have this later as an improvement once the base version is done. How do you feel? Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch, 0003-YARN-2004.patch, 0004-YARN-2004.patch, 0005-YARN-2004.patch, 0006-YARN-2004.patch, 0007-YARN-2004.patch, 0008-YARN-2004.patch, 0009-YARN-2004.patch, 0010-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to applications while scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for application priority. If priority is available, then return the highest priority job. 2. Otherwise continue with the existing logic, such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
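A sketch of the two-step comparison the description outlines, under the assumption that a higher integer value means higher priority; the class and field names are illustrative, not the patch (the real code compares FiCaSchedulerApp objects).
{code}
import java.util.Comparator;

// Illustrative app handle standing in for FiCaSchedulerApp.
class App {
  final int priority;    // higher value = higher priority (assumed)
  final long appId;      // monotonically increasing application id
  final long submitTime;
  App(int priority, long appId, long submitTime) {
    this.priority = priority; this.appId = appId; this.submitTime = submitTime;
  }
}

public class PriorityComparator implements Comparator<App> {
  @Override
  public int compare(App a, App b) {
    // 1. Check application priority: higher priority orders first.
    if (a.priority != b.priority) {
      return Integer.compare(b.priority, a.priority);
    }
    // 2. Otherwise fall back to the existing ordering:
    //    app id comparison, then timestamp comparison.
    if (a.appId != b.appId) {
      return Long.compare(a.appId, b.appId);
    }
    return Long.compare(a.submitTime, b.submitTime);
  }
}
{code}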
[jira] [Updated] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS
[ https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cntic updated YARN-2681: Attachment: (was: HDFS-2681.02.patch) Support bandwidth enforcement for containers while reading from HDFS Key: YARN-2681 URL: https://issues.apache.org/jira/browse/YARN-2681 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.5.1 Environment: Linux Reporter: cntic Labels: BB2015-05-TBR Fix For: 2.7.0 Attachments: HADOOP-2681.patch, HADOOP-2681.patch, HDFS-2681.02.patch, HdfsTrafficControl_UML.png, Traffic Control Design.png To read/write data from HDFS on a data node, applications establish TCP/IP connections with the datanode. The HDFS read can be controlled by configuring the Linux Traffic Control (TC) subsystem on the data node to apply filters to the appropriate connections. The current cgroups net_cls concept cannot be applied on the node where the container is launched, nor on the data node, since: - TC handles outgoing bandwidth only, so it cannot be set on the container node (HDFS read = incoming data for the container) - Since the HDFS data node is handled by only one process, it is not possible to use net_cls to separate connections from different containers to the datanode. Tasks: 1) Extend the Resource model to define a bandwidth enforcement rate 2) Monitor TCP/IP connections established by the container-handling process and its child processes 3) Set Linux Traffic Control rules on the data node based on address:port pairs in order to enforce the bandwidth of outgoing data Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf Implementation: http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS
[ https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cntic updated YARN-2681: Attachment: HDFS-2681.02.patch Merged with the latest trunk. Support bandwidth enforcement for containers while reading from HDFS Key: YARN-2681 URL: https://issues.apache.org/jira/browse/YARN-2681 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.5.1 Environment: Linux Reporter: cntic Labels: BB2015-05-TBR Fix For: 2.7.0 Attachments: HADOOP-2681.patch, HADOOP-2681.patch, HDFS-2681.02.patch, HdfsTrafficControl_UML.png, Traffic Control Design.png To read/write data from HDFS on a data node, applications establish TCP/IP connections with the datanode. The HDFS read can be controlled by configuring the Linux Traffic Control (TC) subsystem on the data node to apply filters to the appropriate connections. The current cgroups net_cls concept cannot be applied on the node where the container is launched, nor on the data node, since: - TC handles outgoing bandwidth only, so it cannot be set on the container node (HDFS read = incoming data for the container) - Since the HDFS data node is handled by only one process, it is not possible to use net_cls to separate connections from different containers to the datanode. Tasks: 1) Extend the Resource model to define a bandwidth enforcement rate 2) Monitor TCP/IP connections established by the container-handling process and its child processes 3) Set Linux Traffic Control rules on the data node based on address:port pairs in order to enforce the bandwidth of outgoing data Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf Implementation: http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3848) TestNodeLabelContainerAllocation is timing out
[ https://issues.apache.org/jira/browse/YARN-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610789#comment-14610789 ] Wangda Tan commented on YARN-3848: -- [~varun_saxena], could you take a look at my previous comment? I want to understand if this is the correct fix. Thanks, TestNodeLabelContainerAllocation is timing out -- Key: YARN-3848 URL: https://issues.apache.org/jira/browse/YARN-3848 Project: Hadoop YARN Issue Type: Bug Components: test Reporter: Jason Lowe Assignee: Varun Saxena Attachments: YARN-3848.01.patch, test_output.txt A number of builds, pre-commit and otherwise, have been failing recently because TestNodeLabelContainerAllocation has timed out. See https://builds.apache.org/job/Hadoop-Yarn-trunk/969/, YARN-3830, YARN-3802, or YARN-3826 for examples. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS
[ https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cntic updated YARN-2681: Attachment: HADOOP-2681.02.patch Support bandwidth enforcement for containers while reading from HDFS Key: YARN-2681 URL: https://issues.apache.org/jira/browse/YARN-2681 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.5.1 Environment: Linux Reporter: cntic Labels: BB2015-05-TBR Fix For: 2.7.0 Attachments: HADOOP-2681.02.patch, HADOOP-2681.patch, HADOOP-2681.patch, HdfsTrafficControl_UML.png, Traffic Control Design.png To read/write data from HDFS on a data node, applications establish TCP/IP connections with the datanode. The HDFS read can be controlled by configuring the Linux Traffic Control (TC) subsystem on the data node to apply filters to the appropriate connections. The current cgroups net_cls concept cannot be applied on the node where the container is launched, nor on the data node, since: - TC handles outgoing bandwidth only, so it cannot be set on the container node (HDFS read = incoming data for the container) - Since the HDFS data node is handled by only one process, it is not possible to use net_cls to separate connections from different containers to the datanode. Tasks: 1) Extend the Resource model to define a bandwidth enforcement rate 2) Monitor TCP/IP connections established by the container-handling process and its child processes 3) Set Linux Traffic Control rules on the data node based on address:port pairs in order to enforce the bandwidth of outgoing data Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf Implementation: http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS
[ https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cntic updated YARN-2681: Attachment: (was: HDFS-2681.02.patch) Support bandwidth enforcement for containers while reading from HDFS Key: YARN-2681 URL: https://issues.apache.org/jira/browse/YARN-2681 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.5.1 Environment: Linux Reporter: cntic Labels: BB2015-05-TBR Fix For: 2.7.0 Attachments: HADOOP-2681.02.patch, HADOOP-2681.patch, HADOOP-2681.patch, HdfsTrafficControl_UML.png, Traffic Control Design.png To read/write data from HDFS on a data node, applications establish TCP/IP connections with the datanode. The HDFS read can be controlled by configuring the Linux Traffic Control (TC) subsystem on the data node to apply filters to the appropriate connections. The current cgroups net_cls concept cannot be applied on the node where the container is launched, nor on the data node, since: - TC handles outgoing bandwidth only, so it cannot be set on the container node (HDFS read = incoming data for the container) - Since the HDFS data node is handled by only one process, it is not possible to use net_cls to separate connections from different containers to the datanode. Tasks: 1) Extend the Resource model to define a bandwidth enforcement rate 2) Monitor TCP/IP connections established by the container-handling process and its child processes 3) Set Linux Traffic Control rules on the data node based on address:port pairs in order to enforce the bandwidth of outgoing data Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf Implementation: http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS
[ https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cntic updated YARN-2681: Attachment: (was: HADOOP-2681.patch) Support bandwidth enforcement for containers while reading from HDFS Key: YARN-2681 URL: https://issues.apache.org/jira/browse/YARN-2681 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.5.1 Environment: Linux Reporter: cntic Labels: BB2015-05-TBR Fix For: 2.7.0 Attachments: HdfsTrafficControl_UML.png, Traffic Control Design.png, YARN-2681.patch To read/write data from HDFS on a data node, applications establish TCP/IP connections with the datanode. The HDFS read can be controlled by configuring the Linux Traffic Control (TC) subsystem on the data node to apply filters to the appropriate connections. The current cgroups net_cls concept cannot be applied on the node where the container is launched, nor on the data node, since: - TC handles outgoing bandwidth only, so it cannot be set on the container node (HDFS read = incoming data for the container) - Since the HDFS data node is handled by only one process, it is not possible to use net_cls to separate connections from different containers to the datanode. Tasks: 1) Extend the Resource model to define a bandwidth enforcement rate 2) Monitor TCP/IP connections established by the container-handling process and its child processes 3) Set Linux Traffic Control rules on the data node based on address:port pairs in order to enforce the bandwidth of outgoing data Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf Implementation: http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS
[ https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cntic updated YARN-2681: Attachment: (was: HADOOP-2681.patch) Support bandwidth enforcement for containers while reading from HDFS Key: YARN-2681 URL: https://issues.apache.org/jira/browse/YARN-2681 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.5.1 Environment: Linux Reporter: cntic Labels: BB2015-05-TBR Fix For: 2.7.0 Attachments: HdfsTrafficControl_UML.png, Traffic Control Design.png, YARN-2681.patch To read/write data from HDFS on a data node, applications establish TCP/IP connections with the datanode. The HDFS read can be controlled by configuring the Linux Traffic Control (TC) subsystem on the data node to apply filters to the appropriate connections. The current cgroups net_cls concept cannot be applied on the node where the container is launched, nor on the data node, since: - TC handles outgoing bandwidth only, so it cannot be set on the container node (HDFS read = incoming data for the container) - Since the HDFS data node is handled by only one process, it is not possible to use net_cls to separate connections from different containers to the datanode. Tasks: 1) Extend the Resource model to define a bandwidth enforcement rate 2) Monitor TCP/IP connections established by the container-handling process and its child processes 3) Set Linux Traffic Control rules on the data node based on address:port pairs in order to enforce the bandwidth of outgoing data Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf Implementation: http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3508) Preemption processing occurring on the main RM dispatcher
[ https://issues.apache.org/jira/browse/YARN-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610788#comment-14610788 ] Wangda Tan commented on YARN-3508: -- Thanks for the update, [~varun_saxena]. Checkstyle is fine to me; the patch generally looks good. [~jianhe]/[~jlowe], could you take a look also? Preemption processing occurring on the main RM dispatcher Key: YARN-3508 URL: https://issues.apache.org/jira/browse/YARN-3508 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Varun Saxena Attachments: YARN-3508.002.patch, YARN-3508.01.patch, YARN-3508.03.patch, YARN-3508.04.patch, YARN-3508.05.patch, YARN-3508.06.patch We recently saw the RM for a large cluster lag far behind on the AsyncDispatcher event queue. The AsyncDispatcher thread was consistently blocked on the highly-contended CapacityScheduler lock trying to dispatch preemption-related events for RMContainerPreemptEventDispatcher. Preemption processing should occur on the scheduler event dispatcher thread or a separate thread to avoid delaying the processing of other events in the primary dispatcher queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3849) Too much of preemption activity causing continuous killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610815#comment-14610815 ] Wangda Tan commented on YARN-3849: -- Thanks [~sunilg], Some comments: 1) It seems we don't need useDominantResourceCalculator/rcDefault/rcDominant in TestP..Policy; passing a boolean parameter to buildPolicy should be enough, and you can also overload buildPolicy to avoid too many changes. 2) testPreemptionWithVCoreResource seems not correct: root.used != A.used + B.used 3) TestP..PolicyForNodePartitions: One comment is wrong: {code} + (1,1:2,n1,x,20,false); + // 80 * x in n1 b\t // app4 in b + (1,1:2,n2,,80,false); // 20 default in n2 {code} It should be 20 * x and 80 default. 4) It seems the TestP..PolicyForNodePartitions setting for DRC is missing; could you check? Too much of preemption activity causing continuous killing of containers across queues - Key: YARN-3849 URL: https://issues.apache.org/jira/browse/YARN-3849 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Priority: Critical Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch Two queues are used. Each queue has been given a capacity of 0.5. The Dominant Resource policy is used. 1. An app is submitted in QueueA which is consuming the full cluster capacity 2. After submitting an app in QueueB, there is some demand, and preemption is invoked in QueueA 3. Instead of killing only the excess over the 0.5 guaranteed capacity, we observed that all containers other than the AM are getting killed in QueueA 4. Now the app in QueueB tries to take over the cluster with the current free space. But there is some updated demand from the app in QueueA which lost its containers earlier, and preemption is kicked in QueueB now. The scenario in steps 3 and 4 keeps happening in a loop; thus none of the apps complete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id 9999)
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609868#comment-14609868 ] Mohammad Shahid Khan commented on YARN-3840: Hi Devaraj K, please ignore the first patch; in the current patch I have addressed the nodemanager web UI issue as well. The current patch uses the natural sort algorithm of natural.js, a plugin used with DataTables to sort the data. The natural sort plugin https://github.com/DataTables/Plugins/blob/1.10.7/sorting/natural.js has the MIT license. As per the MIT license we can redistribute the code, but we have to keep the license header. The Hadoop patch verification tool does not allow the author info, and as of now I have not removed the @author tag from the patch file. Please help me to address this issue. Resource Manager web ui issue when sorting application by id (with application having id 9999) Key: YARN-3840 URL: https://issues.apache.org/jira/browse/YARN-3840 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: LINTE Assignee: Mohammad Shahid Khan Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch On the WEBUI, the global main view page : http://resourcemanager:8088/cluster/apps doesn't display applications over 9999. With command line it works (# yarn application -list). Regards, Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
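For reference, the idea behind the plugin is a natural-order comparison, where numeric runs compare as numbers rather than character by character. A rough Java equivalent (illustrative only, not the plugin's code):
{code}
import java.util.Comparator;

// Rough natural-order comparator: numeric runs compare as numbers, so
// "application_..._10000" sorts after "application_..._9999".
public class NaturalOrder implements Comparator<String> {
  @Override
  public int compare(String a, String b) {
    int i = 0, j = 0;
    while (i < a.length() && j < b.length()) {
      char ca = a.charAt(i), cb = b.charAt(j);
      if (Character.isDigit(ca) && Character.isDigit(cb)) {
        int si = i, sj = j;
        while (i < a.length() && Character.isDigit(a.charAt(i))) i++;
        while (j < b.length() && Character.isDigit(b.charAt(j))) j++;
        // A longer digit run is the larger number; equal-length runs
        // compare lexicographically (digit by digit).
        int cmp = (i - si) != (j - sj)
            ? Integer.compare(i - si, j - sj)
            : a.substring(si, i).compareTo(b.substring(sj, j));
        if (cmp != 0) return cmp;
      } else {
        if (ca != cb) return Character.compare(ca, cb);
        i++; j++;
      }
    }
    return Integer.compare(a.length() - i, b.length() - j);
  }
}
{code}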
[jira] [Commented] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
[ https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609965#comment-14609965 ] Hadoop QA commented on YARN-3830: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 47s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 45s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 50m 56s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 88m 38s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12743024/YARN-3830_4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7405c59 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8403/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8403/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8403/console | This message was automatically generated. AbstractYarnScheduler.createReleaseCache may try to clean a null attempt Key: YARN-3830 URL: https://issues.apache.org/jira/browse/YARN-3830 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: nijel Assignee: nijel Attachments: YARN-3830_1.patch, YARN-3830_2.patch, YARN-3830_3.patch, YARN-3830_4.patch org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache() {code} protected void createReleaseCache() { // Cleanup the cache after nm expire interval. new Timer().schedule(new TimerTask() { @Override public void run() { for (SchedulerApplication<T> app : applications.values()) { T attempt = app.getCurrentAppAttempt(); synchronized (attempt) { for (ContainerId containerId : attempt.getPendingRelease()) { RMAuditLogger.logFailure( {code} Here the attempt can be null since the attempt is created later, so a NullPointerException will occur: {code} 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] threw an Exception.
| YarnUncaughtExceptionHandler.java:68 java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) {code} This will skip the other applications in this run. We can add a null check and continue with the other applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3823) Fix mismatch in default values for yarn.scheduler.maximum-allocation-vcores property
[ https://issues.apache.org/jira/browse/YARN-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609992#comment-14609992 ] Hudson commented on YARN-3823: -- FAILURE: Integrated in Hadoop-Yarn-trunk #975 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/975/]) YARN-3823. Fix mismatch in default values for (devaraj: rev 7405c59799ed1b8ad1a7c6f1b18fabf49d0b92b2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt Fix mismatch in default values for yarn.scheduler.maximum-allocation-vcores property Key: YARN-3823 URL: https://issues.apache.org/jira/browse/YARN-3823 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Fix For: 2.8.0 Attachments: YARN-3823.001.patch, YARN-3823.002.patch In yarn-default.xml, the property is defined as: XML Property: yarn.scheduler.maximum-allocation-vcores XML Value: 32 In YarnConfiguration.java the corresponding member variable is defined as: Config Name: DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES Config Value: 4 The Config value comes from YARN-193 and the default xml property comes from YARN-2. Should we keep it this way or should one of the values get updated? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3768) ArrayIndexOutOfBoundsException with empty environment variables
[ https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609990#comment-14609990 ] Hudson commented on YARN-3768: -- FAILURE: Integrated in Hadoop-Yarn-trunk #975 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/975/]) YARN-3768. ArrayIndexOutOfBoundsException with empty environment variables. (Zhihai Xu via gera) (gera: rev 6f2a41e37d0b36cdafcfff75125165f212c612a6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java * hadoop-yarn-project/CHANGES.txt ArrayIndexOutOfBoundsException with empty environment variables --- Key: YARN-3768 URL: https://issues.apache.org/jira/browse/YARN-3768 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.5.0 Reporter: Joe Ferner Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3768.000.patch, YARN-3768.001.patch, YARN-3768.002.patch, YARN-3768.003.patch, YARN-3768.004.patch Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range exception occurs if an environment variable is encountered without a value. {code} java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.yarn.util.Apps.setEnvFromInputString(Apps.java:80) {code} I believe this occurs because java will not return empty strings from the split method. Similar to this http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3827) Migrate YARN native build to new CMake framework
[ https://issues.apache.org/jira/browse/YARN-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609985#comment-14609985 ] Hudson commented on YARN-3827: -- FAILURE: Integrated in Hadoop-Yarn-trunk #975 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/975/]) YARN-3827. Migrate YARN native build to new CMake framework (Alan Burlison via Colin P. McCabe) (cmccabe: rev d0cc0380b57db5fdeb41775bb9ca42dac65928b8) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/CMakeLists.txt Migrate YARN native build to new CMake framework Key: YARN-3827 URL: https://issues.apache.org/jira/browse/YARN-3827 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Reporter: Alan Burlison Assignee: Alan Burlison Fix For: 2.8.0 Attachments: YARN-3827.001.patch As per HADOOP-12036, the CMake infrastructure should be refactored and made common across all Hadoop components. This bug covers the migration of YARN to the new CMake infrastructure. This change will also add support for building YARN Native components on Solaris. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3768) ArrayIndexOutOfBoundsException with empty environment variables
[ https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610373#comment-14610373 ] Hudson commented on YARN-3768: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #233 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/233/]) YARN-3768. ArrayIndexOutOfBoundsException with empty environment variables. (Zhihai Xu via gera) (gera: rev 6f2a41e37d0b36cdafcfff75125165f212c612a6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestApps.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java ArrayIndexOutOfBoundsException with empty environment variables --- Key: YARN-3768 URL: https://issues.apache.org/jira/browse/YARN-3768 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.5.0 Reporter: Joe Ferner Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3768.000.patch, YARN-3768.001.patch, YARN-3768.002.patch, YARN-3768.003.patch, YARN-3768.004.patch Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range exception occurs if an environment variable is encountered without a value. {code} java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.yarn.util.Apps.setEnvFromInputString(Apps.java:80) {code} I believe this occurs because java will not return empty strings from the split method. Similar to this http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3823) Fix mismatch in default values for yarn.scheduler.maximum-allocation-vcores property
[ https://issues.apache.org/jira/browse/YARN-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610356#comment-14610356 ] Hudson commented on YARN-3823: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #233 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/233/]) YARN-3823. Fix mismatch in default values for (devaraj: rev 7405c59799ed1b8ad1a7c6f1b18fabf49d0b92b2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt Fix mismatch in default values for yarn.scheduler.maximum-allocation-vcores property Key: YARN-3823 URL: https://issues.apache.org/jira/browse/YARN-3823 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Fix For: 2.8.0 Attachments: YARN-3823.001.patch, YARN-3823.002.patch In yarn-default.xml, the property is defined as: XML Property: yarn.scheduler.maximum-allocation-vcores XML Value: 32 In YarnConfiguration.java the corresponding member variable is defined as: Config Name: DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES Config Value: 4 The Config value comes from YARN-193 and the default xml property comes from YARN-2. Should we keep it this way or should one of the values get updated? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3695) ServerProxy (NMProxy, etc.) shouldn't retry forever for non network exception.
[ https://issues.apache.org/jira/browse/YARN-3695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610352#comment-14610352 ] Hudson commented on YARN-3695: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #233 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/233/]) YARN-3695. ServerProxy (NMProxy, etc.) shouldn't retry forever for non network exception. Contributed by Raju Bairishetti (jianhe: rev 62e583c7dcbb30d95d8b32a4978fbdb3b98d67cc) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java ServerProxy (NMProxy, etc.) shouldn't retry forever for non network exception. -- Key: YARN-3695 URL: https://issues.apache.org/jira/browse/YARN-3695 Project: Hadoop YARN Issue Type: Bug Reporter: Junping Du Assignee: Raju Bairishetti Fix For: 2.8.0 Attachments: YARN-3695.01.patch, YARN-3695.patch YARN-3646 fixed the retry-forever policy in RMProxy so that it applies only to a limited set of exceptions rather than all exceptions. Here, we may need the same fix for ServerProxy (NMProxy). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
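For reference, a sketch of the pattern YARN-3646 applied (the maxRetries and retryIntervalMs values are illustrative assumptions, not from the patch): restrict retries to connectivity failures via RetryPolicies.retryByException and fail fast on everything else.
{code}
import java.net.ConnectException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

// Assumed example values; real ones would come from configuration.
int maxRetries = 5;
long retryIntervalMs = 2000L;

// Retry only on ConnectException; any other exception fails immediately.
Map<Class<? extends Exception>, RetryPolicy> exceptionToPolicyMap = new HashMap<>();
exceptionToPolicyMap.put(ConnectException.class,
    RetryPolicies.retryUpToMaximumCountWithFixedSleep(
        maxRetries, retryIntervalMs, TimeUnit.MILLISECONDS));
RetryPolicy policy = RetryPolicies.retryByException(
    RetryPolicies.TRY_ONCE_THEN_FAIL, exceptionToPolicyMap);
{code}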
[jira] [Commented] (YARN-3827) Migrate YARN native build to new CMake framework
[ https://issues.apache.org/jira/browse/YARN-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610350#comment-14610350 ] Hudson commented on YARN-3827: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #233 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/233/]) YARN-3827. Migrate YARN native build to new CMake framework (Alan Burlison via Colin P. McCabe) (cmccabe: rev d0cc0380b57db5fdeb41775bb9ca42dac65928b8) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/CMakeLists.txt Migrate YARN native build to new CMake framework Key: YARN-3827 URL: https://issues.apache.org/jira/browse/YARN-3827 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Reporter: Alan Burlison Assignee: Alan Burlison Fix For: 2.8.0 Attachments: YARN-3827.001.patch As per HADOOP-12036, the CMake infrastructure should be refactored and made common across all Hadoop components. This bug covers the migration of YARN to the new CMake infrastructure. This change will also add support for building YARN Native components on Solaris. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3849) Too much of preemption activity causing continuous killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610250#comment-14610250 ] Sunil G commented on YARN-3849: --- Test case failures are not related to this patch. TestNodeLabelContainerAllocation is passing locally in trunk. Too much of preemption activity causing continuous killing of containers across queues - Key: YARN-3849 URL: https://issues.apache.org/jira/browse/YARN-3849 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Priority: Critical Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch Two queues are used. Each queue is given a capacity of 0.5. The Dominant Resource policy is used. 1. An app is submitted in QueueA, which consumes the full cluster capacity 2. After submitting an app in QueueB, there is some demand, invoking preemption in QueueA 3. Instead of killing only the excess over the 0.5 guaranteed capacity, we observed that all containers other than the AM are getting killed in QueueA 4. Now the app in QueueB tries to take over the cluster with the current free space. But there is some updated demand from the app in QueueA, which lost its containers earlier, and preemption is kicked in QueueB now. The scenario in steps 3 and 4 keeps happening in a loop; thus none of the apps complete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3768) ArrayIndexOutOfBoundsException with empty environment variables
[ https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610331#comment-14610331 ] Hudson commented on YARN-3768: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2172 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2172/]) YARN-3768. ArrayIndexOutOfBoundsException with empty environment variables. (Zhihai Xu via gera) (gera: rev 6f2a41e37d0b36cdafcfff75125165f212c612a6) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestApps.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java ArrayIndexOutOfBoundsException with empty environment variables --- Key: YARN-3768 URL: https://issues.apache.org/jira/browse/YARN-3768 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.5.0 Reporter: Joe Ferner Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3768.000.patch, YARN-3768.001.patch, YARN-3768.002.patch, YARN-3768.003.patch, YARN-3768.004.patch Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range exception occurs if an environment variable is encountered without a value. {code} java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.yarn.util.Apps.setEnvFromInputString(Apps.java:80) {code} I believe this occurs because java will not return empty strings from the split method. Similar to this http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
[ https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-3830: Hadoop Flags: Reviewed +1, latest patch looks good to me, will commit it shortly. AbstractYarnScheduler.createReleaseCache may try to clean a null attempt Key: YARN-3830 URL: https://issues.apache.org/jira/browse/YARN-3830 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: nijel Assignee: nijel Attachments: YARN-3830_1.patch, YARN-3830_2.patch, YARN-3830_3.patch, YARN-3830_4.patch org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache() {code} protected void createReleaseCache() { // Cleanup the cache after nm expire interval. new Timer().schedule(new TimerTask() { @Override public void run() { for (SchedulerApplication<T> app : applications.values()) { T attempt = app.getCurrentAppAttempt(); synchronized (attempt) { for (ContainerId containerId : attempt.getPendingRelease()) { RMAuditLogger.logFailure( {code} Here the attempt can be null, since the attempt is created later, so a NullPointerException will occur {code} 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] threw an Exception. | YarnUncaughtExceptionHandler.java:68 java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) {code} This will skip the other applications in this run. We can add a null check and continue with the other applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
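A sketch of the null check suggested in the description, assuming it is safe to skip an application whose attempt has not been created yet (only the guard is new; the surrounding loop is as quoted above):
{code}
for (SchedulerApplication<T> app : applications.values()) {
  T attempt = app.getCurrentAppAttempt();
  if (attempt == null) {
    continue;  // attempt is created later; skip it and keep cleaning the rest
  }
  synchronized (attempt) {
    // ... pending-release cleanup exactly as quoted above ...
  }
}
{code}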
[jira] [Commented] (YARN-3823) Fix mismatch in default values for yarn.scheduler.maximum-allocation-vcores property
[ https://issues.apache.org/jira/browse/YARN-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610314#comment-14610314 ] Hudson commented on YARN-3823: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2172 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2172/]) YARN-3823. Fix mismatch in default values for (devaraj: rev 7405c59799ed1b8ad1a7c6f1b18fabf49d0b92b2) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml Fix mismatch in default values for yarn.scheduler.maximum-allocation-vcores property Key: YARN-3823 URL: https://issues.apache.org/jira/browse/YARN-3823 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Fix For: 2.8.0 Attachments: YARN-3823.001.patch, YARN-3823.002.patch In yarn-default.xml, the property is defined as: XML Property: yarn.scheduler.maximum-allocation-vcores XML Value: 32 In YarnConfiguration.java the corresponding member variable is defined as: Config Name: DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES Config Value: 4 The Config value comes from YARN-193 and the default xml property comes from YARN-2. Should we keep it this way or should one of the values get updated? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3695) ServerProxy (NMProxy, etc.) shouldn't retry forever for non network exception.
[ https://issues.apache.org/jira/browse/YARN-3695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610310#comment-14610310 ] Hudson commented on YARN-3695: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2172 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2172/]) YARN-3695. ServerProxy (NMProxy, etc.) shouldn't retry forever for non network exception. Contributed by Raju Bairishetti (jianhe: rev 62e583c7dcbb30d95d8b32a4978fbdb3b98d67cc) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java ServerProxy (NMProxy, etc.) shouldn't retry forever for non network exception. -- Key: YARN-3695 URL: https://issues.apache.org/jira/browse/YARN-3695 Project: Hadoop YARN Issue Type: Bug Reporter: Junping Du Assignee: Raju Bairishetti Fix For: 2.8.0 Attachments: YARN-3695.01.patch, YARN-3695.patch YARN-3646 fixed the retry-forever policy in RMProxy so that it applies only to a limited set of exceptions rather than all exceptions. Here, we may need the same fix for ServerProxy (NMProxy). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3827) Migrate YARN native build to new CMake framework
[ https://issues.apache.org/jira/browse/YARN-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610308#comment-14610308 ] Hudson commented on YARN-3827: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2172 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2172/]) YARN-3827. Migrate YARN native build to new CMake framework (Alan Burlison via Colin P. McCabe) (cmccabe: rev d0cc0380b57db5fdeb41775bb9ca42dac65928b8) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/CMakeLists.txt Migrate YARN native build to new CMake framework Key: YARN-3827 URL: https://issues.apache.org/jira/browse/YARN-3827 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Reporter: Alan Burlison Assignee: Alan Burlison Fix For: 2.8.0 Attachments: YARN-3827.001.patch As per HADOOP-12036, the CMake infrastructure should be refactored and made common across all Hadoop components. This bug covers the migration of YARN to the new CMake infrastructure. This change will also add support for building YARN Native components on Solaris. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3770) SerializedException should also handle java.lang.Error
[ https://issues.apache.org/jira/browse/YARN-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610320#comment-14610320 ] Hudson commented on YARN-3770: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2172 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2172/]) YARN-3770. SerializedException should also handle java.lang.Error on de-serialization. Contributed by Lavkesh Lahngir (jianhe: rev 4672315e2d6abe1cee0210cf7d3e8ab114ba933c) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/SerializedExceptionPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl/pb/TestSerializedExceptionPBImpl.java SerializedException should also handle java.lang.Error --- Key: YARN-3770 URL: https://issues.apache.org/jira/browse/YARN-3770 Project: Hadoop YARN Issue Type: Bug Reporter: Lavkesh Lahngir Assignee: Lavkesh Lahngir Fix For: 2.8.0 Attachments: YARN-3770.1.patch, YARN-3770.patch In the SerializedExceptionPBImpl deserialize() method {code} Class classType = null; if (YarnException.class.isAssignableFrom(realClass)) { classType = YarnException.class; } else if (IOException.class.isAssignableFrom(realClass)) { classType = IOException.class; } else if (RuntimeException.class.isAssignableFrom(realClass)) { classType = RuntimeException.class; } else { classType = Exception.class; } return instantiateException(realClass.asSubclass(classType), getMessage(), cause == null ? null : cause.deSerialize()); } {code} If realClass is a subclass of java.lang.Error, deSerialize() throws a ClassCastException. In the last else statement, classType should be Throwable.class instead of Exception.class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
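A sketch of the proposed fix: widening the fallback to Throwable.class lets realClass.asSubclass(classType) accept java.lang.Error subclasses as well. Only the last branch changes relative to the quoted snippet:
{code}
Class<? extends Throwable> classType = null;
if (YarnException.class.isAssignableFrom(realClass)) {
  classType = YarnException.class;
} else if (IOException.class.isAssignableFrom(realClass)) {
  classType = IOException.class;
} else if (RuntimeException.class.isAssignableFrom(realClass)) {
  classType = RuntimeException.class;
} else {
  classType = Throwable.class;  // was Exception.class, which broke Error subclasses
}
// realClass.asSubclass(classType) now succeeds for Errors too; the rest of
// the method can proceed as in the quoted snippet.
{code}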
[jira] [Commented] (YARN-3770) SerializedException should also handle java.lang.Error
[ https://issues.apache.org/jira/browse/YARN-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610362#comment-14610362 ] Hudson commented on YARN-3770: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #233 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/233/]) YARN-3770. SerializedException should also handle java.lang.Error on de-serialization. Contributed by Lavkesh Lahngir (jianhe: rev 4672315e2d6abe1cee0210cf7d3e8ab114ba933c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/SerializedExceptionPBImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl/pb/TestSerializedExceptionPBImpl.java SerializedException should also handle java.lang.Error --- Key: YARN-3770 URL: https://issues.apache.org/jira/browse/YARN-3770 Project: Hadoop YARN Issue Type: Bug Reporter: Lavkesh Lahngir Assignee: Lavkesh Lahngir Fix For: 2.8.0 Attachments: YARN-3770.1.patch, YARN-3770.patch In the SerializedExceptionPBImpl deserialize() method {code} Class classType = null; if (YarnException.class.isAssignableFrom(realClass)) { classType = YarnException.class; } else if (IOException.class.isAssignableFrom(realClass)) { classType = IOException.class; } else if (RuntimeException.class.isAssignableFrom(realClass)) { classType = RuntimeException.class; } else { classType = Exception.class; } return instantiateException(realClass.asSubclass(classType), getMessage(), cause == null ? null : cause.deSerialize()); } {code} If realClass is a subclass of java.lang.Error, deSerialize() throws a ClassCastException. In the last else statement, classType should be Throwable.class instead of Exception.class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Burlison updated YARN-3844: Attachment: (was: YARN-3844.004.patch) Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Attachments: YARN-3844.001.patch, YARN-3844.002.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Burlison updated YARN-3844: Attachment: YARN-3844.005.patch Updated patch with ILP32/LP64-independent casts and printf formats Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.005.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3875) FSSchedulerNode#reserveResource() not printing applicationID
[ https://issues.apache.org/jira/browse/YARN-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610269#comment-14610269 ] Hadoop QA commented on YARN-3875: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 12s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 56s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 4s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 47s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 51m 16s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 91m 18s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12743057/0002-YARN-3875.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7405c59 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8406/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8406/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8406/console | This message was automatically generated. FSSchedulerNode#reserveResource() not printing applicationID Key: YARN-3875 URL: https://issues.apache.org/jira/browse/YARN-3875 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3875.patch, 0002-YARN-3875.patch FSSchedulerNode#reserveResource() {code} LOG.info("Updated reserved container " + container.getContainer().getId() + " on node " + this + " for application " + application); } else { LOG.info("Reserved container " + container.getContainer().getId() + " on node " + this + " for application " + application); } {code} update to application.getApplicationId() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
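The one-line change under discussion, sketched from the snippet above (the other log call changes the same way):
{code}
LOG.info("Reserved container " + container.getContainer().getId()
    + " on node " + this + " for application "
    + application.getApplicationId());  // was: + application
{code}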
[jira] [Updated] (YARN-3875) FSSchedulerNode#reserveResource() not printing applicationID
[ https://issues.apache.org/jira/browse/YARN-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-3875: Target Version/s: 2.8.0 FSSchedulerNode#reserveResource() not printing applicationID Key: YARN-3875 URL: https://issues.apache.org/jira/browse/YARN-3875 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3875.patch, 0002-YARN-3875.patch FSSchedulerNode#reserveResource() {code} LOG.info("Updated reserved container " + container.getContainer().getId() + " on node " + this + " for application " + application); } else { LOG.info("Reserved container " + container.getContainer().getId() + " on node " + this + " for application " + application); } {code} update to application.getApplicationId() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3875) FSSchedulerNode#reserveResource() not printing applicationID
[ https://issues.apache.org/jira/browse/YARN-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610296#comment-14610296 ] Devaraj K commented on YARN-3875: - The failed test doesn't seem to be related to the patch. +1 for the trivial change. FSSchedulerNode#reserveResource() not printing applicationID Key: YARN-3875 URL: https://issues.apache.org/jira/browse/YARN-3875 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3875.patch, 0002-YARN-3875.patch FSSchedulerNode#reserveResource() {code} LOG.info("Updated reserved container " + container.getContainer().getId() + " on node " + this + " for application " + application); } else { LOG.info("Reserved container " + container.getContainer().getId() + " on node " + this + " for application " + application); } {code} update to application.getApplicationId() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3869) Add app name to RM audit log
[ https://issues.apache.org/jira/browse/YARN-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609812#comment-14609812 ] nijel commented on YARN-3869: - Hi [~roji], I would like to work on this improvement. Please let me know if you have already started the work. Add app name to RM audit log Key: YARN-3869 URL: https://issues.apache.org/jira/browse/YARN-3869 Project: Hadoop YARN Issue Type: Improvement Reporter: Shay Rojansky Priority: Minor The YARN resource manager audit log currently includes useful info such as APPID, USER, etc. One crucial piece of information missing is the user-supplied application name. Users are familiar with their application name as shown in the YARN UI, etc. It's vital for something like logstash to be able to associated logs with the application name for later searching in something like kibana. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3869) Add app name to RM audit log
[ https://issues.apache.org/jira/browse/YARN-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel reassigned YARN-3869: --- Assignee: nijel Add app name to RM audit log Key: YARN-3869 URL: https://issues.apache.org/jira/browse/YARN-3869 Project: Hadoop YARN Issue Type: Improvement Reporter: Shay Rojansky Assignee: nijel Priority: Minor The YARN resource manager audit log currently includes useful info such as APPID, USER, etc. One crucial piece of information missing is the user-supplied application name. Users are familiar with their application name as shown in the YARN UI, etc. It's vital for something like logstash to be able to associate logs with the application name for later searching in something like kibana. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3846) RM Web UI queue filter is not working
[ https://issues.apache.org/jira/browse/YARN-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609937#comment-14609937 ] Mohammad Shahid Khan commented on YARN-3846: You are right, 2.7 does not have the issue. My mistake, I raised it against the wrong branch; the affected version should be changed. The issue is there in trunk. The trunk has the Queue: change, and only this change has induced the filter issue; you have already handled https://issues.apache.org/jira/browse/YARN-3707. But if we have to keep the label Queue:, the current issue should be fixed in trunk. RM Web UI queue filter is not working - Key: YARN-3846 URL: https://issues.apache.org/jira/browse/YARN-3846 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Reporter: Mohammad Shahid Khan Assignee: Mohammad Shahid Khan Attachments: scheduler queue issue.png, scheduler queue positive behavior.png Clicking on the root queue will show all applications, but clicking on a leaf queue does not filter the applications related to the clicked queue. The regular expression seems to be wrong {code} q = '^' + q.substr(q.lastIndexOf(':') + 2) + '$';, {code} For example: 1. Suppose the queue name is b; then the above expression will substr at index 1 (q.lastIndexOf(':') = -1, and -1 + 2 = 1), which is wrong. It should look at index 0. 2. If the queue name is ab.x, then it will parse it to .x, but it should be x. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
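An illustration of case 1 in Java, whose lastIndexOf/substring semantics match the JavaScript above: with no ':' in the queue name, the -1 + 2 arithmetic starts the match one character too far in.
{code}
String q = "b";                        // top-level queue, no ':' separator
int idx = q.lastIndexOf(':');          // -1
String name = q.substring(idx + 2);    // substring(1) -> "" instead of "b"
System.out.println("^" + name + "$");  // filter becomes "^$" and matches nothing
{code}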
[jira] [Updated] (YARN-3869) Add app name to RM audit log
[ https://issues.apache.org/jira/browse/YARN-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Rojansky updated YARN-3869: Description: The YARN resource manager audit log currently includes useful info such as APPID, USER, etc. One crucial piece of information missing is the user-supplied application name. Users are familiar with their application name as shown in the YARN UI, etc. It's vital for something like logstash to be able to associate logs with the application name for later searching in something like kibana. was: The YARN resource manager audit log currently includes useful info such as APPID, USER, etc. One crucial piece of information missing is the user-supplied application name. Users are familiar with their application name as shown in the YARN UI, etc. It's vital for something like logstash to be able to associated logs with the application name for later searching in something like kibana. Add app name to RM audit log Key: YARN-3869 URL: https://issues.apache.org/jira/browse/YARN-3869 Project: Hadoop YARN Issue Type: Improvement Reporter: Shay Rojansky Priority: Minor The YARN resource manager audit log currently includes useful info such as APPID, USER, etc. One crucial piece of information missing is the user-supplied application name. Users are familiar with their application name as shown in the YARN UI, etc. It's vital for something like logstash to be able to associate logs with the application name for later searching in something like kibana. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3827) Migrate YARN native build to new CMake framework
[ https://issues.apache.org/jira/browse/YARN-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610001#comment-14610001 ] Hudson commented on YARN-3827: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #245 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/245/]) YARN-3827. Migrate YARN native build to new CMake framework (Alan Burlison via Colin P. McCabe) (cmccabe: rev d0cc0380b57db5fdeb41775bb9ca42dac65928b8) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/CMakeLists.txt Migrate YARN native build to new CMake framework Key: YARN-3827 URL: https://issues.apache.org/jira/browse/YARN-3827 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Reporter: Alan Burlison Assignee: Alan Burlison Fix For: 2.8.0 Attachments: YARN-3827.001.patch As per HADOOP-12036, the CMake infrastructure should be refactored and made common across all Hadoop components. This bug covers the migration of YARN to the new CMake infrastructure. This change will also add support for building YARN Native components on Solaris. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3823) Fix mismatch in default values for yarn.scheduler.maximum-allocation-vcores property
[ https://issues.apache.org/jira/browse/YARN-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610008#comment-14610008 ] Hudson commented on YARN-3823: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #245 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/245/]) YARN-3823. Fix mismatch in default values for (devaraj: rev 7405c59799ed1b8ad1a7c6f1b18fabf49d0b92b2) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml Fix mismatch in default values for yarn.scheduler.maximum-allocation-vcores property Key: YARN-3823 URL: https://issues.apache.org/jira/browse/YARN-3823 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Fix For: 2.8.0 Attachments: YARN-3823.001.patch, YARN-3823.002.patch In yarn-default.xml, the property is defined as: XML Property: yarn.scheduler.maximum-allocation-vcores XML Value: 32 In YarnConfiguration.java the corresponding member variable is defined as: Config Name: DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES Config Value: 4 The Config value comes from YARN-193 and the default xml property comes from YARN-2. Should we keep it this way or should one of the values get updated? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3768) ArrayIndexOutOfBoundsException with empty environment variables
[ https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610006#comment-14610006 ] Hudson commented on YARN-3768: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #245 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/245/]) YARN-3768. ArrayIndexOutOfBoundsException with empty environment variables. (Zhihai Xu via gera) (gera: rev 6f2a41e37d0b36cdafcfff75125165f212c612a6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestApps.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java * hadoop-yarn-project/CHANGES.txt ArrayIndexOutOfBoundsException with empty environment variables --- Key: YARN-3768 URL: https://issues.apache.org/jira/browse/YARN-3768 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.5.0 Reporter: Joe Ferner Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3768.000.patch, YARN-3768.001.patch, YARN-3768.002.patch, YARN-3768.003.patch, YARN-3768.004.patch Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range exception occurs if an environment variable is encountered without a value. {code} java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.yarn.util.Apps.setEnvFromInputString(Apps.java:80) {code} I believe this occurs because java will not return empty strings from the split method. Similar to this http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id 9999)
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Shahid Khan updated YARN-3840: --- Attachment: YARN-3840-2.patch Resource Manager web ui issue when sorting application by id (with application having id 9999) Key: YARN-3840 URL: https://issues.apache.org/jira/browse/YARN-3840 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: LINTE Assignee: Mohammad Shahid Khan Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch On the WEBUI, the global main view page : http://resourcemanager:8088/cluster/apps doesn't display applications over 9999. With command line it works (# yarn application -list). Regards, Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
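An illustration of why plain string sorting breaks past 9999 (the IDs below are made up; only the digit-length transition matters): once the sequence number gains a fifth digit, lexicographic order no longer matches numeric order.
{code}
String a = "application_1435000000000_9999";
String b = "application_1435000000000_10000";
System.out.println(a.compareTo(b) < 0);  // false: "9..." sorts after "1..." as strings
{code}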
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609883#comment-14609883 ] Varun Vasudev commented on YARN-2194: - +1 for the latest patch. Tested it on my machine and it handles the comma issue. I'll commit it tomorrow if there are no objections. Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Wei Yan Priority: Critical Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id 9999)
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609944#comment-14609944 ] Hadoop QA commented on YARN-3840: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 19m 20s | Pre-patch trunk has 3 extant Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | @author | 0m 0s | The patch appears to contain 1 @author tags which the Hadoop community has agreed to not allow in code contributions. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 48s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 17s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 36s | The applied patch generated 1 new checkstyle issues (total was 8, now 9). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 41s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 3m 10s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 6m 4s | Tests passed in hadoop-yarn-server-nodemanager. 
| | | | 58m 9s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12743037/YARN-3840-2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7405c59 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8404/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/8404/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8404/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8404/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8404/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8404/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8404/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8404/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8404/console | This message was automatically generated. Resource Manager web ui issue when sorting application by id (with application having id 9999) Key: YARN-3840 URL: https://issues.apache.org/jira/browse/YARN-3840 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: LINTE Assignee: Mohammad Shahid Khan Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch On the WEBUI, the global main view page : http://resourcemanager:8088/cluster/apps doesn't display applications over 9999. With command line it works (# yarn application -list). Regards, Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3846) RM Web UI queue filter is not working
[ https://issues.apache.org/jira/browse/YARN-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Shahid Khan updated YARN-3846: --- Affects Version/s: (was: 2.7.0) 2.8.0 3.0.0 RM Web UI queue filter is not working - Key: YARN-3846 URL: https://issues.apache.org/jira/browse/YARN-3846 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 3.0.0, 2.8.0 Reporter: Mohammad Shahid Khan Assignee: Mohammad Shahid Khan Attachments: scheduler queue issue.png, scheduler queue positive behavior.png Clicking on the root queue will show all applications, but clicking on a leaf queue does not filter the applications related to the clicked queue. The regular expression seems to be wrong {code} q = '^' + q.substr(q.lastIndexOf(':') + 2) + '$';, {code} For example: 1. Suppose the queue name is b; then the above expression will substr at index 1 (q.lastIndexOf(':') = -1, and -1 + 2 = 1), which is wrong. It should look at index 0. 2. If the queue name is ab.x, then it will parse it to .x, but it should be x. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
[ https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-3830: Target Version/s: 2.8.0 Component/s: scheduler AbstractYarnScheduler.createReleaseCache may try to clean a null attempt Key: YARN-3830 URL: https://issues.apache.org/jira/browse/YARN-3830 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: nijel Assignee: nijel Attachments: YARN-3830_1.patch, YARN-3830_2.patch, YARN-3830_3.patch, YARN-3830_4.patch org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache() {code} protected void createReleaseCache() { // Cleanup the cache after nm expire interval. new Timer().schedule(new TimerTask() { @Override public void run() { for (SchedulerApplication<T> app : applications.values()) { T attempt = app.getCurrentAppAttempt(); synchronized (attempt) { for (ContainerId containerId : attempt.getPendingRelease()) { RMAuditLogger.logFailure( {code} Here the attempt can be null, since the attempt is created later, so a NullPointerException will occur {code} 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] threw an Exception. | YarnUncaughtExceptionHandler.java:68 java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) {code} This will skip the other applications in this run. We can add a null check and continue with the other applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3875) FSSchedulerNode#reserveResource() not printing applicationID
[ https://issues.apache.org/jira/browse/YARN-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609913#comment-14609913 ] Hadoop QA commented on YARN-3875: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 0s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 36s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 44s | The applied patch generated 2 new checkstyle issues (total was 2, now 4). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 50m 49s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 88m 46s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12743029/0001-YARN-3875.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7405c59 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8402/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8402/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8402/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8402/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8402/console | This message was automatically generated. FSSchedulerNode#reserveResource() not printing applicationID Key: YARN-3875 URL: https://issues.apache.org/jira/browse/YARN-3875 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3875.patch FSSchedulerNode#reserveResource() {code} LOG.info("Updated reserved container " + container.getContainer().getId() + " on node " + this + " for application " + application); } else { LOG.info("Reserved container " + container.getContainer().getId() + " on node " + this + " for application " + application); } {code} update to application.getApplicationId() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3875) FSSchedulerNode#reserveResource() not printing applicationID
[ https://issues.apache.org/jira/browse/YARN-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3875: --- Attachment: 0002-YARN-3875.patch Updated patch after formatting. FSSchedulerNode#reserveResource() not printing applicationID Key: YARN-3875 URL: https://issues.apache.org/jira/browse/YARN-3875 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3875.patch, 0002-YARN-3875.patch FSSchedulerNode#reserveResource() {code} LOG.info("Updated reserved container " + container.getContainer().getId() + " on node " + this + " for application " + application); } else { LOG.info("Reserved container " + container.getContainer().getId() + " on node " + this + " for application " + application); } {code} update to application.getApplicationId() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610034#comment-14610034 ] Varun Saxena commented on YARN-3528: [~brahmareddy], kindly use spaces instead of tabs Tests with 12345 as hard-coded port break jenkins - Key: YARN-3528 URL: https://issues.apache.org/jira/browse/YARN-3528 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Environment: ASF Jenkins Reporter: Steve Loughran Assignee: Brahma Reddy Battula Priority: Blocker Labels: test Attachments: YARN-3528-002.patch, YARN-3528.patch A lot of the YARN tests have hard-coded the port 12345 for their services to come up on. This makes it impossible to have scheduled or precommit tests to run consistently on the ASF jenkins hosts. Instead the tests fail regularly and appear to get ignored completely. A quick grep of 12345 shows up many places in the test suite where this practise has developed. * All {{BaseContainerManagerTest}} subclasses * {{TestNodeManagerShutdown}} * {{TestContainerManager}} + others This needs to be addressed through portscanning and dynamic port allocation. Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
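A common replacement sketch, assuming tests can ask the OS for a free ephemeral port instead of hard-coding one (the helper name is illustrative):
{code}
import java.io.IOException;
import java.net.ServerSocket;

final class Ports {
  // Bind to port 0 so the kernel assigns a free ephemeral port. Note the
  // port may be reused by another process between this call and the
  // service's own bind, so tests should still tolerate bind failures.
  static int pickFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    }
  }
}
{code}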
[jira] [Created] (YARN-3876) get_executable() assumes everything is Linux
Alan Burlison created YARN-3876: --- Summary: get_executable() assumes everything is Linux Key: YARN-3876 URL: https://issues.apache.org/jira/browse/YARN-3876 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.7.0 Reporter: Alan Burlison get_executable() in container-executor.c is non-portable and is hard-coded to assume Linux's /proc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3846) RM Web UI queue filter is not working
[ https://issues.apache.org/jira/browse/YARN-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609939#comment-14609939 ] Mohammad Shahid Khan commented on YARN-3846: This is also in branch-2. RM Web UI queue filter is not working - Key: YARN-3846 URL: https://issues.apache.org/jira/browse/YARN-3846 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Reporter: Mohammad Shahid Khan Assignee: Mohammad Shahid Khan Attachments: scheduler queue issue.png, scheduler queue positive behavior.png Clicking on the root queue will show all applications, but clicking on a leaf queue does not filter the applications related to the clicked queue. The regular expression seems to be wrong {code} q = '^' + q.substr(q.lastIndexOf(':') + 2) + '$';, {code} For example: 1. Suppose the queue name is b; then the above expression will substr at index 1 (q.lastIndexOf(':') = -1, and -1 + 2 = 1), which is wrong. It should look at index 0. 2. If the queue name is ab.x, then it will parse it to .x, but it should be x. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3846) RM Web UI queue filter is not working
[ https://issues.apache.org/jira/browse/YARN-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Shahid Khan updated YARN-3846: --- Target Version/s: 3.0.0, 2.8.0 (was: 2.7.2) RM Web UI queue filter is not working - Key: YARN-3846 URL: https://issues.apache.org/jira/browse/YARN-3846 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 3.0.0, 2.8.0 Reporter: Mohammad Shahid Khan Assignee: Mohammad Shahid Khan Attachments: scheduler queue issue.png, scheduler queue positive behavior.png Clicking on the root queue will show all applications, but clicking on a leaf queue does not filter the applications related to the clicked queue. The regular expression seems to be wrong {code} q = '^' + q.substr(q.lastIndexOf(':') + 2) + '$';, {code} For example: 1. Suppose the queue name is b; then the above expression will substr at index 1 (q.lastIndexOf(':') = -1, and -1 + 2 = 1), which is wrong. It should look at index 0. 2. If the queue name is ab.x, then it will parse it to .x, but it should be x. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3846) RM Web UI queue filter is not working
[ https://issues.apache.org/jira/browse/YARN-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609946#comment-14609946 ] Mohammad Shahid Khan commented on YARN-3846: I have changed the affected and target versions. RM Web UI queue filter is not working - Key: YARN-3846 URL: https://issues.apache.org/jira/browse/YARN-3846 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 3.0.0, 2.8.0 Reporter: Mohammad Shahid Khan Assignee: Mohammad Shahid Khan Attachments: scheduler queue issue.png, scheduler queue positive behavior.png Clicking on the root queue will show all applications, but clicking on a leaf queue does not filter the applications related to the clicked queue. The regular expression seems to be wrong {code} q = '^' + q.substr(q.lastIndexOf(':') + 2) + '$';, {code} For example: 1. Suppose the queue name is b; then the above expression will substr at index 1 (q.lastIndexOf(':') = -1, and -1 + 2 = 1), which is wrong. It should look at index 0. 2. If the queue name is ab.x, then it will parse it to .x, but it should be x. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuous killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3849: -- Attachment: 0002-YARN-3849.patch Thank you [~leftnoteasy] for the comments. I have uploaded a patch addressing the comments. Kindly check. Too much of preemption activity causing continuous killing of containers across queues - Key: YARN-3849 URL: https://issues.apache.org/jira/browse/YARN-3849 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Priority: Critical Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch Two queues are used. Each queue is given a capacity of 0.5. The Dominant Resource policy is used. 1. An app is submitted in QueueA, which consumes the full cluster capacity 2. After submitting an app in QueueB, there is some demand, invoking preemption in QueueA 3. Instead of killing only the excess over the 0.5 guaranteed capacity, we observed that all containers other than the AM are getting killed in QueueA 4. Now the app in QueueB tries to take over the cluster with the current free space. But there is some updated demand from the app in QueueA, which lost its containers earlier, and preemption is kicked in QueueB now. The scenario in steps 3 and 4 keeps happening in a loop; thus none of the apps complete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id 9999)
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609947#comment-14609947 ] Devaraj K commented on YARN-3840: - You can probably package the plugin js file that is compatible with the dt-1.9.4 release (which YARN currently uses), similar to the one we already have in hadoop-yarn-project\hadoop-yarn\hadoop-yarn-common\src\main\resources\webapps\static\dt-1.9.4\js\jquery.dataTables.min.js.gz. Resource Manager web ui issue when sorting application by id (with application having id 9999) Key: YARN-3840 URL: https://issues.apache.org/jira/browse/YARN-3840 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: LINTE Assignee: Mohammad Shahid Khan Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch On the WEBUI, the global main view page http://resourcemanager:8088/cluster/apps doesn't display applications over 9999. With the command line it works (# yarn application -list). Regards, Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3793) Several NPEs when deleting local files on NM recovery
[ https://issues.apache.org/jira/browse/YARN-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610401#comment-14610401 ] Jason Lowe commented on YARN-3793: -- +1 lgtm. Will commit later today if no objections. Several NPEs when deleting local files on NM recovery - Key: YARN-3793 URL: https://issues.apache.org/jira/browse/YARN-3793 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Varun Saxena Attachments: YARN-3793.01.patch, YARN-3793.02.patch When NM work-preserving restart is enabled, we see several NPEs on recovery. These seem to correspond to sub-directories that need to be deleted. I wonder if null pointers here mean incorrect tracking of these resources and a potential leak. This JIRA is to investigate and fix anything required. Logs show: {noformat} 2015-05-18 07:06:10,225 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : null 2015-05-18 07:06:10,224 ERROR org.apache.hadoop.yarn.server.nodemanager.DeletionService: Exception during execution of task in DeletionService java.lang.NullPointerException at org.apache.hadoop.fs.FileContext.fixRelativePart(FileContext.java:274) at org.apache.hadoop.fs.FileContext.delete(FileContext.java:755) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.deleteAsUser(DefaultContainerExecutor.java:458) at org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:293) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
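As a rough illustration of the kind of guard implied by the stack trace above (a sketch under assumptions, not the committed YARN-3793 fix; the class and method names are stand-ins for the NM deletion-task code):
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: a deletion task recovered on NM restart may carry a null path,
// so bail out early instead of handing null to the filesystem layer.
class DeletionTaskSketch {
  static void deletePathAsUser(Path subDir) throws IOException {
    if (subDir == null) {
      return; // nothing recorded for this task; skip rather than NPE
    }
    Files.deleteIfExists(subDir);
  }
}
{code}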
[jira] [Updated] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS
[ https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cntic updated YARN-2681: Attachment: (was: yarn-site.xml.example) Support bandwidth enforcement for containers while reading from HDFS Key: YARN-2681 URL: https://issues.apache.org/jira/browse/YARN-2681 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.5.1 Environment: Linux Reporter: cntic Labels: BB2015-05-TBR Fix For: 2.7.2 Attachments: HADOOP-2681.patch, HADOOP-2681.patch, Traffic Control Design.png To read/write data from HDFS on a data node, applications establish TCP/IP connections with the datanode. HDFS reads can be controlled by setting up the Linux Traffic Control (TC) subsystem on the data node to apply filters to the appropriate connections. The current cgroups net_cls concept can be applied neither on the node where the container is launched nor on the data node, since: - TC handles outgoing bandwidth only, so it cannot be set on the container node (an HDFS read is incoming data for the container) - Since the HDFS data node is handled by only one process, it is not possible to use net_cls to separate connections from different containers to the datanode. Tasks: 1) Extend the Resource model to define a bandwidth enforcement rate 2) Monitor TCP/IP connections established by the container handling process and its child processes 3) Set Linux Traffic Control rules on the data node based on address:port pairs in order to enforce the bandwidth of outgoing data Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
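To make task 3 concrete: the enforcement amounts to shaping the data node's outgoing traffic per connection with standard tc/HTB commands. Below is a hedged sketch that composes such commands as strings (the device name, rate, class id, and port are made-up example values for illustration, not values from the patch):
{code}
// Illustrative only: build standard tc/htb commands that cap the outgoing
// traffic of one container's HDFS read connection, identified here by the
// client-side destination port of that connection.
public class TcRuleSketch {
  public static void main(String[] args) {
    String dev = "eth0";     // assumed network device on the data node
    String classId = "1:10"; // assumed HTB class for this container
    int dstPort = 52431;     // example client port of the read connection
    System.out.println("tc qdisc add dev " + dev + " root handle 1: htb");
    System.out.println("tc class add dev " + dev
        + " parent 1: classid " + classId + " htb rate 10mbit");
    System.out.println("tc filter add dev " + dev
        + " protocol ip parent 1: prio 1 u32 match ip dport "
        + dstPort + " 0xffff flowid " + classId);
  }
}
{code}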
[jira] [Commented] (YARN-3768) ArrayIndexOutOfBoundsException with empty environment variables
[ https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610538#comment-14610538 ] Hudson commented on YARN-3768: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2191 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2191/]) YARN-3768. ArrayIndexOutOfBoundsException with empty environment variables. (Zhihai Xu via gera) (gera: rev 6f2a41e37d0b36cdafcfff75125165f212c612a6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestApps.java * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java ArrayIndexOutOfBoundsException with empty environment variables --- Key: YARN-3768 URL: https://issues.apache.org/jira/browse/YARN-3768 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.5.0 Reporter: Joe Ferner Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3768.000.patch, YARN-3768.001.patch, YARN-3768.002.patch, YARN-3768.003.patch, YARN-3768.004.patch Looking at line 80 of org.apache.hadoop.yarn.util.Apps, an index-out-of-range exception occurs if an environment variable is encountered without a value. {code} java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.yarn.util.Apps.setEnvFromInputString(Apps.java:80) {code} I believe this occurs because Java's split method does not return trailing empty strings. Similar to this: http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
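The split behavior is easy to reproduce in isolation (a minimal demonstration of the semantics described above, not code from the patch):
{code}
public class SplitDemo {
  public static void main(String[] args) {
    // String.split drops trailing empty strings by default, so an
    // environment entry with an empty value yields a 1-element array.
    String[] parts = "MY_VAR=".split("=");
    System.out.println(parts.length);   // 1 -> parts[1] throws AIOOBE
    // A negative limit keeps trailing empty strings.
    String[] fixed = "MY_VAR=".split("=", -1);
    System.out.println(fixed.length);   // 2; fixed[1] is ""
  }
}
{code}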
[jira] [Commented] (YARN-3827) Migrate YARN native build to new CMake framework
[ https://issues.apache.org/jira/browse/YARN-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610534#comment-14610534 ] Hudson commented on YARN-3827: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2191 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2191/]) YARN-3827. Migrate YARN native build to new CMake framework (Alan Burlison via Colin P. McCabe) (cmccabe: rev d0cc0380b57db5fdeb41775bb9ca42dac65928b8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/CMakeLists.txt * hadoop-yarn-project/CHANGES.txt Migrate YARN native build to new CMake framework Key: YARN-3827 URL: https://issues.apache.org/jira/browse/YARN-3827 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Reporter: Alan Burlison Assignee: Alan Burlison Fix For: 2.8.0 Attachments: YARN-3827.001.patch As per HADOOP-12036, the CMake infrastructure should be refactored and made common across all Hadoop components. This bug covers the migration of YARN to the new CMake infrastructure. This change will also add support for building YARN Native components on Solaris. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3823) Fix mismatch in default values for yarn.scheduler.maximum-allocation-vcores property
[ https://issues.apache.org/jira/browse/YARN-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610540#comment-14610540 ] Hudson commented on YARN-3823: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2191 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2191/]) YARN-3823. Fix mismatch in default values for (devaraj: rev 7405c59799ed1b8ad1a7c6f1b18fabf49d0b92b2) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml Fix mismatch in default values for yarn.scheduler.maximum-allocation-vcores property Key: YARN-3823 URL: https://issues.apache.org/jira/browse/YARN-3823 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Fix For: 2.8.0 Attachments: YARN-3823.001.patch, YARN-3823.002.patch In yarn-default.xml, the property is defined as: XML Property: yarn.scheduler.maximum-allocation-vcores XML Value: 32 In YarnConfiguration.java the corresponding member variable is defined as: Config Name: DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES Config Value: 4 The Config value comes from YARN-193 and the default xml property comes from YARN-2. Should we keep it this way or should one of the values get updated? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
[ https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610394#comment-14610394 ] Hudson commented on YARN-3830: -- SUCCESS: Integrated in Hadoop-trunk-Commit #8105 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8105/]) YARN-3830. AbstractYarnScheduler.createReleaseCache may try to clean a (devaraj: rev 80a68d60560e505b5f8e01969dc3c168a1e5a7f3) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java AbstractYarnScheduler.createReleaseCache may try to clean a null attempt Key: YARN-3830 URL: https://issues.apache.org/jira/browse/YARN-3830 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: nijel Assignee: nijel Fix For: 2.8.0 Attachments: YARN-3830_1.patch, YARN-3830_2.patch, YARN-3830_3.patch, YARN-3830_4.patch org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache() {code} protected void createReleaseCache() { // Cleanup the cache after nm expire interval. new Timer().schedule(new TimerTask() { @Override public void run() { for (SchedulerApplication<T> app : applications.values()) { T attempt = app.getCurrentAppAttempt(); synchronized (attempt) { for (ContainerId containerId : attempt.getPendingRelease()) { RMAuditLogger.logFailure( {code} Here the attempt can be null, since the attempt is created later, so a NullPointerException will occur: {code} 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] threw an Exception. | YarnUncaughtExceptionHandler.java:68 java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) {code} This will skip the other applications in this run. We can add a null check and continue with the other applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
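As a minimal sketch of that null check (SchedulerApplication here is a stand-in interface for illustration; this is not necessarily the committed patch):
{code}
import java.util.Collection;

class ReleaseCacheSketch {
  interface SchedulerApplication { Object getCurrentAppAttempt(); }

  static void cleanupPendingReleases(Collection<SchedulerApplication> apps) {
    for (SchedulerApplication app : apps) {
      Object attempt = app.getCurrentAppAttempt();
      if (attempt == null) {
        continue; // attempt is created later; skip it and keep iterating
      }
      synchronized (attempt) {
        // audit-log and clear the attempt's pendingRelease containers here
      }
    }
  }
}
{code}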
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610475#comment-14610475 ] Wei Yan commented on YARN-2194: --- Thanks, [~vvasudev]. Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Wei Yan Priority: Critical Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610546#comment-14610546 ] Hadoop QA commented on YARN-3844: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 13s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 36s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | yarn tests | 6m 20s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 21m 34s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12743073/YARN-3844.005.patch | | Optional Tests | javac unit | | git revision | trunk / 80a68d6 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8407/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8407/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8407/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8407/console | This message was automatically generated. Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.005.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS
[ https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cntic updated YARN-2681: Fix Version/s: 2.7.2 Component/s: (was: capacityscheduler) (was: resourcemanager) Support bandwidth enforcement for containers while reading from HDFS Key: YARN-2681 URL: https://issues.apache.org/jira/browse/YARN-2681 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.5.1 Environment: Linux Reporter: cntic Labels: BB2015-05-TBR Fix For: 2.7.2 Attachments: HADOOP-2681.patch, HADOOP-2681.patch, Traffic Control Design.png, yarn-site.xml.example To read/write data from HDFS on a data node, applications establish TCP/IP connections with the datanode. HDFS reads can be controlled by setting up the Linux Traffic Control (TC) subsystem on the data node to apply filters to the appropriate connections. The current cgroups net_cls concept can be applied neither on the node where the container is launched nor on the data node, since: - TC handles outgoing bandwidth only, so it cannot be set on the container node (an HDFS read is incoming data for the container) - Since the HDFS data node is handled by only one process, it is not possible to use net_cls to separate connections from different containers to the datanode. Tasks: 1) Extend the Resource model to define a bandwidth enforcement rate 2) Monitor TCP/IP connections established by the container handling process and its child processes 3) Set Linux Traffic Control rules on the data node based on address:port pairs in order to enforce the bandwidth of outgoing data Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id 9999)
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Shahid Khan updated YARN-3840: --- Attachment: YARN-3840-3.patch Resource Manager web ui issue when sorting application by id (with application having id 9999) Key: YARN-3840 URL: https://issues.apache.org/jira/browse/YARN-3840 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: LINTE Assignee: Mohammad Shahid Khan Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, YARN-3840-3.patch On the WEBUI, the global main view page http://resourcemanager:8088/cluster/apps doesn't display applications over 9999. With the command line it works (# yarn application -list). Regards, Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers
[ https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610825#comment-14610825 ] Sidharta Seethana commented on YARN-2140: - Hi [~dheeren], we only address network bandwidth resource isolation in the attached design doc, not isolation of the network stack itself. I recommend taking a look at YARN-3611 for the new docker-related functionality, and please file a JIRA with your requirements. Add support for network IO isolation/scheduling for containers -- Key: YARN-2140 URL: https://issues.apache.org/jira/browse/YARN-2140 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Assignee: Sidharta Seethana Attachments: NetworkAsAResourceDesign.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-313: - Attachment: YARN-313-v4.patch Updating v3 to the latest trunk. Add Admin API for supporting node resource configuration in command line Key: YARN-313 URL: https://issues.apache.org/jira/browse/YARN-313 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Junping Du Assignee: Junping Du Priority: Critical Labels: BB2015-05-TBR Attachments: YARN-313-sample.patch, YARN-313-v1.patch, YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch We should provide some admin interface, e.g. yarn rmadmin -refreshResources, to support changes to a node's resources specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611034#comment-14611034 ] Jian He commented on YARN-2004: --- - authenticateApplicationPriority: IIUC, all it does is take the config from yarn-site.xml (not capacity-scheduler.xml) and check the priority against that. I don't see much need to explicitly expose an API in the scheduler and inject the check there. Or does this method have more responsibility than that? - Given that YARN-2003 is just the API of YARN-2004 and we have to review the two together anyway, we could merge them into a single patch? That would be easier to review, and you also would not need to split the patch and upload it in two different places. And you can actually split the part about updating application priority at runtime, plus the state store changes, into a different patch. Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch, 0003-YARN-2004.patch, 0004-YARN-2004.patch, 0005-YARN-2004.patch, 0006-YARN-2004.patch, 0007-YARN-2004.patch, 0008-YARN-2004.patch, 0009-YARN-2004.patch, 0010-YARN-2004.patch Based on the priority of the application, the Capacity Scheduler should be able to give preference to applications while scheduling. The Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for application priority. If a priority is available, then return the highest-priority job. 2. Otherwise continue with the existing logic, such as App ID comparison and then timestamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
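For illustration, the two-step ordering described above can be sketched as a comparator (the field names and the null convention are assumptions for this sketch, not the actual FiCaSchedulerApp API):
{code}
import java.util.Comparator;

class AppOrderSketch {
  static class App {
    Integer priority;  // null when the app has no priority set (assumed)
    long appId;
  }

  // 1. Higher priority first (apps without a priority sort lowest).
  // 2. Tie-break with the existing logic: smaller (older) app id first.
  static final Comparator<App> BY_PRIORITY_THEN_ID =
      Comparator.comparingInt(
              (App a) -> a.priority == null ? Integer.MIN_VALUE : a.priority)
          .reversed()
          .thenComparingLong(a -> a.appId);
}
{code}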
[jira] [Commented] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS
[ https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611059#comment-14611059 ] Hadoop QA commented on YARN-2681: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 24m 25s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 12 new or modified test files. | | {color:green}+1{color} | javac | 10m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 13m 53s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 32s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 4m 2s | The applied patch generated 1 new checkstyle issues (total was 221, now 221). | | {color:green}+1{color} | whitespace | 0m 33s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 2m 11s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 13m 54s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 11m 32s | Tests passed in hadoop-mapreduce-client-app. | | {color:green}+1{color} | mapreduce tests | 2m 39s | Tests passed in hadoop-mapreduce-client-core. | | {color:green}+1{color} | yarn tests | 0m 34s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 32s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 6m 5s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 60m 12s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 155m 25s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.TestContainerManager | | | hadoop.yarn.server.nodemanager.TestEventFlow | | | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | | | hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor | | | hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch | | | hadoop.yarn.server.nodemanager.containermanager.TestNMProxy | | | hadoop.yarn.server.resourcemanager.TestAppManager | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12743129/YARN-2681.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / eac1d18 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8411/artifact/patchprocess/trunkFindbugsWarningshadoop-mapreduce-client-app.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8411/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-YARN-Build/8411/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt | | hadoop-mapreduce-client-core test log | https://builds.apache.org/job/PreCommit-YARN-Build/8411/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8411/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8411/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8411/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8411/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8411/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8411/console | This message was automatically generated. Support bandwidth enforcement for containers while reading from HDFS
[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611046#comment-14611046 ] Hadoop QA commented on YARN-313: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 40s | Findbugs (version 3.0.0) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:red}-1{color} | javac | 2m 20s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12743143/YARN-313-v4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b5cdf78 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8413/console | This message was automatically generated. Add Admin API for supporting node resource configuration in command line Key: YARN-313 URL: https://issues.apache.org/jira/browse/YARN-313 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Junping Du Assignee: Junping Du Priority: Critical Labels: BB2015-05-TBR Attachments: YARN-313-sample.patch, YARN-313-v1.patch, YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch We should provide some admin interface, e.g. yarn rmadmin -refreshResources, to support changes to a node's resources specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3830) AbstractYarnScheduler.createReleaseCache may try to clean a null attempt
[ https://issues.apache.org/jira/browse/YARN-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610576#comment-14610576 ] Hudson commented on YARN-3830: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #243 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/243/]) YARN-3830. AbstractYarnScheduler.createReleaseCache may try to clean a (devaraj: rev 80a68d60560e505b5f8e01969dc3c168a1e5a7f3) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java AbstractYarnScheduler.createReleaseCache may try to clean a null attempt Key: YARN-3830 URL: https://issues.apache.org/jira/browse/YARN-3830 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: nijel Assignee: nijel Fix For: 2.8.0 Attachments: YARN-3830_1.patch, YARN-3830_2.patch, YARN-3830_3.patch, YARN-3830_4.patch org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.createReleaseCache() {code} protected void createReleaseCache() { // Cleanup the cache after nm expire interval. new Timer().schedule(new TimerTask() { @Override public void run() { for (SchedulerApplication<T> app : applications.values()) { T attempt = app.getCurrentAppAttempt(); synchronized (attempt) { for (ContainerId containerId : attempt.getPendingRelease()) { RMAuditLogger.logFailure( {code} Here the attempt can be null, since the attempt is created later, so a NullPointerException will occur: {code} 2015-06-19 09:29:16,195 | ERROR | Timer-3 | Thread Thread[Timer-3,5,main] threw an Exception. | YarnUncaughtExceptionHandler.java:68 java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler$1.run(AbstractYarnScheduler.java:457) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) {code} This will skip the other applications in this run. We can add a null check and continue with the other applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3841) [Storage implementation] Create HDFS backing storage implementation for ATS writes
[ https://issues.apache.org/jira/browse/YARN-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610614#comment-14610614 ] Hadoop QA commented on YARN-3841: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12743115/YARN-3841.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 2ac87df | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8408/console | This message was automatically generated. [Storage implementation] Create HDFS backing storage implementation for ATS writes -- Key: YARN-3841 URL: https://issues.apache.org/jira/browse/YARN-3841 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Attachments: YARN-3841.001.patch HDFS backing storage is useful for the following scenarios: 1. For Hadoop clusters which don't run HBase. 2. As a fallback from HBase when the HBase cluster is temporarily unavailable. Quoting the ATS design document of YARN-2928: {quote} In the case the HBase storage is not available, the plugin should buffer the writes temporarily (e.g. HDFS), and flush them once the storage comes back online. Reading and writing to hdfs as the backup storage could potentially use the HDFS writer plugin unless the complexity of generalizing the HDFS writer plugin for this purpose exceeds the benefits of reusing it here. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS
[ https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610644#comment-14610644 ] cntic commented on YARN-2681: - As the site doesn't allow attaching PDF files, the development guide for this feature can be found at the following link: http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl.pdf Support bandwidth enforcement for containers while reading from HDFS Key: YARN-2681 URL: https://issues.apache.org/jira/browse/YARN-2681 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.5.1 Environment: Linux Reporter: cntic Labels: BB2015-05-TBR Fix For: 2.7.0 Attachments: HADOOP-2681.patch, HADOOP-2681.patch, HDFS-2681.02.patch, HdfsTrafficControl_UML.png, Traffic Control Design.png To read/write data from HDFS on a data node, applications establish TCP/IP connections with the datanode. HDFS reads can be controlled by setting up the Linux Traffic Control (TC) subsystem on the data node to apply filters to the appropriate connections. The current cgroups net_cls concept can be applied neither on the node where the container is launched nor on the data node, since: - TC handles outgoing bandwidth only, so it cannot be set on the container node (an HDFS read is incoming data for the container) - Since the HDFS data node is handled by only one process, it is not possible to use net_cls to separate connections from different containers to the datanode. Tasks: 1) Extend the Resource model to define a bandwidth enforcement rate 2) Monitor TCP/IP connections established by the container handling process and its child processes 3) Set Linux Traffic Control rules on the data node based on address:port pairs in order to enforce the bandwidth of outgoing data Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf Implementation: http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3823) Fix mismatch in default values for yarn.scheduler.maximum-allocation-vcores property
[ https://issues.apache.org/jira/browse/YARN-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610674#comment-14610674 ] Ray Chiang commented on YARN-3823: -- Thanks for the review and commit! Fix mismatch in default values for yarn.scheduler.maximum-allocation-vcores property Key: YARN-3823 URL: https://issues.apache.org/jira/browse/YARN-3823 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Fix For: 2.8.0 Attachments: YARN-3823.001.patch, YARN-3823.002.patch In yarn-default.xml, the property is defined as: XML Property: yarn.scheduler.maximum-allocation-vcores XML Value: 32 In YarnConfiguration.java the corresponding member variable is defined as: Config Name: DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES Config Value: 4 The Config value comes from YARN-193 and the default xml property comes from YARN-2. Should we keep it this way or should one of the values get updated? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3877) YarnClientImpl.submitApplication swallows exceptions
Steve Loughran created YARN-3877: Summary: YarnClientImpl.submitApplication swallows exceptions Key: YARN-3877 URL: https://issues.apache.org/jira/browse/YARN-3877 Project: Hadoop YARN Issue Type: Improvement Components: client Affects Versions: 2.7.2 Reporter: Steve Loughran Priority: Minor When {{YarnClientImpl.submitApplication}} spins waiting for the application to be accepted, any interruption during its sleep() calls is logged and swallowed. This makes it hard to interrupt the thread during shutdown. Really it should throw some form of exception and let the caller deal with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
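A hedged sketch of the behavior being requested (illustrative only, not YarnClientImpl's actual code; the helper name and message are made up):
{code}
import java.io.IOException;
import java.io.InterruptedIOException;

class SubmitPollSketch {
  // Propagate the interrupt to the caller instead of logging and swallowing
  // it, so shutdown can break out of the accept-polling loop.
  static void pollOnce(long intervalMs) throws IOException {
    try {
      Thread.sleep(intervalMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // restore the interrupt flag
      InterruptedIOException ioe = new InterruptedIOException(
          "Interrupted while waiting for the application to be accepted");
      ioe.initCause(e);
      throw ioe;
    }
  }
}
{code}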
[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers
[ https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610649#comment-14610649 ] Dheeren Beborrtha commented on YARN-2140: - How do you support port-level isolation for Docker containers? For example, let's say I would like to run multiple docker containers on the same Datanode. If each of the containers is long-running and needs to advertise its ports, what is the mechanism for doing so? Add support for network IO isolation/scheduling for containers -- Key: YARN-2140 URL: https://issues.apache.org/jira/browse/YARN-2140 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Assignee: Sidharta Seethana Attachments: NetworkAsAResourceDesign.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610725#comment-14610725 ] Karthik Kambatla commented on YARN-2194: I tried the latest patch, and still run into the same issue (logs below). Did anyone try the patch with multiple local directories? {noformat} 15/07/01 10:51:32 INFO mapreduce.Job: Job job_1435771879097_0003 failed with state FAILED due to: Application application_1435771879097_0003 failed 2 times due to AM Container for appattempt_1435771879097_0003_02 exited with exitCode: -1000 For more detailed output, check application tracking page:http://krhel7-1.vpc.cloudera.com:8088/proxy/application_1435771879097_0003/Then, click on links to logs of each attempt. Diagnostics: Application application_1435771879097_0003 initialization failed (exitCode=20) with output: main : command provided 0 main : user is nobody main : requested yarn user is systest Failed to create directory /data/yarn/nm%/data1/yarn/nm/usercache/systest - No such file or directory Failing this attempt. Failing the application. {noformat} Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Wei Yan Priority: Critical Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS
[ https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cntic updated YARN-2681: Attachment: HdfsTrafficControl_UML.png HDFS-2681.02.patch This patch delivers the full feature set of YARN-2681. Support bandwidth enforcement for containers while reading from HDFS Key: YARN-2681 URL: https://issues.apache.org/jira/browse/YARN-2681 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.5.1 Environment: Linux Reporter: cntic Labels: BB2015-05-TBR Fix For: 2.7.0 Attachments: HADOOP-2681.patch, HADOOP-2681.patch, HDFS-2681.02.patch, HdfsTrafficControl_UML.png, Traffic Control Design.png To read/write data from HDFS on a data node, applications establish TCP/IP connections with the datanode. HDFS reads can be controlled by setting up the Linux Traffic Control (TC) subsystem on the data node to apply filters to the appropriate connections. The current cgroups net_cls concept can be applied neither on the node where the container is launched nor on the data node, since: - TC handles outgoing bandwidth only, so it cannot be set on the container node (an HDFS read is incoming data for the container) - Since the HDFS data node is handled by only one process, it is not possible to use net_cls to separate connections from different containers to the datanode. Tasks: 1) Extend the Resource model to define a bandwidth enforcement rate 2) Monitor TCP/IP connections established by the container handling process and its child processes 3) Set Linux Traffic Control rules on the data node based on address:port pairs in order to enforce the bandwidth of outgoing data Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS
[ https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cntic updated YARN-2681: Description: To read/write data from HDFS on a data node, applications establish TCP/IP connections with the datanode. HDFS reads can be controlled by setting up the Linux Traffic Control (TC) subsystem on the data node to apply filters to the appropriate connections. The current cgroups net_cls concept can be applied neither on the node where the container is launched nor on the data node, since: - TC handles outgoing bandwidth only, so it cannot be set on the container node (an HDFS read is incoming data for the container) - Since the HDFS data node is handled by only one process, it is not possible to use net_cls to separate connections from different containers to the datanode. Tasks: 1) Extend the Resource model to define a bandwidth enforcement rate 2) Monitor TCP/IP connections established by the container handling process and its child processes 3) Set Linux Traffic Control rules on the data node based on address:port pairs in order to enforce the bandwidth of outgoing data Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf Implementation: http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl.pdf was: To read/write data from HDFS on a data node, applications establish TCP/IP connections with the datanode. HDFS reads can be controlled by setting up the Linux Traffic Control (TC) subsystem on the data node to apply filters to the appropriate connections. The current cgroups net_cls concept can be applied neither on the node where the container is launched nor on the data node, since: - TC handles outgoing bandwidth only, so it cannot be set on the container node (an HDFS read is incoming data for the container) - Since the HDFS data node is handled by only one process, it is not possible to use net_cls to separate connections from different containers to the datanode. Tasks: 1) Extend the Resource model to define a bandwidth enforcement rate 2) Monitor TCP/IP connections established by the container handling process and its child processes 3) Set Linux Traffic Control rules on the data node based on address:port pairs in order to enforce the bandwidth of outgoing data Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf Support bandwidth enforcement for containers while reading from HDFS Key: YARN-2681 URL: https://issues.apache.org/jira/browse/YARN-2681 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.5.1 Environment: Linux Reporter: cntic Labels: BB2015-05-TBR Fix For: 2.7.0 Attachments: HADOOP-2681.patch, HADOOP-2681.patch, HDFS-2681.02.patch, HdfsTrafficControl_UML.png, Traffic Control Design.png To read/write data from HDFS on a data node, applications establish TCP/IP connections with the datanode. HDFS reads can be controlled by setting up the Linux Traffic Control (TC) subsystem on the data node to apply filters to the appropriate connections. The current cgroups net_cls concept can be applied neither on the node where the container is launched nor on the data node, since: - TC handles outgoing bandwidth only, so it cannot be set on the container node (an HDFS read is incoming data for the container) - Since the HDFS data node is handled by only one process, it is not possible to use net_cls to separate connections from different containers to the datanode. Tasks: 1) Extend the Resource model to define a bandwidth enforcement rate 2) Monitor TCP/IP connections established by the container handling process and its child processes 3) Set Linux Traffic Control rules on the data node based on address:port pairs in order to enforce the bandwidth of outgoing data Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf Implementation: http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)