[jira] [Updated] (YARN-2963) Helper library that allows requesting containers from multiple queues
[ https://issues.apache.org/jira/browse/YARN-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-2963:
-----------------------------------
Attachment: yarn-2963-preview.patch

Here is a preview of what I have in mind. Appreciate any early feedback. I'll post another patch with tests and any API simplification.

Helper library that allows requesting containers from multiple queues
----------------------------------------------------------------------
Key: YARN-2963
URL: https://issues.apache.org/jira/browse/YARN-2963
Project: Hadoop YARN
Issue Type: New Feature
Components: client
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Attachments: yarn-2963-preview.patch

As proposed on the mailing list (yarn-dev), it would be nice to have a way for YARN applications to request containers from multiple queues. For example, Oozie might want to run a single AM for all user jobs and request one container per launcher.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
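To make the proposal concrete, here is a hypothetical sketch of the kind of API such a helper library might expose. The interface name and its methods are invented for illustration and are not taken from yarn-2963-preview.patch:

{code}
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

/**
 * Hypothetical helper that multiplexes container requests across queues,
 * e.g. so a single Oozie AM can request one launcher container per user
 * job from that job's queue. All names here are illustrative only.
 */
public interface MultiQueueAMRMClient {

  /** Register a queue the AM wants to request containers from. */
  void addQueue(String queue);

  /** Ask for a container charged against the given queue. */
  void addContainerRequest(String queue, ContainerRequest request);

  /** Withdraw a pending request from the given queue. */
  void removeContainerRequest(String queue, ContainerRequest request);
}
{code}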
[jira] [Commented] (YARN-3071) Remove invalid char from sample conf in doc of FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283741#comment-14283741 ]

Hudson commented on YARN-3071:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #813 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/813/])
YARN-3071. Remove invalid char from sample conf in doc of FairScheduler. (Contributed by Masatake Iwasaki) (aajisaka: rev 4a5c3a4cfee6b8008a722801821e64850582a985)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm

Remove invalid char from sample conf in doc of FairScheduler
------------------------------------------------------------
Key: YARN-3071
URL: https://issues.apache.org/jira/browse/YARN-3071
Project: Hadoop YARN
Issue Type: Bug
Components: documentation
Affects Versions: 2.5.0, 2.6.0
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
Priority: Trivial
Fix For: 2.7.0
Attachments: YARN-3071.001.patch, YARN-3071.002.patch

Copying and pasting the sample conf causes a failure on RM startup:

{code}
Caused by: org.xml.sax.SAXParseException; systemId: file:/home/iwasakims/dist/hadoop-2.6.0/etc/hadoop/fair-scheduler.xml; lineNumber: 18; columnNumber: 5; The content of elements must consist of well-formed character data or markup.
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:205)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:250)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1275)
        ... 9 more
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
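Since the failure only shows up at RM startup, one way to catch a bad copy-paste beforehand is to run the same kind of DOM parse that AllocationFileLoaderService performs. A minimal sketch using only the JDK (the file path is an example):

{code}
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;

public class ValidateAllocationsFile {
  public static void main(String[] args) throws Exception {
    // Throws org.xml.sax.SAXParseException, as in the report above,
    // if the pasted sample conf contains an invalid character.
    DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new File("etc/hadoop/fair-scheduler.xml"));
    System.out.println("fair-scheduler.xml is well-formed");
  }
}
{code}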
[jira] [Commented] (YARN-3015) yarn classpath command should support same options as hadoop classpath.
[ https://issues.apache.org/jira/browse/YARN-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283742#comment-14283742 ]

Hudson commented on YARN-3015:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #813 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/813/])
YARN-3015. yarn classpath command should support same options as hadoop classpath. Contributed by Varun Saxena. (cnauroth: rev cb0a15d20180c7ca3799e63a2d53aa8dee800abd)
* hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd
* hadoop-yarn-project/hadoop-yarn/bin/yarn
* hadoop-yarn-project/CHANGES.txt

yarn classpath command should support same options as hadoop classpath.
------------------------------------------------------------------------
Key: YARN-3015
URL: https://issues.apache.org/jira/browse/YARN-3015
Project: Hadoop YARN
Issue Type: Bug
Components: scripts
Reporter: Chris Nauroth
Assignee: Varun Saxena
Priority: Minor
Fix For: 2.7.0
Attachments: YARN-3015-branch-2.patch, YARN-3015.002.patch, YARN-3015.003.patch, YARN-3015.004.patch, YARN-3015.005.patch

HADOOP-10903 enhanced the {{hadoop classpath}} command to support optional expansion of the wildcards and bundling the classpath into a jar file containing a manifest with the Class-Path attribute. The other classpath commands should do the same for consistency.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
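For context, the jar mode added by HADOOP-10903 writes the entire classpath into a jar manifest's Class-Path attribute instead of printing it. A small sketch of inspecting such a jar with standard JDK classes (the jar name is an example, assumed to come from something like {{yarn classpath --jar cp.jar}}):

{code}
import java.util.jar.Attributes;
import java.util.jar.JarFile;

public class PrintManifestClasspath {
  public static void main(String[] args) throws Exception {
    // Reads the bundled classpath back out of the generated manifest.
    try (JarFile jar = new JarFile("cp.jar")) {
      System.out.println(jar.getManifest().getMainAttributes()
          .getValue(Attributes.Name.CLASS_PATH));
    }
  }
}
{code}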
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283740#comment-14283740 ]

Hudson commented on YARN-2933:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #813 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/813/])
YARN-2933. Capacity Scheduler preemption policy should only consider capacity without labels temporarily. Contributed by Mayank Bansal (wangda: rev 0a2d3e717d9c42090a32ff177991a222a1e34132)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java

Capacity Scheduler preemption policy should only consider capacity without labels temporarily
----------------------------------------------------------------------------------------------
Key: YARN-2933
URL: https://issues.apache.org/jira/browse/YARN-2933
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
Fix For: 2.7.0
Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch, YARN-2933-9.patch

Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 targets preemption that respects node labels, but we have some gaps in the code base; for example, queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to spend some time thinking through carefully.

For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regressions like the following: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels. Today, the preemption policy may preempt resources from labeled nodes for queueA, which is not correct.

Again, this is just a short-term enhancement; YARN-2498 will add preemption respecting node labels for the Capacity Scheduler, which is our final target.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
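A rough illustration of the temporary rule described above: only resources on unlabeled nodes count toward ideal_allocation and preemption. The class and method names here are assumptions made for the sketch, not the patch's actual code:

{code}
import java.util.Set;

// Illustrative only: decide whether a node's resources may be considered
// by the preemption policy under the temporary no-labels rule.
final class PreemptionLabelFilter {
  static boolean considerForPreemption(Set<String> nodeLabels) {
    // Containers on labeled nodes are skipped entirely, so a queue that is
    // unsatisfied for unlabeled resources never preempts labeled capacity.
    return nodeLabels == null || nodeLabels.isEmpty();
  }
}
{code}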
[jira] [Commented] (YARN-3071) Remove invalid char from sample conf in doc of FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283687#comment-14283687 ]

Hudson commented on YARN-3071:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #79 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/79/])
YARN-3071. Remove invalid char from sample conf in doc of FairScheduler. (Contributed by Masatake Iwasaki) (aajisaka: rev 4a5c3a4cfee6b8008a722801821e64850582a985)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm

Remove invalid char from sample conf in doc of FairScheduler
------------------------------------------------------------
Key: YARN-3071
URL: https://issues.apache.org/jira/browse/YARN-3071
Project: Hadoop YARN
Issue Type: Bug
Components: documentation
Affects Versions: 2.5.0, 2.6.0
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
Priority: Trivial
Fix For: 2.7.0
Attachments: YARN-3071.001.patch, YARN-3071.002.patch

Copying and pasting the sample conf causes a failure on RM startup:

{code}
Caused by: org.xml.sax.SAXParseException; systemId: file:/home/iwasakims/dist/hadoop-2.6.0/etc/hadoop/fair-scheduler.xml; lineNumber: 18; columnNumber: 5; The content of elements must consist of well-formed character data or markup.
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:205)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:250)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1275)
        ... 9 more
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283686#comment-14283686 ]

Hudson commented on YARN-2933:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #79 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/79/])
YARN-2933. Capacity Scheduler preemption policy should only consider capacity without labels temporarily. Contributed by Mayank Bansal (wangda: rev 0a2d3e717d9c42090a32ff177991a222a1e34132)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java

Capacity Scheduler preemption policy should only consider capacity without labels temporarily
----------------------------------------------------------------------------------------------
Key: YARN-2933
URL: https://issues.apache.org/jira/browse/YARN-2933
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
Fix For: 2.7.0
Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch, YARN-2933-9.patch

Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 targets preemption that respects node labels, but we have some gaps in the code base; for example, queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to spend some time thinking through carefully.

For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regressions like the following: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels. Today, the preemption policy may preempt resources from labeled nodes for queueA, which is not correct.

Again, this is just a short-term enhancement; YARN-2498 will add preemption respecting node labels for the Capacity Scheduler, which is our final target.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3015) yarn classpath command should support same options as hadoop classpath.
[ https://issues.apache.org/jira/browse/YARN-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283688#comment-14283688 ]

Hudson commented on YARN-3015:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #79 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/79/])
YARN-3015. yarn classpath command should support same options as hadoop classpath. Contributed by Varun Saxena. (cnauroth: rev cb0a15d20180c7ca3799e63a2d53aa8dee800abd)
* hadoop-yarn-project/hadoop-yarn/bin/yarn
* hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd
* hadoop-yarn-project/CHANGES.txt

yarn classpath command should support same options as hadoop classpath.
------------------------------------------------------------------------
Key: YARN-3015
URL: https://issues.apache.org/jira/browse/YARN-3015
Project: Hadoop YARN
Issue Type: Bug
Components: scripts
Reporter: Chris Nauroth
Assignee: Varun Saxena
Priority: Minor
Fix For: 2.7.0
Attachments: YARN-3015-branch-2.patch, YARN-3015.002.patch, YARN-3015.003.patch, YARN-3015.004.patch, YARN-3015.005.patch

HADOOP-10903 enhanced the {{hadoop classpath}} command to support optional expansion of the wildcards and bundling the classpath into a jar file containing a manifest with the Class-Path attribute. The other classpath commands should do the same for consistency.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283734#comment-14283734 ]

Steve Loughran commented on YARN-1039:
--------------------------------------

I've always envisaged the flag switching on some different policies, though with container preservation across restarts, labels, log aggregation, and windows for failure tracking, much of that is already dealt with. Otherwise, the longevity flag could be of use in:

# RM UI. There's no percentage-done any more, just live/not-live. This already causes confusion for our Slider users.
# Placement: do you want 100% of a node's capacity to go to long-lived work, at the expense of being able to run anything short-lived there?
# Pre-emption. The cost of pre-emption may be higher, but at the same time long-lived containers are the ones you may want to pre-empt the most, because the scheduler knows they won't go away any time soon.

The easy target is the UI, as that doesn't need scheduling changes, and the current percentage-done view doesn't work. Something that indicates live/not-live makes more sense (though not red/green, unless you don't want colour-blind people using your app).

Add parameter for YARN resource requests to indicate long lived
----------------------------------------------------------------
Key: YARN-1039
URL: https://issues.apache.org/jira/browse/YARN-1039
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Steve Loughran
Assignee: Craig Welch
Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch

A container request could support a new parameter, long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot-priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
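For concreteness, a hypothetical use of such a flag on a request. The {{setLongLived()}} setter below is an assumption made for illustration; the actual API shape is exactly what this JIRA is still debating:

{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Sketch only: ask for one anywhere-placed 1GB/1-core container and mark
// it long-lived so the scheduler can apply placement/pre-emption policy.
ResourceRequest req = ResourceRequest.newInstance(
    Priority.newInstance(1), ResourceRequest.ANY,
    Resource.newInstance(1024, 1), 1);
req.setLongLived(true);  // hypothetical flag, not in the current API
{code}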
[jira] [Updated] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-3003:
-------------------------------
Attachment: (was: YARN-3003.001.patch)

Provide API for client to retrieve label to node mapping
---------------------------------------------------------
Key: YARN-3003
URL: https://issues.apache.org/jira/browse/YARN-3003
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Reporter: Ted Yu
Assignee: Varun Saxena
Attachments: YARN-3003.001.patch

Currently, YarnClient#getNodeToLabels() returns the mapping from NodeId to the set of labels associated with the node. A client (such as Slider) may be interested in the label-to-node mapping: given a label, return the nodes with this label.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
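A sketch of how the inverse mapping might be consumed, mirroring the existing {{getNodeToLabels()}}. The {{getLabelsToNodes()}} signature is inferred from the issue description, not quoted from the attached patch, and "gpu" is an example label:

{code}
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class LabelsToNodesExample {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    // Assumed new API: label -> nodes carrying that label.
    Map<String, Set<NodeId>> labelsToNodes = client.getLabelsToNodes();
    System.out.println(labelsToNodes.get("gpu"));
    client.stop();
  }
}
{code}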
[jira] [Updated] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-3003:
-------------------------------
Attachment: YARN-3003.001.patch

Provide API for client to retrieve label to node mapping
---------------------------------------------------------
Key: YARN-3003
URL: https://issues.apache.org/jira/browse/YARN-3003
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Reporter: Ted Yu
Assignee: Varun Saxena
Attachments: YARN-3003.001.patch

Currently, YarnClient#getNodeToLabels() returns the mapping from NodeId to the set of labels associated with the node. A client (such as Slider) may be interested in the label-to-node mapping: given a label, return the nodes with this label.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283878#comment-14283878 ]

Hudson commented on YARN-2933:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2011 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2011/])
YARN-2933. Capacity Scheduler preemption policy should only consider capacity without labels temporarily. Contributed by Mayank Bansal (wangda: rev 0a2d3e717d9c42090a32ff177991a222a1e34132)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java

Capacity Scheduler preemption policy should only consider capacity without labels temporarily
----------------------------------------------------------------------------------------------
Key: YARN-2933
URL: https://issues.apache.org/jira/browse/YARN-2933
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
Fix For: 2.7.0
Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch, YARN-2933-9.patch

Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 targets preemption that respects node labels, but we have some gaps in the code base; for example, queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to spend some time thinking through carefully.

For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regressions like the following: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels. Today, the preemption policy may preempt resources from labeled nodes for queueA, which is not correct.

Again, this is just a short-term enhancement; YARN-2498 will add preemption respecting node labels for the Capacity Scheduler, which is our final target.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3015) yarn classpath command should support same options as hadoop classpath.
[ https://issues.apache.org/jira/browse/YARN-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283880#comment-14283880 ]

Hudson commented on YARN-3015:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2011 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2011/])
YARN-3015. yarn classpath command should support same options as hadoop classpath. Contributed by Varun Saxena. (cnauroth: rev cb0a15d20180c7ca3799e63a2d53aa8dee800abd)
* hadoop-yarn-project/hadoop-yarn/bin/yarn
* hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd
* hadoop-yarn-project/CHANGES.txt

yarn classpath command should support same options as hadoop classpath.
------------------------------------------------------------------------
Key: YARN-3015
URL: https://issues.apache.org/jira/browse/YARN-3015
Project: Hadoop YARN
Issue Type: Bug
Components: scripts
Reporter: Chris Nauroth
Assignee: Varun Saxena
Priority: Minor
Fix For: 2.7.0
Attachments: YARN-3015-branch-2.patch, YARN-3015.002.patch, YARN-3015.003.patch, YARN-3015.004.patch, YARN-3015.005.patch

HADOOP-10903 enhanced the {{hadoop classpath}} command to support optional expansion of the wildcards and bundling the classpath into a jar file containing a manifest with the Class-Path attribute. The other classpath commands should do the same for consistency.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3071) Remove invalid char from sample conf in doc of FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283879#comment-14283879 ]

Hudson commented on YARN-3071:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2011 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2011/])
YARN-3071. Remove invalid char from sample conf in doc of FairScheduler. (Contributed by Masatake Iwasaki) (aajisaka: rev 4a5c3a4cfee6b8008a722801821e64850582a985)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm

Remove invalid char from sample conf in doc of FairScheduler
------------------------------------------------------------
Key: YARN-3071
URL: https://issues.apache.org/jira/browse/YARN-3071
Project: Hadoop YARN
Issue Type: Bug
Components: documentation
Affects Versions: 2.5.0, 2.6.0
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
Priority: Trivial
Fix For: 2.7.0
Attachments: YARN-3071.001.patch, YARN-3071.002.patch

Copying and pasting the sample conf causes a failure on RM startup:

{code}
Caused by: org.xml.sax.SAXParseException; systemId: file:/home/iwasakims/dist/hadoop-2.6.0/etc/hadoop/fair-scheduler.xml; lineNumber: 18; columnNumber: 5; The content of elements must consist of well-formed character data or markup.
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:205)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:250)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1275)
        ... 9 more
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3071) Remove invalid char from sample conf in doc of FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283866#comment-14283866 ]

Hudson commented on YARN-3071:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #76 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/76/])
YARN-3071. Remove invalid char from sample conf in doc of FairScheduler. (Contributed by Masatake Iwasaki) (aajisaka: rev 4a5c3a4cfee6b8008a722801821e64850582a985)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm

Remove invalid char from sample conf in doc of FairScheduler
------------------------------------------------------------
Key: YARN-3071
URL: https://issues.apache.org/jira/browse/YARN-3071
Project: Hadoop YARN
Issue Type: Bug
Components: documentation
Affects Versions: 2.5.0, 2.6.0
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
Priority: Trivial
Fix For: 2.7.0
Attachments: YARN-3071.001.patch, YARN-3071.002.patch

Copying and pasting the sample conf causes a failure on RM startup:

{code}
Caused by: org.xml.sax.SAXParseException; systemId: file:/home/iwasakims/dist/hadoop-2.6.0/etc/hadoop/fair-scheduler.xml; lineNumber: 18; columnNumber: 5; The content of elements must consist of well-formed character data or markup.
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:205)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:250)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1275)
        ... 9 more
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3015) yarn classpath command should support same options as hadoop classpath.
[ https://issues.apache.org/jira/browse/YARN-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283867#comment-14283867 ]

Hudson commented on YARN-3015:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #76 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/76/])
YARN-3015. yarn classpath command should support same options as hadoop classpath. Contributed by Varun Saxena. (cnauroth: rev cb0a15d20180c7ca3799e63a2d53aa8dee800abd)
* hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/bin/yarn

yarn classpath command should support same options as hadoop classpath.
------------------------------------------------------------------------
Key: YARN-3015
URL: https://issues.apache.org/jira/browse/YARN-3015
Project: Hadoop YARN
Issue Type: Bug
Components: scripts
Reporter: Chris Nauroth
Assignee: Varun Saxena
Priority: Minor
Fix For: 2.7.0
Attachments: YARN-3015-branch-2.patch, YARN-3015.002.patch, YARN-3015.003.patch, YARN-3015.004.patch, YARN-3015.005.patch

HADOOP-10903 enhanced the {{hadoop classpath}} command to support optional expansion of the wildcards and bundling the classpath into a jar file containing a manifest with the Class-Path attribute. The other classpath commands should do the same for consistency.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283865#comment-14283865 ]

Hudson commented on YARN-2933:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #76 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/76/])
YARN-2933. Capacity Scheduler preemption policy should only consider capacity without labels temporarily. Contributed by Mayank Bansal (wangda: rev 0a2d3e717d9c42090a32ff177991a222a1e34132)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* hadoop-yarn-project/CHANGES.txt

Capacity Scheduler preemption policy should only consider capacity without labels temporarily
----------------------------------------------------------------------------------------------
Key: YARN-2933
URL: https://issues.apache.org/jira/browse/YARN-2933
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
Fix For: 2.7.0
Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch, YARN-2933-9.patch

Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 targets preemption that respects node labels, but we have some gaps in the code base; for example, queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to spend some time thinking through carefully.

For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regressions like the following: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels. Today, the preemption policy may preempt resources from labeled nodes for queueA, which is not correct.

Again, this is just a short-term enhancement; YARN-2498 will add preemption respecting node labels for the Capacity Scheduler, which is our final target.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283926#comment-14283926 ]

Hudson commented on YARN-2933:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2030 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2030/])
YARN-2933. Capacity Scheduler preemption policy should only consider capacity without labels temporarily. Contributed by Mayank Bansal (wangda: rev 0a2d3e717d9c42090a32ff177991a222a1e34132)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* hadoop-yarn-project/CHANGES.txt

Capacity Scheduler preemption policy should only consider capacity without labels temporarily
----------------------------------------------------------------------------------------------
Key: YARN-2933
URL: https://issues.apache.org/jira/browse/YARN-2933
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
Fix For: 2.7.0
Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch, YARN-2933-9.patch

Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 targets preemption that respects node labels, but we have some gaps in the code base; for example, queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to spend some time thinking through carefully.

For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regressions like the following: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels. Today, the preemption policy may preempt resources from labeled nodes for queueA, which is not correct.

Again, this is just a short-term enhancement; YARN-2498 will add preemption respecting node labels for the Capacity Scheduler, which is our final target.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3015) yarn classpath command should support same options as hadoop classpath.
[ https://issues.apache.org/jira/browse/YARN-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283927#comment-14283927 ]

Hudson commented on YARN-3015:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2030 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2030/])
YARN-3015. yarn classpath command should support same options as hadoop classpath. Contributed by Varun Saxena. (cnauroth: rev cb0a15d20180c7ca3799e63a2d53aa8dee800abd)
* hadoop-yarn-project/hadoop-yarn/bin/yarn
* hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd
* hadoop-yarn-project/CHANGES.txt

yarn classpath command should support same options as hadoop classpath.
------------------------------------------------------------------------
Key: YARN-3015
URL: https://issues.apache.org/jira/browse/YARN-3015
Project: Hadoop YARN
Issue Type: Bug
Components: scripts
Reporter: Chris Nauroth
Assignee: Varun Saxena
Priority: Minor
Fix For: 2.7.0
Attachments: YARN-3015-branch-2.patch, YARN-3015.002.patch, YARN-3015.003.patch, YARN-3015.004.patch, YARN-3015.005.patch

HADOOP-10903 enhanced the {{hadoop classpath}} command to support optional expansion of the wildcards and bundling the classpath into a jar file containing a manifest with the Class-Path attribute. The other classpath commands should do the same for consistency.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3015) yarn classpath command should support same options as hadoop classpath.
[ https://issues.apache.org/jira/browse/YARN-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283983#comment-14283983 ]

Varun Saxena commented on YARN-3015:
------------------------------------

Thanks, [~cnauroth], for the review and commit.

yarn classpath command should support same options as hadoop classpath.
------------------------------------------------------------------------
Key: YARN-3015
URL: https://issues.apache.org/jira/browse/YARN-3015
Project: Hadoop YARN
Issue Type: Bug
Components: scripts
Reporter: Chris Nauroth
Assignee: Varun Saxena
Priority: Minor
Fix For: 2.7.0
Attachments: YARN-3015-branch-2.patch, YARN-3015.002.patch, YARN-3015.003.patch, YARN-3015.004.patch, YARN-3015.005.patch

HADOOP-10903 enhanced the {{hadoop classpath}} command to support optional expansion of the wildcards and bundling the classpath into a jar file containing a manifest with the Class-Path attribute. The other classpath commands should do the same for consistency.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283903#comment-14283903 ]

Hudson commented on YARN-2933:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #80 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/80/])
YARN-2933. Capacity Scheduler preemption policy should only consider capacity without labels temporarily. Contributed by Mayank Bansal (wangda: rev 0a2d3e717d9c42090a32ff177991a222a1e34132)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* hadoop-yarn-project/CHANGES.txt

Capacity Scheduler preemption policy should only consider capacity without labels temporarily
----------------------------------------------------------------------------------------------
Key: YARN-2933
URL: https://issues.apache.org/jira/browse/YARN-2933
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
Fix For: 2.7.0
Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch, YARN-2933-9.patch

Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 targets preemption that respects node labels, but we have some gaps in the code base; for example, queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to spend some time thinking through carefully.

For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regressions like the following: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels. Today, the preemption policy may preempt resources from labeled nodes for queueA, which is not correct.

Again, this is just a short-term enhancement; YARN-2498 will add preemption respecting node labels for the Capacity Scheduler, which is our final target.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-3003:
-------------------------------
Attachment: (was: YARN-3003.001.patch)

Provide API for client to retrieve label to node mapping
---------------------------------------------------------
Key: YARN-3003
URL: https://issues.apache.org/jira/browse/YARN-3003
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Reporter: Ted Yu
Assignee: Varun Saxena

Currently, YarnClient#getNodeToLabels() returns the mapping from NodeId to the set of labels associated with the node. A client (such as Slider) may be interested in the label-to-node mapping: given a label, return the nodes with this label.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-3003:
-------------------------------
Attachment: YARN-3003.001.patch

Provide API for client to retrieve label to node mapping
---------------------------------------------------------
Key: YARN-3003
URL: https://issues.apache.org/jira/browse/YARN-3003
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Reporter: Ted Yu
Assignee: Varun Saxena
Attachments: YARN-3003.001.patch

Currently, YarnClient#getNodeToLabels() returns the mapping from NodeId to the set of labels associated with the node. A client (such as Slider) may be interested in the label-to-node mapping: given a label, return the nodes with this label.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284196#comment-14284196 ]

Hadoop QA commented on YARN-3003:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12693309/YARN-3003.001.patch
against trunk revision c94c0d2.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.mapred.TestNetworkedJob
org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
org.apache.hadoop.yarn.server.resourcemanager.nodelabels.TestRMNodeLabelsManager
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6365//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6365//console

This message is automatically generated.

Provide API for client to retrieve label to node mapping
---------------------------------------------------------
Key: YARN-3003
URL: https://issues.apache.org/jira/browse/YARN-3003
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Reporter: Ted Yu
Assignee: Varun Saxena
Attachments: YARN-3003.001.patch

Currently, YarnClient#getNodeToLabels() returns the mapping from NodeId to the set of labels associated with the node. A client (such as Slider) may be interested in the label-to-node mapping: given a label, return the nodes with this label.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284205#comment-14284205 ]

Ted Yu commented on YARN-3003:
------------------------------

For message LabelsToNodeIdProto, should it be named LabelsToNodeIdsProto, since the nodeId field is repeated?

Provide API for client to retrieve label to node mapping
---------------------------------------------------------
Key: YARN-3003
URL: https://issues.apache.org/jira/browse/YARN-3003
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Reporter: Ted Yu
Assignee: Varun Saxena
Attachments: YARN-3003.001.patch

Currently, YarnClient#getNodeToLabels() returns the mapping from NodeId to the set of labels associated with the node. A client (such as Slider) may be interested in the label-to-node mapping: given a label, return the nodes with this label.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3030) set up ATS writer with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284209#comment-14284209 ]

Zhijie Shen commented on YARN-3030:
-----------------------------------

bq. If that is not feasible, I'd say run test-patch.sh by hand to ensure basic issues are caught

+1. Sounds like a good idea.

I took a look at the patch, and had some thoughts:

1. The aggregator may have two responsibilities (perhaps except the RM aggregator): (a) collecting the timeline data from the application, and (b) putting it into a scalable storage. I can see BaseAggregatorService is the abstraction for the latter piece. However, for the former piece we don't have such an abstraction; the implementation is embedded in the per-node aggregator. It's fine now, but once we move on to a per-app aggregator, we would need to copy-paste the same collecting logic. IMHO, we should have an abstraction for collecting the timeline data from the app too, and make a REST-based implementation now. In the future, we can even replace it with an RPC-based implementation. Per-node and per-app aggregators would be assembled from both collecting and aggregating services, while the RM aggregator only consists of the aggregating service, because it pulls its internal data only.

2. For the per-node aggregator, we may want to implement the {{AuxiliaryService}} interface, such that it can be installed in the NM as an auxiliary service. The interface provides similar lifecycle hooks to the ones we want, such as init app and stop app.

set up ATS writer with basic request serving structure and lifecycle
---------------------------------------------------------------------
Key: YARN-3030
URL: https://issues.apache.org/jira/browse/YARN-3030
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Attachments: YARN-3030.001.patch

Per design in YARN-2928, create an ATS writer as a service, and implement the basic service structure including the lifecycle management. Also, as part of this JIRA, we should come up with the ATS client API for sending requests to this ATS writer.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
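A minimal skeleton of the {{AuxiliaryService}} hooks being referred to. The class and service names are placeholders, but the overridden methods are the actual lifecycle callbacks an NM auxiliary service receives:

{code}
import java.nio.ByteBuffer;

import org.apache.hadoop.yarn.server.api.ApplicationInitializationContext;
import org.apache.hadoop.yarn.server.api.ApplicationTerminationContext;
import org.apache.hadoop.yarn.server.api.AuxiliaryService;

public class PerNodeAggregatorService extends AuxiliaryService {

  public PerNodeAggregatorService() {
    super("timeline_aggregator");  // placeholder service name
  }

  @Override
  public void initializeApplication(ApplicationInitializationContext context) {
    // Set up per-app aggregation state when the app starts on this node.
  }

  @Override
  public void stopApplication(ApplicationTerminationContext context) {
    // Flush and tear down per-app aggregation state.
  }

  @Override
  public ByteBuffer getMetaData() {
    return ByteBuffer.allocate(0);  // nothing to hand back to AMs
  }
}
{code}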
[jira] [Updated] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-3003:
-------------------------------
Attachment: YARN-3003.001.patch

Provide API for client to retrieve label to node mapping
---------------------------------------------------------
Key: YARN-3003
URL: https://issues.apache.org/jira/browse/YARN-3003
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Reporter: Ted Yu
Assignee: Varun Saxena
Attachments: YARN-3003.001.patch

Currently, YarnClient#getNodeToLabels() returns the mapping from NodeId to the set of labels associated with the node. A client (such as Slider) may be interested in the label-to-node mapping: given a label, return the nodes with this label.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3015) yarn classpath command should support same options as hadoop classpath.
[ https://issues.apache.org/jira/browse/YARN-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283904#comment-14283904 ]

Hudson commented on YARN-3015:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #80 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/80/])
YARN-3015. yarn classpath command should support same options as hadoop classpath. Contributed by Varun Saxena. (cnauroth: rev cb0a15d20180c7ca3799e63a2d53aa8dee800abd)
* hadoop-yarn-project/hadoop-yarn/bin/yarn
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd

yarn classpath command should support same options as hadoop classpath.
------------------------------------------------------------------------
Key: YARN-3015
URL: https://issues.apache.org/jira/browse/YARN-3015
Project: Hadoop YARN
Issue Type: Bug
Components: scripts
Reporter: Chris Nauroth
Assignee: Varun Saxena
Priority: Minor
Fix For: 2.7.0
Attachments: YARN-3015-branch-2.patch, YARN-3015.002.patch, YARN-3015.003.patch, YARN-3015.004.patch, YARN-3015.005.patch

HADOOP-10903 enhanced the {{hadoop classpath}} command to support optional expansion of the wildcards and bundling the classpath into a jar file containing a manifest with the Class-Path attribute. The other classpath commands should do the same for consistency.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284104#comment-14284104 ]

Carlo Curino commented on YARN-1039:
------------------------------------

I am happy the conversation is re-ignited. As I mentioned [above|https://issues.apache.org/jira/browse/YARN-1039?focusedCommentId=14048345&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14048345], the long-lived tag is a coarse-grained version of the notion of duration we added to the ReservationRequest (which tracks ResourceRequest very closely) in YARN-1051. The idea is that the AM could provide an estimate of the task duration, enabling (beyond what Steve already listed above) optimistic scheduling decisions like the one in YARN-2877 for very short tasks (we ran several experiments, and the potential for increased utilization is substantial). Given a duration parameter, expressing long-lived can be done by setting the duration to a large value (MAX_INT, -1, or whatever convention).

Add parameter for YARN resource requests to indicate long lived
----------------------------------------------------------------
Key: YARN-1039
URL: https://issues.apache.org/jira/browse/YARN-1039
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Steve Loughran
Assignee: Craig Welch
Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch

A container request could support a new parameter, long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot-priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
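The duration notion Carlo mentions is already concrete in the YARN-1051 reservation API; under the convention he suggests, "long lived" would simply be a very large duration. A fragment-level sketch:

{code}
import org.apache.hadoop.yarn.api.records.ReservationRequest;
import org.apache.hadoop.yarn.api.records.Resource;

// One 1GB/1-core container with an effectively unbounded duration (ms)
// standing in for "long lived" under the suggested convention.
ReservationRequest rr = ReservationRequest.newInstance(
    Resource.newInstance(1024, 1),  // capability per container
    1,                              // number of containers
    1,                              // concurrency
    Long.MAX_VALUE);                // duration estimate
{code}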
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284099#comment-14284099 ]

Wei Yan commented on YARN-1021:
-------------------------------

Thanks for the reply, [~kasha].

[~fcolzada], sorry for the late reply. I just checked hadoop-2.6.0 using the sample-conf and sample-data, and it works fine. The first exception occurs because the web module is not loaded yet; just wait 2-3 seconds after you start the simulator. This exception does not affect the simulator's operation. For the second one, could you send your config and workload files to me? I can look into it.

[~kasha], could you help review YARN-1393, which provides a quick-start tutorial?

Yarn Scheduler Load Simulator
-----------------------------
Key: YARN-1021
URL: https://issues.apache.org/jira/browse/YARN-1021
Project: Hadoop YARN
Issue Type: New Feature
Components: scheduler
Reporter: Wei Yan
Assignee: Wei Yan
Fix For: 2.3.0
Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf

The Yarn Scheduler is a fertile area of interest with different implementations, e.g., the Fifo, Capacity, and Fair schedulers. Meanwhile, several optimizations have been made to improve scheduler performance for different scenarios and workloads. Each scheduler algorithm has its own set of features and drives scheduling decisions by many factors, such as fairness, capacity guarantees, resource availability, etc. It is very important to evaluate a scheduler algorithm well before deploying it in a production cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time- and cost-consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator which can predict how well a scheduler algorithm works for a specific workload would be quite useful.

We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads on a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with a reasonable amount of confidence, thereby aiding rapid innovation.

The simulator will exercise the real Yarn ResourceManager, removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AM heartbeat events from within the same JVM. To keep track of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler.

The simulator will produce real-time metrics while executing, including:

* Resource usage for the whole cluster and each queue, which can be utilized to configure the cluster and each queue's capacity.
* The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate scheduler behavior (individual jobs' turnaround time, throughput, fairness, capacity guarantees, etc.).
* Several key metrics of the scheduler algorithm, such as the time cost of each scheduler operation (allocate, handle, etc.), which can be utilized by Hadoop developers to find code hot spots and scalability limits.

The simulator will provide real-time charts showing the behavior of the scheduler and its performance.

A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284130#comment-14284130 ]

Varun Saxena commented on YARN-3047:
------------------------------------

[~sjlee0], I wanted to know whether we will have a single instance of the ATS reader that clients connect to for querying (with that instance in turn launching multiple threads to query storage in parallel), or a global instance which distributes requests to multiple instances of ATS readers?

set up ATS reader with basic request serving structure and lifecycle
---------------------------------------------------------------------
Key: YARN-3047
URL: https://issues.apache.org/jira/browse/YARN-3047
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena

Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284325#comment-14284325 ]

Craig Welch commented on YARN-1039:
-----------------------------------

Another thought: if we do need this kind of flag, I think we should detach the notion from duration or long life as such. It's more about service vs. batch. A service's duration is not necessarily related to any preset notion of a work item it will start, work on, and complete; it is started to handle work given to it, of unknown quantity (potentially many different items), and stopped when no longer needed. It's not so much about the duration as the lifecycle (a batch operation may have a longer runtime than a service, for example). So I'd suggest dropping the temporal flavor and going with service vs. batch, or something along those lines.

Add parameter for YARN resource requests to indicate long lived
----------------------------------------------------------------
Key: YARN-1039
URL: https://issues.apache.org/jira/browse/YARN-1039
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Steve Loughran
Assignee: Craig Welch
Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch

A container request could support a new parameter, long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot-priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3074) Nodemanager dies when localizer runner tries to write to a full disk
[ https://issues.apache.org/jira/browse/YARN-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena reassigned YARN-3074:
----------------------------------
Assignee: Varun Saxena

Nodemanager dies when localizer runner tries to write to a full disk
---------------------------------------------------------------------
Key: YARN-3074
URL: https://issues.apache.org/jira/browse/YARN-3074
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.5.0
Reporter: Jason Lowe
Assignee: Varun Saxena

When a LocalizerRunner tries to write to a full disk, it can bring down the NodeManager process. Instead of failing the whole process, we should fail only the container and make a best attempt to keep going.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
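The spirit of the proposed fix, as a sketch: catch the disk-write failure in the LocalizerRunner and fail only that container's localization. The helper name and the event usage below are assumptions for illustration, not the eventual patch:

{code}
// Sketch only (inside a hypothetical LocalizerRunner.run()):
try {
  writeLocalizerCredentials(tokenDest);  // hypothetical disk write that can
                                         // fail with ENOSPC on a full disk
} catch (IOException e) {
  // Fail this container's localization instead of letting the exception
  // propagate and kill the NodeManager JVM.
  LOG.error("Localization failed for " + containerId, e);
  dispatcher.getEventHandler().handle(
      new ContainerResourceFailedEvent(containerId, null, e.getMessage()));
}
{code}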
[jira] [Commented] (YARN-2731) Fixed RegisterApplicationMasterResponsePBImpl to properly invoke maybeInitBuilder
[ https://issues.apache.org/jira/browse/YARN-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284579#comment-14284579 ]

Hudson commented on YARN-2731:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #6895 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6895/])
YARN-2731. Fixed RegisterApplicationMasterResponsePBImpl to properly invoke maybeInitBuilder. (Contributed by Carlo Curino) (wangda: rev f250ad1773b19713d6aea81ae290ebb4c90fd44b)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/RegisterApplicationMasterResponsePBImpl.java
* hadoop-yarn-project/CHANGES.txt

Fixed RegisterApplicationMasterResponsePBImpl to properly invoke maybeInitBuilder
----------------------------------------------------------------------------------
Key: YARN-2731
URL: https://issues.apache.org/jira/browse/YARN-2731
Project: Hadoop YARN
Issue Type: Bug
Reporter: Carlo Curino
Assignee: Carlo Curino
Fix For: 2.7.0
Attachments: YARN-2731.patch

If I am not mistaken, in RegisterApplicationMasterResponsePBImpl we fail to initialize the builder in setNMTokensFromPreviousAttempts(), and we initialize the builder in the wrong place in setClientToAMTokenMasterKey().

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
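For readers outside the PB code, this is the canonical PBImpl guard the fix is about, condensed into a fragment (surrounding class, imports, and field handling omitted): every setter must run it before mutating state, or the update can land on a stale builder.

{code}
private void maybeInitBuilder() {
  // Re-seed the builder from the current proto if we are still backed by
  // the immutable proto (viaProto) or no builder exists yet. A setter that
  // skips this guard can silently lose its update.
  if (viaProto || builder == null) {
    builder = RegisterApplicationMasterResponseProto.newBuilder(proto);
  }
  viaProto = false;
}

public void setNMTokensFromPreviousAttempts(List<NMToken> nmTokens) {
  maybeInitBuilder();  // the call this fix adds before any mutation
  this.nmTokens = nmTokens;
}
{code}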
[jira] [Commented] (YARN-2896) Server side PB changes for Priority Label Manager and Admin CLI support
[ https://issues.apache.org/jira/browse/YARN-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284431#comment-14284431 ] Eric Payne commented on YARN-2896: -- Thanks, [~sunilg], for working on this feature and posting this patch to support the PB framework. I have a general question about why job priorities need labels. Why can't they just be number based? It seems like extra work to label them, pass the labels, and then interpret them. {{PriorityLabelsPerQueue.java}}:
{code}
public String toString() {
  return "Max priority label: " + this.getMaxPriorityLabel() + " ,"
      + "Default priority label: " + this.getDefaultPriorityLabel();
}
{code}
This is just a nit, but in {{PriorityLabelsPerQueue#toString}}, the space should be after the comma. Currently, this will output something like this:
{code}
Max priority label: foo ,Default priority label: bar
{code}
{code}
public int compareTo(PriorityLabelsPerQueue priorityLabelsPerQueue) {
  int defltLabelCompare = this.getDefaultPriorityLabel().compareTo(
      priorityLabelsPerQueue.getDefaultPriorityLabel());
  if (defltLabelCompare == 0) {
    return this.getMaxPriorityLabel().compareTo(
        priorityLabelsPerQueue.getMaxPriorityLabel());
  } else {
    return defltLabelCompare;
  }
}
{code}
{{PriorityLabelsPerQueue#compareTo}} should probably check for null for {{priorityLabelsPerQueue}}, {{this.getDefaultPriorityLabel()}}, and {{this.getMaxPriorityLabel()}}. If {{priorityLabelsPerQueue}} is null, {{this.getDefaultPriorityLabel()}} returns null, or {{this.getMaxPriorityLabel()}} returns null, {{compareTo}} will throw an NPE. {{yarn_protos.proto}}:
{code}
message ApplicationPriorityProto {
  optional string app_priority = 1;
}
{code}
Where is this referenced? Server side PB changes for Priority Label Manager and Admin CLI support --- Key: YARN-2896 URL: https://issues.apache.org/jira/browse/YARN-2896 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2896.patch, 0002-YARN-2896.patch, 0003-YARN-2896.patch, 0004-YARN-2896.patch Common changes: * PB support changes required for Admin APIs * PB support for File System store (Priority Label Store) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
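As an illustration of the null-safety concern, a defensive variant of {{compareTo}} might look like the sketch below (the null-ordering choice is invented for illustration, not something the patch prescribes):
{code}
public int compareTo(PriorityLabelsPerQueue other) {
  if (other == null) {
    throw new NullPointerException("priorityLabelsPerQueue must not be null");
  }
  int defaultCompare = compareNullSafe(
      this.getDefaultPriorityLabel(), other.getDefaultPriorityLabel());
  if (defaultCompare != 0) {
    return defaultCompare;
  }
  return compareNullSafe(
      this.getMaxPriorityLabel(), other.getMaxPriorityLabel());
}

// Order null labels before non-null ones instead of throwing an NPE.
private static int compareNullSafe(String a, String b) {
  if (a == null && b == null) {
    return 0;
  }
  if (a == null) {
    return -1;
  }
  if (b == null) {
    return 1;
  }
  return a.compareTo(b);
}
{code}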
[jira] [Commented] (YARN-3020) n similar addContainerRequest()s produce n*(n+1)/2 containers
[ https://issues.apache.org/jira/browse/YARN-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284451#comment-14284451 ] Wangda Tan commented on YARN-3020: -- [~peterdkirchner], the expected usage of AMRMClient is as follows (thanks for input from [~hitesh] and [~jianhe]): when you receive newly allocated containers from the RM, you should manually call {{removeContainerRequest}} to remove the corresponding pending container requests. AMRMClient itself will not automatically deduct #pendingContainerRequests. The reason is that when a container is allocated by the RM, AMRMClient doesn't know which ResourceRequest the container was allocated for. You may think that since a container has a priority, capability and resourceName, AMRMClient could look up the ResourceRequest via {{getMatchingRequests}}, but some applications may use the container for another purpose (AMRMClient cannot understand an application's specific logic), so the AM should call {{removeContainerRequest}} itself. To improve this, I think 1) we need to add this behavior to the YARN docs -- people should better understand how to use AMRMClient; and 2) maybe we should add a default implementation that automatically deducts pending resource requests by priority/resource-name/capability of allocated containers (users could disable this default behavior and implement their own logic to deduct pending resource requests). Does this make sense to you? Thanks, Wangda n similar addContainerRequest()s produce n*(n+1)/2 containers - Key: YARN-3020 URL: https://issues.apache.org/jira/browse/YARN-3020 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2 Reporter: Peter D Kirchner Original Estimate: 24h Remaining Estimate: 24h BUG: If the application master calls addContainerRequest() n times, but with the same priority, I get up to 1+2+3+...+n containers = n*(n+1)/2 . The most containers are requested when the interval between calls to addContainerRequest() exceeds the heartbeat interval of calls to allocate() (in AMRMClientImpl's run() method). If the application master calls addContainerRequest() n times, but with a unique priority each time, I get n containers (as I intended). Analysis: There is a logic problem in AMRMClientImpl.java. Although AMRMClientImpl.java, allocate() does an ask.clear() , on subsequent calls to addContainerRequest(), addResourceRequest() finds the previous matching remoteRequest and increments the container count rather than starting anew, and does an addResourceRequestToAsk() which defeats the ask.clear(). From documentation and code comments, it was hard for me to discern the intended behavior of the API, but the inconsistency reported in this issue suggests one case or the other is implemented incorrectly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
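To make the usage Wangda describes concrete, here is a minimal sketch of what an AM might do after each allocate() round, using {{getMatchingRequests}} and {{removeContainerRequest}} (the matching rule shown is an assumption; applications that requested containers with the ANY resource name should match on {{ResourceRequest.ANY}} rather than the container's host):
{code}
import java.util.Collection;
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

// AMRMClient does not deduct pending requests on its own, so remove
// one matching request per newly allocated container.
void deductPendingRequests(AMRMClient<ContainerRequest> amClient,
    List<Container> allocated) {
  for (Container container : allocated) {
    List<? extends Collection<ContainerRequest>> matches =
        amClient.getMatchingRequests(container.getPriority(),
            container.getNodeId().getHost(), container.getResource());
    if (!matches.isEmpty() && !matches.get(0).isEmpty()) {
      amClient.removeContainerRequest(matches.get(0).iterator().next());
    }
  }
}
{code}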
[jira] [Commented] (YARN-2731) RegisterApplicationMasterResponsePBImpl: not properly initialized builder
[ https://issues.apache.org/jira/browse/YARN-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284491#comment-14284491 ] Wangda Tan commented on YARN-2731: -- Good find, [~curino]! LGTM, will commit once Jenkins gets back. RegisterApplicationMasterResponsePBImpl: not properly initialized builder - Key: YARN-2731 URL: https://issues.apache.org/jira/browse/YARN-2731 Project: Hadoop YARN Issue Type: Bug Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.7.0 Attachments: YARN-2731.patch If I am not mistaken in RegisterApplicationMasterResponsePBImpl we are missing to initialize the builder in setNMTokensFromPreviousAttempts(), and we initialize the builder in the wrong place in: setClientToAMTokenMasterKey -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3076) YarnClient implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3076: --- Summary: YarnClient implementation to retrieve label to node mapping (was: YarnClient related changes to retrieve label to node mapping) YarnClient implementation to retrieve label to node mapping --- Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3076) Client side implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284599#comment-14284599 ] Wangda Tan commented on YARN-3076: -- Converted this and YARN-3075 to sub-tasks of YARN-2492. I also suggest renaming their titles: they're not simply client side/server side; they should refer to YarnClient and NodeLabelsManager, since the YarnClient change itself spans both client and server (ClientRMService). Client side implementation to retrieve label to node mapping Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284446#comment-14284446 ] Jian Fang commented on YARN-1039: - The duration concept comes with good intentions, but what I am really afraid of is that it could introduce enormous complexity to YARN if it is not designed properly. First, there are many moving parts under the hood for the estimation; for example, the timing on a 30 node cluster may be significantly different from that on a 300 node cluster. Getting into the measurement and estimation business is very much like walking into the benchmark comparison business, which is very hard in reality. Secondly, the duration probably relies on hadoop customers to provide a proper value if YARN is not smart enough to derive the value by itself, which could be impractical for many customers. Remember that many hadoop users are not even developers; many of them rely on high level components such as pig and hive to run hadoop jobs, and they probably don't know or care about the estimation. As a result, at the very least, the duration should only be an enhancement when a value is provided; YARN should still work properly without such a value. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3076) YarnClient related changes to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3076: --- Summary: YarnClient related changes to retrieve label to node mapping (was: Client side implementation to retrieve label to node mapping) YarnClient related changes to retrieve label to node mapping Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2466) Umbrella issue for Yarn launched Docker Containers
[ https://issues.apache.org/jira/browse/YARN-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284637#comment-14284637 ] Max commented on YARN-2466: --- It would be helpful, but it should be switchable. It should be possible to activate it for debugging; turning it off will increase security for production systems. Umbrella issue for Yarn launched Docker Containers -- Key: YARN-2466 URL: https://issues.apache.org/jira/browse/YARN-2466 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.4.1 Reporter: Abin Shahab Assignee: Abin Shahab Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to package their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). In addition to software isolation mentioned above, Docker containers will provide resource, network, and user-namespace isolation. Docker provides resource isolation through cgroups, similar to LinuxContainerExecutor. This prevents one job from taking other jobs resource(memory and CPU) on the same hadoop cluster. User-namespace isolation will ensure that the root on the container is mapped an unprivileged user on the host. This is currently being added to Docker. Network isolation will ensure that one user’s network traffic is completely isolated from another user’s network traffic. Last but not the least, the interaction of Docker and Kerberos will have to be worked out. These Docker containers must work in a secure hadoop environment. Additional details are here: https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3074) Nodemanager dies when localizer runner tries to write to a full disk
[ https://issues.apache.org/jira/browse/YARN-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284587#comment-14284587 ] Chris Douglas commented on YARN-3074: - bq. catch FSError since it will be a common and recoverable error in this case. +1 Nodemanager dies when localizer runner tries to write to a full disk Key: YARN-3074 URL: https://issues.apache.org/jira/browse/YARN-3074 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena When a LocalizerRunner tries to write to a full disk it can bring down the nodemanager process. Instead of failing the whole process we should fail only the container and make a best attempt to keep going. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2466) Umbrella issue for Yarn launched Docker Containers
[ https://issues.apache.org/jira/browse/YARN-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284598#comment-14284598 ] Chen He commented on YARN-2466: --- Hi [~ashahab], do we need to add a configuration parameter that can enable the container in interactive mode, such as yarn.docker.interactive? Then a user could attach to the running container for debugging purposes. Umbrella issue for Yarn launched Docker Containers -- Key: YARN-2466 URL: https://issues.apache.org/jira/browse/YARN-2466 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.4.1 Reporter: Abin Shahab Assignee: Abin Shahab Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to package their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). In addition to software isolation mentioned above, Docker containers will provide resource, network, and user-namespace isolation. Docker provides resource isolation through cgroups, similar to LinuxContainerExecutor. This prevents one job from taking other jobs resource(memory and CPU) on the same hadoop cluster. User-namespace isolation will ensure that the root on the container is mapped an unprivileged user on the host. This is currently being added to Docker. Network isolation will ensure that one user’s network traffic is completely isolated from another user’s network traffic. Last but not the least, the interaction of Docker and Kerberos will have to be worked out. These Docker containers must work in a secure hadoop environment. Additional details are here: https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers -- This message was sent by Atlassian JIRA (v6.3.4#6332)
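If such a switch were added, reading it would presumably look like the sketch below (the property name is just the one proposed in the comment; it does not exist in YARN today, and the default is off so production clusters stay locked down):
{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical flag from the discussion above: allow attaching to a
// running Docker container only when explicitly enabled for debugging.
boolean interactiveModeAllowed(Configuration conf) {
  return conf.getBoolean("yarn.docker.interactive", false);
}
{code}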
[jira] [Commented] (YARN-2466) Umbrella issue for Yarn launched Docker Containers
[ https://issues.apache.org/jira/browse/YARN-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284898#comment-14284898 ] Chen He commented on YARN-2466: --- Yes, that is what I mean. Thank you for clarifying it, [~mikhmv]. Umbrella issue for Yarn launched Docker Containers -- Key: YARN-2466 URL: https://issues.apache.org/jira/browse/YARN-2466 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.4.1 Reporter: Abin Shahab Assignee: Abin Shahab Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to package their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). In addition to software isolation mentioned above, Docker containers will provide resource, network, and user-namespace isolation. Docker provides resource isolation through cgroups, similar to LinuxContainerExecutor. This prevents one job from taking other jobs resource(memory and CPU) on the same hadoop cluster. User-namespace isolation will ensure that the root on the container is mapped an unprivileged user on the host. This is currently being added to Docker. Network isolation will ensure that one user’s network traffic is completely isolated from another user’s network traffic. Last but not the least, the interaction of Docker and Kerberos will have to be worked out. These Docker containers must work in a secure hadoop environment. Additional details are here: https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284920#comment-14284920 ] Konstantinos Karanasos commented on YARN-1039: -- Let me add my thoughts regarding whether we should allow a duration to be reported instead of just a boolean switch for short tasks. I am actively involved in adding distributed scheduling capabilities ([YARN-2877]). We have performed an extensive experimental evaluation that has shown significant performance improvements in terms of throughput and latency, especially where short tasks are concerned. In that scenario, having the ability to specify the duration of the task is crucial (for deciding what type of container to use [[YARN-2882]], for estimating the waiting time in the NMs [[YARN-2886]], etc.). I understand the concerns that have been raised about how to provide the right task duration. However, this can be done either based on historical information (previous waves of this task type or previous executions of the same job) or on application level knowledge. We are already experimenting with ways to deal with imprecise task durations. That said, I definitely agree with [~john.jian.fang] that the user should not *have to* provide any task duration (i.e., the system should work properly in case no durations are provided), but on the other hand, in case she does, we should be able to take advantage of it. Moreover, as [~curino] pointed out, if the API exposes an integer instead of a boolean, we can simulate the boolean switch (e.g., by setting the value to MAX_INT for long tasks), but if we simply use a boolean, we would have to change the API in the future to support duration. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.3.4#6332)
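To make the encoding argument concrete, a hypothetical sketch (every name here is invented for illustration; nothing like this exists in the YARN API today):
{code}
// Encode "long-lived" as a sentinel duration: the boolean switch falls
// out of the integer encoding for free, while the reverse (boolean to
// duration) would require an API change later.
class DurationHint {
  static final long LONG_LIVED = Long.MAX_VALUE; // no preset duration

  private final long expectedDurationSeconds;

  DurationHint(long expectedDurationSeconds) {
    this.expectedDurationSeconds = expectedDurationSeconds;
  }

  boolean isLongLived() {
    return expectedDurationSeconds == LONG_LIVED;
  }
}
{code}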
[jira] [Commented] (YARN-2731) RegisterApplicationMasterResponsePBImpl: not properly initialized builder
[ https://issues.apache.org/jira/browse/YARN-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284553#comment-14284553 ] Hadoop QA commented on YARN-2731: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676492/YARN-2731.patch against trunk revision dd0228b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6368//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6368//console This message is automatically generated. RegisterApplicationMasterResponsePBImpl: not properly initialized builder - Key: YARN-2731 URL: https://issues.apache.org/jira/browse/YARN-2731 Project: Hadoop YARN Issue Type: Bug Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.7.0 Attachments: YARN-2731.patch If I am not mistaken in RegisterApplicationMasterResponsePBImpl we are missing to initialize the builder in setNMTokensFromPreviousAttempts(), and we initialize the builder in the wrong place in: setClientToAMTokenMasterKey -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2731) Fixed RegisterApplicationMasterResponsePBImpl to properly invoke maybeInitBuilder
[ https://issues.apache.org/jira/browse/YARN-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284575#comment-14284575 ] Wangda Tan commented on YARN-2731: -- Committed to trunk and branch-2, thanks Carlo! Fixed RegisterApplicationMasterResponsePBImpl to properly invoke maybeInitBuilder - Key: YARN-2731 URL: https://issues.apache.org/jira/browse/YARN-2731 Project: Hadoop YARN Issue Type: Bug Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.7.0 Attachments: YARN-2731.patch If I am not mistaken in RegisterApplicationMasterResponsePBImpl we are missing to initialize the builder in setNMTokensFromPreviousAttempts(), and we initialize the builder in the wrong place in: setClientToAMTokenMasterKey -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2731) Fixed RegisterApplicationMasterResponsePBImpl to properly invoke maybeInitBuilder
[ https://issues.apache.org/jira/browse/YARN-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2731: - Summary: Fixed RegisterApplicationMasterResponsePBImpl to properly invoke maybeInitBuilder (was: RegisterApplicationMasterResponsePBImpl: not properly initialized builder) Fixed RegisterApplicationMasterResponsePBImpl to properly invoke maybeInitBuilder - Key: YARN-2731 URL: https://issues.apache.org/jira/browse/YARN-2731 Project: Hadoop YARN Issue Type: Bug Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.7.0 Attachments: YARN-2731.patch If I am not mistaken in RegisterApplicationMasterResponsePBImpl we are missing to initialize the builder in setNMTokensFromPreviousAttempts(), and we initialize the builder in the wrong place in: setClientToAMTokenMasterKey -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3076) Client side implementation to retrieve label to node mapping
Varun Saxena created YARN-3076: -- Summary: Client side implementation to retrieve label to node mapping Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3030) set up ATS writer with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3030: -- Attachment: YARN-3030.002.patch Posted patch v.2. Basically switched PerNodeAggregator to AuxiliaryService. Ran the test-patch script: {color:green}+1 overall{color}. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version ) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. set up ATS writer with basic request serving structure and lifecycle Key: YARN-3030 URL: https://issues.apache.org/jira/browse/YARN-3030 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3030.001.patch, YARN-3030.002.patch Per design in YARN-2928, create an ATS writer as a service, and implement the basic service structure including the lifecycle management. Also, as part of this JIRA, we should come up with the ATS client API for sending requests to this ATS writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
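For readers unfamiliar with the mechanism: switching the per-node aggregator to an auxiliary service means extending the NM's {{AuxiliaryService}} hook, roughly as in this bare-bones sketch (class and service names are placeholders, not the patch's actual code):
{code}
import java.nio.ByteBuffer;
import org.apache.hadoop.yarn.server.api.ApplicationInitializationContext;
import org.apache.hadoop.yarn.server.api.ApplicationTerminationContext;
import org.apache.hadoop.yarn.server.api.AuxiliaryService;

public class PerNodeAggregatorService extends AuxiliaryService {

  public PerNodeAggregatorService() {
    super("timeline_aggregator"); // placeholder service name
  }

  @Override
  public void initializeApplication(ApplicationInitializationContext context) {
    // Set up per-application aggregation state when an application's
    // first container starts on this node.
  }

  @Override
  public void stopApplication(ApplicationTerminationContext context) {
    // Tear down per-application state when the application finishes.
  }

  @Override
  public ByteBuffer getMetaData() {
    // No metadata to hand back to AMs in this sketch.
    return ByteBuffer.allocate(0);
  }
}
{code}
Like the shuffle service, such a service would be registered with the NM through the yarn.nodemanager.aux-services configuration.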
[jira] [Commented] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285164#comment-14285164 ] Varun Saxena commented on YARN-3077: Sure...go ahead. I have unassigned it RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If user specified a customer value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, yarn should create parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3077: --- Assignee: (was: Varun Saxena) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If user specified a customer value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, yarn should create parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3078) LogCLIHelpers lacks of a blank space before string 'does not exist'
sam liu created YARN-3078: - Summary: LogCLIHelpers lacks of a blank space before string 'does not exist' Key: YARN-3078 URL: https://issues.apache.org/jira/browse/YARN-3078 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: trunk-win Reporter: sam liu Priority: Minor LogCLIHelpers lacks a blank space before the string 'does not exist', which produces an incorrect return message. For example, I ran the command 'yarn logs -applicationId application_1421742816585_0003', and the return message includes 'logs/application_1421742816585_0003does not exist'. Obviously this is incorrect; the correct return message should be 'logs/application_1421742816585_0003 does not exist' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
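The fix is presumably a one-character change to the message construction, along these lines (variable name invented; the real code lives in LogCLIHelpers):
{code}
// Before: path and message are concatenated without a separator,
// yielding ".../application_1421742816585_0003does not exist".
System.out.println(remoteAppLogDir + "does not exist");

// After: add the missing blank space before the message.
System.out.println(remoteAppLogDir + " does not exist");
{code}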
[jira] [Commented] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285174#comment-14285174 ] Chun Chen commented on YARN-3077: - Thanks. :) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If user specified a customer value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, yarn should create parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3037) create HBase cluster backing storage implementation for ATS writes
[ https://issues.apache.org/jira/browse/YARN-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285004#comment-14285004 ] Sangjin Lee commented on YARN-3037: --- Agreed. We should definitely work together on this. I think this depends on the object model (YARN-3041) and since the new object model will be different than the existing one (esp. regarding flows, etc.), I suspect the implementation would be quite different. Nonetheless, we should be able to use learnings from the previous work that's done. We should have more discussions on this as you said. create HBase cluster backing storage implementation for ATS writes -- Key: YARN-3037 URL: https://issues.apache.org/jira/browse/YARN-3037 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Per design in YARN-2928, create a backing storage implementation for ATS writes based on a full HBase cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285161#comment-14285161 ] Chun Chen commented on YARN-3077: - [~varun_saxena] I would like to do it myself. RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Assignee: Varun Saxena If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If user specified a customer value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, yarn should create parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3078) LogCLIHelpers lacks of a blank space before string 'does not exist'
[ https://issues.apache.org/jira/browse/YARN-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam liu updated YARN-3078: -- Attachment: YARN-3078.001.patch This fix resolves the issue. Could an expert please review the patch and assign this JIRA to me? Thanks! LogCLIHelpers lacks of a blank space before string 'does not exist' --- Key: YARN-3078 URL: https://issues.apache.org/jira/browse/YARN-3078 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: trunk-win Reporter: sam liu Priority: Minor Attachments: YARN-3078.001.patch LogCLIHelpers lacks a blank space before the string 'does not exist', which produces an incorrect return message. For example, I ran the command 'yarn logs -applicationId application_1421742816585_0003', and the return message includes 'logs/application_1421742816585_0003does not exist'. Obviously this is incorrect; the correct return message should be 'logs/application_1421742816585_0003 does not exist' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3078) LogCLIHelpers lacks of a blank space before string 'does not exist'
[ https://issues.apache.org/jira/browse/YARN-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285160#comment-14285160 ] Hadoop QA commented on YARN-3078: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12693495/YARN-3078.001.patch against trunk revision 73b72a0. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6369//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6369//console This message is automatically generated. LogCLIHelpers lacks of a blank space before string 'does not exist' --- Key: YARN-3078 URL: https://issues.apache.org/jira/browse/YARN-3078 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: trunk-win Reporter: sam liu Priority: Minor Attachments: YARN-3078.001.patch LogCLIHelpers lacks of a blank space before string 'does not exist' and it will bring incorrect return message. For example, I ran command 'yarn logs -applicationId application_1421742816585_0003', and the return message includes 'logs/application_1421742816585_0003does not exist'. Obviously it's incorrect and the correct return message should be 'logs/application_1421742816585_0003 does not exist' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
Chun Chen created YARN-3077: --- Summary: RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If the user specifies a custom value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, yarn should create the parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
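Creating the parent path recursively would amount to something like the following against the raw ZooKeeper client (a sketch with simplified ACL and error handling; the actual RM state store code may differ):
{code}
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;

// Create every missing znode on the way to parentPath, e.g. for
// "/rmstore/cluster1" create "/rmstore" and then "/rmstore/cluster1".
static void createParentPath(ZooKeeper zk, String parentPath, List<ACL> acls)
    throws KeeperException, InterruptedException {
  StringBuilder path = new StringBuilder();
  for (String segment : parentPath.split("/")) {
    if (segment.isEmpty()) {
      continue; // skip the empty segment before the leading "/"
    }
    path.append("/").append(segment);
    if (zk.exists(path.toString(), false) == null) {
      try {
        zk.create(path.toString(), new byte[0], acls, CreateMode.PERSISTENT);
      } catch (KeeperException.NodeExistsException e) {
        // benign race: another process created it between exists() and create()
      }
    }
  }
}
{code}
Curator users get the same behavior from creatingParentsIfNeeded() on the create builder.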
[jira] [Updated] (YARN-2868) Add metric for initial container launch time
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-2868: - Attachment: YARN-2868.006.patch Implement changes based on feedback. Add metric for initial container launch time Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
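The metric being added boils down to timestamping the first successful allocation relative to when the attempt started asking for containers, conceptually like this sketch (class name invented; the patch wires the value into RM metrics rather than printing it):
{code}
// Minimal illustration of "initial container launch time": the latency
// from the start of allocation to the first container actually allocated.
class FirstAllocationTimer {
  private final long startMillis = System.currentTimeMillis();
  private boolean reported = false;

  // Call on every successful allocation; only the first one is timed.
  synchronized void onContainerAllocated() {
    if (!reported) {
      long latencyMs = System.currentTimeMillis() - startMillis;
      System.out.println("First container allocated after " + latencyMs + " ms");
      reported = true;
    }
  }
}
{code}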
[jira] [Created] (YARN-3075) Server side implementation to retrieve label to node mapping
Varun Saxena created YARN-3075: -- Summary: Server side implementation to retrieve label to node mapping Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3075) Server side implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3075: - Issue Type: Sub-task (was: Task) Parent: YARN-2492 Server side implementation to retrieve label to node mapping Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3076) Client side implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3076: - Issue Type: Sub-task (was: Task) Parent: YARN-2492 Client side implementation to retrieve label to node mapping Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3028) Better syntax for replace label CLI
[ https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3028: - Attachment: 0001-YARN-3028.patch Better syntax for replace label CLI --- Key: YARN-3028 URL: https://issues.apache.org/jira/browse/YARN-3028 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3028.patch The command to replace label now is such: {code} yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2] {code} Instead of {code} node1:port,label1,label2 {code} I think it's better to say {code} node1:port=label1,label2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
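For reference, parsing the proposed {{node1:port=label1,label2}} form is straightforward; a sketch follows (not the patch's code, which also keeps the older comma-separated syntax):
{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Parse "node1:port=label1,label2 node2:port=label1" into
// a host:port -> labels map.
static Map<String, List<String>> parseNodeToLabels(String args) {
  Map<String, List<String>> result = new HashMap<String, List<String>>();
  for (String nodeSpec : args.trim().split("\\s+")) {
    String[] parts = nodeSpec.split("=", 2);
    String node = parts[0]; // e.g. node1:port
    String[] labels =
        parts.length > 1 ? parts[1].split(",") : new String[0];
    result.put(node, Arrays.asList(labels));
  }
  return result;
}
{code}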
[jira] [Commented] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized
[ https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285287#comment-14285287 ] Xuan Gong commented on YARN-3024: - Yes, I will take a look shortly. LocalizerRunner should give DIE action when all resources are localized --- Key: YARN-3024 URL: https://issues.apache.org/jira/browse/YARN-3024 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-3024.01.patch, YARN-3024.02.patch, YARN-3024.03.patch We have observed that {{LocalizerRunner}} always gives a LIVE action at the end of localization process. The problem is {{findNextResource()}} can return null even when {{pending}} was not empty prior to the call. This method removes localized resources from {{pending}}, therefore we should check the return value, and gives DIE action when it returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
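The change under review boils down to acting on the return value of {{findNextResource()}}; schematically (types reduced to a local enum for illustration, where the real code builds a LocalizerHeartbeatResponse):
{code}
enum LocalizerAction { LIVE, DIE }

// findNextResource() removes already-localized resources from "pending",
// so it can return null even if "pending" was non-empty before the call.
LocalizerAction decideAction(Object nextResource, boolean pendingEmpty) {
  if (nextResource == null && pendingEmpty) {
    // Everything is localized: tell the localizer to exit now instead
    // of keeping it alive for another heartbeat round.
    return LocalizerAction.DIE;
  }
  return LocalizerAction.LIVE;
}
{code}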
[jira] [Commented] (YARN-3028) Better syntax for replace label CLI
[ https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285256#comment-14285256 ] Hadoop QA commented on YARN-3028: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12693515/0001-YARN-3028.patch against trunk revision 6b17eb9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA org.apache.hadoop.yarn.client.api.impl.TestYarnClient Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6370//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6370//console This message is automatically generated. Better syntax for replace label CLI --- Key: YARN-3028 URL: https://issues.apache.org/jira/browse/YARN-3028 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3028.patch The command to replace label now is such: {code} yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2] {code} Instead of {code} node1:port,label1,label2 {code} I think it's better to say {code} node1:port=label1,label2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun Chen updated YARN-3077: Attachment: YARN-3077.patch RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Attachments: YARN-3077.patch If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If user specified a customer value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, yarn should create parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3057) Need update apps' runnability when reloading allocation files for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-3057: --- Attachment: YARN-3057.patch Need update apps' runnability when reloading allocation files for FairScheduler --- Key: YARN-3057 URL: https://issues.apache.org/jira/browse/YARN-3057 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3057.patch If we submit an app and the number of running apps in its corresponding leaf queue has reached its max limit, the app will be put into 'nonRunnableApps', and its runnability will only be updated when an appattempt is removed (FairScheduler calls `updateRunnabilityOnAppRemoval` at that time). Suppose only service apps are running: they will not finish, so the submitted app will never be scheduled even if we raise the leaf queue's max limit. I think we need to update apps' runnability when reloading allocation files for FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2896) Server side PB changes for Priority Label Manager and Admin CLI support
[ https://issues.apache.org/jira/browse/YARN-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285090#comment-14285090 ] Sunil G commented on YARN-2896: --- Thank you [~eepayne] and [~wangda] for the comments. The idea of keeping Application Priority as a string is better handling and ease of use from the user's perspective. Internally, the RM will have a corresponding integer mapping, and only that will be used by the schedulers. Hence, as Wangda mentioned, it can be treated just like an integer with user priority etc. A rough idea: a user submits a job with priority "High" and the scheduler treats it as an integer, namely 3. The Priority Label Manager will act as an interface between the user and the scheduler and can give the priority as a string or an integer accordingly. Now coming to the advantages: an admin can operate on names or labels for priority, which is easier, and the labels can be displayed very easily in the UI. An admin can also configure the priority labels to match local conventions by defining the corresponding integer mapping for each label. For example:
{noformat}
yarn.cluster.priority-labels=low:1,medium:3,high:5
{noformat}
Configuring ACLs based on a priority label name will also be easier:
{noformat}
yarn.scheduler.capacity.root.queueA.High.acl=user1,user2
{noformat}
Please share your thoughts. I will address the other comments from Eric and will update a patch. Server side PB changes for Priority Label Manager and Admin CLI support --- Key: YARN-2896 URL: https://issues.apache.org/jira/browse/YARN-2896 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2896.patch, 0002-YARN-2896.patch, 0003-YARN-2896.patch, 0004-YARN-2896.patch Common changes: * PB support changes required for Admin APIs * PB support for File System store (Priority Label Store) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
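A sketch of parsing the example mapping above into label-to-integer form (the property and format come from the example in the comment, not from committed code):
{code}
import java.util.HashMap;
import java.util.Map;

// Parse "low:1,medium:3,high:5" into {low=1, medium=3, high=5}.
static Map<String, Integer> parsePriorityLabels(String value) {
  Map<String, Integer> labelToPriority = new HashMap<String, Integer>();
  for (String entry : value.split(",")) {
    String[] kv = entry.split(":", 2);
    labelToPriority.put(kv[0].trim(), Integer.parseInt(kv[1].trim()));
  }
  return labelToPriority;
}
{code}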
[jira] [Assigned] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3077: -- Assignee: Varun Saxena RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Assignee: Varun Saxena If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If user specified a customer value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, yarn should create parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3028) Better syntax for replace label CLI
[ https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285213#comment-14285213 ] Rohith commented on YARN-3028: -- Attached a patch that supports both syntaxes; kindly review. Better syntax for replace label CLI --- Key: YARN-3028 URL: https://issues.apache.org/jira/browse/YARN-3028 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3028.patch The command to replace label now is such: {code} yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2] {code} Instead of {code} node1:port,label1,label2 {code} I think it's better to say {code} node1:port=label1,label2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized
[ https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285270#comment-14285270 ] Chengbing Liu commented on YARN-3024: - [~xgong] Could you please review the changed logic described in my last comment? This patch will save at least one second for each localization process. LocalizerRunner should give DIE action when all resources are localized --- Key: YARN-3024 URL: https://issues.apache.org/jira/browse/YARN-3024 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-3024.01.patch, YARN-3024.02.patch, YARN-3024.03.patch We have observed that {{LocalizerRunner}} always gives a LIVE action at the end of localization process. The problem is {{findNextResource()}} can return null even when {{pending}} was not empty prior to the call. This method removes localized resources from {{pending}}, therefore we should check the return value, and gives DIE action when it returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285306#comment-14285306 ] Hadoop QA commented on YARN-2868: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12693520/YARN-2868.006.patch against trunk revision 6b17eb9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6371//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6371//console This message is automatically generated. Add metric for initial container launch time Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3074) Nodemanager dies when localizer runner tries to write to a full disk
Jason Lowe created YARN-3074: Summary: Nodemanager dies when localizer runner tries to write to a full disk Key: YARN-3074 URL: https://issues.apache.org/jira/browse/YARN-3074 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe When a LocalizerRunner tries to write to a full disk it can bring down the nodemanager process. Instead of failing the whole process we should fail only the container and make a best attempt to keep going. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3074) Nodemanager dies when localizer runner tries to write to a full disk
[ https://issues.apache.org/jira/browse/YARN-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284243#comment-14284243 ] Jason Lowe commented on YARN-3074: -- Sample stacktrace:
{noformat}
2015-01-16 12:06:56,399 [LocalizerRunner for container_1416815736267_3849544_01_000817] FATAL yarn.YarnUncaughtExceptionHandler: Thread Thread[LocalizerRunner for container_1416815736267_3849544_01_000817,5,main] threw an Error. Shutting down now...
org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:226)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:157)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
        at org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.close(ChecksumFs.java:366)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.writeCredentials(ResourceLocalizationService.java:1125)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1068)
Caused by: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:318)
        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:224)
        ... 10 more
{noformat}
Looks like the Hadoop filesystem layer helpfully changed what was originally an IOException into an FSError. FSError is _not_ an Exception, so the try...catch(Exception) block in LocalizerRunner.run doesn't catch it. It then bubbles up to the top of the thread, and the uncaught exception handler kills the whole process. We should consider catching Throwable rather than Exception in LocalizerRunner.run, or at least also catch FSError since it will be a common and recoverable error in this case. Nodemanager dies when localizer runner tries to write to a full disk Key: YARN-3074 URL: https://issues.apache.org/jira/browse/YARN-3074 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe When a LocalizerRunner tries to write to a full disk it can bring down the nodemanager process. Instead of failing the whole process we should fail only the container and make a best attempt to keep going. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
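The suggested hardening is simply to widen what the runner catches; a sketch of the shape (not the actual LocalizerRunner code, and the cleanup hook named in the comment is hypothetical):
{code}
import org.apache.hadoop.fs.FSError;

public void run() {
  try {
    // ... localize resources, write credentials, etc. ...
  } catch (FSError e) {
    // FSError extends Error, so catch (Exception) misses it; a full disk
    // surfaces here and should fail only this container, not the NM.
    // failContainer(e); // hypothetical per-container cleanup
  } catch (Exception e) {
    // existing handling: fail the container and keep the NM alive
  }
}
{code}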
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284290#comment-14284290 ] Wangda Tan commented on YARN-1039: -- For task placement of long-lived requests, YARN-796 could take care of deciding which instance should run a specific long-lived request. Users can either manually specify the labels they want for such long-lived containers, or rules can be added on the scheduler side to configure and attach labels to long-lived requests automatically. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3003: --- Attachment: YARN-3003.002.patch Provide API for client to retrieve label to node mapping Key: YARN-3003 URL: https://issues.apache.org/jira/browse/YARN-3003 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Ted Yu Assignee: Varun Saxena Attachments: YARN-3003.001.patch, YARN-3003.002.patch Currently, YarnClient#getNodeToLabels() returns the mapping from NodeId to the set of labels associated with each node. Clients (such as Slider) may be interested in the label-to-node mapping: given a label, return the nodes carrying that label. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
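Until such an API lands, a client can derive the inverse mapping itself from the call the description mentions. A minimal sketch, assuming only YarnClient#getNodeToLabels() as described above (Java 7 style, matching the codebase of the time):
{code}
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class LabelsToNodesSketch {
  // Invert Map<NodeId, Set<String>> into Map<String, Set<NodeId>> on the
  // client side. Works, but every caller pays for fetching and inverting
  // the full node-to-labels map, which is the efficiency concern raised
  // in the discussion below.
  public static Map<String, Set<NodeId>> labelsToNodes(YarnClient client)
      throws YarnException, IOException {
    Map<String, Set<NodeId>> inverse = new HashMap<String, Set<NodeId>>();
    for (Map.Entry<NodeId, Set<String>> e :
        client.getNodeToLabels().entrySet()) {
      for (String label : e.getValue()) {
        Set<NodeId> nodes = inverse.get(label);
        if (nodes == null) {
          nodes = new HashSet<NodeId>();
          inverse.put(label, nodes);
        }
        nodes.add(e.getKey());
      }
    }
    return inverse;
  }
}
{code}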
[jira] [Commented] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284224#comment-14284224 ] Wangda Tan commented on YARN-3003: -- Thanks for providing your thoughts! On Naga's first point: I think performance is one concern (as mentioned by Varun, we may need to rewrite some parts of the code to make it efficient), and we need more solid use cases / input for that. As a first step, we can focus on simply receiving a set of labels as input and returning a labels-to-nodes mapping. On the second point, I think it's important to keep it in memory; not much extra memory is needed to hold such a mapping. And thanks to [~varun_saxena] for working on this. One suggestion: could you split it into two parts? One adds the getLabelsToNodes API and implementation to NodeLabelsManager; the other adds the API and implementation to YarnClient (the latter depends on the former). Sounds good? Provide API for client to retrieve label to node mapping Key: YARN-3003 URL: https://issues.apache.org/jira/browse/YARN-3003 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Ted Yu Assignee: Varun Saxena Attachments: YARN-3003.001.patch Currently, YarnClient#getNodeToLabels() returns the mapping from NodeId to the set of labels associated with each node. Clients (such as Slider) may be interested in the label-to-node mapping: given a label, return the nodes carrying that label. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
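In code terms, the suggested split might look like the interfaces below. These are shapes inferred from the comment above, not committed API; the exact names, argument lists, and placement are assumptions.
{code}
import java.io.IOException;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Part 1 (proposed): an in-memory mapping maintained by the RM-side
// node labels manager, queryable for all labels or a requested subset.
interface NodeLabelsManagerAddition {
  Map<String, Set<NodeId>> getLabelsToNodes();
  Map<String, Set<NodeId>> getLabelsToNodes(Set<String> labels);
}

// Part 2 (proposed): the client-facing API on YarnClient, backed by a
// new RM protocol call; it depends on part 1 being in place.
interface YarnClientAddition {
  Map<String, Set<NodeId>> getLabelsToNodes()
      throws YarnException, IOException;
}
{code}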
[jira] [Commented] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284231#comment-14284231 ] Varun Saxena commented on YARN-3003: [~leftnoteasy], you mean break this JIRA into two parts? Provide API for client to retrieve label to node mapping Key: YARN-3003 URL: https://issues.apache.org/jira/browse/YARN-3003 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Ted Yu Assignee: Varun Saxena Attachments: YARN-3003.001.patch Currently, YarnClient#getNodeToLabels() returns the mapping from NodeId to the set of labels associated with each node. Clients (such as Slider) may be interested in the label-to-node mapping: given a label, return the nodes carrying that label. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284232#comment-14284232 ] Jian Fang commented on YARN-1039: - Thanks, Steve, for your clarification. The long-lived concept seems to make sense now if this flag is associated with a policy switch in YARN. I think the above is only one part of the story; the cluster infrastructure itself is probably another part we need to consider, just like the spot instance feature in EC2 mentioned in this JIRA. The long-lived concept has broader implications for Hadoop clusters in a cloud environment. For example, the instance type could affect container scheduling. We should also take this concept into consideration for elastic features such as graceful expansion and shrinking of a cluster in the cloud. On the other side, I still think YARN-796 should be used together with the long-lived concept. For example, how would the resource manager know which instance should run a long-lived daemon/task? There should be a mapping between the long-lived concept and the tags/labels provided by the instance, right? Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter, long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot-priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284288#comment-14284288 ] Wangda Tan commented on YARN-1039: -- I agree with Carlo on this point. Duration can cover both long-lived and short-lived. It may be hard to estimate the exact running time of a container, but a rough estimate can help the scheduler make better decisions and provide the corresponding information to the user, as Steve mentioned. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter, long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot-priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3020) n similar addContainerRequest()s produce n*(n+1)/2 containers
[ https://issues.apache.org/jira/browse/YARN-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284301#comment-14284301 ] Peter D Kirchner commented on YARN-3020: https://issues.apache.org/jira/secure/ViewProfile.jspa?name=ywskycn : Please take a look at this snippet modifying distributedShell, and the output, and perhaps you will get my point. Observe that the accounting behind what gets sent to the RM on heartbeats following either addContainerRequest() or removeContainerRequest() is defective. 100 containers are assigned as the result of this code, which ostensibly requests only 10: 10 adds, with interleaved heartbeats, followed by 10 removes that, with interleaved heartbeats, should be no-ops. 55 containers result from the adds (1+2+3+4+5+6+7+8+9+10). 45 additional containers are requested as the result of the 10 calls to remove (9+8+7+6+5+4+3+2+1).
{code}
for (int i = 0; i < 20; i++) {
  try {
    ContainerRequest containerAsk = setupContainerAskForRM();
    if (i < 10) {
      amRMClient.addContainerRequest(containerAsk);
    } else {
      amRMClient.removeContainerRequest(containerAsk);
    }
    Thread.sleep(1500);
    List list1 = amRMClient.getMatchingRequests(containerAsk.getPriority(),
        "*", containerAsk.getCapability());
    LinkedHashSet set1 = (java.util.LinkedHashSet) (list1.get(0));
    System.out.println("i=" + i + " outstanding=" + set1.size());
  } catch (InterruptedException e1) {
    e1.printStackTrace();
  }
}
{code}
{noformat}
DEBUG [Thread-7] (AMRMClientImpl.java:585) - addResourceRequest: applicationId= priority=0 resourceName=* numContainers=1 #asks=1
i=0 outstanding=1
DEBUG [Thread-7] (AMRMClientImpl.java:585) - addResourceRequest: applicationId= priority=0 resourceName=* numContainers=2 #asks=1
i=1 outstanding=2
DEBUG [Thread-7] (AMRMClientImpl.java:585) - addResourceRequest: applicationId= priority=0 resourceName=* numContainers=3 #asks=1
i=2 outstanding=3
DEBUG [Thread-7] (AMRMClientImpl.java:585) - addResourceRequest: applicationId= priority=0 resourceName=* numContainers=4 #asks=1
i=3 outstanding=4
DEBUG [Thread-7] (AMRMClientImpl.java:585) - addResourceRequest: applicationId= priority=0 resourceName=* numContainers=5 #asks=1
i=4 outstanding=5
DEBUG [Thread-7] (AMRMClientImpl.java:585) - addResourceRequest: applicationId= priority=0 resourceName=* numContainers=6 #asks=1
i=5 outstanding=6
DEBUG [Thread-7] (AMRMClientImpl.java:585) - addResourceRequest: applicationId= priority=0 resourceName=* numContainers=7 #asks=1
i=6 outstanding=7
DEBUG [Thread-7] (AMRMClientImpl.java:585) - addResourceRequest: applicationId= priority=0 resourceName=* numContainers=8 #asks=1
i=7 outstanding=8
DEBUG [Thread-7] (AMRMClientImpl.java:585) - addResourceRequest: applicationId= priority=0 resourceName=* numContainers=9 #asks=1
i=8 outstanding=9
DEBUG [Thread-7] (AMRMClientImpl.java:585) - addResourceRequest: applicationId= priority=0 resourceName=* numContainers=10 #asks=1
i=9 outstanding=10
DEBUG [Thread-7] (AMRMClientImpl.java:619) - BEFORE decResourceRequest: applicationId= priority=0 resourceName=* numContainers=10 #asks=0
INFO [Thread-7] (AMRMClientImpl.java:652) - AFTER decResourceRequest: applicationId= priority=0 resourceName=* numContainers=9 #asks=1
i=10 outstanding=10
DEBUG [Thread-7] (AMRMClientImpl.java:619) - BEFORE decResourceRequest: applicationId= priority=0 resourceName=* numContainers=9 #asks=0
INFO [Thread-7] (AMRMClientImpl.java:652) - AFTER decResourceRequest: applicationId= priority=0 resourceName=* numContainers=8 #asks=1
i=11 outstanding=10
DEBUG [Thread-7] (AMRMClientImpl.java:619) - BEFORE decResourceRequest: applicationId= priority=0 resourceName=* numContainers=8 #asks=0
INFO [Thread-7] (AMRMClientImpl.java:652) - AFTER decResourceRequest: applicationId= priority=0 resourceName=* numContainers=7 #asks=1
i=12 outstanding=10
DEBUG [Thread-7] (AMRMClientImpl.java:619) - BEFORE decResourceRequest: applicationId= priority=0 resourceName=* numContainers=7 #asks=0
INFO [Thread-7] (AMRMClientImpl.java:652) - AFTER decResourceRequest: applicationId= priority=0 resourceName=* numContainers=6 #asks=1
i=13 outstanding=10
DEBUG [Thread-7] (AMRMClientImpl.java:619) - BEFORE decResourceRequest: applicationId= priority=0 resourceName=* numContainers=6 #asks=0
INFO [Thread-7] (AMRMClientImpl.java:652) - AFTER decResourceRequest: applicationId= priority=0 resourceName=* numContainers=5 #asks=1
i=14 outstanding=10
DEBUG [Thread-7] (AMRMClientImpl.java:619) - BEFORE decResourceRequest: applicationId= priority=0 resourceName=* numContainers=5 #asks=0
INFO [Thread-7] (AMRMClientImpl.java:652) - AFTER decResourceRequest: applicationId= priority=0 resourceName=* numContainers=4 #asks=1
i=15 outstanding=10
DEBUG [Thread-7] (AMRMClientImpl.java:619) - BEFORE decResourceRequest: applicationId= priority=0 resourceName=* numContainers=4 #asks=0
INFO [Thread-7] (AMRMClientImpl.java:652) - AFTER
{noformat}
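For completeness, the totals above match the triangular-number pattern the issue title names (n*(n+1)/2 for n similar requests). A throwaway check of the arithmetic, nothing more:
{code}
public class TriangularCheck {
  public static void main(String[] args) {
    int fromAdds = 0, fromRemoves = 0;
    for (int k = 1; k <= 10; k++) fromAdds += k;   // 1+2+...+10 = 55
    for (int k = 1; k <= 9; k++) fromRemoves += k; // 9+8+...+1  = 45
    System.out.println(fromAdds + fromRemoves);    // prints 100
  }
}
{code}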
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284304#comment-14284304 ] Craig Welch commented on YARN-1039: --- As I understand it (and I may be wrong on this), the original intent of this JIRA was to provide a boolean switch to control a set of behaviors expected to be important for a long-running service - among other things, what sort of nodes to schedule on and how to handle logs. This could be on a sliding scale based on duration, but I'm not sure that works so well - at what duration do we start to change how we handle logs and/or where we schedule things? While related, I think that converting this from a boolean to a range will make it more difficult to use for the intended use case. I also think that packing all of these behaviors into one parameter might be a negative overall. To [~john.jian.fang]'s point, using this to determine where to schedule tasks so as to avoid spot instances and the like has, as of now, really been superseded by Node Labels, and I do not think we should add additional functionality for that here - Node Labels is the way to handle that part of the use case. That leaves, potentially among other things, affinity/anti-affinity (not scheduling long-running tasks together, or deliberately scheduling them together) and log handling (how do we tell the system we want long-running-service log handling, if in fact the system needs to be told that). I submit that it would be better to have separate solutions for each of these needs, which can be bundled together to achieve the overall use case; I think that will provide better control without adding too much complexity for the end user. That means breaking this out into affinity/anti-affinity and logging configuration. We could always have a single convenience parameter (like this one) that sets the others; I'm not sure we'll actually need it, but I do think splitting the bundled functionality into individual items (some of which may already be being worked on elsewhere) is the way to go. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter, long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot-priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)