[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling
    [ https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201113#comment-14201113 ]

Aditya Kishore commented on YARN-2791:
--------------------------------------

Great! I think this JIRA should be added as a sub-task, as none of the other sub-tasks cover this aspect.

> Add Disk as a resource for scheduling
> -------------------------------------
>
>                 Key: YARN-2791
>                 URL: https://issues.apache.org/jira/browse/YARN-2791
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: scheduler
>    Affects Versions: 2.5.1
>            Reporter: Swapnil Daingade
>            Assignee: Yuliya Feldman
>
> Currently, the number of disks present on a node is not considered a factor
> while scheduling containers on that node. Having a large amount of memory on a
> node can lead to a high number of containers being launched on that node, all
> of which compete for I/O bandwidth. This multiplexing of I/O across
> containers can lead to slower overall progress and sub-optimal resource
> utilization, as containers starved for I/O bandwidth hold on to other
> resources like CPU and memory. This problem can be solved by considering disk
> as a resource and including it in deciding how many containers can be
> concurrently run on a node.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling
    [ https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200929#comment-14200929 ]

Aditya Kishore commented on YARN-2791:
--------------------------------------

I can see how YARN-2139 and this JIRA could be seen as related, especially given the terse summary of this JIRA; however, they aim to address two different concerns. YARN-2139 is about disk resource scheduling isolation and throttling at execution time, while this one concerns the capacity planning/resource allocation phase.

So, either
1) this JIRA continues on its own, with its own design discussion here, since the concerns differ from those discussed on YARN-2139, or
2) we widen the scope of YARN-2139 and add this as a sub-task.

Since I see a clear separation of concerns, I would prefer the first option, but would be okay with the second too.
[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling
    [ https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200823#comment-14200823 ]

Aditya Kishore commented on YARN-2791:
--------------------------------------

I think the summary of this JIRA may make it seem like a duplicate of YARN-2139, but it is not. YARN-2139 aims to address throttling/isolation of disk I/O on a per-container basis. However, from the description it seems that the purpose of this JIRA is to include the node's disks as a parameter in the node's capacity calculation, alongside its memory and CPU cores.

Maybe the summary should be reworded to reflect this.
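To make the capacity-calculation idea concrete, here is a minimal sketch of how counting disks alongside memory and CPU cores could change the number of containers a node accepts. It is not YARN code: the `Resources` class, the `vdisks` unit (an assumed analogue of vcores), the `max_containers` helper, and all of the numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource vector; field names are illustrative, not YARN's API."""
    memory_mb: int
    vcores: int
    vdisks: int  # assumed unit: a share of one physical disk, analogous to a vcore


def max_containers(node: Resources, per_container: Resources, use_disk: bool) -> int:
    """Return how many identical containers fit on the node.

    The tightest resource decides; disk only participates when use_disk is True.
    """
    limits = [
        node.memory_mb // per_container.memory_mb,
        node.vcores // per_container.vcores,
    ]
    if use_disk:
        limits.append(node.vdisks // per_container.vdisks)
    return min(limits)


# Assumed node: 256 GB of memory, 32 vcores, 6 disks exposed as 4 vdisks each.
node = Resources(memory_mb=262144, vcores=32, vdisks=24)
# Assumed container request: 2 GB of memory, 1 vcore, 1 vdisk.
container = Resources(memory_mb=2048, vcores=1, vdisks=1)

without_disk = max_containers(node, container, use_disk=False)
with_disk = max_containers(node, container, use_disk=True)
print(without_disk, with_disk)
```

Under these assumptions the node would accept 32 containers when only memory and vcores are counted, and 24 once the disk dimension is included, so fewer containers compete for the same I/O bandwidth.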