[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2015-01-14 Thread Swapnil Daingade (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277979#comment-14277979
 ] 

Swapnil Daingade commented on YARN-2791:


Hi [~kasha]] and [~vinodkv]],
Went over the latest design doc for YARN-2139 and posted my comments.


 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman
 Attachments: DiskDriveAsResourceInYARN.pdf


 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2015-01-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276107#comment-14276107
 ] 

Vinod Kumar Vavilapalli commented on YARN-2791:
---

Okay folks, I've read the design docs on both YARN-2791 (this JIRA) and 
YARN-2139. This is indeed part of YARN-2139, and a direct dup of YARN-2618 and 
other tickets.

Yes YARN-2139 is a much larger effort but it encompasses both scheduling and 
isolation. The important tickets of YARN-2139 already were created before this 
JIRA. I am going to close this as a duplicate in a day unless I see specific 
tasks that are not covered under YARN-2139. If there are things that are not 
covered indeed, I urge  Swapnil Daingade, Santosh Marellaand and Yuliya Feldman 
to file sub-tasks under YARN-2791.

As Karthik appealed before, let's have the design discussion over at YARN-2139, 
merging things that are only here and missing in that JIRA. Due credit will be 
given to all contributors to the design and implementation there.

I am oblivious who contributes code, but let's work together please!

 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman
 Attachments: DiskDriveAsResourceInYARN.pdf


 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2015-01-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276113#comment-14276113
 ] 

Vinod Kumar Vavilapalli commented on YARN-2791:
---

Oh and if there is existing code around this JIRA, I urge you to either post it 
to the corresponding sub-task at YARN-2139 or file a new ticket if there isn't 
one already. We can discuss about arch and code designs there.Thanks.

 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman
 Attachments: DiskDriveAsResourceInYARN.pdf


 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2015-01-13 Thread Yuliya Feldman (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276156#comment-14276156
 ] 

Yuliya Feldman commented on YARN-2791:
--

Thank you Vinod for your comment, as [~sdaingade] suggested in his previous 
comment let's make this JIRA as subtask of [YARN-2139 | 
https://issues.apache.org/jira/browse/YARN-2139 ] and do design discussion and 
code walk through if necessary to determine whether it fits into existing 
subtask or it will continue being it's own.
Yes we definitely understand you opened YARN-2139 first. 

 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman
 Attachments: DiskDriveAsResourceInYARN.pdf


 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2015-01-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276218#comment-14276218
 ] 

Karthik Kambatla commented on YARN-2791:


I have no reservations against making this a sub-task of YARN-2139, I just did 
so. However, I still am not clear on what we are accomplishing in this subtask. 
I was hoping (so is Vinod) that we could identify items that are not covered 
under YARN-2139 and file sub-tasks for those and give due credit for filing and 
fixing them. 



 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman
 Attachments: DiskDriveAsResourceInYARN.pdf


 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2015-01-12 Thread M. C. Srivas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14274007#comment-14274007
 ] 

M. C. Srivas commented on YARN-2791:


The scope in https://issues.apache.org/jira/browse/YARN-2139 is just too 
bloated. We have this problem immediately with YARN overprovisioning since it 
doesn't take into account how performance is impacted by the number of disks on 
each node. We need this fix now, not later. YARN-2139 is too elaborate, and is 
trying to do too much. On the the other hand, it doesn't take into account how 
running DataNodes on the same spindles will impact shuffle performance. I would 
say get this piece of work done, and we can wait on YARN-2139 whenever it gets 
done.

 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman
 Attachments: DiskDriveAsResourceInYARN.pdf


 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2015-01-12 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14274171#comment-14274171
 ] 

Wei Yan commented on YARN-2791:
---

[~srivas], YARN-2618 is a simple solution that limits the maximum allowed 
running containers on each node. Each DataNode is configured with a maximum 
disk value, and YARN treats each container's disk request is 1.

 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman
 Attachments: DiskDriveAsResourceInYARN.pdf


 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2014-11-11 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206993#comment-14206993
 ] 

Karthik Kambatla commented on YARN-2791:


Thanks [~sdaingade] for sharing the design doc. Well articulated. 

The designs on YARN-2139 and YARN-2791 are very similar, except for the disk 
resources are called vdisks in YARN-2139 and spindles in YARN-2791. In addition 
to the items specified here, YARN-2139 talks about isolation as well. Other 
than that, do you see any major items YARN-2791 covers that YARN-2139? The 
WebUI is good and very desirable, we should definitely include it. Also, 

I suggest we make this (as is - or split into multiple JIRAs) a sub-task of 
YARN-2139. Discussing the high-level details on one JIRA helps with aligning on 
one final design doc based on everyone's suggestions. 



 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman
 Attachments: DiskDriveAsResourceInYARN.pdf


 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2014-11-11 Thread Swapnil Daingade (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207080#comment-14207080
 ] 

Swapnil Daingade commented on YARN-2791:


Thanks Karthik Kambatla. Sure, lets make this a sub-task of YARN-2139.

 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman
 Attachments: DiskDriveAsResourceInYARN.pdf


 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2014-11-06 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200739#comment-14200739
 ] 

Wei Yan commented on YARN-2791:
---

Hi, [~yufeldman], this jira is same to YARN-2139.

 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman

 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2014-11-06 Thread Aditya Kishore (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200823#comment-14200823
 ] 

Aditya Kishore commented on YARN-2791:
--

I think the summary of this JIRA may seem as duplicate of YARN-2139 but they 
are not. YARN-2139 aims to address throttling/isolation of disk IO on 
individual container basis.

However, from the description it seems that the purpose of this JIRA is to 
include the node's disks as a parameter in the capacity calculation of the node 
alongside with its memory and CPU cores. May be the summary should be reworded 
to reflect this.

 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman

 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2014-11-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200829#comment-14200829
 ] 

Karthik Kambatla commented on YARN-2791:


[~adityakishore] - from reading the description, I believe there is at least a 
significant overlap between the two JIRAs. I think we would benefit from 
consolidating them and working together, than take multiple paths. 

[~sdaingade] - nice to know you have something working. Could you look through 
the design doc on YARN-2139 so we can refine it. If it is significantly 
different from what is posted there, can you also post your design so we can 
evaluate which one is better and move forward. 

 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman

 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2014-11-06 Thread Yuliya Feldman (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200849#comment-14200849
 ] 

Yuliya Feldman commented on YARN-2791:
--

[~kasha] I agree that consolidating both and working together is a way to go 
here. Let's initiate this.
What I don't quite understand is how 
https://issues.apache.org/jira/browse/YARN-2817 is different from this one or 
from https://issues.apache.org/jira/browse/YARN-2139

 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman

 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2014-11-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200900#comment-14200900
 ] 

Karthik Kambatla commented on YARN-2791:


bq. how https://issues.apache.org/jira/browse/YARN-2817 is different from this 
one or from https://issues.apache.org/jira/browse/YARN-2139

I personally don't think it is. Even if we want to address a simpler usecase, I 
would suggest we do to that as part of YARN-2139. 

 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman

 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2014-11-06 Thread Swapnil Daingade (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200961#comment-14200961
 ] 

Swapnil Daingade commented on YARN-2791:


Karthik Kambatla - I'll go through the design doc for YARN-2139 and post our 
design document here as well. We can then decide if we should have two JIRA's 
or combine them as Aditya Kishore suggested. However I fully support combining 
our efforts on this.

 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman

 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2014-11-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201101#comment-14201101
 ] 

Karthik Kambatla commented on YARN-2791:


[~adityakishore] - unfortunately, YARN-2139 didn't have a description this far. 
I just added a very succinct one. If you look at the design doc itself and the 
sub-tasks created, you would see that YARN-2139 includes adding disk as a 
resource to the resource-request-vector, the node-capacities, and the 
scheduler. 

[~sdaingade] - looking forward to you hear your thoughts and see a design doc.

 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman

 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2014-11-06 Thread Aditya Kishore (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201113#comment-14201113
 ] 

Aditya Kishore commented on YARN-2791:
--

Great! I think this JIRA should be added as a sub-task as non of the other 
sub-tasks cover this aspect.

 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman

 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2014-11-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201291#comment-14201291
 ] 

Wangda Tan commented on YARN-2791:
--

Mistakenly assigned to myself, assigned back.

 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade
Assignee: Yuliya Feldman

 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2014-10-31 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192536#comment-14192536
 ] 

Wei Yan commented on YARN-2791:
---

Hi, [~sdaingade]. Is this a duplicate of YARN-2139? We have worked on the disk 
I/O scheduling for a while, and will post patches soon after more experiments.

 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade

 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling

2014-10-31 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192537#comment-14192537
 ] 

Wei Yan commented on YARN-2791:
---

And very welcome for your comments for the design doc at YARN-2139.

 Add Disk as a resource for scheduling
 -

 Key: YARN-2791
 URL: https://issues.apache.org/jira/browse/YARN-2791
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Swapnil Daingade

 Currently, the number of disks present on a node is not considered a factor 
 while scheduling containers on that node. Having large amount of memory on a 
 node can lead to high number of containers being launched on that node, all 
 of which compete for I/O bandwidth. This multiplexing of I/O across 
 containers can lead to slower overall progress and sub-optimal resource 
 utilization as containers starved for I/O bandwidth hold on to other 
 resources like cpu and memory. This problem can be solved by considering disk 
 as a resource and including it in deciding how many containers can be 
 concurrently run on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)