[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082312#comment-15082312 ]

Subru Krishnan commented on YARN-3870:

Regarding the ID, I am in principle fine with asking the AM to set it. We also have the option of reusing the _responseID_ of *AllocateRequest*, which both the RM and the AM maintain today. It would be good to also link the _responseID_ to the actual allocated container in *AllocateResponse*, as this is a useful hint for AMs. In fact, this has been requested by [~markus.weimer] to simplify certain bookkeeping for the [REEF|http://reef.apache.org/] AM.

> Providing raw container request information for fine scheduling
>
> Key: YARN-3870
> URL: https://issues.apache.org/jira/browse/YARN-3870
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api, applications, capacityscheduler, fairscheduler, resourcemanager, scheduler, yarn
> Reporter: Lei Guo
> Assignee: Karthik Kambatla
>
> Currently, when the AM sends container requests to the RM and scheduler, it expands individual container requests into host/rack/any format. For instance, if I ask for a container with the preference "host1, host2, host3", all in the same rack rack1, then instead of sending one raw container request with the raw preference list to the RM/scheduler, it expands the request into 5 different objects: host1, host2, host3, rack1 and any. By the time the scheduler receives the information, the raw request has already been lost. This is fine for a single container request, but it causes trouble when dealing with multiple container requests from the same application.
> Consider this case, with 6 hosts in two racks:
> rack1 (host1, host2, host3)
> rack2 (host4, host5, host6)
> The application requests two containers with different data-locality preferences:
> c1: host1, host2, host4
> c2: host2, host3, host5
> This ends up as the following container request list when the client sends the request to the RM/scheduler:
> host1: 1 instance
> host2: 2 instances
> host3: 1 instance
> host4: 1 instance
> host5: 1 instance
> rack1: 2 instances
> rack2: 2 instances
> any: 2 instances
> Fundamentally, it is hard for the scheduler to make the right judgement without knowing the raw container requests. The situation gets worse when dealing with affinity, anti-affinity, or even gang scheduling.
> We need some way to provide raw container request information for fine-grained scheduling.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
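To make the information loss concrete, here is a small Java sketch of the expansion the issue describes. It is not the real AMRMClient code: `RequestExpansion`, `expand()` and the host-to-rack map are hypothetical, and the sketch simply adds one count per preferred host, one per distinct rack, and one for `any`. Fed c1 and c2, it reproduces the list above; a different pair of raw requests (c1': host1, host2, host5; c2': host2, host3, host4) yields exactly the same aggregate, so a scheduler that only sees the aggregate cannot tell the two cases apart.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration of the host/rack/any expansion described in the
// issue; this is NOT the actual AMRMClient implementation.
class RequestExpansion {
    static final String ANY = "any";

    // Aggregates raw per-container host preferences into per-location counts:
    // +1 for each preferred host, +1 for each distinct rack those hosts sit
    // in, and +1 for ANY. Which container wanted which host is discarded.
    static Map<String, Integer> expand(List<List<String>> rawRequests,
                                       Map<String, String> hostToRack) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> preferredHosts : rawRequests) {
            Set<String> hosts = new LinkedHashSet<>(preferredHosts);
            Set<String> racks = new LinkedHashSet<>();
            for (String host : hosts) {
                counts.merge(host, 1, Integer::sum);
                racks.add(hostToRack.get(host));
            }
            for (String rack : racks) {
                counts.merge(rack, 1, Integer::sum);
            }
            counts.merge(ANY, 1, Integer::sum);
        }
        return counts;
    }
}
```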
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081242#comment-15081242 ]

Lei Guo commented on YARN-3870:

I am not against combining this JIRA with YARN-371; there is common ground between the two. The final technical solution will more likely be a single one that covers both, though that is not necessary. Maybe we can view YARN-371 as a technical speculation and YARN-3870 as one related use case (if YARN-371 is resolved, YARN-3870 should be covered).

From another angle, YARN-3870 could be resolved via approaches without an ID. Scheduling cares more about the current snapshot of resource requests from applications. The ID is not mandatory: as long as the snapshot provides detailed resource request information, the scheduler can do fine-grained scheduling. The ID mainly helps prevent or handle issues arising from the asynchronous protocol.
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080363#comment-15080363 ]

Karthik Kambatla commented on YARN-3870:

I am not saying we shouldn't do this. I am only saying we should likely discuss this on YARN-371 so folks there are aware of this conversation.

bq. even without an ID, what happens today if an application makes each request with a different Priority ?

In that case, we would still run into too many ResourceRequests; however, this is unlikely to happen on a cluster that isn't compromised. Beyond that, we already hold a lot more information about all running containers today, so storing resource-requests for outstanding requests might not be prohibitively expensive.
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15079334#comment-15079334 ]

Karthik Kambatla commented on YARN-3870:

[~leftnoteasy], agreed that YARN-4485 doesn't necessarily need IDs for all requests corresponding to one task. The JIRAs are related only through the data structures in AppSchedulingInfo:
# If we add IDs as discussed here, we don't need any other data-structure changes except adding a timestamp to each ResourceRequest.
# If we don't add IDs, we might need to store the resource-requests as Map>>>. This is required to handle cases where the AM retracts some of its container requests. Refer to my example above with 3, 7 and 2 containers.
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076775#comment-15076775 ]

Wangda Tan commented on YARN-3870:

[~kasha],

bq. Was fleshing this out further. The number of IDs and hence ResourceRequests could be O(num. outstanding containers) which could pose problems as outlined in YARN-371. In fact, this JIRA is a duplicate of YARN-371: may be, we should close this and continue the discussion there.

As I mentioned above, I think we shouldn't combine YARN-371 and YARN-4485: to me, YARN-4485 is more like an internal change to the scheduler.

Let's say an AM originally requests 1000 containers (T1), then requests 1200 containers (T2), and then, after the scheduler has allocated 100 containers, requests 1200 containers again (T3).
For the original request, the scheduler records: T1, 1000.
After T2, the scheduler records: T1, 1000; T2, 200.
After T3, the scheduler records: T1, 900 (the scheduler allocated 100 containers); T2, 200; T3, 100.

Instead of recording timestamps for all resource requests, the AM only needs to record a timestamp to #pending-requests mapping, and the scheduler "dequeues" from this mapping (sorted by time) when a container is allocated. As you said, it will be hard to ask the AM to set the ID, but the scheduler can easily set it. However, this solution needs more work if we want to preserve these timestamps across RM restarts.
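The timestamp bookkeeping in the T1/T2/T3 example can be sketched as a queue of (timestamp, pending) buckets. This is a hypothetical Java sketch, not scheduler code: `PendingQueue`, `updateAsk()` and `allocate()` are invented names, each ask is treated as the new total outstanding count (so only the increase gets the new timestamp), and trimming a decreased ask from the newest bucket is an assumption the comment does not address.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of "timestamp -> #pending-requests" bookkeeping.
class PendingQueue {
    // One bucket per ask that added containers: when it arrived and how
    // many of its containers are still pending.
    static final class Bucket {
        final long timestamp;
        int pending;

        Bucket(long timestamp, int pending) {
            this.timestamp = timestamp;
            this.pending = pending;
        }
    }

    private final Deque<Bucket> buckets = new ArrayDeque<>(); // oldest first

    int totalPending() {
        int sum = 0;
        for (Bucket b : buckets) {
            sum += b.pending;
        }
        return sum;
    }

    // Only the increase over what is already pending gets the new timestamp.
    // A decrease is trimmed from the newest buckets (an assumption).
    void updateAsk(long timestamp, int totalAsk) {
        int delta = totalAsk - totalPending();
        if (delta > 0) {
            buckets.addLast(new Bucket(timestamp, delta));
        }
        while (delta < 0 && !buckets.isEmpty()) {
            Bucket newest = buckets.peekLast();
            int cut = Math.min(newest.pending, -delta);
            newest.pending -= cut;
            delta += cut;
            if (newest.pending == 0) {
                buckets.pollLast();
            }
        }
    }

    // Allocated containers are "dequeued" from the oldest bucket first.
    void allocate(int containers) {
        while (containers > 0 && !buckets.isEmpty()) {
            Bucket oldest = buckets.peekFirst();
            int take = Math.min(oldest.pending, containers);
            oldest.pending -= take;
            containers -= take;
            if (oldest.pending == 0) {
                buckets.pollFirst();
            }
        }
    }

    String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (Bucket b : buckets) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append("T").append(b.timestamp).append("=").append(b.pending);
        }
        return sb.toString();
    }
}
```

Replaying the example (asks of 1000 and 1200, then 100 allocations, then an ask of 1200 again) leaves the buckets at T1=900, T2=200, T3=100, as in the comment.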
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076746#comment-15076746 ]

Arun Suresh commented on YARN-3870:

Hmm... I'm not entirely sure the argument holds, or I might be missing something. For example, even without an ID, what happens today if an application makes each request with a different Priority?
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076743#comment-15076743 ]

Karthik Kambatla commented on YARN-3870:

I was fleshing this out further. The number of IDs, and hence of ResourceRequests, could be O(number of outstanding containers), which could pose the problems outlined in YARN-371. In fact, this JIRA is a duplicate of YARN-371; maybe we should close this one and continue the discussion there.
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076642#comment-15076642 ]

Arun Suresh commented on YARN-3870:

+1 to having the AM create the ID (which I guess is required if the AM is to correlate a ResourceRequest with a container response). Can we make the ID a combination of {{app_attempt_id + seq_id}}, where seq_id is incremented monotonically per app attempt, to distinguish between different app attempts? Also, I guess the outstanding requests for an app need to be stored in the state store so that subsequent app attempts can be informed of unfulfilled requests.
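One way to realize the {{app_attempt_id + seq_id}} suggestion is to pack both numbers into a single long. The Java sketch below is hypothetical (`RequestIdGenerator` is not a YARN class) and assumes 32 bits for each part; sequence overflow after roughly four billion requests in one attempt is not handled.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical ID layout: the attempt number occupies the high 32 bits and
// a per-attempt, monotonically increasing sequence number the low 32 bits.
class RequestIdGenerator {
    private final int attemptId;
    private final AtomicInteger seq = new AtomicInteger(0);

    RequestIdGenerator(int attemptId) {
        this.attemptId = attemptId;
    }

    // Thread-safe: concurrent callers never receive the same sequence number.
    long nextId() {
        long s = seq.incrementAndGet() & 0xFFFFFFFFL;
        return ((long) attemptId << 32) | s;
    }

    static int attemptOf(long id) {
        return (int) (id >>> 32);
    }

    static int seqOf(long id) {
        return (int) (id & 0xFFFFFFFFL);
    }
}
```

Because the attempt number sits in the high bits, every ID issued by a later attempt compares greater than any ID from an earlier one, which keeps requests from different attempts distinguishable.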
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076629#comment-15076629 ]

Karthik Kambatla commented on YARN-3870:

bq. Karthik Kambatla, I am thinking whether we also need some update for the response part to correlate it with the ResourceRequest ID. As the scheduling is asynchronous, AM will also need to know the relation between response and request.
bq. If the AM doesn't add an ID, the RM could add one. Or, we could have the RM add the IDs and return them to the AM for help with book keeping.

I thought more about this. Since one AllocateRequest could have multiple ResourceRequests, the protocol becomes quite complicated if the RM creates the ID instead of the AM. How about we expect the AM to set this ID? If the AM doesn't set it, we treat the requests the same way we do today (ID = -1).

In the AllocateResponse, the RM could send the last received ResourceRequest. The AM could then look at this ACK to decide whether it has to resend the requests. The AMRMClient and MR-AM could be updated to do this.
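The ACK idea can be sketched from the AM side as follows, assuming the ACK in the AllocateResponse carries the ID of the last ResourceRequest the RM received. `AckTracker` is a hypothetical name, requests are modelled as plain strings to keep the example self-contained, and nothing here corresponds to an existing AMRMClient method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical AM-side bookkeeping: the AM numbers each ResourceRequest it
// sends, the RM reports the ID of the last request it received, and
// everything newer than that ACK is resent on the next heartbeat.
class AckTracker {
    // Mirrors the "ID = -1" default from the comment; here it also stands
    // for "nothing acknowledged yet" (an assumption).
    static final long NO_ID = -1L;

    private long nextId = 0L;
    private final TreeMap<Long, String> unacked = new TreeMap<>();

    long send(String request) {
        long id = nextId++;
        unacked.put(id, request);
        return id;
    }

    // Drops every request up to and including lastReceivedId and returns,
    // in ID order, the requests the AM should resend.
    List<String> onAck(long lastReceivedId) {
        unacked.headMap(lastReceivedId, true).clear();
        return new ArrayList<>(unacked.values());
    }
}
```

Keeping unacknowledged requests in a sorted map makes the ACK a single range deletion, which fits the idea that the AM only needs the last received ID to know what is still in flight.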
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075991#comment-15075991 ]

Lei Guo commented on YARN-3870:

[~kasha], I am wondering whether we also need to update the response side to correlate it with the ResourceRequest ID. Since scheduling is asynchronous, the AM will also need to know the relation between responses and requests.

Another question is whether we should make this detailed information controlled by a parameter, as it may increase the AM/RM communication overhead.
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075740#comment-15075740 ]

Karthik Kambatla commented on YARN-3870:

bq. I think this JIRA and YARN-4485 are different:
bq. I think we cannot simply update timestamp when new resource request arrives. For example, at T1, AM asks 100 * 1G container; after 2 mins (T2), assume there's no container allocated, AM asks 100 * 1G container, we cannot say the resource request is added at T2. Instead, we should only set new timestamp for incremental asks

For YARN-4485, I was planning on taking this exact approach. While the two JIRAs and their purposes are different, the ability to identify a set of requests that arrived at one point in time requires similar updates to the data structures we use in AppSchedulingInfo.

bq. what do you think of using for id?

Timestamps for IDs might not be a good idea, especially when an AM can restart. Also, there might be merit in differentiating two ResourceRequests (say, at different priorities) received at the same time.

I discussed this with [~asuresh] and [~subru] offline. We felt the following changes would help us address multiple JIRAs (as [~xinxianyin] listed):
# Add an ID field to ResourceRequest; this can be a sequence number for each application. On AM restarts, a subsequent attempt could choose to resume from the appropriate sequence number. If the AM doesn't add an ID, the RM could add one. Or, we could have the RM add the IDs and return them to the AM to help with bookkeeping.
# YARN-4485 would likely want to add a timestamp in addition to this. Given the IDs, we likely don't have to do special delta handling.
# If the number of containers in an existing ResourceRequest increases, the delta is given a new ID. E.g., if an app increases its request from 3 containers to 7 containers of the same capability, the first three would have ID '1' and the next four would have ID '2'.
# If the number of containers corresponding to an existing ResourceRequest decreases, containers are removed starting from the largest ID and moving toward the smallest until the decrease is accounted for. E.g., if an app asks for 3, 7 and 2 containers in subsequent allocate calls, once these calls are processed, the app has 2 containers with ID '1'.
# The resource-request data structure in AppSchedulingInfo will be this {{Map>>>}}. This would help YARN-314 as well. YARN-314 will need a few more changes to fix up the matching in each of the schedulers.
# Note that we will still be expanding a ResourceRequest into node-local, rack-local and ANY requests. These would now be tied to an ID and hence can be updated correctly.

If folks feel this would address all requirements, I could take a stab at the first patch. [~asuresh] and [~subru] have graciously offered to iterate on my preliminary patch to fix up any issues in FairScheduler and CapacityScheduler.
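Items 3 and 4 of the proposal amount to per-ID bookkeeping for a single (priority, capability) key. The hypothetical `IdBucketedAsk` below (not an AppSchedulingInfo API) gives each increase a new ID and takes decreases from the largest ID first; it deliberately leaves out priorities, locations and timestamps.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of proposal items 3 and 4 for one (priority,
// capability) key: outstanding containers are tracked per request ID.
class IdBucketedAsk {
    private final TreeMap<Integer, Integer> countById = new TreeMap<>();
    private int nextId = 1;

    int total() {
        int sum = 0;
        for (int c : countById.values()) {
            sum += c;
        }
        return sum;
    }

    // The AM sends its new total: an increase is recorded under a fresh ID,
    // a decrease is taken from the largest ID downwards.
    void update(int newTotal) {
        int delta = newTotal - total();
        if (delta > 0) {
            countById.put(nextId++, delta);
        }
        while (delta < 0 && !countById.isEmpty()) {
            Map.Entry<Integer, Integer> largest = countById.lastEntry();
            int cut = Math.min(largest.getValue(), -delta);
            delta += cut;
            int remaining = largest.getValue() - cut;
            if (remaining == 0) {
                countById.remove(largest.getKey());
            } else {
                countById.put(largest.getKey(), remaining);
            }
        }
    }

    Map<Integer, Integer> snapshot() {
        return new TreeMap<>(countById);
    }
}
```

Replaying the 3, 7, 2 example gives {1=3} after the first call, {1=3, 2=4} after the second, and {1=2} after the third, matching the outcome stated in item 4.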
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074755#comment-15074755 ] Xianyin Xin commented on YARN-3870: ---
+1 for adding a unique id for a resource request, but I would suggest we consider this kind of problem in a more systematic way, taking into account YARN-314, YARN-1042, YARN-371, YARN-4485 and this JIRA. As in my comment on YARN-314, the scheduler should naturally work like a factory: it receives orders and prepares for them. Once we accept this working philosophy, we'll find it natural, and necessary, for a resource order to have the following dimensions:
1. order id, which identifies an order; an order can become overdue, or have a time limit;
2. priority;
3. a collection of request units, each specifying one kind of resource request, that should have a coordinate of ;
4. relaxLocality;
5. canbeDecomposed, or ifGangScheduling;
6. ...
The scheduler schedules based on the order form, and should not swallow any information passed from the app. Any thoughts?
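Xianyin Xin's "resource order" can be pictured as a plain data type. The sketch is hypothetical: the record names and every field type are illustrative choices rather than a proposed YARN API, and the fields of a request unit (location, memory, vcores, count) are assumed, because item 3 does not spell out the coordinate.

```java
import java.util.List;

/** One kind of resource ask inside an order (illustrative fields). */
record RequestUnit(String resourceName, int memoryMb, int vcores, int numContainers) {}

/** Hypothetical "resource order"; the fields follow the numbered dimensions in the comment. */
record ResourceOrder(
        long orderId,            // 1. identifies the order
        long deadlineMillis,     // 1. the order may expire after this time
        int priority,            // 2. priority
        List<RequestUnit> units, // 3. the request units that make up the order
        boolean relaxLocality,   // 4. relaxLocality
        boolean gang) {          // 5. true: all-or-nothing, cannot be decomposed

    ResourceOrder {
        units = List.copyOf(units); // defensive, immutable copy
    }

    boolean isExpired(long nowMillis) {
        return nowMillis > deadlineMillis;
    }

    int totalContainers() {
        int n = 0;
        for (RequestUnit u : units) n += u.numContainers();
        return n;
    }
}
```

Keeping the whole order as one value is what lets a scheduler see, for example, that a gang order must be placed all at once instead of as independent host, rack and ANY entries.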
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074685#comment-15074685 ] Wangda Tan commented on YARN-3870: --
Hi [~kasha]/[~asuresh], I think this JIRA and YARN-4485 are different:
- This JIRA is focused on "resource request groups", like the example in the description.
{code}
c1: host1, host2, host4
c2: host2, host3, host5
{code}
The current resource request cannot store "group by" info.
- YARN-4485 is focused on timestamps. Some random thoughts on YARN-4485: I think we cannot simply update the timestamp when a new resource request arrives. For example, at T1 the AM asks for 100 * 1G containers; 2 minutes later (T2), assuming no container has been allocated, the AM asks for 100 * 1G containers again, and we cannot say the resource request was added at T2. Instead, we should only set a new timestamp for incremental asks (e.g. if the AM asks for 120 * 1G containers at T2, we should say that 100 resource requests are from T1 and 20 resource requests are from T2).
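Wangda's rule of stamping only the incremental part of an ask can be modelled as a queue of (timestamp, count) buckets. A minimal sketch, assuming the AM keeps sending the absolute number of containers it wants; the class TimestampedAsk is hypothetical, and the rule that a decrease cancels the newest containers first is an assumption the comment does not cover.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: pending containers of one ask, bucketed by when they were first asked for. */
class TimestampedAsk {
    /** A run of containers first requested at the same time. */
    static final class Bucket {
        final long askedAt;
        int count;
        Bucket(long askedAt, int count) { this.askedAt = askedAt; this.count = count; }
        @Override public String toString() { return count + "@T" + askedAt; }
    }

    private final Deque<Bucket> buckets = new ArrayDeque<>(); // oldest first

    /** Applies the absolute container count the AM sends at time {@code now}. */
    void update(long now, int newTotal) {
        if (newTotal < 0) throw new IllegalArgumentException("negative ask");
        int current = total();
        if (newTotal > current) {
            // Only the increment gets the new timestamp; older buckets keep theirs.
            buckets.addLast(new Bucket(now, newTotal - current));
            return;
        }
        // Assumption: a decrease cancels the most recently added containers first.
        int toRemove = current - newTotal;
        while (toRemove > 0) {
            Bucket newest = buckets.peekLast();
            int take = Math.min(toRemove, newest.count);
            newest.count -= take;
            if (newest.count == 0) buckets.removeLast();
            toRemove -= take;
        }
    }

    int total() {
        int sum = 0;
        for (Bucket b : buckets) sum += b.count;
        return sum;
    }

    String describe() {
        return buckets.toString();
    }

    public static void main(String[] args) {
        TimestampedAsk ask = new TimestampedAsk();
        ask.update(1, 100);                  // T1: 100 x 1G
        ask.update(2, 100);                  // T2: same ask repeated, nothing re-stamped
        System.out.println(ask.describe());  // [100@T1]
        ask.update(2, 120);                  // T2: 20 more
        System.out.println(ask.describe());  // [100@T1, 20@T2]
    }
}
```

Replaying the comment's scenario, the repeated ask at T2 keeps all 100 containers stamped T1, and the ask for 120 at T2 adds a separate bucket of 20 stamped T2.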
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074658#comment-15074658 ] Karthik Kambatla commented on YARN-3870:
+1 to improving the way we store ResourceRequests in AppSchedulingInfo. In the context of YARN-4485, I would like to put a timestamp on when each ResourceRequest was received. We can use this to determine the container allocation latency at allocation time. Today, we store all ResourceRequests for the same priority and locality together, irrespective of whether the requests came all together or separately.
[~asuresh] - what do you think of using for id? I forget the details: how do we handle an AM restart, do we create a new AppSchedulingInfo? If so, we can just use the timestamp as is.
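Karthik's two ideas, stamping each ResourceRequest on receipt and reusing that timestamp as the id, combine into a small latency tracker. A hypothetical sketch: the class name LatencyTracker is invented, allocating to the oldest pending ask first is an assumption, and it presumes timestamps are unique per AppSchedulingInfo, which the comment itself leaves open with the AM-restart question.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: pending containers keyed by receipt timestamp, which doubles as the id. */
class LatencyTracker {
    // Receipt time (ms) -> containers still pending from the ask received at that time.
    private final TreeMap<Long, Integer> pendingByReceivedAt = new TreeMap<>();

    void receive(long receivedAtMillis, int numContainers) {
        // Caveat: two asks received in the same millisecond share one key and are merged.
        pendingByReceivedAt.merge(receivedAtMillis, numContainers, Integer::sum);
    }

    /** Allocates one container to the oldest pending ask and returns its latency in ms. */
    long allocate(long nowMillis) {
        Map.Entry<Long, Integer> oldest = pendingByReceivedAt.firstEntry();
        if (oldest == null) throw new IllegalStateException("nothing pending");
        if (oldest.getValue() == 1) {
            pendingByReceivedAt.remove(oldest.getKey());
        } else {
            pendingByReceivedAt.put(oldest.getKey(), oldest.getValue() - 1);
        }
        return nowMillis - oldest.getKey();
    }

    public static void main(String[] args) {
        LatencyTracker t = new LatencyTracker();
        t.receive(1000, 1); // one container asked for at t = 1000 ms
        t.receive(3000, 2); // two more at t = 3000 ms
        System.out.println(t.allocate(5000)); // 4000
        System.out.println(t.allocate(6000)); // 3000
        System.out.println(t.allocate(7000)); // 4000
    }
}
```

Keying by timestamp keeps the bookkeeping trivial, but the same-millisecond merge noted in the code is one reason an explicit, client-assigned id may be the safer key.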
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070279#comment-15070279 ] Arun Suresh commented on YARN-3870: ---
[~leftnoteasy], with respect to the AM, I was thinking that just having it as a field in the ResourceRequest as well as in the Container (returned by the allocate call) would suffice.
From the perspective of the Scheduler, yes, {{Map = >>}} was the direction I was thinking. Correct me if I am wrong, but currently there is an implicit understanding that all resource requests for the same resource requirement should have the same priority. Having an explicit request id would allow us to remove that constraint as well.
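To make the map-keying question concrete, here is a simplified model contrasting a table keyed by priority and location with one keyed by an explicit request id. It is a sketch under assumptions: Ask is a stand-in for ResourceRequest, AskTables is a hypothetical name, and the nesting order of the maps is illustrative, since the map types quoted in these comments are incomplete.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for a resource ask; not YARN's ResourceRequest class. */
record Ask(long requestId, int priority, String resourceName, int memoryMb, int numContainers) {}

class AskTables {
    // Keyed by priority, then location: a later ask for the same pair overwrites an earlier one.
    final Map<Integer, Map<String, Ask>> byPriority = new HashMap<>();
    // Keyed by request id, then location: asks stay separate even with equal priority and location.
    final Map<Long, Map<String, Ask>> byRequestId = new HashMap<>();

    void add(Ask a) {
        byPriority.computeIfAbsent(a.priority(), p -> new HashMap<>()).put(a.resourceName(), a);
        byRequestId.computeIfAbsent(a.requestId(), id -> new HashMap<>()).put(a.resourceName(), a);
    }

    public static void main(String[] args) {
        AskTables t = new AskTables();
        t.add(new Ask(1, 10, "*", 1024, 2)); // two 1G containers
        t.add(new Ask(2, 10, "*", 4096, 1)); // one 4G container, same priority and location
        System.out.println(t.byPriority.get(10).size()); // 1: the 1G ask was overwritten
        System.out.println(t.byRequestId.size());        // 2: both asks survive
    }
}
```

In this model the second ask silently replaces the first in the priority-keyed table, so the only way to keep both is to give them different priorities; with an explicit id, priority can go back to meaning only priority.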
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070186#comment-15070186 ] Wangda Tan commented on YARN-3870: --
Hi [~grey], thanks for raising this; we definitely need such a mechanism to better describe our resource requests.
[~asuresh], I'm not sure how the unique id works. Are you planning to add it as a key to the AppSchedulingInfo resource requests map? (e.g. {{Map = >>}})
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070178#comment-15070178 ] Subru Krishnan commented on YARN-3870: --
+1 on this. Thanks [~grey] for raising this. I have been having offline discussions with [~asuresh] and [~curino] around Distributed Scheduling (YARN-2877) and Federation (YARN-2915). In both scenarios, sending the raw container request and letting the RM expand it will save us a lot of pain, as we are currently finding it very difficult to route requests correctly in the AMRMProxy (YARN-2844).
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069208#comment-15069208 ] Arun Suresh commented on YARN-3870: ---
Thank you for starting this discussion, [~grey]. Correct me if I am wrong, but what you are proposing, I guess, is some way for the Scheduler to correlate the expanded Resource Requests. I do feel this would be genuinely useful, and not only from a scheduling perspective, e.g. for making affinity / anti-affinity scheduling decisions (YARN-1042). It would also greatly help improve preemption decisions in the FairScheduler (YARN-2154).
This would also be extremely useful for AMs. Currently the MRAM does the bookkeeping and matches an allocated container to a ResourceRequest. AMs can generally be relieved of this job if an allocated Container Token can easily be matched against a Resource Request.
One possible approach could be to have the AMClient generate a unique id for a Resource Request and tag each of the expanded requests (Node, Rack and ANY) with this id. This id can then be passed around in the Container/ContainerTokenIdentifier.
[~ka...@cloudera.com], [~vinodkv], [~leftnoteasy], thoughts?
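Arun's suggestion can be illustrated with the c1/c2 example from the issue description (rack1 holds host1, host2 and host3; rack2 holds host4, host5 and host6). The sketch is hypothetical: Expansion, expand and expandTagged are made-up names, each raw ask is simplified to one container, and ids come from a local counter rather than any real AMRMClient API. The untagged expansion reproduces the flattened counts from the description; the tagged one keeps each request's node, rack and ANY entries together under its id.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical client-side expansion of raw asks into node, rack and ANY entries. */
class Expansion {
    static final String ANY = "*";

    /** Untagged: counts are summed per location, so which hosts belonged together is lost. */
    static Map<String, Integer> expand(List<List<String>> rawAsks, Map<String, String> rackOf) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> hosts : rawAsks) {
            for (String location : locations(hosts, rackOf)) {
                counts.merge(location, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Tagged: each raw ask gets an id, and its expanded entries stay grouped under it. */
    static Map<Long, Set<String>> expandTagged(List<List<String>> rawAsks, Map<String, String> rackOf) {
        Map<Long, Set<String>> byId = new TreeMap<>();
        long nextId = 1; // stand-in for an id generated by the client
        for (List<String> hosts : rawAsks) {
            byId.put(nextId++, locations(hosts, rackOf));
        }
        return byId;
    }

    /** The ask's hosts, then their distinct racks, then ANY (one container per raw ask). */
    private static Set<String> locations(List<String> hosts, Map<String, String> rackOf) {
        Set<String> out = new LinkedHashSet<>(hosts);
        for (String host : hosts) {
            out.add(Objects.requireNonNull(rackOf.get(host), "unknown host " + host));
        }
        out.add(ANY);
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = Map.of(
                "host1", "rack1", "host2", "rack1", "host3", "rack1",
                "host4", "rack2", "host5", "rack2", "host6", "rack2");
        List<List<String>> asks = List.of(
                List.of("host1", "host2", "host4"),   // c1
                List.of("host2", "host3", "host5"));  // c2
        System.out.println(expand(asks, rackOf));
        // {*=2, host1=1, host2=2, host3=1, host4=1, host5=1, rack1=2, rack2=2}
        System.out.println(expandTagged(asks, rackOf));
        // {1=[host1, host2, host4, rack1, rack2, *], 2=[host2, host3, host5, rack1, rack2, *]}
    }
}
```

Because every expanded entry carries the id, an allocated container that echoed the id back (for instance through the Container or ContainerTokenIdentifier, as proposed) could be matched to its request without extra bookkeeping in the AM.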