shuai.xu created FLINK-7870:
-------------------------------

             Summary: SlotPool should cancel the slot request to RM if not need any more.
                 Key: FLINK-7870
                 URL: https://issues.apache.org/jira/browse/FLINK-7870
             Project: Flink
          Issue Type: Bug
          Components: Cluster Management
            Reporter: shuai.xu
            Assignee: shuai.xu


1. SlotPool requests slots from the RM (ResourceManager) when it does not have
enough slots.
2. If a slot request is not fulfilled within a certain time, SlotPool treats the
request as timed out and issues a new slot request by triggering a failover in
the JobMaster. The previous request is no longer needed, but the RM is not told.
3. As a result, the RM may request far more resources than the job actually needs.
For example:
1. A job needs 100 slots. The RM requests 100 containers from YARN.
2. YARN is busy and has no resources available for the job.
3. The job fails over because the resource request was not fulfilled in time.
4. It asks for 100 slots again, so the RM now requests 200 containers from YARN.
5. If the job fails over several times, the number of requested containers keeps
growing.
6. When YARN finally has resources, it sees a job that appears to need thousands
of containers. This wastes resources.
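The growth described in the example can be reproduced with a small simulation. This is a hypothetical model, not Flink code: `ResourceManagerStub`, `requestSlot`, `cancelSlotRequest` and `pendingAfterFailovers` are illustrative names, and the sketch assumes YARN grants no containers and that every failover re-requests all of the job's slots. It compares a SlotPool that leaves timed-out requests pending at the RM with one that cancels them before re-requesting.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the over-requesting scenario; none of these names are Flink APIs. */
public class PendingRequestModel {

    /** Stand-in for the RM: it only tracks which container requests are still outstanding. */
    static final class ResourceManagerStub {
        private final Set<Long> pending = new HashSet<>();

        void requestSlot(long requestId) {
            pending.add(requestId);
        }

        /** Assumed cancellation hook, i.e. the behaviour this issue asks for. */
        void cancelSlotRequest(long requestId) {
            pending.remove(requestId);
        }

        int pendingContainers() {
            return pending.size();
        }
    }

    /**
     * Runs `attempts` job attempts, each asking for `slotsPerAttempt` slots.
     * YARN never grants anything, so each attempt times out and triggers a failover.
     */
    public static int pendingAfterFailovers(int slotsPerAttempt, int attempts, boolean cancelOnTimeout) {
        ResourceManagerStub rm = new ResourceManagerStub();
        Set<Long> outstanding = new HashSet<>();
        long nextId = 0;
        for (int attempt = 0; attempt < attempts; attempt++) {
            // Failover: the previous attempt's requests are no longer needed.
            if (cancelOnTimeout) {
                for (long id : outstanding) {
                    rm.cancelSlotRequest(id);
                }
            }
            outstanding.clear();
            // The new attempt asks for all of its slots again.
            for (int i = 0; i < slotsPerAttempt; i++) {
                long id = nextId++;
                outstanding.add(id);
                rm.requestSlot(id);
            }
        }
        return rm.pendingContainers();
    }

    public static void main(String[] args) {
        for (int attempts = 1; attempts <= 3; attempts++) {
            System.out.printf("attempts=%d without cancel=%d with cancel=%d%n",
                    attempts,
                    pendingAfterFailovers(100, attempts, false),
                    pendingAfterFailovers(100, attempts, true));
        }
    }
}
```

For one to three attempts this prints 100/100, 200/100 and 300/100: without cancellation the pending count grows by the job's full demand on every failover, while cancelling stale requests keeps it at what the job actually needs.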



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
