Xintong Song created FLINK-10640:
------------------------------------

             Summary: Enable Slot Resource Profile for Resource Management
                 Key: FLINK-10640
                 URL: https://issues.apache.org/jira/browse/FLINK-10640
             Project: Flink
          Issue Type: New Feature
          Components: ResourceManager
            Reporter: Xintong Song


Motivation & Background
 * The existing concept of task slots roughly represents how many pipelines of
tasks a TaskManager can hold. However, it does not consider the differences in
resource needs and usage of individual tasks. Enabling resource profiles for
slots may allow Flink to better allocate execution resources according to
tasks' fine-grained resource needs.
 * The community version of Flink already contains APIs and some implementation
for slot resource profiles. However, this logic is not actually used: the
ResourceProfile of slot requests is set to UNKNOWN by default, with negative
values, and thus matches any given slot.
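
To illustrate the second point, here is a minimal sketch of why a request
profile with negative values matches every slot. SketchProfile and its fields
are hypothetical stand-ins for illustration, not Flink's actual
ResourceProfile class, and the "at least as much in every dimension" matching
rule is an assumption.

```java
// Simplified stand-in for a resource profile; NOT Flink's actual ResourceProfile class.
final class SketchProfile {
    // Hypothetical sentinel mirroring the "UNKNOWN with negative values" described above.
    static final SketchProfile UNKNOWN = new SketchProfile(-1.0, -1);

    final double cpuCores;
    final int memoryMB;

    SketchProfile(double cpuCores, int memoryMB) {
        this.cpuCores = cpuCores;
        this.memoryMB = memoryMB;
    }

    // Assumed rule: a slot matches a request if it offers at least the
    // requested amount in every dimension.
    boolean isMatching(SketchProfile required) {
        return cpuCores >= required.cpuCores && memoryMB >= required.memoryMB;
    }
}
```

Under this rule, any slot with non-negative resources satisfies an UNKNOWN
request, so the profile never actually constrains the allocation.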

Preliminary Design
 * Slot Management
 A slot represents a certain amount of resources for a single pipeline of tasks
to run in on a TaskManager. Initially, a TaskManager does not have any slots,
only a total amount of resources. When allocating, the ResourceManager finds
suitable TMs and generates new slots on them for the tasks to run in, according
to the slot requests. Once generated, a slot's size (resource profile) does not
change until the slot is freed. The ResourceManager can apply different,
portable strategies to allocate slots from TaskManagers.
 * TM Management
 The size and number of TaskManagers, and when to start them, can also be
flexible. TMs can be started and released dynamically, and may have different
sizes. We may have many different, portable strategies, e.g., an elastic
session that can run multiple jobs like the session mode while dynamically
adjusting the size of the session (number of TMs) according to the real-time
workload.
 * About Slot Sharing
 Slot sharing is a good heuristic for easily calculating how many slots are
needed to get the job running, and for getting better utilization when slots
have no resource profile. However, with resource profiles enabling
finer-grained resource management, each individual task has its own specific
resource needs, and it does not make much sense to have multiple tasks sharing
the resources of the same slot. Instead, we may introduce locality
preferences/constraints to support the semantics of putting tasks in the
same or different TMs in a more general way.
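
The slot management idea above could be sketched roughly as follows. This is
only an illustration under stated assumptions: SketchTaskManager, Slot, and
FirstFitStrategy are hypothetical names, not existing Flink classes, and
first fit is just one example of an allocation strategy the ResourceManager
might plug in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical model of a TaskManager that starts with a resource budget and no slots.
final class SketchTaskManager {
    // A slot's size is fixed at creation and does not change until it is freed.
    static final class Slot {
        final double cpuCores;
        final int memoryMB;

        Slot(double cpuCores, int memoryMB) {
            this.cpuCores = cpuCores;
            this.memoryMB = memoryMB;
        }
    }

    private double freeCpuCores;
    private int freeMemoryMB;
    private final List<Slot> slots = new ArrayList<>();

    SketchTaskManager(double totalCpuCores, int totalMemoryMB) {
        this.freeCpuCores = totalCpuCores;
        this.freeMemoryMB = totalMemoryMB;
    }

    // Carve a new slot out of the remaining resources, if enough are left.
    Optional<Slot> tryCreateSlot(double cpuCores, int memoryMB) {
        if (cpuCores > freeCpuCores || memoryMB > freeMemoryMB) {
            return Optional.empty();
        }
        freeCpuCores -= cpuCores;
        freeMemoryMB -= memoryMB;
        Slot slot = new Slot(cpuCores, memoryMB);
        slots.add(slot);
        return Optional.of(slot);
    }

    // Return a slot's resources to the TaskManager's free pool.
    void freeSlot(Slot slot) {
        if (slots.remove(slot)) {
            freeCpuCores += slot.cpuCores;
            freeMemoryMB += slot.memoryMB;
        }
    }

    int slotCount() {
        return slots.size();
    }
}

// One possible strategy (an assumption): use the first TM that can fit the request.
final class FirstFitStrategy {
    static Optional<SketchTaskManager.Slot> allocate(
            List<SketchTaskManager> taskManagers, double cpuCores, int memoryMB) {
        for (SketchTaskManager tm : taskManagers) {
            Optional<SketchTaskManager.Slot> slot = tm.tryCreateSlot(cpuCores, memoryMB);
            if (slot.isPresent()) {
                return slot;
            }
        }
        return Optional.empty();
    }
}
```

An empty result from allocate is the point where a TM management strategy
could decide to start another TaskManager, possibly of a different size.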



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
