[jira] [Commented] (FLINK-17579) Set the resource id of taskexecutor according to environment variable if exist in standalone mode
[ https://issues.apache.org/jira/browse/FLINK-17579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17135591#comment-17135591 ]

Till Rohrmann commented on FLINK-17579:

You are now assigned [~karmagyz].

> Set the resource id of taskexecutor according to environment variable if exist in standalone mode
>
> Key: FLINK-17579
> URL: https://issues.apache.org/jira/browse/FLINK-17579
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Reporter: Yangze Guo
> Assignee: Yangze Guo
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.12.0
>
> Allow user to specify the resource id of TaskExecutor through the environment variable in standalone mode. The name of that variable could be {{FLINK_STANDALONE_TASK_EXECUTOR_ID}}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-17579) Set the resource id of taskexecutor according to environment variable if exist in standalone mode
[ https://issues.apache.org/jira/browse/FLINK-17579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17135561#comment-17135561 ]

Yangze Guo commented on FLINK-17579:

Agreed. [~trohrmann] Could you assign this to me? I'd like to prepare a PR according to the latest consensus.
[jira] [Commented] (FLINK-17579) Set the resource id of taskexecutor according to environment variable if exist in standalone mode
[ https://issues.apache.org/jira/browse/FLINK-17579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17135527#comment-17135527 ]

Till Rohrmann commented on FLINK-17579:

This sounds good to me [~karmagyz]. Maybe we don't need the full length of a {{UUID}} to make the name unique. Otherwise the names might become quite lengthy.
[jira] [Commented] (FLINK-17579) Set the resource id of taskexecutor according to environment variable if exist in standalone mode
[ https://issues.apache.org/jira/browse/FLINK-17579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17135451#comment-17135451 ]

Yangze Guo commented on FLINK-17579:

Yes, I think it could work. So it seems that it is still necessary to allow the user to set an arbitrary prefix.

To summarize, the proposed changes are:
- Add a config option "taskmanager.resource-id.prefix".
- In standalone mode, if "taskmanager.resource-id.prefix" is defined, the {{ResourceID}} of the {{taskexecutor}} should be {{prefix-uuid}}.

WDYT? [~trohrmann] [~azagrebin] [~fly_in_gis]
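The two proposed changes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code that was merged for this ticket: the class `PrefixedResourceId` and its `generate` method are hypothetical names, and the configured prefix is passed in as a plain `Optional<String>` rather than read from a Flink configuration object.

```java
import java.util.Optional;
import java.util.UUID;

/** Hypothetical helper for illustration only; not Flink's actual implementation. */
final class PrefixedResourceId {

    /** Config key proposed in the comment above. */
    static final String PREFIX_KEY = "taskmanager.resource-id.prefix";

    /**
     * Returns "prefix-uuid" when a non-blank prefix is configured,
     * otherwise a bare random UUID string.
     */
    static String generate(Optional<String> configuredPrefix) {
        String random = UUID.randomUUID().toString();
        return configuredPrefix
                .map(String::trim)
                .filter(p -> !p.isEmpty())
                .map(p -> p + "-" + random)
                .orElse(random);
    }

    public static void main(String[] args) {
        // "tm-rack1" is a made-up prefix value.
        System.out.println(generate(Optional.of("tm-rack1")));
        System.out.println(generate(Optional.empty()));
    }
}
```

Because the random part is regenerated on every call, two processes started with the same prefix still receive distinct ids, which addresses the uniqueness concern raised earlier in the thread.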
[jira] [Commented] (FLINK-17579) Set the resource id of taskexecutor according to environment variable if exist in standalone mode
[ https://issues.apache.org/jira/browse/FLINK-17579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134718#comment-17134718 ]

Till Rohrmann commented on FLINK-17579:

In the standalone case, users would need to use a third-party tool to restart the TMs. This tool would then have to set the right resource ids, which should be OK as long as we support setting some part of the resource id explicitly. I believe this should be the case.
[jira] [Commented] (FLINK-17579) Set the resource id of taskexecutor according to environment variable if exist in standalone mode
[ https://issues.apache.org/jira/browse/FLINK-17579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133867#comment-17133867 ]

Yangze Guo commented on FLINK-17579:

I ask because I do not know whether we plan to support local recovery in standalone mode as well. If we do, it seems we cannot restart a {{TaskManager}} with a fixed {{ResourceID}} (the hostname could be fixed, but the uuid would be different each time). Do you have any suggestions or ideas for achieving this?

BTW, it seems we currently do not support restarting a TM with a fixed {{ResourceID}} in standalone mode, so I think this proposal will not introduce any regression.
[jira] [Commented] (FLINK-17579) Set the resource id of taskexecutor according to environment variable if exist in standalone mode
[ https://issues.apache.org/jira/browse/FLINK-17579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133328#comment-17133328 ]

Till Rohrmann commented on FLINK-17579:

I think the support for persistent volumes is not super relevant for the standalone mode [~karmagyz]. But it could still work if each {{TaskManager}} process is started with a fixed {{ResourceID}} (fixed across restarts).
[jira] [Commented] (FLINK-17579) Set the resource id of taskexecutor according to environment variable if exist in standalone mode
[ https://issues.apache.org/jira/browse/FLINK-17579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132864#comment-17132864 ]

Yangze Guo commented on FLINK-17579:

Thanks for the feedback, [~trohrmann], [~fly_in_gis] and [~azagrebin]. Here are some of my thoughts:
- Since I do not see any drawback or regression in using {{hostname-uuid}} as the {{ResourceID}}, I think it probably makes sense to just change the default behavior. We may not need a config option for it.
- Regarding Till's concern, I think it would not obstruct it because, in the Kubernetes scenario, the {{hostname}} is unique and constant. Correct me if I am mistaken, [~fly_in_gis].
- However, it may not work in the standalone scenario, since the {{hostname}} is not guaranteed to be unique. Do we also plan to support local recovery in standalone mode? [~trohrmann]
[jira] [Commented] (FLINK-17579) Set the resource id of taskexecutor according to environment variable if exist in standalone mode
[ https://issues.apache.org/jira/browse/FLINK-17579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130635#comment-17130635 ]

Till Rohrmann commented on FLINK-17579:

One thing to consider is that we wanted to add support for persistent volumes and local recovery at some point, which could vastly improve the recovery speed of Flink jobs. One idea for solving the problem was to give TM processes a unique and constant {{ResourceID}}. If a TM is always started with the same persistent volume, then we could achieve local recovery by simply redeploying tasks to the TM with the same {{ResourceID}} as before. This could of course also work if we only match on a unique and constant prefix. The important bit would be that we keep this idea in mind and try not to obstruct it if possible.
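The idea of matching on a unique and constant prefix can be illustrated with a small sketch. Everything here is hypothetical and only meant to make the matching rule concrete: the class `LocalStateMatcher`, the example ids, and the paths are invented, and the sketch assumes ids of the form `stablePrefix-randomSuffix` where the random suffix contains no `-`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch; class and method names are not part of Flink. */
final class LocalStateMatcher {

    // Stable prefix -> location of the local state left behind by the previous run.
    private final Map<String, String> stateByPrefix = new HashMap<>();

    /** Drops the trailing "-suffix"; assumes the random suffix contains no '-'. */
    static String stablePrefix(String resourceId) {
        int idx = resourceId.lastIndexOf('-');
        return idx < 0 ? resourceId : resourceId.substring(0, idx);
    }

    /** Records where a TM with the given id kept its local state. */
    void remember(String resourceId, String statePath) {
        stateByPrefix.put(stablePrefix(resourceId), statePath);
    }

    /** Looks up state from an earlier run that shared the same stable prefix. */
    Optional<String> previousState(String resourceId) {
        return Optional.ofNullable(stateByPrefix.get(stablePrefix(resourceId)));
    }

    public static void main(String[] args) {
        LocalStateMatcher matcher = new LocalStateMatcher();
        matcher.remember("tm-0-a1b2c3", "/data/tm-0");
        // Same prefix, new random suffix after a restart.
        System.out.println(matcher.previousState("tm-0-ffee00").isPresent());
        // Different prefix: no earlier state is known.
        System.out.println(matcher.previousState("tm-1-123456").isPresent());
    }
}
```

The design choice in this sketch is that only the prefix carries identity across restarts, so the random suffix can keep changing without preventing a restarted TM from being paired with its earlier state.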
[jira] [Commented] (FLINK-17579) Set the resource id of taskexecutor according to environment variable if exist in standalone mode
[ https://issues.apache.org/jira/browse/FLINK-17579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128805#comment-17128805 ]

Yang Wang commented on FLINK-17579:

[~azagrebin] [~karmagyz] Thanks for the fruitful discussion. First, I want to add more background to this ticket.

> Why do users want to specify the TaskManager instance name?

More and more users are deploying Flink in container environments, especially K8s. When they have started a standalone session/job cluster, they need an easier way to find the corresponding pod for a specific TaskManager, so that they can tunnel in and debug the process (e.g. with jmap).

> Use env or config option

I have no preference. Either of them makes sense to me. If we could provide a unified approach to setting Flink options via environment variables, that would be great.

> How to generate the TaskManager name?

I think a config option that controls whether to use {{hostname-uuid}} for the TaskManager name is enough. I agree that the uuid is necessary to avoid duplication. It could be a short string; maybe 6 characters are enough.

I am not sure whether {{hostname-uuid}} is too long a string. In our production environment, the fully qualified hostname is usually no more than 45 characters (e.g. xyz011177171118.na610.aliyun.com). In K8s, the pod name cannot be more than 63 characters. So I think it may be similar to a Yarn container id (e.g. container_e04_1591199811063_0665_01_02).

All in all, providing a meaningful name for each TaskManager will make the logs more human-readable and help with debugging.
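A name built from the hostname plus a short random suffix could look like the sketch below. It is an illustration under assumptions rather than Flink's implementation: the class `HostnameResourceId` is a hypothetical name, the 6-character length is the value floated in the comment above, and taking the first hex characters of a random `UUID` is just one simple way to obtain a short random string.

```java
import java.util.UUID;

/** Hypothetical helper for illustration only; not Flink's actual implementation. */
final class HostnameResourceId {

    /** Suffix length suggested in the comment above. */
    static final int SUFFIX_LENGTH = 6;

    /** Takes the first SUFFIX_LENGTH hex characters of a random UUID. */
    static String shortSuffix() {
        return UUID.randomUUID().toString().replace("-", "").substring(0, SUFFIX_LENGTH);
    }

    /** Builds "hostname-suffix". */
    static String generate(String hostname) {
        return hostname + "-" + shortSuffix();
    }

    public static void main(String[] args) {
        // Example hostname taken from the comment above.
        String id = generate("xyz011177171118.na610.aliyun.com");
        System.out.println(id + " (" + id.length() + " characters)");
    }
}
```

Six hex characters give 16^6 = 16,777,216 possible suffixes per hostname, so the id grows by only 7 characters over the hostname itself.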
[jira] [Commented] (FLINK-17579) Set the resource id of taskexecutor according to environment variable if exist in standalone mode
[ https://issues.apache.org/jira/browse/FLINK-17579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128327#comment-17128327 ]

Andrey Zagrebin commented on FLINK-17579:

OK, a couple more questions. Do you think the option should configure an arbitrary prefix for the id, or can the option just say whether to use the hostname? What about the Master id?

Also, the `-` can be quite a long string for logs. I suppose we use it in many places. Maybe it would be better to output it once in relevant places, like the first connection, and then just ids?

cc [~trohrmann]
[jira] [Commented] (FLINK-17579) Set the resource id of taskexecutor according to environment variable if exist in standalone mode
[ https://issues.apache.org/jira/browse/FLINK-17579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128044#comment-17128044 ]

Yangze Guo commented on FLINK-17579:

[~azagrebin] Thanks for the advice. +1 to use config option instead. +1 to append a random id at the end of it.
[jira] [Commented] (FLINK-17579) Set the resource id of taskexecutor according to environment variable if exist in standalone mode
[ https://issues.apache.org/jira/browse/FLINK-17579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127970#comment-17127970 ]

Andrey Zagrebin commented on FLINK-17579:

[~karmagyz] Thanks for looking into this!

Why do you think it should be an environment variable and not a Flink option? The Flink option can also be changed as a dynamic property argument (-D) for taskmanager.sh if you want to share flink-conf.yaml among TMs in standalone mode. Moreover, there are plans in the community to introduce a unified approach to setting Flink options via environment variables. I would suggest avoiding multiple ways of configuring Flink, for better maintainability.

I also think it is dangerous to allow users to set fixed ids. We assume that all ids are unique everywhere in Flink. If some TMs accidentally get the same id, it can lead to unpredictable failures. Also, it might be the case that if the same TM rejoins the cluster, we assume that it will have another id to avoid collisions with its previous run in the system. Therefore, I would consider keeping ids always random and unique. The id could consist of a fixed part and still some random prefix: `-`.
[jira] [Commented] (FLINK-17579) Set the resource id of taskexecutor according to environment variable if exist in standalone mode
[ https://issues.apache.org/jira/browse/FLINK-17579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102561#comment-17102561 ]

Yang Wang commented on FLINK-17579:

This will be a very useful feature when deploying a Flink standalone cluster in a K8s cluster. Each taskmanager will have a dedicated hostname, which could be used to register with the jobmanager. For users, it will be easier to identify the taskmanager, which helps with profiling and debugging.

> Set the resource id of taskexecutor according to environment variable if exist in standalone mode
>
> Key: FLINK-17579
> URL: https://issues.apache.org/jira/browse/FLINK-17579
> Project: Flink
> Issue Type: Sub-task
> Reporter: Yangze Guo
> Priority: Major
>
> Allow user to specify the resource id of TaskExecutor through the environment variable in standalone mode. The name of that variable could be {{FLINK_TASKEXECUTOR_RESOURCE_ID}}

--
This message was sent by Atlassian Jira (v8.3.4#803005)