Alex created FLINK-11632:
----------------------------

             Summary: Make TaskManager automatic bind address picking more 
explicit (by default) and more configurable
                 Key: FLINK-11632
                 URL: https://issues.apache.org/jira/browse/FLINK-11632
             Project: Flink
          Issue Type: Improvement
          Components: Distributed Coordination, Network, TaskManager
            Reporter: Alex


Currently, the optional {{taskmanager.host}} configuration option in 
{{flink-conf.yaml}} lets Flink users statically predefine the bind address 
that a TaskManager listens on (note: this option can also be overridden by 
passing the corresponding command line option to Flink).

If the option is not set, the TaskManager tries to [heuristically pick a bind 
address|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java#L421-L442].

The resulting address (hostname) is used to advertise the service endpoints 
running in the TM to the JobManager. It is also resolved to an 
{{[InetAddress|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java#L359]}}, 
which is later used as the bind address for the TM's inner-node communication.

This proposal is to minimize the use of heuristics (by default) by introducing 
a new configuration option (for example, {{taskmanager.host.bind-policy}}) 
with the following possible values:
 * {{"hostname"}} - default, use the TM host's name ({{== 
InetAddress.getLocalHost().getHostName()}});
 * {{"ip"}} - use the TM host's IP address ({{== 
InetAddress.getLocalHost().getHostAddress()}});
 * {{"auto-detect-hostname"}} - use the heuristics-based detection mechanism.

*Note:* the configuration key and values could be named better; proposals are 
welcome.
*Note 2:* in the future, the configuration option _may_ need to be extended 
to allow choosing a specific network interface, or a preference for IPv6 vs 
IPv4.
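
The three proposed values could be sketched roughly as follows. This is only an illustration of the proposal, not existing Flink code: the names {{BindPolicy}}, {{BindPolicyResolver}} and {{resolveHost}} are hypothetical, the option key is still open for discussion, and the existing heuristic is represented by a plain {{Supplier}} stand-in.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.Supplier;

/** Hypothetical values of the proposed taskmanager.host.bind-policy option. */
enum BindPolicy {
    HOSTNAME("hostname"),
    IP("ip"),
    AUTO_DETECT_HOSTNAME("auto-detect-hostname");

    final String configValue;

    BindPolicy(String configValue) {
        this.configValue = configValue;
    }

    /** Parses a config value; a missing or empty value maps to the proposed default. */
    static BindPolicy fromConfigValue(String value) {
        if (value == null || value.trim().isEmpty()) {
            return HOSTNAME;
        }
        for (BindPolicy policy : values()) {
            if (policy.configValue.equalsIgnoreCase(value.trim())) {
                return policy;
            }
        }
        throw new IllegalArgumentException("Unknown bind policy: " + value);
    }
}

/** Hypothetical helper, not part of Flink. */
final class BindPolicyResolver {

    /**
     * Picks the host string to advertise. The heuristic is only invoked
     * when the policy explicitly asks for it.
     */
    static String resolveHost(BindPolicy policy, InetAddress localHost, Supplier<String> heuristic) {
        switch (policy) {
            case HOSTNAME:
                return localHost.getHostName();
            case IP:
                return localHost.getHostAddress();
            case AUTO_DETECT_HOSTNAME:
                return heuristic.get();
            default:
                throw new IllegalStateException("Unhandled policy: " + policy);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        BindPolicy policy = BindPolicy.fromConfigValue(args.length > 0 ? args[0] : null);
        // The supplier below is a placeholder for the existing heuristic.
        String host = resolveHost(policy, InetAddress.getLocalHost(), () -> "heuristic-placeholder");
        System.out.println(policy.configValue + " -> " + host);
    }
}
```

With this shape, the default path never touches the heuristic, which is the main point of the proposal; the old behaviour is reached only through an explicit {{"auto-detect-hostname"}} setting.
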
h3. Rationale

[The heuristics 
mechanism|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/net/ConnectionUtils.java#L364-L475]
 tries to establish a probe connection to {{jobmanager.rpc.address}} from the 
addresses of different network interfaces. 
 In parallel setups (when the JM and multiple TMs start simultaneously), the 
outcome depends on timing and on the assigned network IP addresses, and may 
end up with "non-uniform" address bindings across TMs: some may be "lucky" and 
pick up a non-default network interface, while others fall back to 
{{InetAddress.getLocalHost().getHostName()}}. As a result, it is less obvious 
and transparent which bind address a TM picks.
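
To make the timing dependence concrete, here is a much simplified sketch of the probe-connection idea. It is not the actual {{ConnectionUtils}} logic (which is more elaborate); the class and method names are hypothetical, and the fallback to {{InetAddress.getLocalHost()}} is modelled on the behaviour described above.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.Socket;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.Enumeration;

/** Hypothetical, simplified illustration of probe-based bind address detection. */
final class ProbeConnectionSketch {

    /** Returns true if a TCP connection to target succeeds from a socket bound to 'from'. */
    static boolean canConnectFrom(InetAddress from, InetSocketAddress target, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.bind(new InetSocketAddress(from, 0)); // port 0 = any free local port
            socket.connect(target, timeoutMillis);
            return true;
        } catch (IOException e) {
            // Unreachable, refused, timed out, or JM not started yet.
            return false;
        }
    }

    /**
     * Tries every local interface address in turn and returns the first one
     * that can reach the JobManager; otherwise falls back to the local host.
     */
    static InetAddress pickBindAddress(InetSocketAddress jobManager, int timeoutMillis)
            throws SocketException, UnknownHostException {
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics != null) {
            for (NetworkInterface nic : Collections.list(nics)) {
                for (InetAddress candidate : Collections.list(nic.getInetAddresses())) {
                    if (canConnectFrom(candidate, jobManager, timeoutMillis)) {
                        return candidate;
                    }
                }
            }
        }
        return InetAddress.getLocalHost();
    }
}
```

In this sketch, if the JM is not yet listening when a TM probes, every probe fails and that TM falls back to {{InetAddress.getLocalHost()}}, while a TM that starts slightly later may bind to whichever interface address connects first. That is the kind of non-uniform outcome described above.
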

In practice, it's possible that in the majority of cases (in well set up 
environments) the heuristics mechanism returns a result that matches 
{{InetAddress.getLocalHost()}}. The proposal is to use this simpler and more 
explicit binding by default, avoiding the non-determinism of the heuristics.

The old mechanism remains available in case it is useful in some setups, but 
it would require an explicit configuration setting.

Additionally, this proposal extends the "auto configuration" option by 
allowing users to choose the host's IP address (instead of its hostname). This 
may be convenient in situations where the TMs' machines are not necessarily 
reachable via DNS (for example, in a Kubernetes setup).


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
