Hi everyone,

I'm currently testing data-local computing in Flink on XtreemFS (I'm one of
its developers). We have implemented our adapter using the Hadoop
FileSystem interface, and everything works well. However, upon closer
inspection, I found that only remote splits are assigned, which is strange,
as XtreemFS stores files split across multiple nodes and reports the
hostnames for each split. Specifically, I'm receiving the warning message
issued in:
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/instance/InstanceConnectionInfo.java#L103

So each TaskManager cannot resolve its hostname from its IP address, and
the input split assigner therefore cannot match nodes to splits: the nodes
identify themselves by IP address (not by hostname), while the splits are
identified by hostname. The result is (mostly) non-local computing. I
tracked the issue down, and it turns out that Java's default name lookup
mechanism seems to be faulty with my cluster configuration. When I pass
"env.java.opts: -Dsun.net.spi.nameservice.provider.1=dns,sun" (a
non-default name service) in flink-conf.yaml, the IP addresses are resolved
to hostnames properly.
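
To reproduce the lookup outside of Flink, something like the following
minimal sketch could be run on a node. The class and method names
(ReverseLookupCheck, resolvedToName, isLocal) are made up for illustration
and are not part of Flink or XtreemFS. The isLocal method is only a
simplified stand-in for the matching done by the split assigner, not
Flink's actual logic. The sketch relies on the documented behaviour of
InetAddress.getCanonicalHostName(), which returns the textual IP address
when no name can be found.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;

// Hypothetical helper class for illustration; not part of Flink or XtreemFS.
class ReverseLookupCheck {

    // True when the reverse lookup produced a name instead of echoing the
    // textual IP address back (which is what getCanonicalHostName() returns
    // when it cannot resolve a name).
    static boolean resolvedToName(String hostAddress, String canonicalName) {
        return !canonicalName.equals(hostAddress);
    }

    // Simplified locality test (an assumption, not Flink's real code):
    // a split counts as local only if the node's identifier matches one of
    // the hostnames reported for that split.
    static boolean isLocal(String[] splitHosts, String nodeIdentifier) {
        return Arrays.stream(splitHosts)
                .anyMatch(host -> host.equalsIgnoreCase(nodeIdentifier));
    }

    public static void main(String[] args) throws UnknownHostException {
        // Pass the node's IP as the first argument; defaults to loopback.
        String ip = args.length > 0 ? args[0] : "127.0.0.1";

        // For an IP literal, getByName() only validates the format.
        InetAddress address = InetAddress.getByName(ip);

        // This call performs the actual reverse lookup.
        String canonical = address.getCanonicalHostName();

        boolean ok = resolvedToName(address.getHostAddress(), canonical);
        System.out.println(ip + " -> " + canonical
                + (ok ? "" : " (no hostname found)"));
    }
}
```

Running it once with plain "java ReverseLookupCheck <IP-OF-NODE>" and once
with -Dsun.net.spi.nameservice.provider.1=dns,sun added should show whether
the two name services disagree. With an unresolved IP as node identifier,
isLocal returns false for every split that reports hostnames, which matches
the behaviour described above.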

I know that this is probably not directly related to Flink, but given that
you specifically handle the case where hostname resolution is not possible,
I was wondering whether you have run into such cases and, if so, how you
overcame the issue. I'd rather not perform that many reverse lookups when
the normal file-based strategy should work as well (note that
nslookup <IP-OF-NODE> works as expected, and when strace'ing the command,
it does not even connect to the nameserver).

Thanks in advance for your help,
Robert

-- 
My GPG Key ID: 336E2680
