I suspect this relates to: https://issues.apache.org/jira/browse/FLINK-5030

There was a PR for it at some point, but nothing has been done so far. It
seems the current code explicitly uses the IP rather than the hostname for
the Netty SSL configuration.
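
To illustrate what I mean, here is a minimal Java sketch. It is not Flink's
actual code: the class PeerHostSketch and the helper clientEngine are made up
for this example. It assumes a client that enables HTTPS-style endpoint
identification. In that case the JDK compares the peer host string the
SSLEngine was created with against the certificate's SANs, so an IP literal
needs an IP SAN while a hostname only needs a DNS SAN.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

// Hypothetical illustration only; none of these names come from Flink.
final class PeerHostSketch {

    // Creates a client-mode SSLEngine for the given peer host string and
    // enables endpoint identification, so the trust manager will check
    // that string against the server certificate during the handshake.
    static SSLEngine clientEngine(SSLContext ctx, String peerHost, int port) {
        SSLEngine engine = ctx.createSSLEngine(peerHost, port);
        engine.setUseClientMode(true);
        SSLParameters params = engine.getSSLParameters();
        params.setEndpointIdentificationAlgorithm("HTTPS");
        engine.setSSLParameters(params);
        return engine;
    }

    public static void main(String[] args) throws Exception {
        // Built from literal bytes, so no DNS lookup is performed.
        InetAddress addr = InetAddress.getByAddress(
                "flink-taskmanager-1.flink-taskmanager-svc",
                new byte[] {(byte) 172, 30, (byte) 247, (byte) 163});
        InetSocketAddress remote = new InetSocketAddress(addr, 6121);
        SSLContext ctx = SSLContext.getDefault();

        // Peer identified by IP literal: only an IP SAN can match.
        SSLEngine byIp = clientEngine(
                ctx, remote.getAddress().getHostAddress(), remote.getPort());
        // Peer identified by hostname: a DNS SAN can match.
        SSLEngine byName = clientEngine(
                ctx, remote.getHostString(), remote.getPort());

        System.out.println(byIp.getPeerHost());
        System.out.println(byName.getPeerHost());
    }
}
```

If the client side builds its engine the byIp way, a certificate that only
lists DNS names would be rejected with the kind of "No subject alternative
names matching IP address" error quoted below, regardless of the hostname the
TaskManager advertises.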

Without that fix, I really wonder how people can reasonably use SSL on a
Kubernetes-based Flink cluster, since every time a pod is (re)started it can
theoretically get a different IP. Or am I missing something?

--
Christophe

On Tue, Mar 27, 2018 at 3:24 PM, Edward Alexander Rojas Clavijo <
edward.roja...@gmail.com> wrote:

> Hi all,
>
> Currently I have a Flink 1.4 cluster running on kubernetes and with SSL
> configuration based on
> https://ci.apache.org/projects/flink/flink-docs-master/ops/security-ssl.html.
>
> However, as the IPs of the nodes are dynamic (due to the nature of
> kubernetes), we are using only the DNS names, which we can control using
> kubernetes services. So we add to the Subject Alternative Name (SAN) the
> flink-jobmanager DNS name and also the DNS name for the task managers,
> *.flink-taskmanager-svc (each task manager has a DNS name of the form
> flink-taskmanager-0.flink-taskmanager-svc).
>
> Additionally we set the jobmanager.rpc.address property on all the nodes
> and each task manager sets the taskmanager.host property, all matching the
> ones on the certificate.
>
> This works well when running a job with parallelism set to 1. The SSL
> validation succeeds and the Jobmanager can communicate with the Task
> manager and vice versa.
>
> But when we set the parallelism to more than 1, we get exceptions during
> SSL validation like this:
>
> Caused by: java.security.cert.CertificateException: No subject alternative names matching IP address 172.30.247.163 found
>     at sun.security.util.HostnameChecker.matchIP(HostnameChecker.java:168)
>     at sun.security.util.HostnameChecker.match(HostnameChecker.java:94)
>     at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:455)
>     at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:436)
>     at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:252)
>     at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:136)
>     at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1601)
>     ... 21 more
>
>
> From the logs I see the Jobmanager is correctly registering the
> taskmanagers:
>
> org.apache.flink.runtime.instance.InstanceManager   - Registered TaskManager at flink-taskmanager-1 (akka.ssl.tcp://flink@taiga-flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local:6122/user/taskmanager) as 1a3f59693cec8b3929ed8898edcc2700. Current number of registered hosts is 3. Current number of alive task slots is 6.
>
> And also each taskmanager is correctly registered to use the hostname for
> communication:
>
> org.apache.flink.runtime.taskmanager.TaskManager   - TaskManager will use hostname/address 'flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local' (172.30.247.163) for communication.
> ...
> akka.remote.Remoting   - Remoting started; listening on addresses :[akka.ssl.tcp://flink@flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local:6122]
> ...
> org.apache.flink.runtime.io.network.netty.NettyConfig   - NettyConfig [server address: flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local/172.30.247.163, server port: 6121, ssl enabled: true, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 2 (manual), number of client threads: 2 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
> ...
> org.apache.flink.runtime.taskmanager.TaskManager   - TaskManager data connection information: bf4a9b50e57c99c17049adb66d65f685 @ flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local (dataPort=6121)
>
>
>
> But even with that, it seems like the taskmanagers are using the IP to
> communicate with each other, and the SSL validation fails.
>
> Do you know if it's possible to make the taskmanagers use the hostname
> instead of the IP to communicate?
> Or do you have any advice on getting the SSL configuration to work in this
> environment?
>
> Thanks in advance.
>
> Regards,
> Edward
>



