[ https://issues.apache.org/jira/browse/FLINK-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419213#comment-16419213 ]
ASF GitHub Bot commented on FLINK-9103:
---
GitHub user EAlexRojas opened a pull request:
https://github.com/apache/flink/pull/5789
[FLINK-9103] Using CanonicalHostName instead of IP for SSL connection on NettyClient
## What is the purpose of the change
This pull request makes the NettyClient use the CanonicalHostName instead of the IP address to identify the server during SSL communication. This way, dynamic environments such as Kubernetes can be fully supported, because certificates with wildcard DNS names can be used.
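The reason wildcard DNS certificates only help once a host name is used can be illustrated with a simplified subject-alternative-name (SAN) matcher. This is a hypothetical sketch, not the JDK's `sun.security.util.HostnameChecker`. It assumes, as the `matchIP` frame in the stack trace of the quoted issue suggests, that an IP-literal peer is compared only against IP-address SAN entries, and that a wildcard such as `*.flink.svc.cluster.local` (a made-up Kubernetes-style name) covers exactly one left-most label.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical, simplified SAN matcher for illustration only.
class SanMatchSketch {

    // Rough IP-literal test: IPv4 dotted quad, or anything containing ':' (IPv6).
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    static boolean isIpLiteral(String host) {
        return IPV4.matcher(host).matches() || host.contains(":");
    }

    // "*.example.com" matches "a.example.com" but not "example.com" or "a.b.example.com".
    static boolean matchesDnsName(String host, String pattern) {
        String h = host.toLowerCase(Locale.ROOT);
        String p = pattern.toLowerCase(Locale.ROOT);
        if (!p.startsWith("*.")) {
            return h.equals(p);
        }
        String suffix = p.substring(1); // e.g. ".flink.svc.cluster.local"
        if (!h.endsWith(suffix)) {
            return false;
        }
        String label = h.substring(0, h.length() - suffix.length());
        return !label.isEmpty() && !label.contains(".");
    }

    static boolean verify(String peerHost, List<String> dnsSans, List<String> ipSans) {
        if (isIpLiteral(peerHost)) {
            // An IP peer is only checked against IP SANs, so DNS-only certificates fail.
            return ipSans.contains(peerHost);
        }
        for (String san : dnsSans) {
            if (matchesDnsName(peerHost, san)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> dnsSans = Arrays.asList("*.flink.svc.cluster.local");
        List<String> ipSans = Collections.emptyList();
        System.out.println(verify("taskmanager-1.flink.svc.cluster.local", dnsSans, ipSans)); // true
        System.out.println(verify("172.17.0.5", dnsSans, ipSans)); // false
    }
}
```

With a certificate that lists only a wildcard DNS name, the same server passes verification when addressed by host name and fails when addressed by IP, which matches the failure mode described in the issue.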
## Brief change log
- Use CanonicalHostName instead of HostNameAddress to identify the server on the NettyClient
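To show why the host string handed to the SSL engine matters, here is a minimal sketch using only standard JSSE APIs. It assumes the client enables endpoint identification via `SSLParameters.setEndpointIdentificationAlgorithm("HTTPS")`, which is consistent with the `HostnameChecker` frames in the stack trace of the quoted issue. The class and helper names (`PeerHostSketch`, `peerHost`, `clientEngine`, `newClientContext`) are hypothetical; this is not Flink's actual `NettyClient` code.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

// Hypothetical sketch, not Flink code.
class PeerHostSketch {

    // Picks the host string the client passes to the SSLEngine.
    static String peerHost(InetSocketAddress server, boolean useCanonicalHostName) {
        InetAddress address = server.getAddress();
        return useCanonicalHostName
                ? address.getCanonicalHostName() // reverse DNS lookup, may block
                : address.getHostAddress();      // literal IP such as "127.0.0.1"
    }

    // Client-side engine with hostname verification enabled: during the handshake
    // the JDK checks the server certificate's SANs against peerHost.
    static SSLEngine clientEngine(SSLContext context, String peerHost, int port) {
        SSLEngine engine = context.createSSLEngine(peerHost, port);
        engine.setUseClientMode(true);
        SSLParameters params = engine.getSSLParameters();
        params.setEndpointIdentificationAlgorithm("HTTPS");
        engine.setSSLParameters(params);
        return engine;
    }

    // TLS context with the JDK default trust store and no client key material.
    static SSLContext newClientContext() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, null, null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SSLContext context = newClientContext();
        InetSocketAddress server = new InetSocketAddress("127.0.0.1", 6121); // arbitrary port

        SSLEngine byIp = clientEngine(context, peerHost(server, false), server.getPort());
        SSLEngine byName = clientEngine(context, peerHost(server, true), server.getPort());

        System.out.println("verified against: " + byIp.getPeerHost());   // 127.0.0.1
        System.out.println("verified against: " + byName.getPeerHost()); // reverse DNS result
    }
}
```

No handshake is performed here; the sketch only shows that whatever string is passed to `createSSLEngine` becomes the identity the certificate is checked against. Note that `getCanonicalHostName()` depends on reverse DNS being configured in the environment, so its result can differ between setups.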
## Verifying this change
This change is already covered by existing tests, such as:
NettyClientServerSslTest (org.apache.flink.runtime.io.network.netty)
- testValidSslConnection
- testSslHandshakeError
Also manually verified the change by running a 4-node Kubernetes cluster with 1 JobManager and 3 TaskManagers, using wildcard DNS certificates, executing a stateful streaming program with parallelism set to 2, and verifying that all nodes communicate with each other successfully.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
- The S3 file system connector: no
## Documentation
- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/EAlexRojas/flink release-1.4
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5789.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5789
commit 202672da7901fe7df912e6a057d6d0c29ccaf0fd
Author: EAlexRojas
Date: 2018-03-29T14:01:24Z
Using CanonicalHostName instead of IP for SSL connection on NettyClient
> SSL verification on TaskManager when parallelism > 1
>
>
> Key: FLINK-9103
> URL: https://issues.apache.org/jira/browse/FLINK-9103
> Project: Flink
> Issue Type: Bug
> Components: Docker, Security
> Affects Versions: 1.4.0
> Reporter: Edward Rojas
> Priority: Major
> Attachments: job.log, task0.log
>
>
> In dynamic environments like Kubernetes, SSL certificates may be generated to use only DNS names for validating the identity of servers, since IP addresses can change over time.
>
> In these cases, when executing jobs with parallelism set to 1, SSL validation succeeds and the JobManager can communicate with the TaskManager and vice versa.
>
> However, with parallelism set to more than 1, SSL validation fails when TaskManagers communicate with each other, because it appears to validate against the IP address:
> Caused by: java.security.cert.CertificateException: No subject alternative names matching IP address 172.xx.xxx.xxx found
>     at sun.security.util.HostnameChecker.matchIP(HostnameChecker.java:168)
>     at sun.security.util.HostnameChecker.match(HostnameChecker.java:94)
>     at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:455)
>     at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:436)
>     at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:252)
>     at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:136)
>     at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1601)
>     ... 21 more
>
> From the logs, it seems the TaskManagers successfully register their full address with Netty, but the IP address is still used.
>
> Attached are the relevant logs from the JobManager and a TaskManager.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)