[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660287#comment-16660287 ] ASF GitHub Bot commented on FLINK-9878: --- NicoK closed pull request #6838: [FLINK-9878][network][ssl] add more low-level ssl options URL: https://github.com/apache/flink/pull/6838 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/docs/_includes/generated/security_configuration.html b/docs/_includes/generated/security_configuration.html index 680c1c02434..8999336926f 100644 --- a/docs/_includes/generated/security_configuration.html +++ b/docs/_includes/generated/security_configuration.html @@ -12,11 +12,21 @@ "TLS_RSA_WITH_AES_128_CBC_SHA" The comma separated list of standard SSL algorithms to be supported. Read more http://docs.oracle.com/javase/8/docs/technotes/guides/security/StandardNames.html#ciphersuites;>here + +security.ssl.internal.close-notify-flush-timeout +-1 +The timeout (in ms) for flushing the `close_notify` that was triggered by closing a channel. If the `close_notify` was not flushed in the given timeout the channel will be closed forcibly. (-1 = use system default) + security.ssl.internal.enabled false Turns on SSL for internal network communication. Optionally, specific components may override this through their own settings (rpc, data transport, REST, etc). + +security.ssl.internal.handshake-timeout +-1 +The timeout (in ms) during SSL handshake. (-1 = use system default) + security.ssl.internal.key-password (none) @@ -32,6 +42,16 @@ (none) The secret to decrypt the keystore file for Flink's for Flink's internal endpoints (rpc, data transport, blob server). + +security.ssl.internal.session-cache-size +-1 +The size of the cache used for storing SSL session objects. 
According to https://github.com/netty/netty/issues/832, you should always set this to an appropriate number to not run into a bug with stalling IO threads during garbage collection. (-1 = use system default). + + +security.ssl.internal.session-timeout +-1 +The timeout (in ms) for the cached SSL session objects. (-1 = use system default) + security.ssl.internal.truststore (none) diff --git a/docs/ops/security-ssl.md b/docs/ops/security-ssl.md index 6ea686203ee..4e3716218d2 100644 --- a/docs/ops/security-ssl.md +++ b/docs/ops/security-ssl.md @@ -22,6 +22,9 @@ specific language governing permissions and limitations under the License. --> +* ToC +{:toc} + This page provides instructions on how to enable TLS/SSL authentication and encryption for network communication with and between Flink processes. ## Internal and External Connectivity @@ -37,7 +40,7 @@ For more flexibility, security for internal and external connectivity can be ena - Internal Connectivity +### Internal Connectivity Internal connectivity includes: @@ -54,7 +57,7 @@ is not needed by any other party to interact with Flink, and can be simply added *Note: Because internal connections are mutually authenticated with shared certificates, Flink can skip hostname verification. This makes container-based setups easier.* - External / REST Connectivity +### External / REST Connectivity All external connectivity is exposed via an HTTP/REST endpoint, used for example by the web UI and the CLI: @@ -71,7 +74,7 @@ Examples for proxies that Flink users have deployed are [Envoy Proxy](https://ww The rationale behind delegating authentication to a proxy is that such proxies offer a wide variety of authentication options and thus better integration into existing infrastructures. - Queryable State +### Queryable State Connections to the queryable state endpoints is currently not authenticated or encrypted. 
@@ -92,13 +95,13 @@ When `security.ssl.internal.enabled` is set to `true`, you can set the following - `blob.service.ssl.enabled`: Transport of BLOBs from JobManager to TaskManager - `akka.ssl.enabled`: Akka-based RPC connections between JobManager / TaskManager / ResourceManager - Keystores and Truststores +### Keystores and Truststores The SSL configuration requires to configure a **keystore** and a **truststore**. The *keystore* contains the public certificate (public key) and the private key, while the truststore contains the trusted certificates or
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659401#comment-16659401 ] ASF GitHub Bot commented on FLINK-9878: --- NicoK closed pull request #6895: [FLINK-9878][network][ssl] add more low-level ssl options URL: https://github.com/apache/flink/pull/6895 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/docs/_includes/generated/security_configuration.html b/docs/_includes/generated/security_configuration.html index 680c1c02434..8999336926f 100644 --- a/docs/_includes/generated/security_configuration.html +++ b/docs/_includes/generated/security_configuration.html @@ -12,11 +12,21 @@ "TLS_RSA_WITH_AES_128_CBC_SHA" The comma separated list of standard SSL algorithms to be supported. Read more http://docs.oracle.com/javase/8/docs/technotes/guides/security/StandardNames.html#ciphersuites;>here + +security.ssl.internal.close-notify-flush-timeout +-1 +The timeout (in ms) for flushing the `close_notify` that was triggered by closing a channel. If the `close_notify` was not flushed in the given timeout the channel will be closed forcibly. (-1 = use system default) + security.ssl.internal.enabled false Turns on SSL for internal network communication. Optionally, specific components may override this through their own settings (rpc, data transport, REST, etc). + +security.ssl.internal.handshake-timeout +-1 +The timeout (in ms) during SSL handshake. (-1 = use system default) + security.ssl.internal.key-password (none) @@ -32,6 +42,16 @@ (none) The secret to decrypt the keystore file for Flink's for Flink's internal endpoints (rpc, data transport, blob server). + +security.ssl.internal.session-cache-size +-1 +The size of the cache used for storing SSL session objects. 
According to https://github.com/netty/netty/issues/832, you should always set this to an appropriate number to not run into a bug with stalling IO threads during garbage collection. (-1 = use system default). + + +security.ssl.internal.session-timeout +-1 +The timeout (in ms) for the cached SSL session objects. (-1 = use system default) + security.ssl.internal.truststore (none) diff --git a/docs/ops/security-ssl.md b/docs/ops/security-ssl.md index 6ea686203ee..4e3716218d2 100644 --- a/docs/ops/security-ssl.md +++ b/docs/ops/security-ssl.md @@ -22,6 +22,9 @@ specific language governing permissions and limitations under the License. --> +* ToC +{:toc} + This page provides instructions on how to enable TLS/SSL authentication and encryption for network communication with and between Flink processes. ## Internal and External Connectivity @@ -37,7 +40,7 @@ For more flexibility, security for internal and external connectivity can be ena - Internal Connectivity +### Internal Connectivity Internal connectivity includes: @@ -54,7 +57,7 @@ is not needed by any other party to interact with Flink, and can be simply added *Note: Because internal connections are mutually authenticated with shared certificates, Flink can skip hostname verification. This makes container-based setups easier.* - External / REST Connectivity +### External / REST Connectivity All external connectivity is exposed via an HTTP/REST endpoint, used for example by the web UI and the CLI: @@ -71,7 +74,7 @@ Examples for proxies that Flink users have deployed are [Envoy Proxy](https://ww The rationale behind delegating authentication to a proxy is that such proxies offer a wide variety of authentication options and thus better integration into existing infrastructures. - Queryable State +### Queryable State Connections to the queryable state endpoints is currently not authenticated or encrypted. 
@@ -92,13 +95,13 @@ When `security.ssl.internal.enabled` is set to `true`, you can set the following - `blob.service.ssl.enabled`: Transport of BLOBs from JobManager to TaskManager - `akka.ssl.enabled`: Akka-based RPC connections between JobManager / TaskManager / ResourceManager - Keystores and Truststores +### Keystores and Truststores The SSL configuration requires to configure a **keystore** and a **truststore**. The *keystore* contains the public certificate (public key) and the private key, while the truststore contains the trusted certificates or
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659376#comment-16659376 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on issue #6838: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6838#issuecomment-431836355

thanks @pnowojski for the update - looks like the rebase onto latest master will be the same changes I did for #6895 earlier - I'll use these, will address your comments and merge

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> IO worker threads BLOCKED on SSL Session Cache while CMS full gc
>
> Key: FLINK-9878
> URL: https://issues.apache.org/jira/browse/FLINK-9878
> Project: Flink
> Issue Type: Bug
> Components: Network
> Affects Versions: 1.5.0, 1.5.1, 1.6.0
> Reporter: Nico Kruber
> Assignee: Nico Kruber
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.5.4, 1.6.3, 1.7.0
>
> According to https://github.com/netty/netty/issues/832, there is a JDK issue during garbage collection when the SSL session cache is not limited. We should allow the user to configure this and further (advanced) SSL parameters for fine-tuning to fix this and similar issues. In particular, the following parameters should be configurable:
> - SSL session cache size
> - SSL session timeout
> - SSL handshake timeout
> - SSL close notify flush timeout

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659377#comment-16659377 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on a change in pull request #6838: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6838#discussion_r226988384

## File path: flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/history/HistoryServer.java ##

 @@ -88,6 +90,7 @@
   private final HistoryServerArchiveFetcher archiveFetcher;

 + @Nullable

Review comment:
not changing that one with this PR, sorry

FYI: my IntelliJ is actually giving me a warning using `Optional` as a field or a method parameter.
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659288#comment-16659288 ]

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6838: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6838#discussion_r226291127

## File path: flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/history/HistoryServer.java ##

 @@ -88,6 +90,7 @@
   private final HistoryServerArchiveFetcher archiveFetcher;

 + @Nullable

Review comment:
`Optional`? :(
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659289#comment-16659289 ]

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6838: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6838#discussion_r226293240

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/SSLHandlerFactory.java ##

 @@ -36,29 +38,66 @@
   private final boolean clientMode;

 - final boolean clientAuthentication;
 + private final boolean clientAuthentication;
 +
 + private final int handshakeTimeoutMs;
 +
 + private final int closeNotifyFlushTimeoutMs;

 - public SSLEngineFactory(
 + /**
 +  * Create a new SSLEngine factory.

Review comment:
nit: comment is outdated (`SSLEngine`).
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649106#comment-16649106 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK opened a new pull request #6838: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6838

## What is the purpose of the change

This is mostly to tackle bugs like https://github.com/netty/netty/issues/832 (JDK issue during garbage collection when the SSL session cache is not limited). We add the following low-level configuration options for the user to fine-tune their system, i.e. the Flink-internal communication:
- SSL session cache size
- SSL session timeout
- SSL handshake timeout
- SSL close notify flush timeout

FYI: I'll also merge this into `master` if accepted.

## Brief change log

- add `security.ssl.internal.session-cache-size` and `security.ssl.internal.session-timeout` configuration parameters
  -> configure these for `SSLContext`s created by `SSLUtil`
- add `security.ssl.internal.handshake-timeout` and `security.ssl.internal.close-notify-flush-timeout`
  -> configure these for `SslHandler`s created by `SSLHandlerFactory` (previously `SSLEngineFactory`)
- rename/refactor `SSLEngineFactory` to `SSLHandlerFactory` since no `SSLEngine` objects alone were actually needed, but only Netty's `SslHandler` (reduces code duplication which would be worse with this PR)

## Verifying this change

This change added tests and can be verified as follows:
- added configuration-verification test to `NettyClientServerSslTest`

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): **no**
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no**
- The serializers: **no**
- The runtime per-record code paths (performance sensitive): **no**
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
- The S3 file system connector: **no**

## Documentation

- Does this pull request introduce a new feature? **yes** (kind-of)
- If yes, how is the feature documented? **docs + JavaDocs**

> IO worker threads BLOCKED on SSL Session Cache while CMS full gc
>
> Key: FLINK-9878
> URL: https://issues.apache.org/jira/browse/FLINK-9878
> Project: Flink
> Issue Type: Bug
> Components: Network
> Affects Versions: 1.5.0, 1.5.1, 1.6.0
> Reporter: Nico Kruber
> Assignee: Nico Kruber
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.7.0, 1.5.4, 1.6.2
>
> According to https://github.com/netty/netty/issues/832, there is a JDK issue during garbage collection when the SSL session cache is not limited. We should allow the user to configure this and further (advanced) SSL parameters for fine-tuning to fix this and similar issues. In particular, the following parameters should be configurable:
> - SSL session cache size
> - SSL session timeout
> - SSL handshake timeout
> - SSL close notify flush timeout
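
For illustration, the first change-log item (applying a session cache size and a session timeout to an `SSLContext`) can be sketched with plain JDK APIs. This is not the code from the pull request: the class name `SslSessionTuning`, the method `configureSessions`, and the constant `USE_SYSTEM_DEFAULT` are hypothetical, and the conversion from milliseconds to seconds is an assumption based on `SSLSessionContext#setSessionTimeout` taking seconds while the documented options are given in ms.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

public class SslSessionTuning {

    /** Hypothetical sentinel mirroring the documented "-1 = use system default" convention. */
    public static final int USE_SYSTEM_DEFAULT = -1;

    /**
     * Applies the two session settings to both the client and the server
     * session context. A value of -1 leaves the JDK default untouched.
     */
    public static void configureSessions(SSLContext ctx, int cacheSize, int timeoutMs) {
        SSLSessionContext[] contexts = {
            ctx.getClientSessionContext(),
            ctx.getServerSessionContext()
        };
        for (SSLSessionContext sc : contexts) {
            if (sc == null) {
                continue; // a provider may not expose a session context
            }
            if (cacheSize != USE_SYSTEM_DEFAULT) {
                sc.setSessionCacheSize(cacheSize);
            }
            if (timeoutMs != USE_SYSTEM_DEFAULT) {
                // Assumption: the JDK API expects seconds, the option is in ms.
                sc.setSessionTimeout(timeoutMs / 1000);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, null, null); // default key and trust managers
        configureSessions(ctx, 1024, 60_000);
        System.out.println("cache size: " + ctx.getServerSessionContext().getSessionCacheSize());
        System.out.println("timeout (s): " + ctx.getServerSessionContext().getSessionTimeout());
    }
}
```

Bounding the cache explicitly is the point of the exercise: per the linked Netty issue, an unlimited session cache is what leads to stalled IO threads during garbage collection.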
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586571#comment-16586571 ]

Nico Kruber commented on FLINK-9878:

Fixed via:
- `release-1.5`: 9e421a438dd830c6be72e5f13f855e68a82aef21

(1.6 and master will get new forward-PRs)

> IO worker threads BLOCKED on SSL Session Cache while CMS full gc
>
> Key: FLINK-9878
> URL: https://issues.apache.org/jira/browse/FLINK-9878
> Project: Flink
> Issue Type: Bug
> Components: Network
> Affects Versions: 1.5.0, 1.5.1, 1.6.0
> Reporter: Nico Kruber
> Assignee: Nico Kruber
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.1, 1.7.0, 1.5.4
>
> According to https://github.com/netty/netty/issues/832, there is a JDK issue during garbage collection when the SSL session cache is not limited. We should allow the user to configure this and further (advanced) SSL parameters for fine-tuning to fix this and similar issues. In particular, the following parameters should be configurable:
> - SSL session cache size
> - SSL session timeout
> - SSL handshake timeout
> - SSL close notify flush timeout
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586567#comment-16586567 ] ASF GitHub Bot commented on FLINK-9878: --- NicoK closed pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options URL: https://github.com/apache/flink/pull/6355 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/docs/_includes/generated/security_configuration.html b/docs/_includes/generated/security_configuration.html index cd682ecaf0f..357629473cd 100644 --- a/docs/_includes/generated/security_configuration.html +++ b/docs/_includes/generated/security_configuration.html @@ -12,11 +12,21 @@ "TLS_RSA_WITH_AES_128_CBC_SHA" The comma separated list of standard SSL algorithms to be supported. Read more a href="http://docs.oracle.com/javase/8/docs/technotes/guides/security/StandardNames.html#ciphersuites"here/a;. + +security.ssl.close-notify-flush-timeout +-1 +The timeout (in ms) for flushing the `close_notify` that was triggered by closing a channel. If the `close_notify` was not flushed in the given timeout the channel will be closed forcibly. (-1 = use system default) + security.ssl.enabled false Turns on SSL for internal network communication. This can be optionally overridden by flags defined in different transport modules. + +security.ssl.handshake-timeout +-1 +The timeout (in ms) during SSL handshake. (-1 = use system default) + security.ssl.key-password (none) @@ -37,6 +47,16 @@ "TLSv1.2" The SSL protocol version to be supported for the ssl transport. Note that it doesn’t support comma separated list. + +security.ssl.session-cache-size +-1 +The size of the cache used for storing SSL session objects. 
According to https://github.com/netty/netty/issues/832, you should always set this to an appropriate number to not run into a bug with stalling IO threads during garbage collection. (-1 = use system default). + + +security.ssl.session-timeout +-1 +The timeout (in ms) for the cached SSL session objects. (-1 = use system default) + security.ssl.truststore (none) diff --git a/docs/ops/security-ssl.md b/docs/ops/security-ssl.md index c2ba7df8849..a805238ae08 100644 --- a/docs/ops/security-ssl.md +++ b/docs/ops/security-ssl.md @@ -33,6 +33,10 @@ SSL can be enabled for all network communication between Flink components. SSL k * **akka.ssl.enabled**: SSL flag for akka based control connection between the Flink client, jobmanager and taskmanager * **jobmanager.web.ssl.enabled**: Flag to enable https access to the jobmanager's web frontend +### Complete List of SSL Options + +{% include generated/security_configuration.html %} + ## Deploying Keystores and Truststores You need to have a Java Keystore generated and copied to each node in the Flink cluster. The common name or subject alternative names in the certificate should match the node's hostname and IP address. Keystores and truststores can be generated using the [keytool utility](https://docs.oracle.com/javase/8/docs/technotes/tools/unix/keytool.html). All Flink components should have read access to the keystore and truststore files. diff --git a/flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java b/flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java index 0f25c6caf95..60a97643a4e 100644 --- a/flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java +++ b/flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java @@ -160,4 +160,41 @@ key("security.ssl.verify-hostname") .defaultValue(true) .withDescription("Flag to enable peer’s hostname verification during ssl handshake."); + + /** +* SSL session cache size. 
+*/ + public static final ConfigOption SSL_SESSION_CACHE_SIZE = + key("security.ssl.session-cache-size") + .defaultValue(-1) + .withDescription("The size of the cache used for storing SSL session objects. " + + "According to https://github.com/netty/netty/issues/832, you should always set " + + "this to an appropriate number to not run into a bug with stalling IO threads " +
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586024#comment-16586024 ]

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r211285792

## File path: flink-runtime/src/test/java/org/apache/flink/runtime/io/network/netty/NettyClientServerSslTest.java ##

 @@ -98,21 +86,48 @@ public void testValidSslConnectionAdvanced() throws Exception {
   Channel ch = NettyTestUtil.connect(serverAndClient);

   SslHandler sslHandler = (SslHandler) ch.pipeline().get("ssl");
 - assertEquals(sslHandler.getHandshakeTimeoutMillis(), handshakeTimeout);
 - assertEquals(sslHandler.getCloseNotifyTimeoutMillis(), closeNotifyFlushTimeout);
 + int handshakeTimeout = sslConfig.getInteger(SSL_HANDSHAKE_TIMEOUT);

Review comment:
```
assertSslConfig(sslConfig.getInteger(SSL_HANDSHAKE_TIMEOUT), sslHandler.getHandshakeTimeoutMillis())
```
and do:
```
assertSslConfig(expected, actual) {
  if (expected != -1) {
    assertEquals(expected, actual)
  } else {
    assertTrue(...);
  }
}
```
in 4 places here? Maybe renaming it to `assertEqualsOrMinusAsDefaultValue`?
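
The helper proposed in the review comment above might look like the following minimal sketch. It uses no test framework so that it stays self-contained. The class name `SslConfigAssertions`, the method name `assertEqualsOrDefault`, and the `name` parameter are hypothetical, and the check in the default branch (`actual >= 0`) is an assumption borrowed from the `assertTrue("default value should not be propagated", ...)` lines quoted elsewhere in this thread, since the review comment elides that condition.

```java
public class SslConfigAssertions {

    /**
     * If a value was configured explicitly, the handler must report exactly
     * that value. If the option was left at -1 (use system default), the
     * handler must report some non-negative default instead of -1.
     */
    public static void assertEqualsOrDefault(String name, int configured, long actual) {
        if (configured != -1) {
            if (configured != actual) {
                throw new AssertionError(
                    name + ": expected " + configured + " but was " + actual);
            }
        } else if (actual < 0) {
            // Assumption: a negative value means the -1 sentinel leaked through.
            throw new AssertionError(name + ": default value should not be propagated");
        }
    }

    public static void main(String[] args) {
        assertEqualsOrDefault("handshake-timeout", 5000, 5000L); // explicit value matches
        assertEqualsOrDefault("handshake-timeout", -1, 10000L);  // default replaced by a real value
        System.out.println("all checks passed");
    }
}
```

A single helper of this kind would cover all four options (session cache size, session timeout, handshake timeout, close notify flush timeout), which is the de-duplication the reviewer asks for.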
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585873#comment-16585873 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on issue #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#issuecomment-414304457

I updated the code but would do the de-duplication in `master` because of the (additional) merge conflicts I'll have there
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585836#comment-16585836 ] ASF GitHub Bot commented on FLINK-9878: --- NicoK commented on a change in pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options URL: https://github.com/apache/flink/pull/6355#discussion_r211235957 ## File path: flink-runtime/src/test/java/org/apache/flink/runtime/io/network/netty/NettyClientServerSslTest.java ## @@ -65,6 +68,60 @@ public void testValidSslConnection() throws Exception { Channel ch = NettyTestUtil.connect(serverAndClient); + SslHandler sslHandler = (SslHandler) ch.pipeline().get("ssl"); + assertTrue("default value should not be propagated", sslHandler.getHandshakeTimeoutMillis() >= 0); + assertTrue("default value should not be propagated", sslHandler.getCloseNotifyTimeoutMillis() >= 0); + + // should be able to send text data + ch.pipeline().addLast(new StringDecoder()).addLast(new StringEncoder()); + assertTrue(ch.writeAndFlush("test").await().isSuccess()); + + NettyTestUtil.shutdown(serverAndClient); + } + + /** +* Verify valid (advanced) ssl configuration and connection. +*/ + @Test + public void testValidSslConnectionAdvanced() throws Exception { Review comment: Actually, I found a way to verify that these two properties are also set - will update the test. Do you think, a benchmark should be included in this PR or rather separately (it is not really related to these changes)? Also: only in `master`? About the `taskmanager.netty.client` prefix: that sounds like a nice idea and could probably be extended to similar use cases with component-specific configurations. If you think, it's worth pursuing, can you open a JIRA ticket for this improvement? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> IO worker threads BLOCKED on SSL Session Cache while CMS full gc
>
> Key: FLINK-9878
> URL: https://issues.apache.org/jira/browse/FLINK-9878
> Project: Flink
> Issue Type: Bug
> Components: Network
> Affects Versions: 1.5.0, 1.5.1, 1.6.0
> Reporter: Nico Kruber
> Assignee: Nico Kruber
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.1, 1.7.0, 1.5.4
>
> According to https://github.com/netty/netty/issues/832, there is a JDK issue
> during garbage collection when the SSL session cache is not limited. We
> should allow the user to configure this and further (advanced) SSL parameters
> for fine-tuning to fix this and similar issues. In particular, the following
> parameters should be configurable:
> - SSL session cache size
> - SSL session timeout
> - SSL handshake timeout
> - SSL close notify flush timeout

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
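The first two parameters in the issue description correspond to settings on the JDK's `SSLSessionContext`. The following is a minimal sketch of how they could be applied, assuming a context initialised with the JVM's default key and trust managers. The class name `SessionCacheSketch` and the helper `limitServerSessions` are made up for illustration and are not Flink code; only the `javax.net.ssl` calls are real JDK API.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

public class SessionCacheSketch {

    /**
     * Applies the limits to the server-side session context.
     * Negative values mean "keep the JDK default", mirroring the -1 convention in the PR.
     * (Hypothetical helper, not part of Flink.)
     */
    public static SSLSessionContext limitServerSessions(SSLContext context, int cacheSize, int timeoutMs) {
        SSLSessionContext sessions = context.getServerSessionContext();
        if (cacheSize >= 0) {
            // In the JDK, a cache size of 0 means "no limit".
            sessions.setSessionCacheSize(cacheSize);
        }
        if (timeoutMs >= 0) {
            // The JDK expects seconds, so a millisecond setting has to be converted.
            sessions.setSessionTimeout(timeoutMs / 1000);
        }
        return sessions;
    }

    public static void main(String[] args) throws Exception {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, null, null); // JVM default key/trust managers, for illustration only

        SSLSessionContext sessions = limitServerSessions(context, 4096, 600_000);
        System.out.println(sessions.getSessionCacheSize() + " sessions, "
                + sessions.getSessionTimeout() + " s timeout");
    }
}
```

The integer division drops any sub-second remainder, so a configured value below 1000 ms becomes 0, which the JDK treats as "no timeout".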
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582277#comment-16582277 ]

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r210529443

## File path: flink-runtime/src/test/java/org/apache/flink/runtime/io/network/netty/NettyClientServerSslTest.java ##
@@ -65,6 +68,60 @@ public void testValidSslConnection() throws Exception {
         Channel ch = NettyTestUtil.connect(serverAndClient);
+        SslHandler sslHandler = (SslHandler) ch.pipeline().get("ssl");
+        assertTrue("default value should not be propagated", sslHandler.getHandshakeTimeoutMillis() >= 0);
+        assertTrue("default value should not be propagated", sslHandler.getCloseNotifyTimeoutMillis() >= 0);
+
+        // should be able to send text data
+        ch.pipeline().addLast(new StringDecoder()).addLast(new StringEncoder());
+        assertTrue(ch.writeAndFlush("test").await().isSuccess());
+
+        NettyTestUtil.shutdown(serverAndClient);
+    }
+
+    /**
+     * Verify valid (advanced) ssl configuration and connection.
+     */
+    @Test
+    public void testValidSslConnectionAdvanced() throws Exception {

Review comment:
This hasn't been addressed. Those tests differ only in the expected values and the passed `NettyConfig`.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582274#comment-16582274 ]

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r210514431

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyClient.java ##
@@ -190,7 +194,14 @@ public void initChannel(SocketChannel channel) throws Exception {
                 sslEngine.setSSLParameters(newSSLParameters);
             }
-            channel.pipeline().addLast("ssl", new SslHandler(sslEngine));
+            SslHandler sslHandler = new SslHandler(sslEngine);

Review comment:
`ctrl+c`

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582273#comment-16582273 ]

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r210531098

## File path: flink-runtime/src/test/java/org/apache/flink/runtime/io/network/netty/NettyClientServerSslTest.java ##
@@ -65,6 +68,60 @@ public void testValidSslConnection() throws Exception {
         Channel ch = NettyTestUtil.connect(serverAndClient);
+        SslHandler sslHandler = (SslHandler) ch.pipeline().get("ssl");
+        assertTrue("default value should not be propagated", sslHandler.getHandshakeTimeoutMillis() >= 0);
+        assertTrue("default value should not be propagated", sslHandler.getCloseNotifyTimeoutMillis() >= 0);
+
+        // should be able to send text data
+        ch.pipeline().addLast(new StringDecoder()).addLast(new StringEncoder());
+        assertTrue(ch.writeAndFlush("test").await().isSuccess());
+
+        NettyTestUtil.shutdown(serverAndClient);
+    }
+
+    /**
+     * Verify valid (advanced) ssl configuration and connection.
+     */
+    @Test
+    public void testValidSslConnectionAdvanced() throws Exception {

Review comment:
Yes, you are right regarding `handshake-timeout` and `close-notify-flush-timeout`, but as I wrote previously, I do not see how `SESSION_CACHE_SIZE` and `SESSION_TIMEOUT` are tested at all. Regardless of that, it would still be better to add a stress test/benchmark for this. It depends on how important this feature is... However, if it's not important, one could ask why we should even bother supporting it.

On a side note: couldn't we provide some generic way to configure Netty? For example, passing any config option prefixed with `taskmanager.netty.client` to the TaskManager's Netty client, without us having to specify and handle each option manually?

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
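The prefix idea raised in this comment could look roughly like the following. This is only a sketch of the general technique (select all options under a prefix and strip the prefix before handing them on); it uses a plain `Map` rather than Flink's `Configuration`, and the class `PrefixedOptionsSketch`, the method `withPrefix`, and the option keys under `taskmanager.netty.client` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrefixedOptionsSketch {

    /**
     * Returns every entry whose key starts with "prefix.", with that prefix removed.
     * (Hypothetical helper; not an existing Flink API.)
     */
    public static Map<String, String> withPrefix(Map<String, String> config, String prefix) {
        String dotted = prefix + ".";
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : config.entrySet()) {
            if (entry.getKey().startsWith(dotted)) {
                result.put(entry.getKey().substring(dotted.length()), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> config = new LinkedHashMap<>();
        // The keys below are invented for the example.
        config.put("taskmanager.netty.client.handshake-timeout", "10000");
        config.put("taskmanager.netty.server.handshake-timeout", "20000");
        config.put("security.ssl.enabled", "true");

        // Only the client-prefixed option survives, with the prefix stripped.
        System.out.println(withPrefix(config, "taskmanager.netty.client"));
    }
}
```

Matching on the prefix plus a trailing dot avoids picking up unrelated keys that merely share the same leading characters.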
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582272#comment-16582272 ]

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r210515882

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java ##
@@ -176,39 +176,43 @@ public static void setSSLVerifyHostname(Configuration sslConfig, SSLParameters s
     public static SSLContext createSSLClientContext(Configuration sslConfig) throws Exception {
         Preconditions.checkNotNull(sslConfig);
-        SSLContext clientSSLContext = null;
-        if (getSSLEnabled(sslConfig)) {
-            LOG.debug("Creating client SSL context from configuration");
-
-            String trustStoreFilePath = sslConfig.getString(SecurityOptions.SSL_TRUSTSTORE);
-            String trustStorePassword = sslConfig.getString(SecurityOptions.SSL_TRUSTSTORE_PASSWORD);
-            String sslProtocolVersion = sslConfig.getString(SecurityOptions.SSL_PROTOCOL);
+        if (!getSSLEnabled(sslConfig)) {
+            return null;
+        }
-            Preconditions.checkNotNull(trustStoreFilePath, SecurityOptions.SSL_TRUSTSTORE.key() + " was not configured.");
-            Preconditions.checkNotNull(trustStorePassword, SecurityOptions.SSL_TRUSTSTORE_PASSWORD.key() + " was not configured.");
+        LOG.debug("Creating client SSL context from configuration");
-            KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
+        String trustStoreFilePath = sslConfig.getString(SecurityOptions.SSL_TRUSTSTORE);
+        String trustStorePassword = sslConfig.getString(SecurityOptions.SSL_TRUSTSTORE_PASSWORD);
+        String sslProtocolVersion = sslConfig.getString(SecurityOptions.SSL_PROTOCOL);
+        int sessionCacheSize = sslConfig.getInteger(SecurityOptions.SSL_SESSION_CACHE_SIZE);
+        int sessionTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_SESSION_TIMEOUT);
+        int handshakeTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_HANDSHAKE_TIMEOUT);
+        int closeNotifyFlushTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT);
-            FileInputStream trustStoreFile = null;
-            try {
-                trustStoreFile = new FileInputStream(new File(trustStoreFilePath));
-                trustStore.load(trustStoreFile, trustStorePassword.toCharArray());
-            } finally {
-                if (trustStoreFile != null) {
-                    trustStoreFile.close();
-                }
-            }
+        Preconditions.checkNotNull(trustStoreFilePath, SecurityOptions.SSL_TRUSTSTORE.key() + " was not configured.");
+        Preconditions.checkNotNull(trustStorePassword, SecurityOptions.SSL_TRUSTSTORE_PASSWORD.key() + " was not configured.");
-            TrustManagerFactory trustManagerFactory = TrustManagerFactory.getInstance(
-                TrustManagerFactory.getDefaultAlgorithm());
-            trustManagerFactory.init(trustStore);
+        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
-            clientSSLContext = SSLContext.getInstance(sslProtocolVersion);
-            clientSSLContext.init(null, trustManagerFactory.getTrustManagers(), null);
+        try (FileInputStream trustStoreFile = new FileInputStream(new File(trustStoreFilePath))) {
+            trustStore.load(trustStoreFile, trustStorePassword.toCharArray());
         }
-        return clientSSLContext;
+        TrustManagerFactory trustManagerFactory = TrustManagerFactory.getInstance(
+            TrustManagerFactory.getDefaultAlgorithm());
+        trustManagerFactory.init(trustStore);
+
+        javax.net.ssl.SSLContext clientSSLContext = javax.net.ssl.SSLContext.getInstance(sslProtocolVersion);

Review comment:
`ctrl+c`

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582275#comment-16582275 ]

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r210515985

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java ##
@@ -225,38 +229,65 @@ public static SSLContext createSSLClientContext(Configuration sslConfig) throws
     public static SSLContext createSSLServerContext(Configuration sslConfig) throws Exception {
         Preconditions.checkNotNull(sslConfig);
-        SSLContext serverSSLContext = null;
-        if (getSSLEnabled(sslConfig)) {
-            LOG.debug("Creating server SSL context from configuration");
+        if (!getSSLEnabled(sslConfig)) {
+            return null;
+        }
-            String keystoreFilePath = sslConfig.getString(SecurityOptions.SSL_KEYSTORE);
+        LOG.debug("Creating server SSL context from configuration");
-            String keystorePassword = sslConfig.getString(SecurityOptions.SSL_KEYSTORE_PASSWORD);
+        String keystoreFilePath = sslConfig.getString(SecurityOptions.SSL_KEYSTORE);
+        String keystorePassword = sslConfig.getString(SecurityOptions.SSL_KEYSTORE_PASSWORD);
+        String certPassword = sslConfig.getString(SecurityOptions.SSL_KEY_PASSWORD);
+        String sslProtocolVersion = sslConfig.getString(SecurityOptions.SSL_PROTOCOL);
+        int sessionCacheSize = sslConfig.getInteger(SecurityOptions.SSL_SESSION_CACHE_SIZE);
+        int sessionTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_SESSION_TIMEOUT);
+        int handshakeTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_HANDSHAKE_TIMEOUT);
+        int closeNotifyFlushTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT);
-            String certPassword = sslConfig.getString(SecurityOptions.SSL_KEY_PASSWORD);
+        Preconditions.checkNotNull(keystoreFilePath, SecurityOptions.SSL_KEYSTORE.key() + " was not configured.");
+        Preconditions.checkNotNull(keystorePassword, SecurityOptions.SSL_KEYSTORE_PASSWORD.key() + " was not configured.");
+        Preconditions.checkNotNull(certPassword, SecurityOptions.SSL_KEY_PASSWORD.key() + " was not configured.");
-            String sslProtocolVersion = sslConfig.getString(SecurityOptions.SSL_PROTOCOL);
+        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
+        try (FileInputStream keyStoreFile = new FileInputStream(new File(keystoreFilePath))) {
+            ks.load(keyStoreFile, keystorePassword.toCharArray());
+        }
-            Preconditions.checkNotNull(keystoreFilePath, SecurityOptions.SSL_KEYSTORE.key() + " was not configured.");
-            Preconditions.checkNotNull(keystorePassword, SecurityOptions.SSL_KEYSTORE_PASSWORD.key() + " was not configured.");
-            Preconditions.checkNotNull(certPassword, SecurityOptions.SSL_KEY_PASSWORD.key() + " was not configured.");
+        // Set up key manager factory to use the server key store
+        KeyManagerFactory kmf = KeyManagerFactory.getInstance(
+            KeyManagerFactory.getDefaultAlgorithm());
+        kmf.init(ks, certPassword.toCharArray());
-            KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
-            try (FileInputStream keyStoreFile = new FileInputStream(new File(keystoreFilePath))) {
-                ks.load(keyStoreFile, keystorePassword.toCharArray());
-            }
+        // Initialize the SSLContext

Review comment:
`ctrl+v` as well?

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582276#comment-16582276 ]

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r210523907

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java ##
@@ -225,38 +229,65 @@ public static SSLContext createSSLClientContext(Configuration sslConfig) throws
     public static SSLContext createSSLServerContext(Configuration sslConfig) throws Exception {
         Preconditions.checkNotNull(sslConfig);
-        SSLContext serverSSLContext = null;
-        if (getSSLEnabled(sslConfig)) {
-            LOG.debug("Creating server SSL context from configuration");
+        if (!getSSLEnabled(sslConfig)) {
+            return null;
+        }
-            String keystoreFilePath = sslConfig.getString(SecurityOptions.SSL_KEYSTORE);
+        LOG.debug("Creating server SSL context from configuration");
-            String keystorePassword = sslConfig.getString(SecurityOptions.SSL_KEYSTORE_PASSWORD);
+        String keystoreFilePath = sslConfig.getString(SecurityOptions.SSL_KEYSTORE);
+        String keystorePassword = sslConfig.getString(SecurityOptions.SSL_KEYSTORE_PASSWORD);
+        String certPassword = sslConfig.getString(SecurityOptions.SSL_KEY_PASSWORD);
+        String sslProtocolVersion = sslConfig.getString(SecurityOptions.SSL_PROTOCOL);
+        int sessionCacheSize = sslConfig.getInteger(SecurityOptions.SSL_SESSION_CACHE_SIZE);
+        int sessionTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_SESSION_TIMEOUT);
+        int handshakeTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_HANDSHAKE_TIMEOUT);
+        int closeNotifyFlushTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT);
-            String certPassword = sslConfig.getString(SecurityOptions.SSL_KEY_PASSWORD);
+        Preconditions.checkNotNull(keystoreFilePath, SecurityOptions.SSL_KEYSTORE.key() + " was not configured.");
+        Preconditions.checkNotNull(keystorePassword, SecurityOptions.SSL_KEYSTORE_PASSWORD.key() + " was not configured.");
+        Preconditions.checkNotNull(certPassword, SecurityOptions.SSL_KEY_PASSWORD.key() + " was not configured.");
-            String sslProtocolVersion = sslConfig.getString(SecurityOptions.SSL_PROTOCOL);
+        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
+        try (FileInputStream keyStoreFile = new FileInputStream(new File(keystoreFilePath))) {
+            ks.load(keyStoreFile, keystorePassword.toCharArray());
+        }
-            Preconditions.checkNotNull(keystoreFilePath, SecurityOptions.SSL_KEYSTORE.key() + " was not configured.");
-            Preconditions.checkNotNull(keystorePassword, SecurityOptions.SSL_KEYSTORE_PASSWORD.key() + " was not configured.");
-            Preconditions.checkNotNull(certPassword, SecurityOptions.SSL_KEY_PASSWORD.key() + " was not configured.");
+        // Set up key manager factory to use the server key store
+        KeyManagerFactory kmf = KeyManagerFactory.getInstance(
+            KeyManagerFactory.getDefaultAlgorithm());
+        kmf.init(ks, certPassword.toCharArray());
-            KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
-            try (FileInputStream keyStoreFile = new FileInputStream(new File(keystoreFilePath))) {
-                ks.load(keyStoreFile, keystorePassword.toCharArray());
-            }
+        // Initialize the SSLContext
+        javax.net.ssl.SSLContext serverSSLContext = javax.net.ssl.SSLContext.getInstance(sslProtocolVersion);
+        serverSSLContext.init(kmf.getKeyManagers(), null, null);
+        if (sessionCacheSize >= 0) {
+            serverSSLContext.getServerSessionContext().setSessionCacheSize(sessionCacheSize);
+        }
+        if (sessionTimeoutMs >= 0) {
+            serverSSLContext.getServerSessionContext().setSessionTimeout(sessionTimeoutMs / 1000);
+        }
-            // Set up key manager factory to use the server key store
-            KeyManagerFactory kmf = KeyManagerFactory.getInstance(
-                KeyManagerFactory.getDefaultAlgorithm());
-            kmf.init(ks, certPassword.toCharArray());
+        return new
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582278#comment-16582278 ]

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r210515177

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyServer.java ##
@@ -152,10 +154,17 @@ void init(final NettyProtocol protocol, NettyBufferPool nettyBufferPool) throws
         @Override
         public void initChannel(SocketChannel channel) throws Exception {
             if (serverSSLContext != null) {
-                SSLEngine sslEngine = serverSSLContext.createSSLEngine();
+                SSLEngine sslEngine = serverSSLContext.sslContext.createSSLEngine();
                 config.setSSLVerAndCipherSuites(sslEngine);
                 sslEngine.setUseClientMode(false);
-                channel.pipeline().addLast("ssl", new SslHandler(sslEngine));
+                SslHandler sslHandler = new SslHandler(sslEngine);

Review comment:
`ctrl+v` - please deduplicate this somehow and please do this in this PR, since this is the place where you introduce/make duplication worse.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
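One way to address the duplication flagged here is a single shared helper that applies a timeout only when it was explicitly configured, so that the client and server paths call the same code. The sketch below shows that pattern in plain Java without a Netty dependency: `OptionalTimeoutSketch` and `applyIfSet` are hypothetical names, and the `LongConsumer` stands in for whatever setter the SSL handler exposes. It is not the change that was made in the PR.

```java
import java.util.function.LongConsumer;

public class OptionalTimeoutSketch {

    /**
     * Invokes the setter only for non-negative values, so that -1 ("use system default")
     * is never propagated. Returns whether the setter was called.
     * (Hypothetical helper, shown only to illustrate the de-duplication.)
     */
    public static boolean applyIfSet(long configuredMs, LongConsumer setter) {
        if (configuredMs < 0) {
            return false;
        }
        setter.accept(configuredMs);
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for a handler field that already carries a built-in default.
        long[] handshakeTimeoutMs = {10_000};

        applyIfSet(-1, value -> handshakeTimeoutMs[0] = value);    // default is kept
        applyIfSet(3_000, value -> handshakeTimeoutMs[0] = value); // explicit value wins

        System.out.println(handshakeTimeoutMs[0]);
    }
}
```

This also matches the intent of the test assertions quoted earlier in the thread ("default value should not be propagated"): a negative setting never reaches the handler.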
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578653#comment-16578653 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on issue #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#issuecomment-412597634

I pushed a rework of this PR which has a lighter footprint on the changes in SSLUtils by using a wrapper around `SSLContext` as @pnowojski suggested.

I kept all existing logic though, including the `@Nullable` fields (vs. `Optional`) for these reasons:
1) there are already conflicts when applying this to `release-1.6` and I'd like to keep the footprint small (some of the suggestions already make the diff bigger)
2) there are several `null` checks which would need refactoring
3) this seems to be out of scope of this PR, especially since no nullable field is added (any more)

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
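The wrapper approach mentioned in this comment can be pictured as a small holder that carries the JDK `SSLContext` together with the settings that cannot be stored on the context itself. The sketch below is an assumption about the general shape, not the class from the PR: only the field name `sslContext` appears in the quoted diffs (`serverSSLContext.sslContext.createSSLEngine()`), while `SslContextWrapperSketch`, `ConfiguredSslContext`, `createEngine`, and the other field names are invented here.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class SslContextWrapperSketch {

    /** Holds the JDK context plus handler-level timeouts (hypothetical shape). */
    public static final class ConfiguredSslContext {
        public final SSLContext sslContext;
        /** Handshake timeout in ms, or -1 to keep the handler's default. */
        public final int handshakeTimeoutMs;
        /** close_notify flush timeout in ms, or -1 to keep the handler's default. */
        public final int closeNotifyFlushTimeoutMs;

        public ConfiguredSslContext(SSLContext sslContext, int handshakeTimeoutMs, int closeNotifyFlushTimeoutMs) {
            this.sslContext = sslContext;
            this.handshakeTimeoutMs = handshakeTimeoutMs;
            this.closeNotifyFlushTimeoutMs = closeNotifyFlushTimeoutMs;
        }

        /** Creates an engine in client or server mode from the wrapped context. */
        public SSLEngine createEngine(boolean clientMode) {
            SSLEngine engine = sslContext.createSSLEngine();
            engine.setUseClientMode(clientMode);
            return engine;
        }
    }

    public static void main(String[] args) throws Exception {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, null, null); // JVM defaults, for illustration only

        ConfiguredSslContext wrapped = new ConfiguredSslContext(context, 10_000, -1);
        SSLEngine engine = wrapped.createEngine(false);
        System.out.println("client mode: " + engine.getUseClientMode()
                + ", handshake timeout: " + wrapped.handshakeTimeoutMs + " ms");
    }
}
```

Keeping the timeouts next to the context lets the code that builds the SSL handler read everything from one object, which is what allows the `SSLUtils` factory methods to keep their existing structure.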
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578646#comment-16578646 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on a change in pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r209690587

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyClient.java ##
@@ -175,7 +183,6 @@ ChannelFuture connect(final InetSocketAddress serverSocketAddress) {
         bootstrap.handler(new ChannelInitializer() {
             @Override
             public void initChannel(SocketChannel channel) throws Exception {
-                // SSL handler should be added first in the pipeline
                 if (clientSSLContext != null) {

Review comment:
if SSL is disabled, for example

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578643#comment-16578643 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on a change in pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r209690309

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyClient.java ##
@@ -52,6 +56,9 @@ private Bootstrap bootstrap;

Review comment:
out of scope of this PR - there's also more around this package, if you wanted to mark/change these accordingly

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578644#comment-16578644 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on a change in pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r209690309

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyClient.java ##
@@ -52,6 +56,9 @@ private Bootstrap bootstrap;

Review comment:
out of scope of this PR - there's also even more around this package, if you wanted to mark/change these accordingly

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578629#comment-16578629 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on a change in pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r209682805

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java ##
@@ -163,80 +163,188 @@ public static void setSSLVerifyHostname(Configuration sslConfig, SSLParameters s
     }
     /**
-     * Creates the SSL Context for the client if SSL is configured.
+     * Configuration settings and key/trustmanager instances to set up an SSL client connection.
+     */
+    public static class SSLClientConfiguration {

Review comment:
good idea - that makes the change even smaller...well, at least the important parts of the change ;)

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578611#comment-16578611 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on a change in pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r209682805

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java ##
@@ -163,80 +163,188 @@ public static void setSSLVerifyHostname(Configuration sslConfig, SSLParameters s
     }

     /**
-     * Creates the SSL Context for the client if SSL is configured.
+     * Configuration settings and key/trustmanager instances to set up an SSL client connection.
+     */
+    public static class SSLClientConfiguration {

Review comment:
good idea - that makes the change even smaller
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573164#comment-16573164 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on issue #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#issuecomment-411391874

Yes, that makes sense and is marked as a follow-up task: https://issues.apache.org/jira/browse/FLINK-9879
-> it probably takes some experiments to find the right parameters and their implications. Intuitively, I would agree with the session cache and timeout...
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569951#comment-16569951 ]

ASF GitHub Bot commented on FLINK-9878:
---

StephanEwen commented on issue #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#issuecomment-410649691

Does it make sense to set some sane default values here, if Java's defaults are a bit insane? For example:
- Handshake timeout could be higher. We have seen that this helps overloaded systems.
- Would it make sense to minimize session caches and timeout? We never reconnect a TCP connection trying to "fast resume" an SSL session.
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565415#comment-16565415 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on a change in pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r206905353

## File path: flink-runtime/src/test/java/org/apache/flink/runtime/io/network/netty/NettyClientServerSslTest.java ##
@@ -26,13 +26,16 @@
 import org.apache.flink.shaded.netty4.io.netty.channel.ChannelHandler;
 import org.apache.flink.shaded.netty4.io.netty.handler.codec.string.StringDecoder;
 import org.apache.flink.shaded.netty4.io.netty.handler.codec.string.StringEncoder;
+import org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler;
 import org.junit.Assert;
 import org.junit.Test;
 import java.net.InetAddress;
+import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertNotNull;

Review comment:
unused - remove!

> IO worker threads BLOCKED on SSL Session Cache while CMS full gc
>
> Key: FLINK-9878
> URL: https://issues.apache.org/jira/browse/FLINK-9878
> Project: Flink
> Issue Type: Bug
> Components: Network
> Affects Versions: 1.5.0, 1.5.1, 1.6.0
> Reporter: Nico Kruber
> Assignee: Nico Kruber
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.5.3, 1.6.0
>
> According to https://github.com/netty/netty/issues/832, there is a JDK issue during garbage collection when the SSL session cache is not limited. We should allow the user to configure this and further (advanced) SSL parameters for fine-tuning to fix this and similar issues. In particular, the following parameters should be configurable:
> - SSL session cache size
> - SSL session timeout
> - SSL handshake timeout
> - SSL close notify flush timeout

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552555#comment-16552555 ]

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204329756

--- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/io/network/netty/NettyClientServerSslTest.java ---
@@ -65,6 +68,60 @@ public void testValidSslConnection() throws Exception {
         Channel ch = NettyTestUtil.connect(serverAndClient);
+        SslHandler sslHandler = (SslHandler) ch.pipeline().get("ssl");
+        assertTrue("default value should not be propagated", sslHandler.getHandshakeTimeoutMillis() >= 0);
+        assertTrue("default value should not be propagated", sslHandler.getCloseNotifyTimeoutMillis() >= 0);
+
+        // should be able to send text data
+        ch.pipeline().addLast(new StringDecoder()).addLast(new StringEncoder());
+        assertTrue(ch.writeAndFlush("test").await().isSuccess());
+
+        NettyTestUtil.shutdown(serverAndClient);
+    }
+
+    /**
+     * Verify valid (advanced) ssl configuration and connection.
+     */
+    @Test
+    public void testValidSslConnectionAdvanced() throws Exception {
--- End diff --

please deduplicate code with `testValidSslConnection`

> IO worker threads BLOCKED on SSL Session Cache while CMS full gc
>
> Key: FLINK-9878
> URL: https://issues.apache.org/jira/browse/FLINK-9878
> Project: Flink
> Issue Type: Bug
> Components: Network
> Affects Versions: 1.5.0, 1.5.1, 1.6.0
> Reporter: Nico Kruber
> Assignee: Nico Kruber
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.5.2, 1.6.0
>
> According to https://github.com/netty/netty/issues/832, there is a JDK issue during garbage collection when the SSL session cache is not limited. We should allow the user to configure this and further (advanced) SSL parameters for fine-tuning to fix this and similar issues. In particular, the following parameters should be configurable:
> - SSL session cache size
> - SSL session timeout
> - SSL handshake timeout
> - SSL close notify flush timeout

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552554#comment-16552554 ]

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204330930

--- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/io/network/netty/NettyClientServerSslTest.java ---
@@ -65,6 +68,60 @@ public void testValidSslConnection() throws Exception {
         Channel ch = NettyTestUtil.connect(serverAndClient);
+        SslHandler sslHandler = (SslHandler) ch.pipeline().get("ssl");
+        assertTrue("default value should not be propagated", sslHandler.getHandshakeTimeoutMillis() >= 0);
+        assertTrue("default value should not be propagated", sslHandler.getCloseNotifyTimeoutMillis() >= 0);
+
+        // should be able to send text data
+        ch.pipeline().addLast(new StringDecoder()).addLast(new StringEncoder());
+        assertTrue(ch.writeAndFlush("test").await().isSuccess());
+
+        NettyTestUtil.shutdown(serverAndClient);
+    }
+
+    /**
+     * Verify valid (advanced) ssl configuration and connection.
+     */
+    @Test
+    public void testValidSslConnectionAdvanced() throws Exception {
--- End diff --

This is quite a poor test :( With respect to `SESSION_CACHE_SIZE` and `SESSION_TIMEOUT`, it only tests for "not throwing any exception". If those properties are just ignored, the test will still pass. Can we add a stress test that actually verifies the bug which this PR is trying to solve? Maybe a stress test AND a benchmark like `StreamNetworkThroughputBenchmarkTest#largeRemoteMode`?
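A cheaper complement to the stress test requested above is to read the session settings back after applying them, so that silently ignored options make the check fail. The following is only a sketch against the plain JDK API, not the test from the PR; the class and method names (`SessionSettingsReadBack`, `configureServerSessions`) are hypothetical and the context uses the JDK's default key and trust managers rather than Flink's keystore setup.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

// Hypothetical helper for illustration; not part of Flink.
final class SessionSettingsReadBack {

    /**
     * Builds a TLS context with default key/trust managers and applies the two
     * session settings. Negative values keep the JDK defaults, mirroring the
     * "-1 = use system default" convention of the new options.
     */
    static SSLContext configureServerSessions(int sessionCacheSize, int sessionTimeoutMs) throws Exception {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, null, null);
        SSLSessionContext sessions = context.getServerSessionContext();
        if (sessionCacheSize >= 0) {
            sessions.setSessionCacheSize(sessionCacheSize);
        }
        if (sessionTimeoutMs >= 0) {
            // The JDK API takes seconds, the configuration value is in milliseconds.
            sessions.setSessionTimeout(sessionTimeoutMs / 1000);
        }
        return context;
    }

    public static void main(String[] args) throws Exception {
        SSLSessionContext sessions = configureServerSessions(10, 30_000).getServerSessionContext();
        // Reading the values back is what distinguishes "applied" from "ignored".
        System.out.println("cache size: " + sessions.getSessionCacheSize());
        System.out.println("timeout (s): " + sessions.getSessionTimeout());
    }
}
```

Such a read-back only shows that the options reach the `SSLSessionContext`; it says nothing about the blocked IO threads during garbage collection, which is what the stress test or benchmark would have to cover.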
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552551#comment-16552551 ]

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204301373

--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyClient.java ---
@@ -52,6 +56,9 @@
     private Bootstrap bootstrap;
--- End diff --

`bootstrap` is nullable and not marked - change to `Optional`
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552552#comment-16552552 ]

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204329262

--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java ---
@@ -249,14 +357,73 @@ public static SSLContext createSSLServerContext(Configuration sslConfig) throws
             // Set up key manager factory to use the server key store
             KeyManagerFactory kmf = KeyManagerFactory.getInstance(
-                    KeyManagerFactory.getDefaultAlgorithm());
+                KeyManagerFactory.getDefaultAlgorithm());
             kmf.init(ks, certPassword.toCharArray());
+            return new SSLServerConfiguration(
+                sslProtocolVersion,
+                sslCipherSuites,
+                kmf,
+                sessionCacheSize,
+                sessionTimeoutMs,
+                handshakeTimeoutMs,
+                closeNotifyFlushTimeoutMs);
+        }
+
+        return null;
+    }
+
+    /**
+     * Creates the SSL Context for the server assuming SSL is configured.
+     *
+     * @param sslConfig
+     *        The application configuration
+     * @return The SSLContext object which can be used by the ssl transport server
+     * @throws Exception
+     *         Thrown if there is any misconfiguration
+     */
+    @Nullable
+    public static SSLContext createSSLServerContext(SSLServerConfiguration sslConfig) throws Exception {
+        Preconditions.checkNotNull(sslConfig);
+
+        LOG.debug("Creating server SSL context from configuration");
+        SSLContext serverSSLContext = SSLContext.getInstance(sslConfig.sslProtocolVersion);
+        serverSSLContext.init(sslConfig.keyManagerFactory.getKeyManagers(), null, null);
+        if (sslConfig.sessionCacheSize >= 0) {
+            serverSSLContext.getServerSessionContext().setSessionCacheSize(sslConfig.sessionCacheSize);
+        }
+        if (sslConfig.sessionTimeoutMs >= 0) {
+            serverSSLContext.getServerSessionContext().setSessionTimeout(sslConfig.sessionTimeoutMs / 1000);
+        }
+
+        return serverSSLContext;
+    }
+
+    /**
+     * Creates the SSL Context for the server if SSL is configured.
+     *
+     * @param sslConfig
+     *        The application configuration
+     * @return The SSLContext object which can be used by the ssl transport server
+     *         Returns null if SSL is disabled
+     * @throws Exception
+     *         Thrown if there is any misconfiguration
+     */
+    @Nullable
+    public static SSLContext createSSLServerContext(Configuration sslConfig) throws Exception {
+
+        Preconditions.checkNotNull(sslConfig);
+        SSLContext serverSSLContext = null;
+
+        if (getSSLEnabled(sslConfig)) {
--- End diff --

ditto: reverse if branch and `Optional`
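The "reverse if branch and `Optional`" suggestion above can be sketched roughly as follows. This is an illustration under assumptions, not the code that was merged: `OptionalSslContextSketch` and `createServerContext` are hypothetical names, the `sslEnabled` flag stands in for `getSSLEnabled(sslConfig)`, and the context is initialised with JDK defaults instead of Flink's keystore.

```java
import java.util.Optional;
import javax.net.ssl.SSLContext;

// Hypothetical sketch of the reviewer's suggestion; not Flink code.
final class OptionalSslContextSketch {

    /**
     * Returns an empty Optional when SSL is disabled instead of a @Nullable
     * reference, and handles the disabled case first so that the main path
     * is not nested inside an if-block.
     */
    static Optional<SSLContext> createServerContext(boolean sslEnabled, String protocol) throws Exception {
        if (!sslEnabled) {
            return Optional.empty();
        }
        SSLContext context = SSLContext.getInstance(protocol);
        context.init(null, null, null); // JDK default key managers, trust managers, randomness
        return Optional.of(context);
    }

    public static void main(String[] args) throws Exception {
        // Callers have to unwrap explicitly, so the "SSL disabled" case cannot be forgotten.
        createServerContext(true, "TLS")
            .ifPresent(ctx -> System.out.println("SSL enabled, protocol " + ctx.getProtocol()));
        System.out.println("disabled -> present? " + createServerContext(false, "TLS").isPresent());
    }
}
```

The trade-off is the one raised in the review: `@Nullable` documents intent but is not enforced by the compiler, while `Optional` makes every caller deal with the absent case at the call site.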
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552546#comment-16552546 ]

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204324645

--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyClient.java ---
@@ -175,7 +183,6 @@ ChannelFuture connect(final InetSocketAddress serverSocketAddress) {
         bootstrap.handler(new ChannelInitializer() {
             @Override
             public void initChannel(SocketChannel channel) throws Exception {
-
                 // SSL handler should be added first in the pipeline
                 if (clientSSLContext != null) {
--- End diff --

`checkState(!clientSSLContext.isEmpty())`? How can it ever be null?
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552548#comment-16552548 ]

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204298813

--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyConfig.java ---
@@ -189,23 +192,34 @@ public TransportType getTransportType() {
         }
     }

-    public SSLContext createClientSSLContext() throws Exception {
+    @Nullable
--- End diff --

`Optional` and ditto in other places. `@Nullable` is almost worthless without enforcing compile errors.
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552557#comment-16552557 ]

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204336114

--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java ---
@@ -163,80 +163,188 @@ public static void setSSLVerifyHostname(Configuration sslConfig, SSLParameters s
     }

     /**
-     * Creates the SSL Context for the client if SSL is configured.
+     * Configuration settings and key/trustmanager instances to set up an SSL client connection.
+     */
+    public static class SSLClientConfiguration {
--- End diff --

What's the value of introducing `SSLClientConfiguration`? As far as I can tell, the only point is to provide accessors to `handshakeTimeoutMs` and `closeNotifyFlushTimeoutMs` in `NettyClient#connect`, but it complicates initialisation by introducing one more obligatory step. Wouldn't it be better to wrap `SSLContext` with our own class that provides those accessors? It seems like this would also remove the need for separate `SSLClientConfiguration` and `SSLServerConfiguration`, since all of their fields except for `handshakeTimeoutMs` and `closeNotifyFlushTimeoutMs` are/should be private.
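The wrapper proposed in the comment above could look roughly like the sketch below. It is only an illustration of the idea and assumes nothing about what was eventually merged: `SslContextWithTimeouts` and its methods are hypothetical names, and the example relies solely on JDK classes, so the Netty `SslHandler` wiring from `NettyClient#connect` is left out.

```java
import java.util.Objects;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Hypothetical wrapper for illustration; not part of Flink.
final class SslContextWithTimeouts {

    private final SSLContext sslContext;
    private final int handshakeTimeoutMs;
    private final int closeNotifyFlushTimeoutMs;

    SslContextWithTimeouts(SSLContext sslContext, int handshakeTimeoutMs, int closeNotifyFlushTimeoutMs) {
        this.sslContext = Objects.requireNonNull(sslContext);
        this.handshakeTimeoutMs = handshakeTimeoutMs;
        this.closeNotifyFlushTimeoutMs = closeNotifyFlushTimeoutMs;
    }

    /** Creates an engine from the wrapped context; the context itself stays private. */
    SSLEngine createEngine(boolean clientMode) {
        SSLEngine engine = sslContext.createSSLEngine();
        engine.setUseClientMode(clientMode);
        return engine;
    }

    /** Negative means "use the transport library's default". */
    int getHandshakeTimeoutMs() {
        return handshakeTimeoutMs;
    }

    /** Negative means "use the transport library's default". */
    int getCloseNotifyFlushTimeoutMs() {
        return closeNotifyFlushTimeoutMs;
    }

    public static void main(String[] args) throws Exception {
        SslContextWithTimeouts wrapped = new SslContextWithTimeouts(SSLContext.getDefault(), 10_000, -1);
        System.out.println("client mode: " + wrapped.createEngine(true).getUseClientMode());
        System.out.println("handshake timeout (ms): " + wrapped.getHandshakeTimeoutMs());
    }
}
```

With such a class, client and server code would share one type, and only the two timeouts that the channel initializer needs are exposed; protocol version, key or trust managers and session settings stay inside the context.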
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552556#comment-16552556 ] ASF GitHub Bot commented on FLINK-9878: --- Github user pnowojski commented on a diff in the pull request: https://github.com/apache/flink/pull/6355#discussion_r204328091 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java --- @@ -163,80 +163,188 @@ public static void setSSLVerifyHostname(Configuration sslConfig, SSLParameters s } /** -* Creates the SSL Context for the client if SSL is configured. +* Configuration settings and key/trustmanager instances to set up an SSL client connection. +*/ + public static class SSLClientConfiguration { + public final String sslProtocolVersion; + public final TrustManagerFactory trustManagerFactory; + public final int sessionCacheSize; + public final int sessionTimeoutMs; + public final int handshakeTimeoutMs; + public final int closeNotifyFlushTimeoutMs; + + public SSLClientConfiguration( + String sslProtocolVersion, + TrustManagerFactory trustManagerFactory, + int sessionCacheSize, + int sessionTimeoutMs, + int handshakeTimeoutMs, + int closeNotifyFlushTimeoutMs) { + this.sslProtocolVersion = sslProtocolVersion; + this.trustManagerFactory = trustManagerFactory; + this.sessionCacheSize = sessionCacheSize; + this.sessionTimeoutMs = sessionTimeoutMs; + this.handshakeTimeoutMs = handshakeTimeoutMs; + this.closeNotifyFlushTimeoutMs = closeNotifyFlushTimeoutMs; + } + } + + /** +* Creates necessary helper objects to use for creating an SSL Context for the client if SSL is +* configured. * * @param sslConfig *The application configuration -* @return The SSLContext object which can be used by the ssl transport client -* Returns null if SSL is disabled +* @return The SSLClientConfiguration object which can be used for creating some SSL context object; +* returns null if SSL is disabled. 
* @throws Exception * Thrown if there is any misconfiguration */ @Nullable - public static SSLContext createSSLClientContext(Configuration sslConfig) throws Exception { - + public static SSLClientConfiguration createSSLClientConfiguration(Configuration sslConfig) throws Exception { Preconditions.checkNotNull(sslConfig); - SSLContext clientSSLContext = null; if (getSSLEnabled(sslConfig)) { - LOG.debug("Creating client SSL context from configuration"); + LOG.debug("Creating client SSL configuration"); String trustStoreFilePath = sslConfig.getString(SecurityOptions.SSL_TRUSTSTORE); String trustStorePassword = sslConfig.getString(SecurityOptions.SSL_TRUSTSTORE_PASSWORD); String sslProtocolVersion = sslConfig.getString(SecurityOptions.SSL_PROTOCOL); + int sessionCacheSize = sslConfig.getInteger(SecurityOptions.SSL_SESSION_CACHE_SIZE); + int sessionTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_SESSION_TIMEOUT); + int handshakeTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_HANDSHAKE_TIMEOUT); + int closeNotifyFlushTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT); Preconditions.checkNotNull(trustStoreFilePath, SecurityOptions.SSL_TRUSTSTORE.key() + " was not configured."); Preconditions.checkNotNull(trustStorePassword, SecurityOptions.SSL_TRUSTSTORE_PASSWORD.key() + " was not configured."); KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType()); - FileInputStream trustStoreFile = null; - try { - trustStoreFile = new FileInputStream(new File(trustStoreFilePath)); + try (FileInputStream trustStoreFile = new FileInputStream(new File(trustStoreFilePath))) { trustStore.load(trustStoreFile, trustStorePassword.toCharArray()); - } finally { - if (trustStoreFile != null) { -
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552549#comment-16552549 ]

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204325132

--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyServer.java ---
@@ -61,6 +63,9 @@
     private ChannelFuture bindFuture;

+    @Nullable
--- End diff --

Please deduplicate this code with `NettyClient`. Introduce `NettyBase`, `NettyInitializer` or something like that.
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552547#comment-16552547 ]

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204300332

--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyClient.java ---
@@ -52,6 +56,9 @@
     private Bootstrap bootstrap;

+    @Nullable
--- End diff --

Same argument as elsewhere: `Optional`. You mark `clientSSLConfig` as nullable and never check it for null.
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552553#comment-16552553 ]

ASF GitHub Bot commented on FLINK-9878:
---------------------------------------

Github user pnowojski commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6355#discussion_r204328596

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java ---
    @@ -163,80 +163,188 @@ public static void setSSLVerifyHostname(Configuration sslConfig, SSLParameters s
         }

         /**
    -     * Creates the SSL Context for the client if SSL is configured.
    +     * Configuration settings and key/trustmanager instances to set up an SSL client connection.
    +     */
    +    public static class SSLClientConfiguration {
    +        public final String sslProtocolVersion;
    +        public final TrustManagerFactory trustManagerFactory;
    +        public final int sessionCacheSize;
    +        public final int sessionTimeoutMs;
    +        public final int handshakeTimeoutMs;
    +        public final int closeNotifyFlushTimeoutMs;
    +
    +        public SSLClientConfiguration(
    +                String sslProtocolVersion,
    +                TrustManagerFactory trustManagerFactory,
    +                int sessionCacheSize,
    +                int sessionTimeoutMs,
    +                int handshakeTimeoutMs,
    +                int closeNotifyFlushTimeoutMs) {
    +            this.sslProtocolVersion = sslProtocolVersion;
    +            this.trustManagerFactory = trustManagerFactory;
    +            this.sessionCacheSize = sessionCacheSize;
    +            this.sessionTimeoutMs = sessionTimeoutMs;
    +            this.handshakeTimeoutMs = handshakeTimeoutMs;
    +            this.closeNotifyFlushTimeoutMs = closeNotifyFlushTimeoutMs;
    +        }
    +    }
    +
    +    /**
    +     * Creates necessary helper objects to use for creating an SSL Context for the client if SSL is
    +     * configured.
          *
          * @param sslConfig
          *        The application configuration
    -     * @return The SSLContext object which can be used by the ssl transport client
    -     *         Returns null if SSL is disabled
    +     * @return The SSLClientConfiguration object which can be used for creating some SSL context object;
    +     *         returns null if SSL is disabled.
          * @throws Exception
          *         Thrown if there is any misconfiguration
          */
         @Nullable
    -    public static SSLContext createSSLClientContext(Configuration sslConfig) throws Exception {
    -
    +    public static SSLClientConfiguration createSSLClientConfiguration(Configuration sslConfig) throws Exception {
             Preconditions.checkNotNull(sslConfig);
    -        SSLContext clientSSLContext = null;

             if (getSSLEnabled(sslConfig)) {
    -            LOG.debug("Creating client SSL context from configuration");
    +            LOG.debug("Creating client SSL configuration");

                 String trustStoreFilePath = sslConfig.getString(SecurityOptions.SSL_TRUSTSTORE);
                 String trustStorePassword = sslConfig.getString(SecurityOptions.SSL_TRUSTSTORE_PASSWORD);
                 String sslProtocolVersion = sslConfig.getString(SecurityOptions.SSL_PROTOCOL);
    +            int sessionCacheSize = sslConfig.getInteger(SecurityOptions.SSL_SESSION_CACHE_SIZE);
    +            int sessionTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_SESSION_TIMEOUT);
    +            int handshakeTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_HANDSHAKE_TIMEOUT);
    +            int closeNotifyFlushTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT);

                 Preconditions.checkNotNull(trustStoreFilePath, SecurityOptions.SSL_TRUSTSTORE.key() + " was not configured.");
                 Preconditions.checkNotNull(trustStorePassword, SecurityOptions.SSL_TRUSTSTORE_PASSWORD.key() + " was not configured.");

                 KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());

    -            FileInputStream trustStoreFile = null;
    -            try {
    -                trustStoreFile = new FileInputStream(new File(trustStoreFilePath));
    +            try (FileInputStream trustStoreFile = new FileInputStream(new File(trustStoreFilePath))) {
                     trustStore.load(trustStoreFile, trustStorePassword.toCharArray());
    -            } finally {
    -                if (trustStoreFile != null) {
    -
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552550#comment-16552550 ]

ASF GitHub Bot commented on FLINK-9878:
---------------------------------------

Github user pnowojski commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6355#discussion_r204326191

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java ---
    @@ -163,80 +163,188 @@ public static void setSSLVerifyHostname(Configuration sslConfig, SSLParameters s
         }

         /**
    -     * Creates the SSL Context for the client if SSL is configured.
    +     * Configuration settings and key/trustmanager instances to set up an SSL client connection.
    +     */
    +    public static class SSLClientConfiguration {
    +        public final String sslProtocolVersion;
    +        public final TrustManagerFactory trustManagerFactory;
    +        public final int sessionCacheSize;
    +        public final int sessionTimeoutMs;
    +        public final int handshakeTimeoutMs;
    +        public final int closeNotifyFlushTimeoutMs;
    +
    +        public SSLClientConfiguration(
    +                String sslProtocolVersion,
    +                TrustManagerFactory trustManagerFactory,
    +                int sessionCacheSize,
    +                int sessionTimeoutMs,
    +                int handshakeTimeoutMs,
    +                int closeNotifyFlushTimeoutMs) {
    +            this.sslProtocolVersion = sslProtocolVersion;
    +            this.trustManagerFactory = trustManagerFactory;
    +            this.sessionCacheSize = sessionCacheSize;
    +            this.sessionTimeoutMs = sessionTimeoutMs;
    +            this.handshakeTimeoutMs = handshakeTimeoutMs;
    +            this.closeNotifyFlushTimeoutMs = closeNotifyFlushTimeoutMs;
    +        }
    +    }
    +
    +    /**
    +     * Creates necessary helper objects to use for creating an SSL Context for the client if SSL is
    +     * configured.
          *
          * @param sslConfig
          *        The application configuration
    -     * @return The SSLContext object which can be used by the ssl transport client
    -     *         Returns null if SSL is disabled
    +     * @return The SSLClientConfiguration object which can be used for creating some SSL context object;
    +     *         returns null if SSL is disabled.
          * @throws Exception
          *         Thrown if there is any misconfiguration
          */
         @Nullable
    -    public static SSLContext createSSLClientContext(Configuration sslConfig) throws Exception {
    -
    +    public static SSLClientConfiguration createSSLClientConfiguration(Configuration sslConfig) throws Exception {
             Preconditions.checkNotNull(sslConfig);
    -        SSLContext clientSSLContext = null;

             if (getSSLEnabled(sslConfig)) {
    --- End diff --

    reverse if/else conditions and `Optional`
    ```
    if (!getSSLEnabled(...)) {
      return Optional.empty();
    }
    ```

> IO worker threads BLOCKED on SSL Session Cache while CMS full gc
> ----------------------------------------------------------------
>
>                 Key: FLINK-9878
>                 URL: https://issues.apache.org/jira/browse/FLINK-9878
>             Project: Flink
>          Issue Type: Bug
>          Components: Network
>    Affects Versions: 1.5.0, 1.5.1, 1.6.0
>            Reporter: Nico Kruber
>            Assignee: Nico Kruber
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.5.2, 1.6.0
>
>
> According to https://github.com/netty/netty/issues/832, there is a JDK issue
> during garbage collection when the SSL session cache is not limited. We
> should allow the user to configure this and further (advanced) SSL parameters
> for fine-tuning to fix this and similar issues. In particular, the following
> parameters should be configurable:
> - SSL session cache size
> - SSL session timeout
> - SSL handshake timeout
> - SSL close notify flush timeout

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
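The review comment above asks for the enabled-check to be inverted so that the method returns early, and for an `Optional` return type instead of a `@Nullable` one. A minimal, self-contained sketch of that shape follows. It is not the code from the pull request: `ClientSslSettings`, `sslEnabled` and the plain `Map` standing in for Flink's `Configuration` are hypothetical names used only for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

public class OptionalEarlyReturnSketch {

    /** Hypothetical stand-in for SSLClientConfiguration, reduced to a single field. */
    public static final class ClientSslSettings {
        public final String protocol;

        public ClientSslSettings(String protocol) {
            this.protocol = protocol;
        }
    }

    /** Hypothetical replacement for getSSLEnabled(...), reading from a plain map. */
    public static boolean sslEnabled(Map<String, String> config) {
        return Boolean.parseBoolean(config.getOrDefault("security.ssl.enabled", "false"));
    }

    /** Returns an empty Optional when SSL is disabled, so callers never see null. */
    public static Optional<ClientSslSettings> createClientSettings(Map<String, String> config) {
        if (!sslEnabled(config)) {
            // Early return keeps the main path free of one nesting level.
            return Optional.empty();
        }
        String protocol = config.getOrDefault("security.ssl.protocol", "TLSv1.2");
        return Optional.of(new ClientSslSettings(protocol));
    }

    public static void main(String[] args) {
        Map<String, String> disabled = Collections.emptyMap();
        Map<String, String> enabled = Collections.singletonMap("security.ssl.enabled", "true");

        System.out.println(createClientSettings(disabled).isPresent());
        System.out.println(createClientSettings(enabled).map(s -> s.protocol).orElse("(none)"));
    }
}
```

With this shape the caller decides explicitly what to do when SSL is off, for example via `ifPresent(...)`, rather than relying on a null check.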
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549049#comment-16549049 ]

ASF GitHub Bot commented on FLINK-9878:
---------------------------------------

Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6355#discussion_r203657904

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyClient.java ---
    @@ -52,6 +55,7 @@

         private Bootstrap bootstrap;

    +    private SSLUtils.SSLClientConfiguration clientSSLConfig;
    --- End diff --

    add `@Nullable`
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549050#comment-16549050 ]

ASF GitHub Bot commented on FLINK-9878:
---------------------------------------

Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6355#discussion_r203658272

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyServer.java ---
    @@ -61,6 +62,7 @@

         private ChannelFuture bindFuture;

    +    private SSLUtils.SSLServerConfiguration serverSSLConfig;
    --- End diff --

    add `@Nullable`
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549030#comment-16549030 ]

ASF GitHub Bot commented on FLINK-9878:
---------------------------------------

Github user NicoK commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6355#discussion_r203652882

    --- Diff: docs/ops/security-ssl.md ---
    @@ -33,6 +33,9 @@ SSL can be enabled for all network communication between Flink components. SSL k
     * **akka.ssl.enabled**: SSL flag for akka based control connection between the Flink client, jobmanager and taskmanager
     * **jobmanager.web.ssl.enabled**: Flag to enable https access to the jobmanager's web frontend
    +Please see the configuration page about the
    +[complete list of SSL configuration parameters]({{site.baseurl}}/ops/config.html#ssl-settings), in particular **security.ssl.session-cache-size**.
    --- End diff --

    agreed, that would make sense
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549023#comment-16549023 ]

ASF GitHub Bot commented on FLINK-9878:
---------------------------------------

Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6355#discussion_r203652194

    --- Diff: docs/ops/security-ssl.md ---
    @@ -33,6 +33,9 @@ SSL can be enabled for all network communication between Flink components. SSL k
     * **akka.ssl.enabled**: SSL flag for akka based control connection between the Flink client, jobmanager and taskmanager
     * **jobmanager.web.ssl.enabled**: Flag to enable https access to the jobmanager's web frontend
    +Please see the configuration page about the
    +[complete list of SSL configuration parameters]({{site.baseurl}}/ops/config.html#ssl-settings), in particular **security.ssl.session-cache-size**.
    --- End diff --

    just a suggestion, you could also embed the entire table directly, see `Configuration.md` on how to do it.
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548875#comment-16548875 ]

ASF GitHub Bot commented on FLINK-9878:
---------------------------------------

Github user NicoK commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6355#discussion_r203617345

    --- Diff: flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java ---
    @@ -160,4 +160,41 @@
             key("security.ssl.verify-hostname")
                 .defaultValue(true)
                 .withDescription("Flag to enable peer’s hostname verification during ssl handshake.");
    +
    +    /**
    +     * SSL session cache size.
    +     */
    +    public static final ConfigOption<Integer> SSL_SESSION_CACHE_SIZE =
    +        key("security.ssl.session-cache-size")
    +            .defaultValue(-1)
    +            .withDescription("The size of the cache used for storing SSL session objects. "
    +                + "According to https://github.com/netty/netty/issues/832, you should always set "
    +                + "this to an appropriate number to not run into a bug with stalling IO threads "
    +                + "during garbage collection. (-1 = use system default).");
    +
    +    /**
    +     * SSL session timeout.
    +     */
    +    public static final ConfigOption<Integer> SSL_SESSION_TIMEOUT =
    +        key("security.ssl.session-timeout")
    +            .defaultValue(-1)
    +            .withDescription("The timeout (in ms) for the cached SSL session objects. (-1 = use system default)");
    +
    +    /**
    +     * SSL session timeout during handshakes.
    +     */
    +    public static final ConfigOption<Integer> SSL_HANDSHAKE_TIMEOUT =
    +        key("security.ssl.handshake-timeout")
    +            .defaultValue(-1)
    +            .withDescription("The timeout (in ms) during SSL handshake. (-1 = use system default)");
    +
    +    /**
    +     * SSL session timeout after flushing the `close_notify` message.
    +     */
    +    public static final ConfigOption<Integer> SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT =
    +        key("security.ssl.close-notify-flush-timeout")
    +            .defaultValue(-1)
    +            .withDescription("The timeout (in ms) for flushing the `close_notify` that was triggered by closing a " +
    --- End diff --

    unfortunately yes

    FYI: I found the difference:
    `The timeout (in ms) for flushing the close_notify that was triggered by closing a channel. If the close_notify was not flushed in the given timeout the channel will be closed forcibly. (-1 = use system default)`
    vs.
    `The timeout (in ms) for flushing the close_notify that was triggered by closing a channel. If the close_notify was not flushed in the given timeout the channel will be closed forcibly. (-1 = use system default)`
    -> seems like a double-space is made a single space at some point...fixing...
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548014#comment-16548014 ]

ASF GitHub Bot commented on FLINK-9878:
---------------------------------------

Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6355#discussion_r203437995

    --- Diff: flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java ---
    @@ -160,4 +160,41 @@
             key("security.ssl.verify-hostname")
                 .defaultValue(true)
                 .withDescription("Flag to enable peer’s hostname verification during ssl handshake.");
    +
    +    /**
    +     * SSL session cache size.
    +     */
    +    public static final ConfigOption<Integer> SSL_SESSION_CACHE_SIZE =
    +        key("security.ssl.session-cache-size")
    +            .defaultValue(-1)
    +            .withDescription("The size of the cache used for storing SSL session objects. "
    +                + "According to https://github.com/netty/netty/issues/832, you should always set "
    +                + "this to an appropriate number to not run into a bug with stalling IO threads "
    +                + "during garbage collection. (-1 = use system default).");
    +
    +    /**
    +     * SSL session timeout.
    +     */
    +    public static final ConfigOption<Integer> SSL_SESSION_TIMEOUT =
    +        key("security.ssl.session-timeout")
    +            .defaultValue(-1)
    +            .withDescription("The timeout (in ms) for the cached SSL session objects. (-1 = use system default)");
    +
    +    /**
    +     * SSL session timeout during handshakes.
    +     */
    +    public static final ConfigOption<Integer> SSL_HANDSHAKE_TIMEOUT =
    +        key("security.ssl.handshake-timeout")
    +            .defaultValue(-1)
    +            .withDescription("The timeout (in ms) during SSL handshake. (-1 = use system default)");
    +
    +    /**
    +     * SSL session timeout after flushing the `close_notify` message.
    +     */
    +    public static final ConfigOption<Integer> SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT =
    +        key("security.ssl.close-notify-flush-timeout")
    +            .defaultValue(-1)
    +            .withDescription("The timeout (in ms) for flushing the `close_notify` that was triggered by closing a " +
    --- End diff --

    it's not showing up as a code block since that only works for markdown; the description so far was plain-text.
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547909#comment-16547909 ]

ASF GitHub Bot commented on FLINK-9878:
---------------------------------------

Github user NicoK commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6355#discussion_r203405530

    --- Diff: flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java ---
    @@ -160,4 +160,41 @@
             key("security.ssl.verify-hostname")
                 .defaultValue(true)
                 .withDescription("Flag to enable peer’s hostname verification during ssl handshake.");
    +
    +    /**
    +     * SSL session cache size.
    +     */
    +    public static final ConfigOption<Integer> SSL_SESSION_CACHE_SIZE =
    +        key("security.ssl.session-cache-size")
    +            .defaultValue(-1)
    +            .withDescription("The size of the cache used for storing SSL session objects. "
    +                + "According to https://github.com/netty/netty/issues/832, you should always set "
    +                + "this to an appropriate number to not run into a bug with stalling IO threads "
    +                + "during garbage collection. (-1 = use system default).");
    +
    +    /**
    +     * SSL session timeout.
    +     */
    +    public static final ConfigOption<Integer> SSL_SESSION_TIMEOUT =
    +        key("security.ssl.session-timeout")
    +            .defaultValue(-1)
    +            .withDescription("The timeout (in ms) for the cached SSL session objects. (-1 = use system default)");
    +
    +    /**
    +     * SSL session timeout during handshakes.
    +     */
    +    public static final ConfigOption<Integer> SSL_HANDSHAKE_TIMEOUT =
    +        key("security.ssl.handshake-timeout")
    +            .defaultValue(-1)
    +            .withDescription("The timeout (in ms) during SSL handshake. (-1 = use system default)");
    +
    +    /**
    +     * SSL session timeout after flushing the `close_notify` message.
    +     */
    +    public static final ConfigOption<Integer> SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT =
    +        key("security.ssl.close-notify-flush-timeout")
    +            .defaultValue(-1)
    +            .withDescription("The timeout (in ms) for flushing the `close_notify` that was triggered by closing a " +
    --- End diff --

    could try - strangely though, this is working for e.g. `security.kerberos.login.contexts` although the desired effect (marking it as code) is not there...but that's a different problem.
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547659#comment-16547659 ]

ASF GitHub Bot commented on FLINK-9878:
---------------------------------------

Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6355#discussion_r203326103

    --- Diff: flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java ---
    @@ -160,4 +160,41 @@
             key("security.ssl.verify-hostname")
                 .defaultValue(true)
                 .withDescription("Flag to enable peer’s hostname verification during ssl handshake.");
    +
    +    /**
    +     * SSL session cache size.
    +     */
    +    public static final ConfigOption<Integer> SSL_SESSION_CACHE_SIZE =
    +        key("security.ssl.session-cache-size")
    +            .defaultValue(-1)
    +            .withDescription("The size of the cache used for storing SSL session objects. "
    +                + "According to https://github.com/netty/netty/issues/832, you should always set "
    +                + "this to an appropriate number to not run into a bug with stalling IO threads "
    +                + "during garbage collection. (-1 = use system default).");
    +
    +    /**
    +     * SSL session timeout.
    +     */
    +    public static final ConfigOption<Integer> SSL_SESSION_TIMEOUT =
    +        key("security.ssl.session-timeout")
    +            .defaultValue(-1)
    +            .withDescription("The timeout (in ms) for the cached SSL session objects. (-1 = use system default)");
    +
    +    /**
    +     * SSL session timeout during handshakes.
    +     */
    +    public static final ConfigOption<Integer> SSL_HANDSHAKE_TIMEOUT =
    +        key("security.ssl.handshake-timeout")
    +            .defaultValue(-1)
    +            .withDescription("The timeout (in ms) during SSL handshake. (-1 = use system default)");
    +
    +    /**
    +     * SSL session timeout after flushing the `close_notify` message.
    +     */
    +    public static final ConfigOption<Integer> SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT =
    +        key("security.ssl.close-notify-flush-timeout")
    +            .defaultValue(-1)
    +            .withDescription("The timeout (in ms) for flushing the `close_notify` that was triggered by closing a " +
    --- End diff --

    could you try removing the ` signs? let's see if that trips up the test.
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547419#comment-16547419 ]

ASF GitHub Bot commented on FLINK-9878:
---------------------------------------

Github user NicoK commented on the issue:

    https://github.com/apache/flink/pull/6355

    about the travis error: I tried regenerating the configuration page from the sources but it does not change at all and the "documentation outdated" remains :(
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547160#comment-16547160 ]

ASF GitHub Bot commented on FLINK-9878:
---------------------------------------

GitHub user NicoK opened a pull request:

    https://github.com/apache/flink/pull/6355

    [FLINK-9878][network][ssl] add more low-level ssl options

    ## What is the purpose of the change

    This is mostly to tackle bugs like https://github.com/netty/netty/issues/832 (JDK issue during garbage collection when the SSL session cache is not limited).

    We add the following low-level configuration options for the user to fine-tune their system:

    - SSL session cache size
    - SSL session timeout
    - SSL handshake timeout
    - SSL close notify flush timeout

    This is the PR for the `release-1.5` branch only - I'll create a separate one for `master` due to the changes of #6326.

    ## Brief change log

    - add `security.ssl.session-cache-size` and `security.ssl.session-timeout` configuration parameters
      -> configure these for `SSLContext`s created by `SSLUtils`
    - add `security.ssl.handshake-timeout` and `security.ssl.close-notify-flush-timeout`
      -> configure these in the TM-communication channels via `NettyClient` and `NettyServer`
    - refactor `SSLUtils` so that we extract these configurations separately

    ## Verifying this change

    This change added tests and can be verified as follows:

    - added configuration-verification test to `NettyClientServerSslTest`

    ## Does this pull request potentially affect one of the following parts:

    - Dependencies (does it add or upgrade a dependency): **no**
    - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no**
    - The serializers: **no**
    - The runtime per-record code paths (performance sensitive): **no**
    - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
    - The S3 file system connector: **no**

    ## Documentation

    - Does this pull request introduce a new feature? **yes** (kind-of)
    - If yes, how is the feature documented? **docs + JavaDocs**

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NicoK/flink flink-9878

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6355.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6355

commit 9a19f64130837cba40c8f9b708aa98c002ae1a63
Author: Nico Kruber
Date:   2018-07-17T21:40:11Z

    [FLINK-9878][network][ssl] add more low-level ssl options

    This is mostly to tackle bugs like
    https://github.com/netty/netty/issues/832 (JDK issue during garbage
    collection when the SSL session cache is not limited).

    We add the following low-level configuration options for the user to
    fine-tune their system:

    - SSL session cache size
    - SSL session timeout
    - SSL handshake timeout
    - SSL close notify flush timeout
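The change log above says the session cache size and session timeout are applied to the `SSLContext`. As a rough illustration of what that can look like with plain JDK APIs, here is a minimal sketch. It is not the code from the pull request: the class and method names are hypothetical, it assumes the "-1 = use system default" convention from the option descriptions, and it assumes the millisecond timeout has to be converted because `SSLSessionContext.setSessionTimeout` takes seconds.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

public class SslSessionSettingsSketch {

    /** Hypothetical helper: a TLS context using the JDK default key and trust managers. */
    public static SSLContext newClientContext() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, null, null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not create SSL context", e);
        }
    }

    /**
     * Applies the two session settings to the client session context.
     * Negative values are skipped so that the JDK defaults stay in place.
     */
    public static void applySessionSettings(SSLContext context, int sessionCacheSize, int sessionTimeoutMs) {
        SSLSessionContext sessions = context.getClientSessionContext();
        if (sessionCacheSize >= 0) {
            // In the JDK, a size of 0 means "no limit".
            sessions.setSessionCacheSize(sessionCacheSize);
        }
        if (sessionTimeoutMs >= 0) {
            // The JDK expects seconds; round up so a small positive value does not become 0 ("no limit").
            sessions.setSessionTimeout((int) Math.ceil(sessionTimeoutMs / 1000.0));
        }
    }

    public static void main(String[] args) {
        SSLContext context = newClientContext();
        applySessionSettings(context, 10_000, 600_000);

        SSLSessionContext sessions = context.getClientSessionContext();
        System.out.println("cache size: " + sessions.getSessionCacheSize());
        System.out.println("timeout (s): " + sessions.getSessionTimeout());
    }
}
```

A server-side variant would use `getServerSessionContext()` in the same way. The handshake and close-notify timeouts are not part of `SSLContext`; according to the change log they are configured on the Netty channels, which this sketch leaves out.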