[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660287#comment-16660287 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK closed pull request #6838: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6838
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/docs/_includes/generated/security_configuration.html b/docs/_includes/generated/security_configuration.html
index 680c1c02434..8999336926f 100644
--- a/docs/_includes/generated/security_configuration.html
+++ b/docs/_includes/generated/security_configuration.html
@@ -12,11 +12,21 @@
 "TLS_RSA_WITH_AES_128_CBC_SHA"
 The comma separated list of standard SSL algorithms to be supported. Read more <a href="http://docs.oracle.com/javase/8/docs/technotes/guides/security/StandardNames.html#ciphersuites">here</a>
 
+
+security.ssl.internal.close-notify-flush-timeout
+-1
+The timeout (in ms) for flushing the `close_notify` that was triggered by closing a channel. If the `close_notify` was not flushed in the given timeout the channel will be closed forcibly. (-1 = use system default)
+
 
 security.ssl.internal.enabled
 false
 Turns on SSL for internal network communication. Optionally, specific components may override this through their own settings (rpc, data transport, REST, etc).
 
+
+security.ssl.internal.handshake-timeout
+-1
+The timeout (in ms) during SSL handshake. (-1 = use system default)
+
 
 security.ssl.internal.key-password
 (none)
@@ -32,6 +42,16 @@
 (none)
 The secret to decrypt the keystore file for Flink's internal endpoints (rpc, data transport, blob server).
 
+
+security.ssl.internal.session-cache-size
+-1
+The size of the cache used for storing SSL session objects. According to https://github.com/netty/netty/issues/832, you should always set this to an appropriate number to not run into a bug with stalling IO threads during garbage collection. (-1 = use system default).
+
+
+security.ssl.internal.session-timeout
+-1
+The timeout (in ms) for the cached SSL session objects. (-1 = use system default)
+
 
 security.ssl.internal.truststore
 (none)
diff --git a/docs/ops/security-ssl.md b/docs/ops/security-ssl.md
index 6ea686203ee..4e3716218d2 100644
--- a/docs/ops/security-ssl.md
+++ b/docs/ops/security-ssl.md
@@ -22,6 +22,9 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+* ToC
+{:toc}
+
 This page provides instructions on how to enable TLS/SSL authentication and encryption for network communication with and between Flink processes.
 
 ## Internal and External Connectivity
@@ -37,7 +40,7 @@ For more flexibility, security for internal and external connectivity can be ena
   
 
 
- Internal Connectivity
+### Internal Connectivity
 
 Internal connectivity includes:
 
@@ -54,7 +57,7 @@ is not needed by any other party to interact with Flink, and can be simply added
 
 *Note: Because internal connections are mutually authenticated with shared certificates, Flink can skip hostname verification. This makes container-based setups easier.*
 
- External / REST Connectivity
+### External / REST Connectivity
 
 All external connectivity is exposed via an HTTP/REST endpoint, used for example by the web UI and the CLI:
 
@@ -71,7 +74,7 @@ Examples for proxies that Flink users have deployed are [Envoy Proxy](https://ww
 The rationale behind delegating authentication to a proxy is that such proxies offer a wide variety of authentication options and thus better integration into existing infrastructures.
 
 
- Queryable State
+### Queryable State
 
 Connections to the queryable state endpoints are currently not authenticated or encrypted.
 
@@ -92,13 +95,13 @@ When `security.ssl.internal.enabled` is set to `true`, you can set the following
   - `blob.service.ssl.enabled`: Transport of BLOBs from JobManager to TaskManager
   - `akka.ssl.enabled`: Akka-based RPC connections between JobManager / TaskManager / ResourceManager
 
- Keystores and Truststores
+### Keystores and Truststores
 
 The SSL configuration requires configuring a **keystore** and a **truststore**. The *keystore* contains the public certificate
 (public key) and the private key, while the truststore contains the trusted certificates or 

[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-10-22 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659401#comment-16659401 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK closed pull request #6895: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6895
 
 
   


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-10-22 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659376#comment-16659376 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on issue #6838: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6838#issuecomment-431836355
 
 
   thanks @pnowojski for the update - looks like the rebase onto latest master will be the same changes I did for #6895 earlier - I'll use these, will address your comments and merge


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> IO worker threads BLOCKED on SSL Session Cache while CMS full gc
> 
>
> Key: FLINK-9878
> URL: https://issues.apache.org/jira/browse/FLINK-9878
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.5.0, 1.5.1, 1.6.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.4, 1.6.3, 1.7.0
>
>
> According to https://github.com/netty/netty/issues/832, there is a JDK issue 
> during garbage collection when the SSL session cache is not limited. We 
> should allow the user to configure this and further (advanced) SSL parameters 
> for fine-tuning to fix this and similar issues. In particular, the following 
> parameters should be configurable:
> - SSL session cache size
> - SSL session timeout
> - SSL handshake timeout
> - SSL close notify flush timeout
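For orientation, the first two parameters correspond to settings of the JDK's `javax.net.ssl.SSLSessionContext` (`setSessionCacheSize`, `setSessionTimeout`). The sketch below is not the code in Flink's `SSLUtil`; the class `SslSessionCacheTuning` is hypothetical, and it assumes that `-1` means "keep the JDK default" and that the timeout is given in milliseconds (the JDK setter takes seconds, so the sketch converts).

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

/**
 * Hypothetical helper (not Flink's SSLUtil): bounds the JDK SSL session cache
 * of an SSLContext. -1 means "keep the JDK default" (assumed semantics).
 */
final class SslSessionCacheTuning {

    static final int USE_SYSTEM_DEFAULT = -1;

    private SslSessionCacheTuning() {}

    /** Creates a TLS context with the JDK's default key/trust managers and RNG. */
    static SSLContext newTlsContext() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, null, null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("TLS is not available", e);
        }
    }

    /** Applies both settings to the server and the client session context. */
    static void configure(SSLContext context, int sessionCacheSize, int sessionTimeoutMs) {
        apply(context.getServerSessionContext(), sessionCacheSize, sessionTimeoutMs);
        apply(context.getClientSessionContext(), sessionCacheSize, sessionTimeoutMs);
    }

    private static void apply(SSLSessionContext sessions, int cacheSize, int timeoutMs) {
        if (cacheSize != USE_SYSTEM_DEFAULT) {
            // Upper bound on cached SSLSession objects; for the JDK, 0 means "no limit".
            sessions.setSessionCacheSize(cacheSize);
        }
        if (timeoutMs != USE_SYSTEM_DEFAULT) {
            // The JDK expects seconds. Round up so that a positive sub-second
            // value does not become 0, which the JDK treats as "no limit".
            sessions.setSessionTimeout((int) ((timeoutMs + 999L) / 1000L));
        }
    }
}
```

A bounded cache (any value other than `0`, which the JDK interprets as unlimited) is what the option description recommends; which concrete number is appropriate depends on the deployment.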



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-10-22 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659377#comment-16659377 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on a change in pull request #6838: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6838#discussion_r226988384
 
 

 ##
 File path: flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/history/HistoryServer.java
 ##
 @@ -88,6 +90,7 @@
 
private final HistoryServerArchiveFetcher archiveFetcher;
 
+   @Nullable
 
 Review comment:
   not changing that one with this PR, sorry
   
   FYI: my IntelliJ is actually giving me a warning when using `Optional` as a field or a method parameter.
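The `@Nullable` versus `Optional` exchange is a common Java code-review topic. One frequent compromise, shown here only as an illustration and not as what Flink adopted, keeps the field nullable (avoiding the IntelliJ warning mentioned above) and exposes an `Optional` only from the accessor. The class `OptionalSslServer` and its field are hypothetical; the real field in `HistoryServer` is not visible in this excerpt.

```java
import java.util.Optional;
import javax.net.ssl.SSLContext;

/**
 * Hypothetical sketch: the field stays nullable, only the accessor returns
 * an Optional, so callers cannot silently skip the "SSL disabled" case.
 */
final class OptionalSslServer {

    /** Null when SSL is disabled; the diff above marks such a field @Nullable. */
    private final SSLContext serverSslContext;

    OptionalSslServer(SSLContext serverSslContext) {
        this.serverSslContext = serverSslContext;
    }

    /** Wraps the nullable field at the API boundary. */
    Optional<SSLContext> getServerSslContext() {
        return Optional.ofNullable(serverSslContext);
    }

    /** Internal code may keep using a plain null check. */
    String scheme() {
        return serverSslContext != null ? "https" : "http";
    }
}
```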




[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-10-22 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659288#comment-16659288 ]

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6838: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6838#discussion_r226291127
 
 

 ##
 File path: flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/history/HistoryServer.java
 ##
 @@ -88,6 +90,7 @@
 
private final HistoryServerArchiveFetcher archiveFetcher;
 
+   @Nullable
 
 Review comment:
   `Optional`? :(




[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-10-22 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659289#comment-16659289 ]

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6838: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6838#discussion_r226293240
 
 

 ##
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/SSLHandlerFactory.java
 ##
 @@ -36,29 +38,66 @@
 
private final boolean clientMode;
 
-   final boolean clientAuthentication;
+   private final boolean clientAuthentication;
+
+   private final int handshakeTimeoutMs;
+
+   private final int closeNotifyFlushTimeoutMs;
 
-   public SSLEngineFactory(
+   /**
+* Create a new SSLEngine factory.
 
 Review comment:
   nit: comment is outdated (`SSLEngine`).




[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-10-13 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649106#comment-16649106 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK opened a new pull request #6838: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6838
 
 
   ## What is the purpose of the change
   
   This is mostly to tackle bugs like https://github.com/netty/netty/issues/832
   (JDK issue during garbage collection when the SSL session cache is not limited).
   We add the following low-level configuration options for the user to fine-tune
   their system, i.e. the Flink-internal communication:
   
   - SSL session cache size
   - SSL session timeout
   - SSL handshake timeout
   - SSL close notify flush timeout
   
   FYI: I'll also merge this into `master` if accepted.
   
   ## Brief change log
   
   - add `security.ssl.internal.session-cache-size` and `security.ssl.internal.session-timeout` configuration parameters
   -> configure these for `SSLContext`s created by `SSLUtil`
   - add `security.ssl.internal.handshake-timeout` and `security.ssl.internal.close-notify-flush-timeout`
   -> configure these for `SslHandler`s created by `SSLHandlerFactory` (previously `SSLEngineFactory`)
   - rename/refactor `SSLEngineFactory` to `SSLHandlerFactory` since no `SSLEngine` objects alone were actually needed, but only Netty's `SslHandler` (reduces code duplication which would be worse with this PR)
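The handler side of this change log can be pictured as a factory that stores the two timeouts once and applies them to every handler it creates, skipping `-1`. In this sketch, `TimeoutConfigurableHandler` is a hypothetical interface whose setter names mirror Netty's `SslHandler`; it exists only to avoid a Netty dependency and is not the actual `SSLHandlerFactory` API. `RecordingHandler` is a test double with placeholder defaults.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a TLS channel handler (names mirror Netty's SslHandler). */
interface TimeoutConfigurableHandler {
    void setHandshakeTimeoutMillis(long millis);

    void setCloseNotifyFlushTimeoutMillis(long millis);
}

/** Hypothetical factory: every created handler gets the configured timeouts. */
final class TimeoutApplyingHandlerFactory<H extends TimeoutConfigurableHandler> {

    static final int USE_SYSTEM_DEFAULT = -1;

    private final Supplier<H> handlerSupplier;
    private final int handshakeTimeoutMs;
    private final int closeNotifyFlushTimeoutMs;

    TimeoutApplyingHandlerFactory(
            Supplier<H> handlerSupplier, int handshakeTimeoutMs, int closeNotifyFlushTimeoutMs) {
        this.handlerSupplier = handlerSupplier;
        this.handshakeTimeoutMs = handshakeTimeoutMs;
        this.closeNotifyFlushTimeoutMs = closeNotifyFlushTimeoutMs;
    }

    H createHandler() {
        H handler = handlerSupplier.get();
        // -1 leaves the handler's own default untouched.
        if (handshakeTimeoutMs != USE_SYSTEM_DEFAULT) {
            handler.setHandshakeTimeoutMillis(handshakeTimeoutMs);
        }
        if (closeNotifyFlushTimeoutMs != USE_SYSTEM_DEFAULT) {
            handler.setCloseNotifyFlushTimeoutMillis(closeNotifyFlushTimeoutMs);
        }
        return handler;
    }
}

/** Test double that records the configured values; the defaults are placeholders. */
final class RecordingHandler implements TimeoutConfigurableHandler {
    long handshakeTimeoutMillis = 10_000L;
    long closeNotifyFlushTimeoutMillis = 3_000L;

    @Override
    public void setHandshakeTimeoutMillis(long millis) {
        handshakeTimeoutMillis = millis;
    }

    @Override
    public void setCloseNotifyFlushTimeoutMillis(long millis) {
        closeNotifyFlushTimeoutMillis = millis;
    }
}
```

Keeping the `-1` check inside the factory means the individual call sites that create handlers do not need to repeat it.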
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
   - added configuration-verification test to `NettyClientServerSslTest`
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): **no**
 - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no**
 - The serializers: **no**
 - The runtime per-record code paths (performance sensitive): **no**
 - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
 - The S3 file system connector: **no**
   
   ## Documentation
   
 - Does this pull request introduce a new feature? **yes** (kind-of)
 - If yes, how is the feature documented? **docs + JavaDocs**
   




> IO worker threads BLOCKED on SSL Session Cache while CMS full gc
> 
>
> Key: FLINK-9878
> URL: https://issues.apache.org/jira/browse/FLINK-9878
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.5.0, 1.5.1, 1.6.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0, 1.5.4, 1.6.2
>
>
> According to https://github.com/netty/netty/issues/832, there is a JDK issue 
> during garbage collection when the SSL session cache is not limited. We 
> should allow the user to configure this and further (advanced) SSL parameters 
> for fine-tuning to fix this and similar issues. In particular, the following 
> parameters should be configurable:
> - SSL session cache size
> - SSL session timeout
> - SSL handshake timeout
> - SSL close notify flush timeout





[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-20 Thread Nico Kruber (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586571#comment-16586571 ]

Nico Kruber commented on FLINK-9878:


Fixed via:
- `release-1.5`: 9e421a438dd830c6be72e5f13f855e68a82aef21

(1.6 and master will get new forward-PRs)

> IO worker threads BLOCKED on SSL Session Cache while CMS full gc
> 
>
> Key: FLINK-9878
> URL: https://issues.apache.org/jira/browse/FLINK-9878
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.5.0, 1.5.1, 1.6.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.1, 1.7.0, 1.5.4
>
>
> According to https://github.com/netty/netty/issues/832, there is a JDK issue 
> during garbage collection when the SSL session cache is not limited. We 
> should allow the user to configure this and further (advanced) SSL parameters 
> for fine-tuning to fix this and similar issues. In particular, the following 
> parameters should be configurable:
> - SSL session cache size
> - SSL session timeout
> - SSL handshake timeout
> - SSL close notify flush timeout





[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-20 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586567#comment-16586567 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK closed pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/docs/_includes/generated/security_configuration.html b/docs/_includes/generated/security_configuration.html
index cd682ecaf0f..357629473cd 100644
--- a/docs/_includes/generated/security_configuration.html
+++ b/docs/_includes/generated/security_configuration.html
@@ -12,11 +12,21 @@
 "TLS_RSA_WITH_AES_128_CBC_SHA"
 The comma separated list of standard SSL algorithms to be supported. Read more <a href="http://docs.oracle.com/javase/8/docs/technotes/guides/security/StandardNames.html#ciphersuites">here</a>.
 
+
+security.ssl.close-notify-flush-timeout
+-1
+The timeout (in ms) for flushing the `close_notify` that was triggered by closing a channel. If the `close_notify` was not flushed in the given timeout the channel will be closed forcibly. (-1 = use system default)
+
 
 security.ssl.enabled
 false
 Turns on SSL for internal network communication. This can be optionally overridden by flags defined in different transport modules.
 
+
+security.ssl.handshake-timeout
+-1
+The timeout (in ms) during SSL handshake. (-1 = use system default)
+
 
 security.ssl.key-password
 (none)
@@ -37,6 +47,16 @@
 "TLSv1.2"
 The SSL protocol version to be supported for the ssl transport. Note that it doesn’t support comma separated list.
 
+
+security.ssl.session-cache-size
+-1
+The size of the cache used for storing SSL session objects. According to https://github.com/netty/netty/issues/832, you should always set this to an appropriate number to not run into a bug with stalling IO threads during garbage collection. (-1 = use system default).
+
+
+security.ssl.session-timeout
+-1
+The timeout (in ms) for the cached SSL session objects. (-1 = use system default)
+
 
 security.ssl.truststore
 (none)
diff --git a/docs/ops/security-ssl.md b/docs/ops/security-ssl.md
index c2ba7df8849..a805238ae08 100644
--- a/docs/ops/security-ssl.md
+++ b/docs/ops/security-ssl.md
@@ -33,6 +33,10 @@ SSL can be enabled for all network communication between Flink components. SSL k
 * **akka.ssl.enabled**: SSL flag for akka based control connection between the Flink client, jobmanager and taskmanager 
 * **jobmanager.web.ssl.enabled**: Flag to enable https access to the jobmanager's web frontend
 
+### Complete List of SSL Options
+
+{% include generated/security_configuration.html %}
+
 ## Deploying Keystores and Truststores
 
 You need to have a Java Keystore generated and copied to each node in the Flink cluster. The common name or subject alternative names in the certificate should match the node's hostname and IP address. Keystores and truststores can be generated using the [keytool utility](https://docs.oracle.com/javase/8/docs/technotes/tools/unix/keytool.html).
 All Flink components should have read access to the keystore and truststore files.
diff --git a/flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java b/flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java
index 0f25c6caf95..60a97643a4e 100644
--- a/flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java
+++ b/flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java
@@ -160,4 +160,41 @@
key("security.ssl.verify-hostname")
.defaultValue(true)
.withDescription("Flag to enable peer’s hostname verification during ssl handshake.");
+
+   /**
+* SSL session cache size.
+*/
+   public static final ConfigOption<Integer> SSL_SESSION_CACHE_SIZE =
+   key("security.ssl.session-cache-size")
+   .defaultValue(-1)
+   .withDescription("The size of the cache used for storing SSL session objects. "
+   + "According to https://github.com/netty/netty/issues/832, you should always set "
+   + "this to an appropriate number to not run into a bug with stalling IO threads "
+  

[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-20 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586024#comment-16586024 ]

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r211285792
 
 

 ##
 File path: flink-runtime/src/test/java/org/apache/flink/runtime/io/network/netty/NettyClientServerSslTest.java
 ##
 @@ -98,21 +86,48 @@ public void testValidSslConnectionAdvanced() throws Exception {
Channel ch = NettyTestUtil.connect(serverAndClient);
 
SslHandler sslHandler = (SslHandler) ch.pipeline().get("ssl");
-   assertEquals(sslHandler.getHandshakeTimeoutMillis(), handshakeTimeout);
-   assertEquals(sslHandler.getCloseNotifyTimeoutMillis(), closeNotifyFlushTimeout);
+   int handshakeTimeout = sslConfig.getInteger(SSL_HANDSHAKE_TIMEOUT);
 
 Review comment:
   ```
   assertSslConfig(sslConfig.getInteger(SSL_HANDSHAKE_TIMEOUT), sslHandler.getHandshakeTimeoutMillis());
   ```
   and do:
   ```
   private static void assertSslConfig(int expected, long actual) {
     if (expected != -1) {
       assertEquals(expected, actual);
     } else {
       assertTrue(...);
     }
   }
   ```
   in 4 places here? Maybe rename it to `assertEqualsOrMinusAsDefaultValue`?
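The reviewer's helper can be written out without a test framework as follows. This is a sketch: it throws `AssertionError` directly instead of calling JUnit, and the `-1` branch, which the comment leaves as `assertTrue(...)`, is assumed to check that the actual value stays non-negative, like the `assertTrue("default value should not be propagated", ... >= 0)` checks in `testValidSslConnection`.

```java
/**
 * Hypothetical framework-free version of the suggested helper:
 * a configured value must be applied as-is, while -1 ("use system default")
 * must not leak into the handler.
 */
final class SslConfigAssertions {

    static final int USE_SYSTEM_DEFAULT = -1;

    private SslConfigAssertions() {}

    static void assertEqualsOrMinusAsDefaultValue(int expected, long actual) {
        if (expected != USE_SYSTEM_DEFAULT) {
            if (actual != expected) {
                throw new AssertionError("expected " + expected + " but was " + actual);
            }
        } else if (actual < 0) {
            // Assumed check for the elided branch: the library default remains non-negative.
            throw new AssertionError("default value should not be propagated, but was " + actual);
        }
    }
}
```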




[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-20 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585873#comment-16585873 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on issue #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#issuecomment-414304457
 
 
   I updated the code but would do the de-duplication in `master` because of the (additional) merge conflicts I'll have there




[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-20 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585836#comment-16585836 ]

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on a change in pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r211235957
 
 

 ##
 File path: flink-runtime/src/test/java/org/apache/flink/runtime/io/network/netty/NettyClientServerSslTest.java
 ##
 @@ -65,6 +68,60 @@ public void testValidSslConnection() throws Exception {
 
Channel ch = NettyTestUtil.connect(serverAndClient);
 
+   SslHandler sslHandler = (SslHandler) ch.pipeline().get("ssl");
+   assertTrue("default value should not be propagated", 
sslHandler.getHandshakeTimeoutMillis() >= 0);
+   assertTrue("default value should not be propagated", 
sslHandler.getCloseNotifyTimeoutMillis() >= 0);
+
+   // should be able to send text data
+   ch.pipeline().addLast(new StringDecoder()).addLast(new 
StringEncoder());
+   assertTrue(ch.writeAndFlush("test").await().isSuccess());
+
+   NettyTestUtil.shutdown(serverAndClient);
+   }
+
+   /**
+* Verify valid (advanced) ssl configuration and connection.
+*/
+   @Test
+   public void testValidSslConnectionAdvanced() throws Exception {
 
 Review comment:
   Actually, I found a way to verify that these two properties are also set; I will update the test.
   
   Do you think a benchmark should be included in this PR or added separately (it is not really related to these changes)? Also: only in `master`?
   
   About the `taskmanager.netty.client` prefix: that sounds like a nice idea and could probably be extended to similar use cases with component-specific configurations. If you think it's worth pursuing, can you open a JIRA ticket for this improvement?


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582277#comment-16582277
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6355: 
[FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r210529443
 
 

 ##
 File path: flink-runtime/src/test/java/org/apache/flink/runtime/io/network/netty/NettyClientServerSslTest.java
 ##
 @@ -65,6 +68,60 @@ public void testValidSslConnection() throws Exception {
 
Channel ch = NettyTestUtil.connect(serverAndClient);
 
+   SslHandler sslHandler = (SslHandler) ch.pipeline().get("ssl");
+   assertTrue("default value should not be propagated", sslHandler.getHandshakeTimeoutMillis() >= 0);
+   assertTrue("default value should not be propagated", sslHandler.getCloseNotifyTimeoutMillis() >= 0);
+
+   // should be able to send text data
+   ch.pipeline().addLast(new StringDecoder()).addLast(new StringEncoder());
+   assertTrue(ch.writeAndFlush("test").await().isSuccess());
+
+   NettyTestUtil.shutdown(serverAndClient);
+   }
+
+   /**
+* Verify valid (advanced) ssl configuration and connection.
+*/
+   @Test
+   public void testValidSslConnectionAdvanced() throws Exception {
 
 Review comment:
   This hasn't been addressed. The two tests differ only in their expected values and in the `NettyConfig` that is passed in.


> IO worker threads BLOCKED on SSL Session Cache while CMS full gc
> 
>
> Key: FLINK-9878
> URL: https://issues.apache.org/jira/browse/FLINK-9878
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.5.0, 1.5.1, 1.6.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.3, 1.6.1, 1.7.0
>
>
> According to https://github.com/netty/netty/issues/832, there is a JDK issue 
> during garbage collection when the SSL session cache is not limited. We 
> should allow the user to configure this and further (advanced) SSL parameters 
> for fine-tuning to fix this and similar issues. In particular, the following 
> parameters should be configurable:
> - SSL session cache size
> - SSL session timeout
> - SSL handshake timeout
> - SSL close notify flush timeout



[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582274#comment-16582274
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6355: 
[FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r210514431
 
 

 ##
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyClient.java
 ##
 @@ -190,7 +194,14 @@ public void initChannel(SocketChannel channel) throws Exception {

sslEngine.setSSLParameters(newSSLParameters);
}
 
-   channel.pipeline().addLast("ssl", new SslHandler(sslEngine));
+   SslHandler sslHandler = new SslHandler(sslEngine);
 
 Review comment:
   `ctrl+c`


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582273#comment-16582273
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6355: 
[FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r210531098
 
 

 ##
 File path: flink-runtime/src/test/java/org/apache/flink/runtime/io/network/netty/NettyClientServerSslTest.java
 ##
 @@ -65,6 +68,60 @@ public void testValidSslConnection() throws Exception {
 
Channel ch = NettyTestUtil.connect(serverAndClient);
 
+   SslHandler sslHandler = (SslHandler) ch.pipeline().get("ssl");
+   assertTrue("default value should not be propagated", sslHandler.getHandshakeTimeoutMillis() >= 0);
+   assertTrue("default value should not be propagated", sslHandler.getCloseNotifyTimeoutMillis() >= 0);
+
+   // should be able to send text data
+   ch.pipeline().addLast(new StringDecoder()).addLast(new StringEncoder());
+   assertTrue(ch.writeAndFlush("test").await().isSuccess());
+
+   NettyTestUtil.shutdown(serverAndClient);
+   }
+
+   /**
+* Verify valid (advanced) ssl configuration and connection.
+*/
+   @Test
+   public void testValidSslConnectionAdvanced() throws Exception {
 
 Review comment:
   Yes, you are right regarding `handshake-timeout` and `close-notify-flush-timeout`, but as I wrote previously, I do not see how `SESSION_CACHE_SIZE` and `SESSION_TIMEOUT` are tested at all. Regardless of that, it would still be better to add a stress test/benchmark for this. It depends on how important this feature is... However, if it's not important, one could ask why we should bother supporting it at all.
   
   On a side note: couldn't we provide some generic way to configure Netty? For example, we could pass any config option prefixed with `taskmanager.netty.client` to the TaskManager's Netty client, without having to specify and handle each option manually.
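The generic mechanism suggested above could look roughly like this: collect every option under a given prefix and strip the prefix, so the remaining keys can be forwarded to a component as-is. This is a hypothetical sketch over a plain `Map<String, String>`. Flink's `Configuration` class and the mapping of the stripped keys onto Netty channel options are left out, and the option names in `main` are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixConfigSketch {

    // Returns every entry whose key starts with "<prefix>.", with that prefix
    // removed from the key. A trailing dot on the prefix is optional.
    static Map<String, String> withPrefix(Map<String, String> config, String prefix) {
        String fullPrefix = prefix.endsWith(".") ? prefix : prefix + ".";
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : config.entrySet()) {
            if (e.getKey().startsWith(fullPrefix)) {
                result.put(e.getKey().substring(fullPrefix.length()), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // hypothetical option names, not real Flink keys
        conf.put("taskmanager.netty.client.option-a", "1");
        conf.put("taskmanager.netty.client.option-b", "x");
        conf.put("taskmanager.other-option", "y");
        // contains only option-a and option-b (HashMap iteration order is unspecified)
        System.out.println(withPrefix(conf, "taskmanager.netty.client"));
    }
}
```

One trade-off of such a pass-through is that the forwarded keys are neither documented nor validated individually, unlike explicitly declared options.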


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582272#comment-16582272
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6355: 
[FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r210515882
 
 

 ##
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java
 ##
 @@ -176,39 +176,43 @@ public static void setSSLVerifyHostname(Configuration sslConfig, SSLParameters s
public static SSLContext createSSLClientContext(Configuration sslConfig) throws Exception {
 
Preconditions.checkNotNull(sslConfig);
-   SSLContext clientSSLContext = null;
 
-   if (getSSLEnabled(sslConfig)) {
-   LOG.debug("Creating client SSL context from configuration");
-
-   String trustStoreFilePath = sslConfig.getString(SecurityOptions.SSL_TRUSTSTORE);
-   String trustStorePassword = sslConfig.getString(SecurityOptions.SSL_TRUSTSTORE_PASSWORD);
-   String sslProtocolVersion = sslConfig.getString(SecurityOptions.SSL_PROTOCOL);
+   if (!getSSLEnabled(sslConfig)) {
+   return null;
+   }
 
-   Preconditions.checkNotNull(trustStoreFilePath, SecurityOptions.SSL_TRUSTSTORE.key() + " was not configured.");
-   Preconditions.checkNotNull(trustStorePassword, SecurityOptions.SSL_TRUSTSTORE_PASSWORD.key() + " was not configured.");
+   LOG.debug("Creating client SSL context from configuration");
 
-   KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
+   String trustStoreFilePath = sslConfig.getString(SecurityOptions.SSL_TRUSTSTORE);
+   String trustStorePassword = sslConfig.getString(SecurityOptions.SSL_TRUSTSTORE_PASSWORD);
+   String sslProtocolVersion = sslConfig.getString(SecurityOptions.SSL_PROTOCOL);
+   int sessionCacheSize = sslConfig.getInteger(SecurityOptions.SSL_SESSION_CACHE_SIZE);
+   int sessionTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_SESSION_TIMEOUT);
+   int handshakeTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_HANDSHAKE_TIMEOUT);
+   int closeNotifyFlushTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT);
 
-   FileInputStream trustStoreFile = null;
-   try {
-   trustStoreFile = new FileInputStream(new File(trustStoreFilePath));
-   trustStore.load(trustStoreFile, trustStorePassword.toCharArray());
-   } finally {
-   if (trustStoreFile != null) {
-   trustStoreFile.close();
-   }
-   }
+   Preconditions.checkNotNull(trustStoreFilePath, SecurityOptions.SSL_TRUSTSTORE.key() + " was not configured.");
+   Preconditions.checkNotNull(trustStorePassword, SecurityOptions.SSL_TRUSTSTORE_PASSWORD.key() + " was not configured.");
 
-   TrustManagerFactory trustManagerFactory = TrustManagerFactory.getInstance(
-   TrustManagerFactory.getDefaultAlgorithm());
-   trustManagerFactory.init(trustStore);
+   KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
 
-   clientSSLContext = SSLContext.getInstance(sslProtocolVersion);
-   clientSSLContext.init(null, trustManagerFactory.getTrustManagers(), null);
+   try (FileInputStream trustStoreFile = new FileInputStream(new File(trustStoreFilePath))) {
+   trustStore.load(trustStoreFile, trustStorePassword.toCharArray());
}
 
-   return clientSSLContext;
+   TrustManagerFactory trustManagerFactory = TrustManagerFactory.getInstance(
+   TrustManagerFactory.getDefaultAlgorithm());
+   trustManagerFactory.init(trustStore);
+
+   javax.net.ssl.SSLContext clientSSLContext = javax.net.ssl.SSLContext.getInstance(sslProtocolVersion);
 
 Review comment:
   `ctrl+c`


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582275#comment-16582275
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6355: 
[FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r210515985
 
 

 ##
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java
 ##
 @@ -225,38 +229,65 @@ public static SSLContext createSSLClientContext(Configuration sslConfig) throws
public static SSLContext createSSLServerContext(Configuration sslConfig) throws Exception {
 
Preconditions.checkNotNull(sslConfig);
-   SSLContext serverSSLContext = null;
 
-   if (getSSLEnabled(sslConfig)) {
-   LOG.debug("Creating server SSL context from configuration");
+   if (!getSSLEnabled(sslConfig)) {
+   return null;
+   }
 
-   String keystoreFilePath = sslConfig.getString(SecurityOptions.SSL_KEYSTORE);
+   LOG.debug("Creating server SSL context from configuration");
 
-   String keystorePassword = sslConfig.getString(SecurityOptions.SSL_KEYSTORE_PASSWORD);
+   String keystoreFilePath = sslConfig.getString(SecurityOptions.SSL_KEYSTORE);
+   String keystorePassword = sslConfig.getString(SecurityOptions.SSL_KEYSTORE_PASSWORD);
+   String certPassword = sslConfig.getString(SecurityOptions.SSL_KEY_PASSWORD);
+   String sslProtocolVersion = sslConfig.getString(SecurityOptions.SSL_PROTOCOL);
+   int sessionCacheSize = sslConfig.getInteger(SecurityOptions.SSL_SESSION_CACHE_SIZE);
+   int sessionTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_SESSION_TIMEOUT);
+   int handshakeTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_HANDSHAKE_TIMEOUT);
+   int closeNotifyFlushTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT);
 
-   String certPassword = sslConfig.getString(SecurityOptions.SSL_KEY_PASSWORD);
+   Preconditions.checkNotNull(keystoreFilePath, SecurityOptions.SSL_KEYSTORE.key() + " was not configured.");
+   Preconditions.checkNotNull(keystorePassword, SecurityOptions.SSL_KEYSTORE_PASSWORD.key() + " was not configured.");
+   Preconditions.checkNotNull(certPassword, SecurityOptions.SSL_KEY_PASSWORD.key() + " was not configured.");
 
-   String sslProtocolVersion = sslConfig.getString(SecurityOptions.SSL_PROTOCOL);
+   KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
+   try (FileInputStream keyStoreFile = new FileInputStream(new File(keystoreFilePath))) {
+   ks.load(keyStoreFile, keystorePassword.toCharArray());
+   }
 
-   Preconditions.checkNotNull(keystoreFilePath, SecurityOptions.SSL_KEYSTORE.key() + " was not configured.");
-   Preconditions.checkNotNull(keystorePassword, SecurityOptions.SSL_KEYSTORE_PASSWORD.key() + " was not configured.");
-   Preconditions.checkNotNull(certPassword, SecurityOptions.SSL_KEY_PASSWORD.key() + " was not configured.");
+   // Set up key manager factory to use the server key store
+   KeyManagerFactory kmf = KeyManagerFactory.getInstance(
+   KeyManagerFactory.getDefaultAlgorithm());
+   kmf.init(ks, certPassword.toCharArray());
 
-   KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
-   try (FileInputStream keyStoreFile = new FileInputStream(new File(keystoreFilePath))) {
-   ks.load(keyStoreFile, keystorePassword.toCharArray());
-   }
+   // Initialize the SSLContext
 
 Review comment:
   `ctrl+v` as well?


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582276#comment-16582276
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6355: 
[FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r210523907
 
 

 ##
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java
 ##
 @@ -225,38 +229,65 @@ public static SSLContext createSSLClientContext(Configuration sslConfig) throws
public static SSLContext createSSLServerContext(Configuration sslConfig) throws Exception {
 
Preconditions.checkNotNull(sslConfig);
-   SSLContext serverSSLContext = null;
 
-   if (getSSLEnabled(sslConfig)) {
-   LOG.debug("Creating server SSL context from configuration");
+   if (!getSSLEnabled(sslConfig)) {
+   return null;
+   }
 
-   String keystoreFilePath = sslConfig.getString(SecurityOptions.SSL_KEYSTORE);
+   LOG.debug("Creating server SSL context from configuration");
 
-   String keystorePassword = sslConfig.getString(SecurityOptions.SSL_KEYSTORE_PASSWORD);
+   String keystoreFilePath = sslConfig.getString(SecurityOptions.SSL_KEYSTORE);
+   String keystorePassword = sslConfig.getString(SecurityOptions.SSL_KEYSTORE_PASSWORD);
+   String certPassword = sslConfig.getString(SecurityOptions.SSL_KEY_PASSWORD);
+   String sslProtocolVersion = sslConfig.getString(SecurityOptions.SSL_PROTOCOL);
+   int sessionCacheSize = sslConfig.getInteger(SecurityOptions.SSL_SESSION_CACHE_SIZE);
+   int sessionTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_SESSION_TIMEOUT);
+   int handshakeTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_HANDSHAKE_TIMEOUT);
+   int closeNotifyFlushTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT);
 
-   String certPassword = sslConfig.getString(SecurityOptions.SSL_KEY_PASSWORD);
+   Preconditions.checkNotNull(keystoreFilePath, SecurityOptions.SSL_KEYSTORE.key() + " was not configured.");
+   Preconditions.checkNotNull(keystorePassword, SecurityOptions.SSL_KEYSTORE_PASSWORD.key() + " was not configured.");
+   Preconditions.checkNotNull(certPassword, SecurityOptions.SSL_KEY_PASSWORD.key() + " was not configured.");
 
-   String sslProtocolVersion = sslConfig.getString(SecurityOptions.SSL_PROTOCOL);
+   KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
+   try (FileInputStream keyStoreFile = new FileInputStream(new File(keystoreFilePath))) {
+   ks.load(keyStoreFile, keystorePassword.toCharArray());
+   }
 
-   Preconditions.checkNotNull(keystoreFilePath, SecurityOptions.SSL_KEYSTORE.key() + " was not configured.");
-   Preconditions.checkNotNull(keystorePassword, SecurityOptions.SSL_KEYSTORE_PASSWORD.key() + " was not configured.");
-   Preconditions.checkNotNull(certPassword, SecurityOptions.SSL_KEY_PASSWORD.key() + " was not configured.");
+   // Set up key manager factory to use the server key store
+   KeyManagerFactory kmf = KeyManagerFactory.getInstance(
+   KeyManagerFactory.getDefaultAlgorithm());
+   kmf.init(ks, certPassword.toCharArray());
 
-   KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
-   try (FileInputStream keyStoreFile = new FileInputStream(new File(keystoreFilePath))) {
-   ks.load(keyStoreFile, keystorePassword.toCharArray());
-   }
+   // Initialize the SSLContext
+   javax.net.ssl.SSLContext serverSSLContext = javax.net.ssl.SSLContext.getInstance(sslProtocolVersion);
+   serverSSLContext.init(kmf.getKeyManagers(), null, null);
+   if (sessionCacheSize >= 0) {
+   serverSSLContext.getServerSessionContext().setSessionCacheSize(sessionCacheSize);
+   }
+   if (sessionTimeoutMs >= 0) {
+   serverSSLContext.getServerSessionContext().setSessionTimeout(sessionTimeoutMs / 1000);
+   }
 
-   // Set up key manager factory to use the server key store
-   KeyManagerFactory kmf = KeyManagerFactory.getInstance(
-   KeyManagerFactory.getDefaultAlgorithm());
-   kmf.init(ks, certPassword.toCharArray());
+   return new 

[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582278#comment-16582278
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

pnowojski commented on a change in pull request #6355: 
[FLINK-9878][network][ssl] add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r210515177
 
 

 ##
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyServer.java
 ##
 @@ -152,10 +154,17 @@ void init(final NettyProtocol protocol, NettyBufferPool nettyBufferPool) throws
@Override
public void initChannel(SocketChannel channel) throws Exception {
if (serverSSLContext != null) {
-   SSLEngine sslEngine = serverSSLContext.createSSLEngine();
+   SSLEngine sslEngine = serverSSLContext.sslContext.createSSLEngine();

config.setSSLVerAndCipherSuites(sslEngine);
sslEngine.setUseClientMode(false);
-   channel.pipeline().addLast("ssl", new SslHandler(sslEngine));
+   SslHandler sslHandler = new SslHandler(sslEngine);
 
 Review comment:
   `ctrl+v` - please deduplicate this somehow, and please do it in this PR, since this is where you introduce the duplication (or make it worse).


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578653#comment-16578653
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on issue #6355: [FLINK-9878][network][ssl] add more low-level 
ssl options
URL: https://github.com/apache/flink/pull/6355#issuecomment-412597634
 
 
   I pushed a rework of this PR which has a lighter footprint on the changes in SSLUtils by using a wrapper around `SSLContext`, as @pnowojski suggested.
   
   I kept all existing logic, though, including the `@Nullable` fields (vs. `Optional`), for these reasons:
   1) there are already conflicts when applying this to `release-1.6` and I'd like to keep the footprint small (some of the suggestions already make the diff bigger)
   2) there are several `null` checks which would need refactoring
   3) this seems to be out of scope of this PR, especially since no nullable field is added (any more)
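A rough sketch of the wrapper idea, assuming nothing beyond the JDK: the `SSLContext` is bundled with the handshake and close-notify flush timeouts, and engine creation lives in one method that client and server both call, differing only in client/server mode. The class name `SslContextWithTimeouts` is hypothetical and is not the class introduced by this PR.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class SslContextWrapperSketch {

    // Hypothetical wrapper: keeps the JDK SSLContext together with the two
    // Netty-level timeouts so that client and server share one code path.
    static final class SslContextWithTimeouts {
        final SSLContext sslContext;
        final int handshakeTimeoutMs;        // -1 = keep Netty's default
        final int closeNotifyFlushTimeoutMs; // -1 = keep Netty's default

        SslContextWithTimeouts(SSLContext sslContext, int handshakeTimeoutMs, int closeNotifyFlushTimeoutMs) {
            this.sslContext = sslContext;
            this.handshakeTimeoutMs = handshakeTimeoutMs;
            this.closeNotifyFlushTimeoutMs = closeNotifyFlushTimeoutMs;
        }

        // The single place where engines are created; only the mode differs
        // between the client and the server side.
        SSLEngine createEngine(boolean clientMode) {
            SSLEngine engine = sslContext.createSSLEngine();
            engine.setUseClientMode(clientMode);
            return engine;
        }
    }

    public static void main(String[] args) throws Exception {
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, null, null);
        SslContextWithTimeouts wrapper = new SslContextWithTimeouts(ctx, 10_000, -1);
        System.out.println(wrapper.createEngine(true).getUseClientMode());  // true
        System.out.println(wrapper.createEngine(false).getUseClientMode()); // false
    }
}
```

With Netty on the classpath, each side would then wrap the engine in `new SslHandler(engine)` and, only for non-negative configured values, call `setHandshakeTimeoutMillis(...)` and `setCloseNotifyFlushTimeoutMillis(...)` on it (Netty 4.1 API), so that a `-1` default is never propagated.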


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578646#comment-16578646
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on a change in pull request #6355: [FLINK-9878][network][ssl] 
add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r209690587
 
 

 ##
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyClient.java
 ##
 @@ -175,7 +183,6 @@ ChannelFuture connect(final InetSocketAddress serverSocketAddress) {
bootstrap.handler(new ChannelInitializer() {
@Override
public void initChannel(SocketChannel channel) throws Exception {
-
// SSL handler should be added first in the pipeline
if (clientSSLContext != null) {
 
 Review comment:
   if SSL is disabled, for example


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578643#comment-16578643
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on a change in pull request #6355: [FLINK-9878][network][ssl] 
add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r209690309
 
 

 ##
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyClient.java
 ##
 @@ -52,6 +56,9 @@
 
private Bootstrap bootstrap;
 
 Review comment:
   out of scope of this PR - there's also more around this package, if you 
wanted to mark/change these accordingly


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578644#comment-16578644
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on a change in pull request #6355: [FLINK-9878][network][ssl] 
add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r209690309
 
 

 ##
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyClient.java
 ##
 @@ -52,6 +56,9 @@
 
private Bootstrap bootstrap;
 
 Review comment:
   out of scope of this PR - there's also even more around this package, if you 
wanted to mark/change these accordingly


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578629#comment-16578629
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on a change in pull request #6355: [FLINK-9878][network][ssl] 
add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r209682805
 
 

 ##
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java
 ##
 @@ -163,80 +163,188 @@ public static void setSSLVerifyHostname(Configuration sslConfig, SSLParameters s
}
 
/**
-* Creates the SSL Context for the client if SSL is configured.
+* Configuration settings and key/trustmanager instances to set up an SSL client connection.
+*/
+   public static class SSLClientConfiguration {
 
 Review comment:
   good idea - that makes the change even smaller...well, at least the 
important parts of the change ;)


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578611#comment-16578611
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on a change in pull request #6355: [FLINK-9878][network][ssl] 
add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r209682805
 
 

 ##
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java
 ##
 @@ -163,80 +163,188 @@ public static void setSSLVerifyHostname(Configuration sslConfig, SSLParameters s
}
 
/**
-* Creates the SSL Context for the client if SSL is configured.
+* Configuration settings and key/trustmanager instances to set up an SSL client connection.
+*/
+   public static class SSLClientConfiguration {
 
 Review comment:
   good idea - that makes the change even smaller


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573164#comment-16573164
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on issue #6355: [FLINK-9878][network][ssl] add more low-level 
ssl options
URL: https://github.com/apache/flink/pull/6355#issuecomment-411391874
 
 
   Yes, that makes sense and is marked as a follow-up task: 
https://issues.apache.org/jira/browse/FLINK-9879
   -> it probably takes some experiments to find the right parameters and their 
implications. Intuitively, I would agree with the session cache and timeout...


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569951#comment-16569951
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

StephanEwen commented on issue #6355: [FLINK-9878][network][ssl] add more 
low-level ssl options
URL: https://github.com/apache/flink/pull/6355#issuecomment-410649691
 
 
   Does it make sense to set some sane default values here, if Java's defaults 
are a bit insane?
   
   For example:
 - Handshake timeout could be higher. We have seen that this helps 
overloaded systems.
 - Would it make sense to minimize session caches and timeout? We never 
reconnect a TCP connection trying to "fast resume" an SSL session.
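Minimizing the session cache and timeout has one caveat. The `javax.net.ssl.SSLSessionContext` Javadoc defines zero for both `setSessionCacheSize` and `setSessionTimeout` as "no limit". A "minimal" setting therefore has to be a small positive value. The same applies to a millisecond timeout below 1000: if it is divided by 1000 before it reaches `setSessionTimeout`, it truncates to 0, which means unlimited. The sketch below rounds up instead. The class name and the chosen cache size of one session are illustrative assumptions, not defaults agreed on in this thread.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

// Illustrative "minimal session caching"; the concrete values are assumptions, not Flink defaults.
final class MinimalSessionCaching {

    private MinimalSessionCaching() {}

    /** Milliseconds to seconds, rounding up so a positive timeout never becomes 0 (= unlimited). */
    static int toSessionTimeoutSeconds(int timeoutMs) {
        if (timeoutMs <= 0) {
            throw new IllegalArgumentException("timeout must be positive, got " + timeoutMs);
        }
        // Use long arithmetic so Integer.MAX_VALUE does not overflow while rounding up.
        return (int) ((timeoutMs + 999L) / 1000L);
    }

    /** Creates a TLS context whose server session cache holds at most one session. */
    static SSLContext newMinimalContext(int timeoutMs) {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, null, null);
            SSLSessionContext sessions = context.getServerSessionContext();
            sessions.setSessionCacheSize(1); // 0 would mean "no limit", not "no cache"
            sessions.setSessionTimeout(toSessionTimeoutSeconds(timeoutMs));
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("could not create SSLContext", e);
        }
    }
}
```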


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565415#comment-16565415
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

NicoK commented on a change in pull request #6355: [FLINK-9878][network][ssl] 
add more low-level ssl options
URL: https://github.com/apache/flink/pull/6355#discussion_r206905353
 
 

 ##
 File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/io/network/netty/NettyClientServerSslTest.java
 ##
 @@ -26,13 +26,16 @@
 import org.apache.flink.shaded.netty4.io.netty.channel.ChannelHandler;
 import org.apache.flink.shaded.netty4.io.netty.handler.codec.string.StringDecoder;
 import org.apache.flink.shaded.netty4.io.netty.handler.codec.string.StringEncoder;
+import org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler;
 
 import org.junit.Assert;
 import org.junit.Test;
 
 import java.net.InetAddress;
 
+import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertNotNull;
 
 Review comment:
   unused - remove!


> IO worker threads BLOCKED on SSL Session Cache while CMS full gc
> 
>
> Key: FLINK-9878
> URL: https://issues.apache.org/jira/browse/FLINK-9878
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.5.0, 1.5.1, 1.6.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.3, 1.6.0
>
>
> According to https://github.com/netty/netty/issues/832, there is a JDK issue 
> during garbage collection when the SSL session cache is not limited. We 
> should allow the user to configure this and further (advanced) SSL parameters 
> for fine-tuning to fix this and similar issues. In particular, the following 
> parameters should be configurable:
> - SSL session cache size
> - SSL session timeout
> - SSL handshake timeout
> - SSL close notify flush timeout



[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552555#comment-16552555
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204329756
  
--- Diff: 
flink-runtime/src/test/java/org/apache/flink/runtime/io/network/netty/NettyClientServerSslTest.java
 ---
@@ -65,6 +68,60 @@ public void testValidSslConnection() throws Exception {
 
Channel ch = NettyTestUtil.connect(serverAndClient);
 
+   SslHandler sslHandler = (SslHandler) ch.pipeline().get("ssl");
+   assertTrue("default value should not be propagated", sslHandler.getHandshakeTimeoutMillis() >= 0);
+   assertTrue("default value should not be propagated", sslHandler.getCloseNotifyTimeoutMillis() >= 0);
+
+   // should be able to send text data
+   ch.pipeline().addLast(new StringDecoder()).addLast(new StringEncoder());
+   assertTrue(ch.writeAndFlush("test").await().isSuccess());
+
+   NettyTestUtil.shutdown(serverAndClient);
+   }
+
+   /**
+* Verify valid (advanced) ssl configuration and connection.
+*/
+   @Test
+   public void testValidSslConnectionAdvanced() throws Exception {
--- End diff --

please deduplicate code with `testValidSslConnection`


> IO worker threads BLOCKED on SSL Session Cache while CMS full gc
> 
>
> Key: FLINK-9878
> URL: https://issues.apache.org/jira/browse/FLINK-9878
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.5.0, 1.5.1, 1.6.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.2, 1.6.0
>
>
> According to https://github.com/netty/netty/issues/832, there is a JDK issue 
> during garbage collection when the SSL session cache is not limited. We 
> should allow the user to configure this and further (advanced) SSL parameters 
> for fine-tuning to fix this and similar issues. In particular, the following 
> parameters should be configurable:
> - SSL session cache size
> - SSL session timeout
> - SSL handshake timeout
> - SSL close notify flush timeout



[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552554#comment-16552554
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204330930
  
--- Diff: 
flink-runtime/src/test/java/org/apache/flink/runtime/io/network/netty/NettyClientServerSslTest.java
 ---
@@ -65,6 +68,60 @@ public void testValidSslConnection() throws Exception {
 
Channel ch = NettyTestUtil.connect(serverAndClient);
 
+   SslHandler sslHandler = (SslHandler) ch.pipeline().get("ssl");
+   assertTrue("default value should not be propagated", sslHandler.getHandshakeTimeoutMillis() >= 0);
+   assertTrue("default value should not be propagated", sslHandler.getCloseNotifyTimeoutMillis() >= 0);
+
+   // should be able to send text data
+   ch.pipeline().addLast(new StringDecoder()).addLast(new StringEncoder());
+   assertTrue(ch.writeAndFlush("test").await().isSuccess());
+
+   NettyTestUtil.shutdown(serverAndClient);
+   }
+
+   /**
+* Verify valid (advanced) ssl configuration and connection.
+*/
+   @Test
+   public void testValidSslConnectionAdvanced() throws Exception {
--- End diff --

This is quite a poor test :( With respect to `SESSION_CACHE_SIZE` and 
`SESSION_TIMEOUT` it tests only for "not throwing any exception". If those 
properties are just ignored, the test will still pass. 

Can we add some stress test that actually verifies the bug which this PR is 
trying to solve? Maybe stress test AND benchmark like 
`StreamNetworkThroughputBenchmarkTest#largeRemoteMode`?
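This is a minimal sketch of a check that fails if the session settings are silently ignored. It reads the values back from the server `SSLSessionContext` and compares them with the configuration. For the -1 sentinel, it compares them with a freshly created context instead. It uses only the JDK, and the class and method names are hypothetical. It covers propagation of `SESSION_CACHE_SIZE` and `SESSION_TIMEOUT`, but it does not reproduce the GC-related blocking from the bug report. It would therefore complement the requested stress test rather than replace it.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

// Hypothetical propagation check; not Flink's actual test code.
final class SessionSettingsCheck {

    private SessionSettingsCheck() {}

    static SSLContext newContext() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, null, null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("could not create SSLContext", e);
        }
    }

    /** Builds a context and applies the settings; -1 keeps the JDK default. */
    static SSLContext configured(int cacheSize, int timeoutMs) {
        SSLContext context = newContext();
        SSLSessionContext sessions = context.getServerSessionContext();
        if (cacheSize >= 0) {
            sessions.setSessionCacheSize(cacheSize);
        }
        if (timeoutMs >= 0) {
            sessions.setSessionTimeout(timeoutMs / 1000);
        }
        return context;
    }

    /** Throws AssertionError if the configured values did not reach the session context. */
    static void verify(SSLContext context, int cacheSize, int timeoutMs) {
        SSLSessionContext actual = context.getServerSessionContext();
        SSLSessionContext defaults = newContext().getServerSessionContext();
        int expectedCache = cacheSize >= 0 ? cacheSize : defaults.getSessionCacheSize();
        int expectedTimeout = timeoutMs >= 0 ? timeoutMs / 1000 : defaults.getSessionTimeout();
        if (actual.getSessionCacheSize() != expectedCache) {
            throw new AssertionError("cache size not propagated: " + actual.getSessionCacheSize());
        }
        if (actual.getSessionTimeout() != expectedTimeout) {
            throw new AssertionError("timeout not propagated: " + actual.getSessionTimeout());
        }
    }
}
```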


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552551#comment-16552551
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204301373
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyClient.java
 ---
@@ -52,6 +56,9 @@
 
private Bootstrap bootstrap;
--- End diff --

`bootstrap` is nullable and not marked - change to `Optional`


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552552#comment-16552552
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204329262
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java ---
@@ -249,14 +357,73 @@ public static SSLContext createSSLServerContext(Configuration sslConfig) throws
 
// Set up key manager factory to use the server key store
KeyManagerFactory kmf = KeyManagerFactory.getInstance(
-   KeyManagerFactory.getDefaultAlgorithm());
+   KeyManagerFactory.getDefaultAlgorithm());
kmf.init(ks, certPassword.toCharArray());
 
+   return new SSLServerConfiguration(
+   sslProtocolVersion,
+   sslCipherSuites,
+   kmf,
+   sessionCacheSize,
+   sessionTimeoutMs,
+   handshakeTimeoutMs,
+   closeNotifyFlushTimeoutMs);
+   }
+
+   return null;
+   }
+
+   /**
+* Creates the SSL Context for the server assuming SSL is configured.
+*
+* @param sslConfig
+*The application configuration
+* @return The SSLContext object which can be used by the ssl transport server
+* @throws Exception
+* Thrown if there is any misconfiguration
+*/
+   @Nullable
+   public static SSLContext createSSLServerContext(SSLServerConfiguration sslConfig) throws Exception {
+   Preconditions.checkNotNull(sslConfig);
+
+   LOG.debug("Creating server SSL context from configuration");
+   SSLContext serverSSLContext = SSLContext.getInstance(sslConfig.sslProtocolVersion);
+   serverSSLContext.init(sslConfig.keyManagerFactory.getKeyManagers(), null, null);
+   if (sslConfig.sessionCacheSize >= 0) {
+   serverSSLContext.getServerSessionContext().setSessionCacheSize(sslConfig.sessionCacheSize);
+   }
+   if (sslConfig.sessionTimeoutMs >= 0) {
+   serverSSLContext.getServerSessionContext().setSessionTimeout(sslConfig.sessionTimeoutMs / 1000);
+   }
+
+   return serverSSLContext;
+   }
+
+   /**
+* Creates the SSL Context for the server if SSL is configured.
+*
+* @param sslConfig
+*The application configuration
+* @return The SSLContext object which can be used by the ssl transport server
+* Returns null if SSL is disabled
+* @throws Exception
+* Thrown if there is any misconfiguration
+*/
+   @Nullable
+   public static SSLContext createSSLServerContext(Configuration sslConfig) throws Exception {
+
+   Preconditions.checkNotNull(sslConfig);
+   SSLContext serverSSLContext = null;
+
+   if (getSSLEnabled(sslConfig)) {
--- End diff --

ditto: reverse if branch and `Optional`
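This is a sketch of what "reverse if branch and `Optional`" could look like for the server-context factory. The disabled case returns early with `Optional.empty()` instead of falling through to `return null`. `ServerSslSettings` is a hypothetical stand-in for Flink's `Configuration`. Passing `null` key managers to `init` is a simplification; the real method would use the key managers built from the configured keystore.

```java
import java.security.KeyManagementException;
import java.security.NoSuchAlgorithmException;
import java.util.Optional;
import javax.net.ssl.SSLContext;

// Sketch of the early-return + Optional shape; ServerSslSettings is hypothetical.
final class OptionalSslContexts {

    static final class ServerSslSettings {
        final boolean enabled;
        final String protocol;

        ServerSslSettings(boolean enabled, String protocol) {
            this.enabled = enabled;
            this.protocol = protocol;
        }
    }

    private OptionalSslContexts() {}

    static Optional<SSLContext> createServerContext(ServerSslSettings settings) {
        if (!settings.enabled) {
            // Early return replaces the "if (enabled) { ... } return null;" nesting.
            return Optional.empty();
        }
        try {
            SSLContext context = SSLContext.getInstance(settings.protocol);
            // Real code would pass the key managers from the configured keystore here.
            context.init(null, null, null);
            return Optional.of(context);
        } catch (NoSuchAlgorithmException | KeyManagementException e) {
            throw new IllegalArgumentException("invalid SSL configuration", e);
        }
    }
}
```

A caller then handles the disabled case explicitly, for example with `createServerContext(settings).ifPresent(ctx -> ...)`. The "SSL disabled" case becomes part of the return type instead of an annotation that the compiler does not check.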


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552546#comment-16552546
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204324645
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyClient.java
 ---
@@ -175,7 +183,6 @@ ChannelFuture connect(final InetSocketAddress serverSocketAddress) {
bootstrap.handler(new ChannelInitializer() {
@Override
public void initChannel(SocketChannel channel) throws Exception {
-
// SSL handler should be added first in the pipeline
if (clientSSLContext != null) {
--- End diff --

`checkState(!clientSSLContext.isEmpty())`? How can it ever be null?


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552548#comment-16552548
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204298813
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyConfig.java
 ---
@@ -189,23 +192,34 @@ public TransportType getTransportType() {
}
}
 
-   public SSLContext createClientSSLContext() throws Exception {
+   @Nullable
--- End diff --

`Optional` and ditto in other places. `@Nullable` is almost worthless 
without enforcing compile errors.


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552557#comment-16552557
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204336114
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java ---
@@ -163,80 +163,188 @@ public static void setSSLVerifyHostname(Configuration sslConfig, SSLParameters s
}
 
/**
-* Creates the SSL Context for the client if SSL is configured.
+* Configuration settings and key/trustmanager instances to set up an SSL client connection.
+*/
+   public static class SSLClientConfiguration {
--- End diff --

What's the value of introducing `SSLClientConfiguration`? As far as I can 
tell, the only point is to provide accessors to `handshakeTimeoutMS` and 
`closeNotifyFlushTimeoutMs` in `NettyClient#connect`, but it complicates 
initialisation by introducing one more extra obligatory step. 

Wouldn't it be better to wrap `SSLContext` with our class that provides 
those accessors? It seems like this would also remove the need for separate 
`SSLClientConfiguration` and `SSLServerConfiguration`, since all of their 
fields except `handshakeTimeoutMS` and `closeNotifyFlushTimeoutMs` 
are/should be private.
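This is a minimal sketch of the wrapper idea. It assumes the two Netty-level timeouts travel alongside the `SSLContext`, and that a negative value means "keep the handler default". The class name `SslContextWithTimeouts` is hypothetical. At channel setup, code such as `NettyClient#connect` could create the engine from the wrapper. Only if `hasHandshakeTimeout()` is true would it pass the value to Netty's `SslHandler#setHandshakeTimeoutMillis`; the close-notify value would be handled the same way.

```java
import java.util.Objects;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

/**
 * Hypothetical wrapper in the spirit of the review suggestion: an SSLContext plus the two
 * per-connection timeouts that the JDK SSLContext itself cannot hold. Not Flink's actual class.
 */
final class SslContextWithTimeouts {

    private final SSLContext sslContext;
    private final int handshakeTimeoutMs;        // negative = keep the handler's default
    private final int closeNotifyFlushTimeoutMs; // negative = keep the handler's default

    SslContextWithTimeouts(SSLContext sslContext, int handshakeTimeoutMs, int closeNotifyFlushTimeoutMs) {
        this.sslContext = Objects.requireNonNull(sslContext, "sslContext");
        this.handshakeTimeoutMs = handshakeTimeoutMs;
        this.closeNotifyFlushTimeoutMs = closeNotifyFlushTimeoutMs;
    }

    /** Creates a client-mode engine from the wrapped (already initialized) context. */
    SSLEngine createClientEngine() {
        SSLEngine engine = sslContext.createSSLEngine();
        engine.setUseClientMode(true);
        return engine;
    }

    SSLContext getSslContext() {
        return sslContext;
    }

    boolean hasHandshakeTimeout() {
        return handshakeTimeoutMs >= 0;
    }

    int getHandshakeTimeoutMs() {
        return handshakeTimeoutMs;
    }

    boolean hasCloseNotifyFlushTimeout() {
        return closeNotifyFlushTimeoutMs >= 0;
    }

    int getCloseNotifyFlushTimeoutMs() {
        return closeNotifyFlushTimeoutMs;
    }
}
```

Compared with separate client and server configuration classes, this keeps a single object that already holds a ready-to-use `SSLContext`. The remaining fields are only the settings that must be applied per channel.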


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552556#comment-16552556
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204328091
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java ---
@@ -163,80 +163,188 @@ public static void setSSLVerifyHostname(Configuration sslConfig, SSLParameters s
}
 
/**
-* Creates the SSL Context for the client if SSL is configured.
+* Configuration settings and key/trustmanager instances to set up an SSL client connection.
+*/
+   public static class SSLClientConfiguration {
+   public final String sslProtocolVersion;
+   public final TrustManagerFactory trustManagerFactory;
+   public final int sessionCacheSize;
+   public final int sessionTimeoutMs;
+   public final int handshakeTimeoutMs;
+   public final int closeNotifyFlushTimeoutMs;
+
+   public SSLClientConfiguration(
+   String sslProtocolVersion,
+   TrustManagerFactory trustManagerFactory,
+   int sessionCacheSize,
+   int sessionTimeoutMs,
+   int handshakeTimeoutMs,
+   int closeNotifyFlushTimeoutMs) {
+   this.sslProtocolVersion = sslProtocolVersion;
+   this.trustManagerFactory = trustManagerFactory;
+   this.sessionCacheSize = sessionCacheSize;
+   this.sessionTimeoutMs = sessionTimeoutMs;
+   this.handshakeTimeoutMs = handshakeTimeoutMs;
+   this.closeNotifyFlushTimeoutMs = closeNotifyFlushTimeoutMs;
+   }
+   }
+
+   /**
+* Creates necessary helper objects to use for creating an SSL Context for the client if SSL is
+* configured.
 *
 * @param sslConfig
 *The application configuration
-* @return The SSLContext object which can be used by the ssl transport client
-* Returns null if SSL is disabled
+* @return The SSLClientConfiguration object which can be used for creating some SSL context object;
+* returns null if SSL is disabled.
 * @throws Exception
 * Thrown if there is any misconfiguration
 */
@Nullable
-   public static SSLContext createSSLClientContext(Configuration sslConfig) throws Exception {
-
+   public static SSLClientConfiguration createSSLClientConfiguration(Configuration sslConfig) throws Exception {
Preconditions.checkNotNull(sslConfig);
-   SSLContext clientSSLContext = null;
 
if (getSSLEnabled(sslConfig)) {
-   LOG.debug("Creating client SSL context from 
configuration");
+   LOG.debug("Creating client SSL configuration");
 
String trustStoreFilePath = sslConfig.getString(SecurityOptions.SSL_TRUSTSTORE);
String trustStorePassword = sslConfig.getString(SecurityOptions.SSL_TRUSTSTORE_PASSWORD);
String sslProtocolVersion = sslConfig.getString(SecurityOptions.SSL_PROTOCOL);
+   int sessionCacheSize = sslConfig.getInteger(SecurityOptions.SSL_SESSION_CACHE_SIZE);
+   int sessionTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_SESSION_TIMEOUT);
+   int handshakeTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_HANDSHAKE_TIMEOUT);
+   int closeNotifyFlushTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT);
 
Preconditions.checkNotNull(trustStoreFilePath, SecurityOptions.SSL_TRUSTSTORE.key() + " was not configured.");
Preconditions.checkNotNull(trustStorePassword, SecurityOptions.SSL_TRUSTSTORE_PASSWORD.key() + " was not configured.");
 
KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
 
-   FileInputStream trustStoreFile = null;
-   try {
-   trustStoreFile = new FileInputStream(new File(trustStoreFilePath));
+   try (FileInputStream trustStoreFile = new FileInputStream(new File(trustStoreFilePath))) {
trustStore.load(trustStoreFile, trustStorePassword.toCharArray());
-   } finally {
-   if (trustStoreFile != null) {
-   

[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552549#comment-16552549
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204325132
  
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyServer.java ---
@@ -61,6 +63,9 @@
 
private ChannelFuture bindFuture;
 
+   @Nullable
--- End diff --

Please deduplicate this code with `NettyClient`: introduce a `NettyBase`, `NettyInitializer`, or something like that.


> IO worker threads BLOCKED on SSL Session Cache while CMS full gc
> 
>
> Key: FLINK-9878
> URL: https://issues.apache.org/jira/browse/FLINK-9878
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.5.0, 1.5.1, 1.6.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.2, 1.6.0
>
>
> According to https://github.com/netty/netty/issues/832, there is a JDK issue 
> during garbage collection when the SSL session cache is not limited. We 
> should allow the user to configure this and further (advanced) SSL parameters 
> for fine-tuning to fix this and similar issues. In particular, the following 
> parameters should be configurable:
> - SSL session cache size
> - SSL session timeout
> - SSL handshake timeout
> - SSL close notify flush timeout
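The first two parameters in this list correspond to setters on the JDK's `javax.net.ssl.SSLSessionContext` (`setSessionCacheSize` and `setSessionTimeout`); the handshake and `close_notify` flush timeouts have no `SSLSessionContext` counterpart and are not covered here. As a minimal sketch, not Flink's actual code, assuming the options' "-1 = use system default" convention and millisecond inputs, a JDK-only helper could cap the session cache like this. The class and method names are hypothetical:

```java
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;
import javax.net.ssl.TrustManagerFactory;

public class SslSessionSettingsSketch {

    /**
     * Applies a session cache size and session timeout to a JDK session context.
     * Negative values (the documented -1) leave the JDK default untouched.
     * The same helper works for getClientSessionContext() and getServerSessionContext().
     */
    static void applySessionSettings(SSLSessionContext sessions, int cacheSize, int timeoutMs) {
        if (cacheSize >= 0) {
            // The JDK treats 0 as "no limit"; a positive value bounds the session cache.
            sessions.setSessionCacheSize(cacheSize);
        }
        if (timeoutMs >= 0) {
            // The JDK API takes seconds; round up so that e.g. 500 ms does not become 0,
            // which the JDK interprets as "no timeout".
            sessions.setSessionTimeout((int) ((timeoutMs + 999L) / 1000));
        }
    }

    /** Builds a client SSLContext trusting the given truststore, then applies the session settings. */
    static SSLContext createClientContext(
            KeyStore trustStore, String protocol, int cacheSize, int timeoutMs) throws Exception {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);

        SSLContext context = SSLContext.getInstance(protocol);
        context.init(null, tmf.getTrustManagers(), null);
        applySessionSettings(context.getClientSessionContext(), cacheSize, timeoutMs);
        return context;
    }

    public static void main(String[] args) throws Exception {
        // An empty in-memory truststore keeps the example self-contained;
        // a real setup would load the file configured via SecurityOptions.SSL_TRUSTSTORE.
        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        trustStore.load(null, null);

        SSLContext context = createClientContext(trustStore, "TLSv1.2", 10_000, 5_000);
        SSLSessionContext sessions = context.getClientSessionContext();
        System.out.println("cache size: " + sessions.getSessionCacheSize()
                + ", timeout (s): " + sessions.getSessionTimeout());
    }
}
```

Note the unit mismatch: the options are in milliseconds, while `SSLSessionContext.setSessionTimeout` expects seconds, and 0 there means "no limit" rather than "expire immediately".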



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552547#comment-16552547
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204300332
  
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyClient.java ---
@@ -52,6 +56,9 @@
 
private Bootstrap bootstrap;
 
+   @Nullable
--- End diff --

Same argument as elsewhere: use `Optional`. You mark `clientSSLConfig` as nullable but never check it for null.
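The `Optional` suggestion (which also comes up for `createSSLClientConfiguration` itself) can be sketched in isolation. This is an illustration under assumptions, not the code that was merged: `ClientSslSettings` is a hypothetical, trimmed-down stand-in for `SSLUtils.SSLClientConfiguration`, and a plain `boolean` replaces the lookup in Flink's `Configuration`:

```java
import java.util.Optional;

public class OptionalSslConfigSketch {

    /** Hypothetical, trimmed-down stand-in for SSLUtils.SSLClientConfiguration. */
    static final class ClientSslSettings {
        final String protocol;
        final int sessionCacheSize;

        ClientSslSettings(String protocol, int sessionCacheSize) {
            this.protocol = protocol;
            this.sessionCacheSize = sessionCacheSize;
        }
    }

    /** Early return for the "SSL disabled" case instead of wrapping the body in if (enabled) { ... }. */
    static Optional<ClientSslSettings> createClientSettings(boolean sslEnabled, String protocol, int cacheSize) {
        if (!sslEnabled) {
            return Optional.empty();
        }
        return Optional.of(new ClientSslSettings(protocol, cacheSize));
    }

    /** The caller is forced to handle the absent case explicitly. */
    static String describe(Optional<ClientSslSettings> settings) {
        return settings
                .map(s -> "SSL " + s.protocol + ", cache " + s.sessionCacheSize)
                .orElse("plain TCP");
    }

    public static void main(String[] args) {
        System.out.println(describe(createClientSettings(false, "TLSv1.2", 10_000)));
        System.out.println(describe(createClientSettings(true, "TLSv1.2", 10_000)));
    }
}
```

If a field has to stay `@Nullable`, `Optional.ofNullable(field)` converts it at the point of use, so the null check cannot be skipped silently.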


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552553#comment-16552553
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204328596
  
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java ---
@@ -163,80 +163,188 @@ public static void setSSLVerifyHostname(Configuration sslConfig, SSLParameters s
}
 
/**
-* Creates the SSL Context for the client if SSL is configured.
+* Configuration settings and key/trustmanager instances to set up an SSL client connection.
+*/
+   public static class SSLClientConfiguration {
+   public final String sslProtocolVersion;
+   public final TrustManagerFactory trustManagerFactory;
+   public final int sessionCacheSize;
+   public final int sessionTimeoutMs;
+   public final int handshakeTimeoutMs;
+   public final int closeNotifyFlushTimeoutMs;
+
+   public SSLClientConfiguration(
+   String sslProtocolVersion,
+   TrustManagerFactory trustManagerFactory,
+   int sessionCacheSize,
+   int sessionTimeoutMs,
+   int handshakeTimeoutMs,
+   int closeNotifyFlushTimeoutMs) {
+   this.sslProtocolVersion = sslProtocolVersion;
+   this.trustManagerFactory = trustManagerFactory;
+   this.sessionCacheSize = sessionCacheSize;
+   this.sessionTimeoutMs = sessionTimeoutMs;
+   this.handshakeTimeoutMs = handshakeTimeoutMs;
+   this.closeNotifyFlushTimeoutMs = closeNotifyFlushTimeoutMs;
+   }
+   }
+
+   /**
+* Creates necessary helper objects to use for creating an SSL Context for the client if SSL is
+* configured.
 *
 * @param sslConfig
 *The application configuration
-* @return The SSLContext object which can be used by the ssl transport client
-* Returns null if SSL is disabled
+* @return The SSLClientConfiguration object which can be used for creating some SSL context object;
+* returns null if SSL is disabled.
 * @throws Exception
 * Thrown if there is any misconfiguration
 */
@Nullable
-   public static SSLContext createSSLClientContext(Configuration sslConfig) throws Exception {
-
+   public static SSLClientConfiguration createSSLClientConfiguration(Configuration sslConfig) throws Exception {
Preconditions.checkNotNull(sslConfig);
-   SSLContext clientSSLContext = null;
 
if (getSSLEnabled(sslConfig)) {
-   LOG.debug("Creating client SSL context from configuration");
+   LOG.debug("Creating client SSL configuration");
 
String trustStoreFilePath = sslConfig.getString(SecurityOptions.SSL_TRUSTSTORE);
String trustStorePassword = sslConfig.getString(SecurityOptions.SSL_TRUSTSTORE_PASSWORD);
String sslProtocolVersion = sslConfig.getString(SecurityOptions.SSL_PROTOCOL);
+   int sessionCacheSize = sslConfig.getInteger(SecurityOptions.SSL_SESSION_CACHE_SIZE);
+   int sessionTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_SESSION_TIMEOUT);
+   int handshakeTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_HANDSHAKE_TIMEOUT);
+   int closeNotifyFlushTimeoutMs = sslConfig.getInteger(SecurityOptions.SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT);
 
Preconditions.checkNotNull(trustStoreFilePath, SecurityOptions.SSL_TRUSTSTORE.key() + " was not configured.");
Preconditions.checkNotNull(trustStorePassword, SecurityOptions.SSL_TRUSTSTORE_PASSWORD.key() + " was not configured.");
 
KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
 
-   FileInputStream trustStoreFile = null;
-   try {
-   trustStoreFile = new FileInputStream(new File(trustStoreFilePath));
+   try (FileInputStream trustStoreFile = new FileInputStream(new File(trustStoreFilePath))) {
trustStore.load(trustStoreFile, trustStorePassword.toCharArray());
-   } finally {
-   if (trustStoreFile != null) {
-   

[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552550#comment-16552550
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user pnowojski commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r204326191
  
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java ---
@@ -163,80 +163,188 @@ public static void setSSLVerifyHostname(Configuration sslConfig, SSLParameters s
}
 
/**
-* Creates the SSL Context for the client if SSL is configured.
+* Configuration settings and key/trustmanager instances to set up an SSL client connection.
+*/
+   public static class SSLClientConfiguration {
+   public final String sslProtocolVersion;
+   public final TrustManagerFactory trustManagerFactory;
+   public final int sessionCacheSize;
+   public final int sessionTimeoutMs;
+   public final int handshakeTimeoutMs;
+   public final int closeNotifyFlushTimeoutMs;
+
+   public SSLClientConfiguration(
+   String sslProtocolVersion,
+   TrustManagerFactory trustManagerFactory,
+   int sessionCacheSize,
+   int sessionTimeoutMs,
+   int handshakeTimeoutMs,
+   int closeNotifyFlushTimeoutMs) {
+   this.sslProtocolVersion = sslProtocolVersion;
+   this.trustManagerFactory = trustManagerFactory;
+   this.sessionCacheSize = sessionCacheSize;
+   this.sessionTimeoutMs = sessionTimeoutMs;
+   this.handshakeTimeoutMs = handshakeTimeoutMs;
+   this.closeNotifyFlushTimeoutMs = closeNotifyFlushTimeoutMs;
+   }
+   }
+
+   /**
+* Creates necessary helper objects to use for creating an SSL Context for the client if SSL is
+* configured.
 *
 * @param sslConfig
 *The application configuration
-* @return The SSLContext object which can be used by the ssl transport client
-* Returns null if SSL is disabled
+* @return The SSLClientConfiguration object which can be used for creating some SSL context object;
+* returns null if SSL is disabled.
 * @throws Exception
 * Thrown if there is any misconfiguration
 */
@Nullable
-   public static SSLContext createSSLClientContext(Configuration sslConfig) throws Exception {
-
+   public static SSLClientConfiguration createSSLClientConfiguration(Configuration sslConfig) throws Exception {
Preconditions.checkNotNull(sslConfig);
-   SSLContext clientSSLContext = null;
 
if (getSSLEnabled(sslConfig)) {
--- End diff --

Reverse the if/else conditions and use `Optional`:
```
if (!getSSLEnabled(...)) {
  return Optional.empty();
}
```


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549049#comment-16549049
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r203657904
  
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyClient.java ---
@@ -52,6 +55,7 @@
 
private Bootstrap bootstrap;
 
+   private SSLUtils.SSLClientConfiguration clientSSLConfig;
--- End diff --

add `@Nullable`


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549050#comment-16549050
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r203658272
  
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/NettyServer.java ---
@@ -61,6 +62,7 @@
 
private ChannelFuture bindFuture;
 
+   private SSLUtils.SSLServerConfiguration serverSSLConfig;
--- End diff --

add `@Nullable`


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549030#comment-16549030
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user NicoK commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r203652882
  
--- Diff: docs/ops/security-ssl.md ---
@@ -33,6 +33,9 @@ SSL can be enabled for all network communication between Flink components. SSL k
 * **akka.ssl.enabled**: SSL flag for akka based control connection between the Flink client, jobmanager and taskmanager
 * **jobmanager.web.ssl.enabled**: Flag to enable https access to the jobmanager's web frontend
 
+Please see the configuration page about the
+[complete list of SSL configuration parameters]({{site.baseurl}}/ops/config.html#ssl-settings), in particular **security.ssl.session-cache-size**.
--- End diff --

agreed, that would make sense


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549023#comment-16549023
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r203652194
  
--- Diff: docs/ops/security-ssl.md ---
@@ -33,6 +33,9 @@ SSL can be enabled for all network communication between Flink components. SSL k
 * **akka.ssl.enabled**: SSL flag for akka based control connection between the Flink client, jobmanager and taskmanager
 * **jobmanager.web.ssl.enabled**: Flag to enable https access to the jobmanager's web frontend
 
+Please see the configuration page about the
+[complete list of SSL configuration parameters]({{site.baseurl}}/ops/config.html#ssl-settings), in particular **security.ssl.session-cache-size**.
--- End diff --

Just a suggestion: you could also embed the entire table directly; see `Configuration.md` for how to do it.


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548875#comment-16548875
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user NicoK commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r203617345
  
--- Diff: flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java ---
@@ -160,4 +160,41 @@
key("security.ssl.verify-hostname")
.defaultValue(true)
.withDescription("Flag to enable peer’s hostname verification during ssl handshake.");
+
+   /**
+* SSL session cache size.
+*/
+   public static final ConfigOption<Integer> SSL_SESSION_CACHE_SIZE =
+   key("security.ssl.session-cache-size")
+   .defaultValue(-1)
+   .withDescription("The size of the cache used for storing SSL session objects. "
+   + "According to https://github.com/netty/netty/issues/832, you should always set "
+   + "this to an appropriate number to not run into a bug with stalling IO threads "
+   + "during garbage collection. (-1 = use system default).");
+
+   /**
+* SSL session timeout.
+*/
+   public static final ConfigOption<Integer> SSL_SESSION_TIMEOUT =
+   key("security.ssl.session-timeout")
+   .defaultValue(-1)
+   .withDescription("The timeout (in ms) for the cached SSL session objects. (-1 = use system default)");
+
+   /**
+* SSL session timeout during handshakes.
+*/
+   public static final ConfigOption<Integer> SSL_HANDSHAKE_TIMEOUT =
+   key("security.ssl.handshake-timeout")
+   .defaultValue(-1)
+   .withDescription("The timeout (in ms) during SSL handshake. (-1 = use system default)");
+
+   /**
+* SSL session timeout after flushing the `close_notify` message.
+*/
+   public static final ConfigOption<Integer> SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT =
+   key("security.ssl.close-notify-flush-timeout")
+   .defaultValue(-1)
+   .withDescription("The timeout (in ms) for flushing the `close_notify` that was triggered by closing a " +
--- End diff --

unfortunately yes

FYI: I found the difference:
`The timeout (in ms) for flushing the close_notify that was triggered by closing a channel. If the close_notify was not flushed in the given timeout the channel will be closed  forcibly. (-1 = use system default)` vs.
`The timeout (in ms) for flushing the close_notify that was triggered by closing a channel. If the close_notify was not flushed in the given timeout the channel will be closed forcibly. (-1 = use system default)`
-> it seems a double space gets collapsed into a single one at some point... fixing...
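A difference like this is hard to spot by eye. A small helper that reports the first differing character makes it obvious; the class below is a hypothetical debugging aid, not part of Flink's documentation-completeness test, and it replaces spaces with `_` so the extra space becomes visible:

```java
public class FirstDifferenceSketch {

    /** Returns the index of the first differing character, or -1 if both strings are equal. */
    static int firstDifference(String expected, String actual) {
        int common = Math.min(expected.length(), actual.length());
        for (int i = 0; i < common; i++) {
            if (expected.charAt(i) != actual.charAt(i)) {
                return i;
            }
        }
        // One string is a prefix of the other, or they are identical.
        return expected.length() == actual.length() ? -1 : common;
    }

    /** Returns up to 10 characters on each side of the index, with spaces shown as '_'. */
    static String context(String s, int index) {
        int from = Math.max(0, index - 10);
        int to = Math.min(s.length(), index + 10);
        return s.substring(from, to).replace(' ', '_');
    }

    public static void main(String[] args) {
        String source = "the channel will be closed  forcibly. (-1 = use system default)";
        String generated = "the channel will be closed forcibly. (-1 = use system default)";
        int i = firstDifference(source, generated);
        System.out.println(i + ": [" + context(source, i) + "] vs [" + context(generated, i) + "]");
    }
}
```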


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548014#comment-16548014
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r203437995
  
--- Diff: flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java ---
@@ -160,4 +160,41 @@
key("security.ssl.verify-hostname")
.defaultValue(true)
.withDescription("Flag to enable peer’s hostname verification during ssl handshake.");
+
+   /**
+* SSL session cache size.
+*/
+   public static final ConfigOption<Integer> SSL_SESSION_CACHE_SIZE =
+   key("security.ssl.session-cache-size")
+   .defaultValue(-1)
+   .withDescription("The size of the cache used for storing SSL session objects. "
+   + "According to https://github.com/netty/netty/issues/832, you should always set "
+   + "this to an appropriate number to not run into a bug with stalling IO threads "
+   + "during garbage collection. (-1 = use system default).");
+
+   /**
+* SSL session timeout.
+*/
+   public static final ConfigOption<Integer> SSL_SESSION_TIMEOUT =
+   key("security.ssl.session-timeout")
+   .defaultValue(-1)
+   .withDescription("The timeout (in ms) for the cached SSL session objects. (-1 = use system default)");
+
+   /**
+* SSL session timeout during handshakes.
+*/
+   public static final ConfigOption<Integer> SSL_HANDSHAKE_TIMEOUT =
+   key("security.ssl.handshake-timeout")
+   .defaultValue(-1)
+   .withDescription("The timeout (in ms) during SSL handshake. (-1 = use system default)");
+
+   /**
+* SSL session timeout after flushing the `close_notify` message.
+*/
+   public static final ConfigOption<Integer> SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT =
+   key("security.ssl.close-notify-flush-timeout")
+   .defaultValue(-1)
+   .withDescription("The timeout (in ms) for flushing the `close_notify` that was triggered by closing a " +
--- End diff --

It's not showing up as a code block since that only works for Markdown; the description has so far been plain text.


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547909#comment-16547909
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user NicoK commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r203405530
  
--- Diff: flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java ---
@@ -160,4 +160,41 @@
key("security.ssl.verify-hostname")
.defaultValue(true)
.withDescription("Flag to enable peer’s hostname verification during ssl handshake.");
+
+   /**
+* SSL session cache size.
+*/
+   public static final ConfigOption<Integer> SSL_SESSION_CACHE_SIZE =
+   key("security.ssl.session-cache-size")
+   .defaultValue(-1)
+   .withDescription("The size of the cache used for storing SSL session objects. "
+   + "According to https://github.com/netty/netty/issues/832, you should always set "
+   + "this to an appropriate number to not run into a bug with stalling IO threads "
+   + "during garbage collection. (-1 = use system default).");
+
+   /**
+* SSL session timeout.
+*/
+   public static final ConfigOption<Integer> SSL_SESSION_TIMEOUT =
+   key("security.ssl.session-timeout")
+   .defaultValue(-1)
+   .withDescription("The timeout (in ms) for the cached SSL session objects. (-1 = use system default)");
+
+   /**
+* SSL session timeout during handshakes.
+*/
+   public static final ConfigOption<Integer> SSL_HANDSHAKE_TIMEOUT =
+   key("security.ssl.handshake-timeout")
+   .defaultValue(-1)
+   .withDescription("The timeout (in ms) during SSL handshake. (-1 = use system default)");
+
+   /**
+* SSL session timeout after flushing the `close_notify` message.
+*/
+   public static final ConfigOption<Integer> SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT =
+   key("security.ssl.close-notify-flush-timeout")
+   .defaultValue(-1)
+   .withDescription("The timeout (in ms) for flushing the `close_notify` that was triggered by closing a " +
--- End diff --

I could try. Strangely, though, this works for e.g. `security.kerberos.login.contexts`, although the desired effect (marking it as code) is not there... but that's a different problem.


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547659#comment-16547659
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/6355#discussion_r203326103
  
--- Diff: flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java ---
@@ -160,4 +160,41 @@
key("security.ssl.verify-hostname")
.defaultValue(true)
.withDescription("Flag to enable peer’s hostname verification during ssl handshake.");
+
+   /**
+* SSL session cache size.
+*/
+   public static final ConfigOption<Integer> SSL_SESSION_CACHE_SIZE =
+   key("security.ssl.session-cache-size")
+   .defaultValue(-1)
+   .withDescription("The size of the cache used for storing SSL session objects. "
+   + "According to https://github.com/netty/netty/issues/832, you should always set "
+   + "this to an appropriate number to not run into a bug with stalling IO threads "
+   + "during garbage collection. (-1 = use system default).");
+
+   /**
+* SSL session timeout.
+*/
+   public static final ConfigOption<Integer> SSL_SESSION_TIMEOUT =
+   key("security.ssl.session-timeout")
+   .defaultValue(-1)
+   .withDescription("The timeout (in ms) for the cached SSL session objects. (-1 = use system default)");
+
+   /**
+* SSL session timeout during handshakes.
+*/
+   public static final ConfigOption<Integer> SSL_HANDSHAKE_TIMEOUT =
+   key("security.ssl.handshake-timeout")
+   .defaultValue(-1)
+   .withDescription("The timeout (in ms) during SSL handshake. (-1 = use system default)");
+
+   /**
+* SSL session timeout after flushing the `close_notify` message.
+*/
+   public static final ConfigOption<Integer> SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT =
+   key("security.ssl.close-notify-flush-timeout")
+   .defaultValue(-1)
+   .withDescription("The timeout (in ms) for flushing the `close_notify` that was triggered by closing a " +
--- End diff --

Could you try removing the backticks? Let's see whether they are what trips up the test.


[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547419#comment-16547419
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

Github user NicoK commented on the issue:

https://github.com/apache/flink/pull/6355
  
About the Travis error: I tried regenerating the configuration page from the sources, but it does not change at all and the "documentation outdated" error remains :(




[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc

2018-07-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547160#comment-16547160
 ] 

ASF GitHub Bot commented on FLINK-9878:
---

GitHub user NicoK opened a pull request:

https://github.com/apache/flink/pull/6355

[FLINK-9878][network][ssl] add more low-level ssl options

## What is the purpose of the change

This is mostly to tackle bugs like https://github.com/netty/netty/issues/832 (a JDK issue during garbage collection when the SSL session cache is not limited). We add the following low-level configuration options so that users can fine-tune their systems:

- SSL session cache size
- SSL session timeout
- SSL handshake timeout
- SSL close notify flush timeout
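For illustration, here is a minimal Java sketch of how the four options and their "-1 = use system default" convention could be read. It assumes a plain `Map<String, String>` as a stand-in for Flink's `Configuration`; the class `LowLevelSslOptions` is hypothetical, and treating every negative value as "use system default" is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

/**
 * Hypothetical sketch: reads the four low-level SSL options from a plain
 * key/value map (a stand-in for Flink's Configuration, not its API).
 */
final class LowLevelSslOptions {
    // Keys as named in this PR.
    static final String SESSION_CACHE_SIZE = "security.ssl.session-cache-size";
    static final String SESSION_TIMEOUT = "security.ssl.session-timeout";
    static final String HANDSHAKE_TIMEOUT = "security.ssl.handshake-timeout";
    static final String CLOSE_NOTIFY_FLUSH_TIMEOUT = "security.ssl.close-notify-flush-timeout";

    // Documented default: -1 means "use system default".
    private static final int USE_SYSTEM_DEFAULT = -1;

    private final Map<String, String> conf;

    LowLevelSslOptions(Map<String, String> conf) {
        this.conf = new HashMap<>(conf);
    }

    /**
     * Returns the configured value, or empty if the key is absent or negative
     * (assumption: any negative value is treated like -1).
     */
    OptionalInt get(String key) {
        String raw = conf.get(key);
        int value = raw == null ? USE_SYSTEM_DEFAULT : Integer.parseInt(raw.trim());
        return value < 0 ? OptionalInt.empty() : OptionalInt.of(value);
    }

    public static void main(String[] args) {
        LowLevelSslOptions opts = new LowLevelSslOptions(Map.of(
                SESSION_CACHE_SIZE, "10000",
                HANDSHAKE_TIMEOUT, "-1"));
        System.out.println(opts.get(SESSION_CACHE_SIZE)); // OptionalInt[10000]
        System.out.println(opts.get(HANDSHAKE_TIMEOUT));  // OptionalInt.empty
    }
}
```

An empty result lets the caller skip the corresponding setter entirely, so the underlying JDK or Netty default stays in effect.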

This PR is for the `release-1.5` branch only; I'll create a separate one for `master` because of the changes in #6326.

## Brief change log

- add `security.ssl.session-cache-size` and `security.ssl.session-timeout` configuration parameters
-> configure these for `SSLContext`s created by `SSLUtils`
- add `security.ssl.handshake-timeout` and `security.ssl.close-notify-flush-timeout`
-> configure these in the TaskManager (TM) communication channels via `NettyClient` and `NettyServer`
- refactor `SSLUtils` so that these configurations are extracted separately
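A minimal sketch, using only the JDK's `javax.net.ssl` API, of what configuring the session cache size and session timeout on an `SSLContext` can look like. `SslSessionTuning` and `newTunedContext` are hypothetical names, not Flink's `SSLUtils`; skipping negative values mirrors the "-1 = use system default" convention, and rounding the millisecond timeout up to whole seconds is this sketch's own choice, because `SSLSessionContext#setSessionTimeout` takes seconds and treats 0 as "no limit".

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

/** Hypothetical sketch, not Flink's SSLUtils implementation. */
final class SslSessionTuning {

    /** Applies the settings to one session context; negative values keep the JDK default. */
    static void tune(SSLSessionContext ctx, int cacheSize, int timeoutMs) {
        if (cacheSize >= 0) {
            // In the JDK, 0 means "no limit", so bounding the cache needs a positive value.
            ctx.setSessionCacheSize(cacheSize);
        }
        if (timeoutMs >= 0) {
            // The JDK takes seconds; round up so that e.g. 500 ms does not become 0 (= no limit).
            ctx.setSessionTimeout((int) ((timeoutMs + 999L) / 1000));
        }
    }

    static SSLContext newTunedContext(int cacheSize, int timeoutMs) {
        try {
            SSLContext sslContext = SSLContext.getInstance("TLS");
            // null arguments: default key managers, trust managers and SecureRandom
            sslContext.init(null, null, null);
            tune(sslContext.getClientSessionContext(), cacheSize, timeoutMs);
            tune(sslContext.getServerSessionContext(), cacheSize, timeoutMs);
            return sslContext;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not create SSLContext", e);
        }
    }

    public static void main(String[] args) {
        SSLSessionContext client = newTunedContext(10_000, 60_000).getClientSessionContext();
        System.out.println(client.getSessionCacheSize()); // 10000
        System.out.println(client.getSessionTimeout());   // 60 (seconds)
    }
}
```

The handshake and `close_notify` flush timeouts are per-channel settings instead; with Netty 4.1 they live on each channel's `SslHandler` (`setHandshakeTimeoutMillis`, `setCloseNotifyFlushTimeoutMillis`).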

## Verifying this change

This change added tests and can be verified as follows:

- added configuration-verification test to `NettyClientServerSslTest`

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): **no**
  - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no**
  - The serializers: **no**
  - The runtime per-record code paths (performance sensitive): **no**
  - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
  - The S3 file system connector: **no**

## Documentation

  - Does this pull request introduce a new feature? **yes** (kind-of)
  - If yes, how is the feature documented? **docs + JavaDocs**


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/NicoK/flink flink-9878

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/6355.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6355


commit 9a19f64130837cba40c8f9b708aa98c002ae1a63
Author: Nico Kruber 
Date:   2018-07-17T21:40:11Z

[FLINK-9878][network][ssl] add more low-level ssl options




