[jira] [Comment Edited] (FLINK-10538) standalone-job.sh causes Classpath issues
[ https://issues.apache.org/jira/browse/FLINK-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655216#comment-16655216 ] Razvan edited comment on FLINK-10538 at 10/18/18 1:55 PM: -- [~till.rohrmann] Yes, we use AWS S3 for state. But can't the dependencies be shaded? was (Author: razvan): @Till Yes, we use AWS S3 for state. But can't the dependencies be shaded?
> standalone-job.sh causes Classpath issues
> -----------------------------------------
>
>              Key: FLINK-10538
>              URL: https://issues.apache.org/jira/browse/FLINK-10538
>          Project: Flink
>       Issue Type: Bug
>       Components: Docker, Job-Submission, Kubernetes
> Affects Versions: 1.6.0, 1.6.1, 1.6.2, 1.7.0
>         Reporter: Razvan
>         Priority: Major
>          Fix For: 1.7.0
>
> Launching a job cluster through this script creates dependency issues.
>
> We have a job that uses AsyncHttpClient, which in turn depends on the Netty library. When building a Docker image for a Flink job cluster on Kubernetes, build.sh ([https://github.com/apache/flink/blob/release-1.6/flink-container/docker/build.sh]) copies the given artifact to a file called "job.jar" in the lib/ folder of the distribution inside the container.
> At runtime (standalone-job.sh) we get:
>
> {code:java}
> 2018-10-11 13:44:10.057 [flink-akka.actor.default-dispatcher-15] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - StateProcessFunction -> ToCustomerRatingFlatMap -> async wait operator -> Sink: CollectResultsSink (1/1) (f7fac66a85d41d4eac44ff609c515710) switched from RUNNING to FAILED.
> java.lang.NoSuchMethodError: io.netty.handler.ssl.SslContext.newClientContextInternal(Lio/netty/handler/ssl/SslProvider;Ljava/security/Provider;[Ljava/security/cert/X509Certificate;Ljavax/net/ssl/TrustManagerFactory;[Ljava/security/cert/X509Certificate;Ljava/security/PrivateKey;Ljava/lang/String;Ljavax/net/ssl/KeyManagerFactory;Ljava/lang/Iterable;Lio/netty/handler/ssl/CipherSuiteFilter;Lio/netty/handler/ssl/ApplicationProtocolConfig;[Ljava/lang/String;JJZ)Lio/netty/handler/ssl/SslContext;
>     at io.netty.handler.ssl.SslContextBuilder.build(SslContextBuilder.java:452)
>     at org.asynchttpclient.netty.ssl.DefaultSslEngineFactory.buildSslContext(DefaultSslEngineFactory.java:58)
>     at org.asynchttpclient.netty.ssl.DefaultSslEngineFactory.init(DefaultSslEngineFactory.java:73)
>     at org.asynchttpclient.netty.channel.ChannelManager.<init>(ChannelManager.java:100)
>     at org.asynchttpclient.DefaultAsyncHttpClient.<init>(DefaultAsyncHttpClient.java:89)
>     at org.asynchttpclient.Dsl.asyncHttpClient(Dsl.java:32)
>     at com.test.events.common.asynchttp.AsyncHttpClientProvider.configureAsyncHttpClient(AsyncHttpClientProvider.java:128)
>     at com.test.events.common.asynchttp.AsyncHttpClientProvider.<init>(AsyncHttpClientProvider.java:51)
> {code}
>
> This happens because Apache Flink's Netty dependency is loaded first:
>
> {code:java}
> [Loaded io.netty.handler.codec.http.HttpObject from file:/opt/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar]
> [Loaded io.netty.handler.codec.http.HttpMessage from file:/opt/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar]
> {code}
>
> {code:java}
> 2018-10-12 11:48:20.434 [main] INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Classpath: /opt/flink-1.6.1/lib/flink-python_2.11-1.6.1.jar:/opt/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar:/opt/flink-1.6.1/lib/job.jar:/opt/flink-1.6.1/lib/log4j-1.2.17.jar:/opt/flink-1.6.1/lib/logback-access.jar:/opt/flink-1.6.1/lib/logback-classic.jar:/opt/flink-1.6.1/lib/logback-core.jar:/opt/flink-1.6.1/lib/netty-buffer-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-codec-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-codec-socks-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-common-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-handler-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-handler-proxy-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-resolver-dns-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-transport-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-transport-native-epoll-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-transport-native-unix-common-4.1.30.Final.jar:/opt/flink-1.6.1/lib/slf4j-log4j12-1.7.7.jar:/opt/flink-1.6.1/lib/flink-dist_2.11-1.6.1.jar:::
> {code}
>
> The workaround is to rename job.jar to, for example, 1JOB.jar so that it is loaded first:
>
> {code:java}
> 2018-10-12 13:51:09.165 [main] INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Classpath: /Users/users/projects/flink/flink-1.6.1/lib/1JOB.jar:/Users/users/projects/flink/flink-1.6.1/lib/flink-python_2.11-1.6.1.jar:/Users/users/projects/flink/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar:/Users/users/projects/flink/flink-1.6.1/lib/log4j-1.2.17.jar:/Users/users/projects/flink/flink-1.6.1/lib/logback-access.jar:/Users/users/projects/flink/flink-1.6.1/lib/logback-classic.jar:/Users/users/projects/flink/flink-1.6.1/lib/logback-core.jar:/Users/users/projects/flink/flink-1.6.1/lib/slf4j-log4j12-1.7.7.jar:/Users/users/projects/flink/flink-1.6.1/lib/flink-dist_2.11-1.6.1.jar:::
> {code}
>
> {code:java}
> [Loaded io.netty.handler.codec.http.HttpObject from file:/Users/users/projects/flink/flink-1.6.1/lib/1JOB.jar]
> [Loaded io.netty.handler.codec.http.HttpMessage from file:/Users/users/projects/flink/flink-1.6.1/lib/1JOB.jar]
> {code}
>
> This needs a proper fix: with the workaround in place, the job's libraries are loaded before Flink's own, which could make Flink crash or behave in unexpected ways.
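A NoSuchMethodError like the one in the trace above can be pinned down by asking the JVM which jar a conflicting class was actually loaded from. This is a generic diagnostic sketch, not part of the reported job; the class name WhichJar is illustrative, and on the cluster you would query io.netty.handler.ssl.SslContext instead of the demo class:

```java
// Diagnostic sketch: print the jar (CodeSource) a class was loaded from.
// Running this inside the job with io.netty.handler.ssl.SslContext would
// show whether Netty resolves to flink-shaded-hadoop2-uber or to job.jar.
public class WhichJar {
    static String locationOf(Class<?> c) {
        java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
        // JDK core classes (bootstrap class loader) report no code source
        return src == null ? "<bootstrap>" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        // On the cluster, replace with Class.forName("io.netty.handler.ssl.SslContext")
        System.out.println(locationOf(WhichJar.class));
        System.out.println(locationOf(String.class)); // prints "<bootstrap>"
    }
}
```

This avoids enabling -verbose:class (which produced the [Loaded ...] lines above) when only one or two classes are in question.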
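The rename workaround appears to rely on the lib/ entries being placed on the classpath in lexicographic order — an assumption inferred from the Classpath log lines above, not verified against the Flink scripts. Since ASCII digits sort before letters, 1JOB.jar jumps ahead of the flink-shaded uber jar, as this sketch illustrates:

```java
import java.util.Arrays;

// Sketch of the assumed ordering: if the launcher appends lib/ entries in
// lexicographic order, renaming job.jar to 1JOB.jar moves it ahead of
// flink-shaded-hadoop2-uber, because ASCII digits sort before letters.
public class LibOrder {
    public static void main(String[] args) {
        String[] before = {"flink-shaded-hadoop2-uber-1.6.1.jar", "job.jar"};
        String[] after  = {"flink-shaded-hadoop2-uber-1.6.1.jar", "1JOB.jar"};
        Arrays.sort(before);
        Arrays.sort(after);
        System.out.println(String.join(":", before)); // uber jar first -> Flink's Netty wins
        System.out.println(String.join(":", after));  // 1JOB.jar first -> job's Netty wins
    }
}
```

The fragility is visible here too: the workaround only holds as long as no other lib/ entry happens to sort before the renamed job jar.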
[jira] [Updated] (FLINK-10538) standalone-job.sh causes Classpath issues
[ https://issues.apache.org/jira/browse/FLINK-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-10538: --- Affects Version/s: 1.7.0
[jira] [Updated] (FLINK-10538) standalone-job.sh causes Classpath issues
[ https://issues.apache.org/jira/browse/FLINK-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-10538: --- Description: When launching a job with the cluster through this script it creates dependency issues. We have a job which uses AsyncHttpClient, which uses the netty library. By default, using standalone-job.sh (when building/running a Docker image for Kubenetes) it will copy our given artifact to a file called "job.jar" in the lib/ folder of the distribution inside the container. Upon runtime we get: {code:java} 2018-10-11 13:44:10.057 [flink-akka.actor.default-dispatcher-15] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - StateProcessFunction -> ToCustomerRatingFlatMap -> async wait operator -> Sink: CollectResultsSink (1/1) (f7fac66a85d41d4eac44ff609c515710) switched from RUNNING to FAILED. java.lang.NoSuchMethodError: io.netty.handler.ssl.SslContext.newClientContextInternal(Lio/netty/handler/ssl/SslProvider;Ljava/security/Provider;[Ljava/security/cert/X509Certificate;Ljavax/net/ssl/TrustManagerFactory;[Ljava/security/cert/X509Certificate;Ljava/security/PrivateKey;Ljava/lang/String;Ljavax/net/ssl/KeyManagerFactory;Ljava/lang/Iterable;Lio/netty/handler/ssl/CipherSuiteFilter;Lio/netty/handler/ssl/ApplicationProtocolConfig;[Ljava/lang/String;JJZ)Lio/netty/handler/ssl/SslContext; at io.netty.handler.ssl.SslContextBuilder.build(SslContextBuilder.java:452) at org.asynchttpclient.netty.ssl.DefaultSslEngineFactory.buildSslContext(DefaultSslEngineFactory.java:58) at org.asynchttpclient.netty.ssl.DefaultSslEngineFactory.init(DefaultSslEngineFactory.java:73) at org.asynchttpclient.netty.channel.ChannelManager.(ChannelManager.java:100) at org.asynchttpclient.DefaultAsyncHttpClient.(DefaultAsyncHttpClient.java:89) at org.asynchttpclient.Dsl.asyncHttpClient(Dsl.java:32) at com.test.events.common.asynchttp.AsyncHttpClientProvider.configureAsyncHttpClient(AsyncHttpClientProvider.java:128) at 
com.test.events.common.asynchttp.AsyncHttpClientProvider.(AsyncHttpClientProvider.java:51) {code} It's because it loads Apache Flink's netty first {code:java} [Loaded io.netty.handler.codec.http.HttpObject from file:/opt/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar] [Loaded io.netty.handler.codec.http.HttpMessage from file:/opt/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar] {code} {code:java} 2018-10-12 11:48:20.434 [main] INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Classpath: /opt/flink-1.6.1/lib/flink-python_2.11-1.6.1.jar:/opt/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar:/opt/flink-1.6.1/lib/job.jar:/opt/flink-1.6.1/lib/log4j-1.2.17.jar:/opt/flink-1.6.1/lib/logback-access.jar:/opt/flink-1.6.1/lib/logback-classic.jar:/opt/flink-1.6.1/lib/logback-core.jar:/opt/flink-1.6.1/lib/netty-buffer-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-codec-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-codec-socks-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-common-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-handler-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-handler-proxy-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-resolver-dns-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-transport-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-transport-native-epoll-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-transport-native-unix-common-4.1.30.Final.jar:/opt/flink-1.6.1/lib/slf4j-log4j12-1.7.7.jar:/opt/flink-1.6.1/lib/flink-dist_2.11-1.6.1.jar::: {code} The workaround is to rename job.jar to 1JOB.jar for example to be loaded first {code:java} 2018-10-12 13:51:09.165 [main] INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Classpath: 
/Users/users/projects/flink/flink-1.6.1/lib/1JOB.jar:/Users/users/projects/flink/flink-1.6.1/lib/flink-python_2.11-1.6.1.jar:/Users/users/projects/flink/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar:/Users/users/projects/flink/flink-1.6.1/lib/log4j-1.2.17.jar:/Users/users/projects/flink/flink-1.6.1/lib/logback-access.jar:/Users/users/projects/flink/flink-1.6.1/lib/logback-classic.jar:/Users/users/projects/flink/flink-1.6.1/lib/logback-core.jar:/Users/users/projects/flink/flink-1.6.1/lib/slf4j-log4j12-1.7.7.jar:/Users/users/projects/flink/flink-1.6.1/lib/flink-dist_2.11-1.6.1.jar::: {code} {code:java} [Loaded io.netty.handler.codec.http.HttpObject from file:/Users/users/projects/flink/flink-1.6.1/lib/1JOB.jar] [Loaded io.netty.handler.codec.http.HttpMessage from file:/Users/users/projects/flink/flink-1.6.1/lib/1JOB.jar] {code} This needs to be fixed properly as it also means after workaround it will load the job's libraries first and could cause the Flink to crash or behave in unexpected ways. was: When launching a job with the cluster through this script it creates dependency issues. We have a job which uses AsyncHttpClient, which uses the netty library. By default, using
[jira] [Updated] (FLINK-10538) standalone-job.sh causes Classpath issues
[ https://issues.apache.org/jira/browse/FLINK-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-10538: --- Description: When launching a job with the cluster through this script it creates dependency issues. We have a job which uses AsyncHttpClient, which uses the netty library. By default, using standalone-job.sh (when building/running a Docker image for Kubenetes) it will copy our given artifact to a file called "job.jar" in the lib/ folder of the distribution inside the container. Upon runtime we get: {code:java} 2018-10-11 13:44:10.057 [flink-akka.actor.default-dispatcher-15] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - StateProcessFunction -> ToCustomerRatingFlatMap -> async wait operator -> Sink: CollectResultsSink (1/1) (f7fac66a85d41d4eac44ff609c515710) switched from RUNNING to FAILED. java.lang.NoSuchMethodError: io.netty.handler.ssl.SslContext.newClientContextInternal(Lio/netty/handler/ssl/SslProvider;Ljava/security/Provider;[Ljava/security/cert/X509Certificate;Ljavax/net/ssl/TrustManagerFactory;[Ljava/security/cert/X509Certificate;Ljava/security/PrivateKey;Ljava/lang/String;Ljavax/net/ssl/KeyManagerFactory;Ljava/lang/Iterable;Lio/netty/handler/ssl/CipherSuiteFilter;Lio/netty/handler/ssl/ApplicationProtocolConfig;[Ljava/lang/String;JJZ)Lio/netty/handler/ssl/SslContext; at io.netty.handler.ssl.SslContextBuilder.build(SslContextBuilder.java:452) at org.asynchttpclient.netty.ssl.DefaultSslEngineFactory.buildSslContext(DefaultSslEngineFactory.java:58) at org.asynchttpclient.netty.ssl.DefaultSslEngineFactory.init(DefaultSslEngineFactory.java:73) at org.asynchttpclient.netty.channel.ChannelManager.(ChannelManager.java:100) at org.asynchttpclient.DefaultAsyncHttpClient.(DefaultAsyncHttpClient.java:89) at org.asynchttpclient.Dsl.asyncHttpClient(Dsl.java:32) at com.test.events.common.asynchttp.AsyncHttpClientProvider.configureAsyncHttpClient(AsyncHttpClientProvider.java:128) at 
com.test.events.common.asynchttp.AsyncHttpClientProvider.(AsyncHttpClientProvider.java:51) {code} It's because it loads Apache Flink's netty first {code:java} [Loaded io.netty.handler.codec.http.HttpObject from file:/opt/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar] [Loaded io.netty.handler.codec.http.HttpMessage from file:/opt/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar] {code} {code:java} 2018-10-12 11:48:20.434 [main] INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Classpath: /opt/flink-1.6.1/lib/flink-python_2.11-1.6.1.jar:/opt/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar:/opt/flink-1.6.1/lib/job.jar:/opt/flink-1.6.1/lib/log4j-1.2.17.jar:/opt/flink-1.6.1/lib/logback-access.jar:/opt/flink-1.6.1/lib/logback-classic.jar:/opt/flink-1.6.1/lib/logback-core.jar:/opt/flink-1.6.1/lib/netty-buffer-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-codec-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-codec-socks-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-common-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-handler-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-handler-proxy-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-resolver-dns-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-transport-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-transport-native-epoll-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-transport-native-unix-common-4.1.30.Final.jar:/opt/flink-1.6.1/lib/slf4j-log4j12-1.7.7.jar:/opt/flink-1.6.1/lib/flink-dist_2.11-1.6.1.jar::: {code} The workaround is to rename job.jar to 1JOB.jar for example to be loaded first {code:java} 2018-10-12 13:51:09.165 [main] INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Classpath: 
/Users/users/projects/flink/flink-1.6.1/lib/1JOB.jar:/Users/users/projects/flink/flink-1.6.1/lib/flink-python_2.11-1.6.1.jar:/Users/users/projects/flink/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar:/Users/users/projects/flink/flink-1.6.1/lib/log4j-1.2.17.jar:/Users/users/projects/flink/flink-1.6.1/lib/logback-access.jar:/Users/users/projects/flink/flink-1.6.1/lib/logback-classic.jar:/Users/users/projects/flink/flink-1.6.1/lib/logback-core.jar:/Users/users/projects/flink/flink-1.6.1/lib/slf4j-log4j12-1.7.7.jar:/Users/users/projects/flink/flink-1.6.1/lib/flink-dist_2.11-1.6.1.jar::: {code} {code:java} [Loaded io.netty.handler.codec.http.HttpObject from file:/Users/users/projects/flink/flink-1.6.1/lib/1JOB.jar] [Loaded io.netty.handler.codec.http.HttpMessage from file:/Users/users/projects/flink/flink-1.6.1/lib/1JOB.jar] {code} This needs to be fixed properly: with the workaround in place the job's libraries are loaded first, which could cause Flink itself to crash or behave in unexpected ways. was: When launching a job with the cluster through this script it creates dependency issues. We have a job which uses AsyncHttpClient, which uses the netty library. By default, using
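The renaming workaround works because of plain first-match-wins resolution: the startup scripts put lib/ entries on the classpath in what is effectively lexicographic order, and the application class loader serves each class from the first jar that contains it. A minimal Python sketch of that behavior (the jar contents below are illustrative, and the exact ordering produced by the scripts is an assumption):

```python
# Sketch: first-match-wins class resolution over lib/ entries sorted
# lexicographically, as the Flink startup scripts effectively do.
def build_classpath(jars):
    # lib/ entries end up on the classpath in sorted order (assumption)
    return sorted(jars)

def resolve(classpath, contents, cls):
    # The application class loader returns the first entry providing cls.
    for jar in classpath:
        if cls in contents.get(jar, ()):
            return jar
    raise LookupError(cls)

contents = {
    "flink-shaded-hadoop2-uber-1.6.1.jar": {"io.netty.handler.ssl.SslContext"},
    "job.jar": {"io.netty.handler.ssl.SslContext"},
}

# Default name: Flink's bundled (incompatible) netty shadows the job's copy.
winner = resolve(build_classpath(contents), contents,
                 "io.netty.handler.ssl.SslContext")
assert winner == "flink-shaded-hadoop2-uber-1.6.1.jar"

# Workaround: rename the artifact so it sorts before the uber jar.
contents["1JOB.jar"] = contents.pop("job.jar")
winner = resolve(build_classpath(contents), contents,
                 "io.netty.handler.ssl.SslContext")
assert winner == "1JOB.jar"
```

As the report notes, flipping the order only moves the problem: whichever side loses the sort sees the other side's classes.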
[jira] [Updated] (FLINK-10538) standalone-job.sh causes Classpath issues
[ https://issues.apache.org/jira/browse/FLINK-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-10538: --- Priority: Blocker (was: Critical)
[jira] [Created] (FLINK-10538) standalone-job.sh causes Classpath issues
Razvan created FLINK-10538: -- Summary: standalone-job.sh causes Classpath issues Key: FLINK-10538 URL: https://issues.apache.org/jira/browse/FLINK-10538 Project: Flink Issue Type: Bug Components: Docker, Job-Submission, Kubernetes Affects Versions: 1.6.1, 1.6.0, 1.6.2 Reporter: Razvan
[jira] [Comment Edited] (FLINK-9540) Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 and fails
[ https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506323#comment-16506323 ] Razvan edited comment on FLINK-9540 at 6/8/18 5:51 PM: --- Hi [~aljoscha], Thanks for the hint. I managed to get 1.4.2 working with S3A by copying the necessary libraries to lib/ for Hadoop 2.8 only (but it's not straightforward and, in my opinion, not correctly documented). My goal here is also to clean up the documentation a bit. So I want *to define exactly what is needed for each version of Apache Flink (1.4.* - 1.6.*) in order to get it working with S3/S3A/S3N for each Hadoop version (2.4 - 2.8).* I believe this will increase the number of users of the framework and contributors. So if you / anyone could help me define the dependencies, I can make a PR for the documentation. was (Author: razvan): Hi [~aljoscha], Thanks for the hint. I managed to get 1.4.2 working with S3A by copying the necessary libraries to lib/ (but it's not straightforward and, in my opinion, not correctly documented). My goal here is also to clean up the documentation a bit. So I want *to define exactly what is needed for each version of Apache Flink (1.4.* - 1.6.*) in order to get it working with S3/S3A/S3N for each Hadoop version (2.4 - 2.8).* I believe this will increase the number of users of the framework and contributors. So if you / anyone could help me define the dependencies, I can make a PR for the documentation. > Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 > and fails > - > > Key: FLINK-9540 > URL: https://issues.apache.org/jira/browse/FLINK-9540 > Project: Flink > Issue Type: Bug >Reporter: Razvan >Priority: Blocker > > Download Apache Flink 1.4.2 for Hadoop 2.7 or earlier. Go to opt/ inside > flink-s3-fs-hadoop-1.4.2.jar, open common-version-info.properties inside and you > get: > > # > # Licensed to the Apache Software Foundation (ASF) under one > # or more contributor license agreements. 
See the NOTICE file > # distributed with this work for additional information > # regarding copyright ownership. The ASF licenses this file > # to you under the Apache License, Version 2.0 (the > # "License"); you may not use this file except in compliance > # with the License. You may obtain a copy of the License at > # > # [http://www.apache.org/licenses/LICENSE-2.0] > # > # Unless required by applicable law or agreed to in writing, software > # distributed under the License is distributed on an "AS IS" BASIS, > # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > # See the License for the specific language governing permissions and > # limitations under the License. > # > *version=2.8.1* > revision=1e6296df38f9cd3d9581c8af58a2a03a6e4312be > *branch=branch-2.8.1-private* > user=vinodkv > date=2017-06-07T21:22Z > url=[https://git-wip-us.apache.org/repos/asf/hadoop.git] > srcChecksum=60125541c2b3e266cbf3becc5bda666 > protocVersion=2.5.0 > > This will fail to work with the dependencies described in > [https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/deployment/aws.html] > > "Depending on which file system you use, please add the following > dependencies. You can find these as part of the Hadoop binaries in > {{hadoop-2.7/share/hadoop/tools/lib}}: > * {{S3AFileSystem}}: > ** {{hadoop-aws-2.7.3.jar}} > ** {{aws-java-sdk-s3-1.11.183.jar}} and its dependencies: > *** {{aws-java-sdk-core-1.11.183.jar}} > *** {{aws-java-sdk-kms-1.11.183.jar}} > *** {{jackson-annotations-2.6.7.jar}} > *** {{jackson-core-2.6.7.jar}} > *** {{jackson-databind-2.6.7.jar}} > *** {{joda-time-2.8.1.jar}} > *** {{httpcore-4.4.4.jar}} > *** {{httpclient-4.5.3.jar}} > * {{NativeS3FileSystem}}: > ** {{hadoop-aws-2.7.3.jar}} > ** {{guava-11.0.2.jar}}" > > I presume the build task is flawed; it should be built for Apache Hadoop > 2.7. Can someone please check it? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9540) Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 and fails
[ https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506323#comment-16506323 ] Razvan commented on FLINK-9540: --- Hi [~aljoscha], Thanks for the hint. I managed to get 1.4.2 working with S3A by copying necessary libraries to lib/ (but it's not straightforward and it's not correctly documented in my opinion). My goal here is also to clean up the documentation a bit. So I want *to define exactly what is needed for each version of Apache Flink (1.4.* - 1.6.* ) in order to get it working with S3/A/N for each of Hadoop versions (2.4 - 2.8).* I believe this will increase the number of users of the framework and contributors. So if you / anyone could help me define the dependencies I can make PR for documentation.
[jira] [Commented] (FLINK-9540) Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 and fails
[ https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505955#comment-16505955 ] Razvan commented on FLINK-9540: --- Hi, This is the first exception: 2018-05-18 17:37:20.986 [main] WARN org.apache.flink.configuration.Configuration - Config uses deprecated configuration key 'high-availability.zookeeper.storageDir' instead of proper key 'high-availability.storageDir' 2018-05-18 17:37:21.239 [main] ERROR org.apache.flink.runtime.taskmanager.TaskManager - Failed to run TaskManager. java.io.IOException: Could not create FileSystem for highly available storage (high-availability.storageDir) at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:119) at org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:92) at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:103) at org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala:1681) at org.apache.flink.runtime.taskmanager.TaskManager$$anon$2.call(TaskManager.scala:1592) at org.apache.flink.runtime.taskmanager.TaskManager$$anon$2.call(TaskManager.scala:1590) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807) at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) at org.apache.flink.runtime.taskmanager.TaskManager$.main(TaskManager.scala:1590) at org.apache.flink.runtime.taskmanager.TaskManager.main(TaskManager.scala) Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3a'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded. 
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:405) at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:320) at org.apache.flink.core.fs.Path.getFileSystem(Path.java:293) at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:116) ... 11 common frames omitted Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Cannot support file system for 's3a' via Hadoop, because Hadoop is not in the classpath, or some classes are missing from the classpath. at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:179) at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:401) ... 14 common frames omitted Caused by: java.lang.NoClassDefFoundError: org/apache/http/HttpEntityEnclosingRequest at com.amazonaws.AmazonWebServiceClient.(AmazonWebServiceClient.java:119) at com.amazonaws.services.s3.AmazonS3Client.(AmazonS3Client.java:389) at com.amazonaws.services.s3.AmazonS3Client.(AmazonS3Client.java:371) at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:235) at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:159) ... 15 common frames omitted Caused by: java.lang.ClassNotFoundException: org.apache.http.HttpEntityEnclosingRequest at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 20 common frames omitted Then I started to add HttpClient and other libs > Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 > and fails > - > > Key: FLINK-9540 > URL: https://issues.apache.org/jira/browse/FLINK-9540 > Project: Flink > Issue Type: Bug >Reporter: Razvan >Priority: Blocker > > Download Apache Flink 1.4.2 for Hadoop 2.7 or earlier. 
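The root failure in the trace above is a ClassNotFoundException for org.apache.http.HttpEntityEnclosingRequest, i.e. httpcore/httpclient never made it into lib/. A small pre-flight sketch that checks lib/ against the jar list quoted from the AWS deployment docs before starting the cluster (the versions are the ones quoted in this thread, not authoritative):

```python
# Sketch: verify the S3A dependency jars from the AWS deployment docs are
# present in Flink's lib/ before startup. Versions are as quoted in this
# thread; adjust for your Hadoop release.
import os

S3A_REQUIRED = [
    "hadoop-aws-2.7.3.jar",
    "aws-java-sdk-s3-1.11.183.jar",
    "aws-java-sdk-core-1.11.183.jar",
    "aws-java-sdk-kms-1.11.183.jar",
    "jackson-annotations-2.6.7.jar",
    "jackson-core-2.6.7.jar",
    "jackson-databind-2.6.7.jar",
    "joda-time-2.8.1.jar",
    "httpcore-4.4.4.jar",   # provides org.apache.http.HttpEntityEnclosingRequest
    "httpclient-4.5.3.jar",
]

def missing_jars(lib_dir, required=S3A_REQUIRED):
    """Return the required jars not present in the Flink lib/ directory."""
    present = set(os.listdir(lib_dir)) if os.path.isdir(lib_dir) else set()
    return [jar for jar in required if jar not in present]
```

Running this against the deployment's lib/ directory before `start-cluster.sh` would have flagged the missing httpcore jar instead of failing at TaskManager startup.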
[jira] [Commented] (FLINK-9540) Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 and fails
[ https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505847#comment-16505847 ] Razvan commented on FLINK-9540: --- If you know of a way to use S3A without adding those dependencies I'd be grateful to hear about it :) / maybe you can post a documentation link. Up to now, the only way it worked in previous versions was to copy the required libraries to lib/.
[jira] [Reopened] (FLINK-9540) Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 and fails
[ https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan reopened FLINK-9540: ---
[jira] [Commented] (FLINK-9540) Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 and fails
[ https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504723#comment-16504723 ] Razvan commented on FLINK-9540: --- Hi [~aljoscha], Not sure I understand the question :) What I'm suggesting is that flink-s3-fs-hadoop-1.4.2.jar, delivered as built for Hadoop 2.7, is actually built for Hadoop 2.8, and the Apache Flink cluster won't start, with the error mentioned above. These jars require a lot of other jars (the list of which is not accurate in the documentation, by the way) to enable Apache Flink to work with S3.
[jira] [Updated] (FLINK-9540) Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 and fails
[ https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-9540: -- Description: (edited; the updated text is identical to the description quoted above)
[jira] [Updated] (FLINK-9540) Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 and fails
[ https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-9540: -- Summary: Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 and fails (was: Apache Flink 1.4.2 S3 Hadooplibrary for Hadoop 2.7 is built for Hadoop 2.8 and fails)
[jira] [Updated] (FLINK-9540) Apache Flink 1.4.2 S3 Hadooplibrary for Hadoop 2.7 is built for Hadoop 2.8 and fails
[ https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-9540: -- Description: (expanded from the one-paragraph stub to the full description quoted above)
[jira] [Updated] (FLINK-9540) Apache Flink 1.4.2 S3 Hadooplibrary for Hadoop 2.7 is built for Hadoop 2.8 and fails
[ https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-9540: -- Description: Download Apache Flink 1.4.2 for Hadoop 2.7 or earlier. Go to opt/ inside flink-s3-fs-hadoop-1.4.2.jar, open common version info .properties inside you get: was:++Download Apache Flink 1.4.2 for Hadoop 2.7 or earlier. > Apache Flink 1.4.2 S3 Hadooplibrary for Hadoop 2.7 is built for Hadoop 2.8 > and fails > > > Key: FLINK-9540 > URL: https://issues.apache.org/jira/browse/FLINK-9540 > Project: Flink > Issue Type: Bug >Reporter: Razvan >Priority: Blocker > > Download Apache Flink 1.4.2 for Hadoop 2.7 or earlier. Go to opt/ inside > flink-s3-fs-hadoop-1.4.2.jar, open common version info .properties inside you > get: > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9540) Apache Flink 1.4.2 S3 Hadooplibrary for Hadoop 2.7 is built for Hadoop 2.8 and fails
[ https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-9540: -- Description: ++Download Apache Flink 1.4.2 for Hadoop 2.7 or earlier. > Apache Flink 1.4.2 S3 Hadooplibrary for Hadoop 2.7 is built for Hadoop 2.8 > and fails > > > Key: FLINK-9540 > URL: https://issues.apache.org/jira/browse/FLINK-9540 > Project: Flink > Issue Type: Bug >Reporter: Razvan >Priority: Blocker > > ++Download Apache Flink 1.4.2 for Hadoop 2.7 or earlier. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-9540) Apache Flink 1.4.2 S3 Hadooplibrary for Hadoop 2.7 is built for Hadoop 2.8 and fails
Razvan created FLINK-9540: - Summary: Apache Flink 1.4.2 S3 Hadooplibrary for Hadoop 2.7 is built for Hadoop 2.8 and fails Key: FLINK-9540 URL: https://issues.apache.org/jira/browse/FLINK-9540 Project: Flink Issue Type: Bug Reporter: Razvan -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9441) Hadoop Required Dependency List Not Clear
[ https://issues.apache.org/jira/browse/FLINK-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-9441: -- Description:
To be able to use Apache Flink with S3, a few libraries from the Hadoop distribution have to be added to the lib/ folder. This list is partially documented in [https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html] (Provide S3 FileSystem Dependency), but it refers to Hadoop 2.7, whereas Flink also supports 2.8. How should the dependency list be composed for Hadoop 2.8? Would it be possible to bundle the dependencies in a separate archive that users can download?

UPDATE: Downloaded Apache Flink 1.4.2 for Hadoop 2.7 and it seems it was compiled against Hadoop 2.8. I get this error:

{{"java.lang.NumberFormatException: For input string: "100M"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.apache.hadoop.conf.Configuration.getLong(Configuration.java:1319)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:248)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
..."}}

[https://stackoverflow.com/questions/48149929/hive-1-2-metastore-service-doesnt-start-after-configuring-it-to-s3-storage-inst?rq=1]

So I cannot start the cluster.

(was: the same text without the UPDATE section)
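The {{NumberFormatException}} for the string "100M" above fits the usual explanation in the linked Stack Overflow thread: newer Hadoop defaults carry size suffixes (for example a multipart-size value of "100M"), which a suffix-aware parser accepts but which an older {{Configuration.getLong}} hands straight to {{Long.parseLong}}. The sketch below only illustrates that difference in Python; the function names and the exact Hadoop parsing rules are assumptions for illustration, not Hadoop's real code:

```python
# Multipliers for the size suffixes seen in newer Hadoop default values.
SUFFIXES = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

def get_long_bytes(value):
    """Suffix-aware parse, roughly what a newer Configuration does with '100M'."""
    value = value.strip()
    mult = SUFFIXES.get(value[-1].lower())
    if mult is not None:
        return int(value[:-1]) * mult
    return int(value)

def get_long_legacy(value):
    """Plain parse, like an older Configuration.getLong -> Long.parseLong."""
    # int("100M") raises ValueError, just as Long.parseLong("100M")
    # raises NumberFormatException on the older code path.
    return int(value)

print(get_long_bytes("100M"))  # → 104857600
try:
    get_long_legacy("100M")
except ValueError as exc:
    print("legacy parser fails:", exc)
```

This is why mixing a 2.8-built filesystem jar with 2.7-era configuration code (or vice versa) can fail at startup rather than at S3 access time.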
[jira] [Updated] (FLINK-9441) Hadoop Required Dependency List Not Clear
[ https://issues.apache.org/jira/browse/FLINK-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-9441: -- Description: (reworded the questions and appended an "UPDATE:" marker; full text as quoted above)
[jira] [Updated] (FLINK-9441) Hadoop Required Dependency List Not Clear
[ https://issues.apache.org/jira/browse/FLINK-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-9441: -- Component/s: Build System
[jira] [Updated] (FLINK-9441) Hadoop Required Dependency List Not Clear
[ https://issues.apache.org/jira/browse/FLINK-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-9441: -- Component/s: Configuration
[jira] [Updated] (FLINK-9446) Compatibility table not up-to-date
[ https://issues.apache.org/jira/browse/FLINK-9446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-9446: -- Priority: Major (was: Minor) > Compatibility table not up-to-date > -- > > Key: FLINK-9446 > URL: https://issues.apache.org/jira/browse/FLINK-9446 > Project: Flink > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0, 1.4.2, 1.6.0 >Reporter: Razvan >Priority: Major > > The compatibility table > https://ci.apache.org/projects/flink/flink-docs-master/ops/upgrading.html has > not been updated since 1.3.x. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9446) Compatibility table not up-to-date
[ https://issues.apache.org/jira/browse/FLINK-9446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-9446: -- Description: The compatibility table https://ci.apache.org/projects/flink/flink-docs-master/ops/upgrading.html has not been updated since 1.3.x.
[jira] [Updated] (FLINK-9446) Compatibility table not up-to-date
[ https://issues.apache.org/jira/browse/FLINK-9446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-9446: -- Affects Version/s: 1.6.0, 1.5.0, 1.4.2
[jira] [Created] (FLINK-9446) Compatibility table not up-to-date
Razvan created FLINK-9446: - Summary: Compatibility table not up-to-date Key: FLINK-9446 URL: https://issues.apache.org/jira/browse/FLINK-9446 Project: Flink Issue Type: Bug Components: Documentation Reporter: Razvan -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9441) Hadoop Required Dependency List Not Clear
[ https://issues.apache.org/jira/browse/FLINK-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-9441: -- Affects Version/s: 1.6.0
[jira] [Updated] (FLINK-9441) Hadoop Required Dependency List Not Clear
[ https://issues.apache.org/jira/browse/FLINK-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-9441: -- Description: (minor wording edit: "lib/ folder of the distribution" changed to "lib/ folder"; full text as quoted above)
[jira] [Updated] (FLINK-9441) Hadoop Required Dependency List Not Clear
[ https://issues.apache.org/jira/browse/FLINK-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-9441: -- Description: (minor wording edit: "a few libraries from Hadoop" changed to "a few libraries from Hadoop distribution"; full text as quoted above)
[jira] [Created] (FLINK-9441) Hadoop Required Dependency List Not Clear
Razvan created FLINK-9441: - Summary: Hadoop Required Dependency List Not Clear Key: FLINK-9441 URL: https://issues.apache.org/jira/browse/FLINK-9441 Project: Flink Issue Type: Bug Components: Documentation Affects Versions: 1.4.2, 1.5.0 Reporter: Razvan To be able to use Apache Flink with S3, a few libraries from Hadoop have to be added to the lib/ folder of the distribution. This list is partially documented in [https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html] (Provide S3 FileSystem Dependency), but it refers to Hadoop 2.7, whereas Flink also supports 2.8. So I would like to know how to compose the dependency list for Hadoop 2.8 and update the documentation. Also, would it be possible to bundle the dependencies in a separate archive that users can download?
[jira] [Created] (FLINK-7803) Update savepoint Documentation
Razvan created FLINK-7803: - Summary: Update savepoint Documentation Key: FLINK-7803 URL: https://issues.apache.org/jira/browse/FLINK-7803 Project: Flink Issue Type: Bug Components: Documentation Reporter: Razvan Can you please update https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/savepoints.html to specify that the savepoint location *MUST* always be a location accessible by all hosts? I spent quite some time believing it's a bug and trying to find solutions, see https://issues.apache.org/jira/browse/FLINK-7750. It's not obvious in the current documentation, and others might also waste time believing it's an actual issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7750) Strange behaviour in savepoints
[ https://issues.apache.org/jira/browse/FLINK-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199262#comment-16199262 ] Razvan commented on FLINK-7750: --- [~aljoscha] Many thanks for clarifying this. Is there a way to update the documentation to specify the savepoint location has to be accessible from all hosts as it's not really clear here (https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/savepoints.html)? Is there a way I can update the documentation myself? > Strange behaviour in savepoints > --- > > Key: FLINK-7750 > URL: https://issues.apache.org/jira/browse/FLINK-7750 > Project: Flink > Issue Type: Bug > Components: Core >Affects Versions: 1.3.2 >Reporter: Razvan >Priority: Blocker > > I recently upgraded from 1.2.0 and savepoint creation behaves strangely. > Whenever I try to create a savepoint with specified directory Apache Flink > creates a folder on the active JobManager (even if I trigger savepoint > creation from a different JobManager) which contains only _metadata. And > another folder on the TaskManager where the job is running which contains the > actual savepoint. > Obviously if I try to restore it says it can't find the savepoint. > This worked in 1.2.0. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7750) Strange behaviour in savepoints
[ https://issues.apache.org/jira/browse/FLINK-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187987#comment-16187987 ] Razvan commented on FLINK-7750: --- [~aljoscha] Hi, it's on the local disk of the JobManager where I'm running the command from. But as I told it will only save the _metadata in this location and for the savepoint it will create the same folder location but on the TaskManager where the job is currently running. So I cannot use any of the locations to restore, I would need to copy all this to one folder and use that. With 1.2.0 it didn't split it like this, all could be found on the Jobmanager. (With distributed FS S3 works fine) > Strange behaviour in savepoints > --- > > Key: FLINK-7750 > URL: https://issues.apache.org/jira/browse/FLINK-7750 > Project: Flink > Issue Type: Bug > Components: Core >Affects Versions: 1.3.2 >Reporter: Razvan >Priority: Blocker > > I recently upgraded from 1.2.0 and savepoint creation behaves strangely. > Whenever I try to create a savepoint with specified directory Apache Flink > creates a folder on the active JobManager (even if I trigger savepoint > creation from a different JobManager) which contains only _metadata. And > another folder on the TaskManager where the job is running which contains the > actual savepoint. > Obviously if I try to restore it says it can't find the savepoint. > This worked in 1.2.0. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (FLINK-7750) Strange behaviour in savepoints
[ https://issues.apache.org/jira/browse/FLINK-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187987#comment-16187987 ] Razvan edited comment on FLINK-7750 at 10/2/17 12:55 PM: - [~aljoscha] Hi, it's on the local disk of the JobManager where I'm running the command from. But as I told it will only save the _metadata in this location and for the savepoint it will create the same folder location but on the TaskManager where the job is currently running. So I cannot use any of the locations to restore, I would need to copy all this to one folder and use that. With 1.2.0 it didn't split it like this, all could be found on the Jobmanager. (With distributed FS S3 works fine) was (Author: razvan): [~aljoscha] Hi, it's on the local disk of the JobManager whre I'm running the command from. But as I told it will only save the _metadata in this location and for the savepoint it will create the same folder location but on the TaskManager where the job is currently running. So I cannot use any of the locations to restore, I would need to copy all this to one folder and use that. With 1.2.0 it didn't split it like this, all could be found on the Jobmanager. (With distributed FS S3 works fine) > Strange behaviour in savepoints > --- > > Key: FLINK-7750 > URL: https://issues.apache.org/jira/browse/FLINK-7750 > Project: Flink > Issue Type: Bug > Components: Core >Affects Versions: 1.3.2 >Reporter: Razvan >Priority: Blocker > > I recently upgraded from 1.2.0 and savepoint creation behaves strangely. > Whenever I try to create a savepoint with specified directory Apache Flink > creates a folder on the active JobManager (even if I trigger savepoint > creation from a different JobManager) which contains only _metadata. And > another folder on the TaskManager where the job is running which contains the > actual savepoint. > Obviously if I try to restore it says it can't find the savepoint. > This worked in 1.2.0. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (FLINK-7750) Strange behaviour in savepoints
[ https://issues.apache.org/jira/browse/FLINK-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187875#comment-16187875 ] Razvan edited comment on FLINK-7750 at 10/2/17 11:47 AM: - Sure, the application is reading messages from Kafka then calls a service via HTTP. The state is made up of counters. I tried to use S3 (Configured state.savepoints.dir: [S3]:///flink/savepoints) and this works fine. Using RocksDB as state backend and S3 location. But if I specify folder location from JobManager it splits the files as I wrote in description. The job is dependent on Apache Flink version 1.2.0 (Maven dependency), should this be an issue? was (Author: razvan): Sure, the application is reading messages from Kafka then calls a service via HTTP. The state is made up of counters. I tried to use S3 (Configured state.savepoints.dir: [S3]:///flink/savepoints) and this works fine. But if I specify folder location from JobManager it splits the files as I wrote in description. The job is dependent on Apache Flink version 1.2.0 (Maven dependency), should this be an issue? > Strange behaviour in savepoints > --- > > Key: FLINK-7750 > URL: https://issues.apache.org/jira/browse/FLINK-7750 > Project: Flink > Issue Type: Bug > Components: Core >Affects Versions: 1.3.2 >Reporter: Razvan >Priority: Blocker > > I recently upgraded from 1.2.0 and savepoint creation behaves strangely. > Whenever I try to create a savepoint with specified directory Apache Flink > creates a folder on the active JobManager (even if I trigger savepoint > creation from a different JobManager) which contains only _metadata. And > another folder on the TaskManager where the job is running which contains the > actual savepoint. > Obviously if I try to restore it says it can't find the savepoint. > This worked in 1.2.0. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7750) Strange behaviour in savepoints
[ https://issues.apache.org/jira/browse/FLINK-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187875#comment-16187875 ] Razvan commented on FLINK-7750: --- Sure, the application is reading messages from Kafka then calls a service via HTTP. The state is made up of counters. I tried to use S3 (Configured state.savepoints.dir: [S3]:///flink/savepoints) and this works fine. But if I specify folder location from JobManager it splits the files as I wrote in description. The job is dependent on Apache Flink version 1.2.0 (Maven dependency), should this be an issue? > Strange behaviour in savepoints > --- > > Key: FLINK-7750 > URL: https://issues.apache.org/jira/browse/FLINK-7750 > Project: Flink > Issue Type: Bug > Components: Core >Affects Versions: 1.3.2 >Reporter: Razvan >Priority: Blocker > > I recently upgraded from 1.2.0 and savepoint creation behaves strangely. > Whenever I try to create a savepoint with specified directory Apache Flink > creates a folder on the active JobManager (even if I trigger savepoint > creation from a different JobManager) which contains only _metadata. And > another folder on the TaskManager where the job is running which contains the > actual savepoint. > Obviously if I try to restore it says it can't find the savepoint. > This worked in 1.2.0. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7750) Strange behaviour in savepoints
Razvan created FLINK-7750: - Summary: Strange behaviour in savepoints Key: FLINK-7750 URL: https://issues.apache.org/jira/browse/FLINK-7750 Project: Flink Issue Type: Bug Components: Core Affects Versions: 1.3.2 Reporter: Razvan Priority: Blocker I recently upgraded from 1.2.0 and savepoint creation behaves strangely. Whenever I try to create a savepoint with specified directory Apache Flink creates a folder on the active JobManager (even if I trigger savepoint creation from a different JobManager) which contains only _metadata. And another folder on the TaskManager where the job is running which contains the actual savepoint. Obviously if I try to restore it says it can't find the savepoint. This worked in 1.2.0. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-4587) Yet another java.lang.NoSuchFieldError: INSTANCE
[ https://issues.apache.org/jira/browse/FLINK-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059508#comment-16059508 ] Razvan commented on FLINK-4587: --- @[~RenkaiGe] I have the same issue. Using Flink 1.2 and created an application to call some REST endpoints using HttpClient. Does this mean I need to build Flink? > Yet another java.lang.NoSuchFieldError: INSTANCE > > > Key: FLINK-4587 > URL: https://issues.apache.org/jira/browse/FLINK-4587 > Project: Flink > Issue Type: Bug > Components: Build System, Local Runtime >Affects Versions: 1.2.0 > Environment: Latest SNAPSHOT >Reporter: Renkai Ge > Attachments: diff in mvn clean package.png, flink-explore-src.zip > > > For some reason I need to use org.apache.httpcomponents:httpasyncclient:4.1.2 > in flink. > The source file is: > {code} > import org.apache.flink.streaming.api.scala._ > import org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionFactory > /** > * Created by renkai on 16/9/7. > */ > object Main { > def main(args: Array[String]): Unit = { > val instance = ManagedNHttpClientConnectionFactory.INSTANCE > println("instance = " + instance) > val env = StreamExecutionEnvironment.getExecutionEnvironment > val stream = env.fromCollection(1 to 100) > val result = stream.map { x => > x * 2 > } > result.print() > env.execute("xixi") > } > } > {code} > and > {code} > name := "flink-explore" > version := "1.0" > scalaVersion := "2.11.8" > crossPaths := false > libraryDependencies ++= Seq( > "org.apache.flink" %% "flink-scala" % "1.2-SNAPSHOT" > exclude("com.google.code.findbugs", "jsr305"), > "org.apache.flink" %% "flink-connector-kafka-0.8" % "1.2-SNAPSHOT" > exclude("com.google.code.findbugs", "jsr305"), > "org.apache.flink" %% "flink-streaming-scala" % "1.2-SNAPSHOT" > exclude("com.google.code.findbugs", "jsr305"), > "org.apache.flink" %% "flink-clients" % "1.2-SNAPSHOT" > exclude("com.google.code.findbugs", "jsr305"), > "org.apache.httpcomponents" % "httpasyncclient" % "4.1.2" 
> ) > {code} > I use `sbt assembly` to get a fat jar. > If I run the command > {code} > java -cp flink-explore-assembly-1.0.jar Main > {code} > I get the result > {code} > instance = > org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionFactory@4909b8da > log4j:WARN No appenders could be found for logger > (org.apache.flink.api.scala.ClosureCleaner$). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Connected to JobManager at Actor[akka://flink/user/jobmanager_1#-1177584915] > 09/07/2016 12:05:26 Job execution switched to status RUNNING. > 09/07/2016 12:05:26 Source: Collection Source(1/1) switched to SCHEDULED > 09/07/2016 12:05:26 Source: Collection Source(1/1) switched to DEPLOYING > ... > 09/07/2016 12:05:26 Map -> Sink: Unnamed(20/24) switched to RUNNING > 09/07/2016 12:05:26 Map -> Sink: Unnamed(19/24) switched to RUNNING > 15> 30 > 20> 184 > ... > 19> 182 > 1> 194 > 8> 160 > 09/07/2016 12:05:26 Source: Collection Source(1/1) switched to FINISHED > ... > 09/07/2016 12:05:26 Map -> Sink: Unnamed(1/24) switched to FINISHED > 09/07/2016 12:05:26 Job execution switched to status FINISHED. > {code} > Nothing special. > But if I run the jar by > {code} > ./bin/flink run shop-monitor-flink-assembly-1.0.jar > {code} > I will get an error > {code} > $ ./bin/flink run flink-explore-assembly-1.0.jar > Cluster configuration: Standalone cluster with JobManager at /127.0.0.1:6123 > Using address 127.0.0.1:6123 to connect to JobManager. 
> JobManager web interface address http://127.0.0.1:8081 > Starting execution of program > > The program finished with the following exception: > java.lang.NoSuchFieldError: INSTANCE > at > org.apache.http.impl.nio.codecs.DefaultHttpRequestWriterFactory.(DefaultHttpRequestWriterFactory.java:53) > at > org.apache.http.impl.nio.codecs.DefaultHttpRequestWriterFactory.(DefaultHttpRequestWriterFactory.java:57) > at > org.apache.http.impl.nio.codecs.DefaultHttpRequestWriterFactory.(DefaultHttpRequestWriterFactory.java:47) > at > org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionFactory.(ManagedNHttpClientConnectionFactory.java:75) > at > org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionFactory.(ManagedNHttpClientConnectionFactory.java:83) > at > org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionFactory.(ManagedNHttpClientConnectionFactory.java:64) > at Main$.main(Main.scala:9) > at Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at >
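A workaround often suggested for this class of NoSuchFieldError/NoSuchMethodError conflicts is to relocate the clashing packages inside the job's fat jar rather than rebuilding Flink. A sketch for the sbt-assembly build quoted above (the shaded prefix is an arbitrary choice, and the sbt-assembly plugin is assumed to be on the build's plugin classpath):

```scala
// build.sbt sketch: relocate the httpcomponents classes so the job's copy
// cannot collide with the older version already on Flink's classpath.
// ShadeRule is sbt-assembly API; @1 captures the part matched by **.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.apache.http.**" -> "shadedhttp.@1").inAll
)
```

After reassembling, the job's bytecode references shadedhttp.* classes, so whichever HttpClient version sits on Flink's own classpath is never consulted.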
[jira] [Closed] (FLINK-6063) HA Configuration doesn't work with Flink 1.2
[ https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan closed FLINK-6063. - Resolution: Not A Problem It's not an actual issue with the framework; it just isn't clear that DFS MUST be used for HA. > HA Configuration doesn't work with Flink 1.2 > > > Key: FLINK-6063 > URL: https://issues.apache.org/jira/browse/FLINK-6063 > Project: Flink > Issue Type: Bug > Components: JobManager >Affects Versions: 1.2.0 >Reporter: Razvan >Priority: Critical > Attachments: flink-conf.yaml, Logs.tar.gz, masters, slaves, zoo.cfg > > > I have a setup with flink 1.2 cluster, made up of 3 JobManagers and 2 > TaskManagers. I start the Zookeeper Quorum from JobManager1, I get > confirmation Zookeeper starts on the other 2 JobManagers then I start a Flink > job on this JobManager1. > > The flink-conf.yaml is the same on all 5 VMs (also everything else related > to flink because I copied the folder across all VMs as suggested in > tutorials) this means jobmanager.rpc.address: points to JobManager1 > everywhere. > If I turn off the VM running JobManager1 I would expect Zookeeper to say one > of the remaining JobManagers is the leader and the TaskManagers should > reconnect to it. Instead a new leader is elected but the slaves keep > connecting to the old master > 2017-03-15 10:28:28,655 INFO org.apache.flink.core.fs.FileSystem > - Ensuring all FileSystem streams are closed for Async > calls on Source: Custom Source -> Flat Map (1/1) > 2017-03-15 10:28:38,534 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] > ms. Reason: [Disassociated] > 2017-03-15 10:28:46,606 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] > ms. 
Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused > by: [Connection refused: /1.2.3.4:44779] > 2017-03-15 10:28:52,431 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] > ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused > by: [Connection refused: /1.2.3.4:44779] > 2017-03-15 10:29:02,435 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] > ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused > by: [Connection refused: /1.2.3.4:44779] > 2017-03-15 10:29:10,489 INFO > org.apache.flink.runtime.taskmanager.TaskManager - TaskManager > akka://flink/user/taskmanager disconnects from JobManager > akka.tcp://flink@1.2.3.4:44779/user/jobmanager: Old JobManager lost its > leadership. > 2017-03-15 10:29:10,490 INFO > org.apache.flink.runtime.taskmanager.TaskManager - Cancelling > all computations and discarding all cached data. > 2017-03-15 10:29:10,491 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Source: Custom Source > -> Flat Map (1/1) (75fd495cc6acfd72fbe957e60e513223). > 2017-03-15 10:29:10,491 INFO org.apache.flink.runtime.taskmanager.Task > - Source: Custom Source -> Flat Map (1/1) > (75fd495cc6acfd72fbe957e60e513223) switched from RUNNING to FAILED. > java.lang.Exception: TaskManager akka://flink/user/taskmanager > disconnects from JobManager akka.tcp://flink@1.2.3.4:44779/user/jobmanager: > Old JobManager lost its leadership. 
> at > org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:1074) > at > org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleJobManagerLeaderAddress(TaskManager.scala:1426) > at > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:286) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) > at
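For readers hitting the same symptom, the resolution above hinges on the HA storage directory living on a distributed filesystem. A flink-conf.yaml sketch of the ZooKeeper HA keys (host names and the HDFS path are placeholders; key names follow the Flink 1.x documentation):

```yaml
# Sketch: HA recovery metadata must live on a filesystem that every
# JobManager can read; otherwise a newly elected leader cannot recover
# jobs and TaskManagers keep pointing at the old master.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
high-availability.storageDir: hdfs:///flink/ha/
```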
[jira] [Commented] (FLINK-6063) HA Configuration doesn't work with Flink 1.2
[ https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940707#comment-15940707 ] Razvan commented on FLINK-6063: --- Hi, it works with DFS for me, though I'd suggest the documentation emphasize more strongly that it is required for HA. Thank you for the help! > HA Configuration doesn't work with Flink 1.2 > > > Key: FLINK-6063 > URL: https://issues.apache.org/jira/browse/FLINK-6063 > Project: Flink > Issue Type: Bug > Components: JobManager >Affects Versions: 1.2.0 >Reporter: Razvan >Priority: Critical > Attachments: flink-conf.yaml, Logs.tar.gz, masters, slaves, zoo.cfg > > > I have a setup with flink 1.2 cluster, made up of 3 JobManagers and 2 > TaskManagers. I start the Zookeeper Quorum from JobManager1, I get > confirmation Zookeeper starts on the other 2 JobManagers then I start a Flink > job on this JobManager1. > > The flink-conf.yaml is the same on all 5 VMs (also everything else related > to flink because I copied the folder across all VMs as suggested in > tutorials) this means jobmanager.rpc.address: points to JobManager1 > everywhere. > If I turn off the VM running JobManager1 I would expect Zookeeper to say one > of the remaining JobManagers is the leader and the TaskManagers should > reconnect to it. Instead a new leader is elected but the slaves keep > connecting to the old master > 2017-03-15 10:28:28,655 INFO org.apache.flink.core.fs.FileSystem > - Ensuring all FileSystem streams are closed for Async > calls on Source: Custom Source -> Flat Map (1/1) > 2017-03-15 10:28:38,534 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] > ms. Reason: [Disassociated] > 2017-03-15 10:28:46,606 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] > ms. 
Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused > by: [Connection refused: /1.2.3.4:44779] > 2017-03-15 10:28:52,431 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] > ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused > by: [Connection refused: /1.2.3.4:44779] > 2017-03-15 10:29:02,435 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] > ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused > by: [Connection refused: /1.2.3.4:44779] > 2017-03-15 10:29:10,489 INFO > org.apache.flink.runtime.taskmanager.TaskManager - TaskManager > akka://flink/user/taskmanager disconnects from JobManager > akka.tcp://flink@1.2.3.4:44779/user/jobmanager: Old JobManager lost its > leadership. > 2017-03-15 10:29:10,490 INFO > org.apache.flink.runtime.taskmanager.TaskManager - Cancelling > all computations and discarding all cached data. > 2017-03-15 10:29:10,491 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Source: Custom Source > -> Flat Map (1/1) (75fd495cc6acfd72fbe957e60e513223). > 2017-03-15 10:29:10,491 INFO org.apache.flink.runtime.taskmanager.Task > - Source: Custom Source -> Flat Map (1/1) > (75fd495cc6acfd72fbe957e60e513223) switched from RUNNING to FAILED. > java.lang.Exception: TaskManager akka://flink/user/taskmanager > disconnects from JobManager akka.tcp://flink@1.2.3.4:44779/user/jobmanager: > Old JobManager lost its leadership. 
> at > org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:1074) > at > org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleJobManagerLeaderAddress(TaskManager.scala:1426) > at > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:286) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) > at
[jira] [Comment Edited] (FLINK-6063) HA Configuration doesn't work with Flink 1.2
[ https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929644#comment-15929644 ] Razvan edited comment on FLINK-6063 at 3/17/17 9:29 AM: [~till.rohrmann] Hi, I just had a chat with [~dawidwys] and he pointed out that the issue might be that I'm using local filesystem instead of distributed FS. Could that be it? Uploaded logs and configuration also. was (Author: razvan): [~till.rohrmann] Hi, I just had a chat with [~dawidwys] and he pointed out that the issue might be that I'm using local filesystem instead of distributed FS. Could that be it? > HA Configuration doesn't work with Flink 1.2 > > > Key: FLINK-6063 > URL: https://issues.apache.org/jira/browse/FLINK-6063 > Project: Flink > Issue Type: Bug > Components: JobManager >Affects Versions: 1.2.0 >Reporter: Razvan >Priority: Critical > Attachments: flink-conf.yaml, Logs.tar.gz, masters, slaves, zoo.cfg > > > I have a setup with flink 1.2 cluster, made up of 3 JobManagers and 2 > TaskManagers. I start the Zookeeper Quorum from JobManager1, I get > confirmation Zookeeper starts on the other 2 JobManagers then I start a Flink > job on this JobManager1. > > The flink-conf.yaml is the same on all 5 VMs (also everything else related > to flink because I copied the folder across all VMs as suggested in > tutorials) this means jobmanager.rpc.address: points to JobManager1 > everywhere. > If I turn off the VM running JobManager1 I would expect Zookeeper to say one > of the remaining JobManagers is the leader and the TaskManagers should > reconnect to it. 
Instead a new leader is elected but the slaves keep > connecting to the old master > 2017-03-15 10:28:28,655 INFO org.apache.flink.core.fs.FileSystem > - Ensuring all FileSystem streams are closed for Async > calls on Source: Custom Source -> Flat Map (1/1) > 2017-03-15 10:28:38,534 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] > ms. Reason: [Disassociated] > 2017-03-15 10:28:46,606 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] > ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused > by: [Connection refused: /1.2.3.4:44779] > 2017-03-15 10:28:52,431 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] > ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused > by: [Connection refused: /1.2.3.4:44779] > 2017-03-15 10:29:02,435 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] > ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused > by: [Connection refused: /1.2.3.4:44779] > 2017-03-15 10:29:10,489 INFO > org.apache.flink.runtime.taskmanager.TaskManager - TaskManager > akka://flink/user/taskmanager disconnects from JobManager > akka.tcp://flink@1.2.3.4:44779/user/jobmanager: Old JobManager lost its > leadership. > 2017-03-15 10:29:10,490 INFO > org.apache.flink.runtime.taskmanager.TaskManager - Cancelling > all computations and discarding all cached data. > 2017-03-15 10:29:10,491 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Source: Custom Source > -> Flat Map (1/1) (75fd495cc6acfd72fbe957e60e513223). 
> 2017-03-15 10:29:10,491 INFO org.apache.flink.runtime.taskmanager.Task > - Source: Custom Source -> Flat Map (1/1) > (75fd495cc6acfd72fbe957e60e513223) switched from RUNNING to FAILED. > java.lang.Exception: TaskManager akka://flink/user/taskmanager > disconnects from JobManager akka.tcp://flink@1.2.3.4:44779/user/jobmanager: > Old JobManager lost its leadership. > at > org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:1074) > at > org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleJobManagerLeaderAddress(TaskManager.scala:1426) > at > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:286) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at >
[jira] [Commented] (FLINK-6063) HA Configuration doesn't work with Flink 1.2
[ https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929644#comment-15929644 ] Razvan commented on FLINK-6063: --- [~till.rohrmann] Hi, I just had a chat with [~dawidwys] and he pointed out that the issue might be that I'm using local filesystem instead of distributed FS. Could that be it? > HA Configuration doesn't work with Flink 1.2 > > > Key: FLINK-6063 > URL: https://issues.apache.org/jira/browse/FLINK-6063 > Project: Flink > Issue Type: Bug > Components: JobManager >Affects Versions: 1.2.0 >Reporter: Razvan >Priority: Critical > Attachments: flink-conf.yaml, Logs.tar.gz, masters, slaves, zoo.cfg > > > I have a setup with flink 1.2 cluster, made up of 3 JobManagers and 2 > TaskManagers. I start the Zookeeper Quorum from JobManager1, I get > confirmation Zookeeper starts on the other 2 JobManagers then I start a Flink > job on this JobManager1. > > The flink-conf.yaml is the same on all 5 VMs (also everything else related > to flink because I copied the folder across all VMs as suggested in > tutorials) this means jobmanager.rpc.address: points to JobManager1 > everywhere. > If I turn off the VM running JobManager1 I would expect Zookeeper to say one > of the remaining JobManagers is the leader and the TaskManagers should > reconnect to it. Instead a new leader is elected but the slaves keep > connecting to the old master > 2017-03-15 10:28:28,655 INFO org.apache.flink.core.fs.FileSystem > - Ensuring all FileSystem streams are closed for Async > calls on Source: Custom Source -> Flat Map (1/1) > 2017-03-15 10:28:38,534 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] > ms. Reason: [Disassociated] > 2017-03-15 10:28:46,606 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] > ms. 
Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused > by: [Connection refused: /1.2.3.4:44779] > 2017-03-15 10:28:52,431 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] > ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused > by: [Connection refused: /1.2.3.4:44779] > 2017-03-15 10:29:02,435 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] > ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused > by: [Connection refused: /1.2.3.4:44779] > 2017-03-15 10:29:10,489 INFO > org.apache.flink.runtime.taskmanager.TaskManager - TaskManager > akka://flink/user/taskmanager disconnects from JobManager > akka.tcp://flink@1.2.3.4:44779/user/jobmanager: Old JobManager lost its > leadership. > 2017-03-15 10:29:10,490 INFO > org.apache.flink.runtime.taskmanager.TaskManager - Cancelling > all computations and discarding all cached data. > 2017-03-15 10:29:10,491 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Source: Custom Source > -> Flat Map (1/1) (75fd495cc6acfd72fbe957e60e513223). > 2017-03-15 10:29:10,491 INFO org.apache.flink.runtime.taskmanager.Task > - Source: Custom Source -> Flat Map (1/1) > (75fd495cc6acfd72fbe957e60e513223) switched from RUNNING to FAILED. > java.lang.Exception: TaskManager akka://flink/user/taskmanager > disconnects from JobManager akka.tcp://flink@1.2.3.4:44779/user/jobmanager: > Old JobManager lost its leadership. 
> at > org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:1074) > at > org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleJobManagerLeaderAddress(TaskManager.scala:1426) > at > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:286) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) > at >
[jira] [Updated] (FLINK-6063) HA Configuration doesn't work with Flink 1.2
[ https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Razvan updated FLINK-6063: -- Attachment: zoo.cfg, slaves, masters, flink-conf.yaml, Logs.tar.gz

> HA Configuration doesn't work with Flink 1.2
>
> Key: FLINK-6063
> URL: https://issues.apache.org/jira/browse/FLINK-6063
> Project: Flink
> Issue Type: Bug
> Components: JobManager
> Affects Versions: 1.2.0
> Reporter: Razvan
> Priority: Critical
> Attachments: flink-conf.yaml, Logs.tar.gz, masters, slaves, zoo.cfg
>
> I have a Flink 1.2 cluster made up of 3 JobManagers and 2 TaskManagers. I start the ZooKeeper quorum from JobManager1, get confirmation that ZooKeeper starts on the other 2 JobManagers, then start a Flink job on JobManager1.
>
> The flink-conf.yaml is identical on all 5 VMs (as is everything else related to Flink, because I copied the folder across all VMs as suggested in the tutorials), which means jobmanager.rpc.address points to JobManager1 everywhere.
>
> If I turn off the VM running JobManager1, I would expect ZooKeeper to declare one of the remaining JobManagers the leader and the TaskManagers to reconnect to it. Instead, a new leader is elected but the slaves keep connecting to the old master:
>
> {code}
> 2017-03-15 10:28:28,655 INFO  org.apache.flink.core.fs.FileSystem - Ensuring all FileSystem streams are closed for Async calls on Source: Custom Source -> Flat Map (1/1)
> 2017-03-15 10:28:38,534 WARN  akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
> 2017-03-15 10:28:46,606 WARN  akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:28:52,431 WARN  akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:29:02,435 WARN  akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:29:10,489 INFO  org.apache.flink.runtime.taskmanager.TaskManager - TaskManager akka://flink/user/taskmanager disconnects from JobManager akka.tcp://flink@1.2.3.4:44779/user/jobmanager: Old JobManager lost its leadership.
> 2017-03-15 10:29:10,490 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Cancelling all computations and discarding all cached data.
> 2017-03-15 10:29:10,491 INFO  org.apache.flink.runtime.taskmanager.Task - Attempting to fail task externally Source: Custom Source -> Flat Map (1/1) (75fd495cc6acfd72fbe957e60e513223).
> 2017-03-15 10:29:10,491 INFO  org.apache.flink.runtime.taskmanager.Task - Source: Custom Source -> Flat Map (1/1) (75fd495cc6acfd72fbe957e60e513223) switched from RUNNING to FAILED.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects from JobManager akka.tcp://flink@1.2.3.4:44779/user/jobmanager: Old JobManager lost its leadership.
>     at org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:1074)
>     at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleJobManagerLeaderAddress(TaskManager.scala:1426)
>     at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:286)
>     at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>     at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>     at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>     at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>     at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
> {code}
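For context, the ZooKeeper-backed HA mode described in this report is driven by a handful of keys in flink-conf.yaml. A minimal sketch of the Flink 1.2-era settings; hostnames, ports, and paths below are placeholders, not values from the attached files:

```yaml
# ZooKeeper HA settings for Flink 1.2 (placeholder hosts/paths)
high-availability: zookeeper
high-availability.zookeeper.quorum: jm1:2181,jm2:2181,jm3:2181
high-availability.zookeeper.path.root: /flink
# shared storage all JobManagers can reach, used for JobGraphs
# and checkpoint metadata during recovery
high-availability.zookeeper.storageDir: hdfs:///flink/recovery
```

With HA enabled, TaskManagers are expected to look up the current leader in ZooKeeper rather than use the static jobmanager.rpc.address, and the masters file lists all JobManager candidates; that is exactly the failover behavior the reporter expects and does not observe.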
[jira] [Commented] (FLINK-6063) HA Configuration doesn't work with Flink 1.2
[ https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928047#comment-15928047 ] Razvan commented on FLINK-6063: ---

Sure, here:

{code}
2017-03-16 09:58:17,319 INFO org.apache.zookeeper.server.NIOServerCnxnFactory - Accepted socket connection from /1.2.3.5:53748
2017-03-16 09:58:17,320 INFO org.apache.zookeeper.server.ZooKeeperServer - Client attempting to establish new session at /1.2.3.5:53748
2017-03-16 09:58:17,322 INFO org.apache.zookeeper.server.ZooKeeperServer - Established session 0x15ad68d898c0006 with negotiated timeout 4 for client /1.2.3.5:53748
2017-03-16 09:58:18,336 INFO org.apache.zookeeper.server.NIOServerCnxn - Closed socket connection for client /1.2.3.5:53748 which had sessionid 0x15ad68d898c0006
2017-03-16 10:10:23,881 WARN org.apache.zookeeper.server.NIOServerCnxn - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15ad68d898c0001, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2017-03-16 10:10:23,885 INFO org.apache.zookeeper.server.NIOServerCnxn - Closed socket connection for client /1.2.3.4:45752 which had sessionid 0x15ad68d898c0001
2017-03-16 10:10:23,885 WARN org.apache.zookeeper.server.NIOServerCnxn - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15ad68d898c0002, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2017-03-16 10:10:23,887 INFO org.apache.zookeeper.server.NIOServerCnxn - Closed socket connection for client /1.2.3.4:45754 which had sessionid 0x15ad68d898c0002
2017-03-16 13:11:21,841 INFO org.apache.zookeeper.server.NIOServerCnxnFactory - Accepted socket connection from /[Client 2 IP here]:57692
2017-03-16 13:11:34,787 INFO org.apache.zookeeper.server.ZooKeeperServer - Client attempting to renew session 0x35ad68d8b4d0005 at /[Client 2 IP here]:57692
2017-03-16 13:11:34,787 INFO org.apache.zookeeper.server.quorum.Learner - Revalidating client: 0x35ad68d8b4d0005
2017-03-16 13:11:34,788 INFO org.apache.zookeeper.server.ZooKeeperServer - Invalid session 0x35ad68d8b4d0005 for client /[Client 2 IP here]:57692, probably expired
2017-03-16 13:11:34,789 INFO org.apache.zookeeper.server.NIOServerCnxn - Closed socket connection for client /[Client 2 IP here]:57692 which had sessionid 0x35ad68d8b4d0005
{code}
[jira] [Comment Edited] (FLINK-6063) HA Configuration doesn't work with Flink 1.2
[ https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927872#comment-15927872 ] Razvan edited comment on FLINK-6063 at 3/16/17 11:42 AM: -

Below is the code for the job (main class):

{code:java}
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

import com.test.event.reader.SimpleEventReader;

public class StreamingAccuracyJob {

    private final static String JOB_NAME = "Prototype CEP";

    final static StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    private static void configureEnvironment() {
        // start a checkpoint every 1000 ms
        env.enableCheckpointing(1000);

        // advanced options:
        // set mode to exactly-once (this is the default)
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        // make sure 500 ms of progress happen between checkpoints
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5000);
        // checkpoints have to complete within one minute, or are discarded
        env.getCheckpointConfig().setCheckpointTimeout(60000);
        // allow only one checkpoint to be in progress at the same time
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);

        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
                50,                           // number of restart attempts
                Time.of(10, TimeUnit.SECONDS) // delay
        ));

        env.setParallelism(1);
        // env.setMaxParallelism(1); // https://issues.apache.org/jira/browse/FLINK-5773

        CheckpointConfig config = env.getCheckpointConfig();
        config.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION);
    }

    public static void main(String[] args) throws Exception {
        DataStream stream = env.addSource(new FlinkKafkaConsumer010<>(KafkaConfiguration.TOPIC_NAME,
                new SimpleStringSchema(), KafkaConfiguration.getConnectionProperties()));

        configureEnvironment();

        DataStream streamTuples = stream.flatMap(new JsonToTupleFlatMap());
        streamTuples.keyBy(SimpleEventReader.FIELD_USERID_TUPLE_POSITION).flatMap(new AccuracyTestEvents());

        env.execute(JOB_NAME);
    }
}
{code}
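As a side note on the restart strategy in the snippet above: fixedDelayRestart(50, Time.of(10, TimeUnit.SECONDS)) means the job keeps retrying for at least 50 × 10 s before it gives up, which matters when judging whether failover "did not happen" or simply had not happened yet. A plain-Java sanity check of that lower bound, with the two values copied from the job code and no Flink dependencies:

```java
public class RestartBudget {
    // values from the job's RestartStrategies.fixedDelayRestart(...) call
    static final int RESTART_ATTEMPTS = 50;
    static final long DELAY_SECONDS = 10;

    // lower bound on how long the job keeps retrying before giving up:
    // the inter-attempt delays alone, ignoring how long each attempt runs
    static long minRetryWindowSeconds() {
        return RESTART_ATTEMPTS * DELAY_SECONDS;
    }

    public static void main(String[] args) {
        System.out.println(minRetryWindowSeconds() + " seconds"); // prints "500 seconds"
    }
}
```

So with this configuration the TaskManagers should still be retrying more than eight minutes after the leader is killed, consistent with the ~10:10 kill and the retries observed until ~10:14 ending well short of that budget.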
[jira] [Commented] (FLINK-6063) HA Configuration doesn't work with Flink 1.2
[ https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927844#comment-15927844 ] Razvan commented on FLINK-6063: ---

Hi Dawid, sorry, but it is not fine. As you can see from the logs, the job is not resumed, which was the whole purpose of HA, and the TaskManagers connecting to the old master show that they did not acknowledge the new leader.
[jira] [Comment Edited] (FLINK-6063) HA Configuration doesn't work with Flink 1.2
[ https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927786#comment-15927786 ] Razvan edited comment on FLINK-6063 at 3/16/17 11:42 AM: -

Hi Till, thanks for replying; sure, I can attach the logs you mentioned. Started ZooKeeper, then the cluster at ~09:58; killed the leader JobManager at ~10:10; the TaskManagers stop retrying at ~10:14.

Job log:

{code}
Cluster configuration: Standalone cluster with JobManager at /1.2.3.4:45164
Using address 1.2.3.4:45164 to connect to JobManager.
JobManager web interface address http://1.2.3.4:8081
Starting execution of program
Submitting job with JobID: a7c96ad4345c1f07fe666bc5fd78256f. Waiting for job completion.
Connected to JobManager at Actor[akka.tcp://flink@1.2.3.4:45164/user/jobmanager#-1418996734]
03/16/2017 09:58:23  Job execution switched to status RUNNING.
03/16/2017 09:58:23  Source: Custom Source -> Flat Map(1/1) switched to SCHEDULED
03/16/2017 09:58:23  Source: Custom Source -> Flat Map(1/1) switched to DEPLOYING
03/16/2017 09:58:23  Flat Map(1/1) switched to SCHEDULED
03/16/2017 09:58:23  Flat Map(1/1) switched to DEPLOYING
03/16/2017 09:58:23  Flat Map(1/1) switched to RUNNING
03/16/2017 09:58:23  Source: Custom Source -> Flat Map(1/1) switched to RUNNING
New JobManager elected. Connecting to null
Connected to JobManager at Actor[akka.tcp://flink@1.2.3.5:34987/user/jobmanager#-27372488]
{code}

Killed JobManager log:

{code}
2017-03-16 09:58:14,953 INFO org.apache.zookeeper.server.NIOServerCnxnFactory - Accepted socket connection from /[Client 1 IP here]:40858
2017-03-16 09:58:14,953 INFO org.apache.zookeeper.server.ZooKeeperServer - Client attempting to establish new session at /[Client 1 IP here]:40858
2017-03-16 09:58:14,957 INFO org.apache.zookeeper.server.ZooKeeperServer - Established session 0x35ad68d8b4d0004 with negotiated timeout 4 for client /[Client 1 IP here]:40858
2017-03-16 09:58:15,523 INFO org.apache.zookeeper.server.NIOServerCnxnFactory - Accepted socket connection from /[Client 2 IP here]:40276
2017-03-16 09:58:15,528 INFO org.apache.zookeeper.server.ZooKeeperServer - Client attempting to establish new session at /[Client 2 IP here]:40276
2017-03-16 09:58:15,531 INFO org.apache.zookeeper.server.ZooKeeperServer - Established session 0x35ad68d8b4d0005 with negotiated timeout 4 for client /[Client 2 IP here]:40276
2017-03-16 10:10:25,118 WARN org.apache.zookeeper.server.NIOServerCnxn - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x35ad68d8b4d0002, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2017-03-16 10:10:25,120 INFO org.apache.zookeeper.server.NIOServerCnxn - Closed socket connection for client /1.2.3.4:47872 which had sessionid 0x35ad68d8b4d0002
{code}

New Leader log:

{code}
2017-03-16 09:58:17,319 INFO org.apache.zookeeper.server.NIOServerCnxnFactory - Accepted socket connection from /1.2.3.5:53748
2017-03-16 09:58:17,320 INFO org.apache.zookeeper.server.ZooKeeperServer - Client attempting to establish new session at /1.2.3.5:53748
2017-03-16 09:58:17,322 INFO org.apache.zookeeper.server.ZooKeeperServer - Established session 0x15ad68d898c0006 with negotiated timeout 4 for client /1.2.3.5:53748
2017-03-16 09:58:18,336 INFO org.apache.zookeeper.server.NIOServerCnxn - Closed socket connection for client /1.2.3.5:53748 which had sessionid 0x15ad68d898c0006
2017-03-16 10:10:23,881 WARN org.apache.zookeeper.server.NIOServerCnxn - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15ad68d898c0001, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2017-03-16 10:10:23,885 INFO org.apache.zookeeper.server.NIOServerCnxn - Closed socket connection for client /1.2.3.4:45752 which had sessionid 0x15ad68d898c0001
2017-03-16 10:10:23,885 WARN org.apache.zookeeper.server.NIOServerCnxn - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15ad68d898c0002, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
{code}
[jira] [Commented] (FLINK-6063) HA Configuration doesn't work with Flink 1.2
[ https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927786#comment-15927786 ] Razvan commented on FLINK-6063: ---

Hi Till, thanks for replying. Sure, I can attach the logs you mentioned.

Cluster configuration:
{code}
Standalone cluster with JobManager at /1.2.3.4:44307
Using address 1.2.3.4:44307 to connect to JobManager.
JobManager web interface address http://1.2.3.4:8081
Starting execution of program
Submitting job with JobID: 2c64b1126f327261b0c43f33f3cf43ee. Waiting for job completion.
Connected to JobManager at Actor[akka.tcp://flink@1.2.3.4:44307/user/jobmanager#2001981191]
03/16/2017 09:40:10 Job execution switched to status RUNNING.
03/16/2017 09:40:10 Source: Custom Source -> Flat Map(1/1) switched to SCHEDULED
03/16/2017 09:40:10 Source: Custom Source -> Flat Map(1/1) switched to DEPLOYING
03/16/2017 09:40:10 Flat Map(1/1) switched to SCHEDULED
03/16/2017 09:40:10 Flat Map(1/1) switched to DEPLOYING
03/16/2017 09:40:10 Flat Map(1/1) switched to RUNNING
03/16/2017 09:40:10 Source: Custom Source -> Flat Map(1/1) switched to RUNNING
New JobManager elected.
{code}
Connecting to null
Connected to JobManager at Actor[akka.tcp://flink@1.2.3.5:43828/user/jobmanager#1400235434]

Killed JobManager:
{code}
2017-03-16 09:58:14,953 INFO org.apache.zookeeper.server.NIOServerCnxnFactory - Accepted socket connection from /[Client 1 IP here]:40858
2017-03-16 09:58:14,953 INFO org.apache.zookeeper.server.ZooKeeperServer - Client attempting to establish new session at /[Client 1 IP here]:40858
2017-03-16 09:58:14,957 INFO org.apache.zookeeper.server.ZooKeeperServer - Established session 0x35ad68d8b4d0004 with negotiated timeout 4 for client /[Client 1 IP here]:40858
2017-03-16 09:58:15,523 INFO org.apache.zookeeper.server.NIOServerCnxnFactory - Accepted socket connection from /[Client 2 IP here]:40276
2017-03-16 09:58:15,528 INFO org.apache.zookeeper.server.ZooKeeperServer - Client attempting to establish new session at /[Client 2 IP here]:40276
2017-03-16 09:58:15,531 INFO org.apache.zookeeper.server.ZooKeeperServer - Established session 0x35ad68d8b4d0005 with negotiated timeout 4 for client /[Client 2 IP here]:40276
2017-03-16 10:10:25,118 WARN org.apache.zookeeper.server.NIOServerCnxn - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x35ad68d8b4d0002, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2017-03-16 10:10:25,120 INFO org.apache.zookeeper.server.NIOServerCnxn - Closed socket connection for client /1.2.3.4:47872 which had sessionid 0x35ad68d8b4d0002
{code}

New Leader:
{code}
2017-03-16 09:58:17,319 INFO org.apache.zookeeper.server.NIOServerCnxnFactory - Accepted socket connection from /1.2.3.5:53748
2017-03-16 09:58:17,320 INFO org.apache.zookeeper.server.ZooKeeperServer - Client attempting to establish new session at /1.2.3.5:53748
2017-03-16 09:58:17,322 INFO org.apache.zookeeper.server.ZooKeeperServer - Established session 0x15ad68d898c0006 with negotiated timeout 4 for client /1.2.3.5:53748
2017-03-16 09:58:18,336 INFO org.apache.zookeeper.server.NIOServerCnxn - Closed socket connection for client /1.2.3.5:53748 which had sessionid 0x15ad68d898c0006
2017-03-16 10:10:23,881 WARN org.apache.zookeeper.server.NIOServerCnxn - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15ad68d898c0001, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2017-03-16 10:10:23,885 INFO org.apache.zookeeper.server.NIOServerCnxn - Closed socket connection for client /1.2.3.4:45752 which had sessionid 0x15ad68d898c0001
2017-03-16 10:10:23,885 WARN org.apache.zookeeper.server.NIOServerCnxn - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15ad68d898c0002, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2017-03-16 10:10:23,887 INFO org.apache.zookeeper.server.NIOServerCnxn -
{code}
[jira] [Created] (FLINK-6063) HA Configuration doesn't work with Flink 1.2
Razvan created FLINK-6063:
------------------------------

             Summary: HA Configuration doesn't work with Flink 1.2
                 Key: FLINK-6063
                 URL: https://issues.apache.org/jira/browse/FLINK-6063
             Project: Flink
          Issue Type: Bug
          Components: JobManager
    Affects Versions: 1.2.0
            Reporter: Razvan
            Priority: Critical

I have a Flink 1.2 cluster made up of 3 JobManagers and 2 TaskManagers. I start the ZooKeeper quorum from JobManager1, get confirmation that ZooKeeper starts on the other two JobManagers, and then start a Flink job on JobManager1. The flink-conf.yaml is the same on all 5 VMs (as is everything else related to Flink, because I copied the folder across all VMs as suggested in the tutorials), which means jobmanager.rpc.address points to JobManager1 everywhere.

If I turn off the VM running JobManager1, I would expect ZooKeeper to elect one of the remaining JobManagers as leader and the TaskManagers to reconnect to it. Instead, a new leader is elected but the slaves keep connecting to the old master:

{code}
2017-03-15 10:28:28,655 INFO org.apache.flink.core.fs.FileSystem - Ensuring all FileSystem streams are closed for Async calls on Source: Custom Source -> Flat Map (1/1)
2017-03-15 10:28:38,534 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
2017-03-15 10:28:46,606 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused by: [Connection refused: /1.2.3.4:44779]
2017-03-15 10:28:52,431 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused by: [Connection refused: /1.2.3.4:44779]
2017-03-15 10:29:02,435 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused by: [Connection refused: /1.2.3.4:44779]
2017-03-15 10:29:10,489 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager akka://flink/user/taskmanager disconnects from JobManager akka.tcp://flink@1.2.3.4:44779/user/jobmanager: Old JobManager lost its leadership.
2017-03-15 10:29:10,490 INFO org.apache.flink.runtime.taskmanager.TaskManager - Cancelling all computations and discarding all cached data.
2017-03-15 10:29:10,491 INFO org.apache.flink.runtime.taskmanager.Task - Attempting to fail task externally Source: Custom Source -> Flat Map (1/1) (75fd495cc6acfd72fbe957e60e513223).
2017-03-15 10:29:10,491 INFO org.apache.flink.runtime.taskmanager.Task - Source: Custom Source -> Flat Map (1/1) (75fd495cc6acfd72fbe957e60e513223) switched from RUNNING to FAILED.
java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects from JobManager akka.tcp://flink@1.2.3.4:44779/user/jobmanager: Old JobManager lost its leadership.
    at org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:1074)
    at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleJobManagerLeaderAddress(TaskManager.scala:1426)
    at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:286)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
    at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
    at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
    at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:122)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at
{code}
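For context on the setup described in this report: in Flink 1.2's ZooKeeper-based HA mode, TaskManagers are expected to look up the current leader in ZooKeeper rather than rely on the static jobmanager.rpc.address. A minimal flink-conf.yaml sketch of the HA keys involved, as documented for Flink 1.2 (the quorum hosts and storage path below are placeholders, not values from this report):

{code}
# ZooKeeper-based high availability (Flink 1.2 key names)
high-availability: zookeeper
# Placeholder hosts: the ZooKeeper quorum, e.g. the three JobManager VMs
high-availability.zookeeper.quorum: jobmanager1:2181,jobmanager2:2181,jobmanager3:2181
high-availability.zookeeper.path.root: /flink
# Placeholder path: must be a storage location reachable by all nodes
high-availability.storageDir: hdfs:///flink/recovery
{code}

With these keys set consistently on every node, leader discovery goes through ZooKeeper; if TaskManagers keep contacting the old master after failover, a common cause is that these keys are missing or differ between nodes.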