[jira] [Comment Edited] (FLINK-10538) standalone-job.sh causes Classpath issues

2018-10-18 Thread Razvan (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655216#comment-16655216
 ] 

Razvan edited comment on FLINK-10538 at 10/18/18 1:55 PM:
--

[~till.rohrmann] Yes, we use AWS S3 for state. But can't the dependencies be shaded?


was (Author: razvan):
@Till Yes, we use AWS S3 for state. But can't the dependencies be shaded?
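
For reference, a minimal diagnostic sketch to check which jar actually provides the Netty classes at runtime (the class name is taken from the stack trace quoted below; the rest is plain JDK API):

{code:java}
import io.netty.handler.ssl.SslContext;

public class NettyOrigin {
    public static void main(String[] args) {
        // Prints the jar that served SslContext. If this shows
        // flink-shaded-hadoop2-uber-*.jar instead of the job's own Netty,
        // the dependency is leaking unshaded from Flink's lib/ classpath.
        System.out.println(SslContext.class
                .getProtectionDomain().getCodeSource().getLocation());
    }
}
{code}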

> standalone-job.sh causes Classpath issues
> -
>
> Key: FLINK-10538
> URL: https://issues.apache.org/jira/browse/FLINK-10538
> Project: Flink
>  Issue Type: Bug
>  Components: Docker, Job-Submission, Kubernetes
>Affects Versions: 1.6.0, 1.6.1, 1.6.2, 1.7.0
>Reporter: Razvan
>Priority: Major
> Fix For: 1.7.0
>
>
> When launching a job cluster through this script, it creates dependency 
> issues.
>  
> We have a job which uses AsyncHttpClient, which in turn uses the Netty library. 
> When building/running a Docker image for a Flink job cluster on Kubernetes, build.sh 
> ([https://github.com/apache/flink/blob/release-1.6/flink-container/docker/build.sh])
>  copies the given artifact to a file called "job.jar" in the lib/ folder 
> of the distribution inside the container.
> At runtime (standalone-job.sh) we get:
>  
> {code:java}
> 2018-10-11 13:44:10.057 [flink-akka.actor.default-dispatcher-15] INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph  - 
> StateProcessFunction -> ToCustomerRatingFlatMap -> async wait operator -> 
> Sink: CollectResultsSink (1/1) (f7fac66a85d41d4eac44ff609c515710) switched 
> from RUNNING to FAILED.
> java.lang.NoSuchMethodError: 
> io.netty.handler.ssl.SslContext.newClientContextInternal(Lio/netty/handler/ssl/SslProvider;Ljava/security/Provider;[Ljava/security/cert/X509Certificate;Ljavax/net/ssl/TrustManagerFactory;[Ljava/security/cert/X509Certificate;Ljava/security/PrivateKey;Ljava/lang/String;Ljavax/net/ssl/KeyManagerFactory;Ljava/lang/Iterable;Lio/netty/handler/ssl/CipherSuiteFilter;Lio/netty/handler/ssl/ApplicationProtocolConfig;[Ljava/lang/String;JJZ)Lio/netty/handler/ssl/SslContext;
> at io.netty.handler.ssl.SslContextBuilder.build(SslContextBuilder.java:452)
> at org.asynchttpclient.netty.ssl.DefaultSslEngineFactory.buildSslContext(DefaultSslEngineFactory.java:58)
> at org.asynchttpclient.netty.ssl.DefaultSslEngineFactory.init(DefaultSslEngineFactory.java:73)
> at org.asynchttpclient.netty.channel.ChannelManager.<init>(ChannelManager.java:100)
> at org.asynchttpclient.DefaultAsyncHttpClient.<init>(DefaultAsyncHttpClient.java:89)
> at org.asynchttpclient.Dsl.asyncHttpClient(Dsl.java:32)
> at com.test.events.common.asynchttp.AsyncHttpClientProvider.configureAsyncHttpClient(AsyncHttpClientProvider.java:128)
> at com.test.events.common.asynchttp.AsyncHttpClientProvider.<init>(AsyncHttpClientProvider.java:51)
> {code}
>   
> This happens because the JVM loads Apache Flink's Netty dependency first:
>  
> {code:java}
> [Loaded io.netty.handler.codec.http.HttpObject from 
> file:/opt/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar]
> [Loaded io.netty.handler.codec.http.HttpMessage from 
> file:/opt/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar]
> {code}
>  
> {code:java}
> 2018-10-12 11:48:20.434 [main] INFO 
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Classpath: 
> /opt/flink-1.6.1/lib/flink-python_2.11-1.6.1.jar:/opt/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar:/opt/flink-1.6.1/lib/job.jar:/opt/flink-1.6.1/lib/log4j-1.2.17.jar:/opt/flink-1.6.1/lib/logback-access.jar:/opt/flink-1.6.1/lib/logback-classic.jar:/opt/flink-1.6.1/lib/logback-core.jar:/opt/flink-1.6.1/lib/netty-buffer-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-codec-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-codec-socks-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-common-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-handler-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-handler-proxy-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-resolver-dns-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-transport-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-transport-native-epoll-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-transport-native-unix-common-4.1.30.Final.jar:/opt/flink-1.6.1/lib/slf4j-log4j12-1.7.7.jar:/opt/flink-1.6.1/lib/flink-dist_2.11-1.6.1.jar:::
> {code}
>  
> The workaround is to rename job.jar to, for example, 1JOB.jar so that it is loaded first:
>  
> {code:java}
> 2018-10-12 13:51:09.165 [main] INFO 
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Classpath: 
> 

[jira] [Commented] (FLINK-10538) standalone-job.sh causes Classpath issues

2018-10-18 Thread Razvan (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655216#comment-16655216
 ] 

Razvan commented on FLINK-10538:


@Till Yes, we use AWS S3 for state. But can't the dependencies be shaded?


[jira] [Updated] (FLINK-10538) standalone-job.sh causes Classpath issues

2018-10-12 Thread Razvan (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-10538:
---
Description: 
When launching a job cluster through this script, it creates dependency 
issues.

 

We have a job which uses AsyncHttpClient, which in turn uses the Netty library. When 
building/running a Docker image for a Flink job cluster on Kubernetes, build.sh 
([https://github.com/apache/flink/blob/release-1.6/flink-container/docker/build.sh])
 copies the given artifact to a file called "job.jar" in the lib/ folder of 
the distribution inside the container.

At runtime (standalone-job.sh) we get:

 
{code:java}
2018-10-11 13:44:10.057 [flink-akka.actor.default-dispatcher-15] INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph  - StateProcessFunction 
-> ToCustomerRatingFlatMap -> async wait operator -> Sink: CollectResultsSink 
(1/1) (f7fac66a85d41d4eac44ff609c515710) switched from RUNNING to FAILED.
java.lang.NoSuchMethodError: 
io.netty.handler.ssl.SslContext.newClientContextInternal(Lio/netty/handler/ssl/SslProvider;Ljava/security/Provider;[Ljava/security/cert/X509Certificate;Ljavax/net/ssl/TrustManagerFactory;[Ljava/security/cert/X509Certificate;Ljava/security/PrivateKey;Ljava/lang/String;Ljavax/net/ssl/KeyManagerFactory;Ljava/lang/Iterable;Lio/netty/handler/ssl/CipherSuiteFilter;Lio/netty/handler/ssl/ApplicationProtocolConfig;[Ljava/lang/String;JJZ)Lio/netty/handler/ssl/SslContext;
at io.netty.handler.ssl.SslContextBuilder.build(SslContextBuilder.java:452)
at org.asynchttpclient.netty.ssl.DefaultSslEngineFactory.buildSslContext(DefaultSslEngineFactory.java:58)
at org.asynchttpclient.netty.ssl.DefaultSslEngineFactory.init(DefaultSslEngineFactory.java:73)
at org.asynchttpclient.netty.channel.ChannelManager.<init>(ChannelManager.java:100)
at org.asynchttpclient.DefaultAsyncHttpClient.<init>(DefaultAsyncHttpClient.java:89)
at org.asynchttpclient.Dsl.asyncHttpClient(Dsl.java:32)
at com.test.events.common.asynchttp.AsyncHttpClientProvider.configureAsyncHttpClient(AsyncHttpClientProvider.java:128)
at com.test.events.common.asynchttp.AsyncHttpClientProvider.<init>(AsyncHttpClientProvider.java:51)

{code}
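
The missing signature is one that AsyncHttpClient expects from a recent Netty 4.1.x, which suggests an older Netty won class loading. A minimal sketch (method name taken from the error above) to list which overloads the resolved SslContext really has:

{code:java}
import java.lang.reflect.Method;
import io.netty.handler.ssl.SslContext;

public class NettyMethodCheck {
    public static void main(String[] args) {
        // List the newClientContextInternal overloads of whichever SslContext
        // won class loading; if the signature from the NoSuchMethodError is
        // absent, an older Netty shadowed the version the job was built with.
        for (Method m : SslContext.class.getDeclaredMethods()) {
            if (m.getName().equals("newClientContextInternal")) {
                System.out.println(m);
            }
        }
    }
}
{code}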
  

This happens because the JVM loads Apache Flink's Netty dependency first, as the class-loading log (from -verbose:class) shows:

 
{code:java}
[Loaded io.netty.handler.codec.http.HttpObject from 
file:/opt/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar]
[Loaded io.netty.handler.codec.http.HttpMessage from 
file:/opt/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar]
{code}
 
{code:java}
2018-10-12 11:48:20.434 [main] INFO 
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Classpath: 
/opt/flink-1.6.1/lib/flink-python_2.11-1.6.1.jar:/opt/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar:/opt/flink-1.6.1/lib/job.jar:/opt/flink-1.6.1/lib/log4j-1.2.17.jar:/opt/flink-1.6.1/lib/logback-access.jar:/opt/flink-1.6.1/lib/logback-classic.jar:/opt/flink-1.6.1/lib/logback-core.jar:/opt/flink-1.6.1/lib/netty-buffer-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-codec-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-codec-socks-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-common-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-handler-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-handler-proxy-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-resolver-dns-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-transport-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-transport-native-epoll-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-transport-native-unix-common-4.1.30.Final.jar:/opt/flink-1.6.1/lib/slf4j-log4j12-1.7.7.jar:/opt/flink-1.6.1/lib/flink-dist_2.11-1.6.1.jar:::
{code}
 

The workaround is to rename job.jar to, for example, 1JOB.jar so that it is loaded first:

 
{code:java}
2018-10-12 13:51:09.165 [main] INFO 
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Classpath: 
/Users/users/projects/flink/flink-1.6.1/lib/1JOB.jar:/Users/users/projects/flink/flink-1.6.1/lib/flink-python_2.11-1.6.1.jar:/Users/users/projects/flink/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar:/Users/users/projects/flink/flink-1.6.1/lib/log4j-1.2.17.jar:/Users/users/projects/flink/flink-1.6.1/lib/logback-access.jar:/Users/users/projects/flink/flink-1.6.1/lib/logback-classic.jar:/Users/users/projects/flink/flink-1.6.1/lib/logback-core.jar:/Users/users/projects/flink/flink-1.6.1/lib/slf4j-log4j12-1.7.7.jar:/Users/users/projects/flink/flink-1.6.1/lib/flink-dist_2.11-1.6.1.jar:::
 
{code}
 

 
{code:java}
[Loaded io.netty.handler.codec.http.HttpObject from 
file:/Users/users/projects/flink/flink-1.6.1/lib/1JOB.jar]
[Loaded io.netty.handler.codec.http.HttpMessage from 
file:/Users/users/projects/flink/flink-1.6.1/lib/1JOB.jar] 
{code}
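
The rename presumably works because the startup scripts assemble the lib/ jars in lexical order, where ASCII digits sort before letters; a tiny sketch of that assumed ordering (jar names taken from the classpaths above):

{code:java}
import java.util.Arrays;

public class LibOrder {
    public static void main(String[] args) {
        // Lexical ordering, as a sorted directory listing of lib/ would give:
        // a digit-prefixed name jumps ahead of both flink-*.jar and job.jar.
        String[] jars = {"job.jar", "flink-dist_2.11-1.6.1.jar", "1JOB.jar"};
        Arrays.sort(jars);
        System.out.println(Arrays.toString(jars));
        // -> [1JOB.jar, flink-dist_2.11-1.6.1.jar, job.jar]
    }
}
{code}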
 

This needs to be fixed properly, because with the workaround in place the job's 
libraries are loaded first, which could cause Flink itself to crash or behave in 
unexpected ways.

 

 

  was:
When launching a job with the cluster through this script it creates dependency 

[jira] [Updated] (FLINK-10538) standalone-job.sh causes Classpath issues

2018-10-12 Thread Razvan (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-10538:
---
Affects Version/s: 1.7.0


[jira] [Updated] (FLINK-10538) standalone-job.sh causes Classpath issues

2018-10-12 Thread Razvan (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-10538:
---
Priority: Blocker  (was: Critical)


[jira] [Created] (FLINK-10538) standalone-job.sh causes Classpath issues

2018-10-12 Thread Razvan (JIRA)
Razvan created FLINK-10538:
--

 Summary: standalone-job.sh causes Classpath issues
 Key: FLINK-10538
 URL: https://issues.apache.org/jira/browse/FLINK-10538
 Project: Flink
  Issue Type: Bug
  Components: Docker, Job-Submission, Kubernetes
Affects Versions: 1.6.1, 1.6.0, 1.6.2
Reporter: Razvan



[jira] [Comment Edited] (FLINK-9540) Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 and fails

2018-06-08 Thread Razvan (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506323#comment-16506323
 ] 

Razvan edited comment on FLINK-9540 at 6/8/18 5:51 PM:
---

Hi [~aljoscha],

Thanks for the hint. I managed to get 1.4.2 working with S3A by copying the 
necessary libraries to lib/, for Hadoop 2.8 only (but it's not straightforward, 
and in my opinion it's not correctly documented). My goal here is also to clean 
up the documentation a bit, so I want *to define exactly what is needed for 
each version of Apache Flink (1.4.* - 1.6.*) to get it working with S3/A/N for 
each of the Hadoop versions (2.4 - 2.8).* I believe this will increase the 
number of users and contributors of the framework. So if you or anyone else 
could help me define the dependencies, I can make a PR for the documentation.


was (Author: razvan):
Hi [~aljoscha],

Thanks for the hint. I managed to get 1.4.2 working with S3A by copying the 
necessary libraries to lib/ (but it's not straightforward, and in my opinion 
it's not correctly documented). My goal here is also to clean up the 
documentation a bit, so I want *to define exactly what is needed for each 
version of Apache Flink (1.4.* - 1.6.*) to get it working with S3/A/N for 
each of the Hadoop versions (2.4 - 2.8).* I believe this will increase the 
number of users and contributors of the framework. So if you or anyone else 
could help me define the dependencies, I can make a PR for the documentation.

> Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 
> and fails
> -
>
> Key: FLINK-9540
> URL: https://issues.apache.org/jira/browse/FLINK-9540
> Project: Flink
>  Issue Type: Bug
>Reporter: Razvan
>Priority: Blocker
>
> Download Apache Flink 1.4.2 for Hadoop 2.7 or earlier. In opt/, inside 
> flink-s3-fs-hadoop-1.4.2.jar, open common-version-info.properties and you 
> get:
>  
> #
>  # Licensed to the Apache Software Foundation (ASF) under one
>  # or more contributor license agreements. See the NOTICE file
>  # distributed with this work for additional information
>  # regarding copyright ownership. The ASF licenses this file
>  # to you under the Apache License, Version 2.0 (the
>  # "License"); you may not use this file except in compliance
>  # with the License. You may obtain a copy of the License at
>  #
>  # [http://www.apache.org/licenses/LICENSE-2.0]
>  #
>  # Unless required by applicable law or agreed to in writing, software
>  # distributed under the License is distributed on an "AS IS" BASIS,
>  # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  # See the License for the specific language governing permissions and
>  # limitations under the License.
>  #
> *version=2.8.1*
>  revision=1e6296df38f9cd3d9581c8af58a2a03a6e4312be
>  *branch=branch-2.8.1-private*
>  user=vinodkv
>  date=2017-06-07T21:22Z
>  *url=[https://git-wip-us.apache.org/repos/asf/hadoop.git]*
>  srcChecksum=60125541c2b3e266cbf3becc5bda666
>  protocVersion=2.5.0
>  
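> A minimal sketch for reading the bundled version at runtime instead of 
> unzipping the jar (the resource name follows Hadoop's VersionInfo convention 
> and is an assumption here; it may be relocated inside the shaded jar):
> {code:java}
> import java.io.InputStream;
> import java.util.Properties;
> 
> public class BundledHadoopVersion {
>     public static void main(String[] args) throws Exception {
>         Properties p = new Properties();
>         // Assumed resource name; prints the Hadoop version actually packaged.
>         try (InputStream in = BundledHadoopVersion.class.getClassLoader()
>                 .getResourceAsStream("common-version-info.properties")) {
>             if (in == null) {
>                 System.out.println("version properties not found on classpath");
>                 return;
>             }
>             p.load(in);
>         }
>         System.out.println(p.getProperty("version")); // 2.8.1 for this build
>     }
> }
> {code}
>  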
> This will fail to work with the dependencies described in 
> [https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/deployment/aws.html]:
>  
> "Depending on which file system you use, please add the following 
> dependencies. You can find these as part of the Hadoop binaries in 
> {{hadoop-2.7/share/hadoop/tools/lib}}:
>  * {{S3AFileSystem}}:
>  ** {{hadoop-aws-2.7.3.jar}}
>  ** {{aws-java-sdk-s3-1.11.183.jar}} and its dependencies:
>  *** {{aws-java-sdk-core-1.11.183.jar}}
>  *** {{aws-java-sdk-kms-1.11.183.jar}}
>  *** {{jackson-annotations-2.6.7.jar}}
>  *** {{jackson-core-2.6.7.jar}}
>  *** {{jackson-databind-2.6.7.jar}}
>  *** {{joda-time-2.8.1.jar}}
>  *** {{httpcore-4.4.4.jar}}
>  *** {{httpclient-4.5.3.jar}}
>  * {{NativeS3FileSystem}}:
>  ** {{hadoop-aws-2.7.3.jar}}
>  ** {{guava-11.0.2.jar}}
>  
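> A minimal sketch to verify the copied jars are sufficient (the bucket name is 
> a placeholder; while anything is still missing this throws 
> UnsupportedFileSystemSchemeException, as in the stack trace further down):
> {code:java}
> import java.net.URI;
> import org.apache.flink.core.fs.FileSystem;
> 
> public class S3ASchemeCheck {
>     public static void main(String[] args) throws Exception {
>         // Resolving the scheme is enough to surface missing classes.
>         FileSystem.get(new URI("s3a://some-bucket/"));
>         System.out.println("s3a filesystem resolved");
>     }
> }
> {code}
>  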
> I presume the build task is flawed; it should be built for Apache Hadoop 
> 2.7. Can someone please check it?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



[jira] [Commented] (FLINK-9540) Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 and fails

2018-06-08 Thread Razvan (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505955#comment-16505955
 ] 

Razvan commented on FLINK-9540:
---

Hi,

This is the first exception:

2018-05-18 17:37:20.986 [main] WARN org.apache.flink.configuration.Configuration - Config uses deprecated configuration key 'high-availability.zookeeper.storageDir' instead of proper key 'high-availability.storageDir'
2018-05-18 17:37:21.239 [main] ERROR org.apache.flink.runtime.taskmanager.TaskManager - Failed to run TaskManager.
java.io.IOException: Could not create FileSystem for highly available storage (high-availability.storageDir)
 at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:119)
 at org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:92)
 at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:103)
 at org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala:1681)
 at org.apache.flink.runtime.taskmanager.TaskManager$$anon$2.call(TaskManager.scala:1592)
 at org.apache.flink.runtime.taskmanager.TaskManager$$anon$2.call(TaskManager.scala:1590)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
 at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
 at org.apache.flink.runtime.taskmanager.TaskManager$.main(TaskManager.scala:1590)
 at org.apache.flink.runtime.taskmanager.TaskManager.main(TaskManager.scala)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3a'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
 at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:405)
 at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:320)
 at org.apache.flink.core.fs.Path.getFileSystem(Path.java:293)
 at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:116)
 ... 11 common frames omitted
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Cannot support file system for 's3a' via Hadoop, because Hadoop is not in the classpath, or some classes are missing from the classpath.
 at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:179)
 at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:401)
 ... 14 common frames omitted
Caused by: java.lang.NoClassDefFoundError: org/apache/http/HttpEntityEnclosingRequest
 at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
 at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:389)
 at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:371)
 at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:235)
 at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:159)
 ... 15 common frames omitted
Caused by: java.lang.ClassNotFoundException: org.apache.http.HttpEntityEnclosingRequest
 at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
 ... 20 common frames omitted

 

Then I started to add HttpClient and the other libs.
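
A minimal sketch for checking up front whether the class from the 
ClassNotFoundException above is on the classpath (it is provided by 
httpcore-*.jar, so this should succeed only once that jar is in lib/):

{code:java}
public class S3ADepsCheck {
    public static void main(String[] args) {
        // The exact class the stack trace above failed to load.
        try {
            Class.forName("org.apache.http.HttpEntityEnclosingRequest");
            System.out.println("httpcore is on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println("httpcore is missing from lib/");
        }
    }
}
{code}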


[jira] [Commented] (FLINK-9540) Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 and fails

2018-06-08 Thread Razvan (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505847#comment-16505847
 ] 

Razvan commented on FLINK-9540:
---

If you know of a way to use S3A without adding those dependencies, I'd be 
grateful to hear about it :) Maybe you can post a documentation link.

Because up to now, the only way it worked in previous versions was to copy the 
required libraries to lib/. One possibility is sketched below.
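
For what it's worth, a hedged sketch of the route that avoids hand-picking 
Hadoop jars: using the shaded S3 filesystem shipped in opt/ (the config key 
names are assumptions based on the shaded-filesystem documentation, not on 
this thread):

{code}
# Sketch: use the bundled shaded S3A filesystem instead of copying Hadoop jars.
cp opt/flink-s3-fs-hadoop-1.4.2.jar lib/
# flink-conf.yaml then only needs the credentials (key names assumed):
#   s3.access-key: <access-key>
#   s3.secret-key: <secret-key>
{code}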

> Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 
> and fails
> -
>
> Key: FLINK-9540
> URL: https://issues.apache.org/jira/browse/FLINK-9540
> Project: Flink
>  Issue Type: Bug
>Reporter: Razvan
>Priority: Blocker
>
> Download Apache Flink 1.4.2 for Hadoop 2.7 or earlier. Go to opt/, open 
> common-version-info.properties inside flink-s3-fs-hadoop-1.4.2.jar, and you 
> get:
>  
> #
>  # Licensed to the Apache Software Foundation (ASF) under one
>  # or more contributor license agreements. See the NOTICE file
>  # distributed with this work for additional information
>  # regarding copyright ownership. The ASF licenses this file
>  # to you under the Apache License, Version 2.0 (the
>  # "License"); you may not use this file except in compliance
>  # with the License. You may obtain a copy of the License at
>  #
>  # [http://www.apache.org/licenses/LICENSE-2.0]
>  #
>  # Unless required by applicable law or agreed to in writing, software
>  # distributed under the License is distributed on an "AS IS" BASIS,
>  # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  # See the License for the specific language governing permissions and
>  # limitations under the License.
>  #
> *version=2.8.1*
>  revision=1e6296df38f9cd3d9581c8af58a2a03a6e4312be
>  *branch=branch-2.8.1-private*
>  user=vinodkv
>  date=2017-06-07T21:22Z
>  *url=[https://git-wip-us.apache.org/repos/asf/hadoop.git]*
>  srcChecksum=60125541c2b3e266cbf3becc5bda666
>  protocVersion=2.5.0
>  
> This will fail to work with dependencies described in 
> [https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/deployment/aws.html]
>  
> "Depending on which file system you use, please add the following 
> dependencies. You can find these as part of the Hadoop binaries in 
> {{hadoop-2.7/share/hadoop/tools/lib}}:
>  * {{S3AFileSystem}}:
>  ** {{hadoop-aws-2.7.3.jar}}
>  ** {{aws-java-sdk-s3-1.11.183.jar}} and its dependencies:
>  *** {{aws-java-sdk-core-1.11.183.jar}}
>  *** {{aws-java-sdk-kms-1.11.183.jar}}
>  *** {{jackson-annotations-2.6.7.jar}}
>  *** {{jackson-core-2.6.7.jar}}
>  *** {{jackson-databind-2.6.7.jar}}
>  *** {{joda-time-2.8.1.jar}}
>  *** {{httpcore-4.4.4.jar}}
>  *** {{httpclient-4.5.3.jar}}
>  * {{NativeS3FileSystem}}:
>  ** {{hadoop-aws-2.7.3.jar}}
>  ** {{guava-11.0.2.jar}}"
>  
> I presume the build task is flawed; it should be built for Apache Hadoop 
> 2.7. Can someone please check it?
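
A quick way to check which Hadoop version such a shaded jar was built against 
(a sketch; the exact resource name inside the jar is an assumption):

{code}
# List any version-info resources bundled in the jar, then print the key lines.
unzip -l opt/flink-s3-fs-hadoop-1.4.2.jar | grep -i version-info
unzip -p opt/flink-s3-fs-hadoop-1.4.2.jar common-version-info.properties \
  | grep -E '^(version|branch)='
# Per this report, the 1.4.2 "Hadoop 2.7" artifact prints version=2.8.1.
{code}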



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (FLINK-9540) Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 and fails

2018-06-08 Thread Razvan (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan reopened FLINK-9540:
---




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9540) Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 and fails

2018-06-07 Thread Razvan (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504723#comment-16504723
 ] 

Razvan commented on FLINK-9540:
---

Hi [~aljoscha],

  Not sure I understand the question :) What I'm suggesting is that the 
flink-s3-fs-hadoop-1.4.2.jar delivered as built for Hadoop 2.7 is actually 
built against Hadoop 2.8, and the Apache Flink cluster won't start, failing 
with the error mentioned above. These jars require a lot of other jars (a list 
that is not accurate in the documentation, by the way) to enable Apache Flink 
to work with S3. One way to derive that list is sketched below.
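
One way to derive an accurate transitive list for a given hadoop-aws version, 
instead of relying on the documented list, is to let Maven resolve it (a 
sketch, assuming a throwaway pom.xml that declares only 
org.apache.hadoop:hadoop-aws as a dependency):

{code}
# Sketch: run from a directory holding the minimal pom.xml described above.
mvn dependency:tree                                          # inspect the full tree
mvn dependency:copy-dependencies -DoutputDirectory=s3a-deps  # collect the jars
cp s3a-deps/*.jar /opt/flink/lib/                            # then ship them to lib/
{code}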




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9540) Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 and fails

2018-06-06 Thread Razvan (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-9540:
--
Description: 
Download Apache Flink 1.4.2 for Hadoop 2.7 or earlier. Go to opt/, open 
common-version-info.properties inside flink-s3-fs-hadoop-1.4.2.jar, and you 
get:

 

#
 # Licensed to the Apache Software Foundation (ASF) under one
 # or more contributor license agreements. See the NOTICE file
 # distributed with this work for additional information
 # regarding copyright ownership. The ASF licenses this file
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License. You may obtain a copy of the License at
 #
 # [http://www.apache.org/licenses/LICENSE-2.0]
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #

*version=2.8.1*
 revision=1e6296df38f9cd3d9581c8af58a2a03a6e4312be
 *branch=branch-2.8.1-private*
 user=vinodkv
 date=2017-06-07T21:22Z
 *url=[https://git-wip-us.apache.org/repos/asf/hadoop.git]*
 srcChecksum=60125541c2b3e266cbf3becc5bda666
 protocVersion=2.5.0

 

This will fail to work with dependencies described in 
[https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/deployment/aws.html]

 

"Depending on which file system you use, please add the following dependencies. 
You can find these as part of the Hadoop binaries in 
{{hadoop-2.7/share/hadoop/tools/lib}}:
 * {{S3AFileSystem}}:
 ** {{hadoop-aws-2.7.3.jar}}
 ** {{aws-java-sdk-s3-1.11.183.jar}} and its dependencies:
 *** {{aws-java-sdk-core-1.11.183.jar}}
 *** {{aws-java-sdk-kms-1.11.183.jar}}
 *** {{jackson-annotations-2.6.7.jar}}
 *** {{jackson-core-2.6.7.jar}}
 *** {{jackson-databind-2.6.7.jar}}
 *** {{joda-time-2.8.1.jar}}
 *** {{httpcore-4.4.4.jar}}
 *** {{httpclient-4.5.3.jar}}
 * {{NativeS3FileSystem}}:
 ** {{hadoop-aws-2.7.3.jar}}
 ** {{guava-11.0.2.jar}}"

 

I presume the build task is flawed; it should be built for Apache Hadoop 
2.7. Can someone please check it?

  was:
Download Apache Flink 1.4.2 for Hadoop 2.7 or earlier. Go to opt/, open 
common-version-info.properties inside flink-s3-fs-hadoop-1.4.2.jar, and you 
get:

 

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

*version=2.8.1*
revision=1e6296df38f9cd3d9581c8af58a2a03a6e4312be
*branch=branch-2.8.1-private*
user=vinodkv
date=2017-06-07T21:22Z
*url=https://git-wip-us.apache.org/repos/asf/hadoop.git*
srcChecksum=60125541c2b3e266cbf3becc5bda666
protocVersion=2.5.0

 

This will fail to work with dependencies described in 
[https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/deployment/aws.html]

 

"Depending on which file system you use, please add the following dependencies. 
You can find these as part of the Hadoop binaries in 
{{hadoop-2.7/share/hadoop/tools/lib}}:
 * {{S3AFileSystem}}:
 ** {{hadoop-aws-2.7.3.jar}}
 ** {{aws-java-sdk-s3-1.11.183.jar}} and its dependencies:
 *** {{aws-java-sdk-core-1.11.183.jar}}
 *** {{aws-java-sdk-kms-1.11.183.jar}}
 *** {{jackson-annotations-2.6.7.jar}}
 *** {{jackson-core-2.6.7.jar}}
 *** {{jackson-databind-2.6.7.jar}}
 *** {{joda-time-2.8.1.jar}}
 *** {{httpcore-4.4.4.jar}}
 *** {{httpclient-4.5.3.jar}}
 * {{NativeS3FileSystem}}:
 ** {{hadoop-aws-2.7.3.jar}}
 ** {{guava-11.0.2.jar}}"

 

I presume the build task is flawed; it should be built for Apache Hadoop 
2.7.



[jira] [Updated] (FLINK-9540) Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for Hadoop 2.8 and fails

2018-06-06 Thread Razvan (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-9540:
--
Summary: Apache Flink 1.4.2 S3 Hadoop library for Hadoop 2.7 is built for 
Hadoop 2.8 and fails  (was: Apache Flink 1.4.2 S3 Hadooplibrary for Hadoop 2.7 
is built for Hadoop 2.8 and fails)




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9540) Apache Flink 1.4.2 S3 Hadooplibrary for Hadoop 2.7 is built for Hadoop 2.8 and fails

2018-06-06 Thread Razvan (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-9540:
--
Description: 
Download Apache Flink 1.4.2 for Hadoop 2.7 or earlier. Go to opt/, open 
common-version-info.properties inside flink-s3-fs-hadoop-1.4.2.jar, and you 
get:

 

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

*version=2.8.1*
revision=1e6296df38f9cd3d9581c8af58a2a03a6e4312be
*branch=branch-2.8.1-private*
user=vinodkv
date=2017-06-07T21:22Z
*url=https://git-wip-us.apache.org/repos/asf/hadoop.git*
srcChecksum=60125541c2b3e266cbf3becc5bda666
protocVersion=2.5.0

 

This will fail to work with dependencies described in 
[https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/deployment/aws.html]

 

"Depending on which file system you use, please add the following dependencies. 
You can find these as part of the Hadoop binaries in 
{{hadoop-2.7/share/hadoop/tools/lib}}:
 * {{S3AFileSystem}}:
 ** {{hadoop-aws-2.7.3.jar}}
 ** {{aws-java-sdk-s3-1.11.183.jar}} and its dependencies:
 *** {{aws-java-sdk-core-1.11.183.jar}}
 *** {{aws-java-sdk-kms-1.11.183.jar}}
 *** {{jackson-annotations-2.6.7.jar}}
 *** {{jackson-core-2.6.7.jar}}
 *** {{jackson-databind-2.6.7.jar}}
 *** {{joda-time-2.8.1.jar}}
 *** {{httpcore-4.4.4.jar}}
 *** {{httpclient-4.5.3.jar}}
 * {{NativeS3FileSystem}}:
 ** {{hadoop-aws-2.7.3.jar}}
 ** {{guava-11.0.2.jar}}"

 

I presume the build task is flawed; it should be built for Apache Hadoop 
2.7.

  was:
Download Apache Flink 1.4.2 for Hadoop 2.7 or earlier. Go to opt/, open 
common-version-info.properties inside flink-s3-fs-hadoop-1.4.2.jar, and you 
get:

 

 



[jira] [Updated] (FLINK-9540) Apache Flink 1.4.2 S3 Hadooplibrary for Hadoop 2.7 is built for Hadoop 2.8 and fails

2018-06-06 Thread Razvan (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-9540:
--
Description: 
Download Apache Flink 1.4.2 for Hadoop 2.7 or earlier. Go to opt/, open 
common-version-info.properties inside flink-s3-fs-hadoop-1.4.2.jar, and you 
get:

 

 

  was:++Download Apache Flink 1.4.2 for Hadoop 2.7 or earlier. 


> Apache Flink 1.4.2 S3 Hadooplibrary for Hadoop 2.7 is built for Hadoop 2.8 
> and fails
> 
>
> Key: FLINK-9540
> URL: https://issues.apache.org/jira/browse/FLINK-9540
> Project: Flink
>  Issue Type: Bug
>Reporter: Razvan
>Priority: Blocker
>
> Download Apache Flink 1.4.2 for Hadoop 2.7 or earlier. Go to opt/, open 
> common-version-info.properties inside flink-s3-fs-hadoop-1.4.2.jar, and you 
> get:
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9540) Apache Flink 1.4.2 S3 Hadooplibrary for Hadoop 2.7 is built for Hadoop 2.8 and fails

2018-06-06 Thread Razvan (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-9540:
--
Description: ++Download Apache Flink 1.4.2 for Hadoop 2.7 or earlier. 

> Apache Flink 1.4.2 S3 Hadooplibrary for Hadoop 2.7 is built for Hadoop 2.8 
> and fails
> 
>
> Key: FLINK-9540
> URL: https://issues.apache.org/jira/browse/FLINK-9540
> Project: Flink
>  Issue Type: Bug
>Reporter: Razvan
>Priority: Blocker
>
> ++Download Apache Flink 1.4.2 for Hadoop 2.7 or earlier. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9540) Apache Flink 1.4.2 S3 Hadooplibrary for Hadoop 2.7 is built for Hadoop 2.8 and fails

2018-06-06 Thread Razvan (JIRA)
Razvan created FLINK-9540:
-

 Summary: Apache Flink 1.4.2 S3 Hadooplibrary for Hadoop 2.7 is 
built for Hadoop 2.8 and fails
 Key: FLINK-9540
 URL: https://issues.apache.org/jira/browse/FLINK-9540
 Project: Flink
  Issue Type: Bug
Reporter: Razvan






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9441) Hadoop Required Dependency List Not Clear

2018-06-05 Thread Razvan (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-9441:
--
Description: 
To be able to use Apache Flink with S3, a few libraries from Hadoop 
distribution are required to be added to lib/ folder.

This list is partially documented in 
[https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html]
  (Provide S3 FileSystem Dependency). But it refers to Hadoop 2.7 whereas Flink 
supports 2.8 also.

How to compose the dependency list for Hadoop 2.8?

Is it possible to bundle the dependencies in a separate archive that users can 
download?

 

UPDATE:

I downloaded Apache Flink 1.4.2 for Hadoop 2.7, and it seems it was compiled 
against Hadoop 2.8. I get this error:

 

{{"java.lang.NumberFormatException: For input string: "100M" at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) 
at java.lang.Long.parseLong(Long.java:589) at 
java.lang.Long.parseLong(Long.java:631) at 
org.apache.hadoop.conf.Configuration.getLong(Configuration.java:1319) at 
org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:248) at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2811) at 
org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100) }}

{{..."}}

[https://stackoverflow.com/questions/48149929/hive-1-2-metastore-service-doesnt-start-after-configuring-it-to-s3-storage-inst?rq=1]

 

So I cannot start the cluster.
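
The "100M" failure pattern is consistent with Hadoop 2.8 defaults (which allow 
size suffixes such as 100M for fs.s3a.multipart.size) being read by Hadoop 2.7 
code, whose Configuration.getLong() only accepts plain digits. A hedged 
workaround sketch, assuming Flink picks up your core-site.xml (e.g. via 
fs.hdfs.hadoopconf), is to pin the key to a numeric byte value:

{code}
<!-- Sketch: add to the core-site.xml that Flink reads. 104857600 bytes = 100M,
     written as digits so Configuration.getLong() can parse it. -->
<property>
  <name>fs.s3a.multipart.size</name>
  <value>104857600</value>
</property>
{code}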

  was:
To be able to use Apache Flink with S3, a few libraries from Hadoop 
distribution are required to be added to lib/ folder.

This list is partially documented in 
[https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html]
  (Provide S3 FileSystem Dependency). But it refers to Hadoop 2.7 whereas Flink 
supports 2.8 also.

How to compose the dependency list for Hadoop 2.8?

Is it possible to bundle the dependencies in a separate archive that users can 
download?

 

UPDATE:





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9441) Hadoop Required Dependency List Not Clear

2018-06-05 Thread Razvan (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-9441:
--
Description: 
To be able to use Apache Flink with S3, a few libraries from Hadoop 
distribution are required to be added to lib/ folder.

This list is partially documented in 
[https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html]
  (Provide S3 FileSystem Dependency). But it refers to Hadoop 2.7 whereas Flink 
supports 2.8 also.

How to compose the dependency list for Hadoop 2.8?

Is it possible to bundle the dependencies in a separate archive that users can 
download?

 

UPDATE:

  was:
To be able to use Apache Flink with S3, a few libraries from Hadoop 
distribution are required to be added to lib/ folder.

This list is partially documented in 
[https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html]
  (Provide S3 FileSystem Dependency). But it refers to Hadoop 2.7 whereas Flink 
supports 2.8 also.

So I would like to know how to compose the dependency list for Hadoop 2.8 and 
update the documentation.

Also would it be possible to bundle the dependencies in a separate archive that 
users can download?





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9441) Hadoop Required Dependency List Not Clear

2018-06-05 Thread Razvan (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-9441:
--
Component/s: Build System

> Hadoop Required Dependency List Not Clear
> -
>
> Key: FLINK-9441
> URL: https://issues.apache.org/jira/browse/FLINK-9441
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, Configuration, Documentation
>Affects Versions: 1.5.0, 1.4.2, 1.6.0
>Reporter: Razvan
>Priority: Blocker
>
> To be able to use Apache Flink with S3, a few libraries from Hadoop 
> distribution are required to be added to lib/ folder.
> This list is partially documented in 
> [https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html]
>   (Provide S3 FileSystem Dependency). But it refers to Hadoop 2.7 whereas 
> Flink supports 2.8 also.
> So I would like to know how to compose the dependency list for Hadoop 2.8 and 
> update the documentation.
> Also would it be possible to bundle the dependencies in a separate archive 
> that users can download?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9441) Hadoop Required Dependency List Not Clear

2018-06-05 Thread Razvan (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-9441:
--
Component/s: Configuration




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9446) Compatibility table not up-to-date

2018-05-26 Thread Razvan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-9446:
--
Priority: Major  (was: Minor)

> Compatibility table not up-to-date
> --
>
> Key: FLINK-9446
> URL: https://issues.apache.org/jira/browse/FLINK-9446
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0, 1.4.2, 1.6.0
>Reporter: Razvan
>Priority: Major
>
> The compatibility table 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/upgrading.html has 
> not been updated since 1.3.x.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9446) Compatibility table not up-to-date

2018-05-26 Thread Razvan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-9446:
--
Description: The compatibility table 
https://ci.apache.org/projects/flink/flink-docs-master/ops/upgrading.html has 
not been updated since 1.3.x.

> Compatibility table not up-to-date
> --
>
> Key: FLINK-9446
> URL: https://issues.apache.org/jira/browse/FLINK-9446
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0, 1.4.2, 1.6.0
>Reporter: Razvan
>Priority: Minor
>
> The compatibility table 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/upgrading.html has 
> not been updated since 1.3.x.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9446) Compatibility table not up-to-date

2018-05-26 Thread Razvan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-9446:
--
Affects Version/s: 1.6.0
                   1.5.0
                   1.4.2




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9446) Compatibility table not up-to-date

2018-05-26 Thread Razvan (JIRA)
Razvan created FLINK-9446:
-

 Summary: Compatibility table not up-to-date
 Key: FLINK-9446
 URL: https://issues.apache.org/jira/browse/FLINK-9446
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Razvan






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9441) Hadoop Required Dependency List Not Clear

2018-05-26 Thread Razvan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-9441:
--
Affects Version/s: 1.6.0

> Hadoop Required Dependency List Not Clear
> -
>
> Key: FLINK-9441
> URL: https://issues.apache.org/jira/browse/FLINK-9441
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0, 1.4.2, 1.6.0
>Reporter: Razvan
>Priority: Blocker
>
> To be able to use Apache Flink with S3, a few libraries from Hadoop 
> distribution are required to be added to lib/ folder.
> This list is partially documented in 
> [https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html]
>   (Provide S3 FileSystem Dependency). But it refers to Hadoop 2.7 whereas 
> Flink supports 2.8 also.
> So I would like to know how to compose the dependency list for Hadoop 2.8 and 
> update the documentation.
> Also would it be possible to bundle the dependencies in a separate archive 
> that users can download?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9441) Hadoop Required Dependency List Not Clear

2018-05-26 Thread Razvan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-9441:
--
Description: 
To be able to use Apache Flink with S3, a few libraries from Hadoop 
distribution are required to be added to lib/ folder.

This list is partially documented in 
[https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html]
  (Provide S3 FileSystem Dependency). But it refers to Hadoop 2.7 whereas Flink 
supports 2.8 also.

So I would like to know how to compose the dependency list for Hadoop 2.8 and 
update the documentation.

Also would it be possible to bundle the dependencies in a separate archive that 
users can download?

  was:
To be able to use Apache Flink with S3, a few libraries from Hadoop 
distribution are required to be added to lib/ folder of the distribution.

This list is partially documented in 
[https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html]
  (Provide S3 FileSystem Dependency). But it refers to Hadoop 2.7 whereas Flink 
supports 2.8 also.

So I would like to know how to compose the dependency list for Hadoop 2.8 and 
update the documentation.

Also would it be possible to bundle the dependencies in a separate archive that 
users can download?





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9441) Hadoop Required Dependency List Not Clear

2018-05-26 Thread Razvan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-9441:
--
Description: 
To be able to use Apache Flink with S3, a few libraries from Hadoop 
distribution are required to be added to lib/ folder of the distribution.

This list is partially documented in 
[https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html]
  (Provide S3 FileSystem Dependency). But it refers to Hadoop 2.7 whereas Flink 
supports 2.8 also.

So I would like to know how to compose the dependency list for Hadoop 2.8 and 
update the documentation.

Also would it be possible to bundle the dependencies in a separate archive that 
users can download?

  was:
To be able to use Apache Flink with S3, a few libraries from Hadoop are 
required to be added to lib/ folder of the distribution.

This list is partially documented in 
[https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html]
  (Provide S3 FileSystem Dependency). But it refers to Hadoop 2.7 whereas Flink 
supports 2.8 also.

So I would like to know how to compose the dependency list for Hadoop 2.8 and 
update the documentation.

Also would it be possible to bundle the dependencies in a separate archive that 
users can download?





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9441) Hadoop Required Dependency List Not Clear

2018-05-26 Thread Razvan (JIRA)
Razvan created FLINK-9441:
-

 Summary: Hadoop Required Dependency List Not Clear
 Key: FLINK-9441
 URL: https://issues.apache.org/jira/browse/FLINK-9441
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.4.2, 1.5.0
Reporter: Razvan


To be able to use Apache Flink with S3, a few libraries from Hadoop are 
required to be added to lib/ folder of the distribution.

This list is partially documented in 
[https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html]
  (Provide S3 FileSystem Dependency). But it refers to Hadoop 2.7 whereas Flink 
supports 2.8 also.

So I would like to know how to compose the dependency list for Hadoop 2.8 and 
update the documentation.

Also would it be possible to bundle the dependencies in a separate archive that 
users can download?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-7803) Update savepoint Documentation

2017-10-11 Thread Razvan (JIRA)
Razvan created FLINK-7803:
-

 Summary: Update savepoint Documentation
 Key: FLINK-7803
 URL: https://issues.apache.org/jira/browse/FLINK-7803
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Razvan


Can you please update 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/savepoints.html
 to specify that the savepoint location *MUST* always be a location accessible 
by all hosts? A configuration sketch is below.
I spent quite some time believing it's a bug and trying to find solutions; see 
https://issues.apache.org/jira/browse/FLINK-7750. It's not obvious in the 
current documentation, and others might also waste time believing it's an 
actual issue.
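
For illustration, a hedged sketch of the relevant configuration (bucket and 
path are placeholders):

{code}
# flink-conf.yaml: a default savepoint target that every JobManager and
# TaskManager can reach, so _metadata and the state files land in one place.
state.savepoints.dir: s3://<bucket>/flink/savepoints
{code}

With a shared target like that, bin/flink savepoint <jobId> writes the whole 
savepoint under one path instead of scattering it across hosts.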



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7750) Strange behaviour in savepoints

2017-10-10 Thread Razvan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199262#comment-16199262
 ] 

Razvan commented on FLINK-7750:
---

[~aljoscha] Many thanks for clarifying this. Is there a way to update the 
documentation to specify that the savepoint location has to be accessible from 
all hosts, as it's not really clear here 
(https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/savepoints.html)?
 Is there a way I can update the documentation myself?

> Strange behaviour in savepoints
> ---
>
> Key: FLINK-7750
> URL: https://issues.apache.org/jira/browse/FLINK-7750
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.3.2
>Reporter: Razvan
>Priority: Blocker
>
> I recently upgraded from 1.2.0 and savepoint creation behaves strangely. 
> Whenever I try to create a savepoint with specified directory Apache Flink 
> creates a folder on the active JobManager (even if I trigger savepoint 
> creation from a different JobManager) which contains only _metadata. And 
> another folder on the TaskManager where the job is running which contains the 
> actual savepoint.
> Obviously if I try to restore it says it can't find the savepoint.
> This worked in 1.2.0.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7750) Strange behaviour in savepoints

2017-10-02 Thread Razvan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187987#comment-16187987
 ] 

Razvan commented on FLINK-7750:
---

[~aljoscha] Hi, it's on the local disk of the JobManager where I'm running the 
command from. But as I said, it will only save the _metadata in this location, 
and for the savepoint it will create the same folder location but on the 
TaskManager where the job is currently running. So I cannot use either of the 
locations to restore; I would need to copy all of this to one folder and use 
that. With 1.2.0 it didn't split it like this; everything could be found on the 
JobManager. (With a distributed FS like S3 it works fine.)




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (FLINK-7750) Strange behaviour in savepoints

2017-10-02 Thread Razvan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187987#comment-16187987
 ] 

Razvan edited comment on FLINK-7750 at 10/2/17 12:55 PM:
-

[~aljoscha] Hi, it's on the local disk of the JobManager where I'm running the 
command from. But as I said, it will only save the _metadata in this location, 
and for the savepoint it will create the same folder location but on the 
TaskManager where the job is currently running. So I cannot use either of the 
locations to restore; I would need to copy all of this to one folder and use 
that. With 1.2.0 it didn't split it like this; everything could be found on the 
JobManager. (With a distributed FS like S3 it works fine.)


was (Author: razvan):
[~aljoscha] Hi, it's on the local disk of the JobManager whre I'm running the 
command from. But as I told it will only save the _metadata in this location 
and for the savepoint it will create the same folder location but on the 
TaskManager where the job is currently running. So I cannot use any of the 
locations to restore, I would need to copy all this to one folder and use that.
With 1.2.0 it didn't split it like this, all could be found on the Jobmanager. 
(With distributed FS S3 works fine)




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (FLINK-7750) Strange behaviour in savepoints

2017-10-02 Thread Razvan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187875#comment-16187875
 ] 

Razvan edited comment on FLINK-7750 at 10/2/17 11:47 AM:
-

Sure, the application is reading messages from Kafka and then calls a service 
via HTTP. The state is made up of counters. I tried to use S3 (configured 
state.savepoints.dir: s3:///flink/savepoints) and this works fine.
Using RocksDB as the state backend and an S3 location.
But if I specify a folder location from the JobManager, it splits the files as 
I wrote in the description. Example commands are sketched below.
The job is dependent on Apache Flink version 1.2.0 (Maven dependency); should 
this be an issue?
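
As a hedged sketch (job id, bucket, and jar name are placeholders), the 
shared-location workflow would be:

{code}
# Trigger a savepoint to an explicitly shared target...
./bin/flink savepoint <jobId> s3://<bucket>/flink/savepoints
# ...and later restore from the reported savepoint path.
./bin/flink run -s s3://<bucket>/flink/savepoints/savepoint-<suffix> my-job.jar
{code}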


was (Author: razvan):
Sure, the application is reading messages from Kafka and then calls a service 
via HTTP. The state is made up of counters. I tried to use S3 (configured 
state.savepoints.dir: s3:///flink/savepoints) and this works fine.
But if I specify a folder location from the JobManager, it splits the files as 
I wrote in the description.
The job is dependent on Apache Flink version 1.2.0 (Maven dependency); should 
this be an issue?




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7750) Strange behaviour in savepoints

2017-10-02 Thread Razvan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187875#comment-16187875
 ] 

Razvan commented on FLINK-7750:
---

Sure, the application is reading messages from Kafka and then calls a service 
via HTTP. The state is made up of counters. I tried to use S3 (configured 
state.savepoints.dir: s3:///flink/savepoints) and this works fine.
But if I specify a folder location from the JobManager, it splits the files as 
I wrote in the description.
The job is dependent on Apache Flink version 1.2.0 (Maven dependency); should 
this be an issue?




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7750) Strange behaviour in savepoints

2017-10-02 Thread Razvan (JIRA)
Razvan created FLINK-7750:
-

 Summary: Strange behaviour in savepoints
 Key: FLINK-7750
 URL: https://issues.apache.org/jira/browse/FLINK-7750
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: 1.3.2
Reporter: Razvan
Priority: Blocker


I recently upgraded from 1.2.0 and savepoint creation behaves strangely. 
Whenever I try to create a savepoint with a specified directory, Apache Flink 
creates a folder on the active JobManager (even if I trigger savepoint creation 
from a different JobManager) which contains only _metadata, and another folder 
on the TaskManager where the job is running which contains the actual savepoint.
Obviously, if I try to restore, it says it can't find the savepoint.
This worked in 1.2.0.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-4587) Yet another java.lang.NoSuchFieldError: INSTANCE

2017-06-22 Thread Razvan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059508#comment-16059508
 ] 

Razvan commented on FLINK-4587:
---

[~RenkaiGe] I have the same issue. I'm using Flink 1.2 and created an 
application that calls some REST endpoints using HttpClient. Does this mean I 
need to build Flink myself? A sketch for diagnosing the clash is below.
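
A NoSuchFieldError: INSTANCE in the httpcore codecs usually means two 
different httpcore versions are visible on the classpath. A hedged sketch for 
finding which jars bundle the conflicting class (the class name is an 
assumption based on the usual INSTANCE holder, not on this trace):

{code}
# Sketch: list every jar in lib/ that contains the suspect httpcore class.
for j in lib/*.jar; do
  unzip -l "$j" 2>/dev/null | grep -q 'org/apache/http/message/BasicLineFormatter' \
    && echo "$j"
done
# Running the JVM with -verbose:class also logs which jar each class loads from.
{code}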

> Yet another java.lang.NoSuchFieldError: INSTANCE
> 
>
> Key: FLINK-4587
> URL: https://issues.apache.org/jira/browse/FLINK-4587
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, Local Runtime
>Affects Versions: 1.2.0
> Environment: Latest SNAPSHOT
>Reporter: Renkai Ge
> Attachments: diff in mvn clean package.png, flink-explore-src.zip
>
>
> For some reason I need to use org.apache.httpcomponents:httpasyncclient:4.1.2 
> in flink.
> The source file is:
> {code}
> import org.apache.flink.streaming.api.scala._
> import org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionFactory
> /**
>   * Created by renkai on 16/9/7.
>   */
> object Main {
>   def main(args: Array[String]): Unit = {
> val instance = ManagedNHttpClientConnectionFactory.INSTANCE
> println("instance = " + instance)
> val env = StreamExecutionEnvironment.getExecutionEnvironment
> val stream = env.fromCollection(1 to 100)
> val result = stream.map { x =>
>   x * 2
> }
> result.print()
> env.execute("xixi")
>   }
> }
> {code}
> and 
> {code}
> name := "flink-explore"
> version := "1.0"
> scalaVersion := "2.11.8"
> crossPaths := false
> libraryDependencies ++= Seq(
>   "org.apache.flink" %% "flink-scala" % "1.2-SNAPSHOT"
> exclude("com.google.code.findbugs", "jsr305"),
>   "org.apache.flink" %% "flink-connector-kafka-0.8" % "1.2-SNAPSHOT"
> exclude("com.google.code.findbugs", "jsr305"),
>   "org.apache.flink" %% "flink-streaming-scala" % "1.2-SNAPSHOT"
> exclude("com.google.code.findbugs", "jsr305"),
>   "org.apache.flink" %% "flink-clients" % "1.2-SNAPSHOT"
> exclude("com.google.code.findbugs", "jsr305"),
>   "org.apache.httpcomponents" % "httpasyncclient" % "4.1.2"
> )
> {code}
> I use `sbt assembly` to get a fat jar.
> If I run the command
> {code}
>  java -cp flink-explore-assembly-1.0.jar Main
> {code}
> I get the result 
> {code}
> instance = 
> org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionFactory@4909b8da
> log4j:WARN No appenders could be found for logger 
> (org.apache.flink.api.scala.ClosureCleaner$).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> Connected to JobManager at Actor[akka://flink/user/jobmanager_1#-1177584915]
> 09/07/2016 12:05:26   Job execution switched to status RUNNING.
> 09/07/2016 12:05:26   Source: Collection Source(1/1) switched to SCHEDULED
> 09/07/2016 12:05:26   Source: Collection Source(1/1) switched to DEPLOYING
> ...
> 09/07/2016 12:05:26   Map -> Sink: Unnamed(20/24) switched to RUNNING
> 09/07/2016 12:05:26   Map -> Sink: Unnamed(19/24) switched to RUNNING
> 15> 30
> 20> 184
> ...
> 19> 182
> 1> 194
> 8> 160
> 09/07/2016 12:05:26   Source: Collection Source(1/1) switched to FINISHED
> ...
> 09/07/2016 12:05:26   Map -> Sink: Unnamed(1/24) switched to FINISHED
> 09/07/2016 12:05:26   Job execution switched to status FINISHED.
> {code}
> Nothing special.
> But if I run the jar by
> {code}
> ./bin/flink run shop-monitor-flink-assembly-1.0.jar
> {code}
> I will get an error
> {code}
> $ ./bin/flink run flink-explore-assembly-1.0.jar
> Cluster configuration: Standalone cluster with JobManager at /127.0.0.1:6123
> Using address 127.0.0.1:6123 to connect to JobManager.
> JobManager web interface address http://127.0.0.1:8081
> Starting execution of program
> 
>  The program finished with the following exception:
> java.lang.NoSuchFieldError: INSTANCE
>   at org.apache.http.impl.nio.codecs.DefaultHttpRequestWriterFactory.<init>(DefaultHttpRequestWriterFactory.java:53)
>   at org.apache.http.impl.nio.codecs.DefaultHttpRequestWriterFactory.<init>(DefaultHttpRequestWriterFactory.java:57)
>   at org.apache.http.impl.nio.codecs.DefaultHttpRequestWriterFactory.<init>(DefaultHttpRequestWriterFactory.java:47)
>   at org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionFactory.<init>(ManagedNHttpClientConnectionFactory.java:75)
>   at org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionFactory.<init>(ManagedNHttpClientConnectionFactory.java:83)
>   at org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionFactory.<init>(ManagedNHttpClientConnectionFactory.java:64)
>   at Main$.main(Main.scala:9)
>   at Main.main(Main.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> 
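The NoSuchFieldError above is the classic symptom of two different versions of httpcore on the classpath, with an older copy pulled in via Flink's own dependencies loaded first. One way out, sketched here under the assumption that the build uses the sbt-assembly plugin (0.14+), is to relocate the HTTP packages inside the fat jar so they can no longer collide:

{code}
// build.sbt -- hedged sketch, not part of the original report:
// relocate the org.apache.http classes bundled into the fat jar so they
// resolve independently of whatever version the cluster classpath ships
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.apache.http.**" -> "shaded.org.apache.http.@1").inAll
)
{code}

After reassembling, the job code links against the shaded copy regardless of which unshaded version the cluster classpath loads first.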

[jira] [Closed] (FLINK-6063) HA Configuration doesn't work with Flink 1.2

2017-03-24 Thread Razvan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan closed FLINK-6063.
-
Resolution: Not A Problem

It's not an actual issue with the framework; it just isn't made clear enough that a 
distributed filesystem MUST be used for HA.
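
For reference, a minimal flink-conf.yaml sketch of what that means in practice, assuming a ZooKeeper quorum on the three masters and an HDFS path for the HA metadata (host names and paths below are placeholders, not the attached configuration):

{code}
# ZooKeeper-based HA, Flink 1.2 key names
high-availability: zookeeper
high-availability.zookeeper.quorum: master1:2181,master2:2181,master3:2181
# JobManager metadata is persisted here; it MUST be a distributed filesystem
# visible to all masters, or a newly elected leader cannot recover the jobs
high-availability.zookeeper.storageDir: hdfs:///flink/ha/
{code}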

> HA Configuration doesn't work with Flink 1.2
> 
>
> Key: FLINK-6063
> URL: https://issues.apache.org/jira/browse/FLINK-6063
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Affects Versions: 1.2.0
>Reporter: Razvan
>Priority: Critical
> Attachments: flink-conf.yaml, Logs.tar.gz, masters, slaves, zoo.cfg
>
>
>  I have a setup with flink 1.2 cluster, made up of 3 JobManagers and 2 
> TaskManagers. I start the Zookeeper Quorum from JobManager1, I get 
> confirmation Zookeeper starts on the other 2 JobManagers then I start a Flink 
> job on this JobManager1.   
>  
>  The flink-conf.yaml is the same on all 5 VMs (also everything else related 
> to flink because I copied the folder across all VMs as suggested in 
> tutorials) this means jobmanager.rpc.address: points to JobManager1 
> everywhere.
> If I turn off the VM running JobManager1 I would expect Zookeeper to say one 
> of the remaining JobManagers is the leader and the TaskManagers should 
> reconnect to it. Instead a new leader is elected but the slaves keep 
> connecting to the old master
> 2017-03-15 10:28:28,655 INFO  org.apache.flink.core.fs.FileSystem 
>   - Ensuring all FileSystem streams are closed for Async 
> calls on Source: Custom Source -> Flat Map (1/1)
> 2017-03-15 10:28:38,534 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Disassociated] 
> 2017-03-15 10:28:46,606 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:28:52,431 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:29:02,435 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:29:10,489 INFO  
> org.apache.flink.runtime.taskmanager.TaskManager  - TaskManager 
> akka://flink/user/taskmanager disconnects from JobManager 
> akka.tcp://flink@1.2.3.4:44779/user/jobmanager: Old JobManager lost its 
> leadership.
> 2017-03-15 10:29:10,490 INFO  
> org.apache.flink.runtime.taskmanager.TaskManager  - Cancelling 
> all computations and discarding all cached data.
> 2017-03-15 10:29:10,491 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Attempting to fail task externally Source: Custom Source 
> -> Flat Map (1/1) (75fd495cc6acfd72fbe957e60e513223).
> 2017-03-15 10:29:10,491 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Source: Custom Source -> Flat Map (1/1) 
> (75fd495cc6acfd72fbe957e60e513223) switched from RUNNING to FAILED.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager 
> disconnects from JobManager akka.tcp://flink@1.2.3.4:44779/user/jobmanager: 
> Old JobManager lost its leadership.
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:1074)
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleJobManagerLeaderAddress(TaskManager.scala:1426)
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:286)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>   at 

[jira] [Commented] (FLINK-6063) HA Configuration doesn't work with Flink 1.2

2017-03-24 Thread Razvan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940707#comment-15940707
 ] 

Razvan commented on FLINK-6063:
---

Hi, it works with DFS for me, though I'd suggest the documentation underline more 
clearly that it is required for HA. Thank you for the help!

> HA Configuration doesn't work with Flink 1.2
> 
>
> Key: FLINK-6063
> URL: https://issues.apache.org/jira/browse/FLINK-6063
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Affects Versions: 1.2.0
>Reporter: Razvan
>Priority: Critical
> Attachments: flink-conf.yaml, Logs.tar.gz, masters, slaves, zoo.cfg
>
>
>  I have a setup with flink 1.2 cluster, made up of 3 JobManagers and 2 
> TaskManagers. I start the Zookeeper Quorum from JobManager1, I get 
> confirmation Zookeeper starts on the other 2 JobManagers then I start a Flink 
> job on this JobManager1.   
>  
>  The flink-conf.yaml is the same on all 5 VMs (also everything else related 
> to flink because I copied the folder across all VMs as suggested in 
> tutorials) this means jobmanager.rpc.address: points to JobManager1 
> everywhere.
> If I turn off the VM running JobManager1 I would expect Zookeeper to say one 
> of the remaining JobManagers is the leader and the TaskManagers should 
> reconnect to it. Instead a new leader is elected but the slaves keep 
> connecting to the old master
> 2017-03-15 10:28:28,655 INFO  org.apache.flink.core.fs.FileSystem 
>   - Ensuring all FileSystem streams are closed for Async 
> calls on Source: Custom Source -> Flat Map (1/1)
> 2017-03-15 10:28:38,534 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Disassociated] 
> 2017-03-15 10:28:46,606 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:28:52,431 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:29:02,435 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:29:10,489 INFO  
> org.apache.flink.runtime.taskmanager.TaskManager  - TaskManager 
> akka://flink/user/taskmanager disconnects from JobManager 
> akka.tcp://flink@1.2.3.4:44779/user/jobmanager: Old JobManager lost its 
> leadership.
> 2017-03-15 10:29:10,490 INFO  
> org.apache.flink.runtime.taskmanager.TaskManager  - Cancelling 
> all computations and discarding all cached data.
> 2017-03-15 10:29:10,491 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Attempting to fail task externally Source: Custom Source 
> -> Flat Map (1/1) (75fd495cc6acfd72fbe957e60e513223).
> 2017-03-15 10:29:10,491 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Source: Custom Source -> Flat Map (1/1) 
> (75fd495cc6acfd72fbe957e60e513223) switched from RUNNING to FAILED.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager 
> disconnects from JobManager akka.tcp://flink@1.2.3.4:44779/user/jobmanager: 
> Old JobManager lost its leadership.
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:1074)
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleJobManagerLeaderAddress(TaskManager.scala:1426)
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:286)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>   at 

[jira] [Comment Edited] (FLINK-6063) HA Configuration doesn't work with Flink 1.2

2017-03-17 Thread Razvan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929644#comment-15929644
 ] 

Razvan edited comment on FLINK-6063 at 3/17/17 9:29 AM:


[~till.rohrmann] Hi, I just had a chat with [~dawidwys] and he pointed out that the 
issue might be that I'm using a local filesystem instead of a distributed FS. Could 
that be it? I've also uploaded the logs and configuration.


was (Author: razvan):
[~till.rohrmann] Hi, I just had a chat with [~dawidwys] and he pointed out that the 
issue might be that I'm using a local filesystem instead of a distributed FS. Could 
that be it?

> HA Configuration doesn't work with Flink 1.2
> 
>
> Key: FLINK-6063
> URL: https://issues.apache.org/jira/browse/FLINK-6063
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Affects Versions: 1.2.0
>Reporter: Razvan
>Priority: Critical
> Attachments: flink-conf.yaml, Logs.tar.gz, masters, slaves, zoo.cfg
>
>
>  I have a setup with flink 1.2 cluster, made up of 3 JobManagers and 2 
> TaskManagers. I start the Zookeeper Quorum from JobManager1, I get 
> confirmation Zookeeper starts on the other 2 JobManagers then I start a Flink 
> job on this JobManager1.   
>  
>  The flink-conf.yaml is the same on all 5 VMs (also everything else related 
> to flink because I copied the folder across all VMs as suggested in 
> tutorials) this means jobmanager.rpc.address: points to JobManager1 
> everywhere.
> If I turn off the VM running JobManager1 I would expect Zookeeper to say one 
> of the remaining JobManagers is the leader and the TaskManagers should 
> reconnect to it. Instead a new leader is elected but the slaves keep 
> connecting to the old master
> 2017-03-15 10:28:28,655 INFO  org.apache.flink.core.fs.FileSystem 
>   - Ensuring all FileSystem streams are closed for Async 
> calls on Source: Custom Source -> Flat Map (1/1)
> 2017-03-15 10:28:38,534 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Disassociated] 
> 2017-03-15 10:28:46,606 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:28:52,431 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:29:02,435 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:29:10,489 INFO  
> org.apache.flink.runtime.taskmanager.TaskManager  - TaskManager 
> akka://flink/user/taskmanager disconnects from JobManager 
> akka.tcp://flink@1.2.3.4:44779/user/jobmanager: Old JobManager lost its 
> leadership.
> 2017-03-15 10:29:10,490 INFO  
> org.apache.flink.runtime.taskmanager.TaskManager  - Cancelling 
> all computations and discarding all cached data.
> 2017-03-15 10:29:10,491 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Attempting to fail task externally Source: Custom Source 
> -> Flat Map (1/1) (75fd495cc6acfd72fbe957e60e513223).
> 2017-03-15 10:29:10,491 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Source: Custom Source -> Flat Map (1/1) 
> (75fd495cc6acfd72fbe957e60e513223) switched from RUNNING to FAILED.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager 
> disconnects from JobManager akka.tcp://flink@1.2.3.4:44779/user/jobmanager: 
> Old JobManager lost its leadership.
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:1074)
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleJobManagerLeaderAddress(TaskManager.scala:1426)
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:286)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> 

[jira] [Commented] (FLINK-6063) HA Configuration doesn't work with Flink 1.2

2017-03-17 Thread Razvan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929644#comment-15929644
 ] 

Razvan commented on FLINK-6063:
---

[~till.rohrmann] Hi, I just had a chat with [~dawidwys] and he pointed out that the 
issue might be that I'm using a local filesystem instead of a distributed FS. Could 
that be it?

> HA Configuration doesn't work with Flink 1.2
> 
>
> Key: FLINK-6063
> URL: https://issues.apache.org/jira/browse/FLINK-6063
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Affects Versions: 1.2.0
>Reporter: Razvan
>Priority: Critical
> Attachments: flink-conf.yaml, Logs.tar.gz, masters, slaves, zoo.cfg
>
>
>  I have a setup with flink 1.2 cluster, made up of 3 JobManagers and 2 
> TaskManagers. I start the Zookeeper Quorum from JobManager1, I get 
> confirmation Zookeeper starts on the other 2 JobManagers then I start a Flink 
> job on this JobManager1.   
>  
>  The flink-conf.yaml is the same on all 5 VMs (also everything else related 
> to flink because I copied the folder across all VMs as suggested in 
> tutorials) this means jobmanager.rpc.address: points to JobManager1 
> everywhere.
> If I turn off the VM running JobManager1 I would expect Zookeeper to say one 
> of the remaining JobManagers is the leader and the TaskManagers should 
> reconnect to it. Instead a new leader is elected but the slaves keep 
> connecting to the old master
> 2017-03-15 10:28:28,655 INFO  org.apache.flink.core.fs.FileSystem 
>   - Ensuring all FileSystem streams are closed for Async 
> calls on Source: Custom Source -> Flat Map (1/1)
> 2017-03-15 10:28:38,534 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Disassociated] 
> 2017-03-15 10:28:46,606 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:28:52,431 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:29:02,435 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:29:10,489 INFO  
> org.apache.flink.runtime.taskmanager.TaskManager  - TaskManager 
> akka://flink/user/taskmanager disconnects from JobManager 
> akka.tcp://flink@1.2.3.4:44779/user/jobmanager: Old JobManager lost its 
> leadership.
> 2017-03-15 10:29:10,490 INFO  
> org.apache.flink.runtime.taskmanager.TaskManager  - Cancelling 
> all computations and discarding all cached data.
> 2017-03-15 10:29:10,491 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Attempting to fail task externally Source: Custom Source 
> -> Flat Map (1/1) (75fd495cc6acfd72fbe957e60e513223).
> 2017-03-15 10:29:10,491 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Source: Custom Source -> Flat Map (1/1) 
> (75fd495cc6acfd72fbe957e60e513223) switched from RUNNING to FAILED.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager 
> disconnects from JobManager akka.tcp://flink@1.2.3.4:44779/user/jobmanager: 
> Old JobManager lost its leadership.
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:1074)
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleJobManagerLeaderAddress(TaskManager.scala:1426)
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:286)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>   at 
> 

[jira] [Updated] (FLINK-6063) HA Configuration doesn't work with Flink 1.2

2017-03-17 Thread Razvan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Razvan updated FLINK-6063:
--
Attachment: zoo.cfg
slaves
masters
flink-conf.yaml
Logs.tar.gz

> HA Configuration doesn't work with Flink 1.2
> 
>
> Key: FLINK-6063
> URL: https://issues.apache.org/jira/browse/FLINK-6063
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Affects Versions: 1.2.0
>Reporter: Razvan
>Priority: Critical
> Attachments: flink-conf.yaml, Logs.tar.gz, masters, slaves, zoo.cfg
>
>
>  I have a setup with flink 1.2 cluster, made up of 3 JobManagers and 2 
> TaskManagers. I start the Zookeeper Quorum from JobManager1, I get 
> confirmation Zookeeper starts on the other 2 JobManagers then I start a Flink 
> job on this JobManager1.   
>  
>  The flink-conf.yaml is the same on all 5 VMs (also everything else related 
> to flink because I copied the folder across all VMs as suggested in 
> tutorials) this means jobmanager.rpc.address: points to JobManager1 
> everywhere.
> If I turn off the VM running JobManager1 I would expect Zookeeper to say one 
> of the remaining JobManagers is the leader and the TaskManagers should 
> reconnect to it. Instead a new leader is elected but the slaves keep 
> connecting to the old master
> 2017-03-15 10:28:28,655 INFO  org.apache.flink.core.fs.FileSystem 
>   - Ensuring all FileSystem streams are closed for Async 
> calls on Source: Custom Source -> Flat Map (1/1)
> 2017-03-15 10:28:38,534 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Disassociated] 
> 2017-03-15 10:28:46,606 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:28:52,431 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:29:02,435 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:29:10,489 INFO  
> org.apache.flink.runtime.taskmanager.TaskManager  - TaskManager 
> akka://flink/user/taskmanager disconnects from JobManager 
> akka.tcp://flink@1.2.3.4:44779/user/jobmanager: Old JobManager lost its 
> leadership.
> 2017-03-15 10:29:10,490 INFO  
> org.apache.flink.runtime.taskmanager.TaskManager  - Cancelling 
> all computations and discarding all cached data.
> 2017-03-15 10:29:10,491 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Attempting to fail task externally Source: Custom Source 
> -> Flat Map (1/1) (75fd495cc6acfd72fbe957e60e513223).
> 2017-03-15 10:29:10,491 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Source: Custom Source -> Flat Map (1/1) 
> (75fd495cc6acfd72fbe957e60e513223) switched from RUNNING to FAILED.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager 
> disconnects from JobManager akka.tcp://flink@1.2.3.4:44779/user/jobmanager: 
> Old JobManager lost its leadership.
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:1074)
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleJobManagerLeaderAddress(TaskManager.scala:1426)
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:286)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>   at 

[jira] [Commented] (FLINK-6063) HA Configuration doesn't work with Flink 1.2

2017-03-16 Thread Razvan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928047#comment-15928047
 ] 

Razvan commented on FLINK-6063:
---

Sure, here:

id 0x15ad68d898c0005
2017-03-16 09:58:17,319 INFO  org.apache.zookeeper.server.NIOServerCnxnFactory  
- Accepted socket connection from /1.2.3.5:53748
2017-03-16 09:58:17,320 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Client attempting to establish new session at /1.2.3.5:53748
2017-03-16 09:58:17,322 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Established session 0x15ad68d898c0006 with negotiated timeout 
4 for client /1.2.3.5:53748
2017-03-16 09:58:18,336 INFO  org.apache.zookeeper.server.NIOServerCnxn 
- Closed socket connection for client /1.2.3.5:53748 which had 
sessionid 0x15ad68d898c0006
2017-03-16 10:10:23,881 WARN  org.apache.zookeeper.server.NIOServerCnxn 
- caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x15ad68d898c0001, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2017-03-16 10:10:23,885 INFO  org.apache.zookeeper.server.NIOServerCnxn 
- Closed socket connection for client /1.2.3.4:45752 which had 
sessionid 0x15ad68d898c0001
2017-03-16 10:10:23,885 WARN  org.apache.zookeeper.server.NIOServerCnxn 
- caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x15ad68d898c0002, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2017-03-16 10:10:23,887 INFO  org.apache.zookeeper.server.NIOServerCnxn 
- Closed socket connection for client /1.2.3.4:45754 which had 
sessionid 0x15ad68d898c0002
2017-03-16 13:11:21,841 INFO  org.apache.zookeeper.server.NIOServerCnxnFactory  
- Accepted socket connection from /[Client 2 IP here]:57692
2017-03-16 13:11:34,787 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Client attempting to renew session 0x35ad68d8b4d0005 at /[Client 
2 IP here]:57692
2017-03-16 13:11:34,787 INFO  org.apache.zookeeper.server.quorum.Learner
- Revalidating client: 0x35ad68d8b4d0005
2017-03-16 13:11:34,788 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Invalid session 0x35ad68d8b4d0005 for client /[Client 2 IP 
here]:57692, probably expired
2017-03-16 13:11:34,789 INFO  org.apache.zookeeper.server.NIOServerCnxn 
- Closed socket connection for client /[Client 2 IP here]:57692 
which had sessionid 0x35ad68d8b4d0005


> HA Configuration doesn't work with Flink 1.2
> 
>
> Key: FLINK-6063
> URL: https://issues.apache.org/jira/browse/FLINK-6063
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Affects Versions: 1.2.0
>Reporter: Razvan
>Priority: Critical
>
>  I have a setup with flink 1.2 cluster, made up of 3 JobManagers and 2 
> TaskManagers. I start the Zookeeper Quorum from JobManager1, I get 
> confirmation Zookeeper starts on the other 2 JobManagers then I start a Flink 
> job on this JobManager1.   
>  
>  The flink-conf.yaml is the same on all 5 VMs (also everything else related 
> to flink because I copied the folder across all VMs as suggested in 
> tutorials) this means jobmanager.rpc.address: points to JobManager1 
> everywhere.
> If I turn off the VM running JobManager1 I would expect Zookeeper to say one 
> of the remaining JobManagers is the leader and the TaskManagers should 
> reconnect to it. Instead a new leader is elected but the slaves keep 
> connecting to the old master
> 2017-03-15 10:28:28,655 INFO  org.apache.flink.core.fs.FileSystem 
>   - Ensuring all FileSystem streams are closed for Async 
> calls on Source: Custom Source -> Flat Map (1/1)
> 2017-03-15 10:28:38,534 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Disassociated] 
> 2017-03-15 10:28:46,606 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> 

[jira] [Comment Edited] (FLINK-6063) HA Configuration doesn't work with Flink 1.2

2017-03-16 Thread Razvan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927872#comment-15927872
 ] 

Razvan edited comment on FLINK-6063 at 3/16/17 11:42 AM:
-

Below is the code for the Job (Main Class):


import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

import com.test.event.reader.SimpleEventReader;

public class StreamingAccuracyJob {

    private final static String JOB_NAME = "Prototype CEP";

    final static StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

    private static void configureEnvironment() {
        // start a checkpoint every 1000 ms
        env.enableCheckpointing(1000);

        // advanced options:

        // set mode to exactly-once (this is the default)
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);

        // make sure 5000 ms of progress happen between checkpoints
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5000);

        // checkpoints have to complete within one minute, or are discarded
        env.getCheckpointConfig().setCheckpointTimeout(60000);

        // allow only one checkpoint to be in progress at the same time
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);

        // 50 restart attempts with a 10 second delay between them
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(50,
                Time.of(10, TimeUnit.SECONDS)));

        env.setParallelism(1);
        // env.setMaxParallelism(1);
        // https://issues.apache.org/jira/browse/FLINK-5773

        CheckpointConfig config = env.getCheckpointConfig();
        config.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION);
    }

    public static void main(String[] args) throws Exception {

        DataStream<String> stream = env.addSource(new FlinkKafkaConsumer010<>(
                KafkaConfiguration.TOPIC_NAME,
                new SimpleStringSchema(),
                KafkaConfiguration.getConnectionProperties()));

        configureEnvironment();

        DataStream streamTuples = stream.flatMap(new JsonToTupleFlatMap());

        streamTuples.keyBy(SimpleEventReader.FIELD_USERID_TUPLE_POSITION)
                .flatMap(new AccuracyTestEvents());

        env.execute(JOB_NAME);
    }

}
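
Since failover only works if the newly elected JobManager can read the old leader's checkpoint state, the one thing missing from the snippet above is a state backend on a distributed filesystem. A minimal sketch, assuming HDFS and the Flink 1.2 FsStateBackend (the URI is a placeholder, not my actual setup):

{code}
import org.apache.flink.runtime.state.filesystem.FsStateBackend;

// hypothetical addition, e.g. in main() (FsStateBackend(String) may throw
// IOException): keep checkpoint data on a distributed filesystem so a newly
// elected JobManager can actually recover it
env.setStateBackend(new FsStateBackend("hdfs://namenode:9000/flink/checkpoints"));
{code}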



was (Author: razvan):
Below is the code for the Job (Main Class):


import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import 
org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

import com.booxware.event.reader.SimpleEventReader;

public class StreamingAccuracyJob {

private final static String JOB_NAME = "Prototype CEP";

final static StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment();

private static void configureEnvironment() {
// start a checkpoint every 1000 ms
env.enableCheckpointing(1000);

// advanced options:

// set mode to exactly-once (this is the default)

env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);

  

[jira] [Commented] (FLINK-6063) HA Configuration doesn't work with Flink 1.2

2017-03-16 Thread Razvan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927872#comment-15927872
 ] 

Razvan commented on FLINK-6063:
---

Below is the code for the Job (Main Class):


import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

import com.booxware.event.reader.SimpleEventReader;

public class StreamingAccuracyJob {

    private final static String JOB_NAME = "Prototype CEP";

    final static StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

    private static void configureEnvironment() {
        // start a checkpoint every 1000 ms
        env.enableCheckpointing(1000);

        // advanced options:

        // set mode to exactly-once (this is the default)
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);

        // make sure 5000 ms of progress happen between checkpoints
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5000);

        // checkpoints have to complete within one minute, or are discarded
        env.getCheckpointConfig().setCheckpointTimeout(60000);

        // allow only one checkpoint to be in progress at the same time
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);

        // 50 restart attempts with a 10 second delay between them
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(50,
                Time.of(10, TimeUnit.SECONDS)));

        env.setParallelism(1);
        // env.setMaxParallelism(1);
        // https://issues.apache.org/jira/browse/FLINK-5773

        CheckpointConfig config = env.getCheckpointConfig();
        config.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION);
    }

    public static void main(String[] args) throws Exception {

        DataStream<String> stream = env.addSource(new FlinkKafkaConsumer010<>(
                KafkaConfiguration.TOPIC_NAME,
                new SimpleStringSchema(),
                KafkaConfiguration.getConnectionProperties()));

        configureEnvironment();

        DataStream streamTuples = stream.flatMap(new JsonToTupleFlatMap());

        streamTuples.keyBy(SimpleEventReader.FIELD_USERID_TUPLE_POSITION)
                .flatMap(new AccuracyTestEvents());

        env.execute(JOB_NAME);
    }

}


> HA Configuration doesn't work with Flink 1.2
> 
>
> Key: FLINK-6063
> URL: https://issues.apache.org/jira/browse/FLINK-6063
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Affects Versions: 1.2.0
>Reporter: Razvan
>Priority: Critical
>
>  I have a setup with flink 1.2 cluster, made up of 3 JobManagers and 2 
> TaskManagers. I start the Zookeeper Quorum from JobManager1, I get 
> confirmation Zookeeper starts on the other 2 JobManagers then I start a Flink 
> job on this JobManager1.   
>  
>  The flink-conf.yaml is the same on all 5 VMs (also everything else related 
> to flink because I copied the folder across all VMs as suggested in 
> tutorials) this means jobmanager.rpc.address: points to JobManager1 
> everywhere.
> If I turn off the VM running JobManager1 I would expect Zookeeper to say one 
> of the remaining JobManagers is the leader and the TaskManagers should 
> reconnect to it. Instead a new leader is elected but the slaves keep 
> connecting to the old master
> 2017-03-15 10:28:28,655 INFO  org.apache.flink.core.fs.FileSystem 
>   - Ensuring all FileSystem streams are closed for Async 
> calls on Source: Custom Source -> Flat Map (1/1)
> 2017-03-15 10:28:38,534 WARN  

[jira] [Commented] (FLINK-6063) HA Configuration doesn't work with Flink 1.2

2017-03-16 Thread Razvan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927844#comment-15927844
 ] 

Razvan commented on FLINK-6063:
---

Hi Dawid,

  Sorry to say, but it's not fine: as you can see from the logs, the job is not 
resumed, which was the whole purpose of HA; the TaskManagers connecting to the old 
master also show that they didn't acknowledge the new leader.

> HA Configuration doesn't work with Flink 1.2
> 
>
> Key: FLINK-6063
> URL: https://issues.apache.org/jira/browse/FLINK-6063
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Affects Versions: 1.2.0
>Reporter: Razvan
>Priority: Critical
>
>  I have a setup with flink 1.2 cluster, made up of 3 JobManagers and 2 
> TaskManagers. I start the Zookeeper Quorum from JobManager1, I get 
> confirmation Zookeeper starts on the other 2 JobManagers then I start a Flink 
> job on this JobManager1.   
>  
>  The flink-conf.yaml is the same on all 5 VMs (also everything else related 
> to flink because I copied the folder across all VMs as suggested in 
> tutorials) this means jobmanager.rpc.address: points to JobManager1 
> everywhere.
> If I turn off the VM running JobManager1 I would expect Zookeeper to say one 
> of the remaining JobManagers is the leader and the TaskManagers should 
> reconnect to it. Instead a new leader is elected but the slaves keep 
> connecting to the old master
> 2017-03-15 10:28:28,655 INFO  org.apache.flink.core.fs.FileSystem 
>   - Ensuring all FileSystem streams are closed for Async 
> calls on Source: Custom Source -> Flat Map (1/1)
> 2017-03-15 10:28:38,534 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Disassociated] 
> 2017-03-15 10:28:46,606 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:28:52,431 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:29:02,435 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
> ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
> by: [Connection refused: /1.2.3.4:44779]
> 2017-03-15 10:29:10,489 INFO  
> org.apache.flink.runtime.taskmanager.TaskManager  - TaskManager 
> akka://flink/user/taskmanager disconnects from JobManager 
> akka.tcp://flink@1.2.3.4:44779/user/jobmanager: Old JobManager lost its 
> leadership.
> 2017-03-15 10:29:10,490 INFO  
> org.apache.flink.runtime.taskmanager.TaskManager  - Cancelling 
> all computations and discarding all cached data.
> 2017-03-15 10:29:10,491 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Attempting to fail task externally Source: Custom Source 
> -> Flat Map (1/1) (75fd495cc6acfd72fbe957e60e513223).
> 2017-03-15 10:29:10,491 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Source: Custom Source -> Flat Map (1/1) 
> (75fd495cc6acfd72fbe957e60e513223) switched from RUNNING to FAILED.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager 
> disconnects from JobManager akka.tcp://flink@1.2.3.4:44779/user/jobmanager: 
> Old JobManager lost its leadership.
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:1074)
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleJobManagerLeaderAddress(TaskManager.scala:1426)
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:286)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>  

[jira] [Comment Edited] (FLINK-6063) HA Configuration doesn't work with Flink 1.2

2017-03-16 Thread Razvan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927786#comment-15927786
 ] 

Razvan edited comment on FLINK-6063 at 3/16/17 10:34 AM:
-

Hi Till, thanks for replying; sure, I can attach the logs you mentioned.

  Started Zookeeper, then the cluster @~9:58; killed the leader JobManager @~10:10; 
TaskManagers stop retrying @~10:14.

Job log:

Cluster configuration: Standalone cluster with JobManager at /1.2.3.4:45164
Using address 1.2.3.4:45164 to connect to JobManager.
JobManager web interface address http://1.2.3.4:8081
Starting execution of program
Submitting job with JobID: a7c96ad4345c1f07fe666bc5fd78256f. Waiting for job 
completion.
Connected to JobManager at 
Actor[akka.tcp://flink@1.2.3.4:45164/user/jobmanager#-1418996734]
03/16/2017 09:58:23 Job execution switched to status RUNNING.
03/16/2017 09:58:23 Source: Custom Source -> Flat Map(1/1) switched to 
SCHEDULED 
03/16/2017 09:58:23 Source: Custom Source -> Flat Map(1/1) switched to 
DEPLOYING 
03/16/2017 09:58:23 Flat Map(1/1) switched to SCHEDULED 
03/16/2017 09:58:23 Flat Map(1/1) switched to DEPLOYING 
03/16/2017 09:58:23 Flat Map(1/1) switched to RUNNING 
03/16/2017 09:58:23 Source: Custom Source -> Flat Map(1/1) switched to 
RUNNING 


New JobManager elected. Connecting to null
Connected to JobManager at 
Actor[akka.tcp://flink@1.2.3.5:34987/user/jobmanager#-27372488]



Killed JobManager log:

2017-03-16 09:58:14,953 INFO  org.apache.zookeeper.server.NIOServerCnxnFactory  
- Accepted socket connection from /[Client 1 IP here]:40858
2017-03-16 09:58:14,953 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Client attempting to establish new session at /[Client 1 IP 
here]:40858
2017-03-16 09:58:14,957 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Established session 0x35ad68d8b4d0004 with negotiated timeout 
4 for client /[Client 1 IP here]:40858
2017-03-16 09:58:15,523 INFO  org.apache.zookeeper.server.NIOServerCnxnFactory  
- Accepted socket connection from /[Client 2 IP here]:40276
2017-03-16 09:58:15,528 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Client attempting to establish new session at /[Client 2 IP 
here]:40276
2017-03-16 09:58:15,531 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Established session 0x35ad68d8b4d0005 with negotiated timeout 
4 for client /[Client 2 IP here]:40276
2017-03-16 10:10:25,118 WARN  org.apache.zookeeper.server.NIOServerCnxn 
- caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x35ad68d8b4d0002, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2017-03-16 10:10:25,120 INFO  org.apache.zookeeper.server.NIOServerCnxn 
- Closed socket connection for client /1.2.3.4:47872 which had 
sessionid 0x35ad68d8b4d0002



New Leader log:

2017-03-16 09:58:17,319 INFO  org.apache.zookeeper.server.NIOServerCnxnFactory  
- Accepted socket connection from /1.2.3.5:53748
2017-03-16 09:58:17,320 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Client attempting to establish new session at /1.2.3.5:53748
2017-03-16 09:58:17,322 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Established session 0x15ad68d898c0006 with negotiated timeout 
4 for client /1.2.3.5:53748
2017-03-16 09:58:18,336 INFO  org.apache.zookeeper.server.NIOServerCnxn 
- Closed socket connection for client /1.2.3.5:53748 which had 
sessionid 0x15ad68d898c0006
2017-03-16 10:10:23,881 WARN  org.apache.zookeeper.server.NIOServerCnxn 
- caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x15ad68d898c0001, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2017-03-16 10:10:23,885 INFO  org.apache.zookeeper.server.NIOServerCnxn 
- Closed socket connection for client /1.2.3.4:45752 which had 
sessionid 0x15ad68d898c0001
2017-03-16 10:10:23,885 WARN  org.apache.zookeeper.server.NIOServerCnxn 
- caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x15ad68d898c0002, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 

[jira] [Comment Edited] (FLINK-6063) HA Configuration doesn't work with Flink 1.2

2017-03-16 Thread Razvan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927786#comment-15927786
 ] 

Razvan edited comment on FLINK-6063 at 3/16/17 10:28 AM:
-

Hi Till, thanks for replying; sure, I can attach the logs you mentioned.

  Started Zookeeper, then the cluster @~9:58; killed the leader JobManager @~10:10; 
the whole cluster died @~10:14.

Job log:

Cluster configuration: Standalone cluster with JobManager at /1.2.3.4:45164
Using address 1.2.3.4:45164 to connect to JobManager.
JobManager web interface address http://1.2.3.4:8081
Starting execution of program
Submitting job with JobID: a7c96ad4345c1f07fe666bc5fd78256f. Waiting for job 
completion.
Connected to JobManager at 
Actor[akka.tcp://flink@1.2.3.4:45164/user/jobmanager#-1418996734]
03/16/2017 09:58:23 Job execution switched to status RUNNING.
03/16/2017 09:58:23 Source: Custom Source -> Flat Map(1/1) switched to 
SCHEDULED 
03/16/2017 09:58:23 Source: Custom Source -> Flat Map(1/1) switched to 
DEPLOYING 
03/16/2017 09:58:23 Flat Map(1/1) switched to SCHEDULED 
03/16/2017 09:58:23 Flat Map(1/1) switched to DEPLOYING 
03/16/2017 09:58:23 Flat Map(1/1) switched to RUNNING 
03/16/2017 09:58:23 Source: Custom Source -> Flat Map(1/1) switched to 
RUNNING 


New JobManager elected. Connecting to null
Connected to JobManager at 
Actor[akka.tcp://flink@1.2.3.5:34987/user/jobmanager#-27372488]



Killed JobManager log:

2017-03-16 09:58:14,953 INFO  org.apache.zookeeper.server.NIOServerCnxnFactory  
- Accepted socket connection from /[Client 1 IP here]:40858
2017-03-16 09:58:14,953 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Client attempting to establish new session at /[Client 1 IP 
here]:40858
2017-03-16 09:58:14,957 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Established session 0x35ad68d8b4d0004 with negotiated timeout 
4 for client /[Client 1 IP here]:40858
2017-03-16 09:58:15,523 INFO  org.apache.zookeeper.server.NIOServerCnxnFactory  
- Accepted socket connection from /[Client 2 IP here]:40276
2017-03-16 09:58:15,528 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Client attempting to establish new session at /[Client 2 IP 
here]:40276
2017-03-16 09:58:15,531 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Established session 0x35ad68d8b4d0005 with negotiated timeout 
4 for client /[Client 2 IP here]:40276
2017-03-16 10:10:25,118 WARN  org.apache.zookeeper.server.NIOServerCnxn 
- caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x35ad68d8b4d0002, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2017-03-16 10:10:25,120 INFO  org.apache.zookeeper.server.NIOServerCnxn 
- Closed socket connection for client /1.2.3.4:47872 which had 
sessionid 0x35ad68d8b4d0002



New Leader log:

2017-03-16 09:58:17,319 INFO  org.apache.zookeeper.server.NIOServerCnxnFactory  
- Accepted socket connection from /1.2.3.5:53748
2017-03-16 09:58:17,320 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Client attempting to establish new session at /1.2.3.5:53748
2017-03-16 09:58:17,322 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Established session 0x15ad68d898c0006 with negotiated timeout 
4 for client /1.2.3.5:53748
2017-03-16 09:58:18,336 INFO  org.apache.zookeeper.server.NIOServerCnxn 
- Closed socket connection for client /1.2.3.5:53748 which had 
sessionid 0x15ad68d898c0006
2017-03-16 10:10:23,881 WARN  org.apache.zookeeper.server.NIOServerCnxn 
- caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x15ad68d898c0001, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2017-03-16 10:10:23,885 INFO  org.apache.zookeeper.server.NIOServerCnxn 
- Closed socket connection for client /1.2.3.4:45752 which had 
sessionid 0x15ad68d898c0001
2017-03-16 10:10:23,885 WARN  org.apache.zookeeper.server.NIOServerCnxn 
- caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x15ad68d898c0002, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 

[jira] [Comment Edited] (FLINK-6063) HA Configuration doesn't work with Flink 1.2

2017-03-16 Thread Razvan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927786#comment-15927786
 ] 

Razvan edited comment on FLINK-6063 at 3/16/17 10:24 AM:
-

Hi Till, thanks for replying; sure, I can attach the logs you mentioned.

  Started Zookeeper, then the cluster @~9:58; killed the leader JobManager @~10:10; 
the whole cluster died @~10:14.

Job log:

Cluster configuration: Standalone cluster with JobManager at /1.2.3.4:44307
Using address 1.2.3.4:44307 to connect to JobManager.
JobManager web interface address http://1.2.3.4:8081
Starting execution of program
Submitting job with JobID: 2c64b1126f327261b0c43f33f3cf43ee. Waiting for job 
completion.
Connected to JobManager at 
Actor[akka.tcp://flink@1.2.3.4:44307/user/jobmanager#2001981191]
03/16/2017 09:40:10 Job execution switched to status RUNNING.
03/16/2017 09:40:10 Source: Custom Source -> Flat Map(1/1) switched to 
SCHEDULED 
03/16/2017 09:40:10 Source: Custom Source -> Flat Map(1/1) switched to 
DEPLOYING 
03/16/2017 09:40:10 Flat Map(1/1) switched to SCHEDULED 
03/16/2017 09:40:10 Flat Map(1/1) switched to DEPLOYING 
03/16/2017 09:40:10 Flat Map(1/1) switched to RUNNING 
03/16/2017 09:40:10 Source: Custom Source -> Flat Map(1/1) switched to 
RUNNING 
New JobManager elected. Connecting to null
Connected to JobManager at 
Actor[akka.tcp://flink@1.2.3.5:43828/user/jobmanager#1400235434]


Killed JobManager log:

2017-03-16 09:58:14,953 INFO  org.apache.zookeeper.server.NIOServerCnxnFactory  
- Accepted socket connection from /[Client 1 IP here]:40858
2017-03-16 09:58:14,953 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Client attempting to establish new session at /[Client 1 IP 
here]:40858
2017-03-16 09:58:14,957 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Established session 0x35ad68d8b4d0004 with negotiated timeout 
4 for client /[Client 1 IP here]:40858
2017-03-16 09:58:15,523 INFO  org.apache.zookeeper.server.NIOServerCnxnFactory  
- Accepted socket connection from /[Client 2 IP here]:40276
2017-03-16 09:58:15,528 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Client attempting to establish new session at /[Client 2 IP 
here]:40276
2017-03-16 09:58:15,531 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Established session 0x35ad68d8b4d0005 with negotiated timeout 
4 for client /[Client 2 IP here]:40276
2017-03-16 10:10:25,118 WARN  org.apache.zookeeper.server.NIOServerCnxn 
- caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x35ad68d8b4d0002, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2017-03-16 10:10:25,120 INFO  org.apache.zookeeper.server.NIOServerCnxn 
- Closed socket connection for client /1.2.3.4:47872 which had 
sessionid 0x35ad68d8b4d0002



New Leader log:

2017-03-16 09:58:17,319 INFO  org.apache.zookeeper.server.NIOServerCnxnFactory  
- Accepted socket connection from /1.2.3.5:53748
2017-03-16 09:58:17,320 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Client attempting to establish new session at /1.2.3.5:53748
2017-03-16 09:58:17,322 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Established session 0x15ad68d898c0006 with negotiated timeout 
4 for client /1.2.3.5:53748
2017-03-16 09:58:18,336 INFO  org.apache.zookeeper.server.NIOServerCnxn 
- Closed socket connection for client /1.2.3.5:53748 which had 
sessionid 0x15ad68d898c0006
2017-03-16 10:10:23,881 WARN  org.apache.zookeeper.server.NIOServerCnxn 
- caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x15ad68d898c0001, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2017-03-16 10:10:23,885 INFO  org.apache.zookeeper.server.NIOServerCnxn 
- Closed socket connection for client /1.2.3.4:45752 which had 
sessionid 0x15ad68d898c0001
2017-03-16 10:10:23,885 WARN  org.apache.zookeeper.server.NIOServerCnxn 
- caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x15ad68d898c0002, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 

[jira] [Commented] (FLINK-6063) HA Configuration doesn't work with Flink 1.2

2017-03-16 Thread Razvan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927786#comment-15927786
 ] 

Razvan commented on FLINK-6063:
---

Hi Till, thanks for replying; sure, I can attach the logs you mentioned.


Cluster configuration: Standalone cluster with JobManager at /1.2.3.4:44307
Using address 1.2.3.4:44307 to connect to JobManager.
JobManager web interface address http://1.2.3.4:8081
Starting execution of program
Submitting job with JobID: 2c64b1126f327261b0c43f33f3cf43ee. Waiting for job 
completion.
Connected to JobManager at 
Actor[akka.tcp://flink@1.2.3.4:44307/user/jobmanager#2001981191]
03/16/2017 09:40:10 Job execution switched to status RUNNING.
03/16/2017 09:40:10 Source: Custom Source -> Flat Map(1/1) switched to 
SCHEDULED 
03/16/2017 09:40:10 Source: Custom Source -> Flat Map(1/1) switched to 
DEPLOYING 
03/16/2017 09:40:10 Flat Map(1/1) switched to SCHEDULED 
03/16/2017 09:40:10 Flat Map(1/1) switched to DEPLOYING 
03/16/2017 09:40:10 Flat Map(1/1) switched to RUNNING 
03/16/2017 09:40:10 Source: Custom Source -> Flat Map(1/1) switched to 
RUNNING 
New JobManager elected. Connecting to null
Connected to JobManager at 
Actor[akka.tcp://flink@1.2.3.5:43828/user/jobmanager#1400235434]


Killed JobManager

2017-03-16 09:58:14,953 INFO  org.apache.zookeeper.server.NIOServerCnxnFactory  
- Accepted socket connection from /[Client 1 IP here]:40858
2017-03-16 09:58:14,953 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Client attempting to establish new session at /[Client 1 IP 
here]:40858
2017-03-16 09:58:14,957 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Established session 0x35ad68d8b4d0004 with negotiated timeout 
4 for client /[Client 1 IP here]:40858
2017-03-16 09:58:15,523 INFO  org.apache.zookeeper.server.NIOServerCnxnFactory  
- Accepted socket connection from /[Client 2 IP here]:40276
2017-03-16 09:58:15,528 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Client attempting to establish new session at /[Client 2 IP 
here]:40276
2017-03-16 09:58:15,531 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Established session 0x35ad68d8b4d0005 with negotiated timeout 
4 for client /[Client 2 IP here]:40276
2017-03-16 10:10:25,118 WARN  org.apache.zookeeper.server.NIOServerCnxn 
- caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x35ad68d8b4d0002, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2017-03-16 10:10:25,120 INFO  org.apache.zookeeper.server.NIOServerCnxn 
- Closed socket connection for client /1.2.3.4:47872 which had 
sessionid 0x35ad68d8b4d0002



New Leader

2017-03-16 09:58:17,319 INFO  org.apache.zookeeper.server.NIOServerCnxnFactory  
- Accepted socket connection from /1.2.3.5:53748
2017-03-16 09:58:17,320 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Client attempting to establish new session at /1.2.3.5:53748
2017-03-16 09:58:17,322 INFO  org.apache.zookeeper.server.ZooKeeperServer   
- Established session 0x15ad68d898c0006 with negotiated timeout 
4 for client /1.2.3.5:53748
2017-03-16 09:58:18,336 INFO  org.apache.zookeeper.server.NIOServerCnxn 
- Closed socket connection for client /1.2.3.5:53748 which had 
sessionid 0x15ad68d898c0006
2017-03-16 10:10:23,881 WARN  org.apache.zookeeper.server.NIOServerCnxn 
- caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x15ad68d898c0001, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2017-03-16 10:10:23,885 INFO  org.apache.zookeeper.server.NIOServerCnxn 
- Closed socket connection for client /1.2.3.4:45752 which had 
sessionid 0x15ad68d898c0001
2017-03-16 10:10:23,885 WARN  org.apache.zookeeper.server.NIOServerCnxn 
- caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x15ad68d898c0002, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2017-03-16 10:10:23,887 INFO  org.apache.zookeeper.server.NIOServerCnxn 
- 

[jira] [Created] (FLINK-6063) HA Configuration doesn't work with Flink 1.2

2017-03-15 Thread Razvan (JIRA)
Razvan created FLINK-6063:
-

 Summary: HA Configuration doesn't work with Flink 1.2
 Key: FLINK-6063
 URL: https://issues.apache.org/jira/browse/FLINK-6063
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 1.2.0
Reporter: Razvan
Priority: Critical


I have a Flink 1.2 cluster made up of 3 JobManagers and 2 TaskManagers. I 
start the ZooKeeper quorum from JobManager1, get confirmation that ZooKeeper 
has started on the other 2 JobManagers, and then start a Flink job on 
JobManager1.
 
The flink-conf.yaml is the same on all 5 VMs (as is everything else related 
to Flink, because I copied the folder across all VMs as suggested in the 
tutorials), which means jobmanager.rpc.address points to JobManager1 
everywhere.
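
For reference, this is roughly what the HA-related part of that 
flink-conf.yaml looks like (a minimal sketch following the Flink 1.2 HA 
documentation; the hostnames, ports, and storage path here are illustrative, 
not the real values from our cluster):

{code}
# flink-conf.yaml -- illustrative HA settings for Flink 1.2.
# With high-availability enabled, jobmanager.rpc.address is only a
# fallback; TaskManagers are supposed to discover the current leader
# through ZooKeeper.
jobmanager.rpc.address: jobmanager1

high-availability: zookeeper
high-availability.zookeeper.quorum: jobmanager1:2181,jobmanager2:2181,jobmanager3:2181
high-availability.zookeeper.path.root: /flink
high-availability.zookeeper.storageDir: hdfs:///flink/recovery
{code}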

If I turn off the VM running JobManager1, I would expect ZooKeeper to elect 
one of the remaining JobManagers as the new leader and the TaskManagers to 
reconnect to it. Instead a new leader is elected, but the TaskManagers keep 
connecting to the old master.
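
In a standalone HA setup the JobManager candidates are also listed in 
conf/masters, one host:webui-port entry per line (the ZooKeeper quorum can 
be started with the bundled bin/start-zookeeper-quorum.sh helper). A sketch 
with illustrative hostnames:

{code}
# conf/masters -- illustrative; one "host:webui-port" entry per JobManager
jobmanager1:8081
jobmanager2:8081
jobmanager3:8081
{code}

With that in place, killing the leader should only cause a short failover. 
Instead, the TaskManager log below shows it repeatedly retrying the dead 
leader before it finally notices the leadership change and cancels its tasks: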

2017-03-15 10:28:28,655 INFO  org.apache.flink.core.fs.FileSystem   
- Ensuring all FileSystem streams are closed for Async calls on 
Source: Custom Source -> Flat Map (1/1)
2017-03-15 10:28:38,534 WARN  akka.remote.ReliableDeliverySupervisor
- Association with remote system 
[akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
ms. Reason: [Disassociated] 
2017-03-15 10:28:46,606 WARN  akka.remote.ReliableDeliverySupervisor
- Association with remote system 
[akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
by: [Connection refused: /1.2.3.4:44779]
2017-03-15 10:28:52,431 WARN  akka.remote.ReliableDeliverySupervisor
- Association with remote system 
[akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
by: [Connection refused: /1.2.3.4:44779]
2017-03-15 10:29:02,435 WARN  akka.remote.ReliableDeliverySupervisor
- Association with remote system 
[akka.tcp://flink@1.2.3.4:44779] has failed, address is now gated for [5000] 
ms. Reason: [Association failed with [akka.tcp://flink@1.2.3.4:44779]] Caused 
by: [Connection refused: /1.2.3.4:44779]
2017-03-15 10:29:10,489 INFO  
org.apache.flink.runtime.taskmanager.TaskManager  - TaskManager 
akka://flink/user/taskmanager disconnects from JobManager 
akka.tcp://flink@1.2.3.4:44779/user/jobmanager: Old JobManager lost its 
leadership.
2017-03-15 10:29:10,490 INFO  
org.apache.flink.runtime.taskmanager.TaskManager  - Cancelling all 
computations and discarding all cached data.
2017-03-15 10:29:10,491 INFO  org.apache.flink.runtime.taskmanager.Task 
- Attempting to fail task externally Source: Custom Source -> 
Flat Map (1/1) (75fd495cc6acfd72fbe957e60e513223).
2017-03-15 10:29:10,491 INFO  org.apache.flink.runtime.taskmanager.Task 
- Source: Custom Source -> Flat Map (1/1) 
(75fd495cc6acfd72fbe957e60e513223) switched from RUNNING to FAILED.
java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects 
from JobManager akka.tcp://flink@1.2.3.4:44779/user/jobmanager: Old JobManager 
lost its leadership.
at 
org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:1074)
at 
org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleJobManagerLeaderAddress(TaskManager.scala:1426)
at 
org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:286)
at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at 
org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at 
org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
at 
org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at 
org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at 
org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:122)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at