Re: Failing to recover once checkpoint fails

2017-10-04 Thread Vishal Santoshi
To add to it, my pipeline is a simple keyBy(0) .timeWindow(Time.of(window_size, TimeUnit.MINUTES)) .allowedLateness(Time.of(late_by, TimeUnit.SECONDS)) .reduce(new ReduceFunction(), new WindowFunction()) On Wed, Oct 4, 2017 at 8:19 PM, Vishal Santoshi

Failing to recover once checkpoint fails

2017-10-04 Thread Vishal Santoshi
Hello folks, As far as I know checkpoint failure should be ignored and retried with potentially larger state. I had this situation * hdfs went into a safe mode b'coz of Name Node issues * exception was thrown org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):

Unusual log message - Emitter thread got interrupted

2017-10-04 Thread Ken Krugler
Hi all, I’ve got a streaming topology with an iteration, and a RichAsyncFunction in that iteration. When the iteration terminates due to no activity, I see this message in the logs: 17/10/04 16:01:36 DEBUG async.Emitter:91 - Emitter thread got interrupted. This indicates that the emitter

Re: Classloader error after SSL setup

2017-10-04 Thread Aniket Deshpande
So, according to Eron's suggestion I tried *security.ssl.verify-hostname: false *configuration and that does the trick. I no longer get the classloader error even with *blob.service.ssl.enabled: true *configuration. Do you think the hostname verification fails because we are running flink

Re: Classloader error after SSL setup

2017-10-04 Thread Chesnay Schepler
I don't think this is a configuration problem, but a bug in Flink. But we'll have to dig a little deeper to be sure. Besides the actual SSL problem, what concerns me is that we didn't fail earlier. If a bug in the SSL setup prevents the up- or download of jars then we should fail earlier.

Re: Classloader error after SSL setup

2017-10-04 Thread Aniket Deshpande
Hi Chesnay, Thanks for the reply. After your suggestion, I found out that setting *blob.service.ssl.enabled: false* solved the issue and now all the pipelines run as expected. So, the issue is kinda narrowed down to blob service ssl now. I also checked the jobmanager logs when blob ssl is enabled

Re: Classloader error after SSL setup

2017-10-04 Thread Eron Wright
By following Chesney's recommendation we will hopefully uncover an SSL error that is being masked. Another thing to try is to disable hostname verification (it is enabled by default) to see whether the certificate is being rejected. On Wed, Oct 4, 2017 at 5:15 AM, Chesnay Schepler

Custom sliding window

2017-10-04 Thread G.S.Vijay Raajaa
Hi, I would like to implement a custom time based sliding window . The idea is that the window is of 1 hr size and slides every 5 second. I would like to process the window function after 10 records are accumulated in the window till it reaches 1 hr, post that since it slides every 5 second, the

Re: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated for S3 access

2017-10-04 Thread Hao Sun
Here is what my docker file says: ENV FLINK_VERSION=1.3.2 \ HADOOP_VERSION=27 \ SCALA_VERSION=2.11 \ On Wed, Oct 4, 2017 at 8:23 AM Hao Sun wrote: > I am running Flink 1.3.2 with docker on kubernetes. My docker is using > openjdk-8, I do not have hadoop, the version

Re: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated for S3 access

2017-10-04 Thread Hao Sun
I am running Flink 1.3.2 with docker on kubernetes. My docker is using openjdk-8, I do not have hadoop, the version is 2.7, scala is 2.11. Thanks! FROM openjdk:8-jre-alpine On Wed, Oct 4, 2017 at 8:11 AM Chesnay Schepler wrote: > I've found a few threads where an outdated

Re: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated for S3 access

2017-10-04 Thread Chesnay Schepler
I've found a few threads where an outdated jdk version on the server/client may be the cause. Which Flink binary (specifically, for which hadoop version) are you using? On 03.10.2017 20:48, Hao Sun wrote: com.amazonaws.http.AmazonHttpClient - Unable to execute HTTP

Re: Sink buffering

2017-10-04 Thread nragon
Got it :) Thanks -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Weird error in submitting a flink job to yarn cluster

2017-10-04 Thread Chesnay Schepler
This isn't related to FLink but i might be able to help you out anyway. Does the ParquestFileWriter set the 'overwrite' flag when calling 'FileSystem#create()'? My suspicion is that you create a file for the first batch, write it out, but not delete it. For the next batch, the file cannot be

Re: Fw: Question on Flink on Window

2017-10-04 Thread Chesnay Schepler
You're attempting to start flink from the wrong directory, specifically from within the source directory. If you download Flinks source release from the downloads page you have to build it manually, in which case the 'start-local.bat' to run will be

Re: Classloader error after SSL setup

2017-10-04 Thread Chesnay Schepler
something that would also help us narrow down the problematic area is to enable SSL for one component at a time and see which one causesd the job to fail. On 04.10.2017 14:11, Chesnay Schepler wrote: The configuration looks reasonable. Just to be sure, are the paths accessible by all nodes?

Re: Classloader error after SSL setup

2017-10-04 Thread Chesnay Schepler
The configuration looks reasonable. Just to be sure, are the paths accessible by all nodes? As a first step, could you set the logging level to DEBUG (by modifying the 'conf/log4j.properties' file), resubmit the job (after a cluster restart) and check the Job- and TaskManager logs for any

Re: At end of complex parallel flow, how to force end step with parallel=1?

2017-10-04 Thread Fabian Hueske
Hi Garrett, that's strange. DataSet.reduceGroup() will create a non-parallel GroupReduce operator. So even without setting the parallelism manually to 1, the operator should not run in parallel. What might happen though is that a combiner is applied to locally reduce the data before it is shipped

Re: Noisy org.apache.flink.configuration.GlobalConfiguration

2017-10-04 Thread Stephan Ewen
Will be opening a PR for file system configuring which should fix that. On Tue, Sep 19, 2017 at 5:28 PM, Elias Levy wrote: > Till, > > Using 1.3.2 and like Ufuk mentioned, using S3 for checkpointing. > > On Tue, Sep 19, 2017 at 4:28 AM, Till Rohrmann

Re: History Server

2017-10-04 Thread Stephan Ewen
To add to this: The History Server is mainly useful in cases where one runs a Flink-cluster-per-job. One the job finished, the processes disappear. The History Server should be longer lived to make past executions' stats available. On Mon, Sep 25, 2017 at 3:44 PM, Nico Kruber

Re: Sink buffering

2017-10-04 Thread nragon
checkpointing interval ~= transactions are being committed on each Flink checkpoint So, if i set my checkpoint interval to 1ms, every 1ms there will be a commit, right? If I understoop correctly, TwoPhaseCommitSinkFunction stores transactions into it's state as for GenericWriteAheadSink it

Re: kafka consumer parallelism

2017-10-04 Thread r. r.
Thanks Timo & Tovi - this helped me get a better idea how it works @Carst, I have rebalance after the map() (messageStream.map(...).rebalance()) - doesn't it mean the load will be redistributed across all job managers' slots anyway? Or is the map() spread out only if I do as you suggest

Re: Sink buffering

2017-10-04 Thread Piotr Nowojski
What do you mean by "This always depends on checkpointing interval right?”? In TwoPhaseCommitSinkFunction, transactions are being committed on each Flink checkpoint. I guess same applies to GenericWriteAheadSink. The first one just commits/pre-commits the data on checkpoint, second rewrites

Re: Sink buffering

2017-10-04 Thread nragon
Thanks for you opinion on this. TwoPhaseCommitSinkFunction would probably be the best solution overall. Using this with something like Phoenix or Tephra would probably work. This always depends on checkpointing interval right? -- Sent from:

Re: Sink buffering

2017-10-04 Thread Piotr Nowojski
Hi, Do you mean buffer on state and you want to achieve exactly-once HBase sink? If so keep in mind that you will need some kind of transactions support in HBase to make it 100% reliable. Without transactions, buffering messages on state only reduces chance of duplicated records. How much

Re: How flink monitor source stream task(Time Trigger) is running?

2017-10-04 Thread Piotr Nowojski
You are welcome :) Piotrek > On Oct 2, 2017, at 1:19 PM, yunfan123 wrote: > > Thank you. > "If SourceFunction.run methods returns without an exception Flink assumes > that it has cleanly shutdown and that there were simply no more elements to > collect/create by