Re: Consumed input splits

2018-05-16 Thread Fabian Hueske
I think this would be a very good feature. There's a pretty old JIRA for it [1]. It's even from pre-Apache times because it was imported from the original Github repository. Cheers, Fabian [1] https://issues.apache.org/jira/browse/FLINK-766 2018-05-14 16:46 GMT+02:00 Flavio Pompermaier : > Any

Re: Missing MapState when Timer fires after restored state

2018-05-16 Thread Stefan Richter
Hi, I think that is very good to know and fix. It feels a bit like a not so nice API design in RocksDB that iterators are required to check on two methods and the documentation of this is also newer than most of our RocksDB code, so an update there clearly makes sense. @Sihua: if you want to f

Re: Missing MapState when Timer fires after restored state

2018-05-16 Thread Juho Autio
Yes, I'm rescaling from a checkpoint. > that behavior is not guaranteed yet If restoring + rescaling a checkpoint is not supported properly, I don't understand why Flink doesn't entirely refuse to restore in that case? Note that I'm using a rather new 1.5-SNAPSHOT, not release 1.4. To be exact,

Re: Missing MapState when Timer fires after restored state

2018-05-16 Thread sihua zhou
Hi, Juho > If restoring + rescaling a checkpoint is not supported properly, I don't > understand why Flink doesn't entirely refuse to restore in that case? I think you're asking the question I have asked in https://github.com/apache/flink/pull/5490, you can refer to it and find the comments

Re: Missing MapState when Timer fires after restored state

2018-05-16 Thread Juho Autio
Thanks Sihua. Stefan wrote: "we do not want that user rely on their checkpoints to be rescalable" ( https://github.com/apache/flink/pull/5490#issuecomment-365887734) This raises a couple of questions: - Is it a bug though, that the state restoring goes wrong like it does for my job? Based on my e

Re: Flink does not read from some Kafka Partitions

2018-05-16 Thread Tzu-Li (Gordon) Tai
Hi, Timo is correct - partition discovery is supported by the consumer only starting from Flink 1.4. The expected behaviour without partition discovery on, is that the list of partitions picked up on the first execution of the job will be the list of subscribed partition across all executions. Wh

Re: Missing MapState when Timer fires after restored state

2018-05-16 Thread sihua zhou
Hi Juho, I'm agree with you that sometimes it seems like a nightmare to create a savepoint for a state heavy job, it's also the reason that we use the checkpoint to recover the job. In fact, in our cases, it often takes more that 10min to take a savepoint successfully...Even though, we didn't me

[Flink 1.5.0] BlobServer data for a job is not getting cleaned up at JM

2018-05-16 Thread Amit Jain
Hi, We are running Flink 1.5.0 rc3 with YARN as cluster manager and found Job Manager is getting killed due to out of disk error. Upon further analysis, we found blob server data for a job is not getting cleaned up. Right now, we wrote directory cleanup script based on directory creation time of

Re: [Flink 1.5.0] BlobServer data for a job is not getting cleaned up at JM

2018-05-16 Thread Chesnay Schepler
Please open a JIRA. On 16.05.2018 13:58, Amit Jain wrote: Hi, We are running Flink 1.5.0 rc3 with YARN as cluster manager and found Job Manager is getting killed due to out of disk error. Upon further analysis, we found blob server data for a job is not getting cleaned up. Right now, we wrote

Re: [Flink 1.5.0] BlobServer data for a job is not getting cleaned up at JM

2018-05-16 Thread Amit Jain
Thanks Chesnay for the quick reply. I raised the ticket https://issues.apache.org/jira/browse/FLINK-9381. On Wed, May 16, 2018 at 5:33 PM, Chesnay Schepler wrote: > Please open a JIRA. > > > On 16.05.2018 13:58, Amit Jain wrote: >> >> Hi, >> >> We are running Flink 1.5.0 rc3 with YARN as cluster

Re: Best way to clean-up states in memory

2018-05-16 Thread Fabian Hueske
Hi Ashish, I had a look at your Trigger and couldn't spot anything that would explain leaking state. You're properly cleaning up in clear(). However, I might have found the problem for the increasing state size. A window is only completely deleted when the time passes its end timestamp (Window.ma

Fwd: Decrease initial source read speed

2018-05-16 Thread Andrei Shumanski
Hi, I am trying to use Flink for data ingestion. Input is a Kafka topic with strings - paths to incoming archive files. The job is unpacking the archives, reads data in them, parses and stores data in another format. Everything works fine if the topic is empty at the beginning of execution and th

Re: Flink does not read from some Kafka Partitions

2018-05-16 Thread Ruby Andrews
Thank you both for your responses. Looks like I may have inadvertently used 1.3.1 libraries instead of 1.4. Ruby On Wed, May 16, 2018 at 3:12 AM Tzu-Li (Gordon) Tai wrote: > Hi, > > Timo is correct - partition discovery is supported by the consumer only > starting from Flink 1.4. > > The expect

Re: Akka heartbeat configurations

2018-05-16 Thread Bajaj, Abhinav
I had the same feeling. Thanks Timo for clarifying. ~ Abhinav From: Timo Walther Date: Tuesday, May 15, 2018 at 6:05 AM To: "user@flink.apache.org" Subject: Re: Akka heartbeat configurations Hi, increasing the time to detect a dead task manager usually increases the amount of elements that

How to sleep for 1 sec and then call keyBy for partitioning

2018-05-16 Thread Vijay Balakrishnan
Hi, Newbie question - What I am trying to do is the following: CameraWithCubeSource source sends data-containing tuples of (cameraNbr,TS). 1. Need to partition data by cameraNbr. *2. Then sleep for 1 sec to simulate a heavy process in the task.* *3. Then need to partition data by TS and finally get

Re: How to sleep for 1 sec and then call keyBy for partitioning

2018-05-16 Thread Jörn Franke
Just some advice - do not use sleep to simulate a heavy task. Use real data or generated data to simulate. This sleep is garbage from a software quality point of view. Furthermore, it is often forgotten etc. > On 16. May 2018, at 22:32, Vijay Balakrishnan wrote: > > Hi, > Newbie question - Wha

Re: DFS problem with removing checkpoint

2018-05-16 Thread Szymon Szczypiński
Hi, now i know why those files wasn't "remove". They remove but very slow. In my case(Flink 1.3) the problem is in line client.delete().inBackground(backgroundCallback, executor).forPath(path); where deletion is in background in executor pool where size is equal to 2. When i have more files/d

Re: How to sleep for 1 sec and then call keyBy for partitioning

2018-05-16 Thread Vijay Balakrishnan
Hi, This worked out after looking at https://stackoverflow.com/questions/44436401/some-puzzles-for-the-operator-parallelism-in-flink?rq=1 Why cannot I use setParallelism after keyBy-is it not an operator ? DataStream cameraWithCubeDataStream = env .addSource(new CameraWithCubeSource(camer

Re: Question about the behavior of TM when it lost the zookeeper client session in HA mode

2018-05-16 Thread Tony Wei
Hi Ufuk, Piotr Thanks for all of your replies. I knew that jobs are cancelled if the JM looses the connection to ZK, but JM didn't loose connection in my case. My job failed because of the exception from KafkaProducer. However, it happened before and after that exception that TM lost ZK connection