Re: Flink doesn't free YARN slots after restarting

2017-09-14 Thread Till Rohrmann
Sorry for my late answer Bowen, I think this only works if you implement your own WindowAssigner. With the built-in sliding window this is not possible since all windows have the same offset. Cheers, Till On Fri, Aug 25, 2017 at 9:44 AM, Bowen Li wrote: > Hi Till, >

Re: Flink doesn't free YARN slots after restarting

2017-08-25 Thread Bowen Li
Hi Till, What I mean is: can the sliding windows for different item have different start time? Here's an example of what we want: - for item A: its first event arrives at 2017/8/24-01:*12:24*, so the 1st window should be 2017/8/24-01:*12:24* - 2017/8/25-01:*12:23*, the 2nd window would be

Re: Flink doesn't free YARN slots after restarting

2017-08-25 Thread Till Rohrmann
Hi Bowen, having a sliding window of one day with a slide of one hour basically means that each window is closed after 24 hours and the next closing happens one hour later. Only when the window is closed/triggered, you compute the window function which generates the window output. That's why you

Re: Flink doesn't free YARN slots after restarting

2017-08-24 Thread Bowen Li
Hi Till, Thank you very much for looking into it! According to our investigation, this is indeed a Kinesis issue. Flink (FlinkKinesisProducer) uses KPL(Kinesis Producer Library), but hasn't tune it up yet. I have identified a bunch of issues, opened the following Flink tickets, and are working on

Re: Flink doesn't free YARN slots after restarting

2017-08-22 Thread Till Rohrmann
Hi Bowen, sorry for my late answer. I dug through some of the logs and it seems that you have the following problem: 1. Once in a while the Kinesis producer fails with a UserRecordFailedException saying “Expired while waiting in HttpClient queue Record has reached expiration”. This

Re: Flink doesn't free YARN slots after restarting

2017-08-10 Thread Bowen Li
Hi Till, Any idea why it happened? I've tried different configurations for configuring our Flink cluster, but the cluster always fails after 4 or 5 hours. According to the log, looks like the total number of slots becomes 0 at the end, and YarnClusterClient shuts down application master

Re: Flink doesn't free YARN slots after restarting

2017-08-09 Thread Bowen Li
Hi Till, Thanks for taking this issue. We are not comfortable sending logs to a email list which is this open. I'll send logs to you. Thanks, Bowen On Wed, Aug 9, 2017 at 2:46 AM, Till Rohrmann wrote: > Hi Bowen, > > if I'm not mistaken, then Flink's current

Re: Flink doesn't free YARN slots after restarting

2017-08-09 Thread Till Rohrmann
Hi Bowen, if I'm not mistaken, then Flink's current Yarn implementation does not actively releases containers. The `YarnFlinkResourceManager` is started with a fixed number of containers it always tries to acquire. If a container should die, then it will request a new one. In case of a failure

Flink doesn't free YARN slots after restarting

2017-08-09 Thread Bowen Li
Hi guys, I was running a Flink job (12 parallelism) on an EMR cluster with 48 YARN slots. When the job starts, I can see from Flink UI that the job took 12 slots, and 36 slots were left available. I would expect that when the job fails, it would restart from checkpointing by taking