Re: Count windows missing last elements?

2016-04-22 Thread Konstantin Kulagin
I was trying to implement this (force Flink to handle all values from the input) but had no success... Probably I am not getting something about Flink's windowing mechanism. I've created my 'finishing' trigger, which is basically a copy of the purging trigger, but was not able to make it work:
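The intent behind such a 'finishing' trigger can be sketched in plain Python, with no Flink API involved (the names `CountWithFlushTrigger` and `END_OF_STREAM` are illustrative, not from Flink): fire-and-purge whenever the count is reached, and additionally fire on an end-of-stream marker so the last partial window is not dropped.

```python
# Sketch of the trigger decision logic only; a real Flink Trigger would
# implement onElement/onEventTime/clear and keep its count in managed state.
END_OF_STREAM = object()  # hypothetical sentinel marking end of input

class CountWithFlushTrigger:
    def __init__(self, max_count):
        self.max_count = max_count
        self.count = 0

    def on_element(self, element):
        if element is END_OF_STREAM:
            # Flush whatever is buffered, even if below max_count.
            fired = self.count > 0
            self.count = 0
            return "FIRE_AND_PURGE" if fired else "CONTINUE"
        self.count += 1
        if self.count >= self.max_count:
            self.count = 0
            return "FIRE_AND_PURGE"
        return "CONTINUE"

trigger = CountWithFlushTrigger(3)
for x in [1, 2, 3, 4, 5, END_OF_STREAM]:
    print(trigger.on_element(x))
```

With a window size of 3 and five elements, the trigger fires once at the third element and once more at end of stream, so elements 4 and 5 are emitted rather than lost.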

Re: Count windows missing last elements?

2016-04-22 Thread Konstantin Kulagin
No problem at all; there are not many Flink people and a lot of people asking questions - it must be hard to understand each person's issues :) Yes, it is not as easy as a 'contains' operator: I need to collect some amount of tuples in order to create an in-memory Lucene index. After that I will filter
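The batch-then-filter idea described here can be sketched as follows. This is a simplification under stated assumptions: `matches` is a plain predicate standing in for a query against the in-memory Lucene index the thread mentions, and the function name `filter_in_batches` is illustrative.

```python
# Collect batch_size tuples, "index" them, and emit only those matching
# the query; the trailing partial batch is flushed so no tuples are lost.
def filter_in_batches(tuples, batch_size, matches):
    batch, out = [], []
    for t in tuples:
        batch.append(t)
        if len(batch) == batch_size:
            out.extend(x for x in batch if matches(x))
            batch = []
    # Flush the final partial batch - the step the thread's count
    # windows were missing for the last chunk of a finite stream.
    out.extend(x for x in batch if matches(x))
    return out

result = filter_in_batches([(1, "a"), (2, "b"), (3, "a")], 2,
                           lambda t: t[1] == "a")
print(result)  # [(1, 'a'), (3, 'a')]
```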

Re: Count windows missing last elements?

2016-04-21 Thread Kostya Kulagin
Thanks for the reply. Maybe I need some advice in this case. My situation: we have a stream of data, generally speaking tuples where the long value is a unique key (i.e. there are no tuples with the same key). I need to filter out all tuples that do not match a certain Lucene query. Creating

Re: Count windows missing last elements?

2016-04-21 Thread Aljoscha Krettek
Hi, if you are using the windows not for their actual semantics, I would suggest not using count-based windows and also not using the *All windows. The *All windows are all non-parallel, i.e. you only ever get one parallel instance of your window operator, even if you have a huge cluster. Also,
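The non-parallelism point can be illustrated with a small sketch. A common workaround (assumed here, not spelled out in the preview) is to tag each element with an artificial key, e.g. round-robin, so that count windows run keyed and therefore in parallel across key groups; `keyed_count_windows` is an illustrative name, not a Flink API.

```python
from collections import defaultdict

# Simulate keyed count windows: elements are spread over `parallelism`
# artificial keys, and each key builds its own count windows, the way
# a keyed countWindow would run on parallel operator instances.
def keyed_count_windows(stream, parallelism, size):
    buffers = defaultdict(list)
    windows = []
    for i, x in enumerate(stream):
        key = i % parallelism  # artificial round-robin key
        buffers[key].append(x)
        if len(buffers[key]) == size:
            windows.append((key, buffers[key]))
            buffers[key] = []
    return windows

print(keyed_count_windows(range(8), 2, 2))
# [(0, [0, 2]), (1, [1, 3]), (0, [4, 6]), (1, [5, 7])]
```

With a single *All window operator, all eight elements would be buffered by one instance; here each key's windows can be processed independently.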

Re: Count windows missing last elements?

2016-04-21 Thread Kostya Kulagin
Maybe, if this is not the first time it has come up, it is worth considering adding this as an option? ;-) My use case: I have a pretty big amount of data, basically for ETL. It is finite, but it is big. I see it more as a stream than as a dataset. Also, I would re-use the same code for an infinite stream later... And

Re: Count windows missing last elements?

2016-04-21 Thread Aljoscha Krettek
People have wondered about that a few times, yes. My opinion is that a stream is potentially infinite and processing only stops for anomalous reasons: when the job crashes, or when stopping a job to later redeploy it. In those cases you would not want to flush out your data but keep it and restart

Count windows missing last elements?

2016-04-20 Thread Kostya Kulagin
I have a pretty big but finite stream and I need to be able to window it by number of elements. In this case, from my observations, Flink can 'skip' the latest chunk of data if it has fewer elements than the window size: StreamExecutionEnvironment env =
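The reported behavior can be reproduced with a minimal sketch of count-based windowing (plain Python, no Flink; `count_windows` and `flush_remainder` are illustrative names): a pure count trigger only fires when the buffer reaches the window size, so the trailing partial window is silently dropped unless it is flushed at end of input.

```python
# Group a finite stream into count windows of `size` elements.
def count_windows(stream, size, flush_remainder=False):
    buffer, windows = [], []
    for element in stream:
        buffer.append(element)
        if len(buffer) == size:      # count trigger fires
            windows.append(buffer)
            buffer = []
    if flush_remainder and buffer:   # end-of-input flush
        windows.append(buffer)
    return windows

# Count-trigger-only behavior: elements 9 and 10 are never emitted.
print(count_windows(range(1, 11), 4))
# [[1, 2, 3, 4], [5, 6, 7, 8]]

# With an end-of-input flush, the partial window survives.
print(count_windows(range(1, 11), 4, flush_remainder=True))
# [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```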