Hi Akshay, Could you be more specific on what you are trying to achieve with this scheme?
I am asking because if your source is too fast and you want it to slow it down so that it produces data at the same rate as your process function can consume them, then Flink's backpressure will eventually do this. If you want your process function to discard incoming elements (and not queue them) if it is in the process of processing another element, then this implies a multithreaded process function and I would look maybe towards the AsyncIO [1] pattern with the AsyncFunction somehow setting a flag as busy while processing and as false when it is done and ready to process the next element. Also, in order to help, I would need more information about the stream being keyed or non-keyed and the parallelism of the source compared to that of the process function. I hope this helps, Kostas [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/asyncio.html On Wed, Feb 12, 2020 at 3:34 AM Akshay Shinde <akshay.shi...@oracle.com> wrote: > > Hi Community > > In our Flink job, in source we are creating our own stream to process n > number of objects per 2 minutes. And in process function for each object from > generated source stream we are doing some operation which we expect to get > finished in 2 minutes. > > Every 2 minutes we are generating same ’N’ objects in stream which process > function will process. But in some cases process function is taking longer > time around 10 minutes. In this case stream will have 5 number of sets for > ’N’ objects as process function is waiting for 10 minutes as source is adding > ’N’ objects in stream at every 2 minutes. Problem is we don’t want to process > these objects 5 times, we want it to process only once for the latest ’N’ > objects. > > This lag can be more or less from process function which results in lag from > source to process function in job execution. > > > Thanks in advance !!!