Hi Tao,

There's no built-in mechanism for this in Flink.
Throttling streams and creating back pressure is not a good idea in general
because it prevents checkpoint barriers (which must be aligned with the
events) from arriving at the operators.
This might cause checkpoints to time out.

The only way to prevent this is to not emit records from a source.
Once a record is emitted, it should be processed if possible.
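
To illustrate holding records back at the source, here is a minimal sketch
in plain Java. It is not a Flink API: EmissionGate, tryEmit, finish and
maxLeadMillis are hypothetical names, and the sketch assumes all sources
can see one shared object. In a distributed job that shared state would
have to live somewhere all source instances can reach, for example the
external storage you mention. Each source reports the timestamp of the
next record it wants to emit, and may emit it only if that timestamp is
not too far ahead of the slowest source.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Flink): gates emission by event time. */
final class EmissionGate {
    // How far (in ms of event time) a source may run ahead of the slowest one.
    private final long maxLeadMillis;
    // Timestamp of the next record each active source is waiting to emit.
    private final Map<String, Long> pending = new HashMap<>();

    EmissionGate(long maxLeadMillis) {
        this.maxLeadMillis = maxLeadMillis;
    }

    /**
     * Records the timestamp of the source's next record and returns true if
     * the source may emit it now. The source holding the smallest pending
     * timestamp is always allowed, so the sources cannot block each other
     * forever.
     */
    synchronized boolean tryEmit(String sourceId, long nextTimestamp) {
        pending.put(sourceId, nextTimestamp);
        long low = Collections.min(pending.values());
        return nextTimestamp - low <= maxLeadMillis;
    }

    /** Removes a source that has no more records so it stops holding others back. */
    synchronized void finish(String sourceId) {
        pending.remove(sourceId);
    }

    public static void main(String[] args) {
        EmissionGate gate = new EmissionGate(500);
        System.out.println(gate.tryEmit("slow", 0));     // slowest source
        System.out.println(gate.tryEmit("fast", 1000));  // 1000 ms ahead of "slow"
        System.out.println(gate.tryEmit("slow", 600));   // "slow" advances
        System.out.println(gate.tryEmit("fast", 1000));  // now 400 ms ahead
    }
}
```

A source would call tryEmit before emitting each record and, if it returns
false, wait briefly and ask again instead of emitting. Because the record
is never emitted, nothing piles up in the PriorityQueue downstream.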

Best, Fabian

2018-04-30 23:15 GMT+02:00 Tao Xia <[email protected]>:

> I am running into a problem when processing the past 7 days of data from
> multiple streams.  I am trying to union the streams based on event
> timestamp.
>
> The problem is that some streams are significantly bigger than other
> streams. For example, one stream has 1,000 events/sec and the other stream
> has 1,000,000 events/sec.
>
> I am using a PriorityQueue to sort the events based on event timestamp.
> Since the watermarks of the fast (smaller) streams move much faster than
> those of the slow (bigger) streams, lots of events from the faster streams
> end up in the queue waiting for the slower streams to catch up, and
> eventually the job runs out of memory.
>
> Is there any way we can apply back pressure to the fast streams so they
> slow down? Or some way to coordinate the watermarks between all the streams?
>
> I am planning to use external storage to track the low watermark
> across all the streams, so we don't read events we cannot handle into
> the PriorityQueue.
>
> Any better suggestions?
>
> Thanks,
> Tao
>
