Hi Tao,

There is no built-in mechanism for this in Flink. Throttling streams to create back pressure is generally not a good idea because it prevents checkpoint barriers (which must be aligned with the events) from arriving at the operators. This might cause checkpoints to time out.
The only way to prevent this is to not emit records from a source. Once a record is emitted, it should be processed if possible.

Best,
Fabian

2018-04-30 23:15 GMT+02:00 Tao Xia <[email protected]>:

> I am running into a problem when processing the past 7 days of data from
> multiple streams. I am trying to union the streams based on event
> timestamp.
>
> The problem is that some streams are significantly bigger than other
> streams. For example, one stream has 1,000 events/sec and the other stream
> has 1,000,000 events/sec.
>
> I am using a PriorityQueue to sort the events based on event timestamp.
> Since the watermarks of the fast (smaller) streams move much faster than
> those of the slow (bigger) streams, lots of events from the faster streams
> end up in the queue waiting for the slower streams to catch up, and the
> job eventually runs out of memory.
>
> Is there any way we can apply back pressure to the fast streams so they
> slow down, or somehow coordinate the watermarks between all the streams?
>
> I am planning to use external storage to track the low watermarks
> across all the streams, so we don't read events we cannot handle into
> the PriorityQueue.
>
> Any better suggestions?
>
> Thanks,
> Tao
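
To make the "do not emit from the source" idea a bit more concrete, here is a minimal sketch in plain Java. It is not a Flink API: `EmissionGate`, `tryEmit`, `markFinished`, and the `maxDriftMillis` parameter are hypothetical names made up for illustration. The sketch assumes all sources can see one shared object, which only holds inside a single JVM; in a distributed job the same bookkeeping would have to live in external storage, as proposed in the quoted message.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Flink): lets a source emit a record only
 * if its event time is not too far ahead of the slowest source.
 */
final class EmissionGate {
    private final long[] lastEmitted;   // highest event time emitted per source
    private final long maxDriftMillis;  // allowed lead over the slowest source

    EmissionGate(int numSources, long maxDriftMillis) {
        if (maxDriftMillis < 0) {
            throw new IllegalArgumentException("maxDriftMillis must be >= 0");
        }
        this.lastEmitted = new long[numSources];
        Arrays.fill(this.lastEmitted, Long.MIN_VALUE); // nothing emitted yet
        this.maxDriftMillis = maxDriftMillis;
    }

    /** Lowest event time reached by any source (the "low watermark"). */
    synchronized long lowWatermark() {
        long min = Long.MAX_VALUE;
        for (long t : lastEmitted) {
            min = Math.min(min, t);
        }
        return min;
    }

    /**
     * Returns true and records the timestamp if the source may emit now.
     * Returns false if the source is too far ahead and should wait and retry.
     */
    synchronized boolean tryEmit(int source, long eventTime) {
        long low = lowWatermark();
        // The slowest source may always emit, otherwise nobody could advance.
        boolean isSlowest = lastEmitted[source] == low;
        if (isSlowest || eventTime <= low + maxDriftMillis) {
            lastEmitted[source] = Math.max(lastEmitted[source], eventTime);
            return true;
        }
        return false;
    }

    /** A finished source must no longer hold the others back. */
    synchronized void markFinished(int source) {
        lastEmitted[source] = Long.MAX_VALUE;
    }
}
```

A source loop would call `tryEmit` before handing a record downstream and simply wait (for example, sleep briefly) and retry when it returns false. Because the record is held back inside the source rather than in a downstream operator, nothing is buffered in the PriorityQueue for it, and the amount of data waiting in the queue is bounded by roughly `maxDriftMillis` worth of events from the faster streams.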
