Also note that if I were to start 2 pipelines:

1. one working off the head of the topic, and thus not prone to the
   pathological case described above, and
2. one doing a replay, and thus prone to the pathological case
   described above.
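As a toy illustration of how these two pipelines can interfere, here is a minimal Python sketch of head-of-line blocking (this is my assumed model of channels multiplexed over one bounded connection, not Flink's actual network stack; `run`, `drain_b`, and the frame layout are all hypothetical): frames are delivered in arrival order, so a channel whose consumer stops draining blocks its siblings' frames queued behind it.

```python
from collections import deque

# Toy model (an assumption about the mechanism, not Flink's real network
# stack): two logical channels, "A" and "B", share one bounded connection.
# Frames are read off the connection strictly in order, so if channel B's
# consumer stops draining, A's frames queued behind B's cannot be
# delivered either: head-of-line blocking.

def run(frames, drain_b):
    """frames: list of channel ids in arrival order.
    drain_b: whether channel B's consumer is draining."""
    connection = deque(frames)
    delivered_a = 0
    stalled = False
    while connection:
        ch = connection[0]
        if ch == "B" and not drain_b:
            stalled = True   # B backpressures; A's frames behind it wait
            break
        connection.popleft()
        if ch == "A":
            delivered_a += 1
    return delivered_a, stalled

frames = ["A", "A", "B", "A", "A"]
print(run(frames, drain_b=True))   # both consumers drain: all A frames arrive
print(run(frames, drain_b=False))  # B stalls: A frames behind B are stuck
```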
Then the 2nd pipeline will stall the 1st. This seems to point to the "All channels multiplexed into the same TCP connection stall together, as soon as one channel has backpressure." part of the JIRA issue. This has to be a priority IMHO, in a shared VM where jobs should have at least some isolation.

On Tue, Jan 2, 2018 at 2:19 PM, Vishal Santoshi <vishal.santo...@gmail.com> wrote:

> Thank you.
>
> On Tue, Jan 2, 2018 at 1:31 PM, Nico Kruber <n...@data-artisans.com> wrote:
>
>> Hi Vishal,
>> let me already point you towards the JIRA issue for the credit-based
>> flow control: https://issues.apache.org/jira/browse/FLINK-7282
>>
>> I'll have a look at the rest of this email thread tomorrow...
>>
>> Regards,
>> Nico
>>
>> On 02/01/18 17:52, Vishal Santoshi wrote:
>> > Could you please point me to any documentation on the "credit-based
>> > flow control" approach...
>> >
>> > On Tue, Jan 2, 2018 at 10:35 AM, Timo Walther <twal...@apache.org> wrote:
>> >
>> > Hi Vishal,
>> >
>> > your assumptions sound reasonable to me. The community is currently
>> > working on more fine-grained back pressuring with credit-based
>> > flow control. It is on the roadmap for 1.5 [1]/[2]. I will loop in
>> > Nico, who might be able to tell you more about the details. Until
>> > then, I guess you have to implement a custom source / adapt an
>> > existing source to let the data flow in more realistically.
>> >
>> > Regards,
>> > Timo
>> >
>> > [1] http://flink.apache.org/news/2017/11/22/release-1.4-and-1.5-timeline.html
>> > [2] https://www.youtube.com/watch?v=scStdhz9FHc
>> >
>> > Am 1/2/18 um 4:02 PM schrieb Vishal Santoshi:
>> >
>> > I did a simulation on session windows ( in 2 modes ) and let it
>> > rip for about 12 hours:
>> >
>> > 1.
>> >    Replay, where a kafka topic with a retention of 7 days was the
>> >    source ( earliest ).
>> > 2. Start the pipe with the kafka source ( latest ).
>> >
>> > I saw results that differed dramatically.
>> >
>> > On replay the pipeline stalled after a good ramp-up, while in the
>> > second case the pipeline hummed along without issues. For the same
>> > time period, significantly more data was consumed in the second
>> > case, while in the first case the WM progression stalled with no
>> > hint of resolution ( the incoming data on the source topic far
>> > outstrips the WM progression ). I think I know the reasons, and
>> > this is my hypothesis.
>> >
>> > In replay mode the number of open windows has no upper bound.
>> > While buffer exhaustion ( and the data in flight ahead of the
>> > watermark ) is what throttles consumption, it does not really
>> > limit the number of open windows, and in fact creates windows for
>> > futuristic data ( future relative to the current WM ). So if
>> > partition x has data at watermark time t(x) and partition y at
>> > watermark time t(y), with t(x) << t(y), the overall watermark is
>> > t(x), yet nothing significantly throttles consumption from the y
>> > partition ( in fact from x too ). The bounded-buffer approach does
>> > not give the fine-grained control one would hope for, AFAIK. That
>> > implies there are far more open windows than the system can
>> > handle, which leads to the pathological case where the buffers
>> > fill up ( I believe that happens way late ) and throttling occurs,
>> > but the WM does not advance, so the windows that could ease the
>> > glut cannot proceed. In replay mode the amount of data means the
>> > Fetchers keep pulling data at the maximum rate the open-ended
>> > buffer approach allows.
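For what it's worth, the hypothesis above can be sketched in plain Python (a toy model with hypothetical timestamps and a made-up window size, not Flink code): the operator-level watermark is the minimum of the per-partition watermarks, so a lagging partition x pins the overall WM at t(x), while nothing stops the fetcher from pulling far-future data from partition y, and every such future element opens a window that cannot fire.

```python
# Toy model of the watermark skew described above (hypothetical numbers,
# not Flink code). The operator watermark is the minimum over the
# per-partition watermarks; a tumbling window fires only once the
# watermark passes its end.

WINDOW = 10  # tumbling window size in event-time units

def simulate(partition_times):
    """partition_times: dict partition -> list of event timestamps consumed.
    Returns (overall watermark, windows still open, windows fired)."""
    # per-partition watermark = max timestamp seen on that partition
    watermarks = {p: max(ts) for p, ts in partition_times.items()}
    overall_wm = min(watermarks.values())      # t(x) << t(y)  ->  t(x) wins
    open_windows = set()
    for ts_list in partition_times.values():
        for t in ts_list:
            open_windows.add(t // WINDOW)      # window the event falls into
    fired = {w for w in open_windows if (w + 1) * WINDOW <= overall_wm}
    open_windows -= fired
    return overall_wm, len(open_windows), len(fired)

# partition x lags badly; partition y is consumed far into the "future"
events = {"x": [3, 7], "y": [100, 250, 400, 990]}
wm, still_open, fired = simulate(events)
print(wm, still_open, fired)  # watermark pinned at 7; future windows pile up
```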
>> >
>> > My question thus is: is there any way to have finer control of
>> > back pressure, where consumption from a source is throttled
>> > preemptively ( for example by decreasing the number or the size of
>> > the buffers associated with a pipe ), or sleeps in the Fetcher
>> > code, that could help align the performance with real-time
>> > consumption characteristics?
>> >
>> > Regards,
>> >
>> > Vishal.
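One shape such preemptive throttling could take, sketched as a toy fetcher loop in Python (a hypothetical design with made-up names like `fetch_round` and `MAX_SKEW`, not Flink's actual Fetcher API): pause any partition whose local event time runs more than a bound ahead of the slowest partition, instead of waiting for network buffers to fill.

```python
# Sketch of preemptive, event-time-aligned throttling in a fetcher loop
# (a hypothetical design, not Flink's actual Fetcher API): a partition is
# paused as soon as its next batch would run more than MAX_SKEW ahead of
# the slowest partition, instead of waiting for the buffers to fill.

MAX_SKEW = 50  # max allowed event-time lead over the slowest partition

def fetch_round(partition_clocks, batches):
    """partition_clocks: dict partition -> last event time emitted.
    batches: dict partition -> max event time of the next batch.
    Returns the set of partitions allowed to fetch this round."""
    slowest = min(partition_clocks.values())
    allowed = set()
    for p, next_t in batches.items():
        if next_t - slowest <= MAX_SKEW:
            allowed.add(p)   # within the bound: keep consuming
        # else: skip p this round; it resumes once the slowest catches up
    return allowed

clocks = {"x": 7, "y": 400}
next_batches = {"x": 12, "y": 460}
print(fetch_round(clocks, next_batches))  # only "x" may fetch; "y" is paused
```

The effect is that the watermark skew between partitions stays bounded, so the number of open windows stays bounded too, rather than being limited only by buffer exhaustion.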