Also note that if I were to start 2 pipelines:

1. One working off the head of the topic and thus not prone to the
pathological case described above
2. One doing a replay and thus prone to the pathological case described above

Then the 2nd pipeline will stall the 1st. This seems to point to the
following symptom from the JIRA issue:

   - All channels multiplexed into the same TCP connection stall together,
   as soon as one channel has backpressure.

This has to be a priority IMHO; in a shared VM, jobs should have at least
some isolation.
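The "all channels stall together" symptom can be reproduced outside Flink with a toy model. The sketch below (plain Python, purely hypothetical and not Flink internals) multiplexes two logical channels into one bounded queue standing in for the shared TCP connection; as soon as the consumer of channel "B" is slow, the queue fills and channel "A" stalls with it:

```python
import queue
import threading
import time

def run(channels, seconds=1.0, slow_channel="B"):
    """Produce on each channel into one shared bounded queue; drain with
    a single consumer that is slow only for `slow_channel`."""
    shared = queue.Queue(maxsize=4)        # the shared, bounded "connection"
    delivered = {c: 0 for c in channels}
    stop = threading.Event()

    def producer(channel):
        while not stop.is_set():
            try:
                shared.put(channel, timeout=0.05)
            except queue.Full:
                pass                       # backpressure: this producer stalls

    def consumer():
        while not stop.is_set():
            try:
                channel = shared.get(timeout=0.05)
            except queue.Empty:
                continue
            if channel == slow_channel:
                time.sleep(0.02)           # B's downstream is slow
            delivered[channel] += 1

    threads = [threading.Thread(target=producer, args=(c,)) for c in channels]
    threads.append(threading.Thread(target=consumer))
    for t in threads:
        t.start()
    time.sleep(seconds)
    stop.set()
    for t in threads:
        t.join()
    return delivered

alone = run(["A"])["A"]                 # A with the connection to itself
multiplexed = run(["A", "B"])["A"]      # A multiplexed with a slow B
print(f"A alone: {alone}; A next to a slow B: {multiplexed}")
```

Channel A's throughput collapses even though nothing downstream of A is slow, which is exactly the head-of-line blocking the JIRA issue (and credit-based flow control) is about.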

On Tue, Jan 2, 2018 at 2:19 PM, Vishal Santoshi <vishal.santo...@gmail.com>
wrote:

> Thank you.
>
> On Tue, Jan 2, 2018 at 1:31 PM, Nico Kruber <n...@data-artisans.com>
> wrote:
>
>> Hi Vishal,
>> let me already point you towards the JIRA issue for the credit-based
>> flow control: https://issues.apache.org/jira/browse/FLINK-7282
>>
>> I'll have a look at the rest of this email thread tomorrow...
>>
>>
>> Regards,
>> Nico
>>
>> On 02/01/18 17:52, Vishal Santoshi wrote:
>> > Could you please point me to any documentation on the "credit-based
>> > flow control" approach....
>> >
>> > On Tue, Jan 2, 2018 at 10:35 AM, Timo Walther <twal...@apache.org
>> > <mailto:twal...@apache.org>> wrote:
>> >
>> >     Hi Vishal,
>> >
>> >     your assumptions sound reasonable to me. The community is currently
>> >     working on more fine-grained back pressure handling with
>> >     credit-based flow control. It is on the roadmap for 1.5 [1]/[2].
>> >     I will loop in Nico, who might tell you more about the details.
>> >     Until then, I guess you have to implement a custom source/adapt an
>> >     existing source to let the data flow in more realistically.
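The custom-source idea above could be sketched like this (hypothetical, plain Python rather than an actual Flink source): replay records no faster than their event time advanced originally, so a multi-day backlog flows like live data instead of arriving as one burst.

```python
import time

def realtime_replay(records, speedup=1.0):
    """records: iterable of (event_time_seconds, payload), ascending.
    Yields payloads paced so event time tracks wall time / speedup."""
    start_wall = time.monotonic()
    start_event = None
    for event_time, payload in records:
        if start_event is None:
            start_event = event_time
        # Wall-clock offset at which this record should be emitted:
        due = (event_time - start_event) / speedup
        delay = due - (time.monotonic() - start_wall)
        if delay > 0:
            time.sleep(delay)
        yield payload

records = [(t * 0.01, f"r{t}") for t in range(5)]  # 10 ms apart in event time
t0 = time.monotonic()
out = list(realtime_replay(records))
elapsed = time.monotonic() - t0
print(out, f"took ~{elapsed:.2f}s")
```

A real adaptation would wrap the Kafka fetch loop the same way; `speedup` lets a replay run, say, 10x faster than real time while still keeping event time and processing time roughly aligned.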
>> >
>> >     Regards,
>> >     Timo
>> >
>> >     [1] http://flink.apache.org/news/2017/11/22/release-1.4-and-1.5-timeline.html
>> >     [2] https://www.youtube.com/watch?v=scStdhz9FHc
>> >
>> >
>> >     On 1/2/18 at 4:02 PM, Vishal Santoshi wrote:
>> >
>> >         I did a simulation on session windows ( in 2 modes ) and let it
>> >         rip for about 12 hours
>> >
>> >         1. Replay where a kafka topic with retention of 7 days was the
>> >         source ( earliest )
>> >         2. Start the pipe with kafka source ( latest )
>> >
>> >         I saw results that differed dramatically.
>> >
>> >         On replay the pipeline stalled after a good ramp-up, while in
>> >         the second case the pipeline hummed on without issues. For the
>> >         same time period, significantly more data was consumed in the
>> >         second case; in the first case the WM progression stalled with
>> >         no hint of resolution ( the incoming data on the source topic
>> >         far outstrips the WM progression ). I think I know the
>> >         reasons, and this is my hypothesis.
>> >
>> >         In replay mode the number of open windows has no upper bound.
>> >         While buffer exhaustion ( and data in flight with the
>> >         watermark ) is the throttling mechanism, it does not really
>> >         limit the open windows and in fact creates windows that
>> >         reflect futuristic data ( future relative to the current WM ).
>> >         So if partition x has data at watermark time t(x) and
>> >         partition y at watermark time t(y), with t(x) << t(y), the
>> >         overall watermark is t(x), and nothing significantly throttles
>> >         consumption from partition y ( in fact not from x either ).
>> >         The bounded-buffer approach does not give minute control AFAIK,
>> >         as one would hope; that implies there are far more open windows
>> >         than the system can handle, which leads to the pathological
>> >         case where the buffers fill up ( I believe way too late ) and
>> >         throttling kicks in, but the WM does not advance, and the
>> >         windows that could ease the glut cannot proceed..... In replay
>> >         mode the amount of data implies that the Fetchers keep pulling
>> >         data at the maximum consumption allowed by the open-ended
>> >         buffer approach.
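The hypothesis above can be made concrete with a back-of-the-envelope sketch (plain Python, not Flink code): the overall watermark is the minimum over partitions, so while partition y's event time races ahead of partition x, every extra record pulled from y opens another "futuristic" window that cannot fire until x catches up.

```python
# Partition x advances event time by 1 per record; y advances 10x faster.
t_x = list(range(0, 100))
t_y = list(range(0, 1000, 10))

for n in (10, 50, 100):                        # records pulled per partition
    wm = min(t_x[n - 1], t_y[n - 1])           # overall WM = slowest partition
    pulled = t_x[:n] + t_y[:n]
    future = sum(1 for t in pulled if t > wm)  # windows past the watermark
    print(f"pulled {n}/partition: watermark={wm}, futuristic windows={future}")
```

The count of windows beyond the watermark grows linearly with the records fetched, while the watermark itself only crawls at the pace of the slow partition; nothing in the bounded-buffer model caps that growth until the buffers finally fill.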
>> >
>> >         My question thus is: is there any way to have finer control of
>> >         back pressure, wherein consumption from a source is throttled
>> >         preemptively ( for example by decreasing the buffers associated
>> >         with a pipe, or their allocated size ), or sleeps in the
>> >         Fetcher code that could help align the performance with
>> >         real-time consumption characteristics?
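One shape such a preemptive throttle could take (hypothetical, not an existing Flink option): a fetcher that pauses a partition whenever its event time runs more than a bound past the global watermark, instead of waiting for network buffers to fill. A minimal sketch, with the skip standing in for a sleep in the Fetcher:

```python
def throttled_fetch(partitions, max_ahead=50):
    """partitions: dict name -> iterable of ascending event timestamps.
    Pulls round-robin, but skips (throttles) any partition whose latest
    event time is more than max_ahead past the global watermark."""
    iters = {p: iter(src) for p, src in partitions.items()}
    latest = {p: 0 for p in partitions}
    active = set(iters)
    out = []
    while active:
        watermark = min(latest[p] for p in active)   # WM = slowest partition
        for p in sorted(active):
            if latest[p] - watermark > max_ahead:
                continue                   # throttle: this partition waits
            try:
                ts = next(iters[p])
            except StopIteration:
                active.discard(p)
                continue
            latest[p] = ts
            out.append((p, ts))
    return out

# Partition y's event time advances 10x faster than x's:
events = throttled_fetch({"x": range(0, 100), "y": range(0, 1000, 10)})
print(f"fetched {len(events)} records, skew kept within the bound")
```

With the throttle, the fast partition is never allowed to run away from the watermark, so the set of open windows stays bounded; this is essentially the alignment that credit-based flow control (and later, watermark alignment in the sources) aims to give without hand-rolled sleeps.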
>> >
>> >         Regards,
>> >
>> >         Vishal.
>> >
>> >
>> >
>> >
>> >
>> >
>>
>>
>
