Hi,

I’m glad to hear that you are interested in Flink! :)

>  In the picture, the keyBy, window and apply operators share the same circle. Is 
> it because these operators are chained together? 

It’s not so much about chaining: the chain of DataStream API invocations 
`someStream.keyBy(…).window(…).apply(…)` creates a single logical operation - 
none of these calls makes sense on its own, but together they define how a 
single `WindowOperator` should behave (`keyBy` additionally defines how records 
should be shuffled).
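
For illustration, here is a minimal, self-contained Java DataStream sketch of 
such a chain of calls (the event type, keys and window size are made up for the 
example); at runtime the three calls end up configuring one `WindowOperator`:

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class KeyedWindowExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // a tiny in-memory source, just for illustration
        DataStream<Tuple2<String, Integer>> someStream =
                env.fromElements(Tuple2.of("a", 1), Tuple2.of("b", 2), Tuple2.of("a", 3));

        someStream
                // keyBy only defines how records are shuffled (partitioned by key) …
                .keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) {
                        return value.f0;
                    }
                })
                // … window defines how the WindowOperator groups records in time …
                .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
                // … and apply defines what the WindowOperator computes per key and window.
                .apply(new WindowFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, String, TimeWindow>() {
                    @Override
                    public void apply(String key, TimeWindow window,
                                      Iterable<Tuple2<String, Integer>> input,
                                      Collector<Tuple2<String, Integer>> out) {
                        int sum = 0;
                        for (Tuple2<String, Integer> record : input) {
                            sum += record.f1;
                        }
                        out.collect(Tuple2.of(key, sum));
                    }
                })
                .print();

        env.execute("keyBy/window/apply example");
    }
}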

Chaining usually happens between “shuffling” boundaries, i.e. a shuffle breaks 
the chain. So for example:

someStream
        .map(…) // first chain …
        .filter(…) // … still first chain
        .keyBy(…) // operator chain boundary
        .window(…).apply(…) // beginning of 2nd chain
        .map(…) // 2nd chain
        .filter(…) // still 2nd chain …
        .keyBy(…) // operator chain boundary
        (…)
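
Besides these automatic boundaries at shuffles, chaining can also be controlled 
explicitly. Here is a minimal Java sketch (the pipeline itself is made up; the 
relevant hooks are `startNewChain()`, `disableChaining()` and 
`env.disableOperatorChaining()`):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainingExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // env.disableOperatorChaining(); // turn chaining off globally, e.g. for debugging

        env.fromElements("a", "b", "c")
                .map(String::toUpperCase)   // first chain …
                .filter(s -> !s.isEmpty())  // … still first chain
                .map(s -> s + "!")
                .startNewChain()            // force a chain boundary here, even without a shuffle
                // (.disableChaining() would instead prevent the operator from being chained at all)
                .print();

        env.execute("chaining example");
    }
}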

>  the repartitioning process happens both inside a TaskManager and between 
> TaskManagers. Inside a TaskManager, the data transfer overhead may be just in 
> memory.

Yes, exactly. Data transfer happens in-memory, however records are still being 
serialised and deserialised over “local input channels” (that’s what we call the 
communication between operators inside a single TaskManager).

> Between TaskManagers, the data transfer overhead may be inter-process or 
> inter-container, depending on how I deploy the Flink cluster. Is my 
> understanding right? 

Yes, between TaskManagers network sockets are always used, regardless of whether 
they run on the same physical machine (localhost) or not.

> Are these details highly related to the Actor model? I have little knowledge 
> of the Actor model.

I’m not sure if I fully understand your question. Flink is not using the Actor 
model for the data pipelines (actors, via Akka, are only used internally for the 
RPC/coordination between the JobManager and TaskManagers).

I hope that helps :)

Piotrek

> On 24 Nov 2019, at 07:35, Lu Weizheng <luweizhen...@hotmail.com> wrote:
> 
> Hi all,
> 
> I have been following Flink for about half a year and have read the official 
> documents several times. I have already got a comprehensive understanding of 
> Flink’s distributed runtime architecture, but still have some questions that 
> need to be clarified.
> 
> <PastedGraphic-2.png>
> 
> On the Flink documentation website, this picture shows the dataflow model of 
> Flink. In the picture, the keyBy, window and apply operators share the same 
> circle. Is it because these operators are chained together? 
> 
> <PastedGraphic-3.png>
> 
> In the parallelized view, the data stream is partitioned into multiple 
> partitions. Each partition is a subset of the source data. Repartitioning 
> happens when we use the keyBy operator. If these tasks share task slots and run 
> like in the picture above, the repartitioning process happens both inside a 
> TaskManager and between TaskManagers. Inside a TaskManager, the data transfer 
> overhead may be just in memory. Between TaskManagers, the data transfer 
> overhead may be inter-process or inter-container, depending on how I deploy the 
> Flink cluster. Is my understanding right? Are these details highly related to 
> the Actor model? I have little knowledge of the Actor model.
> 
> This is my first time using the Flink mailing list. Thank you so much if 
> anyone can explain it.
> 
> Weizheng
