Hi, I’m glad to hear that you are interested in Flink! :)

> In the picture, keyBy window and apply operators share the same circle.
> Is it because these operators are chained together?

It’s not so much about chaining. The chain of DataStream API invocations `someStream.keyBy(…).window(…).apply(…)` creates a single logical operation: each call on its own doesn’t make sense, but together they define how a single `WindowOperator` should behave (`keyBy` additionally defines how records should be shuffled).

Chaining _usually_ happens between “shuffling” boundaries. So for example:

    someStream
        .map(…)                // first chain
        …
        .filter(…)             // … still first chain
        .keyBy(…)              // operator chain boundary
        .window(…).apply(…)    // beginning of 2nd chain
        .map(…)                // 2nd chain
        .filter(…)             // still 2nd chain
        …
        .keyBy(…)              // operator chain boundary
        (…)

> the repartition process happens both inside TaskManager and between
> TaskManager. Inside TaskManager, the data transfer overhead may be just in
> memory.

Yes, exactly. Data transfer happens in memory; however, records are still serialised and deserialised when they pass through “local input channels” (that’s what we call communication between operators inside a single TaskManager).

> Between TaskManager, the data transfer overhead may be inter-process or
> inter-container, depending on how I deploy the Flink cluster. Is my
> understanding right?

Yes, between TaskManagers network sockets are always used, regardless of whether the TaskManagers run on one physical machine (localhost) or not.

> These details may be highly related to Actor model? As I have little
> knowledge of Actor model.

I’m not sure I fully understand your question. Flink does not use the Actor model for its data pipelines.

I hope that helps :)

Piotrek

> On 24 Nov 2019, at 07:35, Lu Weizheng <luweizhen...@hotmail.com> wrote:
>
> Hi all,
>
> I have been following Flink for about half a year and have read the
> official documents several times.
> I have already got a comprehensive understanding of Flink's distributed
> runtime architecture, but still have some questions that need to be
> clarified.
>
> <PastedGraphic-2.png>
>
> On the Flink documentation website, this picture shows the dataflow model
> of Flink. In the picture, keyBy window and apply operators share the same
> circle. Is it because these operators are chained together?
>
> <PastedGraphic-3.png>
>
> In the parallelized view, the data stream is partitioned into multiple
> partitions. Each partition is a subset of the source data. Repartition
> happens when we use the keyBy operator. If these tasks share task slots and
> run like in the picture above, the repartition process happens both inside
> TaskManager and between TaskManager. Inside TaskManager, the data transfer
> overhead may be just in memory. Between TaskManager, the data transfer
> overhead may be inter-process or inter-container, depending on how I deploy
> the Flink cluster. Is my understanding right? These details may be highly
> related to Actor model? As I have little knowledge of Actor model.
>
> This is my first time using the Flink mailing list. Thank you so much if
> anyone can explain it.
>
> Weizheng
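
To make the chain-boundary example from the reply a bit more concrete, here is a toy model in plain Java. It is only a sketch under a stated assumption: a chain ends at every `keyBy` shuffle and nowhere else. `ChainSketch` and `splitIntoChains` are hypothetical names made up for this illustration and are not part of the Flink API. The reply says chaining _usually_ follows shuffling boundaries, so the real decision inside Flink can differ from this simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Flink code. Assumes a chain ends exactly at each "keyBy".
final class ChainSketch {

    // Hypothetical helper: groups operation names into chains,
    // starting a new chain after every "keyBy" (the shuffle boundary).
    static List<List<String>> splitIntoChains(List<String> operations) {
        List<List<String>> chains = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String op : operations) {
            if (op.equals("keyBy")) {
                // keyBy only describes the shuffle; it closes the current chain.
                if (!current.isEmpty()) {
                    chains.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(op);
            }
        }
        if (!current.isEmpty()) {
            chains.add(current);
        }
        return chains;
    }

    public static void main(String[] args) {
        // Same shape as the pipeline in the reply; window(…).apply(…) is
        // treated as one logical operation, as described there.
        List<String> pipeline = List.of(
                "map", "filter", "keyBy", "window+apply", "map", "filter", "keyBy");
        List<List<String>> chains = splitIntoChains(pipeline);
        for (int i = 0; i < chains.size(); i++) {
            System.out.println("chain " + (i + 1) + ": " + chains.get(i));
        }
    }
}
```

With the pipeline above, the sketch reports two chains: `[map, filter]` and `[window+apply, map, filter]`, matching the comments in the reply's example.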