Yes, you can think of it that way. Each operator has parallel instances, and each parallel instance receives input from multiple channels (FIFO within each channel) and produces output.
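To illustrate the "FIFO from each channel" point, here is a minimal model in plain Python. It does not use Flink. `merge_channels` and `preserves_fifo` are hypothetical helpers, and choosing the next channel at random is an assumption that stands in for network timing. It shows one operator instance draining two upstream channels that both carry records for the same key.

```python
import random
from collections import deque


def merge_channels(channels, seed=None):
    """Drain several FIFO input channels into one stream, as a single
    parallel operator instance would.

    Hypothetical model: the next channel to read from is picked at random,
    standing in for network timing. This is not Flink's actual input logic.
    """
    rng = random.Random(seed)
    queues = [deque(channel) for channel in channels]
    merged = []
    while any(queues):
        # Any non-empty channel may deliver next: no order across channels.
        queue = rng.choice([q for q in queues if q])
        # Within one channel, elements leave in arrival order (FIFO).
        merged.append(queue.popleft())
    return merged


def preserves_fifo(merged, channel):
    """True if the elements of `channel` appear in `merged` in their original order."""
    positions = [merged.index(element) for element in channel]
    return positions == sorted(positions)


# Two upstream instances both send records for the same key "k".
from_a = [("a", 1, "k"), ("a", 2, "k")]
from_b = [("b", 1, "k"), ("b", 2, "k")]

for seed in range(3):
    merged = merge_channels([from_a, from_b], seed=seed)
    print(merged)
    assert preserves_fifo(merged, from_a) and preserves_fifo(merged, from_b)
```

In every run, a1 stays before a2 and b1 stays before b2. The interleaving of the a and b records for key "k" can differ between runs, though. This matches the partial ordering guarantee described further down the thread.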
Welly Tambunan <if05...@gmail.com> wrote (Fri, Jul 3, 2015, 13:02):
> Hi Gyula,
>
> Thanks a lot. That's enough for my case.
>
> I really love the Flink Streaming model compared to Spark Streaming.
>
> So is it true that I can think of an Operator as an Actor (as in the actor
> model) in this system? Is that the right way to put it?
>
> Cheers
>
> On Fri, Jul 3, 2015 at 5:29 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>> Hey,
>>
>> 1.
>> Yes, if you use partitionBy, the same key will always go to the same
>> downstream operator instance.
>>
>> 2.
>> There is only a partial ordering guarantee, meaning that data received from
>> one input is FIFO. If the same key is coming from multiple inputs, then
>> there is no ordering guarantee across those inputs, only within one input.
>>
>> Gyula
>>
>> Welly Tambunan <if05...@gmail.com> wrote (Fri, Jul 3, 2015, 11:51):
>>> Hi Gyula,
>>>
>>> Thanks for your response.
>>>
>>> So if I use partitionBy, will data points with the same key always be
>>> received by exactly the same operator instance?
>>>
>>> Another question: if I execute a reduce() operator after partitionBy,
>>> will that reduce operator guarantee ordering within the same key?
>>>
>>> Cheers
>>>
>>> On Fri, Jul 3, 2015 at 4:14 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>> Hey!
>>>>
>>>> Both groupBy and partitionBy will trigger a shuffle over the network
>>>> based on some key, ensuring that elements with the same key end up on the
>>>> same downstream processing operator.
>>>>
>>>> The difference between the two is that groupBy additionally returns a
>>>> GroupedDataStream, which lets you execute some special operations, such
>>>> as key-based rolling aggregates.
>>>>
>>>> partitionBy is useful when you are using simple operators but still
>>>> want to control the messages received by parallel instances (in a mapper,
>>>> for example).
>>>>
>>>> Cheers,
>>>> Gyula
>>>>
>>>> tambunanw <if05...@gmail.com> wrote (Fri, Jul 3, 2015, 10:32):
>>>>> Hi All,
>>>>>
>>>>> I'm trying to understand the difference between these two. From my
>>>>> experience with Spark, groupBy causes shuffling over the network. Is
>>>>> that also the case in Flink?
>>>>>
>>>>> I've watched videos and read a couple of docs about Flink saying that
>>>>> Flink compiles the user code into its own optimized graph structure, so
>>>>> I think the Flink engine will take care of this?
>>>>>
>>>>> From the docs on partitioning:
>>>>>
>>>>> http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#partitioning
>>>>>
>>>>> Is it true that groupBy is more advanced than partitionBy? Can someone
>>>>> elaborate?
>>>>>
>>>>> This is really confusing for me, coming from the Spark world. Any
>>>>> help would be really appreciated.
>>>>>
>>>>> Cheers
>>>
>>>
>>> --
>>> Welly Tambunan
>>> Triplelands
>>>
>>> http://weltam.wordpress.com
>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
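The routing rule discussed in the thread is that the same key always reaches the same downstream instance. Here is a plain-Python sketch of that rule, followed by a per-key rolling aggregate of the kind groupBy enables. It is not Flink code. `partition_by_key` and `rolling_sum` are hypothetical helpers. Using `crc32(key) mod parallelism` as the routing function is an assumption: Flink uses its own hashing, so only the property (key to fixed instance) carries over, not the exact mapping.

```python
import zlib
from collections import defaultdict


def partition_by_key(records, key_fn, parallelism):
    """Route each record to one of `parallelism` downstream instances by key.

    Simplified model (assumption): instance index = crc32(key) mod parallelism.
    Flink uses its own hash function, but the property illustrated is the same:
    a given key always maps to the same instance.
    """
    instances = defaultdict(list)
    for record in records:
        key = str(key_fn(record)).encode("utf-8")
        instances[zlib.crc32(key) % parallelism].append(record)
    return dict(instances)


def rolling_sum(records, key_fn, value_fn):
    """Emit a running per-key total after each record, similar in spirit to
    a key-based rolling aggregate on a grouped stream (hypothetical helper)."""
    totals = {}
    out = []
    for record in records:
        key = key_fn(record)
        totals[key] = totals.get(key, 0) + value_fn(record)
        out.append((key, totals[key]))
    return out


# Hypothetical sensor readings: (sensor id, value).
readings = [
    ("sensor-1", 20.5),
    ("sensor-2", 18.0),
    ("sensor-1", 21.0),
    ("sensor-3", 19.2),
    ("sensor-2", 18.4),
]
parts = partition_by_key(readings, key_fn=lambda r: r[0], parallelism=3)

for index, records in sorted(parts.items()):
    print(index, rolling_sum(records, key_fn=lambda r: r[0], value_fn=lambda r: r[1]))
```

Because every record for a key lands in one instance, that instance can keep the key's running total locally. In this sketch there is a single input list, so each key's records also keep their order. With several upstream inputs, the order would only be guaranteed per input, as noted in the thread.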