Yes, you can think of it that way. Each Operator has parallel instances and
each parallel instance receives input from multiple channels (FIFO from
each) and produces output.
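
To make that concrete, here is a minimal sketch in plain Python. It is not Flink code and not how Flink implements this; `route`, `ParallelInstance`, and `simulate` are names made up for illustration, and the CRC32-based hash is only an assumption standing in for whatever partitioner is really used. It shows that the same key always lands on the same instance, and that order is kept per input channel but not across channels.

```python
from collections import deque
import zlib


def route(key, parallelism):
    """Pick a downstream instance index from the key.

    Hypothetical hash partitioner: a deterministic hash modulo parallelism.
    """
    return zlib.crc32(key.encode("utf-8")) % parallelism


class ParallelInstance:
    """One parallel instance of an operator, with one FIFO queue per input channel."""

    def __init__(self, num_channels):
        self.channels = [deque() for _ in range(num_channels)]

    def receive(self, channel, record):
        # Records from one channel are appended in arrival order (FIFO).
        self.channels[channel].append(record)

    def drain(self, schedule):
        """Consume one record per entry in `schedule` (a list of channel indices).

        The schedule models the arbitrary interleaving between channels.
        Empty channels are skipped.
        """
        out = []
        for ch in schedule:
            if self.channels[ch]:
                out.append(self.channels[ch].popleft())
        return out


def simulate(schedule, parallelism=4):
    """Two upstream senders emit records for key "a"; return the consumed order."""
    instances = [ParallelInstance(num_channels=2) for _ in range(parallelism)]
    upstream = {
        0: [("a", "s0-1"), ("a", "s0-2")],  # sender 0, in send order
        1: [("a", "s1-1"), ("a", "s1-2")],  # sender 1, in send order
    }
    for channel, records in upstream.items():
        for key, value in records:
            instances[route(key, parallelism)].receive(channel, value)
    # Every record with key "a" was routed to this single instance.
    return instances[route("a", parallelism)].drain(schedule)


order_a = simulate([0, 0, 1, 1])
order_b = simulate([1, 0, 1, 0])
print(order_a)
print(order_b)
```

Both runs keep s0-1 before s0-2 and s1-1 before s1-2, but the interleaving between the two senders differs from run to run. That is the partial ordering: FIFO within one input, no guarantee across inputs.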

On Fri, Jul 3, 2015 at 13:02, Welly Tambunan <if05...@gmail.com> wrote:

> Hi Gyula,
>
> Thanks a lot. That's enough for my case.
>
> I really love the Flink Streaming model compared to Spark Streaming.
>
> So is it true that I can think of an Operator as an actor, as in the Actor
> model, in this system? Is that the right way to put it?
>
>
>
> Cheers
>
> On Fri, Jul 3, 2015 at 5:29 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>
>> Hey,
>>
>> 1.
>> Yes, if you use partitionBy the same key will always go to the same
>> downstream operator instance.
>>
>> 2.
>> There is only a partial ordering guarantee, meaning that data received from
>> one input is FIFO. So if the same key is coming from multiple inputs, then
>> there is no ordering guarantee across them, only within a single input.
>>
>> Gyula
>>
>> On Fri, Jul 3, 2015 at 11:51, Welly Tambunan <if05...@gmail.com> wrote:
>>
>>> Hi Gyula,
>>>
>>> Thanks for your response.
>>>
>>> So if I use partitionBy, then data points with the same key will be
>>> received by exactly the same operator instance?
>>>
>>>
>>> Another question: if I execute a reduce() operator after partitionBy,
>>> will that reduce operator guarantee ordering within the same key?
>>>
>>>
>>> Cheers
>>>
>>> On Fri, Jul 3, 2015 at 4:14 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>
>>>> Hey!
>>>>
>>>> Both groupBy and partitionBy will trigger a shuffle over the network
>>>> based on some key, ensuring that elements with the same key end up on the
>>>> same downstream operator instance.
>>>>
>>>> The difference between the two is that groupBy, in addition to this,
>>>> returns a GroupedDataStream, which lets you execute some special operations
>>>> such as key-based rolling aggregates.
>>>>
>>>> PartitionBy is useful when you are using simple operators but still
>>>> want to control the messages received by parallel instances (in a mapper
>>>> for example).
>>>>
>>>> Cheers,
>>>> Gyula
>>>>
>>>> On Fri, Jul 3, 2015 at 10:32, tambunanw <if05...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I'm trying to understand the difference between these two. In my
>>>>> experience with Spark, GroupBy causes shuffling over the network. Is
>>>>> that also the case in Flink?
>>>>>
>>>>> I've watched videos and read a couple of docs saying that Flink
>>>>> compiles the user code into its own optimized graph structure, so I
>>>>> think the Flink engine will take care of this?
>>>>>
>>>>> From the docs for Partitioning
>>>>>
>>>>>
>>>>> http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#partitioning
>>>>>
>>>>> Is it true that GroupBy is more advanced than PartitionBy? Can
>>>>> someone elaborate?
>>>>>
>>>>> This is really confusing for me, coming from the Spark world. Any
>>>>> help would be much appreciated.
>>>>>
>>>>> Cheers
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Streaming-PartitionBy-vs-GroupBy-differences-tp1927.html
>>>>> Sent from the Apache Flink User Mailing List archive at Nabble.com.
>>>>>
>>>>
>>>
>>>
>>> --
>>> Welly Tambunan
>>> Triplelands
>>>
>>> http://weltam.wordpress.com
>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>
>>
>
>
>
