Re: Flink Streaming : PartitionBy vs GroupBy differences

Gyula Fóra Fri, 03 Jul 2015 03:30:35 -0700

Hey,

1.
Yes, if you use partitionBy the same key will always go to the same
downstream operator instance.


2.
There is only a partial ordering guarantee: data received from one input
is FIFO. If the same key arrives from multiple inputs, then there is no
ordering guarantee across those inputs, only within each single input.
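
To make both points concrete, here is a rough plain-Python simulation. It is
not Flink code and does not reflect how Flink hashes keys internally; the
names partition_by, shuffle and interleave are made up for illustration, and
the CRC32 hash is only an assumption standing in for whatever stable hash the
runtime uses.

```python
import zlib
from collections import defaultdict


def partition_by(key, parallelism):
    """Map a key to a downstream instance index using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


def shuffle(inputs, parallelism, interleave):
    """Deliver records from several upstream inputs to downstream instances.

    inputs      -- one list of (key, value) records per upstream input
    interleave  -- input indices in the order the network happens to deliver
                   records; each input is still drained front to back (FIFO)
    """
    cursors = [0] * len(inputs)
    received = defaultdict(list)
    for src in interleave:
        key, value = inputs[src][cursors[src]]
        cursors[src] += 1
        # Same key -> same hash -> same downstream instance.
        received[partition_by(key, parallelism)].append((src, value))
    return dict(received)


# Two upstream inputs that both emit records for the key "a".
inputs = [[("a", 1), ("a", 2)], [("a", 10), ("a", 20)]]

# Two possible delivery orders for the same data.
run1 = shuffle(inputs, parallelism=4, interleave=[0, 0, 1, 1])
run2 = shuffle(inputs, parallelism=4, interleave=[1, 0, 1, 0])

for name, run in (("run1", run1), ("run2", run2)):
    for instance, records in run.items():
        print(name, "instance", instance, "received", records)
```

In both runs every "a" record lands on a single instance, and each input's
records stay in the order that input sent them, but the relative order of
records from input 0 and input 1 differs between the runs.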

Gyula

Welly Tambunan <if05...@gmail.com> wrote (on Fri, Jul 3, 2015, 11:51):

> Hi Gyula,
>
> Thanks for your response.
>
> So if I use partitionBy, then data points with the same key will be
> received by exactly the same operator instance?
>
>
> Another question: if I execute a reduce() operator after partitionBy,
> will that reduce operator guarantee ordering within the same key?
>
>
> Cheers
>
> On Fri, Jul 3, 2015 at 4:14 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>
>> Hey!
>>
>> Both groupBy and partitionBy will trigger a shuffle over the network
>> based on some key, ensuring that elements with the same key end up on the
>> same downstream operator instance.
>>
>> The difference between the two is that groupBy additionally returns a
>> GroupedDataStream, which lets you execute some special operations, such as
>> key-based rolling aggregates.
>>
>> PartitionBy is useful when you are using simple operators but still want
>> to control the messages received by parallel instances (in a mapper for
>> example).
>>
>> Cheers,
>> Gyula
>>
>> tambunanw <if05...@gmail.com> wrote (on Fri, Jul 3, 2015, 10:32):
>>
>>> Hi All,
>>>
>>> I'm trying to understand the difference between these two. From my
>>> experience in Spark, GroupBy causes shuffling over the network. Is that
>>> also the case in Flink?
>>>
>>> I've watched videos and read a couple of docs about Flink saying that
>>> Flink compiles the user code into its own optimized graph structure, so
>>> I think the Flink engine will take care of this?
>>>
>>> From the docs for Partitioning
>>>
>>>
>>> http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#partitioning
>>>
>>> Is it true that GroupBy is more advanced than PartitionBy? Can someone
>>> elaborate?
>>>
>>> This one is really confusing for me, coming from the Spark world. Any
>>> help would be really appreciated.
>>>
>>> Cheers
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Streaming-PartitionBy-vs-GroupBy-differences-tp1927.html
>>> Sent from the Apache Flink User Mailing List archive at Nabble.com.
>>>
>>
>
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com <http://www.triplelands.com/blog/>
>
