Re: Reduce parallelism without network transfer.

2018-02-06 Thread Piotr Nowojski
Hi,

Rebalance is more safe default setting that protects against data skew. And 
even the smallest data skew can create a bottleneck much larger then the 
serialisation/network transfer cost. Especially if one changes the parallelism 
to a value that’s not a result of multiplication or division (like N down to 
N-1). And data skew can be arbitrarily large, while rebalance overhead compare 
to rescale is limited.

Piotrek


> On 6 Feb 2018, at 04:32, Kien Truong  wrote:
> 
> Thanks Piotr, it works.
> May I ask why default behavior when reducing parallelism is rebalance, and 
> not rescale ?
> 
> Regards,
> Kien
> 
> Sent from TypeApp 
> On Feb 5, 2018, at 15:28, Piotr Nowojski  > wrote:
> Hi,
> 
> It should work like this out of the box if you use rescale method:
> 
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html#physical-partitioning
>  
> 
> 
> If it will not work, please let us know.
> 
> Piotrek 
> 
>> On 3 Feb 2018, at 04:39, Kien Truong < duckientru...@gmail.com 
>> > wrote:
>> 
>> Hi, 
>> 
>> Assuming that I have a streaming job, using 30 task managers with 4 slot 
>> each. I want to change the parallelism of 1 operator from 120 to 30. Are 
>> there anyway so that each subtask of this operator get data from 4 upstream 
>> subtasks running in the same task manager, thus avoiding network completely 
>> ? 
>> 
>> Best regards, 
>> Kien 
>> 
>> Sent from TypeApp 



Re: Reduce parallelism without network transfer.

2018-02-05 Thread Kien Truong
Thanks Piotr, it works.
May I ask why default behavior when reducing parallelism is rebalance, and not 
rescale ?

Regards,
Kien

⁣Sent from TypeApp ​

On Feb 5, 2018, 15:28, at 15:28, Piotr Nowojski  wrote:
>Hi,
>
>It should work like this out of the box if you use rescale method:
>
>https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html#physical-partitioning
>
>
>If it will not work, please let us know.
>
>Piotrek
>
>> On 3 Feb 2018, at 04:39, Kien Truong  wrote:
>>
>> Hi,
>>
>> Assuming that I have a streaming job, using 30 task managers with 4
>slot each. I want to change the parallelism of 1 operator from 120 to
>30. Are there anyway so that each subtask of this operator get data
>from 4 upstream subtasks running in the same task manager, thus
>avoiding network completely ?
>>
>> Best regards,
>> Kien
>>
>> Sent from TypeApp 


Re: Reduce parallelism without network transfer.

2018-02-05 Thread Piotr Nowojski
Hi,

It should work like this out of the box if you use rescale method:

https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html#physical-partitioning
 


If it will not work, please let us know.

Piotrek

> On 3 Feb 2018, at 04:39, Kien Truong  wrote:
> 
> Hi,
> 
> Assuming that I have a streaming job, using 30 task managers with 4 slot 
> each. I want to change the parallelism of 1 operator from 120 to 30. Are 
> there anyway so that each subtask of this operator get data from 4 upstream 
> subtasks running in the same task manager, thus avoiding network completely ?
> 
> Best regards, 
> Kien
> 
> Sent from TypeApp 


Reduce parallelism without network transfer.

2018-02-02 Thread Kien Truong
Hi,

Assuming that I have a streaming job, using 30 task managers with 4 slot each. 
I want to change the parallelism of 1 operator from 120 to 30. Are there anyway 
so that each subtask of this operator get data from 4 upstream subtasks running 
in the same task manager, thus avoiding network completely ?

Best regards,
Kien

⁣Sent from TypeApp ​