Hi Rui,

Such a situation can occur if you have data skew in your data set
(differently sized partitions when you key by some key). Assume you have 2
TMs (TaskManagers) with 2 slots each and you key your data by some key x.
The partition assignment could look like this:

TM1: slot_1 = Partition_1, slot_2 = Partition_2
TM2: slot_1 = Partition_3, slot_2 = Partition_4

Now assume that Partition_1 and Partition_3 are ten times bigger than
Partition_2 and Partition_4. From a TM perspective, both TMs would process
the same amount of data.

If you now start 4 TMs with a single slot each you could get the following
assignment:

TM1: slot_1 = Partition_1
TM2: slot_1 = Partition_2
TM3: slot_1 = Partition_3
TM4: slot_1 = Partition_4

Now from a TM perspective, TM1 and TM3 would process ten times more data
than TM2 and TM4.
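
If it helps, here is a tiny, self-contained Java sketch of that arithmetic;
the partition sizes 10, 1, 10 and 1 are just assumed example values:

// A rough sketch of the arithmetic above, with assumed sizes of
// 10, 1, 10 and 1 units for Partition_1 through Partition_4.
public class SlotAssignmentExample {
    public static void main(String[] args) {
        int[] partitionSize = {10, 1, 10, 1};

        // 2 TMs with 2 slots each: TM1 = P1 + P2, TM2 = P3 + P4 -> 11 and 11 (balanced).
        System.out.println("2 TMs x 2 slots: TM1=" + (partitionSize[0] + partitionSize[1])
                + ", TM2=" + (partitionSize[2] + partitionSize[3]));

        // 4 TMs with 1 slot each: one partition per TM -> 10, 1, 10, 1 (skewed).
        for (int i = 0; i < partitionSize.length; i++) {
            System.out.println("4 TMs x 1 slot: TM" + (i + 1) + "=" + partitionSize[i]);
        }
    }
}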

Does this make sense? What you could check is whether you can detect such a
data skew in your input data, e.g. by counting how many records carry each
key; see the sketch below.
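
Something like the following would already show whether a few keys dominate.
It is only a rough, hypothetical sketch: it assumes your input is a plain
text file with one record per line, and extractKey() is a placeholder for
the same key extraction you use in keyBy().

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

public class KeySkewCheck {

    // Placeholder: replace with the same key logic as in your keyBy().
    private static String extractKey(String line) {
        return line.split(",")[0];
    }

    public static void main(String[] args) throws IOException {
        Map<String, Long> counts = new HashMap<>();

        // Count how many records fall onto each key.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            lines.forEach(line -> counts.merge(extractKey(line), 1L, Long::sum));
        }

        // Print the 20 most frequent keys; a few keys dominating the total
        // is exactly the kind of skew described above.
        counts.entrySet().stream()
              .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
              .limit(20)
              .forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue()));
    }
}

If a handful of keys account for most of the records, that skew would
explain the imbalance you are seeing.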

Cheers,
Till

On Wed, Jan 2, 2019 at 6:13 AM varuy322 <rui2.w...@intel.com> wrote:

> Hi, there
>
> Recently I ran a streaming benchmark with Flink 1.5.2 standalone on a
> cluster of 4 machines (1 as master and the others as workers), and I get
> different results in the two setups below:
> (1) When I set the parallelism to 96 (source, sink and the middle operators
> are all set to parallelism 96) and start 3 TaskManagers with 32 slots each,
> all goes well.
> (2) When I change (1) to start 6 TaskManagers, i.e. 2 TaskManagers on each
> worker with 16 slots each, all goes well too. In this setup I find that the
> subtasks on each worker process the same amount of data, but one worker
> processes several times more data than another, so it seems data skew
> occurs. How could this happen?
>
> Could someone explain how, with the same total parallelism, the performance
> compares between running multiple TaskManagers per worker with fewer slots
> each and running one TaskManager per worker with more slots?
> Thanks a lot!
>
> Best Regards
> Rui
>
>
>
> -----
> stay hungry, stay foolish.
>
