Can I ask why some operations run on only one slot? I understand that file
writes should happen on only one slot, but the GroupByKey operation could be
distributed across all slots. I have around 20k distinct keys every
minute. Is there any way to break this operator chain?

I noticed that CombinePerKey operations that don't have any IO-related
transforms after them are scheduled across all 32 slots.
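
To make the question concrete, the two paths look roughly like the sketch
below. This is not the actual code; the transform names, key/value types and
the output path are simplified placeholders:

import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.joda.time.Duration;

class PipelineShapeSketch {
  // Rough shape of the job; names and the output path are placeholders.
  static void addOutputs(PCollection<KV<String, Long>> perStreamBitrate, int numShards) {
    PCollection<KV<String, Long>> windowed = perStreamBitrate.apply(
        Window.<KV<String, Long>>into(FixedWindows.of(Duration.standardMinutes(1))));

    // Path 1: aggregate, format and write per window. TextIO's WriteFiles
    // introduces a GroupByKey internally, and that whole chained stage
    // (GroupByKey -> WriteShardedBundles -> ...) is the one busy on a single slot.
    windowed
        .apply("SumBitrate", Sum.longsPerKey())
        .apply("FormatLine", MapElements
            .into(TypeDescriptors.strings())
            .via((KV<String, Long> kv) -> kv.getKey() + "," + kv.getValue()))
        .apply(TextIO.write()
            .to("/data/output/bitrate")   // placeholder path
            .withWindowedWrites()
            .withNumShards(numShards));   // explicit shard count, required for unbounded input

    // Path 2: a plain CombinePerKey with no IO after it; this one is
    // spread across all 32 slots as expected.
    windowed.apply("SumBitrateNoIO", Sum.longsPerKey());
  }
}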


My cluster has 32 slots across 2 task managers, running Beam 2.2 and Flink
1.3.2.
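
For reference, the job is submitted roughly like this (again a simplified
sketch rather than the exact launcher; the option names are the standard
FlinkPipelineOptions ones, and parallelism is set to match the 32 slots):

import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class LauncherSketch {
  public static void main(String[] args) {
    // Parse any --key=value flags, then force the Flink runner.
    FlinkPipelineOptions options = PipelineOptionsFactory
        .fromArgs(args).withValidation().as(FlinkPipelineOptions.class);
    options.setRunner(FlinkRunner.class);
    options.setStreaming(true);
    options.setParallelism(32);  // matches the 32 slots across the 2 task managers

    Pipeline pipeline = Pipeline.create(options);
    // (transforms from the sketch above would be added here)
    pipeline.run();
  }
}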

Chained operator, from the Flink web UI:

  Name:             GroupByKey -> ParMultiDo(WriteShardedBundles) -> ParMultiDo(Anonymous) ->
                    xxx.pipeline.output.io.file.WriteWindowToFile-SumPlaybackBitrateResult2/TextIO.Write/WriteFiles/Reshuffle/Window.Into()/Window.Assign.out
                    -> ParMultiDo(ReifyValueTimestamp) -> ToKeyedWorkItem
  Start Time:       2018-01-18, 13:56:28
  End Time:         2018-01-18, 14:37:14
  Duration:         40m 45s
  Bytes received:   149 MB
  Records received: 333,672
  Bytes sent:       70.8 MB
  Records sent:     19
  Parallelism:      32
  Tasks per state (UI counters): 00320000
  Status:           RUNNING

Start Time            End Time  Duration  Bytes received  Records received  Bytes sent  Records sent  Attempt  Host  Status
2018-01-18, 13:56:28  -         40m 45s   2.30 MB         0                 2.21 MB     0             1        xxx   RUNNING
2018-01-18, 13:56:28  -         40m 45s   2.30 MB         0                 2.21 MB     0             1        xxx   RUNNING
2018-01-18, 13:56:28  -         40m 45s   2.30 MB         0                 2.21 MB     0             1        xxx   RUNNING
2018-01-18, 13:56:28  -         40m 45s   77.5 MB         333,683           2.21 MB     20            1        xxx   RUNNING
2018-01-18, 13:56:28  -         40m 45s   2.30 MB         0                 2.21 MB     0             1        xxx   RUNNING
2018-01-18, 13:56:28  -         40m 45s   2.30 MB         0                 2.21 MB     0             1        xxx   RUNNING

Thanks,
Pawel
