Re: Multiple (non-consecutive) keyBy operators in a dataflow

Timo Walther Tue, 03 Apr 2018 03:27:08 -0700

Hi Andre,

Every keyBy is a shuffle over the network and thus introduces some overhead, especially the serialization of records between operators, since object reuse is disabled by default. If you think that the slots (and thus the nodes) are not evenly occupied after the first keyBy operation (e.g. if your key space has just 2 values), then it makes sense to add a second keyBy so that the heavy computation runs on the more granular key with as much parallelism as possible. It really depends on your job.
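
To illustrate the point about a key space of just 2 values, here is a minimal plain-Java sketch (not Flink API). It assumes a simplified hash partitioner (hash modulo parallelism); Flink's real assignment goes through key groups, and the class name, method names, and the parallelism of 8 are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class KeySkewSketch {

    // Simplified stand-in for hash partitioning (assumption, not Flink's
    // actual key-group logic): one key always maps to the same subtask.
    static int subtaskFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Counts how many distinct subtasks receive at least one key.
    static int busySubtasks(Iterable<String> keys, int parallelism) {
        Set<Integer> busy = new HashSet<>();
        for (String key : keys) {
            busy.add(subtaskFor(key, parallelism));
        }
        return busy.size();
    }

    public static void main(String[] args) {
        int parallelism = 8; // hypothetical number of parallel subtasks

        // Coarse key space with only 2 values.
        List<String> coarse = List.of("A", "B");

        // More granular keys, e.g. ids generated by the first stateful step.
        List<String> granular = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            granular.add("id-" + i);
        }

        System.out.println("coarse keys occupy   "
                + busySubtasks(coarse, parallelism) + " of " + parallelism + " subtasks");
        System.out.println("granular keys occupy "
                + busySubtasks(granular, parallelism) + " of " + parallelism + " subtasks");
    }
}
```

With 2 keys, at most 2 subtasks can ever receive data, no matter how high the parallelism is; the granular keys spread over all of them.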


I hope this helps.

Regards,
Timo


On 03.04.18 at 03:22, 李玥 wrote:

Hello,
In my opinion, it would only be meaningful in this situation:
1. The total size of all your state is large enough, e.g. 1GB+.
2. Splitting your job into multiple keyBy steps would reduce the size of your state.
This is because the operation of saving state is synchronous, and all working threads are blocked until it has finished. Our team is trying to make the process of saving state asynchronous; please refer to:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Slow-flink-checkpoint-td18946.html
LiYue
http://tig.jd.com
liyue2...@gmail.com
On April 3, 2018, at 8:30 AM, au.fp2018 <au.fp2...@gmail.com> wrote:
Hello Flink Community,
I am relatively new to Flink. In the project I am currently working on, I have a dataflow with a keyBy() operator, which I want to convert to a dataflow with
multiple keyBy() operators like this:


 Source -->
 KeyBy() -->
 Stateful process() function that generates a more granular key -->
 KeyBy(<id generated in the previous step>) -->
 More stateful computation(s) -->
 Sink
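
The shape of this dataflow can be modelled in plain Java (not Flink API) as a rough sketch. Everything here is hypothetical: the event layout, the "session counter" used as first-stage state, and the `coarseKey#session` format of the generated id are assumptions made only to show two keyed, stateful steps in a row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TwoStageKeyingSketch {

    // Stage 1: keyed by the coarse key; per-key state is a session counter.
    // Emits {"<coarseKey>#<session>", value} for every input event.
    static List<String[]> assignGranularKeys(List<String[]> events) {
        Map<String, Integer> sessionPerKey = new HashMap<>(); // state of the first keyed step
        List<String[]> out = new ArrayList<>();
        for (String[] e : events) { // e = {coarseKey, type, value}
            String coarseKey = e[0];
            if (e[1].equals("start")) {
                sessionPerKey.merge(coarseKey, 1, Integer::sum);
            }
            int session = sessionPerKey.getOrDefault(coarseKey, 0);
            out.add(new String[] {coarseKey + "#" + session, e[2]});
        }
        return out;
    }

    // Stage 2: keyed by the generated id; per-key state is a running sum.
    static Map<String, Integer> sumPerGranularKey(List<String[]> keyed) {
        Map<String, Integer> sums = new TreeMap<>(); // state of the second keyed step
        for (String[] r : keyed) {
            sums.merge(r[0], Integer.parseInt(r[1]), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String[]> source = List.of(
                new String[] {"A", "start", "1"},
                new String[] {"A", "data", "2"},
                new String[] {"B", "start", "5"},
                new String[] {"A", "start", "3"});
        // Sink: print the per-key results.
        System.out.println(sumPerGranularKey(assignGranularKeys(source)));
        // prints {A#1=3, A#2=3, B#1=5}
    }
}
```

Note that the second step holds one small piece of state per generated id, while the first step still holds its own state per coarse key.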

Are there any downsides to this approach?
My reasoning behind the second keyBy() is to reduce the amount of state and
hence improve the processing speed.

Thanks,
Andre




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
