Hi Sebastian,

I agree, shuffling only specific elements would be a very useful feature, but unfortunately it's not supported (yet). Would you like to open a JIRA for that?
Cheers,
Fabian

2015-06-09 17:22 GMT+02:00 Kruse, Sebastian <sebastian.kr...@hpi.de>:

> Hi folks,
>
> I would like to do some load balancing within one of my Flink jobs to
> achieve good scalability. The rebalance() method is not applicable in my
> case, as the runtime is dominated by the processing of very few larger
> elements in my dataset. Hence, I need to distribute the processing work for
> these elements among the nodes in the cluster. To do so, I subdivide those
> elements into partial tasks and want to distribute these partial tasks to
> other nodes by employing a custom partitioner.
>
> Now, my question is the following: Actually, I do not need to shuffle the
> complete dataset but only a few elements. So is there a way of telling
> within the partitioner, that data should reside on the same task manager?
> Thanks!
>
> Cheers,
> Sebastian
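
For illustration only, here is a minimal sketch of the routing decision Sebastian describes, written in plain Java without any Flink dependency. The names LocalityAwareRouting, RoutingKey, originSubtask, redistribute and partialTaskId are hypothetical and do not come from the thread. The partition method merely has the same shape as Flink's Partitioner#partition(key, numPartitions), so the logic could be moved into a partitioner passed to partitionCustom(). It assumes each element was tagged beforehand with the index of the subtask that holds it (for example via getRuntimeContext().getIndexOfThisSubtask() in a rich function). As noted in the reply, Flink does not offer selective shuffling, so every record still passes through the partitioning step. Whether a record addressed to its own subtask index actually avoids a network transfer depends on the runtime and on task placement, and is an assumption here.

```java
/**
 * Hypothetical sketch: decide the target partition so that only the partial
 * tasks of the few large elements are spread out, while all other elements
 * are addressed back to the subtask they came from.
 */
final class LocalityAwareRouting {

    /** Hypothetical routing key attached to each element before partitioning. */
    static final class RoutingKey {
        final int originSubtask;    // index of the subtask that currently holds the element
        final boolean redistribute; // true only for partial tasks of the large elements
        final int partialTaskId;    // running number of the partial task, ignored otherwise

        RoutingKey(int originSubtask, boolean redistribute, int partialTaskId) {
            this.originSubtask = originSubtask;
            this.redistribute = redistribute;
            this.partialTaskId = partialTaskId;
        }
    }

    private LocalityAwareRouting() {
    }

    /** Same shape as Flink's Partitioner#partition(key, numPartitions). */
    static int partition(RoutingKey key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        if (!key.redistribute) {
            // Ordinary element: address it to the subtask index it came from.
            return Math.floorMod(key.originSubtask, numPartitions);
        }
        // Partial task of a large element: spread round-robin by its id.
        return Math.floorMod(key.partialTaskId, numPartitions);
    }
}
```

With a parallelism of 8, an ordinary element tagged with originSubtask 3 is addressed to partition 3 again, while a partial task with partialTaskId 42 goes to partition 2 (42 mod 8).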