Re: Controlling group partitioning with DataStream

2022-03-18 Thread Ken Krugler
Hi Guowei, Thanks for following up on this, and sorry I missed your email earlier. Unfortunately I don’t think auto-rebalancing will help my situation, because I have a small number of unique key values (low cardinality). And processing these groups (training one deep-learning model per group) req

Re: Controlling group partitioning with DataStream

2022-03-08 Thread Guowei Ma
Hi Ken, If you are talking about the batch scenario, another idea is for the engine to automatically and evenly distribute the data to be processed by each stage across the worker nodes. This also means that, in some cases, the user does not need to manually define a Partitioner. At p

Re: Controlling group partitioning with DataStream

2022-03-08 Thread Ken Krugler
Hi Dario, Just to close the loop on this, I answered my own question on SO. Unfortunately it seems like the recommended solution is to do the same hack I did a while ago, which is to generate (via trial-and-error) a key that gets assigned to the target slot. I was hoping for something a bit mo
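
For readers unfamiliar with the trial-and-error approach mentioned above, here is a minimal sketch of the idea in plain Java. It is not the code from the Stack Overflow answer. The class name, method names, and the `scramble` hash are hypothetical stand-ins chosen so that the example runs without Flink on the classpath; the hash is not claimed to match Flink's. In a real job you would presumably replace `subtaskFor` with a call to Flink's `KeyGroupRangeAssignment.assignKeyToParallelOperator(key, maxParallelism, parallelism)`, so that the keys you find match what the runtime actually does.

```java
import java.util.Arrays;

public class SubtaskKeyFinder {

    // ASSUMPTION: stand-in bit mixer, used only so the sketch is self-contained.
    // It is not claimed to be the hash Flink applies to key.hashCode().
    public static int scramble(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h & Integer.MAX_VALUE; // clear the sign bit so the result is non-negative
    }

    // Two-step mapping: key -> key group -> subtask index.
    public static int subtaskFor(int key, int maxParallelism, int parallelism) {
        int keyGroup = scramble(Integer.hashCode(key)) % maxParallelism;
        return keyGroup * parallelism / maxParallelism;
    }

    // Trial and error: try candidate keys 0, 1, 2, ... and record the first
    // candidate that lands on each subtask, until every subtask has one.
    public static int[] findKeys(int maxParallelism, int parallelism) {
        int[] keys = new int[parallelism];
        Arrays.fill(keys, -1); // -1 marks "no key found yet"; candidates start at 0
        int remaining = parallelism;
        for (int candidate = 0; remaining > 0; candidate++) {
            int subtask = subtaskFor(candidate, maxParallelism, parallelism);
            if (keys[subtask] == -1) {
                keys[subtask] = candidate;
                remaining--;
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        int[] keys = findKeys(128, 4);
        for (int i = 0; i < keys.length; i++) {
            System.out.println("subtask " + i + " <- key " + keys[i]);
        }
    }
}
```

With such a table, each of the few real group values can be mapped to `keys[i]` for the subtask it should run on, and the stream keyed by that surrogate value. The table depends on both parallelism settings, so it has to be regenerated whenever either one changes.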

Re: Controlling group partitioning with DataStream

2022-03-04 Thread Dario Heinisch
Hi, I think you are looking for this answer from David: https://stackoverflow.com/questions/69799181/flink-streaming-do-the-events-get-distributed-to-each-task-slots-separately-acc I think you could then technically create your partitioner, though it is a little bit cumbersome, by mapping your exist