Re: Controlling group partitioning with DataStream

Ken Krugler Tue, 08 Mar 2022 16:44:53 -0800

Hi Dario,

Just to close the loop on this, I answered my own question on SO.


Unfortunately it seems like the recommended solution is to do the same hack I 
did a while ago, which is to generate (via trial-and-error) a key that gets 
assigned to the target slot.
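
For anyone finding this thread later, here is a minimal sketch of that trial-and-error search in plain Java. It tries candidate integer keys until it has found one that lands on each subtask index. The class name TargetKeyFinder, the SubtaskAssigner interface and the standInSubtask function are all made up for illustration, and standInSubtask is NOT Flink's real hash. In an actual job I believe you would plug in KeyGroupRangeAssignment.assignKeyToParallelOperator(key, maxParallelism, parallelism) from flink-runtime instead, since the keys only work if they are computed with the same function and the same parallelism settings that the job uses.

```java
import java.util.Arrays;

public class TargetKeyFinder {

    /** Hypothetical hook: returns the subtask index a key would be routed to. */
    @FunctionalInterface
    public interface SubtaskAssigner {
        int subtaskFor(int key, int maxParallelism, int parallelism);
    }

    /**
     * Stand-in assignment, used only so this sketch runs without Flink.
     * It is an assumption and does not reproduce Flink's hashing.
     */
    public static int standInSubtask(int key, int maxParallelism, int parallelism) {
        int h = key * 0x9E3779B1;                       // arbitrary mixing step
        int keyGroup = Math.floorMod(h ^ (h >>> 16), maxParallelism);
        return keyGroup * parallelism / maxParallelism; // key group -> subtask index
    }

    /** Returns an array where keys[i] is a key that the assigner routes to subtask i. */
    public static int[] findKeys(int parallelism, int maxParallelism, SubtaskAssigner assigner) {
        int[] keys = new int[parallelism];
        boolean[] found = new boolean[parallelism];
        int remaining = parallelism;

        // Bounded search so a bad assigner cannot loop forever.
        for (int candidate = 0; candidate < 1_000_000 && remaining > 0; candidate++) {
            int subtask = assigner.subtaskFor(candidate, maxParallelism, parallelism);
            if (!found[subtask]) {
                found[subtask] = true;
                keys[subtask] = candidate;
                remaining--;
            }
        }
        if (remaining > 0) {
            throw new IllegalStateException("No key found for " + remaining + " subtask(s)");
        }
        return keys;
    }

    public static void main(String[] args) {
        int[] keys = findKeys(4, 128, TargetKeyFinder::standInSubtask);
        System.out.println("Key per subtask: " + Arrays.toString(keys));
    }
}
```

With such a table in hand, the skewed groups can be remapped to these synthetic keys before the keyBy, so that each heavy group ends up on the subtask you picked for it.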

I was hoping for something a bit more elegant :)

I think it’s likely I could make it work by implementing my own version of 
KeyGroupStreamPartitioner, but as I’d noted in my SO question, that would 
involve use of some internal-only classes, so maybe not a win.

— Ken


> On Mar 4, 2022, at 3:14 PM, Dario Heinisch <dario.heini...@gmail.com> wrote:
> 
> Hi, 
> 
> I think you are looking for this answer from David: 
> https://stackoverflow.com/questions/69799181/flink-streaming-do-the-events-get-distributed-to-each-task-slots-separately-acc
> I think you could then technically create your own partitioner, though it is 
> a little bit cumbersome, by mapping your existing keys to new keys that will 
> then be routed to the desired group & slot. 
> 
> Hope this may help, 
> 
> Dario
> 
> On 04.03.22 23:54, Ken Krugler wrote:
>> Hi all,
>> 
>> I need to be able to control which slot a keyBy group goes to, in order to 
>> compensate for a badly skewed dataset.
>> 
>> Any recommended approach to use here?
>> 
>> Previously (with a DataSet) I used groupBy followed by a withPartitioner, 
>> and provided my own custom partitioner.
>> 
>> I posted this same question to 
>> https://stackoverflow.com/questions/71357833/equivalent-of-dataset-groupby-withpartitioner-for-datastream
>> 
>> Thanks,
>> 
>> — Ken

--------------------------
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch
