Side note: The Python classes I linked implement SDFs, which still don't work on Flink, hence they still need a shuffle despite being splittable. Would be interesting to see if it works on Dataflow (just the splitting, without the explicit shuffle), but I cannot test it myself. So if anyone wants to give it a try, I'd be interested in the results.

Janek

On 22/03/2022 16:19, Jack McCluskey wrote:
My gut instinct is that Cham's reshuffle suggestion is going to be the better solution in this case since the input bundle is rather small, but yes an SDF has some utility here if you wanted to go the extra mile and let the runner split up the bundle. I don't have enough SDF experience to say whether or not that extra effort would parallelize the reads on the runner side though.

Reply via email to