Hello, We use Kafka to distribute batches work across several boxes. These batches may take anywhere from 1 hour to 24 hours to complete. Currently, we have N partitions, each allocated to one of N consumer worker boxes.
We find that as the batch nears completion, with only M < N partitions still containing work, our throughput suffers as now only M workers are doing the work. This problem is amplified if several of our N boxes are slower than the others, as our throughput is now tied to our slowest box. It would be great if workers that were done with their work could share the load. I am wondering if anyone has dealt with this issue. It doesn't seem that Kafka has a solution for this out of the box (though I could be wrong), but maybe others have clever solutions, best practices, or suggestions for other technologies to add to the mix. I am especially curious how the Kafka philosophy applies to this situation. Thanks, David