On Tue, Jun 7, 2016 at 4:52 AM, Stephan Ewen <se...@apache.org> wrote:
> The concern you raised about the sink being synchronous is exactly what my
> last suggestion should address:
>
> The internal state backends can return a handle that can do the sync in a
> background thread. The sink would continue processing messages, and the
> checkpoint would only be acknowledged after the background sync did
> complete.
>
> We should allow user code to return such a handle as well.

Sorry. Apparently I hadn't had enough coffee and completely missed the last paragraph of your response. The async solution you propose seems ideal.

What message ordering guarantees are you worried about? I don't think you can guarantee much about message ordering within Kafka in case of failure: some messages will be replayed. There also isn't any guarantee if you are writing to a Kafka topic with multiple partitions from multiple sinks using a message key distinct from the key used in a keyBy in Flink, since you'll be writing from multiple sink instances in parallel in what is essentially a shuffle. The only ordering guarantee would seem to hold when a sink writes into a Kafka topic using a message key that is the same as the key used in a keyBy in Flink, and even that will be violated during a failure and replay by the sink.
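For what it's worth, the handle idea Stephan describes could be sketched roughly like this (a hypothetical sketch only, not Flink's actual API; the `AsyncSnapshotHandle` name and the `CompletableFuture` plumbing are my assumptions):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

// Hypothetical handle a sink (or state backend) could return from its
// snapshot call: the expensive sync (e.g. flushing pending producer
// records) runs in a background thread, so the sink keeps processing
// messages, and the checkpoint is acknowledged only once the sync is done.
class AsyncSnapshotHandle {
    private final CompletableFuture<Void> syncDone;

    AsyncSnapshotHandle(ExecutorService pool, Runnable backgroundSync) {
        // Kick off the sync without blocking the sink's main loop.
        this.syncDone = CompletableFuture.runAsync(backgroundSync, pool);
    }

    // Called by whoever acknowledges checkpoints: the acknowledgement
    // callback fires only after the background sync has completed.
    void awaitAndAcknowledge(Runnable acknowledgeCheckpoint) {
        syncDone.thenRun(acknowledgeCheckpoint).join();
    }
}
```

The point being that user sinks could return the same kind of handle the internal state backends do, and the checkpoint barrier logic would treat both uniformly.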