Hi Till,

Till:
> Could you please give me a bit more context? Are you asking how Flink
> realizes exactly once processing guarantees with its connectors?

Thank you very much for your response! Flink has a lot of really cool ideas :)

I did read more about connectors, and I think I can elaborate now. The problem I have is with backpressure and failures, all coming from sinks / external APIs. (Don't worry, we're doing retries and backoff, but there's still more to try :)

What I'd like to achieve is:

* Leave backed-up events in *cheap durable storage*, not memory.
* *Decouple* topics with degraded sinks from the others.
* *Prioritize* events within one topic when an API is in a degraded state.
* *Throttle* sink API writes to match the sink's de facto bandwidth.

My pipeline looks like this. Kafka has one topic per API:

  Topic 1: AWS
  Topic 2: Azure
  Topic 3: GCP
  Topic 4: IBM
  ... (more topics)

read Kafka -> (key by API in [AWS, Azure, GCP, IBM, Oracle, etc.]) -> batch and write to the API in the key

That is, I'm reading API-scoped topics from Kafka, keying by API, and then attempting to write to the API in the key. I know that from time to time some of these APIs will have limited capacity. For example, we're writing to AWS from a shared account whose API limits other users in the account can exhaust, limiting our capacity to write for unpredictable periods.

I'd really like to get your feedback on the approaches I'm considering, and any suggestions you might have. Rough sketches of what I mean are at the end of this mail.

Handling failed writes

Approach 1. Block

If I just block in the API write, backpressure will build up and reach the Kafka reader, which will then stop reading new records, leaving them in Kafka. Is this a reasonable approach? What about the filled buffers inside the TaskManagers - should I worry about them? Will backpressure from one topic and one API delay the others? Kafka will let writers keep appending, and we would catch up once the API is no longer degraded.

Approach 2. Deadletter

I'm considering writing the event to a dead-letter queue, but then whoever processes the dead-letter queue has the same problem, since the API can be degraded at any time. We don't want to start dead-letter processing manually, since degraded APIs happen very regularly.

Approach 3. Re-enqueue

I'm considering writing the event back to the same input topic with metadata saying it has already been attempted. Is it possible to delay emitting records from one topic because it has a degraded sink, but continue to process the others? I don't think I can prioritize events within one topic - the AWS topic, for instance - but I could create topics with prioritization: AWS_P0, AWS_P1, etc.

Throttling

I'm thinking of keeping some state per API, keyed by the API, tracking the number of requests and the proportion that are accepted, something like the client-side throttling in the SRE book:
https://landing.google.com/sre/sre-book/chapters/handling-overload/#client-side-throttling-a7sYUg
If I decide that I should throttle an event, then I need to apply whatever backpressure mechanism I decided on above.
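
For Approach 1, this is roughly what I have in mind for the sink. It's only a sketch: the String event type, tryWrite(), and the fixed backoff are placeholders for our real API client.

import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class BlockingApiSink extends RichSinkFunction<String> {
    @Override
    public void invoke(String event, Context context) throws Exception {
        // Blocking here stalls this subtask, so its input buffers fill up
        // and backpressure propagates upstream toward the Kafka source.
        while (!tryWrite(event)) {
            Thread.sleep(1000L); // fixed one-second backoff, just for the sketch
        }
    }

    private boolean tryWrite(String event) {
        return true; // stand-in for the real (rate-limited) API call
    }
}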
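
For Approach 3, the re-enqueue would be something like the following, with the attempt count carried in a Kafka record header. The header name and the key/payload types are just illustrative.

import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class Reenqueue {
    // Write a failed event back to its own input topic, bumping an
    // "attempts" header so later processing can tell it was retried.
    public static void reenqueue(KafkaProducer<String, byte[]> producer,
                                 String topic, String key,
                                 byte[] payload, int attempts) {
        ProducerRecord<String, byte[]> record = new ProducerRecord<>(topic, key, payload);
        record.headers().add("attempts",
                Integer.toString(attempts + 1).getBytes(StandardCharsets.UTF_8));
        producer.send(record);
    }
}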
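
And for the throttling state, a minimal version of the client-side throttle from that SRE book chapter: reject locally with probability max(0, (requests - K * accepts) / (requests + 1)). The book keeps these counts over a trailing window (e.g. two minutes), which I've left out here; one instance of this would live in keyed state per API.

import java.util.concurrent.ThreadLocalRandom;

public class AdaptiveThrottle {
    private static final double K = 2.0; // the multiplier the SRE book suggests
    private long requests; // requests attempted by this client
    private long accepts;  // requests the backend actually accepted

    // True if this request should be rejected locally rather than sent.
    public synchronized boolean shouldThrottle() {
        double pReject = Math.max(0.0, (requests - K * accepts) / (requests + 1));
        return ThreadLocalRandom.current().nextDouble() < pReject;
    }

    // Record the outcome of a request that was actually sent.
    public synchronized void record(boolean accepted) {
        requests++;
        if (accepted) {
            accepts++;
        }
    }
}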
