Hi Till,

Till:
> Could you please give me a bit more context? Are you asking how Flink
> realizes exactly once processing guarantees with its connectors?

Thank you very much for your response! Flink has a lot of really cool ideas :)

I did read more about connectors, and I think I can elaborate now. The problem I have is with backpressure and failures, all coming from sinks / external APIs. (Don't worry, we're doing retries and backoff, but there's still more to try :)

What I'd like to achieve is:

* Leave backed-up events in *cheap durable storage*, not memory.
* *Decouple* topics with degraded sinks from the others.
* *Prioritize* events within one topic when an API is in a degraded state.
* *Throttle* sink API writes to match the sink's de facto bandwidth.

My pipeline looks like this. Kafka has one topic per API:

  Topic 1: AWS
  Topic 2: Azure
  Topic 3: GCP
  Topic 4: IBM
  ... (more topics)

read Kafka -> (key by API in [AWS, Azure, GCP, IBM, Oracle, etc.]) -> batch and write to the API in the key

That is, I'm reading API-scoped topics from Kafka, keying by API, and then attempting to write to the API in the key. I know that from time to time some of these APIs will have limited capacity. For example, we're writing to AWS from a shared account whose API limits other users in the account can exhaust, limiting our capacity to write for unpredictable periods.

I'd really like to get your feedback on the approaches I'm considering, and any suggestions you might have. Rough sketches of what I mean are at the end of this mail.

Handling failed writes

Approach 1. Block

If I just block in the API write, backpressure will build up and reach the Kafka reader, which will then stop reading new records, leaving them in Kafka. Is this a reasonable approach? What about the filled buffers inside the TaskManagers - should I worry about them? Will backpressure from one topic and one API delay the others? Kafka will let writers keep appending, and we would catch up once the API is no longer degraded.

Approach 2. Deadletter

I'm considering writing the event to a dead-letter queue, but then whoever processes the dead-letter queue has the same problem, since the API can be degraded at any time. We don't want to start dead-letter processing manually, since degraded APIs happen very regularly.

Approach 3. Re-enqueue

I'm considering writing the event back to the same input topic with metadata saying it has already been attempted. Is it possible to delay emitting records from one topic because it has a degraded sink, but continue to process the others? I don't think I can prioritize events within one topic - the AWS topic, for instance - but I could create topics with prioritization: AWS_P0, AWS_P1, etc.

Throttling

I'm thinking of keeping some state per API, keyed by the API, tracking the number of requests and the proportion that are accepted, something like the client-side throttling in the SRE book:
https://landing.google.com/sre/sre-book/chapters/handling-overload/#client-side-throttling-a7sYUg
If I decide that I should throttle an event, then I need to apply whatever backpressure mechanism I decided on above.
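
For Approach 1, this is roughly what I have in mind for the sink. It's only a sketch: the String event type, tryWrite(), and the fixed backoff are placeholders for our real API client.

import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class BlockingApiSink extends RichSinkFunction<String> {
    @Override
    public void invoke(String event, Context context) throws Exception {
        // Blocking here stalls this subtask, so its input buffers fill up
        // and backpressure propagates upstream toward the Kafka source.
        while (!tryWrite(event)) {
            Thread.sleep(1000L); // fixed one-second backoff, just for the sketch
        }
    }

    private boolean tryWrite(String event) {
        return true; // stand-in for the real (rate-limited) API call
    }
}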
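
For Approach 3, the re-enqueue would be something like the following, with the attempt count carried in a Kafka record header. The header name and the key/payload types are just illustrative.

import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class Reenqueue {
    // Write a failed event back to its own input topic, bumping an
    // "attempts" header so later processing can tell it was retried.
    public static void reenqueue(KafkaProducer<String, byte[]> producer,
                                 String topic, String key,
                                 byte[] payload, int attempts) {
        ProducerRecord<String, byte[]> record = new ProducerRecord<>(topic, key, payload);
        record.headers().add("attempts",
                Integer.toString(attempts + 1).getBytes(StandardCharsets.UTF_8));
        producer.send(record);
    }
}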
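
And for the throttling state, a minimal version of the client-side throttle from that SRE book chapter: reject locally with probability max(0, (requests - K * accepts) / (requests + 1)). The book keeps these counts over a trailing window (e.g. two minutes), which I've left out here; one instance of this would live in keyed state per API.

import java.util.concurrent.ThreadLocalRandom;

public class AdaptiveThrottle {
    private static final double K = 2.0; // the multiplier the SRE book suggests
    private long requests; // requests attempted by this client
    private long accepts;  // requests the backend actually accepted

    // True if this request should be rejected locally rather than sent.
    public synchronized boolean shouldThrottle() {
        double pReject = Math.max(0.0, (requests - K * accepts) / (requests + 1));
        return ThreadLocalRandom.current().nextDouble() < pReject;
    }

    // Record the outcome of a request that was actually sent.
    public synchronized void record(boolean accepted) {
        requests++;
        if (accepted) {
            accepts++;
        }
    }
}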
