Sounds about right. On commit, both stores would be flushed to local disk and the producer would be flushed to ensure all writes to the changelog topics are done. Only afterwards would the input topic offsets be committed.
Because flushing happens one after the other for the stores (and in no
particular order), it could happen that one flush succeeds while an error
occurs before the second one completes. Similarly, at this point the producer
might or might not have successfully written to the changelog topics.

Of course, the input data would be reprocessed after restart, and hence, if
your store updates are idempotent, you might be fine with the "at_least_once"
guarantee. If your store updates are not idempotent, using "exactly_once"
would ensure that "partial/dirty" writes to the stores are rolled back before
the input data is reprocessed.

-Matthias

On 10/16/19 4:16 PM, Alex Leung (BLOOMBERG/ SAN FRAN) wrote:
> We're using the Kafka Streams processor API and directly performing get() and
> put() on two different state stores (KeyValueStore) from a single processor.
> If two puts are performed from one processor, e.g.:
>
> 1. store1.put(...)
> 2. store2.put(...)
>
> my understanding is that if processing.guarantee="at_least_once", it is
> possible that the commit from 1. succeeds but the commit from 2. fails (or
> vice versa). And if we need to guarantee that either both or neither succeed,
> we need to enable processing.guarantee="exactly_once". I believe the relevant
> code is here:
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java#L547
>
> Is my understanding correct?
>
> Thanks,
> Alex
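A minimal sketch of the scenario under discussion, assuming the classic
Processor API available at the time of this thread; the class name, store
names, and value types below are illustrative placeholders, not taken from
the original posts:

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.processor.Processor;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.state.KeyValueStore;

    // Hypothetical processor that writes to two state stores per record.
    public class TwoStoreProcessor implements Processor<String, String> {

        private KeyValueStore<String, String> store1;
        private KeyValueStore<String, String> store2;

        @SuppressWarnings("unchecked")
        @Override
        public void init(final ProcessorContext context) {
            // "store1" and "store2" are placeholder names; they must match
            // the names the stores were registered with in the topology.
            store1 = (KeyValueStore<String, String>) context.getStateStore("store1");
            store2 = (KeyValueStore<String, String>) context.getStateStore("store2");
        }

        @Override
        public void process(final String key, final String value) {
            // Under at_least_once, a failure between the two store flushes
            // at commit time can leave one changelog write done and the
            // other not; the input record is reprocessed after restart.
            store1.put(key, value);
            store2.put(key, value);
        }

        @Override
        public void close() { }
    }

To get the all-or-nothing behavior asked about, exactly-once processing can
be enabled via the processing guarantee config, e.g.:

    final Properties props = new Properties();
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);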