Sounds about right. On commit, both stores would be flushed to local disk and the producer would be flushed to ensure all writes to the changelog topics are done. Only afterwards would the input topic offsets be committed.
Because flushing happens one after the other for the stores (and in no
particular order), it could happen that one flush succeeds while an error
occurs before the second one completes. Similarly, at this point the producer
might or might not have successfully written to the changelog topics.

Of course, the input data would be reprocessed after restart, and hence, if
your store updates are idempotent, you might be fine with the "at_least_once"
guarantee. If your store updates are not idempotent, using "exactly_once"
would ensure that "partial/dirty" writes to the stores are rolled back before
the input data is reprocessed.

-Matthias

On 10/16/19 4:16 PM, Alex Leung (BLOOMBERG/ SAN FRAN) wrote:
> We're using the Kafka Streams processor API and directly performing get() and
> put() on two different state stores (KeyValueStore) from a single processor.
> If two puts are performed from one processor, e.g.:
>
> 1. store1.put(...)
> 2. store2.put(...)
>
> my understanding is that if processing.guarantee="at_least_once", it is
> possible that the commit from 1. succeeds but the commit from 2. fails (or
> vice versa). And if we need to guarantee that either both or neither succeed,
> we need to enable processing.guarantee="exactly_once". I believe the relevant
> code is here:
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java#L547
>
> Is my understanding correct?
>
> Thanks,
> Alex
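A minimal sketch of the scenario under discussion, assuming the classic
Processor API available at the time of this thread; the class name, store
names, and value types below are illustrative placeholders, not taken from
the original posts:

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.processor.Processor;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.state.KeyValueStore;

    // Hypothetical processor that writes to two state stores per record.
    public class TwoStoreProcessor implements Processor<String, String> {

        private KeyValueStore<String, String> store1;
        private KeyValueStore<String, String> store2;

        @SuppressWarnings("unchecked")
        @Override
        public void init(final ProcessorContext context) {
            // "store1" and "store2" are placeholder names; they must match
            // the names the stores were registered with in the topology.
            store1 = (KeyValueStore<String, String>) context.getStateStore("store1");
            store2 = (KeyValueStore<String, String>) context.getStateStore("store2");
        }

        @Override
        public void process(final String key, final String value) {
            // Under at_least_once, a failure between the two store flushes
            // at commit time can leave one changelog write done and the
            // other not; the input record is reprocessed after restart.
            store1.put(key, value);
            store2.put(key, value);
        }

        @Override
        public void close() { }
    }

To get the all-or-nothing behavior asked about, exactly-once processing can
be enabled via the processing guarantee config, e.g.:

    final Properties props = new Properties();
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);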