Hello all,
I've spent the last few months testing Bufstream, a Kafka-compatible
system. In the course of that research, we discovered that the Kafka
transaction protocol allows aborted reads, lost writes, and torn
transactions:
https://jepsen.io/analyses/bufstream-0.1.0
In short, the protocol assumes that message delivery is ordered, but
sends messages over different TCP connections, to different nodes, with
automatic retries. When network or node hiccups (e.g. garbage
collection) delay delivery of a commit or abort message, that message
can commit or abort a different, later transaction. Committed
transactions can actually be lost. Aborted transactions can actually
succeed. Transactions can be torn into parts: some of their effects
committed, others lost.
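To make the failure mode concrete, here is a minimal sketch of the race. It is a hypothetical model, not actual Kafka or Bufstream code: a toy broker matches commit/abort messages to whatever transaction is currently open for a producer, with no fencing, so a delayed commit from an earlier, aborted transaction lands on a later one.

```python
# Toy model (hypothetical, not real Kafka code): EndTxn messages are matched
# by producer id alone, so a delayed commit can apply to a later transaction.
from collections import deque

class Broker:
    def __init__(self):
        self.pending = {}     # producer_id -> buffered writes of the open txn
        self.committed = []   # writes made visible to consumers

    def handle(self, msg):
        kind, pid, *rest = msg
        if kind == "begin":
            self.pending[pid] = []
        elif kind == "write":
            self.pending.setdefault(pid, []).append(rest[0])
        elif kind == "commit":
            # No fencing: whatever transaction is open for this producer
            # gets committed, even if it is a later one.
            self.committed += self.pending.pop(pid, [])
        elif kind == "abort":
            self.pending.pop(pid, None)

broker = Broker()
delayed = deque()

# Transaction 1: the client writes "a" and sends a commit, which is delayed.
broker.handle(("begin", 1))
broker.handle(("write", 1, "a"))
delayed.append(("commit", 1))      # in flight, not yet delivered

# The client times out waiting for the commit ack and aborts transaction 1.
broker.handle(("abort", 1))        # txn 1's write "a" is discarded

# Transaction 2 begins; the stale commit arrives mid-transaction.
broker.handle(("begin", 1))
broker.handle(("write", 1, "b"))
broker.handle(delayed.popleft())   # stale commit lands on txn 2
broker.handle(("write", 1, "c"))   # "c" joins a fresh txn that never commits

print(broker.committed)  # -> ['b']
```

The result shows all three anomalies at once: transaction 1 ("a") was acknowledged as aborted yet its commit took effect elsewhere, and transaction 2 is torn, with "b" committed but "c" lost.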
We've reproduced these problems in both Bufstream and Kafka itself, and
we believe every Kafka-compatible system is likely susceptible.
KIP-890 may help. Client maintainers may also be able to defend against
this problem by re-initializing producers on indefinite errors, like RPC
timeouts.
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=235834631#KIP890:TransactionsServerSideDefense-BumpEpochonEachTransactionforNewClients(1)
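The fencing idea behind both defenses can be sketched by extending the toy model with an epoch. This is again a hypothetical model: re-initializing the producer (in a real Java client, closing it and calling initTransactions() again) obtains a fresh epoch from the coordinator, so a commit delayed from before the re-initialization no longer matches and is rejected.

```python
# Hypothetical model of epoch fencing: every message carries the epoch it
# was sent under, and the broker drops messages from a stale epoch.
class FencingBroker:
    def __init__(self):
        self.epoch = {}       # producer_id -> current epoch
        self.pending = {}     # producer_id -> buffered writes
        self.committed = []

    def init_producer(self, pid):
        # Re-initialization bumps the epoch, fencing older in-flight messages.
        self.epoch[pid] = self.epoch.get(pid, 0) + 1
        self.pending[pid] = []
        return self.epoch[pid]

    def handle(self, msg):
        kind, pid, epoch, *rest = msg
        if epoch != self.epoch.get(pid):
            return "fenced"            # stale message: drop it
        if kind == "write":
            self.pending[pid].append(rest[0])
        elif kind == "commit":
            self.committed += self.pending[pid]
            self.pending[pid] = []
        elif kind == "abort":
            self.pending[pid] = []

broker = FencingBroker()
e1 = broker.init_producer(1)
broker.handle(("write", 1, e1, "a"))
stale_commit = ("commit", 1, e1)       # delayed in the network

# The commit times out; the client re-initializes and gets a fresh epoch.
e2 = broker.init_producer(1)
broker.handle(("write", 1, e2, "b"))
print(broker.handle(stale_commit))     # -> fenced
broker.handle(("commit", 1, e2))
print(broker.committed)                # -> ['b']
```

The stale commit is rejected instead of silently committing the wrong transaction; KIP-890's per-transaction epoch bump applies the same principle on every transaction rather than only on re-initialization.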
Yours truly,
--Kyle