Hi,
Couple of comments on this.
What you're proposing is difficult to do at scale and would require some
type of Paxos-style algorithm just to implement the conditional update -
it would be easier in that case to just go ahead and do the update.
Also, it seems like a conflation of concerns - in an event sourcing model,
we save the immutable event and represent current state in another,
separate data structure. Perhaps Cassandra would work well here - that
data model might provide what you're looking for out of the box.
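For what it's worth, here's a minimal sketch of that separation, assuming
the Java producer client; EntityStore and the event payload are stand-ins
for whatever read-model storage (e.g. Cassandra) you'd choose:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Stand-in for the separate, queryable current-state store.
interface EntityStore {
    void removeEmail(String userId, String email);
}

public class EventSourcingSketch {
    private final KafkaProducer<String, String> producer;
    private final EntityStore currentState;

    public EventSourcingSketch(KafkaProducer<String, String> producer,
                               EntityStore currentState) {
        this.producer = producer;
        this.currentState = currentState;
    }

    public void removeEmailAddress(String userId, String email) {
        // The immutable event goes to Kafka, keyed by entity id.
        producer.send(new ProducerRecord<>("user-events", userId,
                "{\"type\":\"EmailAddressRemoved\",\"email\":\"" + email + "\"}"));
        // Current state lives in a separate structure - in practice it's
        // usually updated by a consumer of the topic, not inline here.
        currentState.removeEmail(userId, email);
    }
}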
Just as I don't recommend people use data stores as queuing mechanisms, I
also recommend not using a queuing mechanism as a primary datastore - mixed
semantics.
--
Colin
+1 612 859-6129
On Mon, Jan 5, 2015 at 4:47 AM, Daniel Schierbeck
<daniel.schierb...@gmail.com> wrote:
I'm trying to design a system that uses Kafka as its primary data store by
persisting immutable events into a topic and keeping a secondary index in
another data store. The secondary index would store the entities. Each
event would pertain to some entity, e.g. a user, and those entities are
stored in an easily queryable way.
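Concretely, the indexing side would be a consumer that projects each event
onto the entity store, something like the sketch below (the topic name is
illustrative, applyToIndex is a placeholder, and I'm glossing over
deserialization):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SecondaryIndexBuilder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "entity-index");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("user-events"));
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Key = entity id, value = serialized event.
                    applyToIndex(record.key(), record.value());
                }
            }
        }
    }

    // Placeholder: project the event onto the queryable entity store.
    static void applyToIndex(String entityId, String eventJson) { }
}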
Kafka seems well suited for this, but there's one thing I'm having problems
with. I cannot guarantee that only one process writes events about an
entity, which makes the design vulnerable to integrity issues.
For example, say that a user can have multiple email addresses assigned,
and the EmailAddressRemoved event is published when the user removes one.
There's an integrity constraint, though: every user MUST have at least one
email address. As far as I can see, there's no way to stop two separate
processes from looking up a user entity, seeing that there are two email
addresses assigned, and each publishing an EmailAddressRemoved event. The
end result would violate the constraint.
If I'm wrong and there is a way to prevent this, I'd love some feedback!
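To make the race concrete, here's a sketch of the check-then-act sequence;
UserStore stands in for the secondary index, and the topic name and event
shape are just illustrative:

import java.util.List;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

interface UserStore {
    List<String> getEmails(String userId);
}

public class RemoveEmailHandler {
    private final KafkaProducer<String, String> producer;
    private final UserStore userStore;

    public RemoveEmailHandler(KafkaProducer<String, String> producer,
                              UserStore userStore) {
        this.producer = producer;
        this.userStore = userStore;
    }

    public void handle(String userId, String email) {
        // Check: the user must keep at least one address.
        List<String> emails = userStore.getEmails(userId);
        if (emails.size() > 1 && emails.contains(email)) {
            // Act: publish the event. Nothing stops a second process from
            // passing the same check concurrently and removing the other
            // remaining address, leaving the user with none.
            producer.send(new ProducerRecord<>("user-events", userId,
                    "{\"type\":\"EmailAddressRemoved\",\"email\":\"" + email + "\"}"));
        }
    }
}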
My current thinking is that Kafka could relatively easily support this kind
of application with a small additional API. Kafka already has the abstract
notion of entities through its key-based retention policy. If the produce
API were modified to accept an integer OffsetConstraint, the following
algorithm could determine whether the request should proceed (a rough code
sketch follows the list):
1. For every key seen, keep track of the offset of the latest message
referencing the key.
2. When an OffsetConstraint is specified in the produce API call, compare
that value with the latest offset for the message key.
2.1. If they're identical, allow the operation to continue.
2.2. If they're not identical, fail with some OptimisticLockingFailure.
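In plain Java, the check I have in mind would behave roughly like this. To
be clear, this is only a single-partition sketch of the proposed semantics,
not anything Kafka supports today; OptimisticLockingFailure and the -1
"no prior message" sentinel are my own inventions:

import java.util.HashMap;
import java.util.Map;

public class OffsetConstraintSketch {
    // Step 1: track the latest offset seen for every message key.
    private final Map<String, Long> latestOffsetByKey = new HashMap<>();
    private long nextOffset = 0L;

    // Step 2: compare the caller's constraint against the latest offset
    // for the key; identical means proceed (2.1), otherwise fail (2.2).
    public synchronized long append(String key, Long offsetConstraint) {
        long latest = latestOffsetByKey.getOrDefault(key, -1L);
        if (offsetConstraint != null && offsetConstraint != latest) {
            throw new OptimisticLockingFailure("latest offset for key " + key
                    + " is " + latest + ", not " + offsetConstraint);
        }
        long assigned = nextOffset++;
        latestOffsetByKey.put(key, assigned);
        return assigned;
    }

    public static class OptimisticLockingFailure extends RuntimeException {
        public OptimisticLockingFailure(String message) { super(message); }
    }
}

A producer would pass the offset at which it last read the entity; of two
racing writers, only the first would succeed, and the second would get the
failure and could re-read and retry.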
Would such a feature be completely out of scope for Kafka?
Best regards,
Daniel Schierbeck