It's harder in Kafka because the unit of replication is an entire partition, not a single key/value pair. Partitions are large and constantly growing, whereas key/value pairs are typically much smaller and don't change in size. There would theoretically be no difference if you had one partition per key, but that's not practical. Instead, you end up trying to pick a number of partitions big enough that each will stay a reasonable size for the foreseeable future, but not so big that the cluster overhead is untenable. Even then, the clock is ticking towards the day your biggest partition approaches the limit of storage available on a single machine.
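To make that tension concrete, here is a back-of-the-envelope sketch. All of the numbers are hypothetical; the point is the shape of the trade-off between partition count and per-partition size:

```python
# Back-of-the-envelope partition sizing (all numbers hypothetical).
# Goal: enough partitions that each stays below a per-partition size
# budget over the planned horizon, without an untenable partition count.

ingest_bytes_per_day = 500 * 10**9   # 500 GB/day into the topic
horizon_days = 3 * 365               # plan to keep data for ~3 years
max_partition_bytes = 1 * 10**12     # budget: ~1 TB per partition

total_bytes = ingest_bytes_per_day * horizon_days
partitions_needed = -(-total_bytes // max_partition_bytes)  # ceiling division

print(partitions_needed)  # 548 partitions just to hold 3 years at this rate
```

And the count only grows as the horizon does, which is exactly the "clock is ticking" problem above.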
It's frustrating because, as you say, there would be enormous benefits to being able to access all data through the same system. Unfortunately, it seems too far away from Kafka's original use case to be practical.

On Tue, May 17, 2016 at 12:32 PM, Daniel Schierbeck <da...@zendesk.com.invalid> wrote:

> I'm not sure why Kafka, at least in theory, cannot be used for infinite
> retention – any replicated database system would need to have a new node
> ingest all the data from a failed node from its replicas. Surely this is
> no different in S3 itself. Why is this harder to do in Kafka than in
> other systems? The benefit of having just a single message log system
> would be rather big.
>
> On Tue, May 17, 2016 at 4:44 AM Tom Crayford <tcrayf...@heroku.com> wrote:
>
> > Hi Oli,
> >
> > Inline.
> >
> > On Tuesday, 17 May 2016, Olivier Lalonde <olalo...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I am considering adopting an "event sourcing" architecture for a
> > > system I am developing and Kafka seems like a good choice of store
> > > for events.
> > >
> > > For those who aren't aware, this architecture style consists in
> > > storing all state changes of the system as an ordered log of events
> > > and building derivative views as needed for easier querying (using
> > > a SQL database for example). Those views must be completely derived
> > > from the event log alone so that the log effectively becomes a
> > > "single source of truth".
> > >
> > > I was wondering if anyone else is using Kafka for that purpose and
> > > more specifically:
> > >
> > > 1) Can Kafka store messages permanently?
> >
> > No. Whilst you can tweak config and such to get a very long retention
> > period, this doesn't work well with Kafka at all. Keeping data around
> > forever has severe impacts on the operability of your cluster. For
> > example, if a machine fails, a replacement would have to catch up with
> > vast quantities of data from its replicas.
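For context on the "tweak config" part: retention in Kafka is a per-topic setting. A hypothetical configuration fragment (topic name and ZooKeeper address are placeholders), shown only to illustrate the knob being discussed, not to recommend using it:

```shell
# Hypothetical: create a topic with time-based retention disabled.
# retention.ms=-1 turns off time-based deletion entirely -- the
# "set it as high as you like" option, with the operational caveats above.
kafka-topics.sh --create \
  --zookeeper localhost:2181 \
  --topic events \
  --partitions 32 \
  --replication-factor 3 \
  --config retention.ms=-1
```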
> > Currently we (Heroku Kafka) restrict our customers to a maximum of 14
> > days of retention, because of all the operational headaches of more
> > retention than that. Of course on your own cluster you *can* set it as
> > high as you like; this is just anecdotal experience from a team that
> > runs thousands of clusters - infinite retention is an operational
> > disaster waiting to happen.
> >
> > Whilst Kafka does have a replay mechanism, that should mostly be
> > thought of as a mechanism for handling other system failures. E.g. if
> > the database you store indexed views in is down, Kafka's replay and
> > retention mechanisms mean you're not losing data whilst restoring the
> > availability of that database.
> >
> > What we typically suggest customers do when they ask about this use
> > case is to use Kafka as a messaging system, but use e.g. S3 as the
> > long-term store. Kafka can help with batching writes up to S3 (see
> > e.g. Pinterest's Secor project), and act as a very high-throughput,
> > durable, replicated messaging layer for communication. In this
> > paradigm, when you want to replay, you do so out of S3 until you've
> > consumed the last offset there, then start replaying out of and
> > catching up with the small amount of remaining data in Kafka. Of
> > course the replay logic there has to be hand-rolled, as Kafka and its
> > clients have no knowledge of external stores.
> >
> > Another potential thing to look at is Kafka's compacted topic
> > mechanism. With compacted topics, Kafka keeps the latest element for a
> > given key, making it act a little more like a database table. Note
> > that you still have to consume by offset here - there's no "get the
> > value for key Y" operation. However, this assumes that your keyspace
> > is still tractably small, and that you're ok with keeping only the
> > latest value.
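As a concrete illustration of the compaction setting being described (topic name, partition count, and connection details are all hypothetical):

```shell
# Hypothetical: create a compacted topic. Kafka retains the most recent
# value per message key; producing a null value ("tombstone") for a key
# is how that key eventually gets removed.
kafka-topics.sh --create \
  --zookeeper localhost:2181 \
  --topic user-profiles \
  --partitions 8 \
  --replication-factor 3 \
  --config cleanup.policy=compact
```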
> > Compaction completely overrides time-based retention, so you have to
> > "delete" keys or have a bounded keyspace if you want to retain
> > operational sanity with Kafka. I'd recommend reading the docs on
> > compacted topics; they cover the use cases quite well.
> >
> > > 2) Let's say I throw away my derived view and want to re-build it
> > > from scratch, is it possible to consume messages from a topic from
> > > its very first message and once it has caught up, listen for new
> > > messages like it would normally do?
> >
> > That's entirely possible: you can catch up from the first retained
> > message and then continue from there very easily. However, see above
> > about infinite retention.
> >
> > > 3) Does it support transactions? Let's say I want to push 3 messages
> > > atomically but the producer process crashes after sending only 2
> > > messages, is it possible to "rollback" the first 2 messages (e.g.
> > > "all or nothing" semantics)?
> >
> > No. Kafka at the moment only supports "at least once" semantics, and
> > there are no cross-broker transactions of any kind. Implementing such
> > a thing would likely have huge negative impacts on the current
> > performance characteristics of Kafka, which would be an issue for
> > many users.
> >
> > > 4) Does it support request/response style semantics or can they be
> > > simulated? My system's primary interface with the outside world is
> > > an HTTP API so it would be nice if I could publish an event and
> > > wait for all the internal services which need to process the event
> > > to be "done" processing before returning a response.
> >
> > In theory that's possible - the producer can return the offset of the
> > message produced, and you could check the latest offset of each
> > consumer in your web request handler.
> > However, doing so is not going to work that well, unless you're ok
> > with your web requests taking on the order of seconds to tens of
> > seconds to fulfill. Kafka can do low-latency messaging reasonably
> > well, but coordinating the offsets of many consumers would likely have
> > a huge latency impact. Writing the code for it and getting it to
> > handle failure correctly would likely be a lot of work (there's
> > nothing in any of the client libraries like this, because it is not a
> > desirable or supported use case).
> >
> > Instead, I'd like to ask *why* you need those semantics. What's the
> > issue with just producing a message, telling the user HTTP 200, and
> > consuming it later?
> >
> > > PS: I'm a Node.js/Go developer so when possible please avoid
> > > Java-centric terminology.
> >
> > Please note that the Node and Go clients are notably less mature than
> > the JVM clients, and that running Kafka in production means knowing
> > enough about the JVM and Zookeeper to handle that.
> >
> > Thanks!
> > Tom Crayford
> > Heroku Kafka
> >
> > > Thanks!
> > >
> > > - Oli
> > >
> > > --
> > > Olivier Lalonde
> > > http://www.syskall.com <-- connect with me!