Not as far as I'm aware. I'd be happy to contribute if there is a desire
to have such a feature. From experience with other projects, I know that
without an initial pitch / discussion it can be difficult to get a
feature like this in. I can create a JIRA in the morning; no electricity
again tonight :-/
On Tue, May 17, 2016 at 4:53 PM -0700, "Christian Posta"
<christian.po...@gmail.com> wrote:

+1 to your solution of log.cleanup.policy. Other brokers (e.g. ActiveMQ)
have a feature like that. Is there a JIRA for this?

On Tue, May 17, 2016 at 4:48 PM, Radoslaw Gruchalski wrote:

> I have described a cold storage solution for Kafka:
> https://medium.com/@rad_g/the-case-for-kafka-cold-storage-32929d0a57b2#.kf0jf8cwv.
> I have also described it here a couple of times. The potential solution
> seems rather straightforward.
>
> _____________________________
> From: Luke Steensen
> Sent: Tuesday, May 17, 2016 11:22 pm
> Subject: Re: Kafka for event sourcing architecture
> To:
>
> It's harder in Kafka because the unit of replication is an entire
> partition, not a single key/value pair. Partitions are large and
> constantly growing, whereas key/value pairs are typically much smaller
> and don't change in size. There would theoretically be no difference
> if you had one partition per key, but that's not practical. Instead,
> you end up trying to pick a number of partitions big enough that each
> will stay a reasonable size for the foreseeable future, but not so big
> that the cluster overhead is untenable. Even then, the clock is
> ticking towards the day your biggest partition approaches the limit of
> storage available on a single machine.
>
> It's frustrating because, as you say, there would be enormous benefits
> to being able to access all data through the same system.
> Unfortunately, it seems too far away from Kafka's original use case to
> be practical.
>
> On Tue, May 17, 2016 at 12:32 PM, Daniel Schierbeck <
> da...@zendesk.com.invalid> wrote:
>
> > I'm not sure why Kafka, at least in theory, cannot be used for
> > infinite retention – any replicated database system needs to have a
> > new node ingest all the data of a failed node from its replicas.
> > Surely this is no different in S3 itself. Why is this harder to do
> > in Kafka than in other systems? The benefit of having just a single
> > message log system would be rather big.
> >
> > On Tue, May 17, 2016 at 4:44 AM Tom Crayford wrote:
> >
> > > Hi Oli,
> > >
> > > Inline.
> > >
> > > On Tuesday, 17 May 2016, Olivier Lalonde wrote:
> > >
> > > > Hi all,
> > > >
> > > > I am considering adopting an "event sourcing" architecture for
> > > > a system I am developing, and Kafka seems like a good choice of
> > > > store for events.
> > > >
> > > > For those who aren't aware, this architecture style consists of
> > > > storing all state changes of the system as an ordered log of
> > > > events and building derivative views as needed for easier
> > > > querying (using a SQL database, for example). Those views must
> > > > be completely derived from the event log alone, so that the log
> > > > effectively becomes a "single source of truth".
> > > >
> > > > I was wondering if anyone else is using Kafka for that purpose,
> > > > and more specifically:
> > > >
> > > > 1) Can Kafka store messages permanently?
> > >
> > > No. Whilst you can tweak config and such to get a very long
> > > retention period, this doesn't work well with Kafka at all.
> > > Keeping data around forever has severe impacts on the operability
> > > of your cluster. For example, if a machine fails, a replacement
> > > would have to catch up with vast quantities of data from its
> > > replicas. Currently we (Heroku Kafka) restrict our customers to a
> > > maximum of 14 days of retention, because of all the operational
> > > headaches of more retention than that. Of course, on your own
> > > cluster you *can* set it as high as you like; this is just
> > > anecdotal experience from a team that runs thousands of clusters -
> > > infinite retention is an operational disaster waiting to happen.
> > >
> > > Whilst Kafka does have a replay mechanism, that should mostly be
> > > thought of as a mechanism for handling other system failures.
> > > E.g. if the database you store indexed views in is down, Kafka's
> > > replay and retention mechanisms mean you're not losing data whilst
> > > restoring the availability of that database.
> > >
> > > What we typically suggest customers do when they ask about this
> > > use case is to use Kafka as a messaging system, but use e.g. S3 as
> > > the long-term store. Kafka can help with batching writes up to S3
> > > (see e.g. Pinterest's Secor project), and act as a very high
> > > throughput, durable, replicated messaging layer for communication.
> > > In this paradigm, when you want to replay, you do so out of S3
> > > until you've consumed the last offset there, then start replaying
> > > out of and catching up with the small amount of remaining data in
> > > Kafka. Of course, the replay logic there has to be hand rolled, as
> > > Kafka and its clients have no knowledge of external stores.
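(A minimal sketch of that hand-rolled hand-off in Go, using the
Shopify/sarama client. The ColdStore interface is a hypothetical
stand-in for whatever reads your S3 archive, e.g. the files Secor
wrote; the sketch also assumes the archive overlaps Kafka's retention
window, so last+1 is still a retained offset:

    package replay

    import "github.com/Shopify/sarama"

    // ColdStore is a hypothetical interface over the long-term archive
    // (e.g. an S3 bucket of segment files written by Secor).
    type ColdStore interface {
        // ReadFrom replays archived messages for a partition and
        // returns the last offset it delivered.
        ReadFrom(topic string, partition int32,
            handle func(offset int64, value []byte)) (int64, error)
    }

    func Replay(store ColdStore, brokers []string, topic string,
        partition int32, handle func(int64, []byte)) error {

        // 1) Replay everything that has been archived to cold storage.
        last, err := store.ReadFrom(topic, partition, handle)
        if err != nil {
            return err
        }

        // 2) Continue out of Kafka from the first offset after the
        // archive; if last+1 has already been deleted by retention,
        // this returns an offset-out-of-range error.
        consumer, err := sarama.NewConsumer(brokers, nil)
        if err != nil {
            return err
        }
        defer consumer.Close()

        pc, err := consumer.ConsumePartition(topic, partition, last+1)
        if err != nil {
            return err
        }
        defer pc.Close()

        for msg := range pc.Messages() {
            handle(msg.Offset, msg.Value)
        }
        return nil
    }

The interesting design point is step 2: because the archive records the
last offset it holds, the consumer can be started at an exact position
rather than at "oldest", so no message is replayed twice across the
boundary.)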
> > > Another potential thing to look at is Kafka's compacted topic
> > > mechanism. With compacted topics, Kafka keeps the latest element
> > > for a given key, making the topic act a little more like a
> > > database table. Note that you still have to consume by offset
> > > here - there's no "get the value for key Y" operation. However,
> > > this assumes that your keyspace is still tractably small, and that
> > > you're OK with keeping only the latest value. Compaction
> > > completely overrides time-based retention, so you have to "delete"
> > > keys or have a bounded keyspace if you want to retain operational
> > > sanity with Kafka. I'd recommend reading the docs on compacted
> > > topics; they cover the use cases quite well.
> > >
> > > > 2) Let's say I throw away my derived view and want to re-build
> > > > it from scratch. Is it possible to consume messages from a
> > > > topic from its very first message and, once it has caught up,
> > > > listen for new messages like it would normally do?
> > >
> > > That's entirely possible: you can catch up from the first retained
> > > message and then continue from there very easily. However, see
> > > above about infinite retention.
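(Concretely, that catch-up-then-follow pattern looks like this in Go
with the Shopify/sarama client - a minimal sketch assuming a
single-partition topic named "events" and a broker on localhost:

    package main

    import (
        "fmt"
        "log"

        "github.com/Shopify/sarama"
    )

    func main() {
        consumer, err := sarama.NewConsumer(
            []string{"localhost:9092"}, nil)
        if err != nil {
            log.Fatal(err)
        }
        defer consumer.Close()

        // OffsetOldest starts from the first retained message; once
        // the consumer has caught up, the same channel simply keeps
        // delivering new messages as they arrive.
        pc, err := consumer.ConsumePartition(
            "events", 0, sarama.OffsetOldest)
        if err != nil {
            log.Fatal(err)
        }
        defer pc.Close()

        for msg := range pc.Messages() {
            fmt.Printf("offset %d: %s\n", msg.Offset, msg.Value)
        }
    }

There is no mode switch between "rebuilding" and "following": the view
builder just applies every message in offset order forever.)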
> > > > 3) Does it support transactions? Let's say I want to push 3
> > > > messages atomically, but the producer process crashes after
> > > > sending only 2 messages. Is it possible to "roll back" the
> > > > first 2 messages (i.e. "all or nothing" semantics)?
> > >
> > > No. Kafka at the moment only supports "at least once" semantics,
> > > and there are no cross-broker transactions of any kind.
> > > Implementing such a thing would likely have huge negative impacts
> > > on the current performance characteristics of Kafka, which would
> > > be an issue for many users.
> > >
> > > > 4) Does it support request/response style semantics, or can
> > > > they be simulated? My system's primary interface with the
> > > > outside world is an HTTP API, so it would be nice if I could
> > > > publish an event and wait for all the internal services which
> > > > need to process the event to be "done" processing before
> > > > returning a response.
> > >
> > > In theory that's possible - the producer can return the offset of
> > > the message produced, and you could check the latest offset of
> > > each consumer in your web request handler.
> > >
> > > However, doing so is not going to work that well, unless you're OK
> > > with your web requests taking on the order of seconds to tens of
> > > seconds to fulfill. Kafka can do low-latency messaging reasonably
> > > well, but coordinating the offsets of many consumers would likely
> > > have a huge latency impact. Writing the code for it and getting it
> > > to handle failure correctly would likely be a lot of work (there's
> > > nothing like this in any of the client libraries, because it is
> > > not a desirable or supported use case).
> > >
> > > Instead, I'd like to query *why* you need those semantics. What's
> > > the issue with just producing a message, telling the user HTTP
> > > 200, and consuming it later?
> > >
> > > > PS: I'm a Node.js/Go developer, so when possible please avoid
> > > > Java-centric terminology.
> > >
> > > Please note that the Node and Go clients are notably less mature
> > > than the JVM clients, and that running Kafka in production means
> > > knowing enough about the JVM and Zookeeper to handle that.
> > >
> > > Thanks!
> > > Tom Crayford
> > > Heroku Kafka
> > >
> > > > Thanks!
> > > >
> > > > - Oli
> > > >
> > > > --
> > > > Olivier Lalonde
> > > > http://www.syskall.com <-- connect with me!

--
Christian Posta
twitter: @christianposta
http://www.christianposta.com/blog
http://fabric8.io
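(For completeness, a rough Go sketch of the offset-based
request/response idea discussed above, again with Shopify/sarama.
SendMessage really does return the offset of the produced message; the
waitForConsumers helper is hypothetical - as Tom notes, nothing like it
exists in the client libraries, so you would have to hand-roll it by
polling your consumer groups' committed offsets, and that is exactly
where the latency and failure-handling pain lives:

    package api

    import (
        "fmt"
        "time"

        "github.com/Shopify/sarama"
    )

    // waitForConsumers is a HYPOTHETICAL helper: it should block until
    // every interested consumer group has committed an offset past the
    // given one, or the timeout expires. Kafka's clients offer nothing
    // like this out of the box.
    func waitForConsumers(partition int32, offset int64,
        timeout time.Duration) error {
        // ... hand-rolled polling of committed offsets goes here ...
        return nil
    }

    func publishAndWait(producer sarama.SyncProducer,
        event []byte) error {
        // SendMessage blocks until the broker acknowledges the write
        // and returns the partition and offset the event landed at.
        partition, offset, err := producer.SendMessage(
            &sarama.ProducerMessage{
                Topic: "events",
                Value: sarama.ByteEncoder(event),
            })
        if err != nil {
            return err
        }

        // Waiting for downstream consumers to pass that offset is the
        // part that can take seconds to tens of seconds per request.
        if err := waitForConsumers(
            partition, offset, 30*time.Second); err != nil {
            return fmt.Errorf(
                "event at offset %d not fully processed: %v",
                offset, err)
        }
        return nil
    }

This is why "produce, return HTTP 200, consume later" is usually the
better shape: the producer half is one cheap acknowledged write, and
everything expensive in this sketch only exists to simulate synchronous
semantics on top of it.)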