[ https://issues.apache.org/jira/browse/KAFKA-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Radoslaw Gruchalski updated KAFKA-3726:
---------------------------------------
    Attachment: kafka-cold-storage.txt

The text version of the mentioned article.

Enable cold storage option
--------------------------

        Key: KAFKA-3726
        URL: https://issues.apache.org/jira/browse/KAFKA-3726
    Project: Kafka
 Issue Type: Wish
   Reporter: Radoslaw Gruchalski
Attachments: kafka-cold-storage.txt

This JIRA builds on the cold storage article I have published on Medium. A copy of the article is attached here.

The need for cold storage, or an "indefinite" log, seems to come up quite often on the user mailing list. The cold storage idea would give the operator the option to keep the raw Kafka log segment files in third-party storage and to retrieve the data later for re-consumption.

The article describes two possible options for enabling such functionality:

First approach: if Kafka provided a notification mechanism and could trigger a program when a segment file is about to be discarded, it would become feasible to provide a standard method of moving data to cold storage in reaction to those events. Once the program finishes backing up the segments, it could tell Kafka “it is now safe to delete these segments”.

The second option is to provide an additional value for the {{log.cleanup.policy}} setting, call it cold-storage. With this value set, Kafka would move the segment files (which would otherwise be deleted) to another destination on the server. They could be picked up from there and moved to cold storage.

Both have their limitations. The former is simply an exposed mechanism that allows the operator to build the tooling needed to enable this. Events could be published in a manner similar to the Mesos Event Bus (https://mesosphere.github.io/marathon/docs/event-bus.html), or Kafka itself could provide a control topic on which such information would be published.
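To make the first approach concrete, here is a minimal in-memory sketch of the notify-then-acknowledge handshake. Everything in it is hypothetical: `ToyBroker`, `Segment`, `expire`, `safe_to_delete` and `run_backup` are illustrative names, not Kafka APIs, and a placeholder payload stands in for copying a real segment file. What it shows is the safety property the approach relies on: a segment is deleted only after the operator's program has archived it and acknowledged it.

```python
# Hypothetical sketch: ToyBroker, Segment, expire, safe_to_delete and run_backup
# are illustrative names only; none of them is a Kafka API.
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Segment:
    topic: str
    partition: int
    base_offset: int  # first offset stored in the segment file


@dataclass
class ToyBroker:
    """In-memory stand-in for a broker whose expired segments wait for an ack."""
    events: List[Segment] = field(default_factory=list)  # "about to discard" notifications
    awaiting_ack: Set[Segment] = field(default_factory=set)
    deleted: List[Segment] = field(default_factory=list)

    def expire(self, segment: Segment) -> None:
        # Where retention would normally delete the file, hold it and publish an event.
        self.awaiting_ack.add(segment)
        self.events.append(segment)

    def safe_to_delete(self, segment: Segment) -> None:
        # Called by the operator's program once the segment is backed up.
        # Acks for segments that are not being held are ignored.
        if segment in self.awaiting_ack:
            self.awaiting_ack.remove(segment)
            self.deleted.append(segment)


def run_backup(broker: ToyBroker, cold_storage: Dict[Segment, bytes]) -> None:
    """Operator-side program: archive each announced segment, then acknowledge it."""
    while broker.events:
        segment = broker.events.pop(0)
        # A real tool would copy the segment file; a placeholder payload stands in here.
        cold_storage[segment] = f"{segment.topic}-{segment.partition}@{segment.base_offset}".encode()
        broker.safe_to_delete(segment)


if __name__ == "__main__":
    broker = ToyBroker()
    archive: Dict[Segment, bytes] = {}
    seg = Segment("orders", 0, 0)
    broker.expire(seg)
    print(broker.deleted)  # [] : still held, nothing archived yet
    run_backup(broker, archive)
    print(seg in archive, broker.deleted == [seg])  # True True
```

Keeping the acknowledgement separate from the notification lets the broker track only which segments are held, while the actual copying stays with the operator's tooling.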
As a result, the operator could subscribe to the event bus and be notified of at least two events:

- a log segment is complete and can be backed up
- the partition leader changed

These two events, together with an option to keep a log segment safe from compaction for a certain amount of time, would be sufficient to reliably implement cold storage.

The latter option, a new {{log.cleanup.policy}} value, would be a more complete feature, but it is also much more difficult to implement. Every broker would have to back up its own copy of the data to cold storage, significantly increasing the storage requirements; de-duplication of the replicated data would also be left entirely to the operator.

In any case, the thing to stay away from is having Kafka deal with the physical aspect of moving the data to and from cold storage. This is not Kafka's task. The intent is to provide a method for reliable cold storage.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)