Indeed, the files would have to be moved to some separate, dedicated storage. There are basically three options, as Kafka does not allow adding logs at runtime:

1. make the consumer able to read from an arbitrary file
2. add the ability to drop files in (I believe this adds a lot of complexity)
3. read the files with another program, as suggested in my first email

I’d love to get some input from someone who knows the code and options a bit better!

Kind regards,
Radek Gruchalski
ra...@gruchalski.com
de.linkedin.com/in/radgruchalski/

Confidentiality:
This communication is intended for the above-named person and may be confidential and/or legally privileged.
If it has come to you in error you must take no action based on it, nor must you copy or show it to anyone; please delete/destroy and inform the sender immediately.

On Monday, 13 July 2015 at 18:02, Scott Thibault wrote:

> Yes, consider my e-mail an up vote!
>
> I guess the files would automatically be moved somewhere else to separate the
> active from cold segments? Ideally, one could run an unmodified consumer
> application on the cold segments.
>
> --Scott
>
> On Mon, Jul 13, 2015 at 6:57 AM, Rad Gruchalski <ra...@gruchalski.com> wrote:
>
> > Scott,
> >
> > This is what I was trying to target in one of my previous responses to
> > Daniel. The one in which I suggest another compaction setting for Kafka.
> >
> > Kind regards,
> > Radek Gruchalski
> > ra...@gruchalski.com
> > de.linkedin.com/in/radgruchalski/
> >
> > On Monday, 13 July 2015 at 15:41, Scott Thibault wrote:
> >
> > > We've tried to use Kafka not as a persistent store, but as a long-term
> > > archival store. An outstanding issue we've had with that is that the
> > > broker holds on to an open file handle on every file in the log! The other
> > > issue we've had is that when you create a long-term archival log on shared
> > > storage, you can't simply access that data from another cluster b/c of
> > > metadata being stored in ZooKeeper rather than in the log.
> > >
> > > --Scott Thibault
> > >
> > > On Mon, Jul 13, 2015 at 4:44 AM, Daniel Schierbeck <daniel.schierb...@gmail.com> wrote:
> > >
> > > > Would it be possible to document how to configure Kafka to never delete
> > > > messages in a topic? It took a good while to figure this out, and I see it
> > > > as an important use case for Kafka.
> > > >
> > > > On Sun, Jul 12, 2015 at 3:02 PM Daniel Schierbeck <daniel.schierb...@gmail.com> wrote:
> > > >
> > > > > On 10. jul. 2015, at 23.03, Jay Kreps <j...@confluent.io> wrote:
> > > > >
> > > > > > If I recall correctly, setting log.retention.ms and log.retention.bytes to
> > > > > > -1 disables both.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > > On Fri, Jul 10, 2015 at 1:55 PM, Daniel Schierbeck <daniel.schierb...@gmail.com> wrote:
> > > > > >
> > > > > > > On 10. jul. 2015, at 15.16, Shayne S <shaynest...@gmail.com> wrote:
> > > > > > >
> > > > > > > > There are two ways you can configure your topics: with log compaction and
> > > > > > > > with no cleaning. The choice depends on your use case. Are the records
> > > > > > > > uniquely identifiable and will they receive updates? Then log compaction is
> > > > > > > > the way to go. If they are truly read only, you can go without log compaction.
> > > > > > >
> > > > > > > I'd rather be free to use the key for partitioning, and the records are
> > > > > > > immutable (they're event records), so disabling compaction altogether
> > > > > > > would be preferable. How is that accomplished?
> > > > > > >
> > > > > > > > We have small processes which consume a topic and perform upserts to our
> > > > > > > > various database engines. It's easy to change how it all works and simply
> > > > > > > > consume the single source of truth again.
> > > > > > > >
> > > > > > > > I've written a bit about log compaction here:
> > > > > > > > http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/
> > > > > > > >
> > > > > > > > On Fri, Jul 10, 2015 at 3:46 AM, Daniel Schierbeck <daniel.schierb...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > I'd like to use Kafka as a persistent store, sort of as an alternative to
> > > > > > > > > HDFS. The idea is that I'd load the data into various other systems in
> > > > > > > > > order to solve specific needs such as full-text search, analytics, indexing
> > > > > > > > > by various attributes, etc. I'd like to keep a single source of truth,
> > > > > > > > > however.
> > > > > > > > >
> > > > > > > > > I'm struggling a bit to understand how I can configure a topic to retain
> > > > > > > > > messages indefinitely. I want to make sure that my data isn't deleted. Is
> > > > > > > > > there a guide to configuring Kafka like this?
> > >
> > > --
> > > *This e-mail is not encrypted. Due to the unsecured nature of unencrypted
> > > e-mail, there may be some level of risk that the information in this e-mail
> > > could be read by a third party. Accordingly, the recipient(s) named above
> > > are hereby advised to not communicate protected health information using
> > > this e-mail address. If you desire to send protected health information
> > > electronically, please contact MultiScale Health Networks at (206)538-6090*
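
For reference, the retention suggestion quoted in this thread (setting log.retention.ms and log.retention.bytes to -1) could be written down roughly as below. This is only a sketch: the two setting names and the value -1 come from the quoted message, which itself was hedged with "if I recall correctly", while the helper `render_properties` and the `RETENTION_OVERRIDES` dict are hypothetical names made up for illustration. The output is meant to be pasted into a broker's properties file and should be checked against the documentation for the Kafka version in use.

```python
# Sketch only: the setting names and the value -1 are taken from the thread;
# the dict and helper below are hypothetical, not part of any Kafka tooling.
RETENTION_OVERRIDES = {
    "log.retention.ms": -1,     # assumed: no time-based deletion
    "log.retention.bytes": -1,  # assumed: no size-based deletion
}


def render_properties(settings):
    """Render a dict as Java-style .properties lines (key=value), sorted by key."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


if __name__ == "__main__":
    # Prints one key=value line per setting, ready to paste into a config file.
    print(render_properties(RETENTION_OVERRIDES))
```

Keeping the values in one place like this makes it easy to review them before touching a broker; it does not change anything on a running cluster by itself.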