Re: Using Kafka as a persistent store

2015-07-14 Thread Shayne S
Thanks, I'm on 0.8.2 so that explains it. Should retention.ms affect segment rolling? In my experiment it did (retention.ms = -1), which was unexpected since I thought only segment.bytes and segment.ms would control that. On Mon, Jul 13, 2015 at 7:57 PM, Daniel Tamai daniel.ta...@gmail.com
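
For reference, a minimal sketch of the topic-level overrides that are expected to control segment rolling (segment.bytes and segment.ms), independent of retention.ms. The topic name "events" and the ZooKeeper address are placeholders, and the script path assumes a stock 0.8.x distribution:

    # Hypothetical topic-level overrides; segment.bytes / segment.ms are the
    # settings expected to decide when a new log segment is rolled.
    bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic events \
      --config segment.bytes=1073741824 \
      --config segment.ms=604800000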

Re: Using Kafka as a persistent store

2015-07-13 Thread Daniel Schierbeck
Would it be possible to document how to configure Kafka to never delete messages in a topic? It took a good while to figure this out, and I see it as an important use case for Kafka. On Sun, Jul 12, 2015 at 3:02 PM Daniel Schierbeck daniel.schierb...@gmail.com wrote: On 10. jul. 2015, at
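
A minimal sketch of what such documentation might show, using the settings discussed in this thread. The topic name, partition count, and ZooKeeper address are placeholders, and as noted elsewhere in the thread, -1 for time-based retention is only honored from 0.8.3 on (KAFKA-1990):

    # Create a topic whose messages are never deleted: disable both
    # time-based and size-based retention.
    bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic events \
      --partitions 8 --replication-factor 3 \
      --config retention.ms=-1 \
      --config retention.bytes=-1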

Re: Using Kafka as a persistent store

2015-07-13 Thread Scott Thibault
We've tried to use Kafka not as a persistent store, but as a long-term archival store. An outstanding issue we've had with that is that the broker holds on to an open file handle on every file in the log! The other issue we've had is when you create a long-term archival log on shared storage,

Re: Using Kafka as a persistent store

2015-07-13 Thread Rad Gruchalski
Scott, This is what I was trying to target in one of my previous responses to Daniel, the one in which I suggest another compaction setting for Kafka. Kind regards,
 Radek Gruchalski 
ra...@gruchalski.com

Re: Using Kafka as a persistent store

2015-07-13 Thread Gwen Shapira
Hi, 1. What you described sounds like a reasonable architecture, but may I ask why JSON? Avro seems better supported in the ecosystem (Confluent's tools, Hadoop integration, schema evolution, tools, etc.). 1.5 If all you do is convert data into JSON, Spark Streaming sounds like a

Re: Using Kafka as a persistent store

2015-07-13 Thread James Cheng
For what it's worth, I did something similar to Rad's suggestion of cold-storage to add long-term archiving when using Amazon Kinesis. Kinesis is also a message bus, but only has a 24 hour retention window. I wrote a Kinesis consumer that would take all messages from Kinesis and save them into

Re: Using Kafka as a persistent store

2015-07-13 Thread Tim Smith
I have had a similar issue where I wanted a single source of truth between Search and HDFS. First, if you zoom out a little, eventually you are going to have some compute engine(s) process the data. If you store it in a compute-neutral tier like Kafka then you will need to suck the data out at

Re: Using Kafka as a persistent store

2015-07-13 Thread Rad Gruchalski
Sounds like the same idea. The nice thing about having such an option is that, with a correct application of containers and a backup and restore strategy, one can create an infinite ordered backup of the raw input stream using the native Kafka storage format. I understand the point of having the data in other

Re: Using Kafka as a persistent store

2015-07-13 Thread Rad Gruchalski
Indeed, the files would have to be moved to some separate, dedicated storage. There are basically 3 options, as Kafka does not allow adding logs at runtime: 1. make the consumer able to read from an arbitrary file; 2. add the ability to drop files in (I believe this adds a lot of complexity); 3. read

Re: Using Kafka as a persistent store

2015-07-13 Thread Daniel Schierbeck
Am I correct in assuming that Kafka will only retain a file handle for the last segment of the log? If the number of handles grows unbounded, then it would be an issue. But I plan on writing to this topic continuously anyway, so not separating data into cold and hot storage is the entire point.

Re: Using Kafka as a persistent store

2015-07-13 Thread Scott Thibault
Yes, consider my e-mail an up vote! I guess the files would automatically be moved somewhere else to separate the active from the cold segments? Ideally, one could run an unmodified consumer application on the cold segments. --Scott On Mon, Jul 13, 2015 at 6:57 AM, Rad Gruchalski

Re: Using Kafka as a persistent store

2015-07-13 Thread Shayne S
Did this work for you? I set the topic settings to retention.ms=-1 and retention.bytes=-1 and it looks like it is deleting segments immediately. On Sun, Jul 12, 2015 at 8:02 AM, Daniel Schierbeck daniel.schierb...@gmail.com wrote: On 10. jul. 2015, at 23.03, Jay Kreps j...@confluent.io

Re: Using Kafka as a persistent store

2015-07-13 Thread Daniel Tamai
Using -1 for log.retention.ms should work only from 0.8.3 on (https://issues.apache.org/jira/browse/KAFKA-1990). 2015-07-13 17:08 GMT-03:00 Shayne S shaynest...@gmail.com: Did this work for you? I set the topic settings to retention.ms=-1 and retention.bytes=-1 and it looks like it is deleting
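
A possible workaround on 0.8.2 (an assumption, not something suggested in the thread) is to set retention.ms to a very large value instead of -1, since -1 is only honored from 0.8.3 on. Topic name and ZooKeeper address are placeholders:

    # Roughly 100 years in milliseconds instead of -1 on 0.8.2.
    bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic events \
      --config retention.ms=3153600000000 \
      --config retention.bytes=-1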

Re: Using Kafka as a persistent store

2015-07-12 Thread Daniel Schierbeck
On 10. jul. 2015, at 23.03, Jay Kreps j...@confluent.io wrote: If I recall correctly, setting log.retention.ms and log.retention.bytes to -1 disables both. Thanks! On Fri, Jul 10, 2015 at 1:55 PM, Daniel Schierbeck daniel.schierb...@gmail.com wrote: On 10. jul. 2015, at 15.16,

Re: Using Kafka as a persistent store

2015-07-11 Thread Daniel Schierbeck
Radek: I don't see how data could be stored more efficiently than in Kafka itself. It's optimized for cheap storage and offers high-performance bulk export, exactly what you want from long-term archival. On fre. 10. jul. 2015 at 23.16 Rad Gruchalski ra...@gruchalski.com wrote: Hello all, This
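
As a simple illustration of the bulk-export point, a sketch using the stock 0.8.x console consumer to read a whole topic from the beginning (not a high-throughput exporter); topic name, ZooKeeper address, and output file are placeholders:

    # Read the whole topic from offset zero and dump it to a file.
    bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
      --topic events --from-beginning > events-export.txt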

Re: Using Kafka as a persistent store

2015-07-11 Thread Rad Gruchalski
Daniel, I understand your point. From what I understand, the mode that suits you is what Jay suggested: log.retention.ms and log.retention.bytes both set to -1. A few questions before I continue on something that may already be possible: 1. is it possible to attach

Using Kafka as a persistent store

2015-07-10 Thread Daniel Schierbeck
I'd like to use Kafka as a persistent store – sort of as an alternative to HDFS. The idea is that I'd load the data into various other systems in order to solve specific needs such as full-text search, analytics, indexing by various attributes, etc. I'd like to keep a single source of truth,

Re: Using Kafka as a persistent store

2015-07-10 Thread Shayne S
There are two ways you can configure your topics: log compaction, or no cleaning at all. The choice depends on your use case. Are the records uniquely identifiable and will they receive updates? Then log compaction is the way to go. If they are truly read-only, you can go without log compaction. We
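
A sketch of the two configurations described above, applied as topic-level overrides (topic name and ZooKeeper address are placeholders; see Daniel Tamai's note above about -1 retention only working from 0.8.3 on, KAFKA-1990):

    # Option 1: log compaction (keep the latest record per key).
    bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic events \
      --config cleanup.policy=compact

    # Option 2: no cleaning at all (disable time- and size-based retention).
    bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic events \
      --config retention.ms=-1 \
      --config retention.bytes=-1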

Re: Using Kafka as a persistent store

2015-07-10 Thread noah
I don't want to endorse this use of Kafka, but assuming you can give your messages unique identifiers, I believe using log compaction will keep all unique messages forever. You can read about how consumer offsets stored in Kafka are managed using a compacted topic here:
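
Compaction retains the latest record per message key, so those unique identifiers have to be carried as keys. A sketch using the console producer, assuming its parse.key / key.separator reader properties are available in the version at hand; broker address, topic, and payload are placeholders:

    # Publish a keyed record; compaction keeps the newest value for "user-42".
    echo 'user-42:{"name":"example"}' | \
      bin/kafka-console-producer.sh --broker-list localhost:9092 \
        --topic events \
        --property parse.key=true --property key.separator=: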

Re: Using Kafka as a persistent store

2015-07-10 Thread Daniel Schierbeck
On 10. jul. 2015, at 15.16, Shayne S shaynest...@gmail.com wrote: There are two ways you can configure your topics, log compaction and with no cleaning. The choice depends on your use case. Are the records uniquely identifiable and will they receive updates? Then log compaction is the way

Re: Using Kafka as a persistent store

2015-07-10 Thread Jay Kreps
If I recall correctly, setting log.retention.ms and log.retention.bytes to -1 disables both. On Fri, Jul 10, 2015 at 1:55 PM, Daniel Schierbeck daniel.schierb...@gmail.com wrote: On 10. jul. 2015, at 15.16, Shayne S shaynest...@gmail.com wrote: There are two ways you can configure your
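
The broker-wide equivalent, as a sketch: the same two settings as broker defaults in server.properties (path assumed), with the caveat noted above that -1 time-based retention is only honored from 0.8.3 on (KAFKA-1990):

    # Append broker-level defaults to the server config.
    echo 'log.retention.ms=-1'    >> config/server.properties
    echo 'log.retention.bytes=-1' >> config/server.properties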

Re: Using Kafka as a persistent store

2015-07-10 Thread Rad Gruchalski
Hello all, This is a very interesting discussion. I’ve been thinking of a similar use case for Kafka over the last few days. The usual data workflow with Kafka is most likely something like this: - ingest with Kafka - process with Storm / Samza / whathaveyou - put some processed data back on