Thanks, I'm on 0.8.2 so that explains it.
Should retention.ms affect segment rolling? In my experiment it did (with
retention.ms = -1), which was unexpected, since I thought only segment.bytes
and segment.ms controlled that.
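For reference, these are the topic-level settings I had assumed were the only
knobs (topic name is just a placeholder; the --alter --config route is the
pre-0.9 way of setting per-topic overrides):

    # roll a new segment every 1 GiB or every 7 days, whichever comes first
    bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic \
      --config segment.bytes=1073741824 --config segment.ms=604800000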
On Mon, Jul 13, 2015 at 7:57 PM, Daniel Tamai daniel.ta...@gmail.com wrote:
Would it be possible to document how to configure Kafka to never delete
messages in a topic? It took a good while to figure this out, and I see it
as an important use case for Kafka.
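For the record, here's a rough sketch of the configuration I mean (topic name
and counts are just placeholders; per KAFKA-1990, retention.ms=-1 is only
honored from 0.8.3 on):

    bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic archive \
      --partitions 1 --replication-factor 1 \
      --config retention.ms=-1 --config retention.bytes=-1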
On Sun, Jul 12, 2015 at 3:02 PM Daniel Schierbeck
daniel.schierb...@gmail.com wrote:
On 10. jul. 2015, at
We've tried to use Kafka not as a persistent store, but as a long-term
archival store. An outstanding issue we've had with that is that the
broker holds on to an open file handle on every file in the log! The other
issue we've had is when you create a long-term archival log on shared
storage,
Scott,
This is what I was trying to target in one of my previous responses to Daniel,
the one in which I suggested another compaction setting for Kafka.
Kind regards,
Radek Gruchalski
ra...@gruchalski.com
Hi,
1. What you described sounds like a reasonable architecture, but may I
ask why JSON? Avro seems better supported in the ecosystem
(Confluent's tools, Hadoop integration, schema evolution, etc.).
1.5 If all you do is convert data into JSON, Spark Streaming sounds
like a
For what it's worth, I did something similar to Rad's cold-storage suggestion
to add long-term archiving when using Amazon Kinesis. Kinesis is also a message
bus, but it only has a 24-hour retention window.
I wrote a Kinesis consumer that would take all messages from Kinesis and save
them into
I have had a similar issue where I wanted a single source of truth between
Search and HDFS. First, if you zoom out a little, eventually you are going
to have some compute engine(s) process the data. If you store it in a
compute-neutral tier like Kafka, then you will need to suck the data out at
Sounds like the same idea. The nice thing about having such an option is that,
with a correct application of containers and a backup-and-restore strategy, one
can create an infinite, ordered backup of the raw input stream using the native
Kafka storage format.
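A minimal sketch of what I mean, assuming the default data directory layout
(/var/kafka-logs/<topic>-<partition>) and GNU coreutils; segment file names are
zero-padded base offsets, so lexical order is offset order, and the lexically
last .log file is the active segment the broker is still writing:

    # copy every closed segment (all but the newest .log file) to cold storage
    ls /var/kafka-logs/my-topic-0/*.log | head -n -1 | xargs -I{} cp {} /mnt/cold/my-topic-0/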
I understand the point of having the data in other
Indeed, the files would have to be moved to some separate, dedicated storage.
There are basically three options, as Kafka does not allow adding logs at runtime:
1. make the consumer able to read from an arbitrary file
2. add the ability to drop files in (I believe this adds a lot of complexity)
3. read
Am I correct in assuming that Kafka will only retain a file handle for the last
segment of the log? If the number of handles grows unbounded, that would be
an issue. But I plan on writing to this topic continuously anyway, so not
separating data into cold and hot storage is the entire point.
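One rough way to check on a running broker, assuming a single Kafka process on
the box:

    # count open .log segment handles held by the broker
    lsof -p $(pgrep -f kafka.Kafka) | grep -c '\.log'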
Yes, consider my e-mail an upvote!
I guess the files would automatically be moved somewhere else to separate the
active segments from the cold ones? Ideally, one could run an unmodified
consumer application on the cold segments.
--Scott
On Mon, Jul 13, 2015 at 6:57 AM, Rad Gruchalski
Did this work for you? I set the topic settings to retention.ms=-1 and
retention.bytes=-1 and it looks like it is deleting segments immediately.
On Sun, Jul 12, 2015 at 8:02 AM, Daniel Schierbeck
daniel.schierb...@gmail.com wrote:
On 10. jul. 2015, at 23.03, Jay Kreps j...@confluent.io wrote:
Using -1 for log.retention.ms should only work as of 0.8.3
(https://issues.apache.org/jira/browse/KAFKA-1990).
2015-07-13 17:08 GMT-03:00 Shayne S shaynest...@gmail.com:
Did this work for you? I set the topic settings to retention.ms=-1 and
retention.bytes=-1 and it looks like it is deleting
On 10. jul. 2015, at 23.03, Jay Kreps j...@confluent.io wrote:
If I recall correctly, setting log.retention.ms and log.retention.bytes to
-1 disables both.
Thanks!
On Fri, Jul 10, 2015 at 1:55 PM, Daniel Schierbeck
daniel.schierb...@gmail.com wrote:
On 10. jul. 2015, at 15.16,
Radek: I don't see how data could be stored more efficiently than in Kafka
itself. It's optimized for cheap storage and offers high-performance bulk
export, exactly what you want from long-term archival.
On Fri. 10. jul. 2015 at 23.16 Rad Gruchalski ra...@gruchalski.com wrote:
Hello all,
This
Daniel,
I understand your point. From what I understand, the mode that suits you is
what Jay suggested: log.retention.ms and log.retention.bytes both set to -1.
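As a minimal sketch, in server.properties that would be (whether -1 is honored
depends on the broker version; see the KAFKA-1990 note elsewhere in this
thread):

    # never delete segments on account of age or size
    log.retention.ms=-1
    log.retention.bytes=-1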
A few questions before I continue on something that may already be possible:
1. is it possible to attach
I'd like to use Kafka as a persistent store – sort of as an alternative to
HDFS. The idea is that I'd load the data into various other systems in
order to solve specific needs such as full-text search, analytics, indexing
by various attributes, etc. I'd like to keep a single source of truth,
There are two ways you can configure your topics: log compaction, or no
cleaning at all. The choice depends on your use case. Are the records uniquely
identifiable, and will they receive updates? Then log compaction is the way
to go. If they are truly read-only, you can go without log compaction.
We
I don't want to endorse this use of Kafka, but assuming you can give your
messages unique identifiers, I believe using log compaction will keep all
unique messages forever. You can read about how consumer offsets stored in
Kafka are managed using a compacted topic here:
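As a sketch, a compacted topic can be created like this (topic name is
hypothetical; note that messages must carry non-null keys, and on brokers of
this vintage the cleaner also has to be switched on with
log.cleaner.enable=true):

    bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic events \
      --partitions 1 --replication-factor 1 \
      --config cleanup.policy=compact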
On 10. jul. 2015, at 15.16, Shayne S shaynest...@gmail.com wrote:
There are two ways you can configure your topics: log compaction, or no
cleaning at all. The choice depends on your use case. Are the records uniquely
identifiable, and will they receive updates? Then log compaction is the way
If I recall correctly, setting log.retention.ms and log.retention.bytes to
-1 disables both.
On Fri, Jul 10, 2015 at 1:55 PM, Daniel Schierbeck
daniel.schierb...@gmail.com wrote:
On 10. jul. 2015, at 15.16, Shayne S shaynest...@gmail.com wrote:
There are two ways you can configure your
Hello all,
This is a very interesting discussion. I’ve been thinking of a similar use case
for Kafka over the last few days.
The usual data workflow with Kafka is most likely something like this:
- ingest with Kafka
- process with Storm / Samza / what have you
- put some processed data back on