Would it be possible to document how to configure Kafka to never delete
messages in a topic? It took a good while to figure this out, and I see it
as an important use case for Kafka.
On Sun, Jul 12, 2015 at 3:02 PM Daniel Schierbeck
daniel.schierb...@gmail.com wrote:
Hi,
I appreciate your response. It works now! It was just a typo in the class
name. :(
It really has nothing to do with whether you are using the binaries or the
source version of Kafka.
Thanks everyone!
On Mon, Jul 13, 2015 at 11:18 PM, tao xiao xiaotao...@gmail.com wrote:
org.apache.kafka.clients.tools.ProducerPerformance resides in
kafka-clients-0.8.2.1.jar.
You need to make sure the jar exists in $KAFKA_HOME/libs/. I use
kafka_2.10-0.8.2.1
too and here is the output
% bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
USAGE: java
Tim,
Kafka can be used as a key-value store if you turn on log compaction:
http://kafka.apache.org/documentation.html#compaction You need to be
careful with that since it's purely last-writer-wins and doesn't have
anything like CAS that might help you manage concurrent writers, but the
basic
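The key-value use case maps onto a topic-level setting. A minimal sketch of turning compaction on, assuming a 0.8.2 broker, ZooKeeper on localhost:2181, and an illustrative topic name, partition count, and replication factor (the config key is cleanup.policy per the compaction docs linked above):

```shell
# Sketch: create a log-compacted topic (topic name, partition and
# replication counts are illustrative)
bin/kafka-topics.sh --zookeeper localhost:2181 --create \
  --topic user-profiles --partitions 8 --replication-factor 2 \
  --config cleanup.policy=compact
```

With compaction on, the broker eventually keeps only the latest record per key, which is exactly why the last-writer-wins caveat above matters.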
We've tried to use Kafka not as a persistent store, but as a long-term
archival store. An outstanding issue we've had with that is that the
broker holds on to an open file handle on every file in the log! The other
issue we've had is when you create a long-term archival log on shared
storage,
Scott,
This is what I was trying to target in one of my previous responses to Daniel,
the one in which I suggested another compaction setting for Kafka.
Kind regards,
Radek Gruchalski
ra...@gruchalski.com
Ah, just saw this. I actually just submitted a patch this evening --
just for the partition-wide version at the moment, since it turns out
to be pretty simple to implement. Still very interested in moving
forward with this stuff, though I don't always have as much time as I
would like...
On Thu, Jul 9,
I implemented (nearly) the same basic set of tests in the system test
framework we started at Confluent and that is going to move into Kafka --
see the WIP patch for KIP-25 here: https://github.com/apache/kafka/pull/70
In particular, that test is implemented in benchmark_test.py:
Hi,
1. What you described sounds like a reasonable architecture, but may I
ask why JSON? Avro seems better supported in the ecosystem
(Confluent's tools, Hadoop integration, schema evolution, etc.).
1.5 If all you do is convert data into JSON, Spark Streaming sounds
like a
Hi all,
I was wondering if any of you have done benchmarks on Kafka performance
before, and whether the results or their details (# nodes in the cluster, #
records / size(s) of messages, etc.) could be shared.
For comparison purposes, I am trying to benchmark Kafka against some
similar services such as
For what it's worth, I did something similar to Rad's suggestion of cold
storage to add long-term archiving when using Amazon Kinesis. Kinesis is
also a message bus, but only has a 24-hour retention window.
I wrote a Kinesis consumer that would take all messages from Kinesis and save
them into
I have had a similar issue where I wanted a single source of truth between
Search and HDFS. First, if you zoom out a little, eventually you are going
to have some compute engine(s) process the data. If you store it in a
compute-neutral tier like Kafka then you will need to suck the data out at
Hi,
In the big data ecosystem, I have started to use Kafka, essentially, as:
- an unordered list/array, and
- a cluster-wide pipe
I guess you could argue that any message bus product is a simple array/pipe,
but Kafka's scale and model make things so easy :)
I am wondering if there are any
Sounds like the same idea. The nice thing about having such an option is that,
with the right combination of containers and a backup-and-restore strategy,
one can create an infinite ordered backup of the raw input stream using the
native Kafka storage format.
I understand the point of having the data in other
Indeed, the files would have to be moved to some separate, dedicated storage.
There are basically 3 options, as Kafka does not allow adding logs at runtime:
1. make the consumer able to read from an arbitrary file
2. add ability to drop files in (I believe this adds a lot of complexity)
3. read
Am I correct in assuming that Kafka will only retain a file handle for the last
segment of the log? If the number of handles grows unbounded, then it would be
an issue. But I plan on writing to this topic continuously anyway, so not
separating data into cold and hot storage is the entire point.
2) You need to implement MetricReporter and provide that implementation's
class name in the producer-side configuration metric.reporters
On Mon, Jul 13, 2015 at 9:08 PM, Swati Suman swatisuman1...@gmail.com
wrote:
Hi Team,
We are using Kafka 0.8.2.
I have two questions:
1) Is there any Java API in Kafka that gives me the list of all the consumer
groups along with the topic/partition from which they are consuming?
2) Also, is there any way that I can fetch the ZooKeeper list from the Kafka
server side?
Note: I
Yes, consider my e-mail an up vote!
I guess the files would automatically be moved somewhere else to separate the
active segments from the cold ones? Ideally, one could run an unmodified
consumer application on the cold segments.
--Scott
On Mon, Jul 13, 2015 at 6:57 AM, Rad Gruchalski
Did this work for you? I set the topic settings to retention.ms=-1 and
retention.bytes=-1 and it looks like it is deleting segments immediately.
On Sun, Jul 12, 2015 at 8:02 AM, Daniel Schierbeck
daniel.schierb...@gmail.com wrote:
On 10. jul. 2015, at 23.03, Jay Kreps j...@confluent.io
Hi guys,
I am trying to replicate the test of benchmarking kafka at
http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
.
When I run
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
test7 5000 100 -1 acks=1
Thank you. I see that in run-class.sh, they have the following lines:
    63  for file in $base_dir/clients/build/libs/kafka-clients*.jar;
    64  do
    65    CLASSPATH=$CLASSPATH:$file
    66  done
So I believe all the jars in the libs/ directory have already been included
in the classpath?
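For what it's worth, that loop only scans clients/build/libs, which is a source-build output directory; in a binary distribution the clients jar sits in libs/ and is typically picked up by a separate loop over libs/*.jar. A self-contained sketch of the same append logic, run against a throwaway directory standing in for a Kafka install (directory layout and jar name are illustrative):

```shell
# Sketch: replicate the classpath-building loop from kafka-run-class.sh
# against a temporary directory, so the glob actually matches something
base_dir=$(mktemp -d)
mkdir -p "$base_dir/libs"
touch "$base_dir/libs/kafka-clients-0.8.2.1.jar"

CLASSPATH=""
# every jar matching the glob is appended; if the directory is missing,
# nothing useful lands on the classpath and the class is not found
for file in "$base_dir"/libs/kafka-clients*.jar; do
  CLASSPATH=$CLASSPATH:$file
done
echo "$CLASSPATH"
```

If the echoed classpath does not contain a kafka-clients jar, the ClassNotFoundError for ProducerPerformance follows directly.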
Which
I am using the binaries of kafka_2.10-0.8.2.1. Could that be the problem?
Should I copy the source kafka-0.8.2.1-src.tgz to each of my machines,
build it, and run the test?
Thanks.
On Mon, Jul 13, 2015 at 4:37 PM, JIEFU GONG jg...@berkeley.edu wrote:
You may need to open up your run-class.sh in a text editor and modify the
classpath -- I believe I had a similar error before.
On Mon, Jul 13, 2015 at 1:16 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
I am trying to replace ActiveMQ with Kafka in our environment however I
have encountered a strange problem that basically prevents us from using Kafka
in production. The problem is that sometimes the offsets are not committed.
I am using Kafka 0.8.2.1, offset storage = kafka, high level consumer,
Using -1 for log.retention.ms will only work from 0.8.3 onward (
https://issues.apache.org/jira/browse/KAFKA-1990).
2015-07-13 17:08 GMT-03:00 Shayne S shaynest...@gmail.com:
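On a 0.8.2 broker, where -1 behaves badly as described, one workaround sketch is to avoid -1 entirely and set an effectively infinite retention instead; topic name and ZooKeeper address here are illustrative, and the value is Long.MAX_VALUE milliseconds (roughly 292 million years):

```shell
# Sketch: pre-0.8.3 "never delete" workaround -- a very large retention
# rather than -1 (topic name and ZooKeeper address are illustrative)
bin/kafka-topics.sh --zookeeper localhost:2181 --alter \
  --topic archive-topic \
  --config retention.ms=9223372036854775807
```

This sidesteps the immediate-deletion behavior reported above until an upgrade to a release with the KAFKA-1990 fix.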
Hi,
I'm trying to understand the new producer, and the order in which the Callbacks
will be called.
From my understanding, records are batched up per partition. So all records
destined for a specific partition will be sent in order, and that means that
their callbacks will be called in order.
James,
There are separate queues for each partition, so there are no
guarantees on the order of the sends (or callbacks) between
partitions.
(Actually, IIRC, the code intentionally randomizes the partition order
a bit, possibly to avoid starvation)
Gwen
On Mon, Jul 13, 2015 at 5:41 PM, James