Re: Using Kafka as a persistent store

2015-07-13 Thread Daniel Schierbeck
Would it be possible to document how to configure Kafka to never delete messages in a topic? It took a good while to figure this out, and I see it as an important use case for Kafka. On Sun, Jul 12, 2015 at 3:02 PM Daniel Schierbeck daniel.schierb...@gmail.com wrote: On 10. jul. 2015, at

Re: performance benchmarking of kafka

2015-07-13 Thread Yuheng Du
Hi, Appreciate your response. It works now! It was just a typo in the class names :(. It really has nothing to do with whether you are using the binaries or the source version of kafka. Thanks everyone! On Mon, Jul 13, 2015 at 11:18 PM, tao xiao xiaotao...@gmail.com wrote:

Re: performance benchmarking of kafka

2015-07-13 Thread tao xiao
org.apache.kafka.clients.tools.ProducerPerformance resides in kafka-clients-0.8.2.1.jar. You need to make sure the jar exists in $KAFKA_HOME/libs/. I use kafka_2.10-0.8.2.1 too and here is the output % bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance USAGE: java
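A quick way to confirm the jar is where the launcher expects it can be sketched as below. This is only an illustration: `find_client_jars` is a hypothetical helper name, the `libs/` layout is taken from the message above, and the temporary directory merely stands in for a real `$KAFKA_HOME`.

```python
import glob
import os
import tempfile


def find_client_jars(kafka_home):
    """Return the kafka-clients jars found under <kafka_home>/libs, sorted by path."""
    pattern = os.path.join(kafka_home, "libs", "kafka-clients*.jar")
    return sorted(glob.glob(pattern))


# Demo against a throwaway directory that stands in for $KAFKA_HOME.
with tempfile.TemporaryDirectory() as home:
    os.makedirs(os.path.join(home, "libs"))
    # Create an empty placeholder file with the jar name mentioned in the thread.
    open(os.path.join(home, "libs", "kafka-clients-0.8.2.1.jar"), "w").close()
    found = [os.path.basename(p) for p in find_client_jars(home)]

print(found)
```

If the list comes back empty for a real installation, the class would not be on the classpath that the run script builds from that directory.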

Re: Data Structure abstractions over kafka

2015-07-13 Thread Ewen Cheslack-Postava
Tim, Kafka can be used as a key-value store if you turn on log compaction: http://kafka.apache.org/documentation.html#compaction You need to be careful with that since it's purely last-writer-wins and doesn't have anything like CAS that might help you manage concurrent writers, but the basic
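The last-writer-wins behaviour described above can be illustrated with a small in-memory model. This is a simplified sketch, not Kafka's implementation: `compact` is a hypothetical function, the log is a plain Python list, and a `None` value is assumed to act as a delete marker.

```python
from typing import Dict, Iterable, Optional, Tuple

# (key, value); a value of None is treated as a delete marker in this sketch.
Record = Tuple[str, Optional[str]]


def compact(log: Iterable[Record]) -> Dict[str, str]:
    """Replay a keyed log in order and keep only the latest value per key."""
    state: Dict[str, str] = {}
    for key, value in log:
        if value is None:
            state.pop(key, None)  # delete marker removes the key
        else:
            state[key] = value    # last writer wins; there is no compare-and-set
    return state


log = [
    ("user-1", "alice"),
    ("user-2", "bob"),
    ("user-1", "alicia"),  # overwrites the earlier user-1 record
    ("user-2", None),      # delete marker for user-2
]
result = compact(log)
print(result)
```

Because the later write simply replaces the earlier one, two concurrent writers updating the same key cannot detect each other, which is the caveat raised in the message.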

Re: Using Kafka as a persistent store

2015-07-13 Thread Scott Thibault
We've tried to use Kafka not as a persistent store, but as a long-term archival store. An outstanding issue we've had with that is that the broker holds on to an open file handle on every file in the log! The other issue we've had is when you create a long-term archival log on shared storage,

Re: Using Kafka as a persistent store

2015-07-13 Thread Rad Gruchalski
Scott, This is what I was trying to target in one of my previous responses to Daniel. The one in which I suggest another compaction setting for kafka. Kind regards,
Radek Gruchalski
ra...@gruchalski.com

Re: Kafka as an event store for Event Sourcing

2015-07-13 Thread Ben Kirwin
Ah, just saw this. I actually just submitted a patch this evening -- just for the partitionwide version at the moment, since it turns out to be pretty simple to implement. Still very interested in moving forward with this stuff, though not always as much time as I would like... On Thu, Jul 9,

Re: kafka benchmark tests

2015-07-13 Thread Ewen Cheslack-Postava
I implemented (nearly) the same basic set of tests in the system test framework we started at Confluent and that is going to move into Kafka -- see the wip patch for KIP-25 here: https://github.com/apache/kafka/pull/70 In particular, that test is implemented in benchmark_test.py:

Re: Using Kafka as a persistent store

2015-07-13 Thread Gwen Shapira
Hi, 1. What you described sounds like a reasonable architecture, but may I ask why JSON? Avro seems better supported in the ecosystem (Confluent's tools, Hadoop integration, schema evolution, tools, etc). 1.5 If all you do is convert data into JSON, SparkStreaming sounds like a

kafka benchmark tests

2015-07-13 Thread JIEFU GONG
Hi all, I was wondering if any of you guys have done benchmarks on Kafka performance before, and if they or their details (# nodes in cluster, # records / size(s) of messages, etc.) could be shared. For comparison purposes, I am trying to benchmark Kafka against some similar services such as

Re: Using Kafka as a persistent store

2015-07-13 Thread James Cheng
For what it's worth, I did something similar to Rad's suggestion of cold-storage to add long-term archiving when using Amazon Kinesis. Kinesis is also a message bus, but only has a 24 hour retention window. I wrote a Kinesis consumer that would take all messages from Kinesis and save them into

Re: Using Kafka as a persistent store

2015-07-13 Thread Tim Smith
I have had a similar issue where I wanted a single source of truth between Search and HDFS. First, if you zoom out a little, eventually you are going to have some compute engine(s) process the data. If you store it in a compute neutral tier like kafka then you will need to suck the data out at

Data Structure abstractions over kafka

2015-07-13 Thread Tim Smith
Hi, In the big data ecosystem, I have started to use kafka, essentially, as a: - unordered list/array, and - a cluster-wide pipe I guess you could argue that any message bus product is a simple array/pipe but kafka's scale and model make things so easy :) I am wondering if there are any

Re: Using Kafka as a persistent store

2015-07-13 Thread Rad Gruchalski
Sounds like the same idea. The nice thing about having such an option is that, with the right application of containers and a backup and restore strategy, one can create an infinite, ordered backup of the raw input stream using the native Kafka storage format. I understand the point of having the data in other

Re: Using Kafka as a persistent store

2015-07-13 Thread Rad Gruchalski
Indeed, the files would have to be moved to some separate, dedicated storage. There are basically 3 options, as kafka does not allow adding logs at runtime: 1. make the consumer able to read from an arbitrary file 2. add ability to drop files in (I believe this adds a lot of complexity) 3. read

Re: Using Kafka as a persistent store

2015-07-13 Thread Daniel Schierbeck
Am I correct in assuming that Kafka will only retain a file handle for the last segment of the log? If the number of handles grows unbounded, then it would be an issue. But I plan on writing to this topic continuously anyway, so not separating data into cold and hot storage is the entire point.

Re: Fetching details from Kafka Server

2015-07-13 Thread pushkar priyadarshi
2) You need to implement MetricsReporter and provide that implementation's class name in the producer-side configuration metric.reporters. On Mon, Jul 13, 2015 at 9:08 PM, Swati Suman swatisuman1...@gmail.com wrote: Hi Team, We are using Kafka 0.8.2 I have two questions: 1)Is there any Java

Fetching details from Kafka Server

2015-07-13 Thread Swati Suman
Hi Team, We are using Kafka 0.8.2. I have two questions: 1) Is there any Java API in Kafka that gives me the list of all the consumer groups along with the topic/partition from which they are consuming? Also, is there any way that I can fetch the ZooKeeper list from the Kafka server side? Note: I

Re: Using Kafka as a persistent store

2015-07-13 Thread Scott Thibault
Yes, consider my e-mail an up vote! I guess the files would automatically be moved somewhere else to separate the active from cold segments? Ideally, one could run an unmodified consumer application on the cold segments. --Scott On Mon, Jul 13, 2015 at 6:57 AM, Rad Gruchalski

Re: Using Kafka as a persistent store

2015-07-13 Thread Shayne S
Did this work for you? I set the topic settings to retention.ms=-1 and retention.bytes=-1 and it looks like it is deleting segments immediately. On Sun, Jul 12, 2015 at 8:02 AM, Daniel Schierbeck daniel.schierb...@gmail.com wrote: On 10. jul. 2015, at 23.03, Jay Kreps j...@confluent.io

performance benchmarking of kafka

2015-07-13 Thread Yuheng Du
Hi guys, I am trying to replicate the test of benchmarking kafka at http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines . When I run bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1

Re: performance benchmarking of kafka

2015-07-13 Thread Yuheng Du
Thank you. I see that in run-class.sh, they have the following lines: for file in $base_dir/clients/build/libs/kafka-clients*.jar; do CLASSPATH=$CLASSPATH:$file; done So I believe all the jars in the libs/ directory have already been included in the classpath? Which

Re: performance benchmarking of kafka

2015-07-13 Thread Yuheng Du
I am using the binaries of kafka_2.10-0.8.2.1. Could that be the problem? Should I copy the kafka-0.8.2.1-src.tgz source to each of my machines, build it, and run the test? Thanks. On Mon, Jul 13, 2015 at 4:37 PM, JIEFU GONG jg...@berkeley.edu wrote: You may need to open up your

Re: performance benchmarking of kafka

2015-07-13 Thread JIEFU GONG
You may need to open up your run-class.sh in a text editor and modify the classpath -- I believe I had a similar error before. On Mon, Jul 13, 2015 at 1:16 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Hi guys, I am trying to replicate the test of benchmarking kafka at

Offset not committed

2015-07-13 Thread Vadim Bobrov
I am trying to replace ActiveMQ with Kafka in our environment; however, I have encountered a strange problem that basically prevents us from using Kafka in production. The problem is that sometimes the offsets are not committed. I am using Kafka 0.8.2.1, offset storage = kafka, high level consumer,

Re: Using Kafka as a persistent store

2015-07-13 Thread Daniel Tamai
Using -1 for log.retention.ms should work only for 0.8.3 ( https://issues.apache.org/jira/browse/KAFKA-1990). 2015-07-13 17:08 GMT-03:00 Shayne S shaynest...@gmail.com: Did this work for you? I set the topic settings to retention.ms=-1 and retention.bytes=-1 and it looks like it is deleting
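One plausible way to picture why -1 could lead to immediate deletion on an older broker is an age check that has no special case for negative values. The sketch below is an assumption for illustration only; both function names are hypothetical and neither is taken from the broker's code.

```python
def naive_is_expired(age_ms: int, retention_ms: int) -> bool:
    """Age check with no special case for negative retention (assumed older behaviour)."""
    return age_ms > retention_ms


def guarded_is_expired(age_ms: int, retention_ms: int) -> bool:
    """Age check that treats a negative retention as 'keep forever'."""
    if retention_ms < 0:
        return False
    return age_ms > retention_ms


segment_age_ms = 5_000  # any non-negative age is greater than -1

print(naive_is_expired(segment_age_ms, -1))    # True: segment looks expired
print(guarded_is_expired(segment_age_ms, -1))  # False: segment is kept
```

Under the naive check every segment is older than -1 ms, which would match the "deleting segments immediately" symptom reported in the quoted message.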

New producer and ordering of Callbacks when sending to multiple partitions

2015-07-13 Thread James Cheng
Hi, I'm trying to understand the new producer, and the order in which the Callbacks will be called. From my understanding, records are batched up per partition. So all records destined for a specific partition will be sent in order, and that means that their callbacks will be called in order.

Re: New producer and ordering of Callbacks when sending to multiple partitions

2015-07-13 Thread Gwen Shapira
James, There are separate queues for each partition, so there are no guarantees on the order of the sends (or callbacks) between partitions. (Actually, IIRC, the code intentionally randomizes the partition order a bit, possibly to avoid starvation) Gwen On Mon, Jul 13, 2015 at 5:41 PM, James
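The ordering guarantee described above can be modelled with one queue per partition. This is a toy simulation, not the producer's code: `simulate_sends` is a hypothetical function, and the shuffle only stands in for whatever order the real sender drains partitions in.

```python
import random
from collections import defaultdict, deque


def simulate_sends(records, seed=0):
    """records: list of (partition, payload). Returns the simulated completion order."""
    queues = defaultdict(deque)
    for partition, payload in records:
        queues[partition].append(payload)  # per-partition FIFO queue

    rng = random.Random(seed)
    completed = []
    while any(queues.values()):
        ready = [p for p, q in queues.items() if q]
        rng.shuffle(ready)  # order across partitions is arbitrary in this model
        for p in ready:
            completed.append((p, queues[p].popleft()))
    return completed


records = [(0, "a0"), (1, "b0"), (0, "a1"), (2, "c0"), (1, "b1"), (0, "a2")]
completed = simulate_sends(records, seed=42)

# Within each partition the original order survives; across partitions it may not.
for partition in (0, 1, 2):
    sent = [v for p, v in records if p == partition]
    done = [v for p, v in completed if p == partition]
    print(partition, sent == done)
```

An application that needs callbacks in a known order across records would therefore have to route those records to the same partition, or track ordering itself.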