Number of kafka topics/partitions supported per cluster of n nodes

2015-07-27 Thread Prabhjot Bharaj
Hi, I'm looking for a benchmark which can explain how many topics and partitions in total can be created in a cluster of n nodes, given that the message size varies between x and y bytes, how this varies with heap size, and how it affects system performance. e.g. the

Best practices - Using kafka (with http server) as source-of-truth

2015-07-27 Thread Prabhjot Bharaj
Hi Folks, I would like to understand the best practices when using Kafka as the source of truth, given that I want to pump data into Kafka using HTTP methods. What are the current production configurations for such a use case: 1. Kafka-http-client - is it scalable the way Nginx is?

Java API for fetching Consumer group from Kafka Server(Not Zookeeper)

2015-07-27 Thread swati.suman2
Hi Jiangjie, kafka.admin.ConsumerGroupCommand is a Scala class. Could you please tell me about a Java API for fetching consumer groups from the Kafka server. Best Regards, Swati Suman

Re: Log Deletion Behavior

2015-07-27 Thread Mayuresh Gharat
Hi Jiefu, Any update on this? Were you able to delete those log segments? Thanks, Mayuresh On Fri, Jul 24, 2015 at 7:14 PM, Mayuresh Gharat gharatmayures...@gmail.com wrote: To add on, the main thing here is you should be using only one of these properties. Thanks, Mayuresh On Fri,

multiple producer throughput

2015-07-27 Thread Yuheng Du
Hi, I am running 40 producers on a 40-node cluster. The messages are sent to 6 brokers in another cluster. The producers are running the ProducerPerformance test. With 20 nodes running, the throughput is around 13MB/s; with 40 nodes running, the throughput is around 9MB/s. I have set

Re: Best practices - Using kafka (with http server) as source-of-truth

2015-07-27 Thread Ewen Cheslack-Postava
Hi Prabhjot, Confluent has a REST proxy with docs that may give some guidance: http://docs.confluent.io/1.0/kafka-rest/docs/intro.html The new producer that it uses is very efficient, so you should be able to get pretty good throughput. You take a bit of a hit due to the overhead of sending data
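As a rough sketch of what producing over HTTP through that REST proxy looks like (the topic name, host, and port are assumptions, and the HTTP call is commented out since it needs a running proxy):

```shell
# Hypothetical payload for the REST proxy's v1 JSON produce endpoint.
PAYLOAD='{"records":[{"value":{"foo":"bar"}}]}'

# Requires a running REST proxy, so not executed here:
# curl -X POST \
#   -H "Content-Type: application/vnd.kafka.json.v1+json" \
#   --data "$PAYLOAD" \
#   http://localhost:8082/topics/test

# Sanity-check that the payload is well-formed JSON:
echo "$PAYLOAD" | python3 -c 'import json,sys; json.load(sys.stdin); print("payload ok")'
```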

Re: Cache Memory Kafka Process

2015-07-27 Thread Ewen Cheslack-Postava
Having the OS cache the data in Kafka's log files is useful since it means that data doesn't need to be read back from disk when consumed. This is good for the latency and throughput of consumers. Usually this caching works out pretty well, keeping the latest data from your topics in cache and
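A quick way to see this on a broker host (Linux; it only reads /proc/meminfo, nothing Kafka-specific is assumed):

```shell
# Page-cache memory is reclaimable: the kernel drops it under memory
# pressure, so a large "Cached" figure on a Kafka broker is expected
# behavior, not a leak.
grep -E '^(MemFree|Buffers|Cached):' /proc/meminfo
```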

Re: New consumer - offset one gets in poll is not offset one is supposed to commit

2015-07-27 Thread Jason Gustafson
Hey Stevo, I agree that it's a little unintuitive that what you are committing is the next offset that should be read from and not the one that has already been read. We're probably constrained in that we already have a consumer which implements this behavior. Would it help if we added a method

Re: deleting data automatically

2015-07-27 Thread Yuheng Du
Thank you! On Mon, Jul 27, 2015 at 1:43 PM, Ewen Cheslack-Postava e...@confluent.io wrote: As I mentioned, adjusting any settings such that files are small enough that you don't get the benefits of append-only writes or file creation/deletion become a bottleneck might affect performance. It

Re: deleting data automatically

2015-07-27 Thread Yuheng Du
If I want to get higher throughput, should I increase log.segment.bytes? I don't see log.retention.check.interval.ms, but there is log.cleanup.interval.mins; is that what you mean? If I set log.roll.ms or log.cleanup.interval.mins too small, will it hurt throughput? Thanks. On Fri, Jul

Re: deleting data automatically

2015-07-27 Thread Ewen Cheslack-Postava
As I mentioned, adjusting any settings such that files are small enough that you don't get the benefits of append-only writes or file creation/deletion become a bottleneck might affect performance. It looks like the default setting for log.segment.bytes is 1GB, so given fast enough cleanup of old

Re: Log Deletion Behavior

2015-07-27 Thread JIEFU GONG
Mayuresh, Yes, it seems like I misunderstood the behavior of log deletion but indeed my log segments were deleted after a specified amount of time. I have a small follow-up question, it seems that when the logs are deleted the topic persists and can be republished too -- is there a configuration

Re: deleting data automatically

2015-07-27 Thread Yuheng Du
Thank you! what performance impacts will it be if I change log.segment.bytes? Thanks. On Mon, Jul 27, 2015 at 1:25 PM, Ewen Cheslack-Postava e...@confluent.io wrote: I think log.cleanup.interval.mins was removed in the first 0.8 release. It sounds like you're looking at outdated docs. Search

Re: deleting data automatically

2015-07-27 Thread Ewen Cheslack-Postava
I think log.cleanup.interval.mins was removed in the first 0.8 release. It sounds like you're looking at outdated docs. Search for log.retention.check.interval.ms here: http://kafka.apache.org/documentation.html As for setting the values too low hurting performance, I'd guess it's probably only
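For reference, the settings under discussion would appear in server.properties roughly as follows (values shown are the defaults to the best of my recollection; treat them as a sketch, not a recommendation):

```properties
# How large a segment grows before a new one is rolled (default 1 GiB)
log.segment.bytes=1073741824
# How often the broker checks for segments eligible for deletion
log.retention.check.interval.ms=300000
# Time-based retention; segments older than this become eligible for deletion
log.retention.hours=168
```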

Re: Controlled Shutdown Tool?

2015-07-27 Thread Binh Nguyen Van
You can initiate a controlled shutdown by running bin/kafka-server-stop.sh. This sends a SIGTERM to the broker to tell it to do the controlled shutdown. I also got confused before and had to look at the code to figure that out. I think it would be better if we added this to the documentation. -Binh On Mon, Jul
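The mechanism can be sketched with a stand-in process (the trap below plays the role of the broker JVM's shutdown hook; this is an illustration, not broker code):

```shell
# kafka-server-stop.sh boils down to sending SIGTERM to the broker JVM;
# the JVM's shutdown hook then performs the controlled shutdown.
# Stand-in: a process that cleans up when it receives SIGTERM.
( trap 'kill "$!" 2>/dev/null; echo "controlled shutdown complete"; exit 0' TERM; sleep 5 & wait ) &
PID=$!
sleep 1
kill -s TERM "$PID"   # what kafka-server-stop.sh does to the broker
wait "$PID"
```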

Re: Controlled Shutdown Tool?

2015-07-27 Thread Andrew Otto
Ah, thank you, SIGTERM is what I was looking for. The docs are unclear on that; it would be useful to fix them. Thanks! On Jul 27, 2015, at 14:59, Binh Nguyen Van binhn...@gmail.com wrote: You can initiate controlled shutdown by run bin/kafka-server-stop.sh. This will send a SIGTERM to

Re: Controlled Shutdown Tool?

2015-07-27 Thread Andrew Otto
Thanks! But how do I initiate a controlled shutdown on a running broker? Editing server.properties is not going to cause this to happen. Don’t I have to tell the broker to shut down nicely? All I really want to do is tell the controller to move leadership to other replicas, so I can shutdown

Re: Controlled Shutdown Tool?

2015-07-27 Thread Sriharsha Chintalapani
Controlled shutdown is built into the broker. When this config is set to true, the broker makes a request to the controller to initiate the controlled shutdown, waits until the request succeeds, and in case of failure retries the shutdown controlled.shutdown.max.retries times.

Re: Controlled Shutdown Tool?

2015-07-27 Thread Sriharsha Chintalapani
You can set controlled.shutdown.enable to true in Kafka’s server.properties; this is enabled by default from 0.8.2 onwards, and you can also set the max retries using controlled.shutdown.max.retries, which defaults to 3. Thanks, Harsha On July 27, 2015 at 11:42:32 AM, Andrew Otto (ao...@wikimedia.org)
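As a sketch, the relevant server.properties entries (defaults shown, as best I recall from the 0.8.2 docs):

```properties
# Ask the controller to migrate partition leadership away before exiting
controlled.shutdown.enable=true
# Retries if the controlled-shutdown request to the controller fails
controlled.shutdown.max.retries=3
# Back-off between retries
controlled.shutdown.retry.backoff.ms=5000
```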

Re: Log Deletion Behavior

2015-07-27 Thread Mayuresh Gharat
Hi Jiefu, The topic will stay forever. You can run the delete-topic operation to get rid of the topic. Thanks, Mayuresh On Mon, Jul 27, 2015 at 11:19 AM, JIEFU GONG jg...@berkeley.edu wrote: Mayuresh, Yes, it seems like I misunderstood the behavior of log deletion but indeed my log segments

Controlled Shutdown Tool?

2015-07-27 Thread Andrew Otto
I’m working on packaging 0.8.2.1 for Wikimedia, and in doing so I’ve noticed that kafka.admin.ShutdownBroker doesn’t exist anymore. From what I can tell, this has been intentionally removed in favor of a JMX(?) config “controlled.shutdown.enable”. It is unclear from the documentation how one

Re: multiple producer throughput

2015-07-27 Thread Yuheng Du
The message size is 100 bytes and each producer sends out 50 million messages. These are the numbers used in the Kafka benchmarking post. http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines Thanks. On Mon, Jul 27, 2015 at 4:15 PM, Prabhjot Bharaj
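For reference, that post drives the producer roughly as sketched below (treat the exact class name and flags as assumptions for your Kafka version). It is also worth checking the aggregate: if 13 MB/s and 9 MB/s are per-producer figures, cluster-wide throughput still rises as producers are added.

```shell
# Per the LinkedIn benchmark post (50M records, 100-byte messages);
# needs a running cluster, so shown commented out:
# bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance \
#   test 50000000 100 -1 acks=1 bootstrap.servers=broker:9092 \
#   buffer.memory=67108864 batch.size=8196

# Assuming the reported rates are per producer, the aggregates are:
echo "20 producers: $((20 * 13)) MB/s aggregate"
echo "40 producers: $((40 * 9)) MB/s aggregate"
```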

Re: multiple producer throughput

2015-07-27 Thread Prabhjot Bharaj
Hi, Have you tried with acks=1 and -1 as well? Please share the numbers and the message size. Regards, Prabcs On Jul 27, 2015 10:24 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Hi, I am running 40 producers on 40 nodes cluster. The messages are sent to 6 brokers in another cluster. The

Re: Cache Memory Kafka Process

2015-07-27 Thread Daniel Compton
http://www.linuxatemyram.com may be a helpful resource to explain this better. On Tue, 28 Jul 2015 at 5:32 AM Ewen Cheslack-Postava e...@confluent.io wrote: Having the OS cache the data in Kafka's log files is useful since it means that data doesn't need to be read back from disk when consumed.

Re: Choosing brokers when creating topics

2015-07-27 Thread Ewen Cheslack-Postava
Try the --replica-assignment option for kafka-topics.sh. It allows you to specify which brokers to assign as replicas instead of relying on the assignments being made automatically. -Ewen On Mon, Jul 27, 2015 at 12:25 AM, Jilin Xie jilinxie1...@gmail.com wrote: Hi Is it possible to
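A sketch of the assignment-string format (topic name and broker ids are hypothetical; the creation command itself is commented out since it needs a live cluster):

```shell
# Commas separate partitions; colons separate replicas within a partition
# (the first broker listed is the preferred leader).
ASSIGNMENT="1:2,3:4"   # partition 0 -> brokers 1,2 ; partition 1 -> brokers 3,4

# Needs a running cluster, so not executed here:
# bin/kafka-topics.sh --create --zookeeper localhost:2181 \
#   --topic test --replica-assignment "$ASSIGNMENT"

echo "partitions: $(echo "$ASSIGNMENT" | tr ',' '\n' | grep -c .)"
```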

Choosing brokers when creating topics

2015-07-27 Thread Jilin Xie
Hi Is it possible to choose which brokers to use when creating a topic? The general command of creating topic is: *bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test* What I'm looking for is: *bin/kafka-topics.sh --create .

Cache Memory Kafka Process

2015-07-27 Thread Nilesh Chhapru
Hi All, I am facing issues with the Kafka broker process taking a lot of cache memory. I just wanted to know if the process really needs that much cache memory, or can I clear the OS-level cache by setting a cron. Regards, Nilesh Chhapru.

Re: Choosing brokers when creating topics

2015-07-27 Thread Jilin Xie
Hi Ewen, Thanks for your reply. I've been using the kafka-reassign-partitions tool, but --replica-assignment is exactly what I'm looking for. Thanks On Mon, Jul 27, 2015 at 3:58 PM, Ewen Cheslack-Postava e...@confluent.io wrote: Try the --replica-assignment option