Number of kafka topics/partitions supported per cluster of n nodes
Hi, I'm looking for a benchmark that explains how many topics and partitions in total can be created in a cluster of n nodes, given that the message size varies between x and y bytes, how this varies with heap size, and how it affects system performance. For example, the result should look like: t topics with p partitions each can be supported in a cluster of n nodes with a heap size of h MB, before the cluster sees problems such as JVM crashes, high memory usage, or system slowdown. I think such benchmarks must exist so that we can make better decisions on the ops side. If these details don't exist, I'll run this test myself, varying the parameters described above, and I'd be happy to share the numbers with the community. Thanks, prabcs
Best practices - Using kafka (with http server) as source-of-truth
Hi Folks, I would like to understand the best practices when using Kafka as the source of truth, given that I want to pump data into Kafka using HTTP methods. What are the current production configurations for such a use case: 1. Kafka-http-client - is it scalable the way Nginx is? 2. Using Kafka and Nginx together - if anybody has used this, please explain. 3. Any other scalable method? Regards, prabcs
Java API for fetching Consumer group from Kafka Server(Not Zookeeper)
Hi Jiangjie, kafka.admin.ConsumerGroupCommand is a Scala class. Could you please point me to a Java API for fetching consumer groups from the Kafka server? Best Regards, Swati Suman
Re: Log Deletion Behavior
Hi Jiefu, Any update on this? Were you able to delete those log segments? Thanks, Mayuresh On Fri, Jul 24, 2015 at 7:14 PM, Mayuresh Gharat gharatmayures...@gmail.com wrote: To add on, the main thing here is you should be using only one of these properties. Thanks, Mayuresh On Fri, Jul 24, 2015 at 6:47 PM, Mayuresh Gharat gharatmayures...@gmail.com wrote: Yes. It should. Do not set other retention settings. Just use the hours setting. Let me know about this :) Thanks, Mayuresh On Fri, Jul 24, 2015 at 6:43 PM, JIEFU GONG jg...@berkeley.edu wrote: Mayuresh, thanks for your comment. I won't be able to change these settings until next Monday, but just to confirm, you are saying that if I restart the brokers my logs should delete themselves with respect to the newest settings, correct? On Fri, Jul 24, 2015 at 6:29 PM, Mayuresh Gharat gharatmayures...@gmail.com wrote: No. This should not happen. At LinkedIn we just use the log retention hours. Try using that. Change it and bounce the broker. It should work. Also, looking back at the configs, I am not sure why we have 3 different configs for the same property: log.retention.ms log.retention.minutes log.retention.hours We should probably have just the milliseconds. Thanks, Mayuresh On Fri, Jul 24, 2015 at 4:12 PM, JIEFU GONG jg...@berkeley.edu wrote: Hi all, I have a few broad questions on how log deletion works, specifically in conjunction with the log.retention.time setting. Say I published some messages to some topics when the configuration was originally set to something like log.retention.hours=168 (default). If I publish these messages successfully, then later set the configuration to something like log.retention.minutes=1, are those logs supposed to persist per the newest settings or the old settings? Right now my logs are refusing to delete themselves unless I specifically mark them for deletion -- is this the correct/anticipated/wanted behavior? Thanks for the help!
-- Jiefu Gong University of California, Berkeley | Class of 2017 B.A Computer Science | College of Letters and Sciences jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427 -- -Regards, Mayuresh R. Gharat (862) 250-7125
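To make Mayuresh's advice concrete, a minimal server.properties sketch (values illustrative): set exactly one of the three retention properties. Per the broker config docs, if several are set, the most specific one wins -- log.retention.ms over log.retention.minutes over log.retention.hours.

```properties
# Keep log segments for 7 days, then delete them.
# Set only ONE of these three; if several are set, the milliseconds
# setting takes precedence over minutes, which takes precedence over hours.
log.retention.hours=168
# log.retention.minutes=10080
# log.retention.ms=604800000
```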
multiple producer throughput
Hi, I am running 40 producers on a 40-node cluster. The messages are sent to 6 brokers in another cluster. The producers are running the ProducerPerformance test. When 20 nodes are running, the throughput is around 13MB/s; when running 40 nodes, the throughput is around 9MB/s. I have set log.retention.ms=9000 to delete the unwanted messages, just to avoid filling the disk. So I want to know how I should tune the system to get better throughput. Thanks.
Re: Best practices - Using kafka (with http server) as source-of-truth
Hi Prabhjot, Confluent has a REST proxy with docs that may give some guidance: http://docs.confluent.io/1.0/kafka-rest/docs/intro.html The new producer that it uses is very efficient, so you should be able to get pretty good throughput. You take a bit of a hit due to the overhead of sending data through a proxy, but with appropriate batching you can get about 2/3 of the performance you would get using the Java producer directly. There are also a few other proxies you can find here: https://cwiki.apache.org/confluence/display/KAFKA/Clients#Clients-HTTPREST You can also put nginx (or HAProxy, or a variety of other solutions) in front of REST proxies for load balancing, HA, SSL termination, etc. This is yet another hop, so it might affect throughput and latency. -Ewen On Mon, Jul 27, 2015 at 6:55 AM, Prabhjot Bharaj prabhbha...@gmail.com wrote: Hi Folks, I would like to understand the best practices when using Kafka as the source of truth, given that I want to pump data into Kafka using HTTP methods. What are the current production configurations for such a use case: 1. Kafka-http-client - is it scalable the way Nginx is? 2. Using Kafka and Nginx together - if anybody has used this, please explain. 3. Any other scalable method? Regards, prabcs -- Thanks, Ewen
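For readers who want to try the REST proxy Ewen mentions, a rough sketch of a produce request is below. The proxy address and topic name are placeholders; the payload shape (base64-encoded values under a "records" array, with the vnd.kafka.binary.v1 content type and the /topics/{topic} endpoint) is taken from the REST proxy docs linked above.

```shell
# Build a REST proxy produce request for one message; record values are
# base64-encoded inside a "records" array.
PROXY="http://localhost:8082"   # assumed proxy address (8082 is its default port)
TOPIC="test"                    # placeholder topic name
VALUE=$(printf 'hello kafka' | base64)
PAYLOAD="{\"records\":[{\"value\":\"$VALUE\"}]}"
echo "$PAYLOAD"

# The request itself (echoed here, since it needs a running proxy):
echo "curl -X POST -H 'Content-Type: application/vnd.kafka.binary.v1+json' --data '$PAYLOAD' $PROXY/topics/$TOPIC"
```

Drop the outer quotes and echo on the last line to actually send the request against a live proxy.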
Re: Cache Memory Kafka Process
Having the OS cache the data in Kafka's log files is useful since it means that data doesn't need to be read back from disk when consumed. This is good for the latency and throughput of consumers. Usually this caching works out pretty well, keeping the latest data from your topics in cache and only pulling older data into memory if a consumer reads data from earlier in the log. In other words, by leveraging OS-level caching of files, Kafka gets an in-memory caching layer for free. Generally you shouldn't need to clear this data -- the OS should only be using memory that isn't being used anyway. Is there a particular problem you're encountering that clearing the cache would help with? -Ewen On Mon, Jul 27, 2015 at 2:33 AM, Nilesh Chhapru nilesh.chha...@ugamsolutions.com wrote: Hi All, I am facing issues with the kafka broker process taking a lot of cache memory. I just wanted to know if the process really needs that much cache memory, or whether I can clear the OS-level cache by setting a cron. Regards, Nilesh Chhapru. -- Thanks, Ewen
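For anyone who wants to watch this while a broker is running, a quick (Linux-specific) way to see how much memory the kernel is using for the page cache:

```shell
# Page cache size in kB, as reported by the kernel (Linux only).
awk '/^Cached:/ {print $2}' /proc/meminfo

# 'free' shows the same figure; a large "cached" number is normal and not
# a leak -- the kernel reclaims this memory when applications need it.
free -m
```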
Re: New consumer - offset one gets in poll is not offset one is supposed to commit
Hey Stevo, I agree that it's a little unintuitive that what you commit is the next offset that should be read from, and not the one that has already been read. We're probably constrained in that we already have a consumer which implements this behavior. Would it help if we added a method on ConsumerRecords to get the next offset (e.g. nextOffset(partition))? Thanks, Jason On Fri, Jul 24, 2015 at 10:11 AM, Stevo Slavić ssla...@gmail.com wrote: Hello Apache Kafka community, Say there is only one topic with a single partition and a single message on it. Calling poll with the new consumer will return a ConsumerRecord for that message, and it will have an offset of 0. After processing the message, the current KafkaConsumer implementation expects one to commit not offset 0 as processed, but offset 1 - the next offset/position one would like to consume. Does this sound strange to you as well? Wondering, couldn't this offset+1 handling for the next position to read have been done in one place, in the KafkaConsumer implementation or the broker or wherever, instead of every user of KafkaConsumer having to do it? Kind regards, Stevo Slavic.
Re: deleting data automatically
Thank you! On Mon, Jul 27, 2015 at 1:43 PM, Ewen Cheslack-Postava e...@confluent.io wrote: As I mentioned, adjusting any settings such that files are small enough that you don't get the benefits of append-only writes, or such that file creation/deletion becomes a bottleneck, might affect performance. It looks like the default setting for log.segment.bytes is 1GB, so given fast enough cleanup of old logs, you may not need to adjust that setting -- assuming you have a reasonable amount of storage, you'll easily fit many dozen log files of that size. -Ewen On Mon, Jul 27, 2015 at 10:36 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Thank you! What performance impacts will there be if I change log.segment.bytes? Thanks. On Mon, Jul 27, 2015 at 1:25 PM, Ewen Cheslack-Postava e...@confluent.io wrote: I think log.cleanup.interval.mins was removed in the first 0.8 release. It sounds like you're looking at outdated docs. Search for log.retention.check.interval.ms here: http://kafka.apache.org/documentation.html As for setting the values too low hurting performance, I'd guess it's probably only an issue if you set them extremely small, such that file creation and cleanup become a bottleneck. -Ewen On Mon, Jul 27, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: If I want to get higher throughput, should I increase log.segment.bytes? I don't see log.retention.check.interval.ms, but there is log.cleanup.interval.mins; is that what you mean? If I set log.roll.ms or log.cleanup.interval.mins too small, will it hurt the throughput? Thanks. On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava e...@confluent.io wrote: You'll want to set the log retention policy via log.retention.{ms,minutes,hours} or log.retention.bytes. If you want really aggressive collection (e.g., on the order of seconds, as you specified), you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and log.retention.check.interval.ms.
On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Hi, I am testing the kafka producer performance. So I created a queue and wrote a large amount of data to that queue. Is there a way to delete the data automatically after some time, say whenever the data size reaches 50GB or the retention time exceeds 10 seconds, so that my disk won't fill up and block new writes? Thanks! -- Thanks, Ewen
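Pulling Ewen's suggestions together, an illustrative server.properties for very aggressive cleanup might look like the sketch below. The values are made up for this example, not recommendations; note log.retention.bytes applies per partition.

```properties
# Illustrative values only -- tiny segments and frequent checks can make
# file creation/deletion a bottleneck, as noted above.
# Delete data older than ~10 seconds:
log.retention.ms=10000
# ...or once a partition's log reaches 50GB:
log.retention.bytes=53687091200
# Roll to a new segment every 100MB (only rolled segments can be deleted):
log.segment.bytes=104857600
# Check for deletable segments every 5 seconds:
log.retention.check.interval.ms=5000
```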
Re: deleting data automatically
If I want to get higher throughput, should I increase log.segment.bytes? I don't see log.retention.check.interval.ms, but there is log.cleanup.interval.mins; is that what you mean? If I set log.roll.ms or log.cleanup.interval.mins too small, will it hurt the throughput? Thanks. On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava e...@confluent.io wrote: You'll want to set the log retention policy via log.retention.{ms,minutes,hours} or log.retention.bytes. If you want really aggressive collection (e.g., on the order of seconds, as you specified), you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and log.retention.check.interval.ms. On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Hi, I am testing the kafka producer performance. So I created a queue and wrote a large amount of data to that queue. Is there a way to delete the data automatically after some time, say whenever the data size reaches 50GB or the retention time exceeds 10 seconds, so that my disk won't fill up and block new writes? Thanks! -- Thanks, Ewen
Re: deleting data automatically
As I mentioned, adjusting any settings such that files are small enough that you don't get the benefits of append-only writes, or such that file creation/deletion becomes a bottleneck, might affect performance. It looks like the default setting for log.segment.bytes is 1GB, so given fast enough cleanup of old logs, you may not need to adjust that setting -- assuming you have a reasonable amount of storage, you'll easily fit many dozen log files of that size. -Ewen On Mon, Jul 27, 2015 at 10:36 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Thank you! What performance impacts will there be if I change log.segment.bytes? Thanks. On Mon, Jul 27, 2015 at 1:25 PM, Ewen Cheslack-Postava e...@confluent.io wrote: I think log.cleanup.interval.mins was removed in the first 0.8 release. It sounds like you're looking at outdated docs. Search for log.retention.check.interval.ms here: http://kafka.apache.org/documentation.html As for setting the values too low hurting performance, I'd guess it's probably only an issue if you set them extremely small, such that file creation and cleanup become a bottleneck. -Ewen On Mon, Jul 27, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: If I want to get higher throughput, should I increase log.segment.bytes? I don't see log.retention.check.interval.ms, but there is log.cleanup.interval.mins; is that what you mean? If I set log.roll.ms or log.cleanup.interval.mins too small, will it hurt the throughput? Thanks. On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava e...@confluent.io wrote: You'll want to set the log retention policy via log.retention.{ms,minutes,hours} or log.retention.bytes. If you want really aggressive collection (e.g., on the order of seconds, as you specified), you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and log.retention.check.interval.ms. On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Hi, I am testing the kafka producer performance.
So I created a queue and wrote a large amount of data to that queue. Is there a way to delete the data automatically after some time, say whenever the data size reaches 50GB or the retention time exceeds 10 seconds, so that my disk won't fill up and block new writes? Thanks! -- Thanks, Ewen
Re: Log Deletion Behavior
Mayuresh, Yes, it seems like I misunderstood the behavior of log deletion, but indeed my log segments were deleted after the specified amount of time. I have a small follow-up question: it seems that when the logs are deleted, the topic persists and can be republished to -- is there a configuration for how long a topic persists, or does it stay forever until it is manually marked for deletion? Also @Grant, thank you very much for your help as well. I ended up using the ms update configuration and understand the broker configs better. Thanks! On Mon, Jul 27, 2015 at 9:27 AM, Mayuresh Gharat gharatmayures...@gmail.com wrote: Hi Jiefu, Any update on this? Were you able to delete those log segments? Thanks, Mayuresh On Fri, Jul 24, 2015 at 7:14 PM, Mayuresh Gharat gharatmayures...@gmail.com wrote: To add on, the main thing here is you should be using only one of these properties. Thanks, Mayuresh On Fri, Jul 24, 2015 at 6:47 PM, Mayuresh Gharat gharatmayures...@gmail.com wrote: Yes. It should. Do not set other retention settings. Just use the hours setting. Let me know about this :) Thanks, Mayuresh On Fri, Jul 24, 2015 at 6:43 PM, JIEFU GONG jg...@berkeley.edu wrote: Mayuresh, thanks for your comment. I won't be able to change these settings until next Monday, but just to confirm, you are saying that if I restart the brokers my logs should delete themselves with respect to the newest settings, correct? On Fri, Jul 24, 2015 at 6:29 PM, Mayuresh Gharat gharatmayures...@gmail.com wrote: No. This should not happen. At LinkedIn we just use the log retention hours. Try using that. Change it and bounce the broker. It should work. Also, looking back at the configs, I am not sure why we have 3 different configs for the same property: log.retention.ms log.retention.minutes log.retention.hours We should probably have just the milliseconds.
Thanks, Mayuresh On Fri, Jul 24, 2015 at 4:12 PM, JIEFU GONG jg...@berkeley.edu wrote: Hi all, I have a few broad questions on how log deletion works, specifically in conjunction with the log.retention.time setting. Say I published some messages to some topics when the configuration was originally set to something like log.retention.hours=168 (default). If I publish these messages successfully, then later set the configuration to something like log.retention.minutes=1, are those logs supposed to persist per the newest settings or the old settings? Right now my logs are refusing to delete themselves unless I specifically mark them for deletion -- is this the correct/anticipated/wanted behavior? Thanks for the help! -- Jiefu Gong University of California, Berkeley | Class of 2017 B.A Computer Science | College of Letters and Sciences jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427 -- -Regards, Mayuresh R. Gharat (862) 250-7125
Re: deleting data automatically
Thank you! What performance impacts will there be if I change log.segment.bytes? Thanks. On Mon, Jul 27, 2015 at 1:25 PM, Ewen Cheslack-Postava e...@confluent.io wrote: I think log.cleanup.interval.mins was removed in the first 0.8 release. It sounds like you're looking at outdated docs. Search for log.retention.check.interval.ms here: http://kafka.apache.org/documentation.html As for setting the values too low hurting performance, I'd guess it's probably only an issue if you set them extremely small, such that file creation and cleanup become a bottleneck. -Ewen On Mon, Jul 27, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: If I want to get higher throughput, should I increase log.segment.bytes? I don't see log.retention.check.interval.ms, but there is log.cleanup.interval.mins; is that what you mean? If I set log.roll.ms or log.cleanup.interval.mins too small, will it hurt the throughput? Thanks. On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava e...@confluent.io wrote: You'll want to set the log retention policy via log.retention.{ms,minutes,hours} or log.retention.bytes. If you want really aggressive collection (e.g., on the order of seconds, as you specified), you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and log.retention.check.interval.ms. On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Hi, I am testing the kafka producer performance. So I created a queue and wrote a large amount of data to that queue. Is there a way to delete the data automatically after some time, say whenever the data size reaches 50GB or the retention time exceeds 10 seconds, so that my disk won't fill up and block new writes? Thanks! -- Thanks, Ewen
Re: deleting data automatically
I think log.cleanup.interval.mins was removed in the first 0.8 release. It sounds like you're looking at outdated docs. Search for log.retention.check.interval.ms here: http://kafka.apache.org/documentation.html As for setting the values too low hurting performance, I'd guess it's probably only an issue if you set them extremely small, such that file creation and cleanup become a bottleneck. -Ewen On Mon, Jul 27, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: If I want to get higher throughput, should I increase log.segment.bytes? I don't see log.retention.check.interval.ms, but there is log.cleanup.interval.mins; is that what you mean? If I set log.roll.ms or log.cleanup.interval.mins too small, will it hurt the throughput? Thanks. On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava e...@confluent.io wrote: You'll want to set the log retention policy via log.retention.{ms,minutes,hours} or log.retention.bytes. If you want really aggressive collection (e.g., on the order of seconds, as you specified), you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and log.retention.check.interval.ms. On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Hi, I am testing the kafka producer performance. So I created a queue and wrote a large amount of data to that queue. Is there a way to delete the data automatically after some time, say whenever the data size reaches 50GB or the retention time exceeds 10 seconds, so that my disk won't fill up and block new writes? Thanks! -- Thanks, Ewen
Re: Controlled Shutdown Tool?
You can initiate a controlled shutdown by running bin/kafka-server-stop.sh. This will send a SIGTERM to the broker to tell it to do the controlled shutdown. I also got confused before and had to look at the code to figure that out. I think it would be better if we added this to the documentation. -Binh On Mon, Jul 27, 2015 at 11:50 AM, Andrew Otto ao...@wikimedia.org wrote: Thanks! But how do I initiate a controlled shutdown on a running broker? Editing server.properties is not going to cause this to happen. Don’t I have to tell the broker to shut down nicely? All I really want to do is tell the controller to move leadership to other replicas, so I can shut down the broker without clients getting all confused. On Jul 27, 2015, at 14:48, Sriharsha Chintalapani ka...@harsha.io wrote: You can set controlled.shutdown.enable to true in kafka’s server.properties; this is enabled by default in 0.8.2 onwards, and you can also set max retries using controlled.shutdown.max.retries, which defaults to 3. Thanks, Harsha On July 27, 2015 at 11:42:32 AM, Andrew Otto (ao...@wikimedia.org) wrote: I’m working on packaging 0.8.2.1 for Wikimedia, and in doing so I’ve noticed that kafka.admin.ShutdownBroker doesn’t exist anymore. From what I can tell, this has been intentionally removed in favor of a JMX(?) config “controlled.shutdown.enable”. It is unclear from the documentation how one is supposed to set this for a running broker. Do I need a special JMX tool in order to flick this switch? I’d like to add a command to my kafka bin wrapper script so that I can easily use this when restarting brokers. What is the proper way to set controlled.shutdown.enable? Thanks! -Andrew Otto
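A sketch of what bin/kafka-server-stop.sh does internally, for wrapper scripts like Andrew's. The grep pattern (the broker's main class, kafka.Kafka) matches how the stock script finds the broker JVM, but may differ across versions; the kill is wrapped in echo here for safety.

```shell
# Find the broker JVM (main class kafka.Kafka) and send it SIGTERM, which
# triggers controlled shutdown when controlled.shutdown.enable=true.
PIDS=$(ps ax | grep -i 'kafka\.Kafka' | grep java | grep -v grep | awk '{print $1}')
if [ -z "$PIDS" ]; then
  echo "No kafka server to stop"
else
  echo kill -s TERM $PIDS   # drop the echo to actually stop the broker
fi
```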
Re: Controlled Shutdown Tool?
Ah, thank you, SIGTERM is what I was looking for. The docs are unclear on that; it would be useful to fix them. Thanks! On Jul 27, 2015, at 14:59, Binh Nguyen Van binhn...@gmail.com wrote: You can initiate a controlled shutdown by running bin/kafka-server-stop.sh. This will send a SIGTERM to the broker to tell it to do the controlled shutdown. I also got confused before and had to look at the code to figure that out. I think it would be better if we added this to the documentation. -Binh On Mon, Jul 27, 2015 at 11:50 AM, Andrew Otto ao...@wikimedia.org wrote: Thanks! But how do I initiate a controlled shutdown on a running broker? Editing server.properties is not going to cause this to happen. Don’t I have to tell the broker to shut down nicely? All I really want to do is tell the controller to move leadership to other replicas, so I can shut down the broker without clients getting all confused. On Jul 27, 2015, at 14:48, Sriharsha Chintalapani ka...@harsha.io wrote: You can set controlled.shutdown.enable to true in kafka’s server.properties; this is enabled by default in 0.8.2 onwards, and you can also set max retries using controlled.shutdown.max.retries, which defaults to 3. Thanks, Harsha On July 27, 2015 at 11:42:32 AM, Andrew Otto (ao...@wikimedia.org) wrote: I’m working on packaging 0.8.2.1 for Wikimedia, and in doing so I’ve noticed that kafka.admin.ShutdownBroker doesn’t exist anymore. From what I can tell, this has been intentionally removed in favor of a JMX(?) config “controlled.shutdown.enable”. It is unclear from the documentation how one is supposed to set this for a running broker. Do I need a special JMX tool in order to flick this switch? I’d like to add a command to my kafka bin wrapper script so that I can easily use this when restarting brokers. What is the proper way to set controlled.shutdown.enable? Thanks! -Andrew Otto
Re: Controlled Shutdown Tool?
Thanks! But how do I initiate a controlled shutdown on a running broker? Editing server.properties is not going to cause this to happen. Don’t I have to tell the broker to shut down nicely? All I really want to do is tell the controller to move leadership to other replicas, so I can shut down the broker without clients getting all confused. On Jul 27, 2015, at 14:48, Sriharsha Chintalapani ka...@harsha.io wrote: You can set controlled.shutdown.enable to true in kafka’s server.properties; this is enabled by default in 0.8.2 onwards, and you can also set max retries using controlled.shutdown.max.retries, which defaults to 3. Thanks, Harsha On July 27, 2015 at 11:42:32 AM, Andrew Otto (ao...@wikimedia.org) wrote: I’m working on packaging 0.8.2.1 for Wikimedia, and in doing so I’ve noticed that kafka.admin.ShutdownBroker doesn’t exist anymore. From what I can tell, this has been intentionally removed in favor of a JMX(?) config “controlled.shutdown.enable”. It is unclear from the documentation how one is supposed to set this for a running broker. Do I need a special JMX tool in order to flick this switch? I’d like to add a command to my kafka bin wrapper script so that I can easily use this when restarting brokers. What is the proper way to set controlled.shutdown.enable? Thanks! -Andrew Otto
Re: Controlled Shutdown Tool?
Controlled shutdown is built into the broker: when this config is set to true, the broker makes a request to the controller to initiate the controlled shutdown, waits till the request succeeds, and in case of failure retries the shutdown controlled.shutdown.max.retries times. https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/server/KafkaServer.scala#L175 -- Harsha On July 27, 2015 at 11:50:27 AM, Andrew Otto (ao...@wikimedia.org) wrote: Thanks! But how do I initiate a controlled shutdown on a running broker? Editing server.properties is not going to cause this to happen. Don’t I have to tell the broker to shut down nicely? All I really want to do is tell the controller to move leadership to other replicas, so I can shut down the broker without clients getting all confused. On Jul 27, 2015, at 14:48, Sriharsha Chintalapani ka...@harsha.io wrote: You can set controlled.shutdown.enable to true in kafka’s server.properties; this is enabled by default in 0.8.2 onwards, and you can also set max retries using controlled.shutdown.max.retries, which defaults to 3. Thanks, Harsha On July 27, 2015 at 11:42:32 AM, Andrew Otto (ao...@wikimedia.org) wrote: I’m working on packaging 0.8.2.1 for Wikimedia, and in doing so I’ve noticed that kafka.admin.ShutdownBroker doesn’t exist anymore. From what I can tell, this has been intentionally removed in favor of a JMX(?) config “controlled.shutdown.enable”. It is unclear from the documentation how one is supposed to set this for a running broker. Do I need a special JMX tool in order to flick this switch? I’d like to add a command to my kafka bin wrapper script so that I can easily use this when restarting brokers. What is the proper way to set controlled.shutdown.enable? Thanks! -Andrew Otto
Re: Controlled Shutdown Tool?
You can set controlled.shutdown.enable to true in kafka’s server.properties; this is enabled by default in 0.8.2 onwards, and you can also set max retries using controlled.shutdown.max.retries, which defaults to 3. Thanks, Harsha On July 27, 2015 at 11:42:32 AM, Andrew Otto (ao...@wikimedia.org) wrote: I’m working on packaging 0.8.2.1 for Wikimedia, and in doing so I’ve noticed that kafka.admin.ShutdownBroker doesn’t exist anymore. From what I can tell, this has been intentionally removed in favor of a JMX(?) config “controlled.shutdown.enable”. It is unclear from the documentation how one is supposed to set this for a running broker. Do I need a special JMX tool in order to flick this switch? I’d like to add a command to my kafka bin wrapper script so that I can easily use this when restarting brokers. What is the proper way to set controlled.shutdown.enable? Thanks! -Andrew Otto
Re: Log Deletion Behavior
Hi Jiefu, The topic will stay forever. You can do a delete-topic operation to get rid of it. Thanks, Mayuresh On Mon, Jul 27, 2015 at 11:19 AM, JIEFU GONG jg...@berkeley.edu wrote: Mayuresh, Yes, it seems like I misunderstood the behavior of log deletion, but indeed my log segments were deleted after the specified amount of time. I have a small follow-up question: it seems that when the logs are deleted, the topic persists and can be republished to -- is there a configuration for how long a topic persists, or does it stay forever until it is manually marked for deletion? Also @Grant, thank you very much for your help as well. I ended up using the ms update configuration and understand the broker configs better. Thanks! On Mon, Jul 27, 2015 at 9:27 AM, Mayuresh Gharat gharatmayures...@gmail.com wrote: Hi Jiefu, Any update on this? Were you able to delete those log segments? Thanks, Mayuresh On Fri, Jul 24, 2015 at 7:14 PM, Mayuresh Gharat gharatmayures...@gmail.com wrote: To add on, the main thing here is you should be using only one of these properties. Thanks, Mayuresh On Fri, Jul 24, 2015 at 6:47 PM, Mayuresh Gharat gharatmayures...@gmail.com wrote: Yes. It should. Do not set other retention settings. Just use the hours setting. Let me know about this :) Thanks, Mayuresh On Fri, Jul 24, 2015 at 6:43 PM, JIEFU GONG jg...@berkeley.edu wrote: Mayuresh, thanks for your comment. I won't be able to change these settings until next Monday, but just to confirm, you are saying that if I restart the brokers my logs should delete themselves with respect to the newest settings, correct? On Fri, Jul 24, 2015 at 6:29 PM, Mayuresh Gharat gharatmayures...@gmail.com wrote: No. This should not happen. At LinkedIn we just use the log retention hours. Try using that. Change it and bounce the broker. It should work.
Also, looking back at the configs, I am not sure why we have 3 different configs for the same property: log.retention.ms log.retention.minutes log.retention.hours We should probably have just the milliseconds. Thanks, Mayuresh On Fri, Jul 24, 2015 at 4:12 PM, JIEFU GONG jg...@berkeley.edu wrote: Hi all, I have a few broad questions on how log deletion works, specifically in conjunction with the log.retention.time setting. Say I published some messages to some topics when the configuration was originally set to something like log.retention.hours=168 (default). If I publish these messages successfully, then later set the configuration to something like log.retention.minutes=1, are those logs supposed to persist per the newest settings or the old settings? Right now my logs are refusing to delete themselves unless I specifically mark them for deletion -- is this the correct/anticipated/wanted behavior? Thanks for the help! -- Jiefu Gong University of California, Berkeley | Class of 2017 B.A Computer Science | College of Letters and Sciences jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427 -- -Regards, Mayuresh R. Gharat (862) 250-7125
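For completeness, the delete-topic operation Mayuresh mentions looks roughly like this on 0.8.2 (the ZooKeeper string and topic name are placeholders; it only takes effect if the brokers run with delete.topic.enable=true, otherwise the topic is merely marked for deletion):

```shell
# Mark a topic for deletion (needs delete.topic.enable=true on the brokers).
ZK="localhost:2181"   # placeholder ZooKeeper connect string
TOPIC="my-topic"      # placeholder topic name
CMD="bin/kafka-topics.sh --zookeeper $ZK --delete --topic $TOPIC"
echo "$CMD"           # echoed here; run it against a live cluster to delete
```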
Controlled Shutdown Tool?
I’m working on packaging 0.8.2.1 for Wikimedia, and in doing so I’ve noticed that kafka.admin.ShutdownBroker doesn’t exist anymore. From what I can tell, this has been intentionally removed in favor of a JMX(?) config “controlled.shutdown.enable”. It is unclear from the documentation how one is supposed to set this for a running broker. Do I need a special JMX tool in order to flick this switch? I’d like to add a command to my kafka bin wrapper script so that I can easily use this when restarting brokers. What is the proper way to set controlled.shutdown.enable? Thanks! -Andrew Otto
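For what it's worth, controlled.shutdown.enable is a broker configuration that lives in server.properties, not a JMX switch, so no special JMX tooling is needed. A minimal sketch follows; the file name and retry values here are illustrative, not authoritative:

```shell
# Write the controlled-shutdown settings into a broker config fragment.
# With these set, a normal stop (SIGTERM, e.g. via kafka-server-stop.sh)
# migrates partition leadership away before the broker exits.
CONFIG=server.properties.fragment
cat > "$CONFIG" <<'EOF'
controlled.shutdown.enable=true
controlled.shutdown.max.retries=3
controlled.shutdown.retry.backoff.ms=5000
EOF
grep '^controlled.shutdown.enable' "$CONFIG"
```

The upshot for a wrapper script: with the property enabled at startup, a plain service stop/restart already performs the controlled shutdown, so no extra command is required.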
Re: multiple producer throughput
The message size is 100 bytes and each producer sends out 50 million messages. It's the number used by the Benchmarking Apache Kafka post: http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

Thanks.

On Mon, Jul 27, 2015 at 4:15 PM, Prabhjot Bharaj prabhbha...@gmail.com wrote:

Hi, have you tried with acks=1 and -1 as well? Please share the numbers and the message size.

Regards,
Prabcs

On Jul 27, 2015 10:24 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:

Hi, I am running 40 producers on a 40-node cluster. The messages are sent to 6 brokers in another cluster. The producers are running the ProducerPerformance test. With 20 nodes running, the throughput is around 13 MB/s; with 40 nodes running, it is around 9 MB/s. I have set log.retention.ms=9000 to delete the unwanted messages, just to keep the disk from filling up. So I want to know how I should tune the system to get a better throughput result? Thanks.
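A quick back-of-envelope on the volumes in this thread, plus the kind of invocation being discussed. The perf-test command is a commented sketch modeled on the LinkedIn benchmark post; the topic name and broker address are placeholders:

```shell
# Each producer sends 50 million 100-byte messages:
MSGS=50000000; SIZE=100
echo "$(( MSGS * SIZE / 1024 / 1024 )) MB per producer"

# The test itself (sketch; needs a live cluster -- class name and parameters
# follow the benchmark post linked above, 0.8.x new-producer tool):
# bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance \
#   test-topic 50000000 100 -1 acks=1 bootstrap.servers=broker:9092 \
#   buffer.memory=67108864 batch.size=8196
```

Varying acks (1 vs -1) as suggested changes how long the producer waits for replication, which is why Prabhjot asks for numbers at each setting before drawing tuning conclusions.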
Re: multiple producer throughput
Hi, have you tried with acks=1 and -1 as well? Please share the numbers and the message size.

Regards,
Prabcs

On Jul 27, 2015 10:24 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:

Hi, I am running 40 producers on a 40-node cluster. The messages are sent to 6 brokers in another cluster. The producers are running the ProducerPerformance test. With 20 nodes running, the throughput is around 13 MB/s; with 40 nodes running, it is around 9 MB/s. I have set log.retention.ms=9000 to delete the unwanted messages, just to keep the disk from filling up. So I want to know how I should tune the system to get a better throughput result? Thanks.
Re: Cache Memory Kafka Process
http://www.linuxatemyram.com may be a helpful resource to explain this better.

On Tue, 28 Jul 2015 at 5:32 AM Ewen Cheslack-Postava e...@confluent.io wrote:

Having the OS cache the data in Kafka's log files is useful, since it means that data doesn't need to be read back from disk when consumed. This is good for the latency and throughput of consumers. Usually this caching works out pretty well, keeping the latest data from your topics in cache and only pulling older data into memory if a consumer reads data from earlier in the log. In other words, by leveraging OS-level caching of files, Kafka gets an in-memory caching layer for free. Generally you shouldn't need to clear this data -- the OS should only be using memory that isn't being used anyway. Is there a particular problem you're encountering that clearing the cache would help with?

-Ewen

On Mon, Jul 27, 2015 at 2:33 AM, Nilesh Chhapru nilesh.chha...@ugamsolutions.com wrote:

Hi all, I am facing issues with the Kafka broker process taking a lot of cache memory. I just wanted to know whether the process really needs that much cache memory, or whether I can clear the OS-level cache with a cron job.

Regards,
Nilesh Chhapru.

--
Thanks,
Ewen

--
Daniel
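Ewen's point -- that page-cache memory is reclaimable and not really "used" by the broker -- can be checked directly on a Linux host. A small sketch (assumes Linux, i.e. that /proc/meminfo exists):

```shell
# A large "Cached" figure on a Kafka host is normal and desirable: the kernel
# hands that memory back automatically when processes actually need it, so
# there is no benefit to clearing it from cron.
grep -E '^(MemTotal|MemFree|Cached):' /proc/meminfo
```

On newer kernels, the MemAvailable line in the same file is the best single estimate of memory truly available to applications, cache included.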
Re: Choosing brokers when creating topics
Try the --replica-assignment option for kafka-topics.sh. It allows you to specify which brokers to assign as replicas instead of relying on the assignments being made automatically.

-Ewen

On Mon, Jul 27, 2015 at 12:25 AM, Jilin Xie jilinxie1...@gmail.com wrote:

Hi, is it possible to choose which brokers to use when creating a topic? The general command for creating a topic is:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

What I'm looking for is something like:

bin/kafka-topics.sh --create . --broker-to-use xxx;xxx;xxx

It's because I want the topic to be hosted on the brokers that would be closest to the likely producer. Thanks in advance.

--
Thanks,
Ewen
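A sketch of the syntax Ewen is referring to (it needs a live cluster to run; broker ids 0, 1, 2 and the topic name are assumptions for illustration):

```shell
# One comma-separated entry per partition; within an entry, broker ids are
# colon-separated replicas, and the first id is that partition's preferred
# leader. This creates a 3-partition topic with replication factor 2:
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --topic near-producer-topic \
  --replica-assignment 0:1,1:2,2:0
```

Note that --replica-assignment replaces --partitions and --replication-factor: both are implied by the shape of the assignment list.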
Choosing brokers when creating topics
Hi, is it possible to choose which brokers to use when creating a topic? The general command for creating a topic is:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

What I'm looking for is something like:

bin/kafka-topics.sh --create . --broker-to-use xxx;xxx;xxx

It's because I want the topic to be hosted on the brokers that would be closest to the likely producer. Thanks in advance.
Cache Memory Kafka Process
Hi all, I am facing issues with the Kafka broker process taking a lot of cache memory. I just wanted to know whether the process really needs that much cache memory, or whether I can clear the OS-level cache with a cron job.

Regards,
Nilesh Chhapru.
Re: Choosing brokers when creating topics
Hi Ewen, thanks for your reply. I've been using the kafka-reassign-partitions tool, but --replica-assignment is exactly what I'm looking for. Thanks!

On Mon, Jul 27, 2015 at 3:58 PM, Ewen Cheslack-Postava e...@confluent.io wrote:

Try the --replica-assignment option for kafka-topics.sh. It allows you to specify which brokers to assign as replicas instead of relying on the assignments being made automatically.

-Ewen

On Mon, Jul 27, 2015 at 12:25 AM, Jilin Xie jilinxie1...@gmail.com wrote:

Hi, is it possible to choose which brokers to use when creating a topic? The general command for creating a topic is:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

What I'm looking for is something like:

bin/kafka-topics.sh --create . --broker-to-use xxx;xxx;xxx

It's because I want the topic to be hosted on the brokers that would be closest to the likely producer. Thanks in advance.

--
Thanks,
Ewen