Number of kafka topics/partitions supported per cluster of n nodes

2015-07-27 Thread Prabhjot Bharaj
Hi,

I'm looking for a benchmark that explains how many topics and partitions in
total can be created in a cluster of n nodes, given that the message size
varies between x and y bytes, how this varies with heap size, and how it
affects system performance.

e.g. the result should read: t topics with p partitions each can be
supported in a cluster of n nodes with a heap size of h MB before the
cluster sees problems such as JVM crashes, high memory usage, or general
system slowdown.
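A rough sketch of how such a sweep could be scripted (t, p, and the GC
sampling step are illustrative assumptions, not part of any existing tool):

# create t topics with p partitions each, then watch broker heap/GC under load
for i in $(seq 1 "$t"); do
  bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --topic "bench-$i" --partitions "$p" --replication-factor 3
done
# e.g. sample broker GC utilization every 5 seconds while producing:
jstat -gcutil <broker-pid> 5000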

I think such benchmarks should exist so that we can make better decisions
on the ops side. If these numbers don't exist, I'll run the test myself,
varying the parameters described above, and I'd be happy to share the
results with the community.

Thanks,
prabcs


Best practices - Using kafka (with http server) as source-of-truth

2015-07-27 Thread Prabhjot Bharaj
Hi Folks,

I would like to understand the best practices for using Kafka as the
source of truth, given that I want to push data into Kafka using HTTP
methods.

What are the current production configurations for such a use case?

1. Kafka-http-client - is it as scalable as Nginx?
2. Using Kafka and Nginx together - if anybody has used this, please explain.
3. Any other scalable method?

Regards,
prabcs


Java API for fetching Consumer group from Kafka Server (Not Zookeeper)

2015-07-27 Thread swati.suman2
Hi Jiangjie,


kafka.admin.ConsumerGroupCommand is a Scala class. Could you please point me
to a Java API for fetching consumer groups from the Kafka server?
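For context, since the Scala tool compiles to ordinary JVM bytecode, it can
be run from the shell (or its main method called from Java); a rough sketch,
assuming the tool's --list and --zookeeper options:

# illustrative only; option names depend on the trunk version of the tool
bin/kafka-run-class.sh kafka.admin.ConsumerGroupCommand --list --zookeeper localhost:2181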


Best Regards,

Swati Suman



Re: Log Deletion Behavior

2015-07-27 Thread Mayuresh Gharat
Hi Jiefu,

Any update on this? Were you able to delete those log segments?

Thanks,

Mayuresh

On Fri, Jul 24, 2015 at 7:14 PM, Mayuresh Gharat gharatmayures...@gmail.com
 wrote:

 To add on, the main thing here is you should be using only one of these
 properties.

 Thanks,

 Mayuresh

 On Fri, Jul 24, 2015 at 6:47 PM, Mayuresh Gharat 
 gharatmayures...@gmail.com wrote:

 Yes. It should. Do not set other retention settings. Just use the hours
 settings.
 Let me know about this :)

 Thanks,

 Mayuresh

 On Fri, Jul 24, 2015 at 6:43 PM, JIEFU GONG jg...@berkeley.edu wrote:

  Mayuresh, thanks for your comment. I won't be able to change these
  settings until next Monday, but just to confirm: you are saying that if I
  restart the brokers, my logs should delete themselves with respect to the
  newest settings, correct?

 On Fri, Jul 24, 2015 at 6:29 PM, Mayuresh Gharat 
 gharatmayures...@gmail.com
  wrote:

   No. This should not happen. At Linkedin we just use the log retention
   hours. Try using that. Change it and bounce the broker. It should work.
   Also looking back at the configs I am not sure why we had 3 different
   configs for the same property:
  
   log.retention.ms
   log.retention.minutes
   log.retention.hours
  
   We should probably have just the milliseconds.
 
  Thanks,
 
  Mayuresh
 
  On Fri, Jul 24, 2015 at 4:12 PM, JIEFU GONG jg...@berkeley.edu
 wrote:
 
   Hi all,
  
   I have a few broad questions on how log deletion works, specifically
 in
   conjunction with the log.retention.time setting. Say I published some
   messages to some topics when the configuration was originally set to
   something like log.retention.hours=168 (default). If I publish these
   messages successfully, then later set the configuration to something
 like
   log.retention.minutes=1, are those logs supposed to persist for the
  newest
   settings or the old settings? Right now my logs are refusing to
 delete
   themselves unless I specifically mark them for deletion -- is this
 the
   correct/anticipated/wanted behavior?
  
   Thanks for the help!
  
   --
  
   Jiefu Gong
   University of California, Berkeley | Class of 2017
   B.A Computer Science | College of Letters and Sciences
  
   jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427
  
 
 
 
  --
  -Regards,
  Mayuresh R. Gharat
  (862) 250-7125
 



 --

 Jiefu Gong
 University of California, Berkeley | Class of 2017
 B.A Computer Science | College of Letters and Sciences

 jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427




 --
 -Regards,
 Mayuresh R. Gharat
 (862) 250-7125




 --
 -Regards,
 Mayuresh R. Gharat
 (862) 250-7125




-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125


multiple producer throughput

2015-07-27 Thread Yuheng Du
Hi,

I am running 40 producers on a 40-node cluster. The messages are sent to 6
brokers in another cluster. The producers are running the ProducerPerformance
test.

When 20 nodes are running, the throughput is around 13 MB/s; when 40 nodes
are running, it is around 9 MB/s.

I have set log.retention.ms=9000 to delete unwanted messages, just to
avoid filling up the disk.

How should I tune the system to get a better throughput result? Thanks.


Re: Best practices - Using kafka (with http server) as source-of-truth

2015-07-27 Thread Ewen Cheslack-Postava
Hi Prabhjot,

Confluent has a REST proxy with docs that may give some guidance:
http://docs.confluent.io/1.0/kafka-rest/docs/intro.html The new producer
that it uses is very efficient, so you should be able to get pretty good
throughput. You take a bit of a hit due to the overhead of sending data
through a proxy, but with appropriate batching you can get about 2/3 of the
performance you would get using the Java producer directly.
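For illustration, a minimal produce request against the proxy's v1 API might
look like this (the topic name, payload, and default port 8082 are
illustrative):

curl -X POST http://localhost:8082/topics/jsontest \
  -H "Content-Type: application/vnd.kafka.json.v1+json" \
  --data '{"records":[{"value":{"foo":"bar"}}]}'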

There are also a few other proxies you can find here:
https://cwiki.apache.org/confluence/display/KAFKA/Clients#Clients-HTTPREST

You can also put nginx (or HAProxy, or a variety of other solutions) in
front of REST proxies for load balancing, HA, SSL termination, etc. This is
yet another hop, so it might affect throughput and latency.

-Ewen

On Mon, Jul 27, 2015 at 6:55 AM, Prabhjot Bharaj prabhbha...@gmail.com
wrote:

 Hi Folks,

 I would like to understand the best practices for using Kafka as the
 source of truth, given that I want to push data into Kafka using HTTP
 methods.

 What are the current production configurations for such a use case?

 1. Kafka-http-client - is it as scalable as Nginx?
 2. Using Kafka and Nginx together - if anybody has used this, please
 explain.
 3. Any other scalable method?

 Regards,
 prabcs




-- 
Thanks,
Ewen


Re: Cache Memory Kafka Process

2015-07-27 Thread Ewen Cheslack-Postava
Having the OS cache the data in Kafka's log files is useful since it means
that data doesn't need to be read back from disk when consumed. This is
good for the latency and throughput of consumers. Usually this caching
works out pretty well, keeping the latest data from your topics in cache
and only pulling older data into memory if a consumer reads data from
earlier in the log. In other words, by leveraging OS-level caching of
files, Kafka gets an in-memory caching layer for free.

Generally you shouldn't need to clear this data -- the OS should only be
using memory that isn't being used anyway. Is there a particular problem
you're encountering that clearing the cache would help with?
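A quick way to see the split on the broker host (a sketch; the exact column
layout varies by distro):

# the "cached" column is page cache the OS can reclaim at any time,
# so it should not be read as memory pressure
free -m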

-Ewen

On Mon, Jul 27, 2015 at 2:33 AM, Nilesh Chhapru 
nilesh.chha...@ugamsolutions.com wrote:

 Hi All,

 I am facing issues with the Kafka broker process taking a lot of cache
 memory. I just wanted to know whether the process really needs that much
 cache memory, or whether I can clear the OS-level cache with a cron job.

 Regards,
 Nilesh Chhapru.




-- 
Thanks,
Ewen


Re: New consumer - offset one gets in poll is not offset one is supposed to commit

2015-07-27 Thread Jason Gustafson
Hey Stevo,

I agree that it's a little unintuitive that what you are committing is the
next offset that should be read from and not the one that has already been
read. We're probably constrained in that we already have a consumer which
implements this behavior. Would it help if we added a method on
ConsumerRecords to get the next offset (e.g. nextOffset(partition))?

Thanks,
Jason

On Fri, Jul 24, 2015 at 10:11 AM, Stevo Slavić ssla...@gmail.com wrote:

 Hello Apache Kafka community,

 Say there is only one topic with a single partition and a single message on
 it.
 Calling poll with the new consumer will return a ConsumerRecord for that
 message, and it will have an offset of 0.

 After processing the message, the current KafkaConsumer implementation
 expects one to commit not offset 0 (the message just processed), but offset
 1 - the next offset/position one would like to consume.

 Does this sound strange to you as well?

 I wonder whether this offset+1 handling for the next position to read
 couldn't be done in one place - in the KafkaConsumer implementation, the
 broker, or wherever - instead of every user of KafkaConsumer having to do
 it.

 Kind regards,
 Stevo Slavic.



Re: deleting data automatically

2015-07-27 Thread Yuheng Du
Thank you!

On Mon, Jul 27, 2015 at 1:43 PM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 As I mentioned, adjusting any settings such that files are small enough
 that you lose the benefits of append-only writes, or such that file
 creation/deletion becomes a bottleneck, might affect performance. It looks
 like the default setting for log.segment.bytes is 1GB, so given fast enough
 cleanup of old logs, you may not need to adjust that setting -- assuming
 you have a reasonable amount of storage, you'll easily fit many dozen log
 files of that size.

 -Ewen

 On Mon, Jul 27, 2015 at 10:36 AM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Thank you! What performance impact will there be if I change
  log.segment.bytes? Thanks.
 
  On Mon, Jul 27, 2015 at 1:25 PM, Ewen Cheslack-Postava 
 e...@confluent.io
  wrote:
 
   I think log.cleanup.interval.mins was removed in the first 0.8 release.
  It
   sounds like you're looking at outdated docs. Search for
   log.retention.check.interval.ms here:
   http://kafka.apache.org/documentation.html
  
   As for setting the values too low hurting performance, I'd guess it's
   probably only an issue if you set them extremely small, such that file
   creation and cleanup become a bottleneck.
  
   -Ewen
  
   On Mon, Jul 27, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com
   wrote:
  
If I want to get higher throughput, should I increase the
log.segment.bytes?
   
I don't see log.retention.check.interval.ms, but there is
log.cleanup.interval.mins, is that what you mean?
   
If I set log.roll.ms or log.cleanup.interval.mins too small, will it
   hurt
the throughput? Thanks.
   
On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava 
   e...@confluent.io

wrote:
   
 You'll want to set the log retention policy via
 log.retention.{ms,minutes,hours} or log.retention.bytes. If you
 want
really
 aggressive collection (e.g., on the order of seconds, as you
   specified),
 you might also need to adjust log.segment.bytes/log.roll.{ms,hours}
  and
 log.retention.check.interval.ms.

 On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du 
  yuheng.du.h...@gmail.com
 wrote:

  Hi,
 
  I am testing the kafka producer performance. So I created a queue
  and write a large amount of data to that queue.
 
  Is there a way to delete the data automatically after some time,
  say
  whenever the data size reaches 50GB or the retention time exceeds
  10
  seconds, it will be deleted so my disk won't get filled and new
  data
 can't
  be written in?
 
  Thanks!
 



 --
 Thanks,
 Ewen

   
  
  
  
   --
   Thanks,
   Ewen
  
 



 --
 Thanks,
 Ewen



Re: deleting data automatically

2015-07-27 Thread Yuheng Du
If I want to get higher throughput, should I increase the
log.segment.bytes?

I don't see log.retention.check.interval.ms, but there is
log.cleanup.interval.mins, is that what you mean?

If I set log.roll.ms or log.cleanup.interval.mins too small, will it hurt
the throughput? Thanks.

On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 You'll want to set the log retention policy via
 log.retention.{ms,minutes,hours} or log.retention.bytes. If you want really
 aggressive collection (e.g., on the order of seconds, as you specified),
 you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and
 log.retention.check.interval.ms.

 On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Hi,
 
  I am testing the kafka producer performance. So I created a queue and
  write a large amount of data to that queue.
 
  Is there a way to delete the data automatically after some time, say
  whenever the data size reaches 50GB or the retention time exceeds 10
  seconds, it will be deleted so my disk won't get filled and new data
 can't
  be written in?
 
  Thanks!
 



 --
 Thanks,
 Ewen



Re: deleting data automatically

2015-07-27 Thread Ewen Cheslack-Postava
As I mentioned, adjusting any settings such that files are small enough
that you lose the benefits of append-only writes, or such that file
creation/deletion becomes a bottleneck, might affect performance. It looks
like the default setting for log.segment.bytes is 1GB, so given fast enough
cleanup of old logs, you may not need to adjust that setting -- assuming
you have a reasonable amount of storage, you'll easily fit many dozen log
files of that size.

-Ewen

On Mon, Jul 27, 2015 at 10:36 AM, Yuheng Du yuheng.du.h...@gmail.com
wrote:

 Thank you! What performance impact will there be if I change
 log.segment.bytes? Thanks.

 On Mon, Jul 27, 2015 at 1:25 PM, Ewen Cheslack-Postava e...@confluent.io
 wrote:

  I think log.cleanup.interval.mins was removed in the first 0.8 release.
 It
  sounds like you're looking at outdated docs. Search for
  log.retention.check.interval.ms here:
  http://kafka.apache.org/documentation.html
 
  As for setting the values too low hurting performance, I'd guess it's
  probably only an issue if you set them extremely small, such that file
  creation and cleanup become a bottleneck.
 
  -Ewen
 
  On Mon, Jul 27, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com
  wrote:
 
   If I want to get higher throughput, should I increase the
   log.segment.bytes?
  
   I don't see log.retention.check.interval.ms, but there is
   log.cleanup.interval.mins, is that what you mean?
  
   If I set log.roll.ms or log.cleanup.interval.mins too small, will it
  hurt
   the throughput? Thanks.
  
   On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava 
  e...@confluent.io
   
   wrote:
  
You'll want to set the log retention policy via
log.retention.{ms,minutes,hours} or log.retention.bytes. If you want
   really
aggressive collection (e.g., on the order of seconds, as you
  specified),
you might also need to adjust log.segment.bytes/log.roll.{ms,hours}
 and
log.retention.check.interval.ms.
   
On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du 
 yuheng.du.h...@gmail.com
wrote:
   
 Hi,

  I am testing the kafka producer performance. So I created a queue
  and write a large amount of data to that queue.

 Is there a way to delete the data automatically after some time,
 say
 whenever the data size reaches 50GB or the retention time exceeds
 10
 seconds, it will be deleted so my disk won't get filled and new
 data
can't
 be written in?

  Thanks!

   
   
   
--
Thanks,
Ewen
   
  
 
 
 
  --
  Thanks,
  Ewen
 




-- 
Thanks,
Ewen


Re: Log Deletion Behavior

2015-07-27 Thread JIEFU GONG
Mayuresh,

Yes, it seems I misunderstood the behavior of log deletion; indeed my log
segments were deleted after the specified amount of time. I have a small
follow-up question: it seems that when the logs are deleted, the topic
persists and can still be published to -- is there a configuration for how
long a topic persists, or does it stay forever until it is manually marked
for deletion?

Also @Grant, thank you very much for your help as well. I ended up using
the log.retention.ms configuration and understand the broker configs better.
Thanks!

On Mon, Jul 27, 2015 at 9:27 AM, Mayuresh Gharat gharatmayures...@gmail.com
 wrote:

 Hi Jiefu,

 Any update on this? Were you able to delete those log segments?

 Thanks,

 Mayuresh

 On Fri, Jul 24, 2015 at 7:14 PM, Mayuresh Gharat 
 gharatmayures...@gmail.com
  wrote:

  To add on, the main thing here is you should be using only one of these
  properties.
 
  Thanks,
 
  Mayuresh
 
  On Fri, Jul 24, 2015 at 6:47 PM, Mayuresh Gharat 
  gharatmayures...@gmail.com wrote:
 
  Yes. It should. Do not set other retention settings. Just use the
 hours
  settings.
  Let me know about this :)
 
  Thanks,
 
  Mayuresh
 
  On Fri, Jul 24, 2015 at 6:43 PM, JIEFU GONG jg...@berkeley.edu wrote:
 
  Mayuresh, thanks for your comment. I won't be able to change these
  settings until next Monday, but just to confirm: you are saying that if I
  restart the brokers, my logs should delete themselves with respect to the
  newest settings, correct?
 
  On Fri, Jul 24, 2015 at 6:29 PM, Mayuresh Gharat 
  gharatmayures...@gmail.com
   wrote:
 
   No. This should not happen. At Linkedin we just use the log retention
   hours. Try using that. Change it and bounce the broker. It should work.
   Also looking back at the configs I am not sure why we had 3 different
   configs for the same property:
  
   log.retention.ms
   log.retention.minutes
   log.retention.hours
  
   We should probably have just the milliseconds.
  
   Thanks,
  
   Mayuresh
  
   On Fri, Jul 24, 2015 at 4:12 PM, JIEFU GONG jg...@berkeley.edu
  wrote:
  
Hi all,
   
I have a few broad questions on how log deletion works,
 specifically
  in
conjunction with the log.retention.time setting. Say I published
 some
messages to some topics when the configuration was originally set
 to
something like log.retention.hours=168 (default). If I publish
 these
messages successfully, then later set the configuration to
 something
  like
log.retention.minutes=1, are those logs supposed to persist for the
   newest
settings or the old settings? Right now my logs are refusing to
  delete
themselves unless I specifically mark them for deletion -- is this
  the
correct/anticipated/wanted behavior?
   
Thanks for the help!
   
--
   
Jiefu Gong
University of California, Berkeley | Class of 2017
B.A Computer Science | College of Letters and Sciences
   
jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427
   
  
  
  
   --
   -Regards,
   Mayuresh R. Gharat
   (862) 250-7125
  
 
 
 
  --
 
  Jiefu Gong
  University of California, Berkeley | Class of 2017
  B.A Computer Science | College of Letters and Sciences
 
  jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427
 
 
 
 
  --
  -Regards,
  Mayuresh R. Gharat
  (862) 250-7125
 
 
 
 
  --
  -Regards,
  Mayuresh R. Gharat
  (862) 250-7125
 



 --
 -Regards,
 Mayuresh R. Gharat
 (862) 250-7125




-- 

Jiefu Gong
University of California, Berkeley | Class of 2017
B.A Computer Science | College of Letters and Sciences

jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427


Re: deleting data automatically

2015-07-27 Thread Yuheng Du
Thank you! What performance impact will there be if I change
log.segment.bytes? Thanks.

On Mon, Jul 27, 2015 at 1:25 PM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 I think log.cleanup.interval.mins was removed in the first 0.8 release. It
 sounds like you're looking at outdated docs. Search for
 log.retention.check.interval.ms here:
 http://kafka.apache.org/documentation.html

 As for setting the values too low hurting performance, I'd guess it's
 probably only an issue if you set them extremely small, such that file
 creation and cleanup become a bottleneck.

 -Ewen

 On Mon, Jul 27, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  If I want to get higher throughput, should I increase the
  log.segment.bytes?
 
  I don't see log.retention.check.interval.ms, but there is
  log.cleanup.interval.mins, is that what you mean?
 
  If I set log.roll.ms or log.cleanup.interval.mins too small, will it
 hurt
  the throughput? Thanks.
 
  On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava 
 e...@confluent.io
  
  wrote:
 
   You'll want to set the log retention policy via
   log.retention.{ms,minutes,hours} or log.retention.bytes. If you want
  really
   aggressive collection (e.g., on the order of seconds, as you
 specified),
   you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and
   log.retention.check.interval.ms.
  
   On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du yuheng.du.h...@gmail.com
   wrote:
  
Hi,
   
 I am testing the kafka producer performance. So I created a queue and
 write a large amount of data to that queue.
   
Is there a way to delete the data automatically after some time, say
whenever the data size reaches 50GB or the retention time exceeds 10
seconds, it will be deleted so my disk won't get filled and new data
   can't
be written in?
   
 Thanks!
   
  
  
  
   --
   Thanks,
   Ewen
  
 



 --
 Thanks,
 Ewen



Re: deleting data automatically

2015-07-27 Thread Ewen Cheslack-Postava
I think log.cleanup.interval.mins was removed in the first 0.8 release. It
sounds like you're looking at outdated docs. Search for
log.retention.check.interval.ms here:
http://kafka.apache.org/documentation.html

As for setting the values too low hurting performance, I'd guess it's
probably only an issue if you set them extremely small, such that file
creation and cleanup become a bottleneck.
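To make that concrete, a hedged sketch of aggressive-cleanup settings in
server.properties, pulling together the options mentioned in this thread
(the values are purely illustrative, not recommendations):

# delete log segments older than ~10 seconds
log.retention.ms=10000
# roll a new segment every 100 MB, or at least every 10 seconds,
# so there is always a closed segment eligible for deletion
log.segment.bytes=104857600
log.roll.ms=10000
# how often the broker checks for segments to delete
log.retention.check.interval.ms=5000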

-Ewen

On Mon, Jul 27, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com
wrote:

 If I want to get higher throughput, should I increase the
 log.segment.bytes?

 I don't see log.retention.check.interval.ms, but there is
 log.cleanup.interval.mins, is that what you mean?

 If I set log.roll.ms or log.cleanup.interval.mins too small, will it hurt
 the throughput? Thanks.

 On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava e...@confluent.io
 
 wrote:

  You'll want to set the log retention policy via
  log.retention.{ms,minutes,hours} or log.retention.bytes. If you want
 really
  aggressive collection (e.g., on the order of seconds, as you specified),
  you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and
  log.retention.check.interval.ms.
 
  On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du yuheng.du.h...@gmail.com
  wrote:
 
   Hi,
  
   I am testing the kafka producer performance. So I created a queue and
   write a large amount of data to that queue.
  
   Is there a way to delete the data automatically after some time, say
   whenever the data size reaches 50GB or the retention time exceeds 10
   seconds, it will be deleted so my disk won't get filled and new data
  can't
   be written in?
  
   Thanks!
  
 
 
 
  --
  Thanks,
  Ewen
 




-- 
Thanks,
Ewen


Re: Controlled Shutdown Tool?

2015-07-27 Thread Binh Nguyen Van
You can initiate a controlled shutdown by running bin/kafka-server-stop.sh.
This will send a SIGTERM to the broker to tell it to do the controlled
shutdown. I also got confused before and had to look at the code to figure
that out. I think it would be better if we added this to the documentation.
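A minimal sketch (the PID placeholder is illustrative):

# triggers a controlled shutdown by sending SIGTERM to the broker process
bin/kafka-server-stop.sh
# equivalent, if you know the broker's PID:
kill -s TERM <broker-pid>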

-Binh

On Mon, Jul 27, 2015 at 11:50 AM, Andrew Otto ao...@wikimedia.org wrote:

 Thanks!

 But how do I initiate a controlled shutdown on a running broker?  Editing
 server.properties is not going to cause this to happen.  Don’t I have to
 tell the broker to shutdown nicely?  All I really want to do is tell the
 controller to move leadership to other replicas, so I can shutdown the
 broker without clients getting all confused.


  On Jul 27, 2015, at 14:48, Sriharsha Chintalapani ka...@harsha.io
 wrote:
 
  You can set controlled.shutdown.enable to true in kafka’s
 server.properties; this is enabled by default in 0.8.2 onwards.
  You can also set the max retries using controlled.shutdown.max.retries,
 which defaults to 3.
 
 
  Thanks,
  Harsha
 
 
  On July 27, 2015 at 11:42:32 AM, Andrew Otto (ao...@wikimedia.org
 mailto:ao...@wikimedia.org) wrote:
 
  I’m working on packaging 0.8.2.1 for Wikimedia, and in doing so I’ve
 noticed that kafka.admin.ShutdownBroker doesn’t exist anymore. From what I
 can tell, this has been intentionally removed in favor of a JMX(?) config
 “controlled.shutdown.enable”. It is unclear from the documentation how one
 is supposed to set this for a running broker. Do I need a special JMX tool
 in order to flick this switch? I’d like to add a command to my kafka bin
 wrapper script so that I can easily use this when restarting brokers.
 
  What is the proper way to set controlled.shutdown.enable?
 
  Thanks!
  -Andrew Otto
 
 




Re: Controlled Shutdown Tool?

2015-07-27 Thread Andrew Otto
Ah, thank you, SIGTERM is what I was looking for. The docs are unclear on
that; it would be useful to fix them. Thanks!


 On Jul 27, 2015, at 14:59, Binh Nguyen Van binhn...@gmail.com wrote:
 
 You can initiate a controlled shutdown by running bin/kafka-server-stop.sh.
 This will send a SIGTERM to the broker to tell it to do the controlled
 shutdown. I also got confused before and had to look at the code to figure
 that out. I think it would be better if we added this to the documentation.
 
 -Binh
 
 On Mon, Jul 27, 2015 at 11:50 AM, Andrew Otto ao...@wikimedia.org wrote:
 
 Thanks!
 
 But how do I initiate a controlled shutdown on a running broker?  Editing
 server.properties is not going to cause this to happen.  Don’t I have to
 tell the broker to shutdown nicely?  All I really want to do is tell the
 controller to move leadership to other replicas, so I can shutdown the
 broker without clients getting all confused.
 
 
 On Jul 27, 2015, at 14:48, Sriharsha Chintalapani ka...@harsha.io
 wrote:
 
 You can set controlled.shutdown.enable to true in kafka’s
 server.properties; this is enabled by default in 0.8.2 onwards.
 You can also set the max retries using controlled.shutdown.max.retries,
 which defaults to 3.
 
 
 Thanks,
 Harsha
 
 
 On July 27, 2015 at 11:42:32 AM, Andrew Otto (ao...@wikimedia.org
 mailto:ao...@wikimedia.org) wrote:
 
 I’m working on packaging 0.8.2.1 for Wikimedia, and in doing so I’ve
 noticed that kafka.admin.ShutdownBroker doesn’t exist anymore. From what I
 can tell, this has been intentionally removed in favor of a JMX(?) config
 “controlled.shutdown.enable”. It is unclear from the documentation how one
 is supposed to set this for a running broker. Do I need a special JMX tool
 in order to flick this switch? I’d like to add a command to my kafka bin
 wrapper script so that I can easily use this when restarting brokers.
 
 What is the proper way to set controlled.shutdown.enable?
 
 Thanks!
 -Andrew Otto
 
 
 
 



Re: Controlled Shutdown Tool?

2015-07-27 Thread Andrew Otto
Thanks!

But how do I initiate a controlled shutdown on a running broker?  Editing 
server.properties is not going to cause this to happen.  Don’t I have to tell 
the broker to shutdown nicely?  All I really want to do is tell the controller 
to move leadership to other replicas, so I can shutdown the broker without 
clients getting all confused.


 On Jul 27, 2015, at 14:48, Sriharsha Chintalapani ka...@harsha.io wrote:
 
 You can set controlled.shutdown.enable to true in kafka’s server.properties;
 this is enabled by default in 0.8.2 onwards.
 You can also set the max retries using controlled.shutdown.max.retries,
 which defaults to 3.
 
 
 Thanks,
 Harsha
 
 
 On July 27, 2015 at 11:42:32 AM, Andrew Otto (ao...@wikimedia.org 
 mailto:ao...@wikimedia.org) wrote:
 
 I’m working on packaging 0.8.2.1 for Wikimedia, and in doing so I’ve noticed 
 that kafka.admin.ShutdownBroker doesn’t exist anymore. From what I can tell, 
 this has been intentionally removed in favor of a JMX(?) config 
 “controlled.shutdown.enable”. It is unclear from the documentation how one 
 is supposed to set this for a running broker. Do I need a special JMX tool 
 in order to flick this switch? I’d like to add a command to my kafka bin 
 wrapper script so that I can easily use this when restarting brokers. 
 
 What is the proper way to set controlled.shutdown.enable? 
 
 Thanks! 
 -Andrew Otto 
 
 



Re: Controlled Shutdown Tool?

2015-07-27 Thread Sriharsha Chintalapani
Controlled shutdown is built into the broker: when this config is set to
true, the broker makes a request to the controller to initiate the controlled
shutdown, waits until the request succeeds, and in case of failure retries
the shutdown up to controlled.shutdown.max.retries times.

https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/server/KafkaServer.scala#L175

-- 
Harsha


On July 27, 2015 at 11:50:27 AM, Andrew Otto (ao...@wikimedia.org) wrote:

Thanks!

But how do I initiate a controlled shutdown on a running broker?  Editing 
server.properties is not going to cause this to happen.  Don’t I have to tell 
the broker to shutdown nicely?  All I really want to do is tell the controller 
to move leadership to other replicas, so I can shutdown the broker without 
clients getting all confused.


On Jul 27, 2015, at 14:48, Sriharsha Chintalapani ka...@harsha.io wrote:

You can set controlled.shutdown.enable to true in kafka’s server.properties;
this is enabled by default in 0.8.2 onwards.
You can also set the max retries using controlled.shutdown.max.retries,
which defaults to 3.


Thanks,
Harsha


On July 27, 2015 at 11:42:32 AM, Andrew Otto (ao...@wikimedia.org) wrote:

I’m working on packaging 0.8.2.1 for Wikimedia, and in doing so I’ve noticed 
that kafka.admin.ShutdownBroker doesn’t exist anymore. From what I can tell, 
this has been intentionally removed in favor of a JMX(?) config 
“controlled.shutdown.enable”. It is unclear from the documentation how one is 
supposed to set this for a running broker. Do I need a special JMX tool in 
order to flick this switch? I’d like to add a command to my kafka bin wrapper 
script so that I can easily use this when restarting brokers.

What is the proper way to set controlled.shutdown.enable?

Thanks!
-Andrew Otto





Re: Controlled Shutdown Tool?

2015-07-27 Thread Sriharsha Chintalapani
You can set controlled.shutdown.enable to true in kafka’s server.properties;
this is enabled by default in 0.8.2 onwards.
You can also set the max retries using controlled.shutdown.max.retries,
which defaults to 3.
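A sketch of the corresponding server.properties entries (the values shown
are the defaults just described):

controlled.shutdown.enable=true
controlled.shutdown.max.retries=3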


Thanks,
Harsha


On July 27, 2015 at 11:42:32 AM, Andrew Otto (ao...@wikimedia.org) wrote:

I’m working on packaging 0.8.2.1 for Wikimedia, and in doing so I’ve noticed 
that kafka.admin.ShutdownBroker doesn’t exist anymore. From what I can tell, 
this has been intentionally removed in favor of a JMX(?) config 
“controlled.shutdown.enable”. It is unclear from the documentation how one is 
supposed to set this for a running broker. Do I need a special JMX tool in 
order to flick this switch? I’d like to add a command to my kafka bin wrapper 
script so that I can easily use this when restarting brokers.  

What is the proper way to set controlled.shutdown.enable?  

Thanks!  
-Andrew Otto  




Re: Log Deletion Behavior

2015-07-27 Thread Mayuresh Gharat
Hi Jiefu,

The topic will stay forever. You can use the delete-topic operation to get
rid of it.
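For example (a sketch; in 0.8.x this also requires delete.topic.enable=true
on the brokers):

bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic test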

Thanks,

Mayuresh

On Mon, Jul 27, 2015 at 11:19 AM, JIEFU GONG jg...@berkeley.edu wrote:

 Mayuresh,

 Yes, it seems I misunderstood the behavior of log deletion; indeed my log
 segments were deleted after the specified amount of time. I have a small
 follow-up question: it seems that when the logs are deleted, the topic
 persists and can still be published to -- is there a configuration for how
 long a topic persists, or does it stay forever until it is manually marked
 for deletion?

 Also @Grant, thank you very much for your help as well. I ended up using
 the log.retention.ms configuration and understand the broker configs better.
 Thanks!

 On Mon, Jul 27, 2015 at 9:27 AM, Mayuresh Gharat 
 gharatmayures...@gmail.com
  wrote:

  Hi Jiefu,
 
  Any update on this? Were you able to delete those log segments?
 
  Thanks,
 
  Mayuresh
 
  On Fri, Jul 24, 2015 at 7:14 PM, Mayuresh Gharat 
  gharatmayures...@gmail.com
   wrote:
 
   To add on, the main thing here is you should be using only one of these
   properties.
  
   Thanks,
  
   Mayuresh
  
   On Fri, Jul 24, 2015 at 6:47 PM, Mayuresh Gharat 
   gharatmayures...@gmail.com wrote:
  
   Yes. It should. Do not set other retention settings. Just use the
  hours
   settings.
   Let me know about this :)
  
   Thanks,
  
   Mayuresh
  
   On Fri, Jul 24, 2015 at 6:43 PM, JIEFU GONG jg...@berkeley.edu
 wrote:
  
   Mayuresh, thanks for your comment. I won't be able to change these
   settings until next Monday, but just to confirm: you are saying that if I
   restart the brokers, my logs should delete themselves with respect to the
   newest settings, correct?
  
   On Fri, Jul 24, 2015 at 6:29 PM, Mayuresh Gharat 
   gharatmayures...@gmail.com
wrote:
  
 No. This should not happen. At Linkedin we just use the log retention
 hours. Try using that. Change it and bounce the broker. It should work.
 Also looking back at the configs I am not sure why we had 3 different
 configs for the same property:
    
 log.retention.ms
 log.retention.minutes
 log.retention.hours
    
 We should probably have just the milliseconds.
   
Thanks,
   
Mayuresh
   
On Fri, Jul 24, 2015 at 4:12 PM, JIEFU GONG jg...@berkeley.edu
   wrote:
   
 Hi all,

 I have a few broad questions on how log deletion works,
  specifically
   in
 conjunction with the log.retention.time setting. Say I published
  some
 messages to some topics when the configuration was originally set
  to
 something like log.retention.hours=168 (default). If I publish
  these
 messages successfully, then later set the configuration to
  something
   like
 log.retention.minutes=1, are those logs supposed to persist for
 the
newest
 settings or the old settings? Right now my logs are refusing to
   delete
 themselves unless I specifically mark them for deletion -- is
 this
   the
 correct/anticipated/wanted behavior?

 Thanks for the help!

 --

 Jiefu Gong
 University of California, Berkeley | Class of 2017
 B.A Computer Science | College of Letters and Sciences

 jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427

   
   
   
--
-Regards,
Mayuresh R. Gharat
(862) 250-7125
   
  
  
  
   --
  
   Jiefu Gong
   University of California, Berkeley | Class of 2017
   B.A Computer Science | College of Letters and Sciences
  
   jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427
  
  
  
  
   --
   -Regards,
   Mayuresh R. Gharat
   (862) 250-7125
  
  
  
  
   --
   -Regards,
   Mayuresh R. Gharat
   (862) 250-7125
  
 
 
 
  --
  -Regards,
  Mayuresh R. Gharat
  (862) 250-7125
 



 --

 Jiefu Gong
 University of California, Berkeley | Class of 2017
 B.A Computer Science | College of Letters and Sciences

 jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427




-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125


Controlled Shutdown Tool?

2015-07-27 Thread Andrew Otto
I’m working on packaging 0.8.2.1 for Wikimedia, and in doing so I’ve noticed 
that kafka.admin.ShutdownBroker doesn’t exist anymore.  From what I can tell, 
this has been intentionally removed in favor of a JMX(?) config 
“controlled.shutdown.enable”.  It is unclear from the documentation how one is 
supposed to set this for a running broker.  Do I need a special JMX tool in 
order to flick this switch?  I’d like to add a command to my kafka bin wrapper 
script so that I can easily use this when restarting brokers.

What is the proper way to set controlled.shutdown.enable?

Thanks!
-Andrew Otto




Re: multiple producer throughput

2015-07-27 Thread Yuheng Du
The message size is 100 bytes and each producer sends out 50 million
messages. These are the numbers used in the Kafka benchmarking post:
http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
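For reference, the invocation from that post looks roughly like this (the
broker address is a placeholder):

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance \
  test 50000000 100 -1 acks=1 bootstrap.servers=broker1:9092 \
  buffer.memory=67108864 batch.size=8196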

Thanks.

On Mon, Jul 27, 2015 at 4:15 PM, Prabhjot Bharaj prabhbha...@gmail.com
wrote:

 Hi,

 Have you tried with acks=1 and -1 as well?
 Please share the numbers and the message size

 Regards,
 Prabcs
 On Jul 27, 2015 10:24 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:

  Hi,
 
  I am running 40 producers on a 40-node cluster. The messages are sent to 6
  brokers in another cluster. The producers are running the
  ProducerPerformance test.
 
  When 20 nodes are running, the throughput is around 13 MB/s; when 40 nodes
  are running, it is around 9 MB/s.
 
  I have set log.retention.ms=9000 to delete unwanted messages, just to
  avoid filling up the disk.
 
  How should I tune the system to get a better throughput result? Thanks.
 



Re: multiple producer throughput

2015-07-27 Thread Prabhjot Bharaj
Hi,

Have you tried with acks=1 and -1 as well?
Please share the numbers and the message size

Regards,
Prabcs
On Jul 27, 2015 10:24 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:

 Hi,

 I am running 40 producers on a 40-node cluster. The messages are sent to 6
 brokers in another cluster. The producers are running the ProducerPerformance
 test.

 When 20 nodes are running, the throughput is around 13 MB/s; when 40 nodes
 are running, it is around 9 MB/s.

 I have set log.retention.ms=9000 to delete unwanted messages, just to
 avoid filling up the disk.

 How should I tune the system to get a better throughput result? Thanks.



Re: Cache Memory Kafka Process

2015-07-27 Thread Daniel Compton
http://www.linuxatemyram.com may be a helpful resource to explain this
better.
On Tue, 28 Jul 2015 at 5:32 AM Ewen Cheslack-Postava e...@confluent.io
wrote:

 Having the OS cache the data in Kafka's log files is useful since it means
 that data doesn't need to be read back from disk when consumed. This is
 good for the latency and throughput of consumers. Usually this caching
 works out pretty well, keeping the latest data from your topics in cache
 and only pulling older data into memory if a consumer reads data from
 earlier in the log. In other words, by leveraging OS-level caching of
 files, Kafka gets an in-memory caching layer for free.

 Generally you shouldn't need to clear this data -- the OS should only be
 using memory that isn't being used anyway. Is there a particular problem
 you're encountering that clearing the cache would help with?

 -Ewen

 On Mon, Jul 27, 2015 at 2:33 AM, Nilesh Chhapru 
 nilesh.chha...@ugamsolutions.com wrote:

  Hi All,
 
  I am facing issues with the Kafka broker process taking a lot of cache
  memory. I just wanted to know whether the process really needs that much
  cache memory, or whether I can clear the OS-level cache with a cron job.
 
  Regards,
  Nilesh Chhapru.
 



 --
 Thanks,
 Ewen

-- 
--
Daniel


Re: Choosing brokers when creating topics

2015-07-27 Thread Ewen Cheslack-Postava
Try the --replica-assignment option for kafka-topics.sh. It allows you to
specify which brokers to assign as replicas instead of relying on the
assignments being made automatically.
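For example, a sketch that creates a three-partition topic pinned to brokers
0, 1, and 2 (note that --replica-assignment takes the place of --partitions
and --replication-factor):

bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic test \
  --replica-assignment 0:1,1:2,2:0

Each comma-separated entry defines one partition; the colon-separated ids
within an entry are that partition's replicas, preferred leader first.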

-Ewen

On Mon, Jul 27, 2015 at 12:25 AM, Jilin Xie jilinxie1...@gmail.com wrote:

 Hi
   Is it possible to choose which brokers to use when creating a topic?
 The general command for creating a topic is:

 bin/kafka-topics.sh --create --zookeeper localhost:2181
 --replication-factor 1 --partitions 1 --topic test

 What I'm looking for is:

 bin/kafka-topics.sh --create .  --broker-to-use xxx;xxx;xxx

 It's because I want the topic to be hosted on the brokers which
 would be closest to the possible producer.

 Thanks in advance.




-- 
Thanks,
Ewen


Choosing brokers when creating topics

2015-07-27 Thread Jilin Xie
Hi
  Is it possible to choose which brokers to use when creating a topic?
The general command for creating a topic is:

bin/kafka-topics.sh --create --zookeeper localhost:2181
--replication-factor 1 --partitions 1 --topic test

What I'm looking for is:

bin/kafka-topics.sh --create .  --broker-to-use xxx;xxx;xxx

It's because I want the topic to be hosted on the brokers which
would be closest to the possible producer.

   Thanks in advance.


Cache Memory Kafka Process

2015-07-27 Thread Nilesh Chhapru
Hi All,

I am facing issues with the Kafka broker process taking a lot of cache
memory. I just wanted to know whether the process really needs that much
cache memory, or whether I can clear the OS-level cache with a cron job.

Regards,
Nilesh Chhapru.


Re: Choosing brokers when creating topics

2015-07-27 Thread Jilin Xie
Hi Ewen,
   Thanks for your reply.
I've been using the kafka-reassign-partitions tool.
   But --replica-assignment is exactly what I'm looking for.

   Thanks

On Mon, Jul 27, 2015 at 3:58 PM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 Try the --replica-assignment option for kafka-topics.sh. It allows you to
 specify which brokers to assign as replicas instead of relying on the
 assignments being made automatically.

 -Ewen

 On Mon, Jul 27, 2015 at 12:25 AM, Jilin Xie jilinxie1...@gmail.com
 wrote:

  Hi
    Is it possible to choose which brokers to use when creating a topic?
  The general command for creating a topic is:
 
  bin/kafka-topics.sh --create --zookeeper localhost:2181
  --replication-factor 1 --partitions 1 --topic test
 
  What I'm looking for is:
 
  bin/kafka-topics.sh --create .  --broker-to-use xxx;xxx;xxx
 
  It's because I want the topic to be hosted on the brokers which
  would be closest to the possible producer.
 
  Thanks in advance.
 



 --
 Thanks,
 Ewen