How to prevent custom Partitioner from increasing the number of producer's requests?

2015-06-02 Thread Sebastien Falquier
Hi guys, I am new to Kafka and I am facing a problem I am not able to sort out. To smooth traffic over all my brokers' partitions, I have coded a custom Partitioner for my producers, using a simple round robin algorithm that jumps from one partition to another on every batch of messages
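
The poster's code is not shown in this excerpt, so the following is only a minimal sketch of the idea described: every message in a batch goes to the same partition, and the partition advances round robin between batches. The class name, `batch_size`, and the method signature are hypothetical and do not match Kafka's actual Partitioner interface.

```python
class BatchRoundRobinPartitioner:
    """Assign whole batches to partitions in round-robin order (illustrative only)."""

    def __init__(self, num_partitions, batch_size):
        if num_partitions <= 0 or batch_size <= 0:
            raise ValueError("num_partitions and batch_size must be positive")
        self.num_partitions = num_partitions
        self.batch_size = batch_size  # hypothetical: messages per batch
        self._sent = 0  # number of messages assigned so far

    def partition(self):
        # Messages in the same batch share a batch index, hence a partition.
        batch_index = self._sent // self.batch_size
        self._sent += 1
        return batch_index % self.num_partitions


partitioner = BatchRoundRobinPartitioner(num_partitions=3, batch_size=2)
assignments = [partitioner.partition() for _ in range(8)]
print(assignments)
```

Keeping a batch on one partition is the point of the design: switching partition on every message would spread each batch across several brokers.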

Re: Kafka partitions unbalanced

2015-06-02 Thread Vijay Patil
I ran into a similar issue. I configured 3 disks, but partitions were allocated only to 2 disks (disk2 and disk3). Then I found that the left-out disk (disk1) was already hosting a large number of other partitions from different topics. So maybe partition allocation happens based on how many partitions

Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Daniel Nelson
On Jun 2, 2015, at 10:39 AM, Wes Chow w...@chartbeat.com wrote: We have run d2 instances with Kafka. They're currently unstable -- Amazon confirmed a host issue with d2 instances that gets tickled by a Kafka workload yesterday. Otherwise, it seems the d2 instance type is ideal as it

Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Wes Chow
Our workaround is to switch to i2's. Amazon didn't mention anything, though we're getting on a call with them soon so I'll be sure to ask. Fwiw, we're also on 12.04. Wes On June 2, 2015 at 2:42 PM, Daniel Nelson daniel.nel...@vungle.com wrote: Do you have any workarounds for the d2 issues?

Consumer lag lies - orphaned offsets?

2015-06-02 Thread Otis Gospodnetic
Hi, I've noticed that when we restart our Kafka consumers our consumer lag metric sometimes looks weird. Here's an example: https://apps.sematext.com/spm-reports/s/0Hq5zNb4hH You can see lag go up around 15:00, when some consumers were restarted. The weird thing is that the lag remains flat!

Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Steven Wu
Wes/Daniel, can you elaborate on what kind of instability you have encountered? We are on Ubuntu 14.04.2 and haven't encountered any issues so far. In the announcement, they did mention using Ubuntu 14.04 for better disk throughput. Not sure whether 14.04 also addresses any instability issue you

Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Daniel Nelson
On Jun 2, 2015, at 1:22 PM, Steven Wu stevenz...@gmail.com wrote: can you elaborate what kind of instability you have encountered? We have seen the nodes become completely non-responsive. Usually they get rebooted automatically after 10-20 minutes, but occasionally they get stuck for days in

Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Henry Cai
Steven, do you have the AWS case # (or the Ubuntu bug/case #) from when you hit that kernel panic issue? Our company will still be running on AMI image 12.04 for a while, so I will see whether the fix was also ported to Ubuntu 12.04. On Tue, Jun 2, 2015 at 2:53 PM, Steven Wu stevenz...@gmail.com

Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Wes Chow
On June 2, 2015 at 4:39 PM, Daniel Nelson daniel.nel...@vungle.com wrote: On Jun 2, 2015, at 1:22 PM, Steven Wu stevenz...@gmail.com wrote: can you elaborate what kind of instability you have encountered? We have seen the nodes become completely non-responsive. Usually they get rebooted

Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Steven Wu
Now I remember we had the same kernel panic issue in the first week of the D2 rollout. Then AWS fixed it and we haven't seen any issues since. Try Ubuntu 14.04 and see if it resolves your remaining kernel/instability issue. On Tue, Jun 2, 2015 at 2:30 PM, Wes Chow w...@chartbeat.com wrote: Daniel

Re: Using SimpleConsumer to get messages from offset until now

2015-06-02 Thread luo.fucong
I think the SimpleConsumer Example (https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example) in the wiki is a very good starting point. You can pass in the offset to the FetchRequest. And you

Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Theo Hultberg
Henry: We run Kafka on the old and trusty m1.xlarge. We avoid EBS completely: it's network storage that pretends to be local, and when the network, which is AWS' weak spot, acts up, EBS is a big liability. It's also slow and expensive. Others: Thanks for sharing your experience with the d2's. We

RE: leader update partitions fail with KeeperErrorCode = BadVersion,kafka version=0.8.1.1

2015-06-02 Thread chenlax
I created a topic with 72 partitions and 2 replicas, then increased it to 108 partitions, and the cluster ran OK. Some days later I found that the topic has 2 partitions whose ISR includes only the leader. I checked the followers' log segments for those partitions, and they are not up to date. I cannot find any more useful logs from Kafka

Re: potential bug with offset request and just rolled log segment

2015-06-02 Thread Alfred Landrum
I filed KAFKA-2236: https://issues.apache.org/jira/browse/KAFKA-2236 Is there any guidance on when 0.8.3 might be released?

Re: How to prevent custom Partitioner from increasing the number of producer's requests?

2015-06-02 Thread Jason Rosenberg
Hi Sebastien, You might just try using the default partitioner (which is random). It works by choosing a random partition each time it re-polls the metadata for the topic. By default, this happens every 10 minutes for each topic you produce to (so it evenly distributes load at a granularity of
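
The behaviour described in this reply (pick a random partition, keep it until the next metadata refresh) can be sketched as follows. This is not Kafka's implementation; the class name, the injected clock value `now_s`, and the 600-second default (standing in for the 10 minutes mentioned above) are assumptions for illustration.

```python
import random


class StickyRandomPartitioner:
    """Stick to one random partition until the refresh interval elapses (illustrative only)."""

    def __init__(self, num_partitions, refresh_interval_s=600.0, rng=None):
        self.num_partitions = num_partitions
        self.refresh_interval_s = refresh_interval_s  # assumed stand-in for the metadata refresh
        self._rng = rng or random.Random()
        self._current = None
        self._chosen_at = None

    def partition(self, now_s):
        # Re-pick only on first use or once the interval has elapsed.
        expired = (
            self._chosen_at is None
            or now_s - self._chosen_at >= self.refresh_interval_s
        )
        if expired:
            self._current = self._rng.randrange(self.num_partitions)
            self._chosen_at = now_s
        return self._current


sticky = StickyRandomPartitioner(num_partitions=8, rng=random.Random(0))
first = sticky.partition(now_s=0.0)
same_window = [sticky.partition(now_s=t) for t in (1.0, 300.0, 599.0)]
after_refresh = sticky.partition(now_s=600.0)
```

Within one interval all messages land on a single partition, so load evens out only over many intervals or many producers.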

HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Henry Cai
We have been hosting Kafka brokers in Amazon EC2 and we are using EBS disks. But periodically we were hit by long I/O wait times on EBS in some Availability Zones. We are thinking of changing the instance types to local HDD or local SSD. HDD is cheaper and bigger and seems a good fit for the Kafka

Using SimpleConsumer to get messages from offset until now

2015-06-02 Thread Kevin Sjöberg
Hello, I'm trying to create a custom consumer that, given an offset, returns all messages until now. After this is done, the consumer is not needed anymore; hence, the consumer does not have to continue consuming messages that are being produced. The Kafka cluster consists of one broker and we only
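
One way to read "until now" is to snapshot the log end offset when the read starts and stop once that offset is reached, so messages produced afterwards are ignored. The sketch below shows that loop against an in-memory list standing in for a single partition; it uses no Kafka API, and `read_until_now` and `fetch_size` are hypothetical names.

```python
def read_until_now(log, start_offset, fetch_size=2):
    """Return messages from start_offset up to the log end as of call time."""
    end_offset = len(log)  # snapshot of the log end offset: this is "now"
    if not 0 <= start_offset <= end_offset:
        raise ValueError("start_offset is outside the log")

    messages = []
    offset = start_offset
    while offset < end_offset:
        # Fetch in small chunks, never reading past the snapshot.
        chunk = log[offset:min(offset + fetch_size, end_offset)]
        messages.extend(chunk)
        offset += len(chunk)
    return messages


partition_log = ["m0", "m1", "m2", "m3", "m4"]
result = read_until_now(partition_log, start_offset=2)
print(result)
```

Because the end offset is fixed up front, the loop always terminates even if producers keep appending.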

RE: Kafka JMS metrics meaning

2015-06-02 Thread Aditya Auradkar
Number of under-replicated partitions and total request time are some good bets. Aditya From: Otis Gospodnetic [otis.gospodne...@gmail.com] Sent: Tuesday, June 02, 2015 9:56 AM To: users@kafka.apache.org; Marina Subject: Re: Kafka JMS metrics meaning Hi, On

Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Steven Wu
EBS (network-attached storage) has got a lot better over the last few years. We don't quite trust it for Kafka workloads. At Netflix, we were going with the new d2 instance type (HDD). Our perf/load testing shows it satisfies our workload. SSD is better in latency curve but pretty comparable in

Kafka JMS metrics meaning

2015-06-02 Thread Marina
Hi, I have enabled JMX_PORT for the Kafka server and am trying to understand some of the metrics that are being exposed. I have two questions: 1. what are the best metrics to monitor to quickly spot an unhealthy Kafka cluster? 2. what do these metrics mean: ReplicaManager - LeaderCount ? and

Re: Kafka JMS metrics meaning

2015-06-02 Thread Otis Gospodnetic
Hi, On Tue, Jun 2, 2015 at 12:50 PM, Marina ppi...@yahoo.com.invalid wrote: Hi, I have enabled JMX_PORT for the Kafka server and am trying to understand some of the metrics that are being exposed. I have two questions: 1. what are the best metrics to monitor to quickly spot an unhealthy Kafka

Re: Offset management: client vs broker side responsibility

2015-06-02 Thread Otis Gospodnetic
Hi, I haven't followed the changes to offset tracking closely, other than that storing them in ZK is not the only option any more. I think what Stevo is asking about/suggesting is that there be a single API from which offset information can be retrieved (e.g. by monitoring tools), so that

Re: Kafka JMS metrics meaning

2015-06-02 Thread Todd Palino
Under replicated is a must. Offline partitions is also good to monitor. We also use the active controller metric (it's 1 or 0) in aggregate for a cluster to know that the controller is running somewhere. For more general metrics, all topics bytes in and bytes out is good. We also watch the

Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Wes Chow
We have run d2 instances with Kafka. They're currently unstable -- Amazon confirmed a host issue with d2 instances that gets tickled by a Kafka workload yesterday. Otherwise, it seems the d2 instance type is ideal as it gets an enormous amount of disk throughput and you'll likely be network

Re: Kafka JMS metrics meaning

2015-06-02 Thread Marina
Thanks a lot to everybody for your suggestions! In addition to the Consumer lag (on the Consumers side though), under-replicated partitions, offline partitions, active controller count, I am also thinking of monitoring the total size of partitions to make sure it does not exceed some MAX (like 10G, for example)
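
The checks mentioned across this thread can be combined roughly as below. This is only a sketch: the dictionary keys are hypothetical field names, not the real JMX MBean names, and reading "10G" as 10 GiB per partition is an assumption.

```python
def cluster_alerts(brokers, max_partition_bytes=10 * 1024**3):
    """Return a list of alert strings for per-broker metric snapshots."""
    alerts = []
    if sum(b["under_replicated_partitions"] for b in brokers) > 0:
        alerts.append("under-replicated partitions")
    if sum(b["offline_partitions"] for b in brokers) > 0:
        alerts.append("offline partitions")
    # Each broker reports 1 or 0; exactly one controller should be active.
    if sum(b["active_controller_count"] for b in brokers) != 1:
        alerts.append("active controller count != 1")
    if any(
        size > max_partition_bytes
        for b in brokers
        for size in b["partition_sizes_bytes"]
    ):
        alerts.append("partition larger than limit")
    return alerts


brokers = [
    {"under_replicated_partitions": 0, "offline_partitions": 0,
     "active_controller_count": 1, "partition_sizes_bytes": [2 * 1024**3]},
    {"under_replicated_partitions": 3, "offline_partitions": 0,
     "active_controller_count": 0, "partition_sizes_bytes": [12 * 1024**3]},
]
alerts = cluster_alerts(brokers)
print(alerts)
```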