Re: Broker brought down and under replicated partitions

2014-10-16 Thread Jean-Pascal Billaud
The only thing that I find very weird is the fact that brokers that are dead are still part of the ISR set for hours... and are basically not removed. Note this is not constantly the case, most of the dead brokers are properly removed and it is really just in a few cases. I am not sure why this

Re: Kafka - NotLeaderForPartitionException / LeaderNotAvailableException

2014-10-16 Thread Abraham Jacob
Will definitely take a thread dump! So, far its been running fine. -Jacob On Wed, Oct 15, 2014 at 8:40 PM, Jun Rao jun...@gmail.com wrote: If you see the hanging again, it would be great if you can take a thread dump so that we know where it is hanging. Thanks, Jun On Tue, Oct 14, 2014

getOffsetsBefore(...) = kafka.common.UnknownException

2014-10-16 Thread Magnus Vojbacke
Hi, I’m trying to make a request for offset information from my broker, and I get a kafka.common.UnknownException as the result. I’m trying to use the Simple Consumer API val topicAndPartition = new TopicAndPartition(“topic3”, 0) val requestInfo = new

Re: Consistency and Availability on Node Failures

2014-10-16 Thread Gwen Shapira
Just note that this is not a universal solution. Many use-cases care about which partition you end up writing to since partitions are used to... well, partition logical entities such as customers and users. On Wed, Oct 15, 2014 at 9:03 PM, Jun Rao jun...@gmail.com wrote: Kyle, What you

Re: Cross-Data-Center Mirroring, and Guaranteed Minimum Time Period on Data

2014-10-16 Thread Gwen Shapira
I assume the messages themselves contain the timestamp? If you use Flume, you can configure a Kafka source to pull data from Kafka, use an interceptor to pull the date out of your message and place it in the event header and then the HDFS sink can write to a partition based on the timestamp.

Re: Cross-Data-Center Mirroring, and Guaranteed Minimum Time Period on Data

2014-10-16 Thread Andrew Otto
Check out Camus. It was built to do parallel loads from Kafka into time bucketed directories in HDFS. On Oct 16, 2014, at 9:32 AM, Gwen Shapira gshap...@cloudera.com wrote: I assume the messages themselves contain the timestamp? If you use Flume, you can configure a Kafka source to pull

Re: Consistency and Availability on Node Failures

2014-10-16 Thread Kyle Banker
I didn't realize that anyone used partitions to logically divide a topic. When would that be preferable to simply having a separate topic? Isn't this a minority case? On Thu, Oct 16, 2014 at 7:28 AM, Gwen Shapira gshap...@cloudera.com wrote: Just note that this is not a universal solution.

Re: 0.8.x = 0.8.2 upgrade - live seamless?

2014-10-16 Thread Neha Narkhede
Yes, you should be able to upgrade seamlessly. On Wed, Oct 15, 2014 at 10:07 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, Some of our SPM users who are eager to monitor their Kafka 0.8.x clusters with SPM are asking us whether the upgrade to 0.8.2 from 0.8.1 will be seamless.

Re: Kafka/Zookeeper deployment Questions

2014-10-16 Thread Neha Narkhede
In other words, if I change the number of partitions, can I restart the brokers one at a time so that I can continue processing data? Changing the # of partitions is an online operation and doesn't require restarting the brokers. However, any other configuration (with the exception of a few

Re: Broker brought down and under replicated partitions

2014-10-16 Thread Neha Narkhede
Is there a known issue in the 0.8.0 version that was fixed later on? What can I do to diagnose/fix the situation? Yes, quite a few bugs related to this have been fixed since 0.8.0. I'd suggest upgrading to 0.8.1.1 On Wed, Oct 15, 2014 at 11:09 PM, Jean-Pascal Billaud j...@tellapart.com wrote:

[Kafka-users] Producer not distributing across all partitions

2014-10-16 Thread Mungeol Heo
Hi, I have a question about 'topic.metadata.refresh.interval.ms' configuration. As I know, the default value of it is 10 minutes. Does it means that producer will change the partition at every 10 minutes? What I am experiencing is producer does not change to another partition at every 10 minutes.

Monitoring connection with kafka client

2014-10-16 Thread Alex Objelean
Hi, I'm trying to monitor the kafka connection on the consumer side. In other words, if the broker cluster is unavailable (or zookeer dies), I would like to know about that problem as soon as possible. Unfortunately, I didn't find anything useful to achieve that when using kafka library. Are

read N items from topic

2014-10-16 Thread Josh J
hi, How do I read N items from a topic? I also would like to do this for a consumer group, so that each consumer can specify an N number of tuples to read, and each consumer reads distinct tuples. Thanks, Josh

Re: java api code and javadoc

2014-10-16 Thread 4mayank
Thanks Joseph. I built the javadoc but its incomplete. Where can I find the code itself for classes like KafkaStream, MessageAndOffset, CosumerConnector etc? On Wed, Oct 15, 2014 at 11:10 AM, Joseph Lawson jlaw...@roomkey.com wrote: You probably have to build your own right now. Check out

Re: Consistency and Availability on Node Failures

2014-10-16 Thread gshapira
It may be a minority, I can't tell yet. But in some apps we need to know that a consumer, who is assigned a single partition, will get all data about a subset of users. This is way more flexible than multiple topics since we still have the benefits of partition reassignment, load balancing

Re: read N items from topic

2014-10-16 Thread gshapira
Using the high level consumer, each consumer in the group can call iter.next () in a loop until they get the number of messages you need. — Sent from Mailbox On Thu, Oct 16, 2014 at 10:18 AM, Josh J joshjd...@gmail.com wrote: hi, How do I read N items from a topic? I also would like to do

Re: java api code and javadoc

2014-10-16 Thread Ewen Cheslack-Postava
KafkaStream and MessageAndOffset are Scala classes, so you'll find them under the scaladocs. The ConsumerConnector interface should show up in the javadocs with good documentation coverage. Some classes like MessageAndOffset are so simple (just compositions of other data) that they aren't going to

Re: read N items from topic

2014-10-16 Thread Neha Narkhede
Josh, The consumer's API doesn't allow you to specify N messages, but you can invoke iter.next() as Gwen suggested and count the messages. Note that the iterator can block if you have less than N messages so you will have to careful design around it. The new consumer's API provides a non blocking

Re: [DISCUSS] Release 0.8.2-beta before 0.8.2?

2014-10-16 Thread Neha Narkhede
Another JIRA that will be nice to include as part of 0.8.2-beta is https://issues.apache.org/jira/browse/KAFKA-1481 that fixes the mbean naming. Looking for people's thoughts on 2 things here - 1. How do folks feel about doing a 0.8.2-beta release right now and 0.8.2 final 4-5 weeks later? 2. Do

Re: [Kafka-users] Producer not distributing across all partitions

2014-10-16 Thread Neha Narkhede
A topic.metadata.refresh.interval.ms of 10 mins means that the producer will take 10 mins to detect new partitions. So newly added or reassigned partitions might not get data for 10 mins. In general, if you're still at prototyping stages, I'd recommend using the new producer available on kafka

Re: Monitoring connection with kafka client

2014-10-16 Thread Neha Narkhede
If you want to know if the Kafka and zookeeper cluster is healthy or not, you'd want to monitor the cluster directly. Here are pointers for monitoring the Kafka brokers - http://kafka.apache.org/documentation.html#monitoring Thanks, Neha On Thu, Oct 16, 2014 at 3:09 AM, Alex Objelean

Re: getOffsetsBefore(...) = kafka.common.UnknownException

2014-10-16 Thread Neha Narkhede
Do you see any errors on the broker? Are you sure that the consumer's fetch offset is set higher than the largest message in your topic? It should be higher than message.max.bytes on the broker (which defaults to 1MB). On Thu, Oct 16, 2014 at 3:56 AM, Magnus Vojbacke

Re: Consistency and Availability on Node Failures

2014-10-16 Thread cac...@gmail.com
Knowing that the partitioning is consistent for a given key means that (apart from other benefits) a given consumer only deals with a partition of the keyspace. So if you are in a system with tens of millions of users each consumer only has to store state on a small number of them with

Re: [DISCUSS] Release 0.8.2-beta before 0.8.2?

2014-10-16 Thread Joe Stein
+1 for a 0.8.2-beta next week and 0.8.2.0 final 4-5 weeks later. I agree to the tickets you brought up to have in 0.8.2-beta and also https://issues.apache.org/jira/browse/KAFKA-1493 for lz4 compression. /*** Joe Stein Founder, Principal Consultant Big

Re: getOffsetsBefore(...) = kafka.common.UnknownException

2014-10-16 Thread Jun Rao
The OffsetRequest can only be answered by the leader of the partition. Did you connect the SimpleConsumer to the leader broker? If not, you need to use TopicMetadataRequest to find out the leader broker first. Thanks, Jun On Thu, Oct 16, 2014 at 3:56 AM, Magnus Vojbacke

Re: ConsumerOffsetChecker shows none partitions assigned

2014-10-16 Thread Jun Rao
Which version of ZK are you using? Also, see https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretheremanyrebalancesinmyconsumerlog ? Thanks, Jun On Thu, Oct 16, 2014 at 3:29 PM, Hari Gorak hari.go...@rediffmail.com wrote: Project: Kafka Issue Type: Bug Components: consumer