Re: Recovery of Kafka cluster takes very long time

2015-08-10 Thread Alexey Sverdelov
Hi Todd, It is a good idea, thanks. There is no num.recovery.threads.per.data.dir entry in our server.properties (so we run our cluster with the default value of 1). I will set it to 8 and try again. Alexey On Mon, Aug 10, 2015 at 6:13 PM, Todd Palino tpal...@gmail.com wrote: It looks like you did an
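For reference, the broker setting being discussed is named num.recovery.threads.per.data.dir in Kafka's server.properties and defaults to 1; raising it lets each data directory use several threads for log recovery at startup. A minimal fragment (the value 8 is the one from this thread, not a general recommendation):

```properties
# server.properties: threads per data directory used for log recovery at
# startup and log flushing at shutdown (default: 1)
num.recovery.threads.per.data.dir=8
```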

Re: Recovery of Kafka cluster takes very long time

2015-08-10 Thread Todd Palino
It looks like you did an unclean shutdown of the cluster, in which case each open log segment in each partition needs to be checked upon startup. It doesn't really have anything to do with RF=3 specifically, but it does mean that each of your brokers has 6000 partitions to check. What is the
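The 6000 figure works out from the cluster described in the original post (10 topics, 600 partitions each, replication factor 3, spread over 3 brokers); a quick back-of-the-envelope check:

```python
# Partition replicas each broker must check after an unclean shutdown,
# for the cluster described in this thread.
topics = 10
partitions_per_topic = 600
replication_factor = 3
brokers = 3

total_replicas = topics * partitions_per_topic * replication_factor  # 18000
replicas_per_broker = total_replicas // brokers  # assuming an even spread

print(replicas_per_broker)  # → 6000
```

Each of those 6000 open log segments must be scanned on startup, which is why recovery is slow with a single recovery thread per data directory.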

Recovering from Kafka NoReplicaOnlineException with one node

2015-08-10 Thread Mike Thomsen
We have a really simple Kafka set up in our development lab. It's just one node. Periodically, we run into this error: [2015-08-10 13:45:52,405] ERROR Controller 0 epoch 488 initiated state change for partition [test-data,1] from OfflinePartition to OnlinePartition failed (state.change.logger)

Re: best way to call ReassignPartitionsCommand programmatically

2015-08-10 Thread Ewen Cheslack-Postava
It's not a public API, so it may not be stable between releases, but you could try using the ReassignPartitionsCommand class directly. Or, you can see that the code in that class is a very simple use of ZkUtils, so you could just make the necessary calls to ZkUtils directly. In the future, when

Recovery of Kafka cluster takes very long time

2015-08-10 Thread Alexey Sverdelov
Hi all, I have a 3-node Kafka cluster. There are ten topics, each with 600 partitions at RF=3. After a cluster restart I can see log messages like INFO Recovering unflushed segment 0 in log... and the complete recovery of the 3 nodes takes about 2+ hours. I don't know why it

Re: How to read messages from Kafka by specific time?

2015-08-10 Thread Ewen Cheslack-Postava
You can use SimpleConsumer.getOffsetsBefore to get a list of offsets before a Unix timestamp. However, this isn't per-message. The offsets returned are for the log segments stored on the broker, so the granularity will depend on your log rolling settings. -Ewen On Wed, Aug 5, 2015 at 2:11 AM,
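In later Kafka clients this became a first-class, per-message operation; a hedged sketch using kafka-python's KafkaConsumer.offsets_for_times (this needs brokers 0.10+ with message timestamps — the 0.8 brokers discussed here only offer the per-segment granularity described above; the topic name and broker address are placeholders):

```python
from kafka import KafkaConsumer, TopicPartition

# Hypothetical topic/broker. offsets_for_times requires brokers >= 0.10.
consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
tp = TopicPartition("my-topic", 0)
consumer.assign([tp])

# Earliest offset whose message timestamp is >= the given Unix time (ms).
offsets = consumer.offsets_for_times({tp: 1439164800000})
if offsets[tp] is not None:
    consumer.seek(tp, offsets[tp].offset)
```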

Re: abstracting ZooKeeper

2015-08-10 Thread Daniel Nelson
I’m definitely looking forward to progress on this front. We’re currently running ZK only for Kafka. If we could have Kafka use our existing Etcd cluster, it would be one less critical piece of infrastructure to worry about, which would be great. -- Daniel Nelson On Aug 9, 2015, at 6:23 PM,

Re: how to get single record from kafka topic+partition @ specified offset

2015-08-10 Thread Ewen Cheslack-Postava
Right now I think the only place the new API is documented is in the javadocs. Here are the relevant sections for replacing the simple consumer. Subscribing to specific partitions:

Re: how to get single record from kafka topic+partition @ specified offset

2015-08-10 Thread Joe Lawson
Ewen, Do you have an example or link for the changes/plans that will bring the benefits you describe? Cheers, Joe Lawson On Aug 10, 2015 3:27 PM, Ewen Cheslack-Postava e...@confluent.io wrote: You can do this using the SimpleConsumer. See

Re: OffsetOutOfRangeError with Kafka-Spark streaming

2015-08-10 Thread Cassa L
Ok. The problem is resolved when I increased the retention policy for the topic. But now I see that whenever I restart the Spark job, some old messages are being pulled in by the Spark stream. For the new Spark streaming API, do we need to keep track of offsets? LCassa On Thu, Aug 6, 2015 at 4:58 PM, Grant Henke

Re: how to get single record from kafka topic+partition @ specified offset

2015-08-10 Thread Ewen Cheslack-Postava
You can do this using the SimpleConsumer. See https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example for details with some code. When the new consumer is released in 0.8.3, this will get a *lot* simpler. -Ewen On Fri, Aug 7, 2015 at 9:26 AM, Padgett, Ben
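Once the new consumer landed, the single-record fetch described here did get much simpler; a sketch with kafka-python's new-consumer-style API (topic name, partition, offset, and broker address are placeholders, and a running broker is assumed):

```python
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers="localhost:9092",
                         enable_auto_commit=False)
tp = TopicPartition("my-topic", 0)
consumer.assign([tp])   # manual assignment: no consumer group, no rebalance
consumer.seek(tp, 42)   # jump straight to the wanted offset

# poll() returns {TopicPartition: [records]}; the first record is the one
# at offset 42, assuming it has not been deleted by retention.
records = consumer.poll(timeout_ms=1000).get(tp, [])
if records:
    print(records[0].offset, records[0].value)
```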

Re: problem about the offset

2015-08-10 Thread Ewen Cheslack-Postava
Kafka doesn't track per-message timestamps. The request you're using gets a list of offsets for *log segments* with timestamps earlier than the one you specify. If you start consuming from the offset returned, you should find the timestamp you specified in the same log file. -Ewen On Mon, Aug

Re: Recovering from Kafka NoReplicaOnlineException with one node

2015-08-10 Thread Gwen Shapira
Maybe it is not ZooKeeper itself, but the broker's connection to ZK timed out and caused the controller to believe that the broker is dead, and it therefore attempted to elect a new leader (which doesn't exist, since you have just one node). Increasing the ZooKeeper session timeout value may help.
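The relevant broker settings live in server.properties (the 0.8 default session timeout is 6000 ms); the 30-second values below are illustrative, not a recommendation from the thread:

```properties
# server.properties: give the broker's ZK session more headroom before the
# controller declares the broker dead (default session timeout: 6000 ms)
zookeeper.session.timeout.ms=30000
zookeeper.connection.timeout.ms=30000
```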

Re: Recovering from Kafka NoReplicaOnlineException with one node

2015-08-10 Thread Mike Thomsen
Thanks, I'll give that a shot. I noticed that our configuration used the default timeouts for session and sync, so I upped those zookeeper configuration settings for kafka as well. On Mon, Aug 10, 2015 at 4:37 PM, Gwen Shapira g...@confluent.io wrote: Maybe it is not ZooKeeper itself, but the

Re: Partition and consumer configuration

2015-08-10 Thread Manikumar Reddy
Hi, 1. Will Kafka distribute the 100 serialized files randomly, say 20 files go to Partition 1, 25 to Partition 2, etc., or do I have an option to configure how many files go to which partition? Assuming you are using the new producer, all keyed messages will be distributed based on the
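The new producer's default partitioner maps a message key deterministically to a partition (the Java client hashes the key with murmur2 modulo the partition count, and round-robins unkeyed messages). A simplified sketch of the idea, using CRC32 as a stand-in hash rather than the real murmur2:

```python
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    """Toy stand-in for the Java producer's default partitioner:
    a deterministic hash of the key, modulo the partition count.
    (The real client uses murmur2, not CRC32.)"""
    return zlib.crc32(key) % num_partitions

# Same key -> same partition, so 100 files sent with identical keys would
# all land in one partition; distinct keys spread across the 4 partitions.
p1 = pick_partition(b"file-001.xml", 4)
p2 = pick_partition(b"file-001.xml", 4)
assert p1 == p2
```

So the per-partition file counts are not directly configurable; they fall out of the keys you choose (or the round-robin spread for unkeyed messages).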

problem about the offset

2015-08-10 Thread jinhong lu
Hi all, I try to use SimpleConsumer following the example at https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example. I modify the offset in the code: long readOffset =

Re: Partition and consumer configuration

2015-08-10 Thread Shashidhar Rao
Thanks Kumar, for explaining in depth. On Mon, Aug 10, 2015 at 1:37 PM, Manikumar Reddy ku...@nmsworks.co.in wrote: Hi, 1. Will Kafka distribute the 100 serialized files randomly say 20 files go to Partition 1, 25 to Partition 2 etc or do I have an option to configure how many

Partition and consumer configuration

2015-08-10 Thread Shashidhar Rao
Hi, Could somebody help me check whether my understanding is correct, as I am very new to Kafka? 1. Topic name: ProdCategory, with 4 partitions. All the messages are XML files; there are also 4 consumers and a multi-broker setup with 4 brokers. 2. XML files vary in size from 10 KB to 1 MB. 3. Say if there are 100 XML

Re: Kafka metadata

2015-08-10 Thread Andrew Otto
Note that broker metadata is not necessarily kept in sync with zookeeper on all brokers at all times: https://issues.apache.org/jira/browse/KAFKA-1367 This looks like it is fixed in the upcoming 0.8.3 On Aug 8, 2015, at 01:08, Abdoulaye Diallo abdoulaye...@gmail.com wrote: @Rahul If