Re: Broker Exceptions

2015-03-17 Thread Kazim Zakee
Hi Mayuresh,

Here are the logs.

Broker-4

[2015-03-13 17:49:40,514] INFO Partition [Topic22kv,5] on broker 4: Shrinking 
ISR for partition [Topic22kv,5] from 2,4,3 to 2,4 (kafka.cluster.Partition)
[2015-03-13 17:49:40,514] INFO Partition [Topic22kv,5] on broker 4: Shrinking 
ISR for partition [Topic22kv,5] from 2,4,3 to 2,4 (kafka.cluster.Partition)
[2015-03-13 17:49:40,514] INFO Partition [Topic22kv,5] on broker 4: Shrinking 
ISR for partition [Topic22kv,5] from 2,4,3 to 2,4 (kafka.cluster.Partition)
[2015-03-13 17:49:40,762] INFO Partition [Topic22kv,5] on broker 4: Expanding 
ISR for partition [Topic22kv,5] from 2,4 to 2,4,3 (kafka.cluster.Partition)
[2015-03-13 18:47:10,514] INFO Partition [Topic22kv,5] on broker 4: Shrinking 
ISR for partition [Topic22kv,5] from 2,4,3 to 2,4 (kafka.cluster.Partition)
[2015-03-13 18:47:10,514] INFO Partition [Topic22kv,5] on broker 4: Shrinking 
ISR for partition [Topic22kv,5] from 2,4,3 to 2,4 (kafka.cluster.Partition)
[2015-03-13 18:47:10,514] INFO Partition [Topic22kv,5] on broker 4: Shrinking 
ISR for partition [Topic22kv,5] from 2,4,3 to 2,4 (kafka.cluster.Partition)
[2015-03-13 18:47:11,539] INFO Partition [Topic22kv,5] on broker 4: Expanding 
ISR for partition [Topic22kv,5] from 2,4 to 2,4,3 (kafka.cluster.Partition)
[2015-03-13 20:09:10,460] INFO Deleting index 
/vol11/kafka82/Topic22kv-5/001213156892.index.deleted 
(kafka.log.OffsetIndex)
[2015-03-13 20:23:10,377] INFO Scheduling log segment 1218520176 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-13 20:24:10,377] INFO Deleting segment 1218520176 from log 
Topic22kv-5. (kafka.log.Log)
[2015-03-13 20:24:10,444] INFO Deleting index 
/vol11/kafka82/Topic22kv-5/001218520176.index.deleted 
(kafka.log.OffsetIndex)
[2015-03-13 20:26:28,789] INFO Rolled new log segment for 'Topic22kv-5' in 1 
ms. (kafka.log.Log)
[2015-03-13 20:38:10,333] INFO Scheduling log segment 1223883126 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-13 20:39:10,334] INFO Deleting segment 1223883126 from log 
Topic22kv-5. (kafka.log.Log)
[2015-03-13 20:39:10,478] INFO Deleting index 
/vol11/kafka82/Topic22kv-5/001223883126.index.deleted 
(kafka.log.OffsetIndex)
[2015-03-13 20:46:16,924] INFO Rolled new log segment for 'Topic22kv-5' in 1 
ms. (kafka.log.Log)
[2015-03-13 20:53:10,370] INFO Scheduling log segment 1229245987 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-13 20:54:10,371] INFO Deleting segment 1229245987 from log 
Topic22kv-5. (kafka.log.Log)
[2015-03-13 20:54:10,444] INFO Deleting index 
/vol11/kafka82/Topic22kv-5/001229245987.index.deleted 
(kafka.log.OffsetIndex)
[2015-03-13 21:03:10,347] INFO Scheduling log segment 1234609321 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-13 21:04:10,348] INFO Deleting segment 1234609321 from log 
Topic22kv-5. (kafka.log.Log)
[2015-03-13 21:04:10,519] INFO Deleting index 
/vol11/kafka82/Topic22kv-5/001234609321.index.deleted 
(kafka.log.OffsetIndex)
[2015-03-13 21:06:07,810] INFO Rolled new log segment for 'Topic22kv-5' in 0 
ms. (kafka.log.Log)
[2015-03-13 21:18:10,355] INFO Scheduling log segment 1239972496 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-13 21:19:10,408] INFO Deleting segment 1239972496 from log 
Topic22kv-5. (kafka.log.Log)
[2015-03-13 21:19:10,516] INFO Deleting index 
/vol11/kafka82/Topic22kv-5/001239972496.index.deleted 
(kafka.log.OffsetIndex)
[2015-03-13 21:25:49,058] INFO Rolled new log segment for 'Topic22kv-5' in 0 
ms. (kafka.log.Log)
[2015-03-13 21:38:10,344] INFO Scheduling log segment 1245335417 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-13 21:39:10,443] INFO Deleting segment 1245335417 from log 
Topic22kv-5. (kafka.log.Log)
[2015-03-13 21:39:10,539] INFO Deleting index 
/vol11/kafka82/Topic22kv-5/001245335417.index.deleted 
(kafka.log.OffsetIndex)
[2015-03-13 21:45:30,609] INFO Rolled new log segment for 'Topic22kv-5' in 1 
ms. (kafka.log.Log)
[2015-03-13 21:53:10,340] INFO Scheduling log segment 1250698493 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-13 21:54:10,340] INFO Deleting segment 1250698493 from log 
Topic22kv-5. (kafka.log.Log)
[2015-03-13 21:54:10,456] INFO Deleting index 
/vol11/kafka82/Topic22kv-5/001250698493.index.deleted 
(kafka.log.OffsetIndex)
[2015-03-13 22:05:31,719] INFO Rolled new log segment for 'Topic22kv-5' in 1 
ms. (kafka.log.Log)
[2015-03-13 22:13:10,333] INFO Scheduling log segment 1256061631 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-13 22:14:10,333] INFO Deleting segment 1256061631 from log 
Topic22kv-5. (kafka.log.Log)
[2015-03-13 22:14:10,406] INFO Deleting index 
/vol11/kafka82/Topic22kv-5/001256061631.index.deleted 
(kafka.log.OffsetIndex)
[2015-03-13 22:25:12,659] INFO Rolled new log segment for 'Topic22kv-5' in 0 
ms. (kafka.log.Log)
[2015-03-13 22:33:10,390] INFO Scheduling log segment 1261424153 for log 
Topic22kv-5 for deletion. 

Re: Broker Exceptions

2015-03-17 Thread Zakee
 What version are you running ?

Version 0.8.2.0

 Your case is 2). But the only weird thing is that your replica (broker 3) is
 requesting an offset which is greater than the leader's log end offset.


So what could be the cause?

Thanks
Zakee



 On Mar 17, 2015, at 11:45 AM, Mayuresh Gharat gharatmayures...@gmail.com 
 wrote:
 
 What version are you running ?
 
 The code for the latest version says that:
 
 1) If the log end offset of the replica is greater than the leader's log end
 offset, the replica's offset will be reset to the logEndOffset of the leader.
 
 2) Else, if the log end offset of the replica is smaller than the leader's
 log end offset and it's out of range, the replica's offset will be reset to
 the logStartOffset of the leader.
 
 Your case is 2). But the only weird thing is that your replica (broker 3) is
 requesting an offset which is greater than the leader's log end offset.
 
 Thanks,
 
 Mayuresh
 
 
 On Tue, Mar 17, 2015 at 10:26 AM, Mayuresh Gharat 
 gharatmayures...@gmail.com mailto:gharatmayures...@gmail.com wrote:
 
 cool.
 
 On Tue, Mar 17, 2015 at 10:15 AM, Zakee kzak...@netzero.net wrote:
 
 Hi Mayuresh,
 
 The logs are already attached and are in reverse order starting backwards
 from [2015-03-14 07:46:52,517] to the time when brokers were started.
 
 Thanks
 Zakee
 
 
 
 On Mar 17, 2015, at 12:07 AM, Mayuresh Gharat 
 gharatmayures...@gmail.com wrote:
 
 Hi Zakee,
 
 Thanks for the logs. Can you paste earlier logs from broker-3 up to :
 
 [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current
 offset 1754769769 for partition [Topic22kv,5] out of range; reset
 offset to 1400864851 (kafka.server.ReplicaFetcherThread)
 
 That would help us figure out what was happening on this broker before
 it
 issued a replicaFetch request to broker-4.
 
 Thanks,
 
 Mayuresh
 
 On Mon, Mar 16, 2015 at 11:32 PM, Zakee kzak...@netzero.net wrote:
 
 Hi Mayuresh,
 
 Here are the logs.
 
 
 
 Thanks,
 Kazim Zakee
 
 
 
 On Mar 16, 2015, at 10:48 AM, Mayuresh Gharat 
 gharatmayures...@gmail.com wrote:
 
 Can you provide more logs (complete) on Broker 3 till time :
 
 *[2015-03-14 07:46:52,517*] WARN [ReplicaFetcherThread-2-4], Replica 3
 for
 partition [Topic22kv,5] reset its fetch offset from 1400864851 to
 current
 leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)
 
 I would like to see logs from time much before it sent the fetch
 request
 to
 Broker 4 to the time above. I want to check if in any case Broker 3
 was a
 leader before broker 4 took over.
 
 Additional logs will help.
 
 
 Thanks,
 
 Mayuresh
 
 
 
 On Sat, Mar 14, 2015 at 8:35 PM, Zakee kzak...@netzero.net wrote:
 
 log.cleanup.policy is delete not compact.
 log.cleaner.enable=true
 log.cleaner.threads=5
 log.cleanup.policy=delete
 log.flush.scheduler.interval.ms=3000
 log.retention.minutes=1440
 log.segment.bytes=1073741824  (1gb)
 
  Messages are keyed but not compressed; the producer is async and uses the
  Kafka default partitioner.
  String message = msg.getString();
  String uniqKey = "" + rnd.nextInt();   // random key
  String partKey = getPartitionKey();    // partition key
  KeyedMessage<String, String> data = new KeyedMessage<String, String>(
      this.topicName, uniqKey, partKey, message);
  producer.send(data);
 
 Thanks
 Zakee
 
 
 
 On Mar 14, 2015, at 4:23 PM, gharatmayures...@gmail.com wrote:
 
 Is your topic log compacted? Also if it is are the messages keyed?
 Or
 are the messages compressed?
 
 Thanks,
 
 Mayuresh
 
 Sent from my iPhone
 
 On Mar 14, 2015, at 2:02 PM, Zakee kzak...@netzero.net mailto:
 kzak...@netzero.net wrote:
 
 Thanks, Jiangjie for helping resolve the kafka controller migration
 driven partition leader rebalance issue. The logs are much cleaner
 now.
 
 There are a few incidences of Out of range offset even though
 there
 is
 no consumers running, only producers and replica fetchers. I was
 trying
 to
 relate to a cause, looks like compaction (log segment deletion)
 causing
 this. Not sure whether this is expected behavior.
 
 Broker-4:
 [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]:
 Error
 when processing fetch request for partition [Topic22kv,5] offset
 1754769769
 from follower with correlation id 1645671. Possible cause: Request
 for
 offset 1754769769 but we only have log segments in the range
 1400864851
 to
 1754769732. (kafka.server.ReplicaManager)
 
 Broker-3:
 [2015-03-14 07:46:52,356] INFO The cleaning for partition
 [Topic22kv,5]
 is aborted and paused (kafka.log.LogCleaner)
 [2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851
 for
 log Topic22kv-5 for deletion. (kafka.log.Log)
 …
 [2015-03-14 07:46:52,421] INFO Compaction for partition
 [Topic22kv,5]
 is resumed (kafka.log.LogCleaner)
 [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], 

Re: Broker Exceptions

2015-03-17 Thread Mayuresh Gharat
What version are you running ?

The code for the latest version says that:

1) If the log end offset of the replica is greater than the leader's log end
offset, the replica's offset will be reset to the logEndOffset of the leader.

2) Else, if the log end offset of the replica is smaller than the leader's
log end offset and it's out of range, the replica's offset will be reset to
the logStartOffset of the leader.

Your case is 2). But the only weird thing is that your replica (broker 3) is
requesting an offset which is greater than the leader's log end offset.
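
Roughly, that decision looks like the following (a plain-Java illustration only,
not the broker's actual Scala code in kafka.server.ReplicaFetcherThread; the
offsets in the comments are copied from the log lines quoted later in this
thread):

public class OffsetResetSketch {
    public static void main(String[] args) {
        long leaderLogEndOffset   = 1754769732L; // leader only has segments up to this offset
        long leaderLogStartOffset = 1400864851L; // oldest offset still on the leader
        long replicaFetchOffset   = 1754769769L; // offset broker 3 asked for

        long resetTo;
        if (replicaFetchOffset > leaderLogEndOffset) {
            // case 1): follower is ahead of the leader -> fall back to the leader's log end offset
            resetTo = leaderLogEndOffset;
        } else {
            // case 2): follower's offset fell off the leader's log (e.g. old segments
            // deleted by retention) -> restart from the leader's log start offset
            resetTo = leaderLogStartOffset;
        }
        System.out.println("reset fetch offset to " + resetTo);
    }
}

With those numbers the sketch would take case 1) and reset to 1754769732, which
is why the reset to 1400864851 seen in the logs looks odd.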

Thanks,

Mayuresh


On Tue, Mar 17, 2015 at 10:26 AM, Mayuresh Gharat 
gharatmayures...@gmail.com wrote:

 cool.

 On Tue, Mar 17, 2015 at 10:15 AM, Zakee kzak...@netzero.net wrote:

 Hi Mayuresh,

 The logs are already attached and are in reverse order starting backwards
 from [2015-03-14 07:46:52,517] to the time when brokers were started.

 Thanks
 Zakee



  On Mar 17, 2015, at 12:07 AM, Mayuresh Gharat 
 gharatmayures...@gmail.com wrote:
 
  Hi Zakee,
 
  Thanks for the logs. Can you paste earlier logs from broker-3 up to :
 
  [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current
  offset 1754769769 for partition [Topic22kv,5] out of range; reset
  offset to 1400864851 (kafka.server.ReplicaFetcherThread)
 
  That would help us figure out what was happening on this broker before
 it
  issued a replicaFetch request to broker-4.
 
  Thanks,
 
  Mayuresh
 
  On Mon, Mar 16, 2015 at 11:32 PM, Zakee kzak...@netzero.net wrote:
 
  Hi Mayuresh,
 
  Here are the logs.
 
  
 
 
  Thanks,
  Kazim Zakee
 
 
 
  On Mar 16, 2015, at 10:48 AM, Mayuresh Gharat 
  gharatmayures...@gmail.com wrote:
 
  Can you provide more logs (complete) on Broker 3 till time :
 
  *[2015-03-14 07:46:52,517*] WARN [ReplicaFetcherThread-2-4], Replica 3
  for
  partition [Topic22kv,5] reset its fetch offset from 1400864851 to
 current
  leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)
 
  I would like to see logs from time much before it sent the fetch
 request
  to
  Broker 4 to the time above. I want to check if in any case Broker 3
 was a
  leader before broker 4 took over.
 
  Additional logs will help.
 
 
  Thanks,
 
  Mayuresh
 
 
 
  On Sat, Mar 14, 2015 at 8:35 PM, Zakee kzak...@netzero.net wrote:
 
  log.cleanup.policy is delete not compact.
  log.cleaner.enable=true
  log.cleaner.threads=5
  log.cleanup.policy=delete
  log.flush.scheduler.interval.ms=3000
  log.retention.minutes=1440
  log.segment.bytes=1073741824  (1gb)
 
   Messages are keyed but not compressed; the producer is async and uses the
   Kafka default partitioner.
   String message = msg.getString();
   String uniqKey = "" + rnd.nextInt();   // random key
   String partKey = getPartitionKey();    // partition key
   KeyedMessage<String, String> data = new KeyedMessage<String, String>(
       this.topicName, uniqKey, partKey, message);
   producer.send(data);
 
  Thanks
  Zakee
 
 
 
  On Mar 14, 2015, at 4:23 PM, gharatmayures...@gmail.com wrote:
 
  Is your topic log compacted? Also if it is are the messages keyed?
 Or
  are the messages compressed?
 
  Thanks,
 
  Mayuresh
 
  Sent from my iPhone
 
  On Mar 14, 2015, at 2:02 PM, Zakee kzak...@netzero.net mailto:
  kzak...@netzero.net wrote:
 
  Thanks, Jiangjie for helping resolve the kafka controller migration
  driven partition leader rebalance issue. The logs are much cleaner
 now.
 
  There are a few incidences of Out of range offset even though
 there
  is
  no consumers running, only producers and replica fetchers. I was
 trying
  to
  relate to a cause, looks like compaction (log segment deletion)
 causing
  this. Not sure whether this is expected behavior.
 
  Broker-4:
  [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]:
 Error
  when processing fetch request for partition [Topic22kv,5] offset
  1754769769
  from follower with correlation id 1645671. Possible cause: Request
 for
  offset 1754769769 but we only have log segments in the range
 1400864851
  to
  1754769732. (kafka.server.ReplicaManager)
 
  Broker-3:
  [2015-03-14 07:46:52,356] INFO The cleaning for partition
  [Topic22kv,5]
  is aborted and paused (kafka.log.LogCleaner)
  [2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851
 for
  log Topic22kv-5 for deletion. (kafka.log.Log)
  …
  [2015-03-14 07:46:52,421] INFO Compaction for partition
 [Topic22kv,5]
  is resumed (kafka.log.LogCleaner)
  [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current
  offset 1754769769 for partition [Topic22kv,5] out of range; reset
  offset to
  1400864851 (kafka.server.ReplicaFetcherThread)
  [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica
 3
  for partition [Topic22kv,5] reset its fetch offset from 1400864851 to
  current leader 4's start 

Re: Broker Exceptions

2015-03-17 Thread Zakee
Hi Mayuresh,

The logs are already attached and are in reverse order, starting from 
[2015-03-14 07:46:52,517] and going back to the time when the brokers were started.

Thanks
Zakee



 On Mar 17, 2015, at 12:07 AM, Mayuresh Gharat gharatmayures...@gmail.com 
 wrote:
 
 Hi Zakee,
 
 Thanks for the logs. Can you paste earlier logs from broker-3 up to :
 
 [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current
 offset 1754769769 for partition [Topic22kv,5] out of range; reset
 offset to 1400864851 (kafka.server.ReplicaFetcherThread)
 
 That would help us figure out what was happening on this broker before it
 issued a replicaFetch request to broker-4.
 
 Thanks,
 
 Mayuresh
 
 On Mon, Mar 16, 2015 at 11:32 PM, Zakee kzak...@netzero.net wrote:
 
 Hi Mayuresh,
 
 Here are the logs.
 
 
 
 
 Thanks,
 Kazim Zakee
 
 
 
 On Mar 16, 2015, at 10:48 AM, Mayuresh Gharat 
 gharatmayures...@gmail.com wrote:
 
 Can you provide more logs (complete) on Broker 3 till time :
 
 *[2015-03-14 07:46:52,517*] WARN [ReplicaFetcherThread-2-4], Replica 3
 for
 partition [Topic22kv,5] reset its fetch offset from 1400864851 to current
 leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)
 
 I would like to see logs from time much before it sent the fetch request
 to
 Broker 4 to the time above. I want to check if in any case Broker 3 was a
 leader before broker 4 took over.
 
 Additional logs will help.
 
 
 Thanks,
 
 Mayuresh
 
 
 
 On Sat, Mar 14, 2015 at 8:35 PM, Zakee kzak...@netzero.net wrote:
 
 log.cleanup.policy is delete not compact.
 log.cleaner.enable=true
 log.cleaner.threads=5
 log.cleanup.policy=delete
 log.flush.scheduler.interval.ms=3000
 log.retention.minutes=1440
 log.segment.bytes=1073741824  (1gb)
 
  Messages are keyed but not compressed; the producer is async and uses the
  Kafka default partitioner.
  String message = msg.getString();
  String uniqKey = "" + rnd.nextInt();   // random key
  String partKey = getPartitionKey();    // partition key
  KeyedMessage<String, String> data = new KeyedMessage<String, String>(
      this.topicName, uniqKey, partKey, message);
  producer.send(data);
 
 Thanks
 Zakee
 
 
 
 On Mar 14, 2015, at 4:23 PM, gharatmayures...@gmail.com wrote:
 
 Is your topic log compacted? Also if it is are the messages keyed? Or
 are the messages compressed?
 
 Thanks,
 
 Mayuresh
 
 Sent from my iPhone
 
 On Mar 14, 2015, at 2:02 PM, Zakee kzak...@netzero.net mailto:
 kzak...@netzero.net wrote:
 
 Thanks, Jiangjie for helping resolve the kafka controller migration
 driven partition leader rebalance issue. The logs are much cleaner now.
 
 There are a few incidences of Out of range offset even though  there
 is
 no consumers running, only producers and replica fetchers. I was trying
 to
 relate to a cause, looks like compaction (log segment deletion) causing
 this. Not sure whether this is expected behavior.
 
 Broker-4:
 [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error
 when processing fetch request for partition [Topic22kv,5] offset
 1754769769
 from follower with correlation id 1645671. Possible cause: Request for
 offset 1754769769 but we only have log segments in the range 1400864851
 to
 1754769732. (kafka.server.ReplicaManager)
 
 Broker-3:
 [2015-03-14 07:46:52,356] INFO The cleaning for partition
 [Topic22kv,5]
 is aborted and paused (kafka.log.LogCleaner)
 [2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for
 log Topic22kv-5 for deletion. (kafka.log.Log)
 …
 [2015-03-14 07:46:52,421] INFO Compaction for partition [Topic22kv,5]
 is resumed (kafka.log.LogCleaner)
 [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current
 offset 1754769769 for partition [Topic22kv,5] out of range; reset
 offset to
 1400864851 (kafka.server.ReplicaFetcherThread)
 [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3
 for partition [Topic22kv,5] reset its fetch offset from 1400864851 to
 current leader 4's start offset 1400864851
 (kafka.server.ReplicaFetcherThread)
 
 
 topic22kv_746a_314_logs.txt
 
 
 Thanks
 Zakee
 
 On Mar 9, 2015, at 12:18 PM, Zakee kzak...@netzero.net wrote:
 
 No broker restarts.
 
 Created a kafka issue:
 https://issues.apache.org/jira/browse/KAFKA-2011 
 https://issues.apache.org/jira/browse/KAFKA-2011
 
 Logs for rebalance:
 [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred
 replica election for partitions: (kafka.controller.KafkaController)
 [2015-03-07 16:52:48,969] INFO [Controller 2]: 

Re: Broker Exceptions

2015-03-17 Thread Mayuresh Gharat
We are trying to see what might have caused it.

We had some questions:
1) Is this reproducible? That way we can dig deep.


This looks like an interesting problem to solve and you might have caught a bug,
but we need to verify the root cause before filing a ticket.

Thanks,

Mayuresh

On Tue, Mar 17, 2015 at 2:10 PM, Zakee kzak...@netzero.net wrote:

  What version are you running ?

 Version 0.8.2.0

  Your case is 2). But the only weird thing is that your replica (broker 3) is
  requesting an offset which is greater than the leader's log end offset.


 So what could be the cause?

 Thanks
 Zakee



  On Mar 17, 2015, at 11:45 AM, Mayuresh Gharat 
 gharatmayures...@gmail.com wrote:
 
  What version are you running ?
 
   The code for the latest version says that:
  
   1) If the log end offset of the replica is greater than the leader's log end
   offset, the replica's offset will be reset to the logEndOffset of the leader.
  
   2) Else, if the log end offset of the replica is smaller than the leader's
   log end offset and it's out of range, the replica's offset will be reset to
   the logStartOffset of the leader.
  
   Your case is 2). But the only weird thing is that your replica (broker 3) is
   requesting an offset which is greater than the leader's log end offset.
 
  Thanks,
 
  Mayuresh
 
 
  On Tue, Mar 17, 2015 at 10:26 AM, Mayuresh Gharat 
  gharatmayures...@gmail.com mailto:gharatmayures...@gmail.com wrote:
 
  cool.
 
  On Tue, Mar 17, 2015 at 10:15 AM, Zakee kzak...@netzero.net wrote:
 
  Hi Mayuresh,
 
  The logs are already attached and are in reverse order starting
 backwards
  from [2015-03-14 07:46:52,517] to the time when brokers were started.
 
  Thanks
  Zakee
 
 
 
  On Mar 17, 2015, at 12:07 AM, Mayuresh Gharat 
  gharatmayures...@gmail.com wrote:
 
  Hi Zakee,
 
  Thanks for the logs. Can you paste earlier logs from broker-3 up to :
 
  [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current
  offset 1754769769 for partition [Topic22kv,5] out of range; reset
  offset to 1400864851 (kafka.server.ReplicaFetcherThread)
 
  That would help us figure out what was happening on this broker before
  it
  issued a replicaFetch request to broker-4.
 
  Thanks,
 
  Mayuresh
 
  On Mon, Mar 16, 2015 at 11:32 PM, Zakee kzak...@netzero.net wrote:
 
  Hi Mayuresh,
 
  Here are the logs.
 
  
 
 
  Thanks,
  Kazim Zakee
 
 
 
  On Mar 16, 2015, at 10:48 AM, Mayuresh Gharat 
  gharatmayures...@gmail.com wrote:
 
  Can you provide more logs (complete) on Broker 3 till time :
 
  *[2015-03-14 07:46:52,517*] WARN [ReplicaFetcherThread-2-4],
 Replica 3
  for
  partition [Topic22kv,5] reset its fetch offset from 1400864851 to
  current
  leader 4's start offset 1400864851
 (kafka.server.ReplicaFetcherThread)
 
  I would like to see logs from time much before it sent the fetch
  request
  to
  Broker 4 to the time above. I want to check if in any case Broker 3
  was a
  leader before broker 4 took over.
 
  Additional logs will help.
 
 
  Thanks,
 
  Mayuresh
 
 
 
  On Sat, Mar 14, 2015 at 8:35 PM, Zakee kzak...@netzero.net wrote:
 
  log.cleanup.policy is delete not compact.
  log.cleaner.enable=true
  log.cleaner.threads=5
  log.cleanup.policy=delete
  log.flush.scheduler.interval.ms=3000
  log.retention.minutes=1440
  log.segment.bytes=1073741824  (1gb)
 
   Messages are keyed but not compressed; the producer is async and uses the
   Kafka default partitioner.
   String message = msg.getString();
   String uniqKey = "" + rnd.nextInt();   // random key
   String partKey = getPartitionKey();    // partition key
   KeyedMessage<String, String> data = new KeyedMessage<String, String>(
       this.topicName, uniqKey, partKey, message);
   producer.send(data);
 
  Thanks
  Zakee
 
 
 
  On Mar 14, 2015, at 4:23 PM, gharatmayures...@gmail.com wrote:
 
  Is your topic log compacted? Also if it is are the messages keyed?
  Or
  are the messages compressed?
 
  Thanks,
 
  Mayuresh
 
  Sent from my iPhone
 
  On Mar 14, 2015, at 2:02 PM, Zakee kzak...@netzero.net mailto:
  kzak...@netzero.net wrote:
 
  Thanks, Jiangjie for helping resolve the kafka controller
 migration
  driven partition leader rebalance issue. The logs are much cleaner
  now.
 
  There are a few incidences of Out of range offset even though
  there
  is
  no consumers running, only producers and replica fetchers. I was
  trying
  to
  relate to a cause, looks like compaction (log segment deletion)
  causing
  this. Not sure whether this is expected behavior.
 
  Broker-4:
  [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]:
  Error
  when processing fetch request for partition [Topic22kv,5] offset
  1754769769
  from follower with correlation id 1645671. Possible cause: Request
  for
  offset 1754769769 but we only have log segments in the range
  1400864851
  

Re: Broker Exceptions

2015-03-16 Thread Mayuresh Gharat
Can you provide more logs (complete) on Broker 3 till time :

*[2015-03-14 07:46:52,517*] WARN [ReplicaFetcherThread-2-4], Replica 3 for
partition [Topic22kv,5] reset its fetch offset from 1400864851 to current
leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)

I would like to see logs from well before it sent the fetch request to
Broker 4 up to the time above. I want to check whether Broker 3 was at any point
the leader before broker 4 took over.

Additional logs will help.


Thanks,

Mayuresh



On Sat, Mar 14, 2015 at 8:35 PM, Zakee kzak...@netzero.net wrote:

 log.cleanup.policy is delete not compact.
 log.cleaner.enable=true
 log.cleaner.threads=5
 log.cleanup.policy=delete
 log.flush.scheduler.interval.ms=3000
 log.retention.minutes=1440
 log.segment.bytes=1073741824  (1gb)

  Messages are keyed but not compressed; the producer is async and uses the
  Kafka default partitioner.
  String message = msg.getString();
  String uniqKey = "" + rnd.nextInt();   // random key
  String partKey = getPartitionKey();    // partition key
  KeyedMessage<String, String> data = new KeyedMessage<String, String>(
      this.topicName, uniqKey, partKey, message);
  producer.send(data);
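
  For context, a self-contained sketch of the producer setup described above
  might look roughly like the following (this is an illustration only, using the
  old 0.8 Scala-client Java API that KeyedMessage belongs to; the broker list,
  payload, and partition-key value below are placeholders rather than values
  from this thread):

  import java.util.Properties;
  import java.util.Random;

  import kafka.javaapi.producer.Producer;
  import kafka.producer.KeyedMessage;
  import kafka.producer.ProducerConfig;

  public class AsyncProducerSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // placeholder
          props.put("serializer.class", "kafka.serializer.StringEncoder");
          props.put("producer.type", "async");  // async producer, as described above
          // no partitioner.class set, so Kafka's default partitioner is used

          Producer<String, String> producer =
                  new Producer<String, String>(new ProducerConfig(props));
          Random rnd = new Random();

          String message = "some payload";      // stands in for msg.getString()
          String uniqKey = "" + rnd.nextInt();  // random key
          String partKey = "somePartitionKey";  // stands in for getPartitionKey()
          KeyedMessage<String, String> data = new KeyedMessage<String, String>(
                  "Topic22kv", uniqKey, partKey, message);
          producer.send(data);
          producer.close();
      }
  }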

 Thanks
 Zakee



  On Mar 14, 2015, at 4:23 PM, gharatmayures...@gmail.com wrote:
 
  Is your topic log compacted? Also if it is are the messages keyed? Or
 are the messages compressed?
 
  Thanks,
 
  Mayuresh
 
  Sent from my iPhone
 
  On Mar 14, 2015, at 2:02 PM, Zakee kzak...@netzero.net mailto:
 kzak...@netzero.net wrote:
 
  Thanks, Jiangjie for helping resolve the kafka controller migration
 driven partition leader rebalance issue. The logs are much cleaner now.
 
  There are a few incidences of Out of range offset even though  there is
 no consumers running, only producers and replica fetchers. I was trying to
 relate to a cause, looks like compaction (log segment deletion) causing
 this. Not sure whether this is expected behavior.
 
  Broker-4:
  [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error
 when processing fetch request for partition [Topic22kv,5] offset 1754769769
 from follower with correlation id 1645671. Possible cause: Request for
 offset 1754769769 but we only have log segments in the range 1400864851 to
 1754769732. (kafka.server.ReplicaManager)
 
  Broker-3:
  [2015-03-14 07:46:52,356] INFO The cleaning for partition [Topic22kv,5]
 is aborted and paused (kafka.log.LogCleaner)
  [2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for
 log Topic22kv-5 for deletion. (kafka.log.Log)
  …
  [2015-03-14 07:46:52,421] INFO Compaction for partition [Topic22kv,5]
 is resumed (kafka.log.LogCleaner)
  [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current
 offset 1754769769 for partition [Topic22kv,5] out of range; reset offset to
 1400864851 (kafka.server.ReplicaFetcherThread)
  [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3
 for partition [Topic22kv,5] reset its fetch offset from 1400864851 to
 current leader 4's start offset 1400864851
 (kafka.server.ReplicaFetcherThread)
 
  
  topic22kv_746a_314_logs.txt
 
 
  Thanks
  Zakee
 
  On Mar 9, 2015, at 12:18 PM, Zakee kzak...@netzero.net wrote:
 
  No broker restarts.
 
  Created a kafka issue:
 https://issues.apache.org/jira/browse/KAFKA-2011 
 https://issues.apache.org/jira/browse/KAFKA-2011
 
  Logs for rebalance:
  [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred
 replica election for partitions: (kafka.controller.KafkaController)
  [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that
 completed preferred replica election: (kafka.controller.KafkaController)
  …
  [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred
 replica election for partitions: (kafka.controller.KafkaController)
  ...
  [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred
 replica election for partitions: (kafka.controller.KafkaController)
  ...
  [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred
 replica leader election for partitions (kafka.controller.KafkaController)
  ...
  [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing
 preferred replica election:  (kafka.controller.KafkaController)
 
  Also, I still see lots of below errors (~69k) going on in the logs
 since the restart. Is there any other reason than rebalance for these
 errors?
 
  [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error
 for partition [Topic-11,7] to broker 5:class
 kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)
  [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error
 for partition [Topic-2,25] to broker 5:class
 

Re: Broker Exceptions

2015-03-14 Thread Zakee
Thanks, Jiangjie, for helping resolve the kafka controller migration driven 
partition leader rebalance issue. The logs are much cleaner now. 

There are a few incidences of "Out of range offset" even though there are no 
consumers running, only producers and replica fetchers. I was trying to relate 
it to a cause; it looks like compaction (log segment deletion) is causing this. 
Not sure whether this is expected behavior.

Broker-4:
[2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error when 
processing fetch request for partition [Topic22kv,5] offset 1754769769 from 
follower with correlation id 1645671. Possible cause: Request for offset 
1754769769 but we only have log segments in the range 1400864851 to 1754769732. 
(kafka.server.ReplicaManager)

Broker-3:
[2015-03-14 07:46:52,356] INFO The cleaning for partition [Topic22kv,5] is 
aborted and paused (kafka.log.LogCleaner)
[2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
…
[2015-03-14 07:46:52,421] INFO Compaction for partition [Topic22kv,5] is 
resumed (kafka.log.LogCleaner)
[2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current offset 
1754769769 for partition [Topic22kv,5] out of range; reset offset to 1400864851 
(kafka.server.ReplicaFetcherThread)
[2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3 for 
partition [Topic22kv,5] reset its fetch offset from 1400864851 to current 
leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)


on broker-4
[2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error when 
processing fetch request for partition [Topic22kv,5] offset 1754769769 from 
follower with correlation id 1645671. Possible cause: Request for offset 
1754769769 but we only have log segments in the range 1400864851 to 1754769732. 
(kafka.server.ReplicaManager)
[2015-03-14 07:46:52,759] INFO Closing socket connection to /19.10.4.143. 
(kafka.network.Processor)

on broker-3
[2015-03-14 07:46:52,356] INFO The cleaning for partition [Topic22kv,5] is 
aborted and paused (kafka.log.LogCleaner)
[2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,408] INFO Scheduling log segment 1406227848 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,408] INFO Scheduling log segment 1411591123 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,408] INFO Scheduling log segment 1416954195 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,409] INFO Scheduling log segment 1422317783 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,409] INFO Scheduling log segment 1427680989 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,409] INFO Scheduling log segment 1433044302 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,409] INFO Scheduling log segment 1438407760 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,409] INFO Scheduling log segment 1443770521 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,409] INFO Scheduling log segment 1449133811 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,409] INFO Scheduling log segment 1454497169 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,410] INFO Scheduling log segment 1459860085 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,411] INFO Scheduling log segment 1465223478 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,411] INFO Scheduling log segment 1470586720 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,412] INFO Scheduling log segment 1475949659 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,412] INFO Scheduling log segment 1481312627 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,412] INFO Scheduling log segment 1486675299 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,412] INFO Scheduling log segment 1492038376 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,413] INFO Scheduling log segment 1497401497 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,413] INFO Scheduling log segment 1502764133 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,413] INFO Scheduling log segment 1508126631 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,413] INFO Scheduling log segment 1513489256 for log 
Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,413] INFO Scheduling log segment 1518852045 for log 
Topic22kv-5 for deletion. 

Re: Broker Exceptions

2015-03-09 Thread Zakee
 Hmm, that sounds like a bug. Can you paste the log of leader rebalance
 here?
Thanks for your suggestions. 
It looks like the rebalance actually happened only once, soon after I started 
with a clean cluster and data was pushed; it didn't happen again so far, and I 
see the partition leader counts on the brokers did not change since then. One of 
the brokers was constantly showing 0 for partition leader count. Is that normal?

Also, I still see lots of the below errors (~69k) in the logs since the 
restart. Is there any reason other than rebalance for these errors?

[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition 
[Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException 
(kafka.server.ReplicaFetcherThread)
[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition 
[Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException 
(kafka.server.ReplicaFetcherThread)
[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition 
[Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException 
(kafka.server.ReplicaFetcherThread)
[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition 
[Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException 
(kafka.server.ReplicaFetcherThread)

 Some other things to check are:
 1. The actual property name is auto.leader.rebalance.enable, not
 auto.leader.rebalance. You’ve probably known this, just to double confirm.
Yes 

 2. In zookeeper path, can you verify /admin/preferred_replica_election
 does not exist?
ls /admin
[delete_topics]
ls /admin/preferred_replica_election
Node does not exist: /admin/preferred_replica_election


Thanks
Zakee



 On Mar 7, 2015, at 10:49 PM, Jiangjie Qin j...@linkedin.com.INVALID wrote:
 
 Hmm, that sounds like a bug. Can you paste the log of leader rebalance
 here?
 Some other things to check are:
 1. The actual property name is auto.leader.rebalance.enable, not
 auto.leader.rebalance. You’ve probably known this, just to double confirm.
 2. In zookeeper path, can you verify /admin/preferred_replica_election
 does not exist?
 
 Jiangjie (Becket) Qin
 
 On 3/7/15, 10:24 PM, Zakee kzak...@netzero.net wrote:
 
 I started with  clean cluster and started to push data. It still does the
 rebalance at random durations even though the auto.leader.relabalance is
 set to false.
 
 Thanks
 Zakee
 
 
 
 On Mar 6, 2015, at 3:51 PM, Jiangjie Qin j...@linkedin.com.INVALID
 wrote:
 
 Yes, the rebalance should not happen in that case. That is a little bit
 strange. Could you try to launch a clean Kafka cluster with
 auto.leader.election disabled and try push data?
 When leader migration occurs, NotLeaderForPartition exception is
 expected.
 
 Jiangjie (Becket) Qin
 
 
 On 3/6/15, 3:14 PM, Zakee kzak...@netzero.net wrote:
 
  Yes, Jiangjie, I do see lots of these "Starting preferred replica
  leader election for partitions" entries in the logs. I also see a lot of Produce
  request failure warnings with the NotLeader exception.
 
 I tried switching off the auto.leader.relabalance to false. I am still
 noticing the rebalance happening. My understanding was the rebalance
 will
 not happen when this is set to false.
 
 Thanks
 Zakee
 
 
 
 On Feb 25, 2015, at 5:17 PM, Jiangjie Qin j...@linkedin.com.INVALID
 wrote:
 
 I don’t think num.replica.fetchers will help in this case. Increasing
 number of fetcher threads will only help in cases where you have a
 large
 amount of data coming into a broker and more replica fetcher threads
 will
  help keep up. We usually only use 1-2 for each broker. But in your case,
  it looks like leader migration is causing the issue.
 Do you see anything else in the log? Like preferred leader election?
 
 Jiangjie (Becket) Qin
 
 On 2/25/15, 5:02 PM, Zakee kzak...@netzero.net
 mailto:kzak...@netzero.net wrote:
 
 Thanks, Jiangjie.
 
  Yes, I do see under-replicated partitions usually shooting up every hour.
  Anything I could try to reduce it?
  
  How does num.replica.fetchers affect the replica sync? Currently I have
  configured 7 on each of the 5 brokers.
 
 -Zakee
 
 On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
 j...@linkedin.com.invalid
 wrote:
 
  These messages are usually caused by leader migration. I think as long as
  you don't see this lasting forever and getting a bunch of under-replicated
  partitions, it should be fine.
 
 Jiangjie (Becket) Qin
 
 On 2/25/15, 4:07 PM, Zakee kzak...@netzero.net wrote:
 
 Need to know if I should I be worried about this or ignore them.
 
 I see tons of these exceptions/warnings in the broker logs, not
 sure
 what
 causes them and what could be done to fix them.
 
 ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic]
 to
 broker
 5:class kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)
 [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error
 for
 partition [TestTopic] to broker 5:class
 

Re: Broker Exceptions

2015-03-09 Thread Zakee
Correction: Actually, the rebalance kept happening until 24 hours after the 
start, and that's where the below errors were found. Ideally the rebalance 
should not have happened at all.


Thanks
Zakee



 On Mar 9, 2015, at 10:28 AM, Zakee kzak...@netzero.net wrote:
 
 Hmm, that sounds like a bug. Can you paste the log of leader rebalance
 here?
 Thanks for your suggestions. 
 It looks like the rebalance actually happened only once, soon after I started 
 with a clean cluster and data was pushed; it didn't happen again so far, and I 
 see the partition leader counts on the brokers did not change since then. One of 
 the brokers was constantly showing 0 for partition leader count. Is that normal?
 
 Also, I still see lots of below errors (~69k) going on in the logs since the 
 restart. Is there any other reason than rebalance for these errors?
 
 [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for 
 partition [Topic-11,7] to broker 5:class 
 kafka.common.NotLeaderForPartitionException 
 (kafka.server.ReplicaFetcherThread)
 [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for 
 partition [Topic-2,25] to broker 5:class 
 kafka.common.NotLeaderForPartitionException 
 (kafka.server.ReplicaFetcherThread)
 [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for 
 partition [Topic-2,21] to broker 5:class 
 kafka.common.NotLeaderForPartitionException 
 (kafka.server.ReplicaFetcherThread)
 [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for 
 partition [Topic-22,9] to broker 5:class 
 kafka.common.NotLeaderForPartitionException 
 (kafka.server.ReplicaFetcherThread)
 
 Some other things to check are:
 1. The actual property name is auto.leader.rebalance.enable, not
 auto.leader.rebalance. You’ve probably known this, just to double confirm.
 Yes 
 
 2. In zookeeper path, can you verify /admin/preferred_replica_election
 does not exist?
 ls /admin
 [delete_topics]
 ls /admin/preferred_replica_election
 Node does not exist: /admin/preferred_replica_election
 
 
 Thanks
 Zakee
 
 
 
 On Mar 7, 2015, at 10:49 PM, Jiangjie Qin j...@linkedin.com.INVALID wrote:
 
 Hmm, that sounds like a bug. Can you paste the log of leader rebalance
 here?
 Some other things to check are:
 1. The actual property name is auto.leader.rebalance.enable, not
 auto.leader.rebalance. You’ve probably known this, just to double confirm.
 2. In zookeeper path, can you verify /admin/preferred_replica_election
 does not exist?
 
 Jiangjie (Becket) Qin
 
 On 3/7/15, 10:24 PM, Zakee kzak...@netzero.net wrote:
 
 I started with  clean cluster and started to push data. It still does the
 rebalance at random durations even though the auto.leader.relabalance is
 set to false.
 
 Thanks
 Zakee
 
 
 
 On Mar 6, 2015, at 3:51 PM, Jiangjie Qin j...@linkedin.com.INVALID
 wrote:
 
 Yes, the rebalance should not happen in that case. That is a little bit
 strange. Could you try to launch a clean Kafka cluster with
 auto.leader.election disabled and try push data?
 When leader migration occurs, NotLeaderForPartition exception is
 expected.
 
 Jiangjie (Becket) Qin
 
 
 On 3/6/15, 3:14 PM, Zakee kzak...@netzero.net wrote:
 
  Yes, Jiangjie, I do see lots of these "Starting preferred replica
  leader election for partitions" entries in the logs. I also see a lot of Produce
  request failure warnings with the NotLeader exception.
 
 I tried switching off the auto.leader.relabalance to false. I am still
 noticing the rebalance happening. My understanding was the rebalance
 will
 not happen when this is set to false.
 
 Thanks
 Zakee
 
 
 
 On Feb 25, 2015, at 5:17 PM, Jiangjie Qin j...@linkedin.com.INVALID
 wrote:
 
 I don’t think num.replica.fetchers will help in this case. Increasing
 number of fetcher threads will only help in cases where you have a
 large
 amount of data coming into a broker and more replica fetcher threads
 will
  help keep up. We usually only use 1-2 for each broker. But in your case,
  it looks like leader migration is causing the issue.
 Do you see anything else in the log? Like preferred leader election?
 
 Jiangjie (Becket) Qin
 
 On 2/25/15, 5:02 PM, Zakee kzak...@netzero.net
 mailto:kzak...@netzero.net wrote:
 
 Thanks, Jiangjie.
 
  Yes, I do see under-replicated partitions usually shooting up every hour.
  Anything I could try to reduce it?
  
  How does num.replica.fetchers affect the replica sync? Currently I have
  configured 7 on each of the 5 brokers.
 
 -Zakee
 
 On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
 j...@linkedin.com.invalid
 wrote:
 
  These messages are usually caused by leader migration. I think as long as
  you don't see this lasting forever and getting a bunch of under-replicated
  partitions, it should be fine.
 
 Jiangjie (Becket) Qin
 
 On 2/25/15, 4:07 PM, Zakee kzak...@netzero.net wrote:
 
 Need to know if I should I be worried about this or ignore them.
 
 I see tons of these exceptions/warnings in the broker logs, not
 sure
 what
 causes them and what could be done to fix them.
 
 ERROR 

Re: Broker Exceptions

2015-03-09 Thread Jiangjie Qin
Is there anything wrong with brokers around that time? E.g. Broker restart?
The logs you pasted are actually from the replica fetchers. Could you paste the
related logs in controller.log?

Thanks.

Jiangjie (Becket) Qin

On 3/9/15, 10:32 AM, Zakee kzak...@netzero.net wrote:

Correction: Actually, the rebalance kept happening until 24 hours after
the start, and that's where the below errors were found. Ideally the rebalance
should not have happened at all.


Thanks
Zakee



 On Mar 9, 2015, at 10:28 AM, Zakee kzak...@netzero.net wrote:
 
 Hmm, that sounds like a bug. Can you paste the log of leader rebalance
 here?
 Thanks for your suggestions.
 It looks like the rebalance actually happened only once, soon after I
 started with a clean cluster and data was pushed; it didn't happen again
 so far, and I see the partition leader counts on the brokers did not change
 since then. One of the brokers was constantly showing 0 for partition
 leader count. Is that normal?
 
 Also, I still see lots of below errors (~69k) going on in the logs
since the restart. Is there any other reason than rebalance for these
errors?
 
 [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
partition [Topic-11,7] to broker 5:class
kafka.common.NotLeaderForPartitionException
(kafka.server.ReplicaFetcherThread)
 [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
partition [Topic-2,25] to broker 5:class
kafka.common.NotLeaderForPartitionException
(kafka.server.ReplicaFetcherThread)
 [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
partition [Topic-2,21] to broker 5:class
kafka.common.NotLeaderForPartitionException
(kafka.server.ReplicaFetcherThread)
 [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
partition [Topic-22,9] to broker 5:class
kafka.common.NotLeaderForPartitionException
(kafka.server.ReplicaFetcherThread)
 
 Some other things to check are:
 1. The actual property name is auto.leader.rebalance.enable, not
 auto.leader.rebalance. You’ve probably known this, just to double
confirm.
 Yes 
 
 2. In zookeeper path, can you verify /admin/preferred_replica_election
 does not exist?
 ls /admin
 [delete_topics]
 ls /admin/preferred_replica_election
 Node does not exist: /admin/preferred_replica_election
 
 
 Thanks
 Zakee
 
 
 
 On Mar 7, 2015, at 10:49 PM, Jiangjie Qin j...@linkedin.com.INVALID
wrote:
 
 Hmm, that sounds like a bug. Can you paste the log of leader rebalance
 here?
 Some other things to check are:
 1. The actual property name is auto.leader.rebalance.enable, not
 auto.leader.rebalance. You’ve probably known this, just to double
confirm.
 2. In zookeeper path, can you verify /admin/preferred_replica_election
 does not exist?
 
 Jiangjie (Becket) Qin
 
 On 3/7/15, 10:24 PM, Zakee kzak...@netzero.net wrote:
 
 I started with  clean cluster and started to push data. It still does
the
 rebalance at random durations even though the auto.leader.relabalance
is
 set to false.
 
 Thanks
 Zakee
 
 
 
 On Mar 6, 2015, at 3:51 PM, Jiangjie Qin j...@linkedin.com.INVALID
 wrote:
 
 Yes, the rebalance should not happen in that case. That is a little
bit
 strange. Could you try to launch a clean Kafka cluster with
 auto.leader.election disabled and try push data?
 When leader migration occurs, NotLeaderForPartition exception is
 expected.
 
 Jiangjie (Becket) Qin
 
 
 On 3/6/15, 3:14 PM, Zakee kzak...@netzero.net wrote:
 
  Yes, Jiangjie, I do see lots of these "Starting preferred replica
  leader election for partitions" entries in the logs. I also see a lot of Produce
  request failure warnings with the NotLeader exception.
 
 I tried switching off the auto.leader.relabalance to false. I am
still
 noticing the rebalance happening. My understanding was the rebalance
 will
 not happen when this is set to false.
 
 Thanks
 Zakee
 
 
 
 On Feb 25, 2015, at 5:17 PM, Jiangjie Qin
j...@linkedin.com.INVALID
 wrote:
 
 I don’t think num.replica.fetchers will help in this case.
Increasing
 number of fetcher threads will only help in cases where you have a
 large
 amount of data coming into a broker and more replica fetcher
threads
 will
  help keep up. We usually only use 1-2 for each broker. But in your case,
  it looks like leader migration is causing the issue.
 Do you see anything else in the log? Like preferred leader
election?
 
 Jiangjie (Becket) Qin
 
 On 2/25/15, 5:02 PM, Zakee kzak...@netzero.net
 mailto:kzak...@netzero.net wrote:
 
 Thanks, Jiangjie.
 
  Yes, I do see under-replicated partitions usually shooting up every hour.
  Anything I could try to reduce it?
  
  How does num.replica.fetchers affect the replica sync? Currently I have
  configured 7 on each of the 5 brokers.
 
 -Zakee
 
 On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
 j...@linkedin.com.invalid
 wrote:
 
  These messages are usually caused by leader migration. I think as long as
  you don't see this lasting forever and getting a bunch of under-replicated
  partitions, it should be fine.
 
 Jiangjie (Becket) Qin
 
 On 2/25/15, 4:07 PM, Zakee 

Re: Broker Exceptions

2015-03-09 Thread Zakee
No broker restarts.

Created a kafka issue: https://issues.apache.org/jira/browse/KAFKA-2011 
https://issues.apache.org/jira/browse/KAFKA-2011

 Logs for rebalance:
 [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica 
 election for partitions:  (kafka.controller.KafkaController)
 [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed 
 preferred replica election:  (kafka.controller.KafkaController)
 …
 [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica 
 election for partitions:  (kafka.controller.KafkaController)
 ...
 [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica 
 election for partitions:  (kafka.controller.KafkaController)
 ...
 [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica 
 leader election for partitions  (kafka.controller.KafkaController)
 ...
 [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing 
 preferred replica election:  (kafka.controller.KafkaController)
 
 Also, I still see lots of below errors (~69k) going on in the logs since the 
 restart. Is there any other reason than rebalance for these errors?
 
 [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for 
 partition [Topic-11,7] to broker 5:class 
 kafka.common.NotLeaderForPartitionException 
 (kafka.server.ReplicaFetcherThread)
 [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for 
 partition [Topic-2,25] to broker 5:class 
 kafka.common.NotLeaderForPartitionException 
 (kafka.server.ReplicaFetcherThread)
 [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for 
 partition [Topic-2,21] to broker 5:class 
 kafka.common.NotLeaderForPartitionException 
 (kafka.server.ReplicaFetcherThread)
 [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for 
 partition [Topic-22,9] to broker 5:class 
 kafka.common.NotLeaderForPartitionException 
 (kafka.server.ReplicaFetcherThread)


  Could you paste the related logs in controller.log?
What specifically should I search for in the logs?

Thanks,
Zakee



 On Mar 9, 2015, at 11:35 AM, Jiangjie Qin j...@linkedin.com.INVALID wrote:
 
 Is there anything wrong with the brokers around that time? E.g. a broker restart?
 The logs you pasted are actually from the replica fetchers. Could you paste the
 related logs in controller.log?
 
 Thanks.
 
 Jiangjie (Becket) Qin
 
 On 3/9/15, 10:32 AM, Zakee kzak...@netzero.net wrote:
 
 Correction: Actually, the rebalance happened quite a bit until 24 hours after
 the start, and that's when the below errors were found. Ideally the rebalance
 should not have happened at all.
 
 
 Thanks
 Zakee
 
 
 
 On Mar 9, 2015, at 10:28 AM, Zakee kzak...@netzero.net wrote:
 
 Hmm, that sounds like a bug. Can you paste the log of leader rebalance
 here?
 Thanks for your suggestions.
 It looks like the rebalance actually happened only once, soon after I
 started with a clean cluster and data was pushed. It hasn't happened again
 so far, and I see the partition leader counts on the brokers have not
 changed since then. One of the brokers was constantly showing 0 for
 partition leader count. Is that normal?
 
 Also, I still see lots of the below errors (~69k) in the logs
 since the restart. Is there any reason other than the rebalance for these
 errors?
 
 [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
 partition [Topic-11,7] to broker 5:class
 kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)
 [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
 partition [Topic-2,25] to broker 5:class
 kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)
 [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
 partition [Topic-2,21] to broker 5:class
 kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)
 [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
 partition [Topic-22,9] to broker 5:class
 kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)
 
 Some other things to check are:
 1. The actual property name is auto.leader.rebalance.enable, not
 auto.leader.rebalance. You probably know this already, just to
 double-confirm.
 Yes 
 
 2. In zookeeper path, can you verify /admin/preferred_replica_election
 does not exist?
 ls /admin
 [delete_topics]
 ls /admin/preferred_replica_election
 Node does not exist: /admin/preferred_replica_election
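 (For reference, those checks can be reproduced with the stock ZooKeeper
 shell; roughly, assuming ZooKeeper on localhost:2181 and default script
 paths:
 
   bin/zookeeper-shell.sh localhost:2181
   ls /admin
   ls /admin/preferred_replica_election
 
 and the broker-side setting can be double-checked with:
 
   grep auto.leader.rebalance.enable config/server.properties
 )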
 
 
 Thanks
 Zakee
 
 
 
 On Mar 7, 2015, at 10:49 PM, Jiangjie Qin j...@linkedin.com.INVALID wrote:
 
 Hmm, that sounds like a bug. Can you paste the log of leader rebalance
 here?
 Some other things to check are:
 1. The actual property name is auto.leader.rebalance.enable, not
 auto.leader.rebalance. You probably know this already, just to
 double-confirm.
 2. In zookeeper path, can you verify 


Re: Broker Exceptions

2015-03-07 Thread Zakee
I started with a clean cluster and started to push data. It still does the 
rebalance at random durations even though auto.leader.rebalance is set to 
false.
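(For reference, the broker-side property being disabled here is
auto.leader.rebalance.enable; a minimal sketch of the intended line in
server.properties, assuming that is the name actually set:

  auto.leader.rebalance.enable=false
)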

Thanks
Zakee



 On Mar 6, 2015, at 3:51 PM, Jiangjie Qin j...@linkedin.com.INVALID wrote:
 
 Yes, the rebalance should not happen in that case. That is a little bit
 strange. Could you try to launch a clean Kafka cluster with
 auto.leader.rebalance.enable disabled and try to push data?
 When leader migration occurs, a NotLeaderForPartition exception is expected.
 
 Jiangjie (Becket) Qin
 
 
 On 3/6/15, 3:14 PM, Zakee kzak...@netzero.net wrote:
 
 Yes, Jiangjie, I do see lots of these "Starting preferred replica leader
 election for partitions" errors in the logs. I also see a lot of Produce
 request failure warnings with the NotLeaderForPartitionException.
 
 I tried switching auto.leader.rebalance off (set to false). I am still
 noticing the rebalance happening. My understanding was that the rebalance
 will not happen when this is set to false.
 
 Thanks
 Zakee
 
 
 
 On Feb 25, 2015, at 5:17 PM, Jiangjie Qin j...@linkedin.com.INVALID
 wrote:
 
 I don't think num.replica.fetchers will help in this case. Increasing the
 number of fetcher threads will only help in cases where you have a large
 amount of data coming into a broker and more replica fetcher threads will
 help keep up. We usually only use 1-2 for each broker. But in your case,
 it looks like leader migration is causing the issue.
 Do you see anything else in the log? Like preferred leader election?
 
 Jiangjie (Becket) Qin
 
 On 2/25/15, 5:02 PM, Zakee kzak...@netzero.net wrote:
 
 Thanks, Jiangjie.
 
 Yes, I do see under-replicated partitions usually shooting up every hour.
 Anything I could try to reduce it?
 
 How does num.replica.fetchers affect the replica sync? Currently have
 7 configured on each of 5 brokers.
 
 -Zakee
 
 On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
 j...@linkedin.com.invalid
 wrote:
 
 These messages are usually caused by leader migration. I think as long as
 you don't see this lasting forever and getting a bunch of under-replicated
 partitions, it should be fine.
 
 Jiangjie (Becket) Qin
 
 On 2/25/15, 4:07 PM, Zakee kzak...@netzero.net wrote:
 
 Need to know if I should be worried about these or ignore them.
 
 I see tons of these exceptions/warnings in the broker logs, not sure what
 causes them and what could be done to fix them.
 
 ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to
 broker
 5:class kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)
 [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for
 partition [TestTopic] to broker 5:class
 kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)
 [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch
 request
 with correlation id 950084 from client ReplicaFetcherThread-1-2 on
 partition [TestTopic,2] failed due to Leader not local for partition
 [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
 
 
 Any ideas?
 
 -Zakee
 



Re: Broker Exceptions

2015-03-06 Thread Zakee
Yes, Jiangjie, I do see lots of these "Starting preferred replica leader 
election for partitions" errors in the logs. I also see a lot of Produce request 
failure warnings with the NotLeaderForPartitionException.

I tried switching auto.leader.rebalance off (set to false). I am still noticing 
the rebalance happening. My understanding was that the rebalance will not happen 
when this is set to false.

Thanks
Zakee



 On Feb 25, 2015, at 5:17 PM, Jiangjie Qin j...@linkedin.com.INVALID wrote:
 
 I don't think num.replica.fetchers will help in this case. Increasing the
 number of fetcher threads will only help in cases where you have a large
 amount of data coming into a broker and more replica fetcher threads will
 help keep up. We usually only use 1-2 for each broker. But in your case,
 it looks like leader migration is causing the issue.
 Do you see anything else in the log? Like preferred leader election?
 
 Jiangjie (Becket) Qin
 
 On 2/25/15, 5:02 PM, Zakee kzak...@netzero.net wrote:
 
 Thanks, Jiangjie.
 
 Yes, I do see under-replicated partitions usually shooting up every hour.
 Anything I could try to reduce it?
 
 How does num.replica.fetchers affect the replica sync? Currently have
 7 configured on each of 5 brokers.
 
 -Zakee
 
 On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin j...@linkedin.com.invalid
 wrote:
 
 These messages are usually caused by leader migration. I think as long as
 you don't see this lasting forever and getting a bunch of under-replicated
 partitions, it should be fine.
 
 Jiangjie (Becket) Qin
 
 On 2/25/15, 4:07 PM, Zakee kzak...@netzero.net wrote:
 
 Need to know if I should be worried about these or ignore them.
 
 I see tons of these exceptions/warnings in the broker logs, not sure what
 causes them and what could be done to fix them.
 
 ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to
 broker
 5:class kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)
 [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for
 partition [TestTopic] to broker 5:class
 kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)
 [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch
 request
 with correlation id 950084 from client ReplicaFetcherThread-1-2 on
 partition [TestTopic,2] failed due to Leader not local for partition
 [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
 
 
 Any ideas?
 
 -Zakee
 


Re: Broker Exceptions

2015-03-06 Thread Jiangjie Qin
Yes, the rebalance should not happen in that case. That is a little bit
strange. Could you try to launch a clean Kafka cluster with
auto.leader.rebalance.enable disabled and try to push data?
When leader migration occurs, a NotLeaderForPartition exception is expected.

Jiangjie (Becket) Qin


On 3/6/15, 3:14 PM, Zakee kzak...@netzero.net wrote:

Yes, Jiangjie, I do see lots of these "Starting preferred replica leader
election for partitions" errors in the logs. I also see a lot of Produce
request failure warnings with the NotLeaderForPartitionException.

I tried switching auto.leader.rebalance off (set to false). I am still
noticing the rebalance happening. My understanding was that the rebalance
will not happen when this is set to false.

Thanks
Zakee



 On Feb 25, 2015, at 5:17 PM, Jiangjie Qin j...@linkedin.com.INVALID
wrote:
 
 I don't think num.replica.fetchers will help in this case. Increasing the
 number of fetcher threads will only help in cases where you have a large
 amount of data coming into a broker and more replica fetcher threads will
 help keep up. We usually only use 1-2 for each broker. But in your case,
 it looks like leader migration is causing the issue.
 Do you see anything else in the log? Like preferred leader election?
 
 Jiangjie (Becket) Qin
 
 On 2/25/15, 5:02 PM, Zakee kzak...@netzero.net wrote:
 
 Thanks, Jiangjie.
 
 Yes, I do see under-replicated partitions usually shooting up every hour.
 Anything I could try to reduce it?
 
 How does num.replica.fetchers affect the replica sync? Currently have
 7 configured on each of 5 brokers.
 
 -Zakee
 
 On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
j...@linkedin.com.invalid
 wrote:
 
 These messages are usually caused by leader migration. I think as long as
 you don't see this lasting forever and getting a bunch of under-replicated
 partitions, it should be fine.
 
 Jiangjie (Becket) Qin
 
 On 2/25/15, 4:07 PM, Zakee kzak...@netzero.net wrote:
 
 Need to know if I should be worried about these or ignore them.
 
 I see tons of these exceptions/warnings in the broker logs, not sure what
 causes them and what could be done to fix them.
 
 ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to
 broker
 5:class kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)
 [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for
 partition [TestTopic] to broker 5:class
 kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)
 [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch
 request
 with correlation id 950084 from client ReplicaFetcherThread-1-2 on
 partition [TestTopic,2] failed due to Leader not local for partition
 [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
 
 
 Any ideas?
 
 -Zakee
 



Re: Broker Exceptions

2015-03-06 Thread Zakee
Thanks, Jiangjie, I will try with a clean cluster again.

Thanks
Zakee



 On Mar 6, 2015, at 3:51 PM, Jiangjie Qin j...@linkedin.com.INVALID wrote:
 
 Yes, the rebalance should not happen in that case. That is a little bit
 strange. Could you try to launch a clean Kafka cluster with
 auto.leader.rebalance.enable disabled and try to push data?
 When leader migration occurs, a NotLeaderForPartition exception is expected.
 
 Jiangjie (Becket) Qin
 
 
 On 3/6/15, 3:14 PM, Zakee kzak...@netzero.net wrote:
 
 Yes, Jiangjie, I do see lots of these "Starting preferred replica leader
 election for partitions" errors in the logs. I also see a lot of Produce
 request failure warnings with the NotLeaderForPartitionException.
 
 I tried switching auto.leader.rebalance off (set to false). I am still
 noticing the rebalance happening. My understanding was that the rebalance
 will not happen when this is set to false.
 
 Thanks
 Zakee
 
 
 
 On Feb 25, 2015, at 5:17 PM, Jiangjie Qin j...@linkedin.com.INVALID
 wrote:
 
 I don't think num.replica.fetchers will help in this case. Increasing the
 number of fetcher threads will only help in cases where you have a large
 amount of data coming into a broker and more replica fetcher threads will
 help keep up. We usually only use 1-2 for each broker. But in your case,
 it looks like leader migration is causing the issue.
 Do you see anything else in the log? Like preferred leader election?
 
 Jiangjie (Becket) Qin
 
 On 2/25/15, 5:02 PM, Zakee kzak...@netzero.net wrote:
 
 Thanks, Jiangjie.
 
 Yes, I do see under-replicated partitions usually shooting up every hour.
 Anything I could try to reduce it?
 
 How does num.replica.fetchers affect the replica sync? Currently have
 7 configured on each of 5 brokers.
 
 -Zakee
 
 On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
 j...@linkedin.com.invalid
 wrote:
 
 These messages are usually caused by leader migration. I think as long as
 you don't see this lasting forever and getting a bunch of under-replicated
 partitions, it should be fine.
 
 Jiangjie (Becket) Qin
 
 On 2/25/15, 4:07 PM, Zakee kzak...@netzero.net wrote:
 
 Need to know if I should be worried about these or ignore them.
 
 I see tons of these exceptions/warnings in the broker logs, not sure what
 causes them and what could be done to fix them.
 
 ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to
 broker
 5:class kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)
 [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for
 partition [TestTopic] to broker 5:class
 kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)
 [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch
 request
 with correlation id 950084 from client ReplicaFetcherThread-1-2 on
 partition [TestTopic,2] failed due to Leader not local for partition
 [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
 
 
 Any ideas?
 
 -Zakee
 



Re: Broker Exceptions

2015-02-25 Thread Jiangjie Qin
These messages are usually caused by leader migration. I think as long as
you don't see this lasting forever and getting a bunch of under-replicated
partitions, it should be fine.

Jiangjie (Becket) Qin

On 2/25/15, 4:07 PM, Zakee kzak...@netzero.net wrote:

Need to know if I should be worried about these or ignore them.

I see tons of these exceptions/warnings in the broker logs, not sure what
causes them and what could be done to fix them.

ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to
broker
5:class kafka.common.NotLeaderForPartitionException
(kafka.server.ReplicaFetcherThread)
[2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for
partition [TestTopic] to broker 5:class
kafka.common.NotLeaderForPartitionException
(kafka.server.ReplicaFetcherThread)
[2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch
request
with correlation id 950084 from client ReplicaFetcherThread-1-2 on
partition [TestTopic,2] failed due to Leader not local for partition
[TestTopic,2] on broker 2 (kafka.server.ReplicaManager)


Any ideas?

-Zakee




Re: Broker Exceptions

2015-02-25 Thread Zakee
Thanks, Jiangjie.

Yes, I do see under-replicated partitions usually shooting up every hour.
Anything I could try to reduce it?

How does num.replica.fetchers affect the replica sync? Currently have
7 configured on each of 5 brokers.
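(For reference, that corresponds to this line in each broker's
server.properties:

  num.replica.fetchers=7

and one way to spot-check under-replicated partitions from the CLI is
something like the following, with the ZooKeeper address being illustrative:

  bin/kafka-topics.sh --describe --zookeeper localhost:2181 \
    --under-replicated-partitions
)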

-Zakee

On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin j...@linkedin.com.invalid
wrote:

 These messages are usually caused by leader migration. I think as long as
 you don't see this lasting forever and getting a bunch of under-replicated
 partitions, it should be fine.

 Jiangjie (Becket) Qin

 On 2/25/15, 4:07 PM, Zakee kzak...@netzero.net wrote:

 Need to know if I should be worried about these or ignore them.
 
 I see tons of these exceptions/warnings in the broker logs, not sure what
 causes them and what could be done to fix them.
 
 ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to
 broker
 5:class kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)
 [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for
 partition [TestTopic] to broker 5:class
 kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)
 [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch
 request
 with correlation id 950084 from client ReplicaFetcherThread-1-2 on
 partition [TestTopic,2] failed due to Leader not local for partition
 [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
 
 
 Any ideas?
 
 -Zakee
 



Re: Broker Exceptions

2015-02-25 Thread Jiangjie Qin
I don't think num.replica.fetchers will help in this case. Increasing the
number of fetcher threads will only help in cases where you have a large
amount of data coming into a broker and more replica fetcher threads will
help keep up. We usually only use 1-2 for each broker. But in your case,
it looks like leader migration is causing the issue.
Do you see anything else in the log? Like preferred leader election?
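(In config terms that is the num.replica.fetchers line in server.properties,
e.g. num.replica.fetchers=2; the exact value depends on how much inbound data
each broker has to replicate, so treat the number as illustrative.)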

Jiangjie (Becket) Qin

On 2/25/15, 5:02 PM, Zakee kzak...@netzero.net wrote:

Thanks, Jiangjie.

Yes, I do see under-replicated partitions usually shooting up every hour.
Anything I could try to reduce it?

How does num.replica.fetchers affect the replica sync? Currently have
7 configured on each of 5 brokers.

-Zakee

On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin j...@linkedin.com.invalid
wrote:

 These messages are usually caused by leader migration. I think as long as
 you don't see this lasting forever and getting a bunch of under-replicated
 partitions, it should be fine.

 Jiangjie (Becket) Qin

 On 2/25/15, 4:07 PM, Zakee kzak...@netzero.net wrote:

 Need to know if I should be worried about these or ignore them.
 
 I see tons of these exceptions/warnings in the broker logs, not sure what
 causes them and what could be done to fix them.
 
 ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to
 broker
 5:class kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)
 [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for
 partition [TestTopic] to broker 5:class
 kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)
 [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch
 request
 with correlation id 950084 from client ReplicaFetcherThread-1-2 on
 partition [TestTopic,2] failed due to Leader not local for partition
 [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
 
 
 Any ideas?
 
 -Zakee
 