Re: Broker Exceptions
Hi Mayuresh,

Here are the logs.

Broker-4
[2015-03-13 17:49:40,514] INFO Partition [Topic22kv,5] on broker 4: Shrinking ISR for partition [Topic22kv,5] from 2,4,3 to 2,4 (kafka.cluster.Partition)
[2015-03-13 17:49:40,514] INFO Partition [Topic22kv,5] on broker 4: Shrinking ISR for partition [Topic22kv,5] from 2,4,3 to 2,4 (kafka.cluster.Partition)
[2015-03-13 17:49:40,514] INFO Partition [Topic22kv,5] on broker 4: Shrinking ISR for partition [Topic22kv,5] from 2,4,3 to 2,4 (kafka.cluster.Partition)
[2015-03-13 17:49:40,762] INFO Partition [Topic22kv,5] on broker 4: Expanding ISR for partition [Topic22kv,5] from 2,4 to 2,4,3 (kafka.cluster.Partition)
[2015-03-13 18:47:10,514] INFO Partition [Topic22kv,5] on broker 4: Shrinking ISR for partition [Topic22kv,5] from 2,4,3 to 2,4 (kafka.cluster.Partition)
[2015-03-13 18:47:10,514] INFO Partition [Topic22kv,5] on broker 4: Shrinking ISR for partition [Topic22kv,5] from 2,4,3 to 2,4 (kafka.cluster.Partition)
[2015-03-13 18:47:10,514] INFO Partition [Topic22kv,5] on broker 4: Shrinking ISR for partition [Topic22kv,5] from 2,4,3 to 2,4 (kafka.cluster.Partition)
[2015-03-13 18:47:11,539] INFO Partition [Topic22kv,5] on broker 4: Expanding ISR for partition [Topic22kv,5] from 2,4 to 2,4,3 (kafka.cluster.Partition)
[2015-03-13 20:09:10,460] INFO Deleting index /vol11/kafka82/Topic22kv-5/001213156892.index.deleted (kafka.log.OffsetIndex)
[2015-03-13 20:23:10,377] INFO Scheduling log segment 1218520176 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-13 20:24:10,377] INFO Deleting segment 1218520176 from log Topic22kv-5. (kafka.log.Log)
[2015-03-13 20:24:10,444] INFO Deleting index /vol11/kafka82/Topic22kv-5/001218520176.index.deleted (kafka.log.OffsetIndex)
[2015-03-13 20:26:28,789] INFO Rolled new log segment for 'Topic22kv-5' in 1 ms. (kafka.log.Log)
[2015-03-13 20:38:10,333] INFO Scheduling log segment 1223883126 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-13 20:39:10,334] INFO Deleting segment 1223883126 from log Topic22kv-5. (kafka.log.Log)
[2015-03-13 20:39:10,478] INFO Deleting index /vol11/kafka82/Topic22kv-5/001223883126.index.deleted (kafka.log.OffsetIndex)
[2015-03-13 20:46:16,924] INFO Rolled new log segment for 'Topic22kv-5' in 1 ms. (kafka.log.Log)
[2015-03-13 20:53:10,370] INFO Scheduling log segment 1229245987 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-13 20:54:10,371] INFO Deleting segment 1229245987 from log Topic22kv-5. (kafka.log.Log)
[2015-03-13 20:54:10,444] INFO Deleting index /vol11/kafka82/Topic22kv-5/001229245987.index.deleted (kafka.log.OffsetIndex)
[2015-03-13 21:03:10,347] INFO Scheduling log segment 1234609321 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-13 21:04:10,348] INFO Deleting segment 1234609321 from log Topic22kv-5. (kafka.log.Log)
[2015-03-13 21:04:10,519] INFO Deleting index /vol11/kafka82/Topic22kv-5/001234609321.index.deleted (kafka.log.OffsetIndex)
[2015-03-13 21:06:07,810] INFO Rolled new log segment for 'Topic22kv-5' in 0 ms. (kafka.log.Log)
[2015-03-13 21:18:10,355] INFO Scheduling log segment 1239972496 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-13 21:19:10,408] INFO Deleting segment 1239972496 from log Topic22kv-5. (kafka.log.Log)
[2015-03-13 21:19:10,516] INFO Deleting index /vol11/kafka82/Topic22kv-5/001239972496.index.deleted (kafka.log.OffsetIndex)
[2015-03-13 21:25:49,058] INFO Rolled new log segment for 'Topic22kv-5' in 0 ms. (kafka.log.Log)
[2015-03-13 21:38:10,344] INFO Scheduling log segment 1245335417 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-13 21:39:10,443] INFO Deleting segment 1245335417 from log Topic22kv-5. (kafka.log.Log)
[2015-03-13 21:39:10,539] INFO Deleting index /vol11/kafka82/Topic22kv-5/001245335417.index.deleted (kafka.log.OffsetIndex)
[2015-03-13 21:45:30,609] INFO Rolled new log segment for 'Topic22kv-5' in 1 ms. (kafka.log.Log)
[2015-03-13 21:53:10,340] INFO Scheduling log segment 1250698493 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-13 21:54:10,340] INFO Deleting segment 1250698493 from log Topic22kv-5. (kafka.log.Log)
[2015-03-13 21:54:10,456] INFO Deleting index /vol11/kafka82/Topic22kv-5/001250698493.index.deleted (kafka.log.OffsetIndex)
[2015-03-13 22:05:31,719] INFO Rolled new log segment for 'Topic22kv-5' in 1 ms. (kafka.log.Log)
[2015-03-13 22:13:10,333] INFO Scheduling log segment 1256061631 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-13 22:14:10,333] INFO Deleting segment 1256061631 from log Topic22kv-5. (kafka.log.Log)
[2015-03-13 22:14:10,406] INFO Deleting index /vol11/kafka82/Topic22kv-5/001256061631.index.deleted (kafka.log.OffsetIndex)
[2015-03-13 22:25:12,659] INFO Rolled new log segment for 'Topic22kv-5' in 0 ms. (kafka.log.Log)
[2015-03-13 22:33:10,390] INFO Scheduling log segment 1261424153 for log Topic22kv-5 for deletion. (kafka.log.Log)
Re: Broker Exceptions
What version are you running ?

Version 0.8.2.0

Your case is 2). But the only thing weird is your replica (broker 3) is requesting an offset which is greater than the leader's log end offset. So what could be the cause?

Thanks
Zakee

On Mar 17, 2015, at 11:45 AM, Mayuresh Gharat gharatmayures...@gmail.com wrote:

What version are you running ? The code for the latest version says that:

1) If the log end offset of the replica is greater than the leader's log end offset, the replica's offset will be reset to the logEndOffset of the leader.
2) Else, if the log end offset of the replica is smaller than the leader's log end offset and it is out of range, the replica's offset will be reset to the logStartOffset of the leader.

Your case is 2). But the only thing weird is your replica (broker 3) is requesting an offset which is greater than the leader's log end offset.

Thanks,
Mayuresh

On Tue, Mar 17, 2015 at 10:26 AM, Mayuresh Gharat gharatmayures...@gmail.com wrote:

cool.

On Tue, Mar 17, 2015 at 10:15 AM, Zakee kzak...@netzero.net wrote:

Hi Mayuresh,

The logs are already attached and are in reverse order, starting backwards from [2015-03-14 07:46:52,517] to the time when the brokers were started.

Thanks
Zakee

On Mar 17, 2015, at 12:07 AM, Mayuresh Gharat gharatmayures...@gmail.com wrote:

Hi Zakee,

Thanks for the logs. Can you paste earlier logs from broker-3, up to:

[2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current offset 1754769769 for partition [Topic22kv,5] out of range; reset offset to 1400864851 (kafka.server.ReplicaFetcherThread)

That would help us figure out what was happening on this broker before it issued a replica fetch request to broker-4.

Thanks,
Mayuresh

On Mon, Mar 16, 2015 at 11:32 PM, Zakee kzak...@netzero.net wrote:

Hi Mayuresh,

Here are the logs.

Thanks,
Kazim Zakee

On Mar 16, 2015, at 10:48 AM, Mayuresh Gharat gharatmayures...@gmail.com wrote:

Can you provide more (complete) logs on Broker 3 up to:

[2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3 for partition [Topic22kv,5] reset its fetch offset from 1400864851 to current leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)

I would like to see logs from a time well before it sent the fetch request to Broker 4, up to the time above. I want to check whether Broker 3 was at any point the leader before broker 4 took over. Additional logs will help.

Thanks,
Mayuresh

On Sat, Mar 14, 2015 at 8:35 PM, Zakee kzak...@netzero.net wrote:

log.cleanup.policy is delete, not compact.

log.cleaner.enable=true
log.cleaner.threads=5
log.cleanup.policy=delete
log.flush.scheduler.interval.ms=3000
log.retention.minutes=1440
log.segment.bytes=1073741824 (1gb)

Messages are keyed but not compressed; the producer is async and uses the Kafka default partitioner.

String message = msg.getString();
String uniqKey = "" + rnd.nextInt();   // random key
String partKey = getPartitionKey();    // partition key
KeyedMessage<String, String> data = new KeyedMessage<String, String>(this.topicName, uniqKey, partKey, message);
producer.send(data);

Thanks
Zakee

On Mar 14, 2015, at 4:23 PM, gharatmayures...@gmail.com wrote:

Is your topic log compacted? Also, if it is, are the messages keyed? Or are the messages compressed?

Thanks,
Mayuresh

Sent from my iPhone

On Mar 14, 2015, at 2:02 PM, Zakee kzak...@netzero.net wrote:

Thanks, Jiangjie, for helping resolve the Kafka controller-migration-driven partition leader rebalance issue. The logs are much cleaner now.

There are a few incidences of out-of-range offsets even though there are no consumers running, only producers and replica fetchers. I was trying to relate this to a cause; it looks like compaction (log segment deletion) is causing it. Not sure whether this is expected behavior.

Broker-4:
[2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error when processing fetch request for partition [Topic22kv,5] offset 1754769769 from follower with correlation id 1645671. Possible cause: Request for offset 1754769769 but we only have log segments in the range 1400864851 to 1754769732. (kafka.server.ReplicaManager)

Broker-3:
[2015-03-14 07:46:52,356] INFO The cleaning for partition [Topic22kv,5] is aborted and paused (kafka.log.LogCleaner)
[2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for log Topic22kv-5 for deletion. (kafka.log.Log)
…
[2015-03-14 07:46:52,421] INFO Compaction for partition [Topic22kv,5] is resumed (kafka.log.LogCleaner)
[2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current offset 1754769769 for partition [Topic22kv,5] out of range; reset offset to 1400864851 (kafka.server.ReplicaFetcherThread)
[2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3 for partition [Topic22kv,5] reset its fetch offset from 1400864851 to current leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)

topic22kv_746a_314_logs.txt

Thanks
Zakee
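[Editor's note: given the config quoted above (log.retention.minutes=1440, log.segment.bytes=1073741824), time-based retention makes segments eligible for deletion once they are about 24 hours old, which is consistent with the steady segment deletions in the broker-4 log. A toy sketch of that rule, under the assumption that 0.8.x keys retention off the segment file's last-modified time; the method name is invented for illustration:]

    // Toy time-based retention check implied by log.retention.minutes=1440 (24 h).
    // Illustrative only; not the actual broker code.
    boolean pastRetention(long segmentLastModifiedMs, long nowMs) {
        final long retentionMs = 1440L * 60L * 1000L; // log.retention.minutes in ms
        return nowMs - segmentLastModifiedMs > retentionMs;
    }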
Re: Broker Exceptions
What version are you running ? The code for the latest version says that:

1) If the log end offset of the replica is greater than the leader's log end offset, the replica's offset will be reset to the logEndOffset of the leader.
2) Else, if the log end offset of the replica is smaller than the leader's log end offset and it is out of range, the replica's offset will be reset to the logStartOffset of the leader.

Your case is 2). But the only thing weird is your replica (broker 3) is requesting an offset which is greater than the leader's log end offset.

Thanks,
Mayuresh
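[Editor's note: a minimal sketch of the two-case reset rule Mayuresh describes, in illustrative Java. The real 0.8.2 logic is Scala, in ReplicaFetcherThread's out-of-range handling; the method and parameter names below are invented for clarity:]

    // Follower-side offset reset rule as described above (illustrative, not broker code).
    long resetFetchOffset(long replicaLogEndOffset,
                          long leaderLogEndOffset,
                          long leaderLogStartOffset) {
        if (replicaLogEndOffset > leaderLogEndOffset) {
            // Case 1: the follower is ahead of the leader;
            // truncate back to the leader's log end offset.
            return leaderLogEndOffset;
        }
        // Case 2: the follower is behind and its offset is out of range (its data
        // fell off the leader's log, e.g. deleted by retention); restart from the
        // leader's log start offset.
        return leaderLogStartOffset;
    }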
Re: Broker Exceptions
Hi Mayuresh,

The logs are already attached and are in reverse order, starting backwards from [2015-03-14 07:46:52,517] to the time when the brokers were started.

Thanks
Zakee
Re: Broker Exceptions
We are trying to see what might have caused it. We had some questions:

1) Is this reproducible? That way we can dig deeper.

This looks like an interesting problem to solve, and you might have caught a bug, but we need to verify the root cause before filing a ticket.

Thanks,
Mayuresh
Re: Broker Exceptions
Can you provide more (complete) logs on Broker 3 up to:

[2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3 for partition [Topic22kv,5] reset its fetch offset from 1400864851 to current leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)

I would like to see logs from a time well before it sent the fetch request to Broker 4, up to the time above. I want to check whether Broker 3 was at any point the leader before broker 4 took over. Additional logs will help.

Thanks,
Mayuresh
Re: Broker Exceptions
Thanks, Jiangjie, for helping resolve the Kafka controller-migration-driven partition leader rebalance issue. The logs are much cleaner now.

There are a few incidences of out-of-range offsets even though there are no consumers running, only producers and replica fetchers. I was trying to relate this to a cause; it looks like compaction (log segment deletion) is causing it. Not sure whether this is expected behavior.

Broker-4:
[2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error when processing fetch request for partition [Topic22kv,5] offset 1754769769 from follower with correlation id 1645671. Possible cause: Request for offset 1754769769 but we only have log segments in the range 1400864851 to 1754769732. (kafka.server.ReplicaManager)

Broker-3:
[2015-03-14 07:46:52,356] INFO The cleaning for partition [Topic22kv,5] is aborted and paused (kafka.log.LogCleaner)
[2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for log Topic22kv-5 for deletion. (kafka.log.Log)
…
[2015-03-14 07:46:52,421] INFO Compaction for partition [Topic22kv,5] is resumed (kafka.log.LogCleaner)
[2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current offset 1754769769 for partition [Topic22kv,5] out of range; reset offset to 1400864851 (kafka.server.ReplicaFetcherThread)
[2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3 for partition [Topic22kv,5] reset its fetch offset from 1400864851 to current leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)

On broker-4:
[2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error when processing fetch request for partition [Topic22kv,5] offset 1754769769 from follower with correlation id 1645671. Possible cause: Request for offset 1754769769 but we only have log segments in the range 1400864851 to 1754769732. (kafka.server.ReplicaManager)
[2015-03-14 07:46:52,759] INFO Closing socket connection to /19.10.4.143. (kafka.network.Processor)

On broker-3:
[2015-03-14 07:46:52,356] INFO The cleaning for partition [Topic22kv,5] is aborted and paused (kafka.log.LogCleaner)
[2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,408] INFO Scheduling log segment 1406227848 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,408] INFO Scheduling log segment 1411591123 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,408] INFO Scheduling log segment 1416954195 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,409] INFO Scheduling log segment 1422317783 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,409] INFO Scheduling log segment 1427680989 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,409] INFO Scheduling log segment 1433044302 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,409] INFO Scheduling log segment 1438407760 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,409] INFO Scheduling log segment 1443770521 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,409] INFO Scheduling log segment 1449133811 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,409] INFO Scheduling log segment 1454497169 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,410] INFO Scheduling log segment 1459860085 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,411] INFO Scheduling log segment 1465223478 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,411] INFO Scheduling log segment 1470586720 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,412] INFO Scheduling log segment 1475949659 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,412] INFO Scheduling log segment 1481312627 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,412] INFO Scheduling log segment 1486675299 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,412] INFO Scheduling log segment 1492038376 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,413] INFO Scheduling log segment 1497401497 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,413] INFO Scheduling log segment 1502764133 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,413] INFO Scheduling log segment 1508126631 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,413] INFO Scheduling log segment 1513489256 for log Topic22kv-5 for deletion. (kafka.log.Log)
[2015-03-14 07:46:52,413] INFO Scheduling log segment 1518852045 for log Topic22kv-5 for deletion. (kafka.log.Log)
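[Editor's note: the arithmetic in broker-4's error is worth spelling out: the follower asked for offset 1754769769, but the leader's log ended at 1754769732, i.e. 37 offsets earlier. A hedged sketch of the bounds check implied by that message, in illustrative Java (the real check lives in the Scala broker's log layer; the method name is invented):]

    // Illustrative version of the range check behind the ReplicaManager error above:
    // a fetch offset must fall inside the retained segments [logStartOffset, logEndOffset].
    boolean fetchOffsetInRange(long fetchOffset, long logStartOffset, long logEndOffset) {
        return fetchOffset >= logStartOffset && fetchOffset <= logEndOffset;
    }
    // Here 1754769769 > 1754769732, so the fetch fails as out-of-range and the
    // follower resets to the leader's start offset, 1400864851.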
Re: Broker Exceptions
Hmm, that sounds like a bug. Can you paste the log of leader rebalance here?

Thanks for your suggestions. It looks like the rebalance actually happened only once, soon after I started with a clean cluster and data was pushed; it didn't happen again so far, and I see the partition leader counts on the brokers have not changed since then. One of the brokers was constantly showing 0 for partition leader count. Is that normal?

Also, I still see lots of the errors below (~69k) going on in the logs since the restart. Is there any reason other than rebalance for these errors?

[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)

Some other things to check are:
1. The actual property name is auto.leader.rebalance.enable, not auto.leader.rebalance. You've probably known this, just to double confirm.

Yes

2. In zookeeper path, can you verify /admin/preferred_replica_election does not exist?

ls /admin
[delete_topics]
ls /admin/preferred_replica_election
Node does not exist: /admin/preferred_replica_election

Thanks
Zakee

On Mar 7, 2015, at 10:49 PM, Jiangjie Qin j...@linkedin.com.INVALID wrote:

Hmm, that sounds like a bug. Can you paste the log of leader rebalance here?

Some other things to check are:
1. The actual property name is auto.leader.rebalance.enable, not auto.leader.rebalance. You've probably known this, just to double confirm.
2. In zookeeper path, can you verify /admin/preferred_replica_election does not exist?

Jiangjie (Becket) Qin

On 3/7/15, 10:24 PM, Zakee kzak...@netzero.net wrote:

I started with a clean cluster and started to push data. It still does the rebalance at random intervals even though auto.leader.rebalance is set to false.

Thanks
Zakee

On Mar 6, 2015, at 3:51 PM, Jiangjie Qin j...@linkedin.com.INVALID wrote:

Yes, the rebalance should not happen in that case. That is a little bit strange. Could you try to launch a clean Kafka cluster with auto.leader.election disabled and try pushing data? When leader migration occurs, a NotLeaderForPartition exception is expected.

Jiangjie (Becket) Qin

On 3/6/15, 3:14 PM, Zakee kzak...@netzero.net wrote:

Yes, Jiangjie, I do see lots of these "Starting preferred replica leader election for partitions" entries in the logs. I also see a lot of produce-request failure warnings with the NotLeader exception. I tried switching auto.leader.rebalance to false. I am still noticing the rebalance happening. My understanding was the rebalance will not happen when this is set to false.

Thanks
Zakee

On Feb 25, 2015, at 5:17 PM, Jiangjie Qin j...@linkedin.com.INVALID wrote:

I don't think num.replica.fetchers will help in this case. Increasing the number of fetcher threads will only help in cases where you have a large amount of data coming into a broker and more replica fetcher threads will help keep up. We usually only use 1-2 for each broker. But in your case, it looks like leader migration caused the issue. Do you see anything else in the log, like preferred leader election?

Jiangjie (Becket) Qin

On 2/25/15, 5:02 PM, Zakee kzak...@netzero.net wrote:

Thanks, Jiangjie. Yes, I do see under-replicated partitions, usually spiking every hour. Anything I could try to reduce that? How does num.replica.fetchers affect the replica sync? I currently have 7 configured on each of 5 brokers.

-Zakee

On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin j...@linkedin.com.invalid wrote:

These messages are usually caused by leader migration. I think as long as you don't see this lasting forever and getting a bunch of under-replicated partitions, it should be fine.

Jiangjie (Becket) Qin

On 2/25/15, 4:07 PM, Zakee kzak...@netzero.net wrote:

I need to know whether I should be worried about this or can ignore it. I see tons of these exceptions/warnings in the broker logs, and I am not sure what causes them or what could be done to fix them.

ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
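[Editor's note: as a concrete reference for the property-name point confirmed above, disabling automatic preferred-leader rebalancing is a single broker setting in server.properties. Shown as an example only; the surrounding file contents are whatever your broker already has:]

    # server.properties: disable automatic preferred-replica (leader) rebalancing
    auto.leader.rebalance.enable=false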
Re: Broker Exceptions
Correction: The rebalance actually kept happening until about 24 hours after the start, and that is when the errors below were found. Ideally the rebalance should not have happened at all.

Thanks
Zakee

On Mar 9, 2015, at 10:28 AM, Zakee kzak...@netzero.net wrote:

Also, I still see lots of the errors below (~69k) going on in the logs since the restart. Is there any reason other than rebalance for these errors?

[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
…
Re: Broker Exceptions
Is there anything wrong with the brokers around that time? E.g. a broker restart? The logs you pasted are actually from replica fetchers. Could you paste the related logs from controller.log? Thanks.

Jiangjie (Becket) Qin

On 3/9/15, 10:32 AM, Zakee kzak...@netzero.net wrote:

Correction: The rebalance actually kept happening until about 24 hours after the start, and that is when the errors below were found. Ideally the rebalance should not have happened at all.
…
Re: Broker Exceptions
No broker restarts.

Created a kafka issue: https://issues.apache.org/jira/browse/KAFKA-2011

Logs for rebalance:
[2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
[2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed preferred replica election: (kafka.controller.KafkaController)
…
[2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
...
[2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
...
[2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica leader election for partitions (kafka.controller.KafkaController)
...
[2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing preferred replica election: (kafka.controller.KafkaController)

Also, I still see lots of the errors below (~69k) going on in the logs since the restart. Is there any reason other than rebalance for these errors?

[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)

Could you paste the related logs in controller.log?

What specifically should I search for in the logs?

Thanks,
Zakee

On Mar 9, 2015, at 11:35 AM, Jiangjie Qin j...@linkedin.com.INVALID wrote:

Is there anything wrong with the brokers around that time? E.g. a broker restart? The logs you pasted are actually from replica fetchers. Could you paste the related logs from controller.log? Thanks.

Jiangjie (Becket) Qin
…
Re: Broker Exceptions
I started with a clean cluster and started to push data. It still does the rebalance at random times even though auto.leader.rebalance is set to false.

Thanks, Zakee

On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:

Yes, the rebalance should not happen in that case. That is a little bit strange. Could you try to launch a clean Kafka cluster with auto leader election disabled and try to push data? When leader migration occurs, the NotLeaderForPartitionException is expected.

Jiangjie (Becket) Qin
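(For reference, the broker settings involved, as a minimal server.properties sketch; the interval and percentage shown are the Kafka defaults, not values taken from this cluster:)

# server.properties (excerpt)
# Disable automatic preferred-leader rebalancing.
auto.leader.rebalance.enable=false
# These two only take effect when the flag above is true (defaults shown).
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10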
Re: Broker Exceptions
Yes, Jiangjie, I do see lots of these "Starting preferred replica leader election for partitions" entries in the logs. I also see a lot of produce request failure warnings with the NotLeaderForPartitionException.

I tried setting auto.leader.rebalance to false. I am still noticing the rebalance happening. My understanding was that the rebalance will not happen when this is set to false.

Thanks, Zakee

On Feb 25, 2015, at 5:17 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:

I don't think num.replica.fetchers will help in this case. Increasing the number of fetcher threads only helps in cases where a large amount of data is coming into a broker and more replica fetcher threads help keep up. We usually use only 1-2 per broker. But in your case, it looks like leader migration is causing the issue. Do you see anything else in the log, like preferred leader election?

Jiangjie (Becket) Qin
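(A count like the ~69k mentioned earlier can be reproduced with something along these lines; the log file path is an assumption:)

$ grep -c "kafka.common.NotLeaderForPartitionException" /var/log/kafka/server.log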
Re: Broker Exceptions
Yes, the rebalance should not happen in that case. That is a little bit strange. Could you try to launch a clean Kafka cluster with auto leader election disabled and try to push data? When leader migration occurs, the NotLeaderForPartitionException is expected.

Jiangjie (Becket) Qin

On 3/6/15, 3:14 PM, Zakee <kzak...@netzero.net> wrote:

Yes, Jiangjie, I do see lots of these "Starting preferred replica leader election for partitions" entries in the logs. I also see a lot of produce request failure warnings with the NotLeaderForPartitionException.

I tried setting auto.leader.rebalance to false. I am still noticing the rebalance happening. My understanding was that the rebalance will not happen when this is set to false.

Thanks, Zakee
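(A minimal way to push test data for such an experiment, assuming a broker on localhost:9092 and a topic named TestTopic:)

$ echo "test message" | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TestTopic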
Re: Broker Exceptions
Thanks, Jiangjie, I will try with a clean cluster again.

Thanks, Zakee

On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:

Yes, the rebalance should not happen in that case. That is a little bit strange. Could you try to launch a clean Kafka cluster with auto leader election disabled and try to push data? When leader migration occurs, the NotLeaderForPartitionException is expected.

Jiangjie (Becket) Qin
Broker Exceptions
Need to know if I should be worried about these or ignore them. I see tons of these exceptions/warnings in the broker logs, and I'm not sure what causes them or what could be done to fix them.

ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch request with correlation id 950084 from client ReplicaFetcherThread-1-2 on partition [TestTopic,2] failed due to Leader not local for partition [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)

Any ideas?

-Zakee
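(One way to see which broker currently leads each partition, and therefore whether leadership is moving around; localhost:2181 is a placeholder for the zookeeper connect string:)

$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic TestTopic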
Re: Broker Exceptions
These messages are usually caused by leader migration. I think as long as this doesn't last forever and you don't end up with a bunch of under-replicated partitions, it should be fine.

Jiangjie (Becket) Qin

On 2/25/15, 4:07 PM, Zakee <kzak...@netzero.net> wrote:

Need to know if I should be worried about these or ignore them. I see tons of these exceptions/warnings in the broker logs, and I'm not sure what causes them or what could be done to fix them.

ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch request with correlation id 950084 from client ReplicaFetcherThread-1-2 on partition [TestTopic,2] failed due to Leader not local for partition [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)

Any ideas?

-Zakee
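(One way to check for under-replicated partitions, as suggested above; localhost:2181 is a placeholder:)

$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --under-replicated-partitions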
Re: Broker Exceptions
Thanks, Jiangjie. Yes, I do see under-replicated partitions usually shooting up every hour. Anything I could try to reduce that? How does num.replica.fetchers affect the replica sync? We currently have 7 configured on each of the 5 brokers.

-Zakee

On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:

These messages are usually caused by leader migration. I think as long as this doesn't last forever and you don't end up with a bunch of under-replicated partitions, it should be fine.

Jiangjie (Becket) Qin
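(For reference, this is a single broker-side setting in server.properties; the Kafka default is 1:)

# server.properties (excerpt)
# Number of threads used to replicate messages from leaders.
num.replica.fetchers=7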
Re: Broker Exceptions
I don't think num.replica.fetchers will help in this case. Increasing the number of fetcher threads only helps in cases where a large amount of data is coming into a broker and more replica fetcher threads help keep up. We usually use only 1-2 per broker. But in your case, it looks like leader migration is causing the issue. Do you see anything else in the log, like preferred leader election?

Jiangjie (Becket) Qin

On 2/25/15, 5:02 PM, Zakee <kzak...@netzero.net> wrote:

Thanks, Jiangjie. Yes, I do see under-replicated partitions usually shooting up every hour. Anything I could try to reduce that? How does num.replica.fetchers affect the replica sync? We currently have 7 configured on each of the 5 brokers.

-Zakee
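(To look for the preferred leader election activity mentioned above, something like the following should surface it; the log paths are assumptions:)

$ grep "preferred replica" /var/log/kafka/controller.log /var/log/kafka/server.log | head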