[jira] [Commented] (KAFKA-1631) ReplicationFactor and under-replicated partitions incorrect during reassignment
[ https://issues.apache.org/jira/browse/KAFKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14396039#comment-14396039 ]

Jay Kreps commented on KAFKA-1631:

Is this behavior really so bad? I actually think handling a reassignment as an add+delete makes some sense...

ReplicationFactor and under-replicated partitions incorrect during reassignment

Key: KAFKA-1631
URL: https://issues.apache.org/jira/browse/KAFKA-1631
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Ryan Berdeen
Assignee: Ewen Cheslack-Postava
Labels: newbie
Attachments: KAFKA-1631-v1.patch

We have a topic with a replication factor of 3. We monitor UnderReplicatedPartitions as recommended by the documentation.

During a partition reassignment, partitions being reassigned are reported as under-replicated. Running a describe shows:

{code}
Topic:activity-wal-1    PartitionCount:15    ReplicationFactor:5    Configs:
    Topic: activity-wal-1    Partition: 0    Leader: 14    Replicas: 14,13,12,11,15    Isr: 14,12,11,13
    Topic: activity-wal-1    Partition: 1    Leader: 14    Replicas: 15,14,11    Isr: 14,11
    Topic: activity-wal-1    Partition: 2    Leader: 11    Replicas: 11,15,12    Isr: 12,11,15
...
{code}

It looks like the displayed replication factor, 5, is simply the number of replicas listed for the first partition, which includes both the brokers in the current list and those onto which the partition is being reassigned.

Partition 0 is also included in the list when using the `--under-replicated-partitions` option, even though it is replicated to more brokers than the true replication factor requires.

During a reassignment, the under-replicated partitions metric is not usable, meaning that actual under-replicated partitions can go unnoticed.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
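To make the reported symptom concrete, here is a minimal Python sketch of the two calculations the description suggests are going wrong. It is not the TopicCommand code: the `PartitionInfo` class and both helper functions are hypothetical names, and the logic is an assumption reconstructed from the describe output quoted in the issue.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartitionInfo:
    """One row of describe output (hypothetical container, not a Kafka class)."""
    partition: int
    leader: int
    replicas: List[int]
    isr: List[int]


# Rows copied from the describe output in the issue description.
PARTITIONS = [
    PartitionInfo(0, 14, [14, 13, 12, 11, 15], [14, 12, 11, 13]),
    PartitionInfo(1, 14, [15, 14, 11], [14, 11]),
    PartitionInfo(2, 11, [11, 15, 12], [12, 11, 15]),
]


def displayed_replication_factor(partitions: List[PartitionInfo]) -> int:
    # Assumed behavior: the tool reports the replica count of the first partition.
    return len(partitions[0].replicas)


def naive_under_replicated(p: PartitionInfo) -> bool:
    # Assumed behavior: a partition is flagged when its ISR is smaller than
    # its current replica list, which grows during a reassignment.
    return len(p.isr) < len(p.replicas)


print(displayed_replication_factor(PARTITIONS))  # 5
print([p.partition for p in PARTITIONS if naive_under_replicated(p)])  # [0, 1]
```

Under these assumptions, partition 0 is flagged although it has four in-sync replicas against an intended factor of 3, while partition 1, which really is one replica short, is indistinguishable from it in the report.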
[jira] [Commented] (KAFKA-1631) ReplicationFactor and under-replicated partitions incorrect during reassignment
[ https://issues.apache.org/jira/browse/KAFKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159388#comment-14159388 ]

Neha Narkhede commented on KAFKA-1631:

The behavior of partition reassignment being old set -> old set + new set -> new set is just an implementation detail that users don't need to know and understand. However, there are 2 ways to report under-replicated partitions today, and this solution fixes one but not the other. For instance, if partitions being reassigned are not reported as under-replicated through the topics tool (with this patch) but are reported by the broker's mbean, users would get confused.

An ideal long-term solution would be to define partition states as being one of the following: new, initializing, ready, migrating, under-replicated (maybe more or less), and to expose the partition's state as one of these through the topic tool as well as JMX. It is possible to get away without having these states if there are maybe just 2 possible states that the partition lives in, but as the number of states increases, it is worth exposing those explicitly. One of these states is under-replicated, and partitions being reassigned should belong to a separate migrating state, not under-replicated.
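The explicit partition states proposed in this comment could be modeled roughly as follows. This is only a sketch: the state names come from the comment (which itself calls the list tentative), while the `classify` function, its parameters, and the rule it applies are hypothetical and cover only three of the five states.

```python
from enum import Enum
from typing import Sequence


class PartitionState(Enum):
    # State names taken from the comment; the list there is explicitly tentative.
    NEW = "new"
    INITIALIZING = "initializing"
    READY = "ready"
    MIGRATING = "migrating"
    UNDER_REPLICATED = "under replicated"


def classify(replicas: Sequence[int], isr: Sequence[int], reassigning: bool) -> PartitionState:
    """Hypothetical rule covering only three of the five states."""
    if reassigning:
        # Partitions being moved get their own state instead of an alarm.
        return PartitionState.MIGRATING
    if len(isr) < len(replicas):
        return PartitionState.UNDER_REPLICATED
    return PartitionState.READY


print(classify([14, 13, 12, 11, 15], [14, 12, 11, 13], reassigning=True).value)  # migrating
print(classify([15, 14, 11], [14, 11], reassigning=False).value)  # under replicated
```

The point of the sketch is that both the topic tool and JMX would report the same single value per partition, rather than each deriving "under-replicated" on its own.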
[jira] [Commented] (KAFKA-1631) ReplicationFactor and under-replicated partitions incorrect during reassignment
[ https://issues.apache.org/jira/browse/KAFKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148304#comment-14148304 ]

Ryan Berdeen commented on KAFKA-1631:

Not reporting partitions being reassigned seems even worse: this would lead to false negatives! It also doesn't address the fact that the replication factor is reported incorrectly.

It seems like the right solution would be to store the intended replication factor for the topic, and alert if the size of the ISR is less than this.
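The check suggested in this comment can be sketched in a few lines of Python. The function name and the `INTENDED_RF` constant are hypothetical; the ISR lists are the ones from the describe output in the issue, and the intended factor of 3 is the one stated there.

```python
from typing import Sequence


def under_replicated(isr: Sequence[int], intended_rf: int) -> bool:
    """True when fewer replicas are in sync than the topic is meant to have."""
    return len(isr) < intended_rf


INTENDED_RF = 3  # replication factor stated in the issue description

print(under_replicated([14, 12, 11, 13], INTENDED_RF))  # False (partition 0)
print(under_replicated([14, 11], INTENDED_RF))          # True  (partition 1)
print(under_replicated([12, 11, 15], INTENDED_RF))      # False (partition 2)
```

Because the comparison no longer depends on the current replica list, a partition in the middle of a reassignment is flagged only when it is actually short of in-sync replicas.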
[jira] [Commented] (KAFKA-1631) ReplicationFactor and under-replicated partitions incorrect during reassignment
[ https://issues.apache.org/jira/browse/KAFKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148559#comment-14148559 ]

Ryan Berdeen commented on KAFKA-1631:

The patch does look like an improvement to {{TopicCommand}}, but it doesn't address the number of under-replicated partitions reported by the brokers. It seems like there shouldn't be multiple definitions of an under-replicated partition.
[jira] [Commented] (KAFKA-1631) ReplicationFactor and under-replicated partitions incorrect during reassignment
[ https://issues.apache.org/jira/browse/KAFKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148680#comment-14148680 ]

Ewen Cheslack-Postava commented on KAFKA-1631:

Right. Unfortunately, most of the system isn't aware of the large-scale change (reassign old set -> new set), only of each intermediate state (old set -> old set + new set -> new set). As it stands, UnderReplicatedPartitions is computed by the Partition class, which is created by ReplicaManager. But the high-level reassignment is managed by KafkaController, which looks like the only place the necessary state is maintained. I think getting the semantics you want may require a much more substantial change, since each partition leader would need to know about the partition reassignment rather than just the controller.

On the other hand, while I think it's less than ideal, the current behavior could certainly be argued to be reasonable, i.e. that reassignment is not natively supported; it's just a higher-level operation you can build up. In this case, the intermediate step is expected, and the temporary reporting of under-replication would make sense, since for a time the desired replication of (old set + new set) has not been achieved.
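The intermediate states described in this comment can be walked through with a small Python simulation. It is a sketch only: the broker ids are made up, the function names are hypothetical, and the ISR at each stage is an assumption (the new brokers are taken to be still catching up until the final stage).

```python
from typing import List, Tuple


def reassignment_steps(old: List[int], new: List[int]) -> List[Tuple[str, List[int]]]:
    """Replica list at each stage: old -> old + new -> new."""
    # The union keeps the old order and appends brokers that appear only in new.
    union = old + [b for b in new if b not in old]
    return [("old", old), ("old + new", union), ("new", new)]


def flagged(replicas: List[int], isr: List[int]) -> bool:
    # Leader-side check as described: the ISR is compared with the current replica list.
    return len(isr) < len(replicas)


old, new = [11, 12, 13], [13, 14, 15]  # hypothetical broker ids
for label, replicas in reassignment_steps(old, new):
    # Assume the new brokers are still catching up until the final stage.
    isr = old if label != "new" else new
    print(label, replicas, flagged(replicas, isr))
# old [11, 12, 13] False
# old + new [11, 12, 13, 14, 15] True
# new [13, 14, 15] False
```

Only the middle stage is flagged, which matches the argument that the leader sees a five-replica assignment it has not yet satisfied and knows nothing about the controller's overall plan.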
[jira] [Commented] (KAFKA-1631) ReplicationFactor and under-replicated partitions incorrect during reassignment
[ https://issues.apache.org/jira/browse/KAFKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131051#comment-14131051 ]

Neha Narkhede commented on KAFKA-1631:

Thanks for reporting the issue, [~rberdeen]. Since partition reassignment involves changing the replicas of a partition, it is tricky to report the under-replicated status correctly at all times. However, one possible improvement is to change the topics tool so that it does not report partitions being reassigned as under-replicated. It is a minor change; feel free to give it a stab.
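The tool-side change suggested here amounts to a filter, sketched below in Python. The function name, the `ROWS` structure, and the `reassigning` set are all hypothetical; in particular, the set of partitions under reassignment is assumed to be supplied by the caller, and how the tool would obtain it is not shown.

```python
from typing import Dict, List, Set, Tuple

# partition id -> (replicas, isr); rows from the describe output in the issue.
ROWS: Dict[int, Tuple[List[int], List[int]]] = {
    0: ([14, 13, 12, 11, 15], [14, 12, 11, 13]),
    1: ([15, 14, 11], [14, 11]),
    2: ([11, 15, 12], [12, 11, 15]),
}


def report_under_replicated(
    rows: Dict[int, Tuple[List[int], List[int]]], reassigning: Set[int]
) -> List[int]:
    """Partitions with a short ISR, skipping those known to be mid-reassignment."""
    return [
        pid
        for pid, (replicas, isr) in sorted(rows.items())
        if pid not in reassigning and len(isr) < len(replicas)
    ]


# Assumption: only partition 0 is currently being reassigned.
print(report_under_replicated(ROWS, reassigning={0}))    # [1]
print(report_under_replicated(ROWS, reassigning=set()))  # [0, 1]
```

Note that this filter suppresses a reassigning partition unconditionally, so a partition that is both being moved and short of in-sync replicas would not appear in the report.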