Hello,

We tried the ReassignPartitionsCommand to move partitions to new brokers. The execution initially showed the message "Successfully started reassignment of partitions ...". But when I tried to verify using the --verify option, it reported that some reassignments had failed:
ERROR: Assigned replicas (0,5,2) don't match the list of replicas for reassignment (0,5) for partition [vhs_playback_event,1]
ERROR: Assigned replicas (4,5,0,2) don't match the list of replicas for reassignment (4,5) for partition [vhs_playback_event,11]
ERROR: Assigned replicas (3,5,0,2) don't match the list of replicas for reassignment (3,5) for partition [vhs_playback_event,16]

I noticed that the assigned replicas in the error messages include both the old and the new assignment. Is this a real error, or does it just mean the partitions are still being copied and the current state does not yet match the final expected state?

Since I was confused by the errors, I ran the same ReassignPartitionsCommand with the same assignment again, but got some additional failure messages complaining that three partitions do not exist:

[2015-01-23 18:15:41,333] ERROR Skipping reassignment of partition [vhs_playback_event,16] since it doesn't exist (kafka.admin.ReassignPartitionsCommand)
[2015-01-23 18:15:41,455] ERROR Skipping reassignment of partition [vhs_playback_event,15] since it doesn't exist (kafka.admin.ReassignPartitionsCommand)
[2015-01-23 18:15:41,499] ERROR Skipping reassignment of partition [vhs_playback_event,17] since it doesn't exist (kafka.admin.ReassignPartitionsCommand)

These partitions later reappeared in the output of --verify.

The other thing is that at one point the BytesOut from one broker exceeded 100 MB, which is quite alarming.

In the end, the reassignment was done according to the input file passed to ReassignPartitionsCommand. But the UnderReplicatedPartitions metric for the brokers keeps showing a positive number, even though the output of the describe topic command and the ZooKeeper data show that the ISRs are all in sync, and Replica-MaxLag is 0.

To sum up, the overall execution was successful, but the error messages are quite noisy and the metric is not consistent with what appears to be the actual state. Has anyone had a similar experience, and is there anything we can do to make the process go more smoothly?
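For what it's worth, here is a rough sketch I'd use to triage the --verify errors. It assumes (this is my guess, not confirmed) that when the assigned replica list is a strict superset of the target list, the old and new replicas are both still attached and the copy is simply in progress. The regex is written against the exact error text quoted above; the function names classify and parse_ids are just illustrative helpers, not part of Kafka.

```python
import re

# Matches the --verify error text quoted above, e.g.
# "Assigned replicas (0,5,2) don't match the list of replicas for
#  reassignment (0,5) for partition [vhs_playback_event,1]"
VERIFY_ERROR = re.compile(
    r"Assigned replicas \(([\d,]+)\) don't match the list of replicas "
    r"for reassignment \(([\d,]+)\) for partition \[([^,\]]+),(\d+)\]"
)


def parse_ids(text):
    """Turn a broker list such as '0,5,2' into [0, 5, 2]."""
    return [int(x) for x in text.split(",")]


def classify(log_text):
    """Map (topic, partition) -> status for every verify error found.

    Assumption: a strict superset (target replicas plus extra old ones)
    is treated as 'in progress'; anything else is reported as 'mismatch'.
    """
    result = {}
    for m in VERIFY_ERROR.finditer(log_text):
        assigned = parse_ids(m.group(1))
        target = parse_ids(m.group(2))
        key = (m.group(3), int(m.group(4)))
        if set(target) < set(assigned):
            result[key] = "in progress"
        else:
            result[key] = "mismatch"
    return result


if __name__ == "__main__":
    sample = (
        "ERROR: Assigned replicas (0,5,2) don't match the list of replicas "
        "for reassignment (0,5) for partition [vhs_playback_event,1]"
    )
    print(classify(sample))
```

Under that assumption, all three partitions in our --verify output would come out as "in progress" rather than genuine failures.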
What can we do to reset the inconsistent UnderReplicatedPartitions metric?

Thanks,
Allen