Hello,

We tried the ReassignPartitionsCommand to move partitions to new brokers.
The execution initially showed the message "Successfully started
reassignment of partitions ...". But when I tried to verify it using the
--verify option, it reported that some reassignments had failed:

ERROR: Assigned replicas (0,5,2) don't match the list of replicas for
reassignment (0,5) for partition [vhs_playback_event,1]
ERROR: Assigned replicas (4,5,0,2) don't match the list of replicas for
reassignment (4,5) for partition [vhs_playback_event,11]
ERROR: Assigned replicas (3,5,0,2) don't match the list of replicas for
reassignment (3,5) for partition [vhs_playback_event,16]

I noticed that the assigned replicas in the error messages include both the
old and the new assignment. Is this a real error, or does it just mean that
the partitions are still being copied and the current state does not yet
match the final expected state?
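
One way to check that guess is to compare each reported replica list with
its target list. Below is a minimal Python sketch, assuming the --verify
output looks exactly like the error lines quoted above; the function name
classify_errors and the labels "superset"/"mismatch" are made up for
illustration and are not part of any Kafka tool. A "superset" result only
suggests that the old replicas have not been dropped yet; it does not prove
that the reassignment is healthy.

```python
import re

# Pattern for the --verify error lines quoted in this message (assumed format).
ERROR_RE = re.compile(
    r"Assigned replicas \(([\d,]+)\) don't match the list of replicas for "
    r"reassignment \(([\d,]+)\) for partition \[([^,\]]+),(\d+)\]"
)


def classify_errors(text):
    """Return (topic, partition, state, extra_brokers) for each error found.

    state is "superset" when every target replica is already assigned and
    only extra (presumably old) replicas remain, otherwise "mismatch".
    """
    # The log lines are hard-wrapped, so collapse all whitespace first.
    flat = " ".join(text.split())
    results = []
    for m in ERROR_RE.finditer(flat):
        assigned = [int(b) for b in m.group(1).split(",")]
        target = [int(b) for b in m.group(2).split(",")]
        extra = [b for b in assigned if b not in target]
        missing = [b for b in target if b not in assigned]
        state = "superset" if not missing else "mismatch"
        results.append((m.group(3), int(m.group(4)), state, extra))
    return results


if __name__ == "__main__":
    sample = """
    ERROR: Assigned replicas (0,5,2) don't match the list of replicas for
    reassignment (0,5) for partition [vhs_playback_event,1]
    """
    for row in classify_errors(sample):
        print(row)
```

For the three errors above, every target broker is already in the assigned
list, and the only difference is the extra brokers (2 in the first case,
0 and 2 in the other two).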

Since I was confused by the errors, I ran the same
ReassignPartitionsCommand with the same assignment again but got some
additional failure messages complaining that three partitions do not exist:

[2015-01-23 18:15:41,333] ERROR Skipping reassignment of partition
[vhs_playback_event,16] since it doesn't exist
(kafka.admin.ReassignPartitionsCommand)
[2015-01-23 18:15:41,455] ERROR Skipping reassignment of partition
[vhs_playback_event,15] since it doesn't exist
(kafka.admin.ReassignPartitionsCommand)
[2015-01-23 18:15:41,499] ERROR Skipping reassignment of partition
[vhs_playback_event,17] since it doesn't exist
(kafka.admin.ReassignPartitionsCommand)

These partitions later reappeared in the output of --verify.

The other thing is that at one point the BytesOut metric on one broker
exceeded 100 Mbytes, which is quite alarming.

In the end, the reassignment completed according to the input file given to
ReassignPartitionsCommand. But the UnderReplicatedPartitions metric for the
brokers keeps showing a positive number, even though the output of the
describe topic command and the ZooKeeper data show that the ISRs are all in
sync, and Replica-MaxLag is 0.

To sum up, the overall execution was successful, but the error messages are
quite noisy and the UnderReplicatedPartitions metric is not consistent with
what the describe topic command and ZooKeeper report.

Has anyone had a similar experience, and is there anything we can do to
make this go more smoothly? What can we do to reset the inconsistent
UnderReplicatedPartitions metric?

Thanks,
Allen
