[ https://issues.apache.org/jira/browse/HDDS-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Glen Geng updated HDDS-4343:
----------------------------
    Description: 
{code:java}
    // If there are unhealthy replicas, then we should remove them even if it
    // makes the container violate the placement policy, as excess unhealthy
    // containers are not really useful. It will be corrected later as a
    // mis-replicated container will be seen as under-replicated.
    for (ContainerReplica r : unhealthyReplicas) {
      if (excess > 0) {
        sendDeleteCommand(container, r.getDatanodeDetails(), true);
        excess -= 1;
      }
      break;
    }
    // After removing all unhealthy replicas, if the container is still over
    // replicated then we need to check if it is already mis-replicated.
    // If it is, we do no harm by removing excess replicas. However, if it is
    // not mis-replicated, then we can only remove replicas if they don't
    // make the container become mis-replicated.
{code}
It seems that the comments want to remove all unhealthy replicas until excess reaches 0? As written, the unconditional break exits the loop after the first replica. I guess it should be:
{code:java}
    for (ContainerReplica r : unhealthyReplicas) {
      if (excess > 0) {
        sendDeleteCommand(container, r.getDatanodeDetails(), true);
        excess -= 1;
      } else {
        break;
      }
    }
{code}

  was:
{code:java}
20/08/28 03:21:53 WARN retry.RetryInvocationHandler: A failover has occurred since the start of call #28868 $Proxy17.submitRequest over nodeId=om3,nodeAddress=vc1330.halxg.cloudera.com:9862
20/08/28 03:21:53 WARN retry.RetryInvocationHandler: A failover has occurred since the start of call #28870 $Proxy17.submitRequest over nodeId=om1,nodeAddress=vc1325.halxg.cloudera.com:9862
20/08/28 03:21:53 WARN retry.RetryInvocationHandler: A failover has occurred since the start of call #28869 $Proxy17.submitRequest over nodeId=om1,nodeAddress=vc1325.halxg.cloudera.com:9862
20/08/28 03:21:54 WARN retry.RetryInvocationHandler: A failover has occurred since the start of call #28871 $Proxy17.submitRequest over nodeId=om1,nodeAddress=vc1325.halxg.cloudera.com:9862
20/08/28 03:21:54 WARN retry.RetryInvocationHandler: A failover has occurred since the start of call #28872 $Proxy17.submitRequest over nodeId=om1,nodeAddress=vc1325.halxg.cloudera.com:9862
20/08/28 03:21:54 WARN retry.RetryInvocationHandler: A failover has occurred since the start of call #28866 $Proxy17.submitRequest over nodeId=om1,nodeAddress=vc1325.halxg.cloudera.com:9862
20/08/28 03:21:54 WARN retry.RetryInvocationHandler: A failover has occurred since the start of call #28867 $Proxy17.submitRequest over nodeId=om1,nodeAddress=vc1325.halxg.cloudera.com:9862
20/08/28 03:21:54 WARN retry.RetryInvocationHandler: A failover has occurred since the start of call #28874 $Proxy17.submitRequest over nodeId=om1,nodeAddress=vc1325.halxg.cloudera.com:9862
20/08/28 03:21:54 WARN retry.RetryInvocationHandler: A failover has occurred since the start of call #28875 $Proxy17.submitRequest over nodeId=om1,nodeAddress=vc1325.halxg.cloudera.com:9862
20/08/28 03:21:54 ERROR freon.BaseFreonGenerator: Error on executing task 14424
KEY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Failed to commit key, as /vol1/bucket1/akjkdz4hoj/14424/104766512182520809entry is not found in the OpenKey table
        at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:593)
        at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.commitKey(OzoneManagerProtocolClientSideTranslatorPB.java:650)
        at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.commitKey(BlockOutputStreamEntryPool.java:306)
        at org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:514)
        at org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:60)
        at org.apache.hadoop.ozone.freon.OzoneClientKeyGenerator.lambda$createKey$0(OzoneClientKeyGenerator.java:118)
        at com.codahale.metrics.Timer.time(Timer.java:101)
        at org.apache.hadoop.ozone.freon.OzoneClientKeyGenerator.createKey(OzoneClientKeyGenerator.java:113)
        at org.apache.hadoop.ozone.freon.BaseFreonGenerator.tryNextTask(BaseFreonGenerator.java:178)
        at org.apache.hadoop.ozone.freon.BaseFreonGenerator.taskLoop(BaseFreonGenerator.java:167)
        at org.apache.hadoop.ozone.freon.BaseFreonGenerator.lambda$startTaskRunners$0(BaseFreonGenerator.java:150)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{code}

> CLONE - OM client request fails with "failed to commit as key is not found in OpenKey table"
> --------------------------------------------------------------------------------------------
>
>                 Key: HDDS-4343
>                 URL: https://issues.apache.org/jira/browse/HDDS-4343
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Blocker
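
The difference between the two loops quoted in the description can be reproduced in isolation. The following is only a minimal sketch, not the SCM code: the class name `UnhealthyReplicaLoopSketch`, both method names, and the use of plain strings in place of `ContainerReplica` are assumptions made for illustration, and adding a name to a list stands in for `sendDeleteCommand`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; none of these names come from Ozone.
class UnhealthyReplicaLoopSketch {

  // Mirrors the loop as quoted in the description: the break is
  // unconditional, so the loop body runs at most once.
  static List<String> deleteWithUnconditionalBreak(
      List<String> unhealthyReplicas, int excess) {
    List<String> deleted = new ArrayList<>();
    for (String r : unhealthyReplicas) {
      if (excess > 0) {
        deleted.add(r); // stands in for sendDeleteCommand(...)
        excess -= 1;
      }
      break;
    }
    return deleted;
  }

  // Mirrors the loop proposed in the description: the break only
  // fires once no excess replicas remain.
  static List<String> deleteUntilNoExcess(
      List<String> unhealthyReplicas, int excess) {
    List<String> deleted = new ArrayList<>();
    for (String r : unhealthyReplicas) {
      if (excess > 0) {
        deleted.add(r); // stands in for sendDeleteCommand(...)
        excess -= 1;
      } else {
        break;
      }
    }
    return deleted;
  }

  public static void main(String[] args) {
    List<String> unhealthy = Arrays.asList("dn1", "dn2", "dn3");
    // Three unhealthy replicas, two of them in excess.
    System.out.println(deleteWithUnconditionalBreak(unhealthy, 2)); // [dn1]
    System.out.println(deleteUntilNoExcess(unhealthy, 2));          // [dn1, dn2]
  }
}
```

With three unhealthy replicas and an excess of 2, the quoted loop deletes only one replica, while the proposed loop deletes two and then stops.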