sodonnel commented on a change in pull request #881:
URL: https://github.com/apache/hadoop-ozone/pull/881#discussion_r420707400
##########
File path:
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -512,25 +523,60 @@ private void handleUnderReplicatedContainer(final
ContainerInfo container,
if (source.size() > 0) {
final int replicationFactor = container
.getReplicationFactor().getNumber();
- final int delta = replicationFactor - getReplicaCount(id, replicas);
+ // Want to check if the container is mis-replicated after considering
+ // inflight add and delete.
+ // Create a new list from source (healthy replicas minus pending delete)
+ List<DatanodeDetails> targetReplicas = new ArrayList<>(source);
+ // Then add any pending additions
+ targetReplicas.addAll(replicationInFlight);
+
+ int delta = replicationFactor - getReplicaCount(id, replicas);
+ final ContainerPlacementStatus placementStatus =
+ containerPlacement.validateContainerPlacement(
+ targetReplicas, replicationFactor);
+ final int misRepDelta = placementStatus.additionalReplicaRequired();
Review comment:
> I understand the statement "a container has a valid placement", as "we
have the replication factor amount of replicas placed as we want them to be
placed"
Placement and replication are two different things, and I think it is good
to keep them that way.
If you consider HDFS fsck - if a container is under replicated, it will
report that. If it is mis-replicated it will report that. If it is both, it
reports them both separately, along with their "missing count" for each. In
HDFS for racks, there can only ever be 1 additional rack needed. But with
upgrade domain, there can be possibly two additional domains needed.
> I see this as something that still implicitly brings in rack awareness
into ReplicationManager via an arbitrary int value
It brings mis-replication into the RM, not rack awareness, and this is OK. A
container needs to meet several goals to be properly replicated: the correct
number of healthy replicas, and the correct placement. This is what the
replication manager checks.
In Recon, we want to be able to give an indicator about mis-replication and
under-replication separately in a similar way, and ideally Recon would just
load the exact same placement policy the cluster is using and call the same
methods.
> I am unhappy about the check in lines from L562, and suggested a way
there if we can achieve it.
This check is still useful. If we combined the two rules as you are
suggesting, so "additionalReplicaRequired" returns a single number covering
both delta and misRep, then consider the case where delta=1 and misRep=1 (ie
two replicas on one rack). The newly selected node may, due to fallback, give
us 3 replicas on one rack. This is OK, but the next time this container is
checked we have 3 replicas on one rack and the logic will return 1, telling
us to create another. We need to be sure this improves the mis-replication
before creating a 4th replica, as that would make the container over-replicated.
Of course, we can check:
* If we test 2 replicas with 1 rack, the logic will return 1.
* If we test 3 replicas with 1 rack, the logic will return 1.
* If we test 4 replicas with 1 rack, the logic will return 1.
So if we push the under- and mis-replication numbers together, we cannot
really see what is going on, and I think we will still need to check the
replication count separately anyway.
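To make the ambiguity concrete, here is a minimal standalone sketch (not Ozone code; `additionalRacksRequired` is a hypothetical stand-in for the rack-counting logic) showing that with a 2-rack policy, 2, 3 or 4 replicas crowded onto one rack all report the same "1 more rack needed", so the placement delta alone cannot tell us the replica count:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class PlacementDeltaDemo {

  // Hypothetical helper: racks still required is the policy's minimum rack
  // count minus the distinct racks currently in use, floored at zero.
  static int additionalRacksRequired(List<String> replicaRacks,
      int requiredRacks) {
    int distinct = new HashSet<>(replicaRacks).size();
    return Math.max(0, requiredRacks - distinct);
  }

  public static void main(String[] args) {
    int requiredRacks = 2; // e.g. a 2-rack placement policy
    // 2, 3 and 4 replicas, all on rack "r1": same answer every time.
    System.out.println(
        additionalRacksRequired(Arrays.asList("r1", "r1"), requiredRacks));
    System.out.println(
        additionalRacksRequired(Arrays.asList("r1", "r1", "r1"), requiredRacks));
    System.out.println(
        additionalRacksRequired(Arrays.asList("r1", "r1", "r1", "r1"), requiredRacks));
  }
}
```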
I can give these methods new names or improve the Javadoc a bit more to make
it clear that placement != replication and that the number returned from the
ContainerPlacementStatus object does not reflect the replica count. I've made
a small change to the doc now and will push it up:
```
/**
* Returns a boolean indicating if the container replicas meet the desired
* placement policy. That is, they are placed on a sufficient number of
* racks, or node groups etc. This does not check if the container is over
* or under replicated, as it is possible for a container to have enough
* replicas and still not meet the placement rules.
* @return True if the container meets the policy. False otherwise.
*/
boolean isPolicySatisfied();
/**
* Returns a String describing why a container does not meet the placement
* policy.
* @return String indicating the reason the policy is not satisfied, or
* null if the policy is satisfied.
*/
String misReplicatedReason();
/**
* If the container does not meet the placement policy, return an integer
* indicating how many additional replicas are required so the container
* meets the placement policy. Otherwise return zero.
* Note the count returned is the number of replicas needed to meet the
* placement policy. The container may need additional replicas if it is
* under replicated. The container could also have sufficient replicas but
* require more to meet the policy, as the existing replicas are not
* placed correctly.
* @return The number of additional replicas required, or zero.
*/
int additionalReplicaRequired();
```
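For context, here is a rough sketch of how a caller can keep the two signals separate, as described above. `SimplePlacementStatus` is an illustrative stand-in, not the real ContainerPlacementStatus implementation; only the method names come from the Javadoc:

```java
import java.util.HashSet;
import java.util.List;

public class PlacementCheckDemo {

  // Trivial rack-based stand-in following the Javadoc above.
  static class SimplePlacementStatus {
    private final int racksUsed;
    private final int racksRequired;

    SimplePlacementStatus(List<String> replicaRacks, int racksRequired) {
      this.racksUsed = new HashSet<>(replicaRacks).size();
      this.racksRequired = racksRequired;
    }

    boolean isPolicySatisfied() {
      return racksUsed >= racksRequired;
    }

    String misReplicatedReason() {
      return isPolicySatisfied() ? null
          : "Container is on " + racksUsed + " racks but requires "
              + racksRequired;
    }

    int additionalReplicaRequired() {
      return Math.max(0, racksRequired - racksUsed);
    }
  }

  public static void main(String[] args) {
    // 3 healthy replicas, all on one rack: replication is fine (delta = 0),
    // placement is not, and the two checks report independently.
    List<String> racks = List.of("r1", "r1", "r1");
    int replicationFactor = 3;
    int delta = replicationFactor - racks.size();
    SimplePlacementStatus status = new SimplePlacementStatus(racks, 2);
    System.out.println("under-replication delta: " + delta);
    System.out.println("placement satisfied: " + status.isPolicySatisfied());
    System.out.println("extra replicas for placement: "
        + status.additionalReplicaRequired());
  }
}
```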
I could change the name of additionalReplicaRequired() to be something like
replicasRequiredForPlacement(), if you think it makes it more clear.