fapifta commented on pull request #881:
URL: https://github.com/apache/hadoop-ozone/pull/881#issuecomment-623438510


   Somehow I still feel bad about the current separation of concerns...
   
   But let's go one by one:
   It is not a problem to consider node state in the PlacementPolicy, as the 
policies already depend on NodeManager and select only from healthy nodes. 
Decommissioning state will/should also be available to the policies via the 
NodeManager.
   Pipeline placement can use the current interface, and we can introduce a 
new one called ContainerPlacementPolicy extending PlacementPolicy and 
defining the necessary methods (see the sketch below).
   Strictly speaking, pipeline placement does not need the 
validateContainerPlacement method either.
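
   A minimal sketch of the split I mean, with paraphrased signatures 
(DatanodeDetails and ContainerPlacementStatus are existing Ozone types, but 
the method shapes here are approximations, not the exact current API):

```java
import java.util.List;

// Base interface, shared by pipeline and container placement.
interface PlacementPolicy {
  List<DatanodeDetails> chooseDatanodes(List<DatanodeDetails> excludedNodes,
      List<DatanodeDetails> favoredNodes, int nodesRequired,
      long sizeRequired);
}

// Container-specific concerns live in the sub-interface, so pipeline
// placement never has to see validateContainerPlacement.
interface ContainerPlacementPolicy extends PlacementPolicy {
  ContainerPlacementStatus validateContainerPlacement(
      List<DatanodeDetails> replicaNodes, int replicationFactor);
}
```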
   
   My problem is a bit more fundamental. I am mainly concerned about the 
complexity we would be introducing here to have a generalized-ish algorithm 
with a secondary condition that has to be met and checked for all placement 
policies, while for all but one of those policies we supply dummy values so 
that nothing is done differently than before. On the other hand, we are 
completely unsure whether we will ever need a tertiary condition for another 
placement policy in the future. This is why I suggested a new approach that 
inverts the control: the way we remove excess replicas, or add new replicas 
via the ReplicationManager, will not foreseeably change much in the future, 
and since we can provide a callback mechanism to the policies without 
introducing new classes and hard-to-understand indirections, yet without 
coupling, I still think that is the better way to implement this (sketched 
below).
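
   A hypothetical sketch of what I mean by inverting the control; every name 
below is illustrative and none of it is taken from the actual PR 
(ContainerInfo, ContainerReplica and DatanodeDetails are existing Ozone 
types):

```java
import java.util.Set;
import java.util.function.Consumer;

// The policy drives mis-replication handling, keeping its policy-specific
// (e.g. rack) reasoning internal, and reports decisions via callbacks.
interface MisReplicationAwarePolicy {
  void fixMisReplication(Set<ContainerReplica> replicas,
      int replicationFactor,
      Consumer<DatanodeDetails> addReplicaOn,
      Consumer<ContainerReplica> removeReplica);
}

class ReplicationManagerSketch {
  // ReplicationManager stays a plain executor of the policy's decisions.
  void handleMisReplication(ContainerInfo container,
      Set<ContainerReplica> replicas, MisReplicationAwarePolicy policy) {
    policy.fixMisReplication(replicas,
        container.getReplicationFactor().getNumber(),
        target -> sendReplicateCommand(container, target),
        excess -> sendDeleteCommand(container, excess));
  }

  void sendReplicateCommand(ContainerInfo c, DatanodeDetails target) {
    // issue the replicate command towards the chosen target node
  }

  void sendDeleteCommand(ContainerInfo c, ContainerReplica excess) {
    // issue the delete command for the chosen excess replica
  }
}
```

   This way a new policy only has to implement fixMisReplication, and 
nothing in ReplicationManager needs to change for it.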
   
   What I fear the most, if we let this through, is that the next person who 
comes to this code to add a policy will either need to do some magic inside 
the policy to provide the secondary condition properly, perhaps mixing a 
secondary and a tertiary condition together to fit the current interface 
(since we cannot break compatibility), or, at the end of the day, 
ReplicationManager gets modified yet again to accept the new stuff, as that 
is easier to add. I do not see this happening with my suggestion; moreover, 
after that I do not see any reason to ever change the ReplicationManager 
significantly again.
   Also, in the suggested version we can test mis-replication detection and 
fixing specifically with just the policies, and leave the ReplicationManager 
as a mere executor of what the policy suggests. I hope this is an 
understandable fear; I am not sure it is addressable within the current 
implementation though... :) But I am open to chat about it.
   
   
   This is how I first summarised and understood what was happening, and 
what will be introduced with the new code.
   At the beginning we have ReplicationManager, we have NodeManager, we have 
the PlacementPolicy.
    - ReplicationManager is there to ensure that the number of replicas 
"considered live" is exactly the replication factor of a container.
    - NodeManager is there to provide node status and information. I am 
unsure why it knows about NetworkTopology, and whether it should, but for 
now it knows about the topology as well.
    - PlacementPolicy is there to provide information about the replica 
placement rules, based on NodeManager's data.
   
   Now we want to extend a piece of functionality that was hardcoded into 
ReplicationManager before.
   The functionality currently works as follows (see the sketch after this 
list):
    - if a container is open, leave it as it is
    - if a container is closing, send the close command to the container 
replicas
    - if a container is quasi-closed, force the closure of the replicas
    - update container replica states about in-flight and deleted replicas 
if needed, based on the current action
    - if the container is healthy, leave it as it is
    - if the container is under-replicated, handle under-replication
    - if the container is over-replicated, handle over-replication
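
   Condensed pseudo-Java of that flow as I read it; the helper names 
approximate the real ReplicationManager code, but this is not a verbatim 
excerpt:

```java
import java.util.Set;

class CurrentFlowSketch {
  void processContainer(ContainerInfo container,
      Set<ContainerReplica> replicas) {
    switch (container.getState()) {
    case OPEN:
      return;                                        // leave it as it is
    case CLOSING:
      replicas.forEach(r -> sendCloseCommand(container, r, false));
      return;
    case QUASI_CLOSED:
      replicas.forEach(r -> sendCloseCommand(container, r, true)); // forced
      return;
    default:
      break;
    }
    updateInflightActions(container, replicas);  // in-flight adds/deletes
    if (isUnderReplicated(container, replicas)) {
      handleUnderReplicatedContainer(container, replicas);
    } else if (isOverReplicated(container, replicas)) {
      handleOverReplicatedContainer(container, replicas);
    }
    // otherwise the container is healthy and is left as it is
  }

  // stubs standing in for the real implementations
  void sendCloseCommand(ContainerInfo c, ContainerReplica r, boolean f) { }
  void updateInflightActions(ContainerInfo c, Set<ContainerReplica> r) { }
  boolean isUnderReplicated(ContainerInfo c, Set<ContainerReplica> r) {
    return false;
  }
  boolean isOverReplicated(ContainerInfo c, Set<ContainerReplica> r) {
    return false;
  }
  void handleUnderReplicatedContainer(ContainerInfo c,
      Set<ContainerReplica> r) { }
  void handleOverReplicatedContainer(ContainerInfo c,
      Set<ContainerReplica> r) { }
}
```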
   
   How we handle mis-replication before the patch (sketched below):
    - at under-replication, we use the PlacementPolicy to get a set of nodes 
suitable to hold the container replicas, with the exclusion list consisting 
of unhealthy or in-weird-state replicas, and we replicate to the nodes that 
come back from the policy.
    - at over-replication, we calculate the number of excess replicas, and 
remove those that are not necessary/unhealthy/in-weird-state.
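
   A sketch of that pre-patch behaviour, with illustrative helper names and 
simplified arithmetic (PlacementPolicy here is the interface sketched 
earlier):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class PrePatchSketch {
  // Under-replication: ask the policy for targets, excluding nodes that
  // already hold a replica (healthy or not), then replicate to them.
  void handleUnderReplicated(ContainerInfo container,
      Set<ContainerReplica> replicas, PlacementPolicy policy) {
    List<DatanodeDetails> excluded = replicas.stream()
        .map(ContainerReplica::getDatanodeDetails)
        .collect(Collectors.toList());
    int missing = container.getReplicationFactor().getNumber()
        - healthyCount(replicas);
    policy.chooseDatanodes(excluded, null, missing, container.getUsedBytes())
        .forEach(target -> sendReplicateCommand(container, target));
  }

  // Over-replication: count the excess and drop the unnecessary /
  // unhealthy / in-weird-state replicas first.
  void handleOverReplicated(ContainerInfo container,
      Set<ContainerReplica> replicas) {
    int excess = healthyCount(replicas)
        - container.getReplicationFactor().getNumber();
    pickRemovalCandidates(replicas, excess)
        .forEach(replica -> sendDeleteCommand(container, replica));
  }

  // stubs standing in for the real logic
  int healthyCount(Set<ContainerReplica> replicas) { return replicas.size(); }
  List<ContainerReplica> pickRemovalCandidates(Set<ContainerReplica> r,
      int n) {
    return r.stream().limit(n).collect(Collectors.toList());
  }
  void sendReplicateCommand(ContainerInfo c, DatanodeDetails target) { }
  void sendDeleteCommand(ContainerInfo c, ContainerReplica replica) { }
}
```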
   
   
   In the new code, what changes is how we handle mis-replication; the rest 
of the code is unchanged. The new code suggests the following (sketched 
below):
    - at under-replication, we start to have a secondary condition expressed 
as a count of something missing (in this case racks), and we check against 
this number as well as the number of missing replicas; if neither condition 
is met yet and both of them would improve, we actually re-replicate, 
otherwise we will check again next time.
    - at over-replication, we again have a secondary condition expressed as 
a number, and we check against it to see whether, if we remove a replica, 
the system still meets the condition on this number, and we remove only 
replicas that do not ruin the condition.
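
   My paraphrase of those checks; the patch expresses the secondary 
condition via a placement-status object, and the names below only 
approximate it:

```java
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

class SecondaryConditionSketch {
  // Under-replication: act only if the planned action improves the replica
  // count and does not worsen the secondary count (e.g. racks still
  // missing); otherwise skip now and re-check on the next run.
  boolean shouldReplicate(int missingReplicas, int missingAfterAction,
      int misReplicationCount, int misReplicationAfterAction) {
    return missingAfterAction < missingReplicas
        && misReplicationAfterAction <= misReplicationCount;
  }

  // Over-replication: remove only a replica whose removal keeps the
  // secondary condition satisfied; if there is none, do nothing this round.
  Optional<ContainerReplica> pickRemovable(Set<ContainerReplica> excess,
      Predicate<ContainerReplica> conditionStillHoldsWithout) {
    return excess.stream().filter(conditionStillHoldsWithout).findFirst();
  }
}
```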
   
   As long as we only have the rack-aware placement with this secondary 
condition, it can be named rackSomething; however, the moment we introduce a 
new one, this naming will certainly not stand the test of time, and we might 
end up with more general, harder-to-understand names. The architecture 
probably will not stand either, as the generality here limits our ability to 
write a policy-specific, easy-to-understand algorithm for finding the excess 
replicas and adding new ones. The current algorithm is also a bit harder to 
understand if we cannot use policy-specific terms, as I see it at the moment.
   

