fapifta commented on pull request #881:
URL: https://github.com/apache/hadoop-ozone/pull/881#issuecomment-623438510
Somehow I still feel bad about the current separation of concerns...
But let's go one by one:
It is not a problem to consider node state in the PlacementPolicy, as the
policies already depend on NodeManager and select only from healthy nodes. The
decommissioning state will/should also be available to the policies via the
NodeManager.
Pipeline placement can use the current interface, and we can introduce a new
one called ContainerPlacementPolicy extending PlacementPolicy and defining the
necessary methods.
Strictly speaking, pipeline placement does not need the
validateContainerPlacement method either.
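To make the split concrete, here is a rough sketch of what I mean (the String node type and the exact signatures are just illustrative stand-ins, not the real DatanodeDetails-based interfaces):

```java
// Illustrative sketch only; types and signatures are simplified stand-ins,
// not the actual Ozone interfaces.
import java.util.List;

interface PlacementPolicy {
  // Enough for pipeline placement: pick nodes, honouring an exclude list.
  List<String> chooseDatanodes(List<String> excludedNodes,
                               int nodesRequired, long sizeRequired);
}

interface ContainerPlacementPolicy extends PlacementPolicy {
  // Container-only concern: validate how well the current replicas are placed.
  boolean validateContainerPlacement(List<String> replicaNodes,
                                     int replicationFactor);
}
```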
My problem is a bit more fundamental. I am mainly concerned about the
complexity we would introduce here to get a generalized-ish algorithm with a
secondary condition that has to be met and checked for all placement policies,
while for all but one of the policies we supply dummy values so that nothing
behaves differently than before. At the same time, we have no idea whether we
will ever need a tertiary condition for another placement policy in the future.
This is why I suggested a new approach and inverting the control: the way we
remove excess replicas or add new replicas via the ReplicationManager will not
foreseeably change much, and we can provide a callback mechanism to the
policies without introducing new classes or hard-to-understand indirections,
and without coupling. I still think that is the better way to implement this.
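To illustrate the inverted control I have in mind, a rough sketch (all names here are made up for the example; this is not the actual patch or the real Ozone interfaces):

```java
// One possible shape of the inverted-control idea above; names are
// purely illustrative.
import java.util.List;
import java.util.Set;

interface PolicyCallbacks {
  // Over-replication: the policy itself picks which replicas are safe to drop.
  List<String> selectReplicasToRemove(Set<String> currentReplicaNodes,
                                      int excessCount);

  // Under-replication: the policy itself picks the target nodes for new replicas.
  List<String> selectNewTargets(Set<String> currentReplicaNodes,
                                int missingCount);
}

class ReplicationManagerSketch {
  private final PolicyCallbacks policy;

  ReplicationManagerSketch(PolicyCallbacks policy) {
    this.policy = policy;
  }

  // The manager only executes what the policy suggests, nothing more.
  void handleOverReplication(Set<String> replicaNodes, int excess) {
    for (String node : policy.selectReplicasToRemove(replicaNodes, excess)) {
      sendDeleteCommand(node);
    }
  }

  void handleUnderReplication(Set<String> replicaNodes, int missing) {
    for (String node : policy.selectNewTargets(replicaNodes, missing)) {
      sendReplicateCommand(node);
    }
  }

  private void sendDeleteCommand(String node) { /* placeholder */ }
  private void sendReplicateCommand(String node) { /* placeholder */ }
}
```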
What I fear the most, if we let this through, is that the next person who comes
to this code to add a policy will either have to do magic inside the policy to
express the secondary condition properly, perhaps mixing a secondary and a
tertiary condition together to fit the current interface, since we cannot break
compatibility, or, at the end of the day, ReplicationManager gets modified yet
again to accept the new stuff, because that is easier to add. I do not see this
happening with my suggestion; moreover, after that I do not see any reason to
ever change the ReplicationManager significantly again.
Also, in the suggested version we can test the mis-replication detection and
the fix for it with just the policies, and leave the ReplicationManager as a
mere executor of what the policy suggests. I hope this is an understandable
fear; I am not sure it is addressable within the current implementation
though... :)
But I am open to chat about it.
This is how I first summarised and understood what is happening, and what
the new code will introduce.
At the beginning we have the ReplicationManager, the NodeManager, and the
PlacementPolicy.
- ReplicationManager ensures that the number of replicas "considered
live" is exactly the replication factor of a container.
- NodeManager provides node status and information. I am unsure why it
knows about NetworkTopology, and whether it should, but for now it knows about
the topology as well.
- PlacementPolicy provides information about the replica placement rules,
based on NodeManager's data.
Now we want to extend functionality that was previously hardcoded into
ReplicationManager.
The functionality currently works as follows (see the sketch after this list):
- if a container is open, leave it as it is
- if a container is closing, send the close command to the container
replicas
- if a container is quasi-closed, force the closure of the replicas
- update the container replica states for in-flight and deleted replicas if
needed, based on the current action
- if the container is healthy, leave it as it is
- if the container is under-replicated, handle under-replication
- if the container is over-replicated, handle over-replication
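A condensed sketch of that flow, just to have it in one place (state names and helpers are simplified; this is not the real ReplicationManager code):

```java
// Condensed sketch of the flow described above; not the actual implementation.
enum ContainerState { OPEN, CLOSING, QUASI_CLOSED, CLOSED }

class ContainerCheckSketch {
  void processContainer(ContainerState state, int liveReplicas, int factor) {
    if (state == ContainerState.OPEN) {
      return;                      // open: leave it as it is
    }
    if (state == ContainerState.CLOSING) {
      sendCloseCommands();         // ask the replicas to close
      return;
    }
    if (state == ContainerState.QUASI_CLOSED) {
      forceCloseReplicas();        // force closure of the replicas
      return;
    }
    updateInflightAndDeleted();    // account for in-flight adds/deletes

    if (liveReplicas < factor) {
      handleUnderReplication();
    } else if (liveReplicas > factor) {
      handleOverReplication();
    }
    // otherwise healthy: leave it as it is
  }

  private void sendCloseCommands() { }
  private void forceCloseReplicas() { }
  private void updateInflightAndDeleted() { }
  private void handleUnderReplication() { }
  private void handleOverReplication() { }
}
```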
How we handled mis-replication before the patch (sketched below):
- on under-replication, we use the PlacementPolicy to get a set of nodes
suitable to hold the container replicas, with an exclusion list consisting of
the unhealthy or in-a-weird-state replicas, and we replicate to the nodes that
come back from the policy.
- on over-replication, we calculate the number of excess replicas and
remove those that are unnecessary, unhealthy, or in a weird state.
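Roughly, in code (names and types are simplified stand-ins, not the actual implementation):

```java
// Sketch of the pre-patch behaviour as I read it; illustrative only.
import java.util.ArrayList;
import java.util.List;

class PrePatchSketch {
  interface Policy {
    List<String> chooseDatanodes(List<String> excludedNodes,
                                 int required, long sizeRequired);
  }

  void handleUnderReplication(Policy policy, List<String> healthyReplicaNodes,
                              List<String> unhealthyOrWeirdNodes,
                              int factor, long containerSize) {
    int missing = factor - healthyReplicaNodes.size();
    // Exclusion list: nodes that already hold a replica, plus the
    // unhealthy / in-a-weird-state ones.
    List<String> excluded = new ArrayList<>(healthyReplicaNodes);
    excluded.addAll(unhealthyOrWeirdNodes);
    for (String node : policy.chooseDatanodes(excluded, missing, containerSize)) {
      replicateTo(node);           // replicate to whatever the policy returned
    }
  }

  void handleOverReplication(List<String> replicaNodes, int factor) {
    // Remove the excess, preferring unnecessary/unhealthy/weird replicas;
    // here the ordering of replicaNodes is assumed to reflect that preference.
    int excess = replicaNodes.size() - factor;
    for (int i = 0; i < excess; i++) {
      deleteFrom(replicaNodes.get(i));
    }
  }

  private void replicateTo(String node) { }
  private void deleteFrom(String node) { }
}
```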
In the new code, what changes is how we handle mis-replication; the rest of
the code is unchanged. The new code suggests the following (see the sketch
after this list):
- on under-replication we get a secondary condition expressed as a number of
missing somethings (in this case racks); we check this number together with
the number of missing replicas, and only if neither of them is satisfied yet
and both of them would improve do we actually re-replicate, otherwise we check
again next time.
- on over-replication we get a secondary condition expressed as a number, and
we check against it again to see whether, after removing a replica, the system
still meets the condition on this number, and we remove only replicas whose
removal does not break the condition.
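A sketch of how I read these two checks (the field and method names are mine, and the exact logic in the patch may differ):

```java
// One literal reading of the bullet points above; not the patch's actual code.
class SecondaryConditionSketch {
  // Under-replication: act only if we are still deficient and the proposed
  // re-replication improves both the replica count and the placement count.
  boolean shouldReplicate(int missingReplicasNow, int missingRacksNow,
                          int missingReplicasAfter, int missingRacksAfter) {
    boolean stillDeficient = missingReplicasNow > 0 || missingRacksNow > 0;
    boolean bothImprove = missingReplicasAfter <= missingReplicasNow
        && missingRacksAfter <= missingRacksNow
        && (missingReplicasAfter < missingReplicasNow
            || missingRacksAfter < missingRacksNow);
    return stillDeficient && bothImprove;
  }

  // Over-replication: a replica may only be removed if the placement
  // condition, expressed as the same kind of number, does not get worse.
  boolean mayRemove(int missingRacksBefore, int missingRacksAfterRemoval) {
    return missingRacksAfterRemoval <= missingRacksBefore;
  }
}
```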
As long as we only have the rack-aware placement with this secondary condition,
it can be named rackSomething; but at the point in time when we introduce a new
one, this naming will certainly not stand the test of time, and we might end up
with more general, harder-to-understand names. The architecture probably will
not stand either, as the generalism here limits our ability to write a
policy-specific, easy-to-understand algorithm for finding the excess replicas
and for adding new ones. Also, the current algorithm is a bit harder to
understand if we cannot use policy-specific terms, as I see it at the moment.