timmylicheng commented on issue #668: HDDS-3139. Pipeline placement should max 
out pipeline usage
URL: https://github.com/apache/hadoop-ozone/pull/668#issuecomment-611927572
 
 
   > > @sodonnel I rebased the code with minor conflicts in the test class, but the test won't pass. I took a close look and made some changes, but I ran into the issue I mentioned in my last comment about how to leverage chooseNodeFromTopology. I'd like to hear your thoughts.
   > 
   > I think one problem is this line:
   > 
   > ```
   > datanodeDetails = nodes.stream().findAny().get();
   > ```
   > 
   > The findAny method does not seem to return a random entry - so the same 
node is returned until it uses up its pipeline allocation.
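To illustrate the point, here is a throwaway sketch (not the actual Ozone code; `pickRandom` is a made-up helper): `findAny()` on a sequential stream effectively returns the first element every time, whereas an explicit random index gives a uniform pick.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class RandomPickSketch {

    // findAny() gives no randomness guarantee; on a sequential stream it
    // effectively returns the first element, so the same node keeps being
    // chosen until it hits its pipeline limit.
    static <T> T firstish(List<T> nodes) {
        return nodes.stream().findAny().get();
    }

    // An explicit random index gives a genuinely random pick instead.
    static <T> T pickRandom(List<T> nodes) {
        return nodes.get(ThreadLocalRandom.current().nextInt(nodes.size()));
    }
}
```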
   > 
   > I am also not sure about the limit calculation in getLowerLoadNodes:
   > 
   > ```
   >  int limit = nodes.size() * heavyNodeCriteria
   >         / HddsProtos.ReplicationFactor.THREE.getNumber();
   > ```
   > 
   > Adding some debug logging, I found that this method starts to return an empty list while there are still available nodes that could handle the pipeline.
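To make the integer arithmetic concrete (the numbers below are made up purely for illustration):

```java
public class LimitSketch {

    // Reproduces the quoted limit formula with illustrative numbers to show
    // why the candidate list can empty out early: the cap is derived from
    // nodes * criteria / 3, not from the actual remaining pipeline capacity.
    static int limit(int nodeCount, int heavyNodeCriteria) {
        final int factor = 3; // ReplicationFactor.THREE.getNumber()
        return nodeCount * heavyNodeCriteria / factor;
    }

    public static void main(String[] args) {
        // 5 nodes, each allowed 2 pipelines -> 10 node-slots in total, but
        // the per-call limit already stops at 5 * 2 / 3 = 3 candidate nodes,
        // so the filtered list can come back empty while other nodes still
        // have spare capacity.
        System.out.println(limit(5, 2));
    }
}
```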
   > 
   > Also, in `filterViableNodes()` via the `meetCriteria()` method, nodes above the heavy-load limit are already filtered out, so you are guaranteed that your healthy node list contains only nodes with the capacity to take another pipeline. So I wonder why we need to filter the nodes further.
   > 
   > > But I ran into the issue I mentioned in my last comment about how to leverage chooseNodeFromTopology.
   > 
   > There seems to be some inconsistency in how we pick the nodes (not just in this PR, but in the wider code). E.g. in `chooseNodeBasedOnRackAwareness()` we don't call into `NetworkTopology`, but instead use the `getNetworkLocation()` method on the `DatanodeDetails` object to find nodes that do not match the anchor's location.
   > 
   > Then later, in `chooseNodeFromNetworkTopology()`, we try to find a node whose location is equal to the anchor's, and that is where we call into `networkTopology.chooseRandom()`. Could we not avoid that call, and avoid generating a new list of nodes, by doing something similar to `chooseNodeBasedOnRackAwareness()` and using the `getNetworkLocation()` method to find matching nodes? That would probably be more efficient than the current implementation.
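Roughly what I have in mind, with a stubbed-down node type (the real class is `DatanodeDetails`; this sketch only assumes it has a `getNetworkLocation()` accessor):

```java
import java.util.List;
import java.util.stream.Collectors;

public class RackFilterSketch {

    // Minimal stand-in for DatanodeDetails, exposing only the accessor
    // this sketch needs.
    static class Node {
        final String name;
        final String networkLocation;

        Node(String name, String networkLocation) {
            this.name = name;
            this.networkLocation = networkLocation;
        }

        String getNetworkLocation() {
            return networkLocation;
        }
    }

    // Same-rack selection done by comparing getNetworkLocation() against the
    // anchor, mirroring chooseNodeBasedOnRackAwareness(), instead of building
    // a fresh node list and calling networkTopology.chooseRandom().
    static List<Node> sameRackAs(Node anchor, List<Node> healthyNodes) {
        return healthyNodes.stream()
            .filter(n -> n != anchor)
            .filter(n -> n.getNetworkLocation()
                .equals(anchor.getNetworkLocation()))
            .collect(Collectors.toList());
    }
}
```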
   > 
   > As we would then be able to re-use the same list of healthy nodes everywhere without further filtering, maybe we could sort that list once by pipeline count in filterViableNodes or meetCriteria and then always pick the node with the lowest load, filling the nodes up that way.
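Sketched out, that idea is just a single sort plus a head-of-list pick (the pipeline counter below is a stand-in for whatever the node manager exposes, e.g. a pipeline count lookup):

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.ToIntFunction;

public class LowestLoadSketch {

    // Sort the healthy list once by current pipeline count, then always take
    // the head of the list: nodes fill up lowest-load first, which maxes out
    // pipeline usage across the cluster.
    static <T> T lowestLoaded(List<T> healthyNodes,
                              ToIntFunction<T> pipelineCount) {
        healthyNodes.sort(Comparator.comparingInt(pipelineCount));
        return healthyNodes.get(0);
    }
}
```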
   > 
   > I hope this comment makes sense as it is very long.
   
   @sodonnel Thanks for the detailed review. I've updated the patch according to your example.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
