craigcondit commented on pull request #323:
URL: 
https://github.com/apache/incubator-yunikorn-core/pull/323#issuecomment-920986918


   @yangwwei, thanks for the review.
   
   > There is another thing, with this algorithm, we calculate ratios only, 
this could lead to some issues when the size of the node is very different, for 
example, if I have 2 nodes (just use cpu as an example)
   > 
   > nodeA:
   > 
   > * cpuTotal: 2
   > * cpuUsed: 1
   > * usedRation: 0.5
   > 
   > nodeB
   > 
   > * cpuTotal: 10
   > * cpuUsed: 6
   > * usedRatio: 0.6
   > 
   > with the current algorithm, under FAIR we will prefer A over B; however, A 
has only 1 cpu available but B has 4... so it might be better to allocate B 
first. I guess this is fine... but need to keep in mind when the size of the 
nodes is very different, this algorithm may tend to fill the smaller node 
first...
   
   In your example, yes we would place on nodeA first. However, if we consider 
a more realistic example, something with larger CPU counts and where the size 
of an allocation is not 50% of one of our nodes, it's far less dramatic, even 
if nodes vary in size. Consider:
   
   nodeC: cpuTotal=8, cpuUsed=4, usedRatio=0.5
   nodeD: cpuTotal=40, cpuUsed=24, usedRatio=0.6
   
   The ratios between the node sizes are equal, and the current usedRatios 
match your example. Let's consider where subsequent allocations go and how this 
affects things. I will assume 1 CPU as allocation size for simplicity:
   
   Request 1: nodeC, as 0.5 < 0.6. Total usage on nodeC: 0.625. Total node 
allocations for C/D: { 5, 24 }
   Request 2: nodeD, as 0.6 < 0.625. Total usage on nodeD: 0.625.  Total 
allocations: { 5, 25 }
   Request 3: nodeC as scores are both 0.625. Total usage on nodeC: 0.75 Total 
allocations: { 6, 25 }
   Request 4: nodeD as 0.625 < 0.75. Total usage on nodeD: 0.65. Total node 
allocations: { 6, 26 }
   Request 4: nodeD as 0.65 < 0.75. Total usage on nodeD: 0.675. Total node 
allocations: { 6, 27 }
   Request 5: nodeD as 0.675 < 0.75. Total usage on nodeD: 0.7. Total node 
allocations: { 6, 28 }
   Request 6: nodeD as 0.7 < 0.75. Total usage on nodeD: 0.725. Total node 
allocations: { 6, 29 }
   Request 7: nodeD as 0.725 < 0.75. Total usage on nodeD: 0.75. Total node 
allocations: { 6, 30 }
   Request 8: nodeC as scores are both 0.75. Total usage on nodeC: 0.875. Total 
node allocations: { 7, 30 }
   ... and so on.
   
   We don't actually end up filling nodeC until both reach 87.5% capacity, and 
at that point, nodeC happens to fill first just because it hit the tie-breaker 
scenario (using node ID) chosen. However, you can see that all the way along, 
the allocations favor nodeB in a 5:1 ratio, which exactly matches their node 
size ratio.
   
   > Another thing is when we find 2 scores are the same, we compare nodeID, 
instead of doing that, can we compare the node available resources?
   
   The node ID is not a part of the scoring algorithms. It is a technical 
requirement that the keys we use for the B-Tree implementation be unique, and 
node ID is the one thing we know will not change and will be unique across all 
nodes. It will ONLY be used in the case where two nodes score identically. 
However, leaving it out, or using some other value that may not be unique, 
could result in collisions, causing a node to vanish from the node collection. 
We could generate something like a UUID for each node on addition to the node 
collection, but this isn't really any better; it just makes the tie-breaker 
algorithm dependent on something that isn't externally visible (and in fact 
makes testing more difficult). It doesn't improve the behavior at all.  If we 
want different behavior for different nodes, they should produce different 
scores.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to