wilfred-s commented on pull request #332:
URL: 
https://github.com/apache/incubator-yunikorn-core/pull/332#issuecomment-962774771


   Based on the unit test failures we must make sure that the order in the shim 
is correct: first recover the apps, then the nodes.
   Looking at the change, we might have had an issue in the RMProxy for a long 
time. I do think that we need to add a retry to the node update when we recover 
a node. Even in the previous implementation there was no guarantee that all 
the applications were added before a node was recovered. The unit tests relied 
on the order in which events happened to be processed to make it work. There was 
_never_ an ordering requirement on the events sent by a shim. An event to recover 
a node could arrive in a separate UpdateRequest from the applications that should 
be recovered. That means we relied on goroutine ordering to hopefully do 
things correctly, i.e. events sent by the shim to create new apps would be 
processed before node recovery started.
   That is a dangerous assumption: filing a follow-up Jira.
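
   To illustrate the retry idea, here is a minimal sketch, not the actual 
RMProxy or scheduler code. All names (`registry`, `recoverNode`, 
`recoverNodeWithRetry`, `ErrAppNotFound`) are hypothetical, and the assumption 
is that node recovery can detect an allocation that references an application 
which has not been added yet and report that as a distinct, retryable error.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrAppNotFound is a hypothetical sentinel error: the node carries an
// allocation for an application the core has not registered yet.
var ErrAppNotFound = errors.New("application not yet registered")

// registry is a hypothetical stand-in for the core's list of applications.
type registry struct {
	mu   sync.RWMutex
	apps map[string]bool
}

func newRegistry() *registry {
	return &registry{apps: make(map[string]bool)}
}

// addApp registers an application, as an app recovery event would.
func (r *registry) addApp(appID string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.apps[appID] = true
}

// recoverNode fails with ErrAppNotFound when any allocation on the node
// belongs to an application that is not registered yet.
func (r *registry) recoverNode(nodeID string, appIDs []string) error {
	r.mu.RLock()
	defer r.mu.RUnlock()
	for _, appID := range appIDs {
		if !r.apps[appID] {
			return fmt.Errorf("node %s, app %s: %w", nodeID, appID, ErrAppNotFound)
		}
	}
	return nil
}

// recoverNodeWithRetry retries node recovery while the only problem is a
// missing application. Any other error is returned immediately.
func recoverNodeWithRetry(r *registry, nodeID string, appIDs []string,
	attempts int, delay time.Duration) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		err = r.recoverNode(nodeID, appIDs)
		if err == nil || !errors.Is(err, ErrAppNotFound) {
			return err
		}
		// Wait before the next attempt, but not after the last one.
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}
```

   With something like this, a node recovery that arrives before its 
applications is no longer dependent on goroutine scheduling: it is retried 
until the applications show up or the attempts run out. The attempt count and 
delay are placeholders that would need tuning.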


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


