[
https://issues.apache.org/jira/browse/YARN-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711683#comment-13711683
]
Omkar Vinit Joshi commented on YARN-906:
----------------------------------------
What you are saying above makes complete sense. That is definitely a problem,
caused by the mismatch between the dispatcher queue processing events and exec
actually launching the thread. We should probably make sure that the whole
computation of the call method is moved inside the try{} catch{}, and check
the flag status right at the beginning. For updating the flag status we
definitely need locking.
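To make the idea concrete, here is a minimal sketch of a call method whose
whole body sits inside the try/catch and which checks a cancellation flag
under a lock before doing anything else. All names here (GuardedLaunch,
ABORTED, FAILED, cancel) are hypothetical and only illustrate the pattern;
this is not the actual NodeManager launch code.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch, not the real NodeManager class.
class GuardedLaunch implements Callable<Integer> {
    static final int ABORTED = -1; // assumed code: cancelled before the work began
    static final int FAILED = -2;  // assumed code: the work threw an exception

    private final Object lock = new Object();
    private boolean cancelled = false; // guarded by lock
    private boolean started = false;   // guarded by lock
    private final Callable<Integer> work;

    GuardedLaunch(Callable<Integer> work) {
        this.work = work;
    }

    /** Sets the flag; returns true if the launch had not started yet. */
    boolean cancel() {
        synchronized (lock) {
            cancelled = true;
            return !started;
        }
    }

    @Override
    public Integer call() {
        try {
            // Check the flag first, under the same lock that cancel() uses,
            // so a cancel cannot slip in between the check and the start.
            synchronized (lock) {
                if (cancelled) {
                    return ABORTED;
                }
                started = true;
            }
            return work.call();
        } catch (Exception e) {
            // The whole computation is inside the try, so every failure
            // still produces a result the caller can turn into an event.
            return FAILED;
        }
    }
}
```

With this shape, a cancel that wins the race makes call() return ABORTED
instead of silently never running, so the caller always has an outcome to
report.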
An alternative solution, which seems most logical to me: what if we send the
same event from the place where we are canceling the thread, and expect/ignore
the additional event at the KILLING state? I haven't thought much about it,
but it is worth considering as an alternative. Thoughts?
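One possible reading of this alternative, as a rough sketch: the canceling
side sends the exit event itself, and the state machine treats a second copy
of that event (for example from the launch thread, if it did run after all)
as a no-op. The states and events below (RUNNING, KILLING, DONE, KILL,
EXITED) are simplified, hypothetical names, not the real container state
machine.

```java
// Hypothetical sketch of a state machine that tolerates duplicate events.
class KillingStateSketch {
    enum State { RUNNING, KILLING, DONE }
    enum Event { KILL, EXITED }

    private State state = State.RUNNING;

    State getState() {
        return state;
    }

    /** Returns true if the event caused a transition, false if it was ignored. */
    boolean handle(Event event) {
        switch (state) {
            case RUNNING:
                if (event == Event.KILL) {
                    state = State.KILLING;
                    return true;
                }
                state = State.DONE; // EXITED while running
                return true;
            case KILLING:
                if (event == Event.EXITED) {
                    state = State.DONE;
                    return true;
                }
                return false; // a repeated KILL is ignored
            default:
                return false; // DONE: a duplicate EXITED is ignored
        }
    }
}
```

Here the first EXITED moves KILLING to DONE no matter which side sent it, and
any later copy is dropped, so the container cannot get stuck in KILLING when
the launch thread never started.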
[~vinodkv] what surprises me here is our single dispatcher thread model. :( We
can really see multiple issues if, anywhere in between state transitions, a
client request comes in and cancels some of the expected code path, destroying
the expected state transitions.
btw interesting finding [~zjshen] :)
> TestNMClient.testNMClientNoCleanupOnStop fails occasionally
> -----------------------------------------------------------
>
> Key: YARN-906
> URL: https://issues.apache.org/jira/browse/YARN-906
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Zhijie Shen
> Assignee: Zhijie Shen
> Attachments: YARN-906.1.patch
>
>
> See
> https://builds.apache.org/job/PreCommit-YARN-Build/1435//testReport/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClientNoCleanupOnStop/