[ 
https://issues.apache.org/jira/browse/YARN-9311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771666#comment-16771666
 ] 

Prabhu Joseph commented on YARN-9311:
-------------------------------------

[~rohithsharma] {{TestRMRestart#testRMStateStoreDispatcherDrainedOnRMStop}} 
started hanging after YARN-8449. Before that change, RM service start did not 
call {{RMStateStore#handleStoreEvent}}, so it never blocked at 
{{while (wait);}}. YARN-8449 now stores the proxyCA certificate in the state 
store during {{serviceStart}}. The fix looks valid, so just fixing the test 
case should be enough. Can you review it?

YARN-8449 stores state at {{serviceStart}}:
{code}
   protected void serviceStart() throws Exception {
-    proxyCA.init();
+    if (!wasRecovered) {
+      proxyCA.init();
+    }
+    wasRecovered = false;
+    rmContext.getStateStore().storeProxyCACert(
+        proxyCA.getCaCert(), proxyCA.getCaKeyPair().getPrivate());
{code}


The test case has {{while (wait);}} in {{handleStoreEvent}}:

{code}
      @Override
      protected void handleStoreEvent(RMStateStoreEvent event) {
        // Block app saving request.
        // Skip if synchronous updation of DTToken
        if (!(event instanceof RMStateStoreAMRMTokenEvent)
            && !(event instanceof RMStateStoreRMDTEvent)
            && !(event instanceof RMStateStoreRMDTMasterKeyEvent)) {
          while (wait);
        }
        super.handleStoreEvent(event);
      }
{code}
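
To illustrate the hang outside of Hadoop, here is a minimal, self-contained sketch. It is not the actual test or the attached patch: {{BlockingStoreSketch}}, {{SpinningStore}}, {{serviceStart}} and {{startCompletes}} are hypothetical stand-ins, and it assumes the start path reaches {{handleStoreEvent}} synchronously, as described above. It shows that start never returns while {{wait}} is true, and returns normally if the test only begins blocking after start has finished.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BlockingStoreSketch {

  /** Hypothetical stand-in for the test's overridden state store. */
  static class SpinningStore {
    // volatile so that a write from another thread ends the spin loop.
    volatile boolean wait;

    SpinningStore(boolean wait) {
      this.wait = wait;
    }

    void handleStoreEvent(String event) {
      // Same shape as the test's while (wait); loop.
      while (wait) {
        Thread.yield();
      }
    }
  }

  /** Hypothetical service start that stores state synchronously. */
  static void serviceStart(SpinningStore store) {
    store.handleStoreEvent("storeProxyCACert");
  }

  /** Returns true if serviceStart finishes within timeoutMillis. */
  static boolean startCompletes(boolean waitDuringStart, long timeoutMillis) {
    SpinningStore store = new SpinningStore(waitDuringStart);
    CountDownLatch started = new CountDownLatch(1);
    Thread starter = new Thread(() -> {
      serviceStart(store);
      started.countDown();
    });
    starter.setDaemon(true);
    starter.start();
    try {
      return started.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      // Release the spinning thread so the sketch itself does not leak it.
      store.wait = false;
    }
  }

  public static void main(String[] args) {
    System.out.println("wait=true during start:  completes="
        + startCompletes(true, 200));
    System.out.println("wait=false during start: completes="
        + startCompletes(false, 2000));
  }
}
```

Under these assumptions the first call should report that start did not complete and the second that it did, which is why a test-only change (for example, enabling {{wait}} only after the RM has started) can plausibly be enough.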

> TestRMRestart hangs due to a deadlock
> -------------------------------------
>
>                 Key: YARN-9311
>                 URL: https://issues.apache.org/jira/browse/YARN-9311
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>         Attachments: YARN-9311-001.patch, jstackdata, jstackdata1
>
>
> {{TestRMRestart#testRMStateStoreDispatcherDrainedOnRMStop}} hangs as 
> {{MockRM}} start runs in an infinite loop at {{handleStoreEvent}}
> {code}
> [INFO] Running org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> [INFO] Running 
> org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
> [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.468 
> s - in org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
> {code}



