[ 
https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813672#comment-13813672
 ] 

Bikas Saha commented on YARN-674:
---------------------------------

We were intentionally going through the same submitApplication() method to make 
sure that all the initialization and setup code paths are consistently followed 
in both cases by keeping the code path identical as much as possible. The RM 
would submit a recovered application, in essence proxying a user submitting the 
application. Its a general pattern followed through the recovery logic - to be 
minimally invasive to the mainline code path so that we can avoid functional 
bugs as much as possible. Separating them into 2 methods has resulted in code 
duplication in both methods without any huge benefit that I can see. It also 
leave us susceptible to future code changes made in one code path and not the 
other.

Why is isSecurityEnabled() being checked at this internal level. The code 
should not even reach this point if security is not enabled. It should already 
be taken care of in the public apis, right? 
Also why is it calling 
rmContext.getDelegationTokenRenewer().addApplication(event) instead of 
DelegationTokenRenewer.this.addApplication(). Same for 
rmContext.getDelegationTokenRenewer().applicationFinished(evt);
{code}
@SuppressWarnings("unchecked")
+    private void handleDTRenewerEvent(
+        DelegationTokenRenewerAppSubmitEvent event) {
+      try {
+        // Setup tokens for renewal
+        if (UserGroupInformation.isSecurityEnabled()) {
+          rmContext.getDelegationTokenRenewer().addApplication(event);
+          rmContext.getDispatcher().getEventHandler()
+              .handle(new RMAppEvent(event.getAppicationId(),
+              event.isApplicationRecovered() ? RMAppEventType.RECOVER
+              : RMAppEventType.START));
+        }
+      } catch (Throwable t) {
{code}

These assumptions may make the code brittle to future changes. Also Typo in 
comments. We should probably assert that the application state is NEW over here 
so that the broken assumption is caught at the source instead of at the 
destination app causing a state machine crash.................
{code}
+            "Unable to add the application to the delegation token renewer.",
+            t);
+        // Sending APP_REJECTED is fine, since we assume that the
+        // RMApp is in NEW state and thus we havne't yet informed the
+        // Scheduler about the existence of the application
+        rmContext.getDispatcher().getEventHandler().handle(
+            new RMAppRejectedEvent(event.getAppicationId(), t.getMessage()));
+      }
{code}

typo
{code}
 public ApplicationId getAppicationId() {
{code}

@Private + @VisibleForTesting???
{code}
+  //Only for Testing
+  public int getInProcessDelegationTokenRenewerEventsCount() {
+    return this.renewerCount.get();
+  }
{code}

Can DelegationTokenRenewerAppSubmitEvent event objects have an event type 
different from VERIFY_AND_START_APPLICATION? If not, we dont need this check 
and we can change the constructor of DelegationTokenRenewerAppSubmitEvent to 
not expect an event type argument. It should set the 
VERIFY_AND_START_APPLICATION within the constructor.
{code}
+      if (evt.getType().equals(
+          DelegationTokenRenewerEventType.VERIFY_AND_START_APPLICATION)
+          && evt instanceof DelegationTokenRenewerAppSubmitEvent) {
{code}

Rename DelegationTokenRenewerThread to not have misleading Thread in the name ?

Why is this warning not happening for other services? Whats special in the code 
for DelegationTokenRenewer?
{code}
+  <!-- Ignore Synchronization issues as they are never going to occur for
+       methods like serviceInit(), serviceStart() and handle() -->
+  <Match>
+    <Class 
name="org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer"
 />
+    <Bug pattern="IS2_INCONSISTENT_SYNC" />
+  </Match>
{code}

> Slow or failing DelegationToken renewals on submission itself make RM 
> unavailable
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-674
>                 URL: https://issues.apache.org/jira/browse/YARN-674
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Omkar Vinit Joshi
>         Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, 
> YARN-674.4.patch, YARN-674.5.patch
>
>
> This was caused by YARN-280. A slow or a down NameNode for will make it look 
> like RM is unavailable as it may run out of RPC handlers due to blocked 
> client submissions.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to