[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813672#comment-13813672 ]
Bikas Saha commented on YARN-674: --------------------------------- We were intentionally going through the same submitApplication() method to make sure that all the initialization and setup code paths are consistently followed in both cases by keeping the code path identical as much as possible. The RM would submit a recovered application, in essence proxying a user submitting the application. Its a general pattern followed through the recovery logic - to be minimally invasive to the mainline code path so that we can avoid functional bugs as much as possible. Separating them into 2 methods has resulted in code duplication in both methods without any huge benefit that I can see. It also leave us susceptible to future code changes made in one code path and not the other. Why is isSecurityEnabled() being checked at this internal level. The code should not even reach this point if security is not enabled. It should already be taken care of in the public apis, right? Also why is it calling rmContext.getDelegationTokenRenewer().addApplication(event) instead of DelegationTokenRenewer.this.addApplication(). Same for rmContext.getDelegationTokenRenewer().applicationFinished(evt); {code} @SuppressWarnings("unchecked") + private void handleDTRenewerEvent( + DelegationTokenRenewerAppSubmitEvent event) { + try { + // Setup tokens for renewal + if (UserGroupInformation.isSecurityEnabled()) { + rmContext.getDelegationTokenRenewer().addApplication(event); + rmContext.getDispatcher().getEventHandler() + .handle(new RMAppEvent(event.getAppicationId(), + event.isApplicationRecovered() ? RMAppEventType.RECOVER + : RMAppEventType.START)); + } + } catch (Throwable t) { {code} These assumptions may make the code brittle to future changes. Also Typo in comments. We should probably assert that the application state is NEW over here so that the broken assumption is caught at the source instead of at the destination app causing a state machine crash................. {code} + "Unable to add the application to the delegation token renewer.", + t); + // Sending APP_REJECTED is fine, since we assume that the + // RMApp is in NEW state and thus we havne't yet informed the + // Scheduler about the existence of the application + rmContext.getDispatcher().getEventHandler().handle( + new RMAppRejectedEvent(event.getAppicationId(), t.getMessage())); + } {code} typo {code} public ApplicationId getAppicationId() { {code} @Private + @VisibleForTesting??? {code} + //Only for Testing + public int getInProcessDelegationTokenRenewerEventsCount() { + return this.renewerCount.get(); + } {code} Can DelegationTokenRenewerAppSubmitEvent event objects have an event type different from VERIFY_AND_START_APPLICATION? If not, we dont need this check and we can change the constructor of DelegationTokenRenewerAppSubmitEvent to not expect an event type argument. It should set the VERIFY_AND_START_APPLICATION within the constructor. {code} + if (evt.getType().equals( + DelegationTokenRenewerEventType.VERIFY_AND_START_APPLICATION) + && evt instanceof DelegationTokenRenewerAppSubmitEvent) { {code} Rename DelegationTokenRenewerThread to not have misleading Thread in the name ? Why is this warning not happening for other services? Whats special in the code for DelegationTokenRenewer? {code} + <!-- Ignore Synchronization issues as they are never going to occur for + methods like serviceInit(), serviceStart() and handle() --> + <Match> + <Class name="org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer" /> + <Bug pattern="IS2_INCONSISTENT_SYNC" /> + </Match> {code} > Slow or failing DelegationToken renewals on submission itself make RM > unavailable > --------------------------------------------------------------------------------- > > Key: YARN-674 > URL: https://issues.apache.org/jira/browse/YARN-674 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager > Reporter: Vinod Kumar Vavilapalli > Assignee: Omkar Vinit Joshi > Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, > YARN-674.4.patch, YARN-674.5.patch > > > This was caused by YARN-280. A slow or a down NameNode for will make it look > like RM is unavailable as it may run out of RPC handlers due to blocked > client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)