[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076498#comment-14076498 ]
Zhijie Shen commented on YARN-2209: ----------------------------------- [~jianhe], thanks for the patch. Bellow is some meta comments on this issue. Why is it necessary to use the exception instead of the flag to indicate the RM restarting? In general, I'm afraid the changes here mutually break the backward compatibility between YARN and MR. On the one side, any YARN applications used to have the logic to deal with RM restarting need to be updated after this patch. For example, MR of prior versions will no longer work properly with a YARN cluster after this patch during RM restarting. The MR job won’t recognize the not found exception and take the necessary restarting treatment, but will just record the error and move on. On the other side, if we assume it is possible the new version MR job after this patch is going to be run on an old YARN cluster, the MR job will then not recognize the old flag-style restarting signal, and thus will not executing the MR-side logic to deal with RM restarting. IMHO, at least, the switch block to check the AMCommand cannot be removed but deprecated for compatibility consideration. In case we want to proceed with this change, here're some comment on the patch: 1. MR side change is not trivial. According to our convention before, shall we split the patch into two pieces: one for YARN and the other for MR, such that we can easily track the changes for different projects. 2. Why not throwing ApplicationAttemptNotFoundException instead? It sounds more reasonable here, doesn’t it? 3. Deprecate the enum type instead of each enum value? {code} @Public @Unstable public enum AMCommand { {code} 4. The description sounds not accurate enough. It doesn’t just request containers. “App Master heartbeat”? {code} + public static final String AM_ALLOCATE = "App Master request containers”; {code} 5. No need to break it into two lines, right? {code} AllocateResponse allocateResponse; … + allocateResponse = scheduler.allocate(allocateRequest); {code} 6. Is this change necessary? {code} - return allocate(progressIndicator); + allocateResponse = allocate(progressIndicator); + return allocateResponse; {code} > Replace AM resync/shutdown command with corresponding exceptions > ---------------------------------------------------------------- > > Key: YARN-2209 > URL: https://issues.apache.org/jira/browse/YARN-2209 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: Jian He > Assignee: Jian He > Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, > YARN-2209.4.patch, YARN-2209.5.patch > > > YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate > application to re-register on RM restart. we should do the same for > AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.2#6252)