Zhijie Shen commented on YARN-2209:

[~jianhe], thanks for the patch. Bellow is some meta comments on this issue.

Why is it necessary to use the exception instead of the flag to indicate the RM 
restarting? In general, I'm afraid the changes here mutually break the backward 
compatibility between YARN and MR. On the one side, any YARN applications used 
to have the logic to deal with RM restarting need to be updated after this 
patch. For example, MR of prior versions will no longer work properly with a 
YARN cluster after this patch during RM restarting. The MR job won’t recognize 
the not found exception and take the necessary restarting treatment, but will 
just record the error and move on.

On the other side, if we assume it is possible the new version MR job after 
this patch is going to be run on an old YARN cluster, the MR job will then not 
recognize the old flag-style restarting signal, and thus will not executing the 
MR-side logic to deal with RM restarting. IMHO, at least, the switch block to 
check the AMCommand cannot be removed but deprecated for compatibility 

In case we want to proceed with this change, here're some comment on the patch:

1.  MR side change is not trivial. According to our convention before, shall we 
split the patch into two pieces: one for YARN and the other for MR, such that 
we can easily track the changes for different projects.

2. Why not throwing ApplicationAttemptNotFoundException instead? It sounds more 
reasonable here, doesn’t it?

3. Deprecate the enum type instead of each enum value?
 public enum AMCommand {

4. The description sounds not accurate enough. It doesn’t just request 
containers. “App Master heartbeat”?
+    public static final String AM_ALLOCATE = "App Master request containers”;

5. No need to break it into two lines, right?
     AllocateResponse allocateResponse;
+    allocateResponse = scheduler.allocate(allocateRequest);

6.  Is this change necessary?
-        return allocate(progressIndicator);
+        allocateResponse = allocate(progressIndicator);
+        return allocateResponse;

> Replace AM resync/shutdown command with corresponding exceptions
> ----------------------------------------------------------------
>                 Key: YARN-2209
>                 URL: https://issues.apache.org/jira/browse/YARN-2209
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, 
> YARN-2209.4.patch, YARN-2209.5.patch
> YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate 
> application to re-register on RM restart. we should do the same for 
> AMS#allocate call also.

This message was sent by Atlassian JIRA

Reply via email to