[ 
https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077554#comment-14077554
 ] 

Zhijie Shen commented on YARN-2209:
-----------------------------------

While the change breaks neither binary nor source compatibility, the logic is 
still at risk of being broken by changing the way the AM is signaled, from 
AMCommand to an exception. As mentioned above, applications other than MR will 
be affected by this change. For example, suppose an application's AM logic 
looks as follows:

{code}
  try {
    response = ams.allocate(...);
  } catch (Exception e) {
    ams.finishApplicationMaster(...);
  }
  if (response is shutdown/resync) {
    // cleanup and reboot ...
  }
{code}

This logic is likely to break if the application runs on a YARN cluster with 
this patch. The application does not expect shutdown/resync to be signaled via 
an exception, so it simply catches the failure of the allocate operation and 
terminates the application. In this case, an application that would have been 
retried during an RM restart on a current YARN cluster is likely to conclude as 
failed (assuming the signal to kill the AM container arrives after all of the 
aforementioned logic has run).
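
To make the difference concrete, here is a minimal, self-contained sketch of the 
two handling styles. It does not use the real YARN API: AllocateFailedException, 
ResyncRequestedException, SchedulerClient and AmLoopSketch are hypothetical 
stand-ins I made up for illustration, and the Outcome values only label what the 
AM would do next.

```java
// Hypothetical stand-in for the general exception that allocate may throw.
class AllocateFailedException extends Exception {
    AllocateFailedException(String msg) { super(msg); }
}

// Hypothetical sub-exception that now carries the "resync" signal.
class ResyncRequestedException extends AllocateFailedException {
    ResyncRequestedException(String msg) { super(msg); }
}

// Hypothetical stand-in for the client the AM uses to call allocate.
interface SchedulerClient {
    void allocate() throws AllocateFailedException;
}

class AmLoopSketch {
    enum Outcome { CONTINUE, RESYNC, FINISHED_AS_FAILED }

    // Existing style: any allocate failure ends the application.
    // The new sub-exception is swallowed by the broad catch.
    static Outcome legacyHeartbeat(SchedulerClient ams) {
        try {
            ams.allocate();
            return Outcome.CONTINUE;
        } catch (Exception e) {
            return Outcome.FINISHED_AS_FAILED;
        }
    }

    // Adapted style: the specific sub-exception is caught first,
    // so a resync request is no longer treated as a failure.
    static Outcome updatedHeartbeat(SchedulerClient ams) {
        try {
            ams.allocate();
            return Outcome.CONTINUE;
        } catch (ResyncRequestedException e) {
            return Outcome.RESYNC;
        } catch (Exception e) {
            return Outcome.FINISHED_AS_FAILED;
        }
    }

    public static void main(String[] args) {
        // Simulates an RM that signals resync by throwing.
        SchedulerClient restartedRm = () -> {
            throw new ResyncRequestedException("RM restarted");
        };
        System.out.println("legacy:  " + legacyHeartbeat(restartedRm));
        System.out.println("updated: " + updatedHeartbeat(restartedRm));
    }
}
```

The point of the sketch is that both methods compile against the same method 
signature; only an application that has been changed to catch the specific 
sub-exception before the broad catch keeps its retry behavior.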

In general, the problem is that we previously claimed that an API throws 
exception 1, exception 2, etc., and we expect users to handle these 
exceptions. To handle them correctly, users need to know, either implicitly or 
explicitly, in what situations each exception is raised (in YARN it seems that 
users had to figure this out themselves, as we hardly wrote any javadoc for 
the exceptions). Now we don't change the API method signature. Instead, we 
add/modify the situations in which an exception is raised, or throw a 
sub-exception (as in this case) that was not expected before. Hence, existing 
API users are likely to break on the newly added/modified exception, as they 
could not have taken the new behavior into consideration. Should this be 
considered a kind of *logic incompatibility*?

> Replace AM resync/shutdown command with corresponding exceptions
> ----------------------------------------------------------------
>
>                 Key: YARN-2209
>                 URL: https://issues.apache.org/jira/browse/YARN-2209
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, 
> YARN-2209.4.patch, YARN-2209.5.patch
>
>
> YARN-1365 introduced an ApplicationMasterNotRegisteredException to tell the 
> application to re-register on RM restart. We should do the same for the 
> AMS#allocate call.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
