[ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869932#comment-13869932
 ] 

Bikas Saha commented on YARN-1410:
----------------------------------

Xuan, can you please verify Karthik's comment above and fix/file jira? 

For all follow-up jiras, please mention the relevant jiras in comments in the 
code for other people's benefit.

Why is this at most once? This could be retried any number of times until the 
RM gives us a new application id, right?
{code}+  @AtMostOnce
   public GetNewApplicationResponse getNewApplication({code}

In the common case, this extra operation is going to be pure overhead. Can we 
get this information via an exception in the submitApplication() method?
{code}-    rmClient.submitApplication(request);
+    // check whether the applicationId is present or not
+    // before we submit the application
+    try {
+      getApplicationReport(applicationId);
+      String message = "Application with id " + applicationId +
+          " is already present! Cannot add a duplicate!";
+      LOG.error(message);
+      throw new YarnException(message);
+    } catch (ApplicationNotFoundException ex) {
+      // The applicationId is not present.
+      // submit the application with this applicationId
+      rmClient.submitApplication(request);
+    }{code}
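
To make the suggestion concrete, here is a rough sketch of the alternative: submit directly and let the RM report the duplicate through an exception, so the common case costs a single RPC. It does not use the real YARN classes. FakeRM, DuplicateApplicationException and submit() are hypothetical stand-ins invented for this illustration, and the assumption is that the RM-side duplicate check surfaces as a distinct exception type.

```java
import java.util.HashSet;
import java.util.Set;

public class SubmitOnceSketch {
  /** Hypothetical stand-in for an RM-side "duplicate id" error; not a real YARN class. */
  static class DuplicateApplicationException extends Exception {
    DuplicateApplicationException(String message) {
      super(message);
    }
  }

  /** Hypothetical in-memory stand-in for the RM; not the real ClientRMService. */
  static class FakeRM {
    private final Set<String> apps = new HashSet<>();
    int rpcCount = 0;

    void submitApplication(String appId) throws DuplicateApplicationException {
      rpcCount++;
      // Set.add returns false when the id is already present.
      if (!apps.add(appId)) {
        throw new DuplicateApplicationException("Application with id " + appId
            + " is already present! Cannot add a duplicate!");
      }
    }
  }

  /** Returns true if the app was accepted, false if the RM reported a duplicate. */
  static boolean submit(FakeRM rm, String appId) {
    try {
      // No getApplicationReport() call first: one RPC in the common case.
      rm.submitApplication(appId);
      return true;
    } catch (DuplicateApplicationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    FakeRM rm = new FakeRM();
    System.out.println("first submit accepted:  " + submit(rm, "application_1_0001"));
    System.out.println("second submit accepted: " + submit(rm, "application_1_0001"));
    System.out.println("RPCs issued: " + rm.rpcCount);
  }
}
```

In this sketch each submission attempt issues exactly one call to the RM, whereas the check-then-submit version in the patch issues two for every new application.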

Can we create a common yarn client instead of doing it in every 
createApplication()/submitApplication() helper method call?
{code}+  private ApplicationId createApplication() {
+    int numRetries = 3;
+    ApplicationId appId = null;
+    while (numRetries-- > 0) {
+      Configuration conf = new YarnConfiguration(this.conf);
+      YarnClient client = YarnClient.createYarnClient();{code}

The test needs a better name, or some comments explaining that it covers the 
case where the appId was created by the previous RM but the app was not saved 
before failover, so the new RM accepts the old Id.

Can it happen that submitApplication() will fail on the client but the RM has 
actually saved the application (with or without failover)? What should we do in 
that case?
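
One possible way to handle this, purely as an illustration and not something the patch does: when the submit call fails on the client, ask the RM whether it already knows the appId before retrying. The sketch below is self-contained and does not use the real YARN classes. FakeRM, its dropNextResponse flag, hasApplication() and submitWithRetry() are all hypothetical names made up for this example.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class AmbiguousSubmitSketch {
  /** Hypothetical in-memory stand-in for the RM; not a real YARN class. */
  static class FakeRM {
    private final Set<String> saved = new HashSet<>();
    // When true, the next submit saves the app but the response is "lost".
    boolean dropNextResponse = false;

    void submitApplication(String appId) throws IOException {
      saved.add(appId);
      if (dropNextResponse) {
        dropNextResponse = false;
        throw new IOException("response lost after the app was saved");
      }
    }

    boolean hasApplication(String appId) {
      return saved.contains(appId);
    }
  }

  /** Returns true once the RM is known to hold the app, false if all attempts fail. */
  static boolean submitWithRetry(FakeRM rm, String appId, int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        rm.submitApplication(appId);
        return true;
      } catch (IOException e) {
        // The call failed on the client, but the RM may have saved the app.
        // Check before retrying so the failure is not reported as a loss.
        if (rm.hasApplication(appId)) {
          return true;
        }
      }
    }
    return false;
  }

  public static void main(String[] args) {
    FakeRM rm = new FakeRM();
    rm.dropNextResponse = true;
    System.out.println("submitted: " + submitWithRetry(rm, "application_1_0001", 3));
  }
}
```

Whether the follow-up lookup is acceptable overhead is a separate question; here it only runs on the failure path, not in the common case.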

Why are we removing this code? We should return a new exception for the client.
{code}-    // but it is good to fail the invalid submission as early as possible.
+    // The duplication has been checked before application submission
     if (rmContext.getRMApps().get(applicationId) != null) {
-      String message = "Application with id " + applicationId +
-          " is already present! Cannot add a duplicate!";
-      LOG.warn(message);
-      RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
-          message, "ClientRMService", "Exception in submitting application",
-          applicationId);
-      throw RPCUtil.getRemoteException(message);{code}

What are the changes in TestYarnClient for? I don't see any new test added.

There is no test case for the new functionality of YarnClient where it creates 
an appId if it's not supplied by the user. If you want, we could do that in a 
separate jira related to this one.

> Handle client failover during 2 step client API's like app submission
> ---------------------------------------------------------------------
>
>                 Key: YARN-1410
>                 URL: https://issues.apache.org/jira/browse/YARN-1410
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>         Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, 
> YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> App submission involves
> 1) creating appId
> 2) using that appId to submit an ApplicationSubmissionContext to the user.
> The client may have obtained an appId from an RM, the RM may have failed 
> over, and the client may submit the app to the new RM.
> Since the new RM has a different notion of cluster timestamp (used to create 
> app id) the new RM may reject the app submission resulting in unexpected 
> failure on the client side.
> The same may happen for other 2 step client API operations.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
