[ 
https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825989#comment-13825989
 ] 

Bikas Saha commented on YARN-744:
---------------------------------

Better name?
{code}
+    AllocateResponseLock res = responseMap.get(applicationAttemptId);
{code}

reuse throwApplicationAttemptDoesNotExistInCacheException() in 
registerApplicationMaster()?

use InvalidApplicationMasterRequestException or a new specific exception 
instead of generic RPCUtil.throwRemoteException()?
{code}
+  private void throwApplicationAttemptDoesNotExistInCacheException(
+      ApplicationAttemptId appAttemptId) throws YarnException {
+    String message = "Application doesn't exist in cache "
+        + appAttemptId;
+    LOG.error(message);
+    throw RPCUtil.getRemoteException(message);
+  }
{code}

The new logic is not the same as the old one. If the app is no longer in the 
cache then it would send a resync response. Now it will send a regular response 
instead of a resync response.
{code}
-      // before returning response, verify in sync
-      AllocateResponse oldResponse =
-          responseMap.put(appAttemptId, allocateResponse);
-      if (oldResponse == null) {
-        // appAttempt got unregistered, remove it back out
-        responseMap.remove(appAttemptId);
-        String message = "App Attempt removed from the cache during allocate"
-            + appAttemptId;
-        LOG.error(message);
-        return resync;
-      }
-
+      res.setAllocateResponse(allocateResponse);
       return allocateResponse;
{code}

> Race condition in ApplicationMasterService.allocate .. It might process same 
> allocate request twice resulting in additional containers getting allocated.
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-744
>                 URL: https://issues.apache.org/jira/browse/YARN-744
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Omkar Vinit Joshi
>            Priority: Minor
>         Attachments: MAPREDUCE-3899-branch-0.23.patch, 
> YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, 
> YARN-744-20130726.1.patch, YARN-744.1.patch, YARN-744.patch
>
>
> Looks like the lock taken in this is broken. It takes a lock on lastResponse 
> object and then puts a new lastResponse object into the map. At this point a 
> new thread entering this function will get a new lastResponse object and will 
> be able to take its lock and enter the critical section. Presumably we want 
> to limit one response per app attempt. So the lock could be taken on the 
> ApplicationAttemptId key of the response map object.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to