[
https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825989#comment-13825989
]
Bikas Saha commented on YARN-744:
---------------------------------
Better name?
{code}
+ AllocateResponseLock res = responseMap.get(applicationAttemptId);
{code}
reuse throwApplicationAttemptDoesNotExistInCacheException() in
registerApplicationMaster()?
use InvalidApplicationMasterRequestException or a new specific exception
instead of generic RPCUtil.throwRemoteException()?
{code}
+ private void throwApplicationAttemptDoesNotExistInCacheException(
+ ApplicationAttemptId appAttemptId) throws YarnException {
+ String message = "Application doesn't exist in cache "
+ + appAttemptId;
+ LOG.error(message);
+ throw RPCUtil.getRemoteException(message);
+ }
{code}
The new logic is not the same as the old one. If the app is no longer in the
cache then it would send a resync response. Now it will send a regular response
instead of a resync response.
{code}
- // before returning response, verify in sync
- AllocateResponse oldResponse =
- responseMap.put(appAttemptId, allocateResponse);
- if (oldResponse == null) {
- // appAttempt got unregistered, remove it back out
- responseMap.remove(appAttemptId);
- String message = "App Attempt removed from the cache during allocate"
- + appAttemptId;
- LOG.error(message);
- return resync;
- }
-
+ res.setAllocateResponse(allocateResponse);
return allocateResponse;
{code}
> Race condition in ApplicationMasterService.allocate .. It might process same
> allocate request twice resulting in additional containers getting allocated.
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-744
> URL: https://issues.apache.org/jira/browse/YARN-744
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Bikas Saha
> Assignee: Omkar Vinit Joshi
> Priority: Minor
> Attachments: MAPREDUCE-3899-branch-0.23.patch,
> YARN-744-20130711.1.patch, YARN-744-20130715.1.patch,
> YARN-744-20130726.1.patch, YARN-744.1.patch, YARN-744.patch
>
>
> Looks like the lock taken in this is broken. It takes a lock on lastResponse
> object and then puts a new lastResponse object into the map. At this point a
> new thread entering this function will get a new lastResponse object and will
> be able to take its lock and enter the critical section. Presumably we want
> to limit one response per app attempt. So the lock could be taken on the
> ApplicationAttemptId key of the response map object.
--
This message was sent by Atlassian JIRA
(v6.1#6144)