[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13827538#comment-13827538 ] Hudson commented on YARN-744: - SUCCESS: Integrated in Hadoop-Yarn-trunk #397 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/397/]) YARN-744. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. (Omkar Vinit Joshi via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1543707) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744-20130726.1.patch, YARN-744.1.patch, YARN-744.2.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13827627#comment-13827627 ] Hudson commented on YARN-744: - FAILURE: Integrated in Hadoop-Hdfs-trunk #1588 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1588/]) YARN-744. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. (Omkar Vinit Joshi via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1543707) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744-20130726.1.patch, YARN-744.1.patch, YARN-744.2.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13827639#comment-13827639 ] Hudson commented on YARN-744: - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1614 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1614/]) YARN-744. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. (Omkar Vinit Joshi via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1543707) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744-20130726.1.patch, YARN-744.1.patch, YARN-744.2.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826974#comment-13826974 ] Omkar Vinit Joshi commented on YARN-744: Thanks [~bikassaha] addressed your comments. Attaching a new patch. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744-20130726.1.patch, YARN-744.1.patch, YARN-744.2.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826997#comment-13826997 ] Hadoop QA commented on YARN-744: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614693/YARN-744.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2486//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2486//console This message is automatically generated. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744-20130726.1.patch, YARN-744.1.patch, YARN-744.2.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13827011#comment-13827011 ] Omkar Vinit Joshi commented on YARN-744: Test failure is not related to this. Opened ticket YARN-1425 to track this. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744-20130726.1.patch, YARN-744.1.patch, YARN-744.2.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13825989#comment-13825989 ] Bikas Saha commented on YARN-744: - Better name? {code} +AllocateResponseLock res = responseMap.get(applicationAttemptId); {code} reuse throwApplicationAttemptDoesNotExistInCacheException() in registerApplicationMaster()? use InvalidApplicationMasterRequestException or a new specific exception instead of generic RPCUtil.throwRemoteException()? {code} + private void throwApplicationAttemptDoesNotExistInCacheException( + ApplicationAttemptId appAttemptId) throws YarnException { +String message = Application doesn't exist in cache ++ appAttemptId; +LOG.error(message); +throw RPCUtil.getRemoteException(message); + } {code} The new logic is not the same as the old one. If the app is no longer in the cache then it would send a resync response. Now it will send a regular response instead of a resync response. {code} - // before returning response, verify in sync - AllocateResponse oldResponse = - responseMap.put(appAttemptId, allocateResponse); - if (oldResponse == null) { -// appAttempt got unregistered, remove it back out -responseMap.remove(appAttemptId); -String message = App Attempt removed from the cache during allocate -+ appAttemptId; -LOG.error(message); -return resync; - } - + res.setAllocateResponse(allocateResponse); return allocateResponse; {code} Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744-20130726.1.patch, YARN-744.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13820921#comment-13820921 ] Hadoop QA commented on YARN-744: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12613497/YARN-744.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2433//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2433//console This message is automatically generated. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744-20130726.1.patch, YARN-744.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721282#comment-13721282 ] Omkar Vinit Joshi commented on YARN-744: Thanks [~bikassaha] ... bq. AllocateResponseWrapper res how about AllocateResponseLock?? bq. If the wrapper exists then how can the lastResponse be null? you are right ..now we no longer need this removing it. yeah the test won't actually be able to simulate the race condition mentioned above. Can't think of any other test. Attaching it without a test. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744-20130726.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721300#comment-13721300 ] Hadoop QA commented on YARN-744: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12594461/YARN-744-20130726.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1591//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1591//console This message is automatically generated. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744-20130726.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719003#comment-13719003 ] Bikas Saha commented on YARN-744: - Patch needs rebase. Probably reuse the recently added UnknownApplication exception. Can we pick a better name? {code}AllocateResponseWrapper res{code} If the wrapper exists then how can the lastResponse be null? {code} +synchronized (res) { + AllocateResponse lastResponse = res.getAllocateResponse(); + if (lastResponse == null) { +LOG.error(AppAttemptId doesnt exist in cache + appAttemptId); +return resync; + } {code} I am not quite getting fully what the test is doing. Does it fail without the change? On the test code itself. Is there another solution to make the test work other than making a method protected only for test override purposes? Can you make it package private instead of protected? protected implies connotations for derived classes. Can we avoid the long sleep() calls and use wait-notify or countdown latches so that we dont waste time in the test. They also help make the test less flaky because of race conditions. {code} + @Override + protected void authorizeRequest(ApplicationAttemptId appAttemptID) + throws YarnException { +int interval = 10; +count.incrementAndGet(); +while (count.get() == 1 interval-- 0 ) { + try { +Thread.sleep(1000); + } catch (InterruptedException e) {} +} +Assert.assertTrue(count.get() 1); + } +}; {code} Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713206#comment-13713206 ] Omkar Vinit Joshi commented on YARN-744: [~bikassaha] yes... there is similar but different bug though..so [~mayank_bansal] is fixing it. There we are computing the response and then updating RMNodeImpl asynchronously. If this approach is correct then we can do the similar thing after YARN-245 is in. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708761#comment-13708761 ] Zhijie Shen commented on YARN-744: -- The passed in appAttemptId for an app currently seems to be the same object, such that it can be used to for synchronized blocks, but I agree with the idea of wrapper, because it is more predictable and stand-alone in ApplicationMasterService. BTW, is it convenient to write a test case for concurrent allocation? Like TestClientRMService#testConcurrentAppSubmit. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709064#comment-13709064 ] Omkar Vinit Joshi commented on YARN-744: bq. BTW, is it convenient to write a test case for concurrent allocation? Like TestClientRMService#testConcurrentAppSubmit. yeah wrote one... bq. The passed in appAttemptId for an app currently seems to be the same object, such that it can be used to for synchronized blocks, but I agree with the idea of wrapper, because it is more predictable and stand-alone in ApplicationMasterService. locking on appAttemptId in case of allocate / RegisterApplicationMaster call won't work. They are coming from client...can't guarantee that they are identical in terms grabbing a lock.. thoughts? Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709072#comment-13709072 ] Zhijie Shen commented on YARN-744: -- bq. locking on appAttemptId in case of allocate / RegisterApplicationMaster call won't work. They are coming from client...can't guarantee that they are identical in terms grabbing a lock.. thoughts? I meant that AMRMClient uses the same appAttemptId, but the uniqueness is not guaranteed, so I agreed with the self-contained locker - wrapper. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709091#comment-13709091 ] Bikas Saha commented on YARN-744: - Why do we need a wrapper? We should not be locking on the app attempt id. We should try to find some internal RM object thats unique for the app attempt and lock on that. Also avoid locking the RMAttempImpl object itself since it will block internal async dispatcher. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709096#comment-13709096 ] Bikas Saha commented on YARN-744: - btw. it does not look like this is a practical problem. Until we start seeing a few instances of this happening we should probably lower the priority of this jira. I will do that now. Please change it if you think otherwise. A bug that does not manifest itself is not a bug :P Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709147#comment-13709147 ] Hadoop QA commented on YARN-744: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592426/YARN-744-20130715.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1485//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1485//console This message is automatically generated. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709304#comment-13709304 ] Omkar Vinit Joshi commented on YARN-744: bq. We should not be locking on the app attempt id. I am not locking on appAttemptId... or AppAttemptImpl...didn't understand your question. bq. Why do we need a wrapper? We don't have any explicit lock for an application attempt...I am creating a wrapped object to avoid maintaining per application attempt lock. Thereby across application attempt response we can lock on specific attempt. I think this is important as we may loose container than what were requested... Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706197#comment-13706197 ] Omkar Vinit Joshi commented on YARN-744: [~bikassaha] sounds reasonable ..will take a look at it again. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706554#comment-13706554 ] Hadoop QA commented on YARN-744: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591936/YARN-744-20130711.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1465//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1465//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1465//console This message is automatically generated. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13695706#comment-13695706 ] Omkar Vinit Joshi commented on YARN-744: The problem here is that we retrieve the last response from resource map and then try to grab a lock on it. However after grabbing lock we don't check if the last response in resource map itself got updated or not. That results into a race condition which I am trying to solve here.. After grabbing the lock an additional check has to be made to ensure that lastResponse was not changed in between i.e. no other AM requests were processed. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13695757#comment-13695757 ] Hadoop QA commented on YARN-744: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12590077/YARN-744.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1404//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1404//console This message is automatically generated. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira