[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out
    [ https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981997#comment-14981997 ]

Tsuyoshi Ozawa commented on YARN-4312:
--------------------------------------

+1, checking this in.

* A failure of TestResourceTrackerService is reported as YARN-3580. Confirmed that it passes with the patch. I'll backport it.
* A failure of TestClientRMTokens is reported as YARN-4306.
* The failure of TestAMAuthorization appears to be unrelated to this issue, since it is caused by an UnknownHostException.

> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out
> ---------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4312
>                 URL: https://issues.apache.org/jira/browse/YARN-4312
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.1, 2.7.1
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>         Attachments: YARN-4312-branch-2.6.01.patch, YARN-4312-branch-2.7.01.patch
>
>
> These timeouts happen because, after YARN-3798, we do a ZK sync operation on RM startup. This delays RM startup slightly and makes the 5-second timeouts too small for a couple of tests in TestSubmitApplicationWithRMHA.
> {noformat}
> testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)  Time elapsed: 5.162 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
>     at org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)
>
> testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)  Time elapsed: 5.146 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>     at
[jira] [Commented] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails
    [ https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981999#comment-14981999 ]

Tsuyoshi Ozawa commented on YARN-3580:
--------------------------------------

This problem was also reproduced with JDK v1.7.0_79, as reported on YARN-4312: https://issues.apache.org/jira/browse/YARN-4312?focusedCommentId=14979968=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14979968

Backporting this to 2.7.2.

> [JDK 8] TestClientRMService.testGetLabelsToNodes fails
> ------------------------------------------------------
>
>                 Key: YARN-3580
>                 URL: https://issues.apache.org/jira/browse/YARN-3580
>             Project: Hadoop YARN
>          Issue Type: Test
>          Components: test
>    Affects Versions: 2.8.0
>         Environment: JDK 8
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>              Labels: jdk8
>             Fix For: 2.8.0
>
>         Attachments: YARN-3580.001.patch
>
>
> When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails:
> {noformat}
> java.lang.AssertionError: null
>     at org.junit.Assert.fail(Assert.java:86)
>     at org.junit.Assert.assertTrue(Assert.java:41)
>     at org.junit.Assert.assertTrue(Assert.java:52)
>     at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out
    [ https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982032#comment-14982032 ]

Tsuyoshi Ozawa commented on YARN-4312:
--------------------------------------

2.6.2 is being released. For now, I am committing this to branch-2.7 and will backport it to branch-2.6 after releasing 2.6.3.
[jira] [Created] (YARN-4321) Incessant retries if NoAuthException is thrown by Zookeeper in non HA mode
Varun Saxena created YARN-4321:
----------------------------------

             Summary: Incessant retries if NoAuthException is thrown by Zookeeper in non HA mode
                 Key: YARN-4321
                 URL: https://issues.apache.org/jira/browse/YARN-4321
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 2.7.1
            Reporter: Varun Saxena
            Assignee: Varun Saxena


This applies only to branch-2.7 and earlier code.
When a {{NoAuthException}} is thrown in non-HA mode (as in the scenario of YARN-4127), the RM keeps retrying the ZK operation incessantly.
{noformat}
2015-10-23 09:22:10,209 DEBUG [SyncThread:0] server.DataTree (DataTree.java:processTxn(949)) - Ignoring processTxn failure hdr: -1 : error: -102
2015-10-23 09:22:10,210 DEBUG [main-SendThread(127.0.0.1:11221)] zookeeper.ClientCnxn (ClientCnxn.java:readResponse(818)) - Reading reply sessionid:0x15092d1ebe10001, packet:: clientPath:null serverPath:null finished:false header:: 7591,1 replyHeader:: 7591,7610,-102 request:: '/rmstore/ZKRMStateRoot/RMAppRoot,,v{s{31,s{'world,'anyone}}},0 response::
2015-10-23 09:22:10,210 INFO [ProcessThread(sid:0 cport:-1):] server.PrepRequestProcessor (PrepRequestProcessor.java:pRequest(645)) - Got user-level KeeperException when processing sessionid:0x15092d1ebe10001 type:create cxid:0x1da8 zxid:0x1dbb txntype:-1 reqpath:n/a Error Path:null Error:KeeperErrorCode = NoAuth
{noformat}
This is because we do not handle NoAuthException properly in branch-2.7 code when HA is not enabled.
{{ZKRMStateStore#runWithRetries}} contains the code below. As can be seen, if HA is not enabled, we neither rethrow the NoAuthException nor increment the retry count and give up once the retries are exhausted.
{code}
      T runWithRetries() throws Exception {
        int retry = 0;
        while (true) {
          try {
            return runWithCheck();
          } catch (KeeperException.NoAuthException nae) {
            if (HAUtil.isHAEnabled(getConfig())) {
              // NoAuthException possibly means that this store is fenced due to
              // another RM becoming active. Even if not,
              // it is safer to assume we have been fenced
              throw new StoreFencedException();
            }
          } catch (KeeperException ke) {
            .
          }
        }
      }
{code}
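The loop quoted in YARN-4321 never terminates in non-HA mode because the {{NoAuthException}} branch falls through to the next iteration without throwing or counting the attempt. The following is a minimal, self-contained sketch of a loop in which every exception path terminates. It is not the attached patch: {{NoAuthError}}, {{FencedError}}, {{BoundedRetry}} and the {{maxRetries}} parameter are hypothetical stand-ins introduced for illustration, and no ZooKeeper or Hadoop classes are used.

```java
import java.util.concurrent.Callable;

/** Hypothetical stand-in for KeeperException.NoAuthException (not the ZooKeeper class). */
class NoAuthError extends Exception {
  NoAuthError(String msg) { super(msg); }
}

/** Hypothetical stand-in for StoreFencedException (not the Hadoop class). */
class FencedError extends Exception {
  FencedError(String msg, Throwable cause) { super(msg, cause); }
}

/** Sketch of a retry loop in which every exception path eventually terminates. */
class BoundedRetry {
  static <T> T runWithRetries(Callable<T> op, boolean haEnabled, int maxRetries)
      throws Exception {
    int retry = 0;
    while (true) {
      try {
        return op.call();
      } catch (NoAuthError nae) {
        if (haEnabled) {
          // With HA on, assume another RM became active and fenced this store.
          throw new FencedError("store fenced", nae);
        }
        // Without HA, retrying cannot repair an ACL problem, so rethrow
        // instead of silently falling through to the next iteration.
        throw nae;
      } catch (Exception e) {
        // Any other failure is retried, but only up to maxRetries times.
        if (++retry > maxRetries) {
          throw e;
        }
      }
    }
  }
}
```

In this sketch, an operation that always throws {{NoAuthError}} with {{haEnabled}} set to false fails after a single attempt, while other failures are retried at most {{maxRetries}} times before the last exception is rethrown.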
[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
    [ https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982169#comment-14982169 ]

Tsuyoshi Ozawa commented on YARN-4320:
--------------------------------------

+1, checking this in.

> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
> -----------------------------------------------------------------------------------------------
>
>                 Key: YARN-4320
>                 URL: https://issues.apache.org/jira/browse/YARN-4320
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>         Attachments: YARN-4320.01.patch
>
>
> {noformat}
> Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.256 sec <<< FAILURE! - in org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)  Time elapsed: 35.764 sec  <<< ERROR!
> java.lang.RuntimeException: Failed to connect to timeline server. Connection retries limit exceeded. The posted timeline event may be missing
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
>     at com.sun.jersey.api.client.Client.handle(Client.java:648)
>     at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>     at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>     at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
>     at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015)
>     at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586)
>     at org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:719)
>     at org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:507)
> {noformat}
[jira] [Commented] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails
    [ https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982186#comment-14982186 ]

Hudson commented on YARN-3580:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2547 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2547/])
Move YARN-3580 in CHANGES.txt from 2.8 to 2.7.2. (ozawa: rev d2e01f4ed87c3c41156ec9a68855f923f8c0adf9)
* hadoop-yarn-project/CHANGES.txt

> [JDK 8] TestClientRMService.testGetLabelsToNodes fails
> ------------------------------------------------------
>
>                 Key: YARN-3580
>                 URL: https://issues.apache.org/jira/browse/YARN-3580
>             Project: Hadoop YARN
>          Issue Type: Test
>          Components: test
>    Affects Versions: 2.8.0
>         Environment: JDK 8
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>              Labels: jdk8
>             Fix For: 2.8.0, 2.7.2
>
>         Attachments: YARN-3580.001.patch
>
>
> When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails:
> {noformat}
> java.lang.AssertionError: null
>     at org.junit.Assert.fail(Assert.java:86)
>     at org.junit.Assert.assertTrue(Assert.java:41)
>     at org.junit.Assert.assertTrue(Assert.java:52)
>     at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499)
> {noformat}
[jira] [Updated] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
    [ https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-4320:
-------------------------------
    Summary: TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188  (was: TestJobHistoryEventHandler fails on trunk as MiniYarnCluster no longer binds to default port 8188)
[jira] [Updated] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out
    [ https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi Ozawa updated YARN-4312:
---------------------------------
    Hadoop Flags: Reviewed
[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out
    [ https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982047#comment-14982047 ]

Tsuyoshi Ozawa commented on YARN-4312:
--------------------------------------

[~varun_saxena] thank you for your contribution!
[jira] [Commented] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails
    [ https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982015#comment-14982015 ]

Tsuyoshi Ozawa commented on YARN-3580:
--------------------------------------

Done.
[jira] [Commented] (YARN-4313) Race condition in MiniMRYarnCluster when getting history server address
    [ https://issues.apache.org/jira/browse/YARN-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982154#comment-14982154 ]

Hudson commented on YARN-4313:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #2491 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2491/])
YARN-4313. Race condition in MiniMRYarnCluster when getting history (xgong: rev 7412ff48eeb967c972c19c1370c77a41c5b3b81f)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/MiniMRYarnCluster.java
* hadoop-yarn-project/CHANGES.txt

> Race condition in MiniMRYarnCluster when getting history server address
> -----------------------------------------------------------------------
>
>                 Key: YARN-4313
>                 URL: https://issues.apache.org/jira/browse/YARN-4313
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jian He
>            Assignee: Jian He
>             Fix For: 2.8.0, 2.7.2
>
>         Attachments: YARN-4313.1.patch, YARN-4313.2.patch
>
>
> The problem is in this code, which waits for the JHS to start:
> {code}
>       new Thread() {
>         public void run() {
>           historyServer.start();
>         };
>       }.start();
>       while (historyServer.getServiceState() == STATE.INITED) {
>         LOG.info("Waiting for HistoryServer to start...");
>         Thread.sleep(1500);
>       }
> {code}
> The service state is updated before the service is actually started; see AbstractService#start. So it is possible that the service has not yet started when the while loop exits.
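The race described in YARN-4313 can be illustrated without Hadoop. The sketch below is an assumption-laden toy, not the committed fix: {{ToyService}} and {{StartWaiter}} are hypothetical names, and the toy service deliberately publishes its new state before its start-up work finishes, which is the behaviour the issue attributes to AbstractService#start. Instead of polling the state, the caller waits on a latch that is counted down only after {{start()}} has returned.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical toy service whose state changes before its start-up work is done. */
class ToyService {
  volatile String state = "INITED";
  volatile boolean ready = false;

  void start() throws InterruptedException {
    state = "STARTED";   // the state is published first ...
    Thread.sleep(200);   // ... while the real start-up work is still running
    ready = true;
  }
}

/** Waits for start() to return instead of polling the service state. */
class StartWaiter {
  static boolean startAndWait(ToyService service, long timeoutMs) {
    CountDownLatch started = new CountDownLatch(1);
    Thread starter = new Thread(() -> {
      try {
        service.start();
        started.countDown();   // signalled only after start() has completed
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    starter.setDaemon(true);
    starter.start();
    try {
      // Returns true once start() has finished, or false if the timeout expires.
      return started.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

A loop that only checks {{state != "INITED"}} could exit during the 200 ms window in which {{ready}} is still false; waiting on the latch closes that window in this toy setup.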
[jira] [Commented] (YARN-4127) RM fail with noAuth error if switched from failover mode to non-failover mode
[ https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982155#comment-14982155 ] Hudson commented on YARN-4127: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2491 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2491/]) YARN-4127. RM fail with noAuth error if switched from failover to (jianhe: rev e5b1733e049dc0f1859b93618354e049a0efdc4a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java > RM fail with noAuth error if switched from failover mode to non-failover mode > -- > > Key: YARN-4127 > URL: https://issues.apache.org/jira/browse/YARN-4127 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Jian He >Assignee: Varun Saxena > Attachments: YARN-4127-branch-2.7.01.patch, YARN-4127.01.patch, > YARN-4127.02.patch > > > The scenario is that RM failover was initially enabled, so the zkRootNodeAcl > is by default set with the *RM ID* in the ACL string > If RM failover is then switched to be disabled, it cannot load data from ZK > and fail with noAuth error. After I reset the root node ACL, it again can > access. 
> {code} > 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to > load/recover state > org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth > at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194) > {code} > the problem may be that in non-failover mode, RM doesn't use the *RM-ID* to > connect with ZK and thus fail with no Auth error. > We should be able to switch failover on and off with no interruption to the > user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
[ https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4320: - Hadoop Flags: Reviewed > TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to > default port 8188 > --- > > Key: YARN-4320 > URL: https://issues.apache.org/jira/browse/YARN-4320 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4320.01.patch > > > {noformat} > Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.256 sec > <<< FAILURE! - in > org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler > testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler) > Time elapsed: 35.764 sec <<< ERROR! > java.lang.RuntimeException: Failed to connect to timeline server. Connection > retries limit exceeded. The posted timeline event may be missing > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320) > at java.security.AccessController.doPrivileged(Native Method) > at 
javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586) > at > org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:719) > at > org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:507) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails
[ https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3580: - Fix Version/s: 2.7.2 > [JDK 8] TestClientRMService.testGetLabelsToNodes fails > -- > > Key: YARN-3580 > URL: https://issues.apache.org/jira/browse/YARN-3580 > Project: Hadoop YARN > Issue Type: Test > Components: test >Affects Versions: 2.8.0 > Environment: JDK 8 >Reporter: Robert Kanter >Assignee: Robert Kanter > Labels: jdk8 > Fix For: 2.8.0, 2.7.2 > > Attachments: YARN-3580.001.patch > > > When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails: > {noformat} > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails
[ https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982161#comment-14982161 ] Hudson commented on YARN-3580: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #605 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/605/]) Move YARN-3580 in CHANGES.txt from 2.8 to 2.7.2. (ozawa: rev d2e01f4ed87c3c41156ec9a68855f923f8c0adf9) * hadoop-yarn-project/CHANGES.txt > [JDK 8] TestClientRMService.testGetLabelsToNodes fails > -- > > Key: YARN-3580 > URL: https://issues.apache.org/jira/browse/YARN-3580 > Project: Hadoop YARN > Issue Type: Test > Components: test >Affects Versions: 2.8.0 > Environment: JDK 8 >Reporter: Robert Kanter >Assignee: Robert Kanter > Labels: jdk8 > Fix For: 2.8.0, 2.7.2 > > Attachments: YARN-3580.001.patch > > > When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails: > {noformat} > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4313) Race condition in MiniMRYarnCluster when getting history server address
[ https://issues.apache.org/jira/browse/YARN-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981994#comment-14981994 ] Hudson commented on YARN-4313: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #553 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/553/]) YARN-4313. Race condition in MiniMRYarnCluster when getting history (xgong: rev 7412ff48eeb967c972c19c1370c77a41c5b3b81f) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/MiniMRYarnCluster.java * hadoop-yarn-project/CHANGES.txt > Race condition in MiniMRYarnCluster when getting history server address > --- > > Key: YARN-4313 > URL: https://issues.apache.org/jira/browse/YARN-4313 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Fix For: 2.8.0, 2.7.2 > > Attachments: YARN-4313.1.patch, YARN-4313.2.patch > > > The problem is in this place, when waiting for the JHS to be started: > {code} > new Thread() { > public void run() { > historyServer.start(); > }; > }.start(); > while (historyServer.getServiceState() == STATE.INITED) { > LOG.info("Waiting for HistoryServer to start..."); > Thread.sleep(1500); > } > {code} > The service state is updated before the service is actually started. See > AbstractService#start. So it's possible that when the while loop breaks, the > service is not yet started. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4127) RM fail with noAuth error if switched from failover mode to non-failover mode
[ https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981995#comment-14981995 ] Hudson commented on YARN-4127: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #553 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/553/]) YARN-4127. RM fail with noAuth error if switched from failover to (jianhe: rev e5b1733e049dc0f1859b93618354e049a0efdc4a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java > RM fail with noAuth error if switched from failover mode to non-failover mode > -- > > Key: YARN-4127 > URL: https://issues.apache.org/jira/browse/YARN-4127 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Jian He >Assignee: Varun Saxena > Attachments: YARN-4127-branch-2.7.01.patch, YARN-4127.01.patch, > YARN-4127.02.patch > > > The scenario is that RM failover was initially enabled, so the zkRootNodeAcl > is by default set with the *RM ID* in the ACL string > If RM failover is then switched to be disabled, it cannot load data from ZK > and fail with noAuth error. After I reset the root node ACL, it again can > access. 
> {code} > 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to > load/recover state > org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth > at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194) > {code} > the problem may be that in non-failover mode, RM doesn't use the *RM-ID* to > connect with ZK and thus fail with no Auth error. > We should be able to switch failover on and off with no interruption to the > user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token
[ https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981993#comment-14981993 ] Hudson commented on YARN-4183: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #553 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/553/]) YARN-4183. Enabling generic application history forces every job to get (jeagles: rev c293c58954cdab25c8c69418b0e839883b563fa4) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java > Enabling generic application history forces every job to get a timeline > service delegation token > > > Key: YARN-4183 > URL: https://issues.apache.org/jira/browse/YARN-4183 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Mit Desai > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-4183.1.patch > > > When enabling just the Generic History Server and not the timeline server, > the system metrics publisher will not publish the events to the timeline > store as it checks if the timeline server and system metrics publisher are > enabled before creating a timeline client. > To make it work, if the timeline service flag is turned on, it will force > every yarn application to get a delegation token. > Instead of checking if timeline service is enabled, we should be checking if > application history server is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4127) RM fail with noAuth error if switched from failover mode to non-failover mode
[ https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982027#comment-14982027 ] Varun Saxena commented on YARN-4127: [~jianhe] bq. however for the branch-2.7 patch, if I run the test case without the core change, the test will keep in a loop and not finish. could you take a look ? This is because we do not handle NoAuthException properly in the branch-2.7 code when HA is not enabled. In ZKRMStateStore#runWithRetries, we have the code shown below. As can be seen, if HA is not enabled, we neither rethrow NoAuthException nor have any logic to increment retries and back out once retries are maxed out. With the fix in this patch, NoAuth will probably never occur unless someone changes the ACL from the CLI. I will go ahead and file another JIRA. {code} T runWithRetries() throws Exception { int retry = 0; while (true) { try { return runWithCheck(); } catch (KeeperException.NoAuthException nae) { if (HAUtil.isHAEnabled(getConfig())) { // NoAuthException possibly means that this store is fenced due to // another RM becoming active. Even if not, // it is safer to assume we have been fenced throw new StoreFencedException(); } } catch (KeeperException ke) { . } } } {code} > RM fail with noAuth error if switched from failover mode to non-failover mode > -- > > Key: YARN-4127 > URL: https://issues.apache.org/jira/browse/YARN-4127 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Jian He >Assignee: Varun Saxena > Attachments: YARN-4127-branch-2.7.01.patch, YARN-4127.01.patch, > YARN-4127.02.patch > > > The scenario is that RM failover was initially enabled, so the zkRootNodeAcl > is by default set with the *RM ID* in the ACL string > If RM failover is then switched to be disabled, it cannot load data from ZK > and fail with noAuth error. After I reset the root node ACL, it again can > access. 
> {code} > 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to > load/recover state > org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth > at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194) > {code} > the problem may be that in non-failover mode, RM doesn't use the *RM-ID* to > connect with ZK and thus fail with no Auth error.
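For illustration only, the endless loop described in the comment above goes away once the non-HA branch either rethrows or counts retries. The sketch below shows one possible shape under stated assumptions: NoAuthException and FencedException are hypothetical stand-ins for KeeperException.NoAuthException and StoreFencedException, and maxRetries and sleepMs are made-up parameters. It is not the branch-2.7 code and not the change tracked in the follow-up JIRA.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for KeeperException.NoAuthException.
class NoAuthException extends Exception {}

// Hypothetical stand-in for StoreFencedException.
class FencedException extends Exception {}

class BoundedRetry {
  // Runs op, retrying on NoAuthException at most maxRetries times when HA is off.
  static <T> T runWithRetries(Callable<T> op, boolean haEnabled,
                              int maxRetries, long sleepMs) throws Exception {
    int retry = 0;
    while (true) {
      try {
        return op.call();
      } catch (NoAuthException nae) {
        if (haEnabled) {
          // With HA on, treat NoAuth as "another RM is active": assume fenced.
          throw new FencedException();
        }
        // With HA off, back out once the retry budget is used up
        // instead of looping forever.
        if (++retry > maxRetries) {
          throw nae;
        }
        Thread.sleep(sleepMs);
      }
    }
  }
}
```

With maxRetries set to 3, an operation that always fails with NoAuth is attempted four times (the initial call plus three retries) and then the exception is rethrown to the caller.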
[jira] [Commented] (YARN-4317) Test failure: TestResourceTrackerService
[ https://issues.apache.org/jira/browse/YARN-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981965#comment-14981965 ] Tsuyoshi Ozawa commented on YARN-4317: -- {quote} java.lang.AssertionError: expected:<15360> but was:<10240> {quote} Found that this is another issue. Keeping it open. > Test failure: TestResourceTrackerService > - > > Key: YARN-4317 > URL: https://issues.apache.org/jira/browse/YARN-4317 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Tsuyoshi Ozawa > > {quote} > Running > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.438 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService > testReconnectNode(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService) > Time elapsed: 0.114 sec <<< FAILURE! > java.lang.AssertionError: expected:<15360> but was:<10240> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testReconnectNode(TestResourceTrackerService.java:624) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails
[ https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982115#comment-14982115 ] Hudson commented on YARN-3580: -- FAILURE: Integrated in Hadoop-trunk-Commit #8728 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8728/]) Move YARN-3580 in CHANGES.txt from 2.8 to 2.7.2. (ozawa: rev d2e01f4ed87c3c41156ec9a68855f923f8c0adf9) * hadoop-yarn-project/CHANGES.txt > [JDK 8] TestClientRMService.testGetLabelsToNodes fails > -- > > Key: YARN-3580 > URL: https://issues.apache.org/jira/browse/YARN-3580 > Project: Hadoop YARN > Issue Type: Test > Components: test >Affects Versions: 2.8.0 > Environment: JDK 8 >Reporter: Robert Kanter >Assignee: Robert Kanter > Labels: jdk8 > Fix For: 2.8.0, 2.7.2 > > Attachments: YARN-3580.001.patch > > > When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails: > {noformat} > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1510) Make NMClient support change container resources
[ https://issues.apache.org/jira/browse/YARN-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983592#comment-14983592 ] Jian He commented on YARN-1510: --- lgtm, only one comment: - NMClientAsyncImpl # increaseContainerResourceAsync I think INCREASE_CONTAINER_RESOURCE event may happen at DONE and FAILED state, in which case InvalidEventTransitionException may be thrown. > Make NMClient support change container resources > > > Key: YARN-1510 > URL: https://issues.apache.org/jira/browse/YARN-1510 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Wangda Tan (No longer used) >Assignee: MENG DING > Attachments: YARN-1510-YARN-1197.1.patch, > YARN-1510-YARN-1197.2.patch, YARN-1510.3.patch, YARN-1510.4.patch, > YARN-1510.5.patch, YARN-1510.6.patch > > > As described in YARN-1197, YARN-1449, we need add API in NMClient to support > 1) sending request of increase/decrease container resource limits > 2) get succeeded/failed changed containers response from NM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
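For illustration only, the concern in the review comment above is that an increase-resource event can arrive after the container has already reached a terminal state. The toy state machine below shows one way to make that case an explicit no-op rather than an invalid transition. CState, CEvent and TinyContainerStateMachine are hypothetical names, not the NMClientAsyncImpl types, and whether ignoring the event is the right handling is an assumption of this sketch.

```java
// Hypothetical container states and events, for illustration only.
enum CState { PREP, RUNNING, DONE, FAILED }

enum CEvent { START, INCREASE_RESOURCE, STOP }

class TinyContainerStateMachine {
  private CState state = CState.PREP;

  CState current() {
    return state;
  }

  // Applies an event and returns the resulting state.
  // Undeclared transitions throw IllegalStateException.
  CState handle(CEvent event) {
    switch (state) {
      case PREP:
        if (event == CEvent.START) {
          state = CState.RUNNING;
          return state;
        }
        break;
      case RUNNING:
        if (event == CEvent.INCREASE_RESOURCE) {
          return state;            // resize while running: stay in RUNNING
        }
        if (event == CEvent.STOP) {
          state = CState.DONE;
          return state;
        }
        break;
      case DONE:
      case FAILED:
        if (event == CEvent.INCREASE_RESOURCE) {
          return state;            // late resize on a finished container: ignored
        }
        break;
    }
    throw new IllegalStateException("Invalid event " + event + " in state " + state);
  }
}
```

Declaring the terminal-state case explicitly keeps a late resize request from surfacing as a state machine error, while other unexpected events are still rejected.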
[jira] [Commented] (YARN-4032) Corrupted state from a previous version can still cause RM to fail with NPE due to same reasons as YARN-2834
[ https://issues.apache.org/jira/browse/YARN-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983627#comment-14983627 ] Vinod Kumar Vavilapalli commented on YARN-4032: --- [~adhoot] / [~jianhe] / [~kasha], any update on this? Considering this for a 2.7.2 RC this weekend. Unless I hear otherwise, I'll move it out to 2.7.3 assuming this needs more time. Thanks. > Corrupted state from a previous version can still cause RM to fail with NPE > due to same reasons as YARN-2834 > > > Key: YARN-4032 > URL: https://issues.apache.org/jira/browse/YARN-4032 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > Attachments: YARN-4032.prelim.patch > > > YARN-2834 ensures in 2.6.0 there will not be any inconsistent state. But if > someone is upgrading from a previous version, the state can still be > inconsistent and then RM will still fail with NPE after upgrade to 2.6.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails
[ https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982262#comment-14982262 ] Hudson commented on YARN-3580: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1340 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1340/]) Move YARN-3580 in CHANGES.txt from 2.8 to 2.7.2. (ozawa: rev d2e01f4ed87c3c41156ec9a68855f923f8c0adf9) * hadoop-yarn-project/CHANGES.txt > [JDK 8] TestClientRMService.testGetLabelsToNodes fails > -- > > Key: YARN-3580 > URL: https://issues.apache.org/jira/browse/YARN-3580 > Project: Hadoop YARN > Issue Type: Test > Components: test >Affects Versions: 2.8.0 > Environment: JDK 8 >Reporter: Robert Kanter >Assignee: Robert Kanter > Labels: jdk8 > Fix For: 2.8.0, 2.7.2 > > Attachments: YARN-3580.001.patch > > > When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails: > {noformat} > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails
[ https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982273#comment-14982273 ] Hudson commented on YARN-3580: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #617 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/617/]) Move YARN-3580 in CHANGES.txt from 2.8 to 2.7.2. (ozawa: rev d2e01f4ed87c3c41156ec9a68855f923f8c0adf9) * hadoop-yarn-project/CHANGES.txt > [JDK 8] TestClientRMService.testGetLabelsToNodes fails > -- > > Key: YARN-3580 > URL: https://issues.apache.org/jira/browse/YARN-3580 > Project: Hadoop YARN > Issue Type: Test > Components: test >Affects Versions: 2.8.0 > Environment: JDK 8 >Reporter: Robert Kanter >Assignee: Robert Kanter > Labels: jdk8 > Fix For: 2.8.0, 2.7.2 > > Attachments: YARN-3580.001.patch > > > When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails: > {noformat} > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4314) Adding container wait time as a metric at queue level and application level.
[ https://issues.apache.org/jira/browse/YARN-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982340#comment-14982340 ] Raju Bairishetti commented on YARN-4314: I feel that adding a timestamp to each resource request will be costly, and all the existing applications will need to migrate to use this metric. Had a discussion with [~sriksun] earlier about this approach. The resource request is prepared by the AM. In future, if we want to use this timestamp as a priority for allocating resources, there is a chance that the user/AM can misuse the system by claiming older timestamps. Thinking about this approach: AppSchedulingInfo has all the scheduling info about an application. When the RM receives the first resource request from the AM, the RM can note down the system time as the resource request time. Whenever a new request comes (i.e. UpdateResourceRequest() in AppSchedulingInfo or allocate()), we can measure how many containers have been waiting up to this time since the last request time. We can mostly listen on the container request & allocate events. I will put up a detailed doc with all my thoughts & approaches. > Adding container wait time as a metric at queue level and application level. > > > Key: YARN-4314 > URL: https://issues.apache.org/jira/browse/YARN-4314 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > > There is a need for adding the container wait-time which can be tracked at > the queue and application level. > An application can have two kinds of wait times. One is AM wait time after > submission and another is total container wait time between AM asking for > containers and getting them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
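For illustration only, one possible reading of the RM-side approach in the comment above is to stamp requests with the RM's own clock when they arrive and to accumulate the wait when containers are allocated, so the AM never supplies a timestamp. ContainerWaitTracker and its methods are hypothetical names, not part of AppSchedulingInfo, and the FIFO matching of allocations to requests is a simplifying assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical RM-side tracker: one pending timestamp per requested container.
class ContainerWaitTracker {
  private final Deque<Long> pendingRequestTimes = new ArrayDeque<>();
  private long totalWaitMs = 0;
  private long allocatedCount = 0;

  // Called when the RM receives a request for `count` containers.
  // rmNowMs comes from the RM clock, never from the AM.
  void onRequest(int count, long rmNowMs) {
    for (int i = 0; i < count; i++) {
      pendingRequestTimes.addLast(rmNowMs);
    }
  }

  // Called when the RM allocates `count` containers; the oldest pending
  // requests are assumed to be satisfied first (FIFO).
  void onAllocate(int count, long rmNowMs) {
    for (int i = 0; i < count && !pendingRequestTimes.isEmpty(); i++) {
      totalWaitMs += rmNowMs - pendingRequestTimes.removeFirst();
      allocatedCount++;
    }
  }

  long totalWaitMs() {
    return totalWaitMs;
  }

  // Average wait per allocated container, or 0 if nothing was allocated yet.
  double averageWaitMs() {
    return allocatedCount == 0 ? 0.0 : (double) totalWaitMs / allocatedCount;
  }
}
```

For example, two containers requested at t=1000 and allocated at t=1500 and t=4000 give a total wait of 3500 ms and an average of 1750 ms. The same counters could be aggregated per queue and per application.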
[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
[ https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982216#comment-14982216 ] Hudson commented on YARN-4320: -- FAILURE: Integrated in Hadoop-trunk-Commit #8729 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8729/]) YARN-4320. TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no (ozawa: rev ce31b22739512804da38cf87e0ce1059e3128da3) * hadoop-yarn-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java > TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to > default port 8188 > --- > > Key: YARN-4320 > URL: https://issues.apache.org/jira/browse/YARN-4320 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-4320.01.patch > > > {noformat} > Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.256 sec > <<< FAILURE! - in > org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler > testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler) > Time elapsed: 35.764 sec <<< ERROR! > java.lang.RuntimeException: Failed to connect to timeline server. Connection > retries limit exceeded. 
The posted timeline event may be missing > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586) > at > org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:719) > at > org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:507) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out
[ https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982217#comment-14982217 ] Hudson commented on YARN-4312: -- FAILURE: Integrated in Hadoop-trunk-Commit #8729 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8729/]) Add an entry of YARN-4312 to CHANGES.txt (ozawa: rev d21214ce33cb176926aa3ae5a9f4efe00f66480b) * hadoop-yarn-project/CHANGES.txt > TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of > the test cases time out > > > Key: YARN-4312 > URL: https://issues.apache.org/jira/browse/YARN-4312 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.1, 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > Fix For: 2.7.2 > > Attachments: YARN-4312-branch-2.6.01.patch, > YARN-4312-branch-2.7.01.patch > > > These timeouts happen because we do ZK sync operation on RM startup after > YARN-3798 which delays RM startup a bit making the timeouts of 5 s. too small > for a couple of tests in TestSubmitApplicationWithRMHA. > {noformat} > testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA) > Time elapsed: 5.162 sec <<< ERROR! 
> java.lang.Exception: test timed out after 5000 milliseconds > at sun.misc.Unsafe.park(Native Method) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303) > at > org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191) > at > 
org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234) > > testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA) > Time elapsed: 5.146 sec <<< ERROR! > java.lang.Exception: test timed out after 5000 milliseconds > at sun.misc.Unsafe.park(Native Method) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320) > at >
[jira] [Commented] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails
[ https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982294#comment-14982294 ] Hudson commented on YARN-3580: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #554 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/554/]) Move YARN-3580 in CHANGES.txt from 2.8 to 2.7.2. (ozawa: rev d2e01f4ed87c3c41156ec9a68855f923f8c0adf9) * hadoop-yarn-project/CHANGES.txt > [JDK 8] TestClientRMService.testGetLabelsToNodes fails > -- > > Key: YARN-3580 > URL: https://issues.apache.org/jira/browse/YARN-3580 > Project: Hadoop YARN > Issue Type: Test > Components: test >Affects Versions: 2.8.0 > Environment: JDK 8 >Reporter: Robert Kanter >Assignee: Robert Kanter > Labels: jdk8 > Fix For: 2.8.0, 2.7.2 > > Attachments: YARN-3580.001.patch > > > When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails: > {noformat} > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM
[ https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983732#comment-14983732 ] Hadoop QA commented on YARN-4132: -
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 7s {color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 15s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in trunk cannot run convertXmlToText from findbugs {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 41s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 8s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 24s {color} | {color:red} Patch generated 1 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 247, now 247). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 39s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 12s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 50s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s {color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 45s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 1s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s {color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK v1.7.0_79. {color} |
|
[jira] [Commented] (YARN-4032) Corrupted state from a previous version can still cause RM to fail with NPE due to same reasons as YARN-2834
[ https://issues.apache.org/jira/browse/YARN-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983682#comment-14983682 ] Jian He commented on YARN-4032: --- The problem in YARN-2834 is that if an app exists in the state-store with: - app state = final state - attempt state = null then the RM will fail with an NPE on recovery. One approach is to delete this inconsistent app from the state-store; has that been considered? Regarding the patch, it catches all exceptions in app.recover and returns FAILED. If the application previously ended as FINISHED, the app is changed to FAILED, which I think is inconsistent for the user. Also, this exception will happen again and again whenever the RM is restarted. I think what we can do is check whether the app is in a final state in RMAppAttemptImpl#AttemptRecoveredTransition, and skip adding the attempt to the scheduler if it is. > Corrupted state from a previous version can still cause RM to fail with NPE > due to same reasons as YARN-2834 > > > Key: YARN-4032 > URL: https://issues.apache.org/jira/browse/YARN-4032 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > Attachments: YARN-4032.prelim.patch > > > YARN-2834 ensures in 2.6.0 there will not be any inconsistent state. But if > someone is upgrading from a previous version, the state can still be > inconsistent and then RM will still fail with NPE after upgrade to 2.6.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
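The check suggested in the comment above can be sketched as follows. This is a standalone illustration, not the real RMAppAttemptImpl code: AttemptRecoverySketch, its AppState enum and the method name are all hypothetical, and the set of final states is an assumption made for the example. The idea is that the decision is made from the recovered application state alone, so a missing (null) attempt state is never dereferenced for an app that has already finished, and the app keeps its original final state.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of the recovery guard; not the actual YARN classes. */
class AttemptRecoverySketch {
  // Simplified stand-in for the application states kept in the state-store.
  enum AppState { NEW, RUNNING, FINISHED, FAILED, KILLED }

  // Assumption for this sketch: these are the states treated as final.
  private static final Set<AppState> FINAL_STATES =
      EnumSet.of(AppState.FINISHED, AppState.FAILED, AppState.KILLED);

  /**
   * Decides whether a recovered attempt should be handed to the scheduler.
   * Only the app state is inspected, so an absent attempt state cannot
   * trigger a NullPointerException here.
   */
  static boolean shouldAddAttemptToScheduler(AppState recoveredAppState) {
    return !FINAL_STATES.contains(recoveredAppState);
  }
}
```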
[jira] [Updated] (YARN-4323) AMRMClient does not respect SchedulerResourceTypes post YARN-2448
[ https://issues.apache.org/jira/browse/YARN-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-4323: -- Affects Version/s: 2.6.0 > AMRMClient does not respect SchedulerResourceTypes post YARN-2448 > - > > Key: YARN-4323 > URL: https://issues.apache.org/jira/browse/YARN-4323 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Hitesh Shah > > Given that the RM now informs the AM of the resources it supports, AMRMClient > should be changed to match correctly by normalizing the invalid resource > types. > i.e. AMRMClient::getMatchingRequests() should correctly return back matches > by only looking at the resource types that are valid. > \cc [~vvasudev] [~bikassaha] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2934: Attachment: YARN-2934.v1.001.patch > Improve handling of container's stderr > --- > > Key: YARN-2934 > URL: https://issues.apache.org/jira/browse/YARN-2934 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Gera Shegalov >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-2934.v1.001.patch > > > Most YARN applications redirect stderr to some file. That's why when > container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4163) Audit getQueueInfo and getApplications calls
[ https://issues.apache.org/jira/browse/YARN-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983771#comment-14983771 ] Hadoop QA commented on YARN-4163: -
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 12s {color} | {color:red} Patch generated 1 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 51, now 50). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 1s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 46s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 145m 54s {color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_79 Failed junit tests | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
| | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.0 Server=1.7.0 Image:test-patch-base-hadoop-date2015-10-31 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12769879/YARN-4163.2.patch |
| JIRA Issue | YARN-4163 |
| Optional Tests | asflicense javac
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983840#comment-14983840 ] Naganarasimha G R commented on YARN-2934: - [~jira.shegalov], apologies from my side, I missed updating this JIRA; thanks [~nijel] for pushing me on this! I have attached an initial version of the patch which suffices, but the following points are open for discussion: # The error file name is sent from the application/client, hence the name can be anything. Currently I have used a wildcard matcher on the file name; the pattern used is "*stderr*", and I have also made it configurable. If the admin wants to attach multiple types then regex matching would be required, but a regex pattern is not similar to a Unix pattern, hence for the sake of simplicity I have kept the wildcard. Thoughts? # Or, the other way round, maybe I can, as you were mentioning, have ApplicationSubmissionContext expose an interface for clients to specify the error file name, and on error use this name directly; if it is not present, then fall back to the pattern matching. But my concern is: what if the error file name given through the interface does not match the execution command sent? # Also, as [~bikassaha] mentioned in his [comment| https://issues.apache.org/jira/browse/YARN-3911?focusedCommentId=14959579=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14959579], do I need to additionally check syslog as well? If so, only if the syserr doesn't exist? And similarly, what approaches would be ideal for identifying the name of the log4j file? # I have hard-coded the tail size to 4k; does it need to be configurable? > Improve handling of container's stderr > --- > > Key: YARN-2934 > URL: https://issues.apache.org/jira/browse/YARN-2934 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Gera Shegalov >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-2934.v1.001.patch > > > Most YARN applications redirect stderr to some file.
That's why when > container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
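As a rough illustration of the wildcard match and the 4k tail discussed in the comment above, the sketch below uses the JDK's glob matcher. It is not the code from YARN-2934.v1.001.patch: StderrTailSketch and its methods are hypothetical names, and using java.nio's glob syntax in place of the patch's own wildcard matcher is an assumption made for this example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

/** Hypothetical sketch: pick error logs by wildcard and keep only their tail. */
class StderrTailSketch {
  static final String DEFAULT_PATTERN = "*stderr*"; // pattern quoted in the comment
  static final int DEFAULT_TAIL_BYTES = 4096;       // the "4k" tail size

  /** Returns true if the plain file name matches the Unix-style wildcard. */
  static boolean matches(String fileName, String wildcard) {
    PathMatcher matcher =
        FileSystems.getDefault().getPathMatcher("glob:" + wildcard);
    return matcher.matches(Paths.get(fileName));
  }

  /** Returns at most maxBytes from the end of data, decoded as UTF-8. */
  static String tailOfBytes(byte[] data, int maxBytes) {
    int start = Math.max(0, data.length - maxBytes);
    // Note: a cut at a byte boundary may split a multi-byte character.
    return new String(data, start, data.length - start, StandardCharsets.UTF_8);
  }

  /** Reads only the last maxBytes of the file instead of loading all of it. */
  static String tail(Path file, int maxBytes) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
      long length = raf.length();
      long start = Math.max(0L, length - maxBytes);
      byte[] buffer = new byte[(int) (length - start)];
      raf.seek(start);
      raf.readFully(buffer);
      return tailOfBytes(buffer, maxBytes);
    }
  }
}
```

With the default pattern, names such as stderr or task.stderr.log would be picked up while stdout would not. Making the pattern and the tail size configuration values, as the comment proposes, would only change where the two constants come from.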
[jira] [Commented] (YARN-3911) Add tail of stderr to diagnostics if container fails to launch or it container logs are empty
[ https://issues.apache.org/jira/browse/YARN-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983841#comment-14983841 ] Naganarasimha G R commented on YARN-3911: - Hi [~bikassaha], I have attached an initial patch in YARN-2934. I hope to get feedback on it there, and I think I can close this JIRA as a duplicate. Thoughts? > Add tail of stderr to diagnostics if container fails to launch or it > container logs are empty > - > > Key: YARN-3911 > URL: https://issues.apache.org/jira/browse/YARN-3911 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha > > The stderr may have useful info in those cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4323) AMRMClient does not respect SchedulerResourceTypes post YARN-2448
Hitesh Shah created YARN-4323: - Summary: AMRMClient does not respect SchedulerResourceTypes post YARN-2448 Key: YARN-4323 URL: https://issues.apache.org/jira/browse/YARN-4323 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Given that the RM now informs the AM of the resources it supports, AMRMClient should be changed to match correctly by normalizing the invalid resource types. i.e. AMRMClient::getMatchingRequests() should correctly return back matches by only looking at the resource types that are valid. \cc [~vvasudev] [~bikassaha] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
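One possible reading of "normalizing the invalid resource types" is sketched below. Everything in it is hypothetical: ResourceMatchSketch, its ResourceType enum and Capability class are simplified stand-ins rather than the real AMRMClient or SchedulerResourceTypes classes, and treating a match as "the request fits in the available capability" is an assumption of this sketch. Resource types that the scheduler does not support are zeroed on both sides before comparing, so they cannot cause a mismatch.

```java
import java.util.Set;

/** Hypothetical sketch: compare capabilities on valid resource types only. */
class ResourceMatchSketch {
  // Simplified stand-in for the resource types a scheduler may report.
  enum ResourceType { MEMORY, CPU }

  /** Minimal immutable capability: memory in MB and virtual cores. */
  static final class Capability {
    final int memoryMb;
    final int vcores;

    Capability(int memoryMb, int vcores) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }
  }

  /** Zeroes every resource type that the scheduler does not support. */
  static Capability normalize(Capability c, Set<ResourceType> valid) {
    int memoryMb = valid.contains(ResourceType.MEMORY) ? c.memoryMb : 0;
    int vcores = valid.contains(ResourceType.CPU) ? c.vcores : 0;
    return new Capability(memoryMb, vcores);
  }

  /** True if the request fits in what is available, ignoring invalid types. */
  static boolean fitsIn(Capability requested, Capability available,
      Set<ResourceType> valid) {
    Capability r = normalize(requested, valid);
    Capability a = normalize(available, valid);
    return r.memoryMb <= a.memoryMb && r.vcores <= a.vcores;
  }
}
```

For example, a request for 1024 MB and 4 vcores fits a 2048 MB, 1 vcore capability when only MEMORY is valid, but not when CPU is valid as well.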
[jira] [Commented] (YARN-3911) Add tail of stderr to diagnostics if container fails to launch or it container logs are empty
[ https://issues.apache.org/jira/browse/YARN-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983845#comment-14983845 ] Bikas Saha commented on YARN-3911: -- Sure > Add tail of stderr to diagnostics if container fails to launch or it > container logs are empty > - > > Key: YARN-3911 > URL: https://issues.apache.org/jira/browse/YARN-3911 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha > > The stderr may have useful info in those cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4322) local framwork + io.ReadaheadPool is throwing Failed readahead on ifile EBADF: Bad file descriptor
Mohammad Shahid Khan created YARN-4322: -- Summary: local framwork + io.ReadaheadPool is throwing Failed readahead on ifile EBADF: Bad file descriptor Key: YARN-4322 URL: https://issues.apache.org/jira/browse/YARN-4322 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Environment: Linux 2.6.32.12-0.7-default x86_64 Reporter: Mohammad Shahid Khan When running the pi job with 100 maps and 2 reduces, io.ReadaheadPool logs "Failed readahead on ifile EBADF: Bad file descriptor". Stack trace: {code} 15/10/30 16:47:23 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 24, inMemoryMapOutputs.size() -> 89, commitMemory -> 2112, usedMemory ->2136 15/10/30 16:47:23 WARN io.ReadaheadPool: Failed readahead on ifile EBADF: Bad file descriptor at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method) at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267) at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146) at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 15/10/30 16:47:23 WARN io.ReadaheadPool: Failed readahead on ifile EBADF: Bad file descriptor at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method) at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267) at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146) at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 15/10/30
16:47:23 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local108678459_0001_m_86_0 decomp: 24 len: 28 to MEMORY {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out
[ https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982518#comment-14982518 ]

Hudson commented on YARN-4312:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #618 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/618/])
Add an entry of YARN-4312 to CHANGES.txt (ozawa: rev d21214ce33cb176926aa3ae5a9f4efe00f66480b)
* hadoop-yarn-project/CHANGES.txt

> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out
>
> Key: YARN-4312
> URL: https://issues.apache.org/jira/browse/YARN-4312
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.6.1, 2.7.1
> Reporter: Varun Saxena
> Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-4312-branch-2.6.01.patch, YARN-4312-branch-2.7.01.patch
>
> These timeouts happen because we do ZK sync operation on RM startup after YARN-3798 which delays RM startup a bit making the timeouts of 5 s. too small for a couple of tests in TestSubmitApplicationWithRMHA.
> {noformat}
> testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA) Time elapsed: 5.162 sec <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
>     at org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)
>
> testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA) Time elapsed: 5.146 sec <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>     at
[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
[ https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982517#comment-14982517 ]

Hudson commented on YARN-4320:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #618 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/618/])
YARN-4320. TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no (ozawa: rev ce31b22739512804da38cf87e0ce1059e3128da3)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* hadoop-yarn-project/CHANGES.txt

> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
> -----------------------------------------------------------------------------------------------
>
> Key: YARN-4320
> URL: https://issues.apache.org/jira/browse/YARN-4320
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.1
> Reporter: Varun Saxena
> Assignee: Varun Saxena
> Fix For: 3.0.0, 2.8.0, 2.7.2
>
> Attachments: YARN-4320.01.patch
>
> {noformat}
> Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.256 sec <<< FAILURE! - in org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler) Time elapsed: 35.764 sec <<< ERROR!
> java.lang.RuntimeException: Failed to connect to timeline server. Connection retries limit exceeded. The posted timeline event may be missing
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
>     at com.sun.jersey.api.client.Client.handle(Client.java:648)
>     at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>     at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>     at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
>     at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015)
>     at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586)
>     at org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:719)
>     at org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:507)
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
[ https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982540#comment-14982540 ]

Hudson commented on YARN-4320:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #2492 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2492/])
YARN-4320. TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no (ozawa: rev ce31b22739512804da38cf87e0ce1059e3128da3)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
* hadoop-yarn-project/CHANGES.txt

> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
[ https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982486#comment-14982486 ]

Hudson commented on YARN-4320:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #1341 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1341/])
YARN-4320. TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no (ozawa: rev ce31b22739512804da38cf87e0ce1059e3128da3)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
* hadoop-yarn-project/CHANGES.txt

> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
[ https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982594#comment-14982594 ]

Varun Saxena commented on YARN-4320:
------------------------------------

Thanks [~ozawa] for the review and commit.

> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
[ https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982478#comment-14982478 ]

Hudson commented on YARN-4320:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #606 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/606/])
YARN-4320. TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no (ozawa: rev ce31b22739512804da38cf87e0ce1059e3128da3)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java

> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out
[ https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982479#comment-14982479 ]

Hudson commented on YARN-4312:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #606 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/606/])
Add an entry of YARN-4312 to CHANGES.txt (ozawa: rev d21214ce33cb176926aa3ae5a9f4efe00f66480b)
* hadoop-yarn-project/CHANGES.txt

> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out
[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out
[ https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982511#comment-14982511 ]

Hudson commented on YARN-4312:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2548 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2548/])
Add an entry of YARN-4312 to CHANGES.txt (ozawa: rev d21214ce33cb176926aa3ae5a9f4efe00f66480b)
* hadoop-yarn-project/CHANGES.txt

> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out
[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
[ https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982510#comment-14982510 ]

Hudson commented on YARN-4320:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2548 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2548/])
YARN-4320. TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no (ozawa: rev ce31b22739512804da38cf87e0ce1059e3128da3)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
* hadoop-yarn-project/CHANGES.txt

> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
[jira] [Commented] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails
[ https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982541#comment-14982541 ]

Hudson commented on YARN-3580:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #2492 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2492/])
Move YARN-3580 in CHANGES.txt from 2.8 to 2.7.2. (ozawa: rev d2e01f4ed87c3c41156ec9a68855f923f8c0adf9)
* hadoop-yarn-project/CHANGES.txt

> [JDK 8] TestClientRMService.testGetLabelsToNodes fails
> ------------------------------------------------------
>
>                 Key: YARN-3580
>                 URL: https://issues.apache.org/jira/browse/YARN-3580
>             Project: Hadoop YARN
>          Issue Type: Test
>          Components: test
>    Affects Versions: 2.8.0
>         Environment: JDK 8
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>              Labels: jdk8
>             Fix For: 2.8.0, 2.7.2
>
>         Attachments: YARN-3580.001.patch
>
>
> When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails:
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out
[ https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982543#comment-14982543 ] Hudson commented on YARN-4312: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2492 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2492/]) Add an entry of YARN-4312 to CHANGES.txt (ozawa: rev d21214ce33cb176926aa3ae5a9f4efe00f66480b) * hadoop-yarn-project/CHANGES.txt > TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of > the test cases time out > > > Key: YARN-4312 > URL: https://issues.apache.org/jira/browse/YARN-4312 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.1, 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > Fix For: 2.7.2 > > Attachments: YARN-4312-branch-2.6.01.patch, > YARN-4312-branch-2.7.01.patch > > > These timeouts happen because we do ZK sync operation on RM startup after > YARN-3798 which delays RM startup a bit making the timeouts of 5 s. too small > for a couple of tests in TestSubmitApplicationWithRMHA. > {noformat} > testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA) > Time elapsed: 5.162 sec <<< ERROR! 
> java.lang.Exception: test timed out after 5000 milliseconds > at sun.misc.Unsafe.park(Native Method) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303) > at > org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191) > at > 
org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234) > > testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA) > Time elapsed: 5.146 sec <<< ERROR! > java.lang.Exception: test timed out after 5000 milliseconds > at sun.misc.Unsafe.park(Native Method) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320) > at >
[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out
[ https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982487#comment-14982487 ] Hudson commented on YARN-4312: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1341 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1341/]) Add an entry of YARN-4312 to CHANGES.txt (ozawa: rev d21214ce33cb176926aa3ae5a9f4efe00f66480b) * hadoop-yarn-project/CHANGES.txt > TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of > the test cases time out > > > Key: YARN-4312 > URL: https://issues.apache.org/jira/browse/YARN-4312 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.1, 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > Fix For: 2.7.2 > > Attachments: YARN-4312-branch-2.6.01.patch, > YARN-4312-branch-2.7.01.patch > > > These timeouts happen because we do ZK sync operation on RM startup after > YARN-3798 which delays RM startup a bit making the timeouts of 5 s. too small > for a couple of tests in TestSubmitApplicationWithRMHA. > {noformat} > testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA) > Time elapsed: 5.162 sec <<< ERROR! 
> java.lang.Exception: test timed out after 5000 milliseconds > at sun.misc.Unsafe.park(Native Method) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303) > at > org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191) > at > 
org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234) > > testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA) > Time elapsed: 5.146 sec <<< ERROR! > java.lang.Exception: test timed out after 5000 milliseconds > at sun.misc.Unsafe.park(Native Method) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320) > at >
[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out
[ https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982595#comment-14982595 ] Varun Saxena commented on YARN-4312: Thanks [~ozawa] for the review and commit. > TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of > the test cases time out > > > Key: YARN-4312 > URL: https://issues.apache.org/jira/browse/YARN-4312 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.1, 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > Fix For: 2.7.2 > > Attachments: YARN-4312-branch-2.6.01.patch, > YARN-4312-branch-2.7.01.patch > > > These timeouts happen because we do ZK sync operation on RM startup after > YARN-3798 which delays RM startup a bit making the timeouts of 5 s. too small > for a couple of tests in TestSubmitApplicationWithRMHA. > {noformat} > testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA) > Time elapsed: 5.162 sec <<< ERROR! 
> java.lang.Exception: test timed out after 5000 milliseconds > at sun.misc.Unsafe.park(Native Method) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303) > at > org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191) > at > 
org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234) > > testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA) > Time elapsed: 5.146 sec <<< ERROR! > java.lang.Exception: test timed out after 5000 milliseconds > at sun.misc.Unsafe.park(Native Method) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at >
[jira] [Commented] (YARN-4321) Incessant retries if NoAuthException is thrown by Zookeeper in non HA mode
[ https://issues.apache.org/jira/browse/YARN-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983556#comment-14983556 ]

Hadoop QA commented on YARN-4321:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s {color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 8m 2s {color} | {color:red} root in branch-2.7 failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} branch-2.7 passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s {color} | {color:green} branch-2.7 passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s {color} | {color:green} branch-2.7 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 16s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager in branch-2.7 cannot run convertXmlToText from findbugs {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} branch-2.7 passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} branch-2.7 passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 886 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 23s {color} | {color:red} The patch has 95 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 46s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 52m 16s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_79. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 40m 41s {color} | {color:red} Patch generated 67 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 160m 39s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| | hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
| | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_79 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| | hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
| | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1
[jira] [Commented] (YARN-4314) Adding container wait time as a metric at queue level and application level.
[ https://issues.apache.org/jira/browse/YARN-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982718#comment-14982718 ]

Allen Wittenauer commented on YARN-4314:
----------------------------------------

bq. I feel adding timestamp to each resource request will be costly and all the existing applications will need to migrate to use this metric.

The fact that the RM doesn't keep track of timestamps on containers at all is particularly annoying when one wants to implement certain types of scheduling policies (e.g., kill containers that are X old). I think it's inevitable they are going to be required if not now, later.

> Adding container wait time as a metric at queue level and application level.
> ----------------------------------------------------------------------------
>
>                 Key: YARN-4314
>                 URL: https://issues.apache.org/jira/browse/YARN-4314
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Lavkesh Lahngir
>            Assignee: Lavkesh Lahngir
>
> There is a need for adding the container wait-time which can be tracked at
> the queue and application level.
> An application can have two kinds of wait times. One is AM wait time after
> submission and another is total container wait time between AM asking for
> containers and getting them.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out
[ https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982721#comment-14982721 ] Hudson commented on YARN-4312: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #555 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/555/]) Add an entry of YARN-4312 to CHANGES.txt (ozawa: rev d21214ce33cb176926aa3ae5a9f4efe00f66480b) * hadoop-yarn-project/CHANGES.txt > TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of > the test cases time out > > > Key: YARN-4312 > URL: https://issues.apache.org/jira/browse/YARN-4312 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.1, 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > Fix For: 2.7.2 > > Attachments: YARN-4312-branch-2.6.01.patch, > YARN-4312-branch-2.7.01.patch > > > These timeouts happen because we do ZK sync operation on RM startup after > YARN-3798 which delays RM startup a bit making the timeouts of 5 s. too small > for a couple of tests in TestSubmitApplicationWithRMHA. > {noformat} > testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA) > Time elapsed: 5.162 sec <<< ERROR! 
> java.lang.Exception: test timed out after 5000 milliseconds > at sun.misc.Unsafe.park(Native Method) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303) > at > org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191) > at > 
org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234) > > testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA) > Time elapsed: 5.146 sec <<< ERROR! > java.lang.Exception: test timed out after 5000 milliseconds > at sun.misc.Unsafe.park(Native Method) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320) > at >
[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
[ https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982720#comment-14982720 ]

Hudson commented on YARN-4320:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #555 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/555/])
YARN-4320. TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no (ozawa: rev ce31b22739512804da38cf87e0ce1059e3128da3)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
* hadoop-yarn-project/CHANGES.txt

> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188
> -----------------------------------------------------------------------------------------------
>
>                 Key: YARN-4320
>                 URL: https://issues.apache.org/jira/browse/YARN-4320
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>             Fix For: 3.0.0, 2.8.0, 2.7.2
>
>         Attachments: YARN-4320.01.patch
>
>
> {noformat}
> Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.256 sec <<< FAILURE! - in org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler) Time elapsed: 35.764 sec <<< ERROR!
> java.lang.RuntimeException: Failed to connect to timeline server. Connection retries limit exceeded. The posted timeline event may be missing
>   at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
>   at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
>   at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
>   at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
>   at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
>   at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015)
>   at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586)
>   at org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:719)
>   at org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:507)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-4132) Nodemanagers should try harder to connect to the RM
[ https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chang Li updated YARN-4132:
---------------------------
    Attachment: YARN-4132.4.patch

Updated the .4 patch to fix checkstyle issues and the broken unit test.

> Nodemanagers should try harder to connect to the RM
> ---------------------------------------------------
>
>                 Key: YARN-4132
>                 URL: https://issues.apache.org/jira/browse/YARN-4132
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Chang Li
>            Assignee: Chang Li
>         Attachments: YARN-4132.2.patch, YARN-4132.3.patch, YARN-4132.4.patch, YARN-4132.patch
>
>
> Being part of the cluster, nodemanagers should try very hard (and possibly
> never give up) to connect to a resourcemanager. At a minimum, we should have
> a separate config that sets how aggressively a nodemanager connects to the
> RM, independent of what clients do.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
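For illustration only, here is a minimal sketch of the idea in the YARN-4132 description: the same connect routine driven by two separate retry settings, one bounded (for clients) and one that never gives up (for nodemanagers). This is not the attached patch. The class `ConnectRetrySketch`, the `RetrySettings` fields, and the convention that a negative `maxAttempts` means "retry forever" are all assumptions made for the example; none of them are YARN APIs or configuration keys.

```java
import java.util.concurrent.Callable;

/** Sketch only: none of these names come from the YARN code base. */
class ConnectRetrySketch {

    /** Hypothetical retry settings; maxAttempts < 0 means "never give up". */
    static final class RetrySettings {
        final int maxAttempts;
        final long initialDelayMs;
        final long maxDelayMs;

        RetrySettings(int maxAttempts, long initialDelayMs, long maxDelayMs) {
            this.maxAttempts = maxAttempts;
            this.initialDelayMs = initialDelayMs;
            this.maxDelayMs = maxDelayMs;
        }
    }

    /** Calls connect until it succeeds or the attempt budget is used up. */
    static <T> T connectWithRetry(Callable<T> connect, RetrySettings settings)
            throws Exception {
        long delayMs = settings.initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return connect.call();
            } catch (Exception e) {
                boolean bounded = settings.maxAttempts >= 0;
                if (bounded && attempt >= settings.maxAttempts) {
                    throw e; // budget exhausted: surface the last failure
                }
                Thread.sleep(delayMs);
                // Exponential backoff, capped so the wait does not grow unbounded.
                delayMs = Math.min(delayMs * 2, settings.maxDelayMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        RetrySettings clientSettings = new RetrySettings(3, 1, 4);
        RetrySettings nodeManagerSettings = new RetrySettings(-1, 1, 4);

        // Simulated RM that only becomes reachable on the fifth attempt.
        int[] calls = {0};
        Callable<String> flakyRm = () -> {
            if (++calls[0] < 5) {
                throw new java.io.IOException("RM not reachable");
            }
            return "connected";
        };

        try {
            connectWithRetry(flakyRm, clientSettings);
        } catch (java.io.IOException e) {
            System.out.println("client gave up after " + calls[0] + " attempts");
        }

        calls[0] = 0;
        System.out.println("nodemanager: "
                + connectWithRetry(flakyRm, nodeManagerSettings));
    }
}
```

Keeping the policy in a small settings object is one way to let the nodemanager and client paths share the retry loop while being tuned separately; the real patch may well structure this differently.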
[jira] [Updated] (YARN-4163) Audit getQueueInfo and getApplications calls
[ https://issues.apache.org/jira/browse/YARN-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chang Li updated YARN-4163:
---------------------------
    Attachment: YARN-4163.2.patch

> Audit getQueueInfo and getApplications calls
> --------------------------------------------
>
>                 Key: YARN-4163
>                 URL: https://issues.apache.org/jira/browse/YARN-4163
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Chang Li
>            Assignee: Chang Li
>         Attachments: YARN-4163.2.patch, YARN-4163.2.patch, YARN-4163.patch
>
>
> getQueueInfo and getApplications sometimes seem to cause load spikes, but
> this cannot be confirmed because the calls are not audit logged. This patch
> proposes adding them to the audit log.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-4314) Adding container wait time as a metric at queue level and application level.
[ https://issues.apache.org/jira/browse/YARN-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982221#comment-14982221 ]

Lavkesh Lahngir commented on YARN-4314:
---------------------------------------

Basically, for an application we can track the time between a container request and its allocation, and add it to the queue-level and application-level metrics.
We also need to track the AM container wait time separately. I will update the details once I have figured out the exact code path.
[~raju.bairishetti]: Would you like to comment?

> Adding container wait time as a metric at queue level and application level.
> ----------------------------------------------------------------------------
>
>                 Key: YARN-4314
>                 URL: https://issues.apache.org/jira/browse/YARN-4314
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Lavkesh Lahngir
>            Assignee: Lavkesh Lahngir
>
> There is a need for adding the container wait-time which can be tracked at
> the queue and application level.
> An application can have two kinds of wait times. One is AM wait time after
> submission and another is total container wait time between AM asking for
> containers and getting them.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
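As a rough illustration of the bookkeeping described in the comment above, the sketch below records a timestamp when a container is requested and, on allocation, adds the elapsed time to per-queue and per-application totals, with AM container wait time kept in a separate total. It is not a proposed patch: `ContainerWaitTimeSketch`, its methods, and the use of a numeric request id are hypothetical, and the real scheduler code path is still to be worked out, as the comment says. Timestamps are passed in explicitly so the example is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: class and method names are hypothetical, not YARN APIs. */
class ContainerWaitTimeSketch {
    // Request id -> time (ms) at which the container was asked for.
    private final Map<Long, Long> requestTimeMs = new HashMap<>();
    // Accumulated wait time in ms, keyed by queue name or application id.
    private final Map<String, Long> queueTotals = new HashMap<>();
    private final Map<String, Long> appTotals = new HashMap<>();
    private final Map<String, Long> amTotals = new HashMap<>();

    /** Remembers when a container request was made. */
    synchronized void containerRequested(long requestId, long nowMs) {
        requestTimeMs.put(requestId, nowMs);
    }

    /** Adds (allocation time - request time) to the matching totals. */
    synchronized void containerAllocated(long requestId, String queue,
            String appId, boolean amContainer, long nowMs) {
        Long requestedAt = requestTimeMs.remove(requestId);
        if (requestedAt == null) {
            return; // unknown or already-allocated request: nothing to record
        }
        long waitedMs = nowMs - requestedAt;
        if (amContainer) {
            // AM wait time is tracked separately from ordinary containers.
            amTotals.merge(appId, waitedMs, Long::sum);
        } else {
            queueTotals.merge(queue, waitedMs, Long::sum);
            appTotals.merge(appId, waitedMs, Long::sum);
        }
    }

    synchronized long totalQueueWaitMs(String queue) {
        return queueTotals.getOrDefault(queue, 0L);
    }

    synchronized long totalAppWaitMs(String appId) {
        return appTotals.getOrDefault(appId, 0L);
    }

    synchronized long totalAmWaitMs(String appId) {
        return amTotals.getOrDefault(appId, 0L);
    }

    public static void main(String[] args) {
        ContainerWaitTimeSketch metrics = new ContainerWaitTimeSketch();
        metrics.containerRequested(1L, 1_000L);
        metrics.containerRequested(2L, 1_500L);
        metrics.containerAllocated(1L, "default", "app_1", true, 1_200L);
        metrics.containerAllocated(2L, "default", "app_1", false, 2_500L);
        System.out.println("AM wait (ms): " + metrics.totalAmWaitMs("app_1"));
        System.out.println("queue wait (ms): " + metrics.totalQueueWaitMs("default"));
    }
}
```

The map of pending requests is the per-request timestamp whose cost is debated elsewhere in this thread; the sketch keeps it only until allocation, which bounds its size by the number of outstanding requests.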
[jira] [Updated] (YARN-4321) Incessant retries if NoAuthException is thrown by Zookeeper in non HA mode
[ https://issues.apache.org/jira/browse/YARN-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-4321:
---
    Attachment: YARN-4321-branch-2.7.01.patch

> Incessant retries if NoAuthException is thrown by Zookeeper in non HA mode
> ---
>
>                 Key: YARN-4321
>                 URL: https://issues.apache.org/jira/browse/YARN-4321
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>         Attachments: YARN-4321-branch-2.7.01.patch
>
>
> This applies only to branch-2.7 and earlier code.
> When a {{NoAuthException}} is thrown in non-HA mode (as in the scenario of
> YARN-4127), the RM keeps retrying the ZK operation indefinitely.
> {noformat}
> 2015-10-23 09:22:10,209 DEBUG [SyncThread:0] server.DataTree (DataTree.java:processTxn(949)) - Ignoring processTxn failure hdr: -1 : error: -102
> 2015-10-23 09:22:10,210 DEBUG [main-SendThread(127.0.0.1:11221)] zookeeper.ClientCnxn (ClientCnxn.java:readResponse(818)) - Reading reply sessionid:0x15092d1ebe10001, packet:: clientPath:null serverPath:null finished:false header:: 7591,1 replyHeader:: 7591,7610,-102 request:: '/rmstore/ZKRMStateRoot/RMAppRoot,,v{s{31,s{'world,'anyone}}},0 response::
> 2015-10-23 09:22:10,210 INFO [ProcessThread(sid:0 cport:-1):] server.PrepRequestProcessor (PrepRequestProcessor.java:pRequest(645)) - Got user-level KeeperException when processing sessionid:0x15092d1ebe10001 type:create cxid:0x1da8 zxid:0x1dbb txntype:-1 reqpath:n/a Error Path:null Error:KeeperErrorCode = NoAuth
> {noformat}
> This is because we do not handle NoAuthException properly in branch-2.7 code
> when HA is not enabled.
> In {{ZKRMStateStore#runWithRetries}}, we have the code below. As can be seen,
> if HA is not enabled, we neither rethrow NoAuthException nor have any logic
> to increment retries and back out once retries are maxed out.
> {code}
> T runWithRetries() throws Exception {
>   int retry = 0;
>   while (true) {
>     try {
>       return runWithCheck();
>     } catch (KeeperException.NoAuthException nae) {
>       if (HAUtil.isHAEnabled(getConfig())) {
>         // NoAuthException possibly means that this store is fenced due to
>         // another RM becoming active. Even if not,
>         // it is safer to assume we have been fenced
>         throw new StoreFencedException();
>       }
>     } catch (KeeperException ke) {
>       .
>     }
>   }
> }
> {code}
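
The two gaps named in the description (no rethrow of the auth failure, no cap on retries) can be illustrated with a small standalone sketch. It is not the attached patch and does not use the real ZooKeeper or YARN classes: `BoundedRetrySketch`, `StoreException`, `NoAuthStoreException`, and the `maxRetries`/`retryIntervalMs` parameters are hypothetical stand-ins chosen so the example compiles on its own.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch; the exception types stand in for ZooKeeper's. */
class BoundedRetrySketch {
    /** Stand-in for a generic, possibly transient, store failure. */
    static class StoreException extends Exception {
        StoreException(String msg) { super(msg); }
    }

    /** Stand-in for a permission failure that retrying cannot fix. */
    static class NoAuthStoreException extends StoreException {
        NoAuthStoreException(String msg) { super(msg); }
    }

    static <T> T runWithRetries(Callable<T> op, int maxRetries, long retryIntervalMs)
            throws Exception {
        int retry = 0;
        while (true) {
            try {
                return op.call();
            } catch (NoAuthStoreException nae) {
                // Fail fast: the ACLs will not change between attempts,
                // so looping here would never terminate.
                throw nae;
            } catch (StoreException se) {
                // Other failures are retried, but only up to maxRetries times.
                if (++retry > maxRetries) {
                    throw se;
                }
                Thread.sleep(retryIntervalMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        String result = runWithRetries(() -> {
            if (++attempts[0] < 3) {
                throw new StoreException("transient failure");
            }
            return "ok";
        }, 5, 1L);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

In this sketch the auth failure propagates regardless of whether HA is enabled; the HA-specific fencing branch in the quoted code is left out to keep the example short.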
[jira] [Commented] (YARN-4321) Incessant retries if NoAuthException is thrown by Zookeeper in non HA mode
[ https://issues.apache.org/jira/browse/YARN-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983029#comment-14983029 ]

Varun Saxena commented on YARN-4321:
---

Straightforward fix. I think we don't need to retry for NoAuthException as the exception is unlikely to change even after retries.
[jira] [Commented] (YARN-4127) RM fail with noAuth error if switched from failover mode to non-failover mode
[ https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983030#comment-14983030 ]

Varun Saxena commented on YARN-4127:
---

[~jianhe], I have raised YARN-4321 for this issue.

> RM fail with noAuth error if switched from failover mode to non-failover mode
> ---
>
>                 Key: YARN-4127
>                 URL: https://issues.apache.org/jira/browse/YARN-4127
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Jian He
>            Assignee: Varun Saxena
>         Attachments: YARN-4127-branch-2.7.01.patch, YARN-4127.01.patch,
> YARN-4127.02.patch
>
>
> The scenario is that RM failover was initially enabled, so the zkRootNodeAcl
> is by default set with the *RM ID* in the ACL string.
> If RM failover is then disabled, the RM cannot load data from ZK and fails
> with a noAuth error. After I reset the root node ACL, it can access ZK again.
> {code}
> 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
>   at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
>   at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
> {code}
> The problem may be that in non-failover mode, the RM doesn't use the *RM-ID*
> to connect to ZK and thus fails with a NoAuth error.
> We should be able to switch failover on and off with no interruption to the
> user.