[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out

2015-10-30 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981997#comment-14981997
 ] 

Tsuyoshi Ozawa commented on YARN-4312:
--

+1, checking this in. 

* A failure of TestResourceTrackerService is reported as YARN-3580. Confirmed 
that it passes with the patch. I'll backport it.
* A failure of TestClientRMTokens is reported as YARN-4306.
* A failure of TestAMAuthorization looks to be not related to this issue since 
the reason of the failure is UnknownHostException.

> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of 
> the test cases time out 
> 
>
> Key: YARN-4312
> URL: https://issues.apache.org/jira/browse/YARN-4312
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1, 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4312-branch-2.6.01.patch, 
> YARN-4312-branch-2.7.01.patch
>
>
> These timeouts happen because we do ZK sync operation on RM startup after 
> YARN-3798 which delays RM startup a bit making the timeouts of 5 s. too small 
> for a couple of tests in TestSubmitApplicationWithRMHA.
> {noformat}
> testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.162 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)
> 
> testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.146 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> 

[jira] [Commented] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails

2015-10-30 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981999#comment-14981999
 ] 

Tsuyoshi Ozawa commented on YARN-3580:
--

This problem is reproduced on JDK v1.7.0_79 on YARN-4312: 
https://issues.apache.org/jira/browse/YARN-4312?focusedCommentId=14979968=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14979968

Backporting this to 2.7.2.

> [JDK 8] TestClientRMService.testGetLabelsToNodes fails
> --
>
> Key: YARN-3580
> URL: https://issues.apache.org/jira/browse/YARN-3580
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.8.0
> Environment: JDK 8
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>  Labels: jdk8
> Fix For: 2.8.0
>
> Attachments: YARN-3580.001.patch
>
>
> When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails:
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out

2015-10-30 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982032#comment-14982032
 ] 

Tsuyoshi Ozawa commented on YARN-4312:
--

2.6.2 is releasing. For now, committing this to branch-2.7 and backport this to 
branch-2.6 after releasing 2.6.3.


> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of 
> the test cases time out 
> 
>
> Key: YARN-4312
> URL: https://issues.apache.org/jira/browse/YARN-4312
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1, 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4312-branch-2.6.01.patch, 
> YARN-4312-branch-2.7.01.patch
>
>
> These timeouts happen because we do ZK sync operation on RM startup after 
> YARN-3798 which delays RM startup a bit making the timeouts of 5 s. too small 
> for a couple of tests in TestSubmitApplicationWithRMHA.
> {noformat}
> testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.162 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)
> 
> testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.146 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> 

[jira] [Created] (YARN-4321) Incessant retries if NoAuthException is thrown by Zookeeper in non HA mode

2015-10-30 Thread Varun Saxena (JIRA)
Varun Saxena created YARN-4321:
--

 Summary: Incessant retries if NoAuthException is thrown by 
Zookeeper in non HA mode
 Key: YARN-4321
 URL: https://issues.apache.org/jira/browse/YARN-4321
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Varun Saxena
Assignee: Varun Saxena


This applies to only branch-2.7 or earlier code.
When a {{NoAuthException}} is thrown in non HA mode(like in the scenario of 
YARN-4127), RM incessantly keeps on retrying the ZK operation.

{noformat}
2015-10-23 09:22:10,209 DEBUG [SyncThread:0] server.DataTree 
(DataTree.java:processTxn(949)) - Ignoring processTxn failure hdr: -1 : error: 
-102
2015-10-23 09:22:10,210 DEBUG [main-SendThread(127.0.0.1:11221)] 
zookeeper.ClientCnxn (ClientCnxn.java:readResponse(818)) - Reading reply 
sessionid:0x15092d1ebe10001, packet:: clientPath:null serverPath:null 
finished:false header:: 7591,1  replyHeader:: 7591,7610,-102  request:: 
'/rmstore/ZKRMStateRoot/RMAppRoot,,v{s{31,s{'world,'anyone}}},0  response::
2015-10-23 09:22:10,210 INFO  [ProcessThread(sid:0 cport:-1):] 
server.PrepRequestProcessor (PrepRequestProcessor.java:pRequest(645)) - Got 
user-level KeeperException when processing sessionid:0x15092d1ebe10001 
type:create cxid:0x1da8 zxid:0x1dbb txntype:-1 reqpath:n/a Error Path:null 
Error:KeeperErrorCode = NoAuth
{noformat}

This is because we do not handle NoAuthException properly in branch-2.7 code 
when HA is not enabled.
In {{ZKRMStateStore#runWithRetries}}, we have code as under. As can be seen if 
HA is not enabled, we neither rethrow NoAuthException nor do we have any logic 
to increment retries and back out if retries are maxed out.
{code}
 T runWithRetries() throws Exception {
  int retry = 0;
  while (true) {
try {
  return runWithCheck();
} catch (KeeperException.NoAuthException nae) {
  if (HAUtil.isHAEnabled(getConfig())) {
// NoAuthException possibly means that this store is fenced due to
// another RM becoming active. Even if not,
// it is safer to assume we have been fenced
throw new StoreFencedException();
  }
} catch (KeeperException ke) {
  .
   }
 }
  }
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188

2015-10-30 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982169#comment-14982169
 ] 

Tsuyoshi Ozawa commented on YARN-4320:
--

+1, checking this in.

> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to 
> default port 8188
> ---
>
> Key: YARN-4320
> URL: https://issues.apache.org/jira/browse/YARN-4320
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4320.01.patch
>
>
> {noformat}
> Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.256 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
>   Time elapsed: 35.764 sec  <<< ERROR!
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:719)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:507)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982186#comment-14982186
 ] 

Hudson commented on YARN-3580:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2547 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2547/])
Move YARN-3580 in CHANGES.txt from 2.8 to 2.7.2. (ozawa: rev 
d2e01f4ed87c3c41156ec9a68855f923f8c0adf9)
* hadoop-yarn-project/CHANGES.txt


> [JDK 8] TestClientRMService.testGetLabelsToNodes fails
> --
>
> Key: YARN-3580
> URL: https://issues.apache.org/jira/browse/YARN-3580
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.8.0
> Environment: JDK 8
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>  Labels: jdk8
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-3580.001.patch
>
>
> When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails:
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188

2015-10-30 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4320:
---
Summary: TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no 
longer binds to default port 8188  (was: TestJobHistoryEventHandler fails on 
trunk as MiniYarnCluster no longer binds to default port 8188)

> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to 
> default port 8188
> ---
>
> Key: YARN-4320
> URL: https://issues.apache.org/jira/browse/YARN-4320
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4320.01.patch
>
>
> {noformat}
> Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.256 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
>   Time elapsed: 35.764 sec  <<< ERROR!
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:719)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:507)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out

2015-10-30 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4312:
-
Hadoop Flags: Reviewed

> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of 
> the test cases time out 
> 
>
> Key: YARN-4312
> URL: https://issues.apache.org/jira/browse/YARN-4312
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1, 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4312-branch-2.6.01.patch, 
> YARN-4312-branch-2.7.01.patch
>
>
> These timeouts happen because we do ZK sync operation on RM startup after 
> YARN-3798 which delays RM startup a bit making the timeouts of 5 s. too small 
> for a couple of tests in TestSubmitApplicationWithRMHA.
> {noformat}
> testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.162 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)
> 
> testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.146 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>   at 
> 

[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out

2015-10-30 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982047#comment-14982047
 ] 

Tsuyoshi Ozawa commented on YARN-4312:
--

[~varun_saxena] thank you for your contribution! 

> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of 
> the test cases time out 
> 
>
> Key: YARN-4312
> URL: https://issues.apache.org/jira/browse/YARN-4312
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1, 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4312-branch-2.6.01.patch, 
> YARN-4312-branch-2.7.01.patch
>
>
> These timeouts happen because we do ZK sync operation on RM startup after 
> YARN-3798 which delays RM startup a bit making the timeouts of 5 s. too small 
> for a couple of tests in TestSubmitApplicationWithRMHA.
> {noformat}
> testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.162 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)
> 
> testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.146 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> 

[jira] [Commented] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails

2015-10-30 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982015#comment-14982015
 ] 

Tsuyoshi Ozawa commented on YARN-3580:
--

Done. 

> [JDK 8] TestClientRMService.testGetLabelsToNodes fails
> --
>
> Key: YARN-3580
> URL: https://issues.apache.org/jira/browse/YARN-3580
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.8.0
> Environment: JDK 8
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>  Labels: jdk8
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-3580.001.patch
>
>
> When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails:
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4313) Race condition in MiniMRYarnCluster when getting history server address

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982154#comment-14982154
 ] 

Hudson commented on YARN-4313:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2491 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2491/])
YARN-4313. Race condition in MiniMRYarnCluster when getting history (xgong: rev 
7412ff48eeb967c972c19c1370c77a41c5b3b81f)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/MiniMRYarnCluster.java
* hadoop-yarn-project/CHANGES.txt


> Race condition in MiniMRYarnCluster when getting history server address
> ---
>
> Key: YARN-4313
> URL: https://issues.apache.org/jira/browse/YARN-4313
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-4313.1.patch, YARN-4313.2.patch
>
>
> Problem in this place when waiting for JHS to be started
> {code}
> new Thread() {
>   public void run() {
> historyServer.start();
>   };
> }.start();
> while (historyServer.getServiceState() == STATE.INITED) {
>   LOG.info("Waiting for HistoryServer to start...");
>   Thread.sleep(1500);
> }
> {code}
> The service state is updated before the service is actually started. See 
> AbstractServic#start.  So it's possible that when the while loop breaks, the 
> service is not yet started. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4127) RM fail with noAuth error if switched from failover mode to non-failover mode

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982155#comment-14982155
 ] 

Hudson commented on YARN-4127:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2491 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2491/])
YARN-4127. RM fail with noAuth error if switched from failover to (jianhe: rev 
e5b1733e049dc0f1859b93618354e049a0efdc4a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java


> RM fail with noAuth error if switched from failover mode to non-failover mode 
> --
>
> Key: YARN-4127
> URL: https://issues.apache.org/jira/browse/YARN-4127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Jian He
>Assignee: Varun Saxena
> Attachments: YARN-4127-branch-2.7.01.patch, YARN-4127.01.patch, 
> YARN-4127.02.patch
>
>
> The scenario is that RM failover was initially enabled, so the zkRootNodeAcl 
> is by default set with the *RM ID* in the ACL string 
> If RM failover is then switched to be disabled,  it cannot load data from ZK 
> and fail with noAuth error. After I reset the root node ACL, it again can 
> access.
> {code}
> 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
> {code}
>  the problem may be that in non-failover mode, RM doesn't use the *RM-ID* to 
> connect with ZK and thus fail with no Auth error.
> We should be able to switch failover on and off with no interruption to the 
> user. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188

2015-10-30 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4320:
-
Hadoop Flags: Reviewed

> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to 
> default port 8188
> ---
>
> Key: YARN-4320
> URL: https://issues.apache.org/jira/browse/YARN-4320
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4320.01.patch
>
>
> {noformat}
> Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.256 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
>   Time elapsed: 35.764 sec  <<< ERROR!
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:719)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:507)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails

2015-10-30 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3580:
-
Fix Version/s: 2.7.2

> [JDK 8] TestClientRMService.testGetLabelsToNodes fails
> --
>
> Key: YARN-3580
> URL: https://issues.apache.org/jira/browse/YARN-3580
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.8.0
> Environment: JDK 8
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>  Labels: jdk8
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-3580.001.patch
>
>
> When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails:
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982161#comment-14982161
 ] 

Hudson commented on YARN-3580:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #605 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/605/])
Move YARN-3580 in CHANGES.txt from 2.8 to 2.7.2. (ozawa: rev 
d2e01f4ed87c3c41156ec9a68855f923f8c0adf9)
* hadoop-yarn-project/CHANGES.txt


> [JDK 8] TestClientRMService.testGetLabelsToNodes fails
> --
>
> Key: YARN-3580
> URL: https://issues.apache.org/jira/browse/YARN-3580
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.8.0
> Environment: JDK 8
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>  Labels: jdk8
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-3580.001.patch
>
>
> When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails:
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4313) Race condition in MiniMRYarnCluster when getting history server address

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981994#comment-14981994
 ] 

Hudson commented on YARN-4313:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #553 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/553/])
YARN-4313. Race condition in MiniMRYarnCluster when getting history (xgong: rev 
7412ff48eeb967c972c19c1370c77a41c5b3b81f)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/MiniMRYarnCluster.java
* hadoop-yarn-project/CHANGES.txt


> Race condition in MiniMRYarnCluster when getting history server address
> ---
>
> Key: YARN-4313
> URL: https://issues.apache.org/jira/browse/YARN-4313
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-4313.1.patch, YARN-4313.2.patch
>
>
> Problem in this place when waiting for JHS to be started
> {code}
> new Thread() {
>   public void run() {
> historyServer.start();
>   };
> }.start();
> while (historyServer.getServiceState() == STATE.INITED) {
>   LOG.info("Waiting for HistoryServer to start...");
>   Thread.sleep(1500);
> }
> {code}
> The service state is updated before the service is actually started. See 
> AbstractServic#start.  So it's possible that when the while loop breaks, the 
> service is not yet started. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4127) RM fail with noAuth error if switched from failover mode to non-failover mode

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981995#comment-14981995
 ] 

Hudson commented on YARN-4127:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #553 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/553/])
YARN-4127. RM fail with noAuth error if switched from failover to (jianhe: rev 
e5b1733e049dc0f1859b93618354e049a0efdc4a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java


> RM fail with noAuth error if switched from failover mode to non-failover mode 
> --
>
> Key: YARN-4127
> URL: https://issues.apache.org/jira/browse/YARN-4127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Jian He
>Assignee: Varun Saxena
> Attachments: YARN-4127-branch-2.7.01.patch, YARN-4127.01.patch, 
> YARN-4127.02.patch
>
>
> The scenario is that RM failover was initially enabled, so the zkRootNodeAcl 
> is by default set with the *RM ID* in the ACL string 
> If RM failover is then switched to be disabled,  it cannot load data from ZK 
> and fail with noAuth error. After I reset the root node ACL, it again can 
> access.
> {code}
> 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
> {code}
>  the problem may be that in non-failover mode, RM doesn't use the *RM-ID* to 
> connect with ZK and thus fail with no Auth error.
> We should be able to switch failover on and off with no interruption to the 
> user. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981993#comment-14981993
 ] 

Hudson commented on YARN-4183:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #553 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/553/])
YARN-4183. Enabling generic application history forces every job to get 
(jeagles: rev c293c58954cdab25c8c69418b0e839883b563fa4)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java


> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Fix For: 3.0.0, 2.8.0, 2.7.2
>
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4127) RM fail with noAuth error if switched from failover mode to non-failover mode

2015-10-30 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982027#comment-14982027
 ] 

Varun Saxena commented on YARN-4127:


[~jianhe]
bq. however for the branch-2.7 patch, if I run the test case without the core 
change, the test will keep in a loop and not finish. could you take a look ?
This is because we do not handle NoAuth exception properly in branch-2.7 code 
when HA is not enabled.
In ZKRMStateStore#runWithRetries, we have code as under. As can be seen if HA 
is not enabled, we neither rethrow NoAuthException nor do we have any logic 
increment retries and back out if retries are maxed out.
With fix in this patch, probably NoAuth will never come until and unless 
someone changes it from CLI. I will go ahead and file another JIRA.
{code}
T runWithRetries() throws Exception {
  int retry = 0;
  while (true) {
try {
  return runWithCheck();
} catch (KeeperException.NoAuthException nae) {
  if (HAUtil.isHAEnabled(getConfig())) {
// NoAuthException possibly means that this store is fenced due to
// another RM becoming active. Even if not,
// it is safer to assume we have been fenced
throw new StoreFencedException();
  }
} catch (KeeperException ke) {
  .
   }
 }
  }
{code}

> RM fail with noAuth error if switched from failover mode to non-failover mode 
> --
>
> Key: YARN-4127
> URL: https://issues.apache.org/jira/browse/YARN-4127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Jian He
>Assignee: Varun Saxena
> Attachments: YARN-4127-branch-2.7.01.patch, YARN-4127.01.patch, 
> YARN-4127.02.patch
>
>
> The scenario is that RM failover was initially enabled, so the zkRootNodeAcl 
> is by default set with the *RM ID* in the ACL string 
> If RM failover is then switched to be disabled,  it cannot load data from ZK 
> and fail with noAuth error. After I reset the root node ACL, it again can 
> access.
> {code}
> 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
> {code}
>  the problem may be that in non-failover mode, RM doesn't use the *RM-ID* to 
> connect with ZK and thus fail with no Auth error.

[jira] [Commented] (YARN-4317) Test failure: TestResourceTrackerService

2015-10-30 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981965#comment-14981965
 ] 

Tsuyoshi Ozawa commented on YARN-4317:
--

{quote}
java.lang.AssertionError: expected:<15360> but was:<10240>
{quote}

Found that this is a another issue. Keep to open.

> Test failure: TestResourceTrackerService 
> -
>
> Key: YARN-4317
> URL: https://issues.apache.org/jira/browse/YARN-4317
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>
> {quote}
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.438 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
> testReconnectNode(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)
>   Time elapsed: 0.114 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<15360> but was:<10240>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testReconnectNode(TestResourceTrackerService.java:624)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982115#comment-14982115
 ] 

Hudson commented on YARN-3580:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8728 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8728/])
Move YARN-3580 in CHANGES.txt from 2.8 to 2.7.2. (ozawa: rev 
d2e01f4ed87c3c41156ec9a68855f923f8c0adf9)
* hadoop-yarn-project/CHANGES.txt


> [JDK 8] TestClientRMService.testGetLabelsToNodes fails
> --
>
> Key: YARN-3580
> URL: https://issues.apache.org/jira/browse/YARN-3580
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.8.0
> Environment: JDK 8
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>  Labels: jdk8
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-3580.001.patch
>
>
> When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails:
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1510) Make NMClient support change container resources

2015-10-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983592#comment-14983592
 ] 

Jian He commented on YARN-1510:
---

lgtm, only one comment:
- NMClientAsyncImpl # increaseContainerResourceAsync
I think INCREASE_CONTAINER_RESOURCE event may happen at DONE and FAILED state, 
in which case InvalidEventTransitionException may be thrown. 

> Make NMClient support change container resources
> 
>
> Key: YARN-1510
> URL: https://issues.apache.org/jira/browse/YARN-1510
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1510-YARN-1197.1.patch, 
> YARN-1510-YARN-1197.2.patch, YARN-1510.3.patch, YARN-1510.4.patch, 
> YARN-1510.5.patch, YARN-1510.6.patch
>
>
> As described in YARN-1197, YARN-1449, we need add API in NMClient to support
> 1) sending request of increase/decrease container resource limits
> 2) get succeeded/failed changed containers response from NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4032) Corrupted state from a previous version can still cause RM to fail with NPE due to same reasons as YARN-2834

2015-10-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983627#comment-14983627
 ] 

Vinod Kumar Vavilapalli commented on YARN-4032:
---

[~adhoot] / [~jianhe] / [~kasha], any update on this? Considering this for a 
2.7.2 RC this weekend. Unless I hear otherwise, I'll move it out to 2.7.3 
assuming this needs more time. Thanks.

> Corrupted state from a previous version can still cause RM to fail with NPE 
> due to same reasons as YARN-2834
> 
>
> Key: YARN-4032
> URL: https://issues.apache.org/jira/browse/YARN-4032
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Attachments: YARN-4032.prelim.patch
>
>
> YARN-2834 ensures in 2.6.0 there will not be any inconsistent state. But if 
> someone is upgrading from a previous version, the state can still be 
> inconsistent and then RM will still fail with NPE after upgrade to 2.6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982262#comment-14982262
 ] 

Hudson commented on YARN-3580:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1340 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1340/])
Move YARN-3580 in CHANGES.txt from 2.8 to 2.7.2. (ozawa: rev 
d2e01f4ed87c3c41156ec9a68855f923f8c0adf9)
* hadoop-yarn-project/CHANGES.txt


> [JDK 8] TestClientRMService.testGetLabelsToNodes fails
> --
>
> Key: YARN-3580
> URL: https://issues.apache.org/jira/browse/YARN-3580
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.8.0
> Environment: JDK 8
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>  Labels: jdk8
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-3580.001.patch
>
>
> When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails:
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982273#comment-14982273
 ] 

Hudson commented on YARN-3580:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #617 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/617/])
Move YARN-3580 in CHANGES.txt from 2.8 to 2.7.2. (ozawa: rev 
d2e01f4ed87c3c41156ec9a68855f923f8c0adf9)
* hadoop-yarn-project/CHANGES.txt


> [JDK 8] TestClientRMService.testGetLabelsToNodes fails
> --
>
> Key: YARN-3580
> URL: https://issues.apache.org/jira/browse/YARN-3580
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.8.0
> Environment: JDK 8
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>  Labels: jdk8
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-3580.001.patch
>
>
> When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails:
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4314) Adding container wait time as a metric at queue level and application level.

2015-10-30 Thread Raju Bairishetti (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982340#comment-14982340
 ] 

Raju Bairishetti commented on YARN-4314:


I feel adding timestamp to each resource request will be costly and all the 
existing applications will need to migrate to use this metric.  

Had a discussion with [~sriksun] earlier about this approach. Resource request 
is prepared by AM. In future if we want to use this timestamp as priority for 
allocating resources then there is a chance that user/AM can misuse the system 
by saying they have older time stamps.

Thinking about this approach:
AppSchedulingInfo has all the scheduling info about an application. When RM 
receives first resource a request from AM then RM can note down the system time 
as resource request time. Whenever new request comes(i.e. 
UpdateResourceRequest() in AppSchedulingInfo or allocate()) then we can measure 
how many containers were waited till this time from the last request time. We 
can mostly listen on the container request & allocate events. 

 I will put up a detailed doc with all my thoughts & approaches.

> Adding container wait time as a metric at queue level and application level.
> 
>
> Key: YARN-4314
> URL: https://issues.apache.org/jira/browse/YARN-4314
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>
> There is a need for adding the container wait-time which can be tracked at 
> the queue and application level. 
> An application can have two kinds of wait times. One is AM wait time after 
> submission and another is total container wait time between AM asking for 
> containers and getting them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982216#comment-14982216
 ] 

Hudson commented on YARN-4320:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8729 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8729/])
YARN-4320. TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no 
(ozawa: rev ce31b22739512804da38cf87e0ce1059e3128da3)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java


> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to 
> default port 8188
> ---
>
> Key: YARN-4320
> URL: https://issues.apache.org/jira/browse/YARN-4320
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 3.0.0, 2.8.0, 2.7.2
>
> Attachments: YARN-4320.01.patch
>
>
> {noformat}
> Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.256 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
>   Time elapsed: 35.764 sec  <<< ERROR!
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:719)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:507)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982217#comment-14982217
 ] 

Hudson commented on YARN-4312:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8729 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8729/])
Add an entry of YARN-4312 to CHANGES.txt (ozawa: rev 
d21214ce33cb176926aa3ae5a9f4efe00f66480b)
* hadoop-yarn-project/CHANGES.txt


> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of 
> the test cases time out 
> 
>
> Key: YARN-4312
> URL: https://issues.apache.org/jira/browse/YARN-4312
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1, 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-4312-branch-2.6.01.patch, 
> YARN-4312-branch-2.7.01.patch
>
>
> These timeouts happen because we do ZK sync operation on RM startup after 
> YARN-3798 which delays RM startup a bit making the timeouts of 5 s. too small 
> for a couple of tests in TestSubmitApplicationWithRMHA.
> {noformat}
> testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.162 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)
> 
> testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.146 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> 

[jira] [Commented] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982294#comment-14982294
 ] 

Hudson commented on YARN-3580:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #554 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/554/])
Move YARN-3580 in CHANGES.txt from 2.8 to 2.7.2. (ozawa: rev 
d2e01f4ed87c3c41156ec9a68855f923f8c0adf9)
* hadoop-yarn-project/CHANGES.txt


> [JDK 8] TestClientRMService.testGetLabelsToNodes fails
> --
>
> Key: YARN-3580
> URL: https://issues.apache.org/jira/browse/YARN-3580
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.8.0
> Environment: JDK 8
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>  Labels: jdk8
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-3580.001.patch
>
>
> When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails:
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM

2015-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983732#comment-14983732
 ] 

Hadoop QA commented on YARN-4132:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 7s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
50s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 15s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
trunk cannot run convertXmlToText from findbugs {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 41s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 8s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 24s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 247, now 247). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 39s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 12s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_60. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 50s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 45s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_79. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 1s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.7.0_79. {color} |
| 

[jira] [Commented] (YARN-4032) Corrupted state from a previous version can still cause RM to fail with NPE due to same reasons as YARN-2834

2015-10-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983682#comment-14983682
 ] 

Jian He commented on YARN-4032:
---

The problem in  YARN-2834 is  that if there is an app existing in state-store 
that:
- app state = final state
- attempt state = null
RM will fail with NPE on recovery.

One approach is to delete this inconsistent state app from state-store, is that 
considered ?

Regarding the patch, it captures all exception in app.recover and return 
FAILED.  If the application previously ended as FINISHED, the app is changed to 
FAILD, which I think is inconsistent to user. Also, this exception will happen 
again and again whenever RM gets restarted.
I think what we can do is to check whether app is at FINAL state in 
RMAppAttemptImpl#AttemptRecoveredTransition, skip adding attempt into scheduler 
if it is. 

> Corrupted state from a previous version can still cause RM to fail with NPE 
> due to same reasons as YARN-2834
> 
>
> Key: YARN-4032
> URL: https://issues.apache.org/jira/browse/YARN-4032
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Attachments: YARN-4032.prelim.patch
>
>
> YARN-2834 ensures in 2.6.0 there will not be any inconsistent state. But if 
> someone is upgrading from a previous version, the state can still be 
> inconsistent and then RM will still fail with NPE after upgrade to 2.6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4323) AMRMClient does not respect SchedulerResourceTypes post YARN-2448

2015-10-30 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-4323:
--
Affects Version/s: 2.6.0

> AMRMClient does not respect SchedulerResourceTypes post YARN-2448
> -
>
> Key: YARN-4323
> URL: https://issues.apache.org/jira/browse/YARN-4323
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Hitesh Shah
>
> Given that the RM now informs the AM of the resources it supports, AMRMClient 
> should be changed to match correctly by normalizing the invalid resource 
> types.
> i.e. AMRMClient::getMatchingRequests() should correctly return back matches 
> by only looking at the resource types that are valid. 
> \cc [~vvasudev] [~bikassaha]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2934) Improve handling of container's stderr

2015-10-30 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2934:

Attachment: YARN-2934.v1.001.patch

> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch
>
>
> Most YARN applications redirect stderr to some file. That's why when 
> container launch fails with {{ExitCodeException}} the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4163) Audit getQueueInfo and getApplications calls

2015-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983771#comment-14983771
 ] 

Hadoop QA commented on YARN-4163:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 12s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 51, now 50). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 1s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 46s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
30s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 145m 54s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_79 Failed junit tests | 
hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.0 Server=1.7.0 
Image:test-patch-base-hadoop-date2015-10-31 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12769879/YARN-4163.2.patch |
| JIRA Issue | YARN-4163 |
| Optional Tests |  asflicense  javac  

[jira] [Commented] (YARN-2934) Improve handling of container's stderr

2015-10-30 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983840#comment-14983840
 ] 

Naganarasimha G R commented on YARN-2934:
-

[~jira.shegalov],
Apologies from my side, missed to update on this jira, thanks [~nijel] for 
pushing me on this !
I have attached an initial version of the patch which suffices but following 
points are open for discussion 
# Errorfilename is sent from the application/client hence name can be any 
thing. Currently have used WildCard Matcher on fileName and the pattern used is 
"*stderr*" and also made it configurable. If the admin wants to attach multiple 
types then Regex matching would be required, but regex pattern is not similar 
to unix pattern hence for the sake of simplicity have kept Wildcard. Thoughts?
# Or other way round may be i can support as you were mentioning like, support 
ApplicationSubmissionContext to expose interface for clients to inform the 
error filename and on error use this name directly if not present then use the 
pattern matching. But my concern is what if the error file name mentioned as 
part of  interface is not matching the execution command sent !. 
# Also [~bikassaha] mentioned in his [comment|  
https://issues.apache.org/jira/browse/YARN-3911?focusedCommentId=14959579=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14959579],
 do i need to additionally check for syslog also? if so only if the syserr 
doesnt exist ? and similarly what approaches would be ideal for identifying the 
name of the log4j file? 
# have hard coded tail size to be 4k, is it required to be configurable ?


> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch
>
>
> Most YARN applications redirect stderr to some file. That's why when 
> container launch fails with {{ExitCodeException}} the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3911) Add tail of stderr to diagnostics if container fails to launch or it container logs are empty

2015-10-30 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983841#comment-14983841
 ] 

Naganarasimha G R commented on YARN-3911:
-

Hi [~bikassaha], 
I have attached a initial patch in YARN-2934. Hope for feedback on this there, 
and i think i close this jira as duplicate. Thoughts ?

> Add tail of stderr to diagnostics if container fails to launch or it 
> container logs are empty
> -
>
> Key: YARN-3911
> URL: https://issues.apache.org/jira/browse/YARN-3911
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>
> The stderr may have useful info in those cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4323) AMRMClient does not respect SchedulerResourceTypes post YARN-2448

2015-10-30 Thread Hitesh Shah (JIRA)
Hitesh Shah created YARN-4323:
-

 Summary: AMRMClient does not respect SchedulerResourceTypes post 
YARN-2448
 Key: YARN-4323
 URL: https://issues.apache.org/jira/browse/YARN-4323
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah


Given that the RM now informs the AM of the resources it supports, AMRMClient 
should be changed to match correctly by normalizing the invalid resource types.

i.e. AMRMClient::getMatchingRequests() should correctly return back matches by 
only looking at the resource types that are valid. 

\cc [~vvasudev] [~bikassaha]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3911) Add tail of stderr to diagnostics if container fails to launch or it container logs are empty

2015-10-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983845#comment-14983845
 ] 

Bikas Saha commented on YARN-3911:
--

Sure

> Add tail of stderr to diagnostics if container fails to launch or it 
> container logs are empty
> -
>
> Key: YARN-3911
> URL: https://issues.apache.org/jira/browse/YARN-3911
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>
> The stderr may have useful info in those cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4322) local framwork + io.ReadaheadPool is throwing Failed readahead on ifile EBADF: Bad file descriptor

2015-10-30 Thread Mohammad Shahid Khan (JIRA)
Mohammad Shahid Khan created YARN-4322:
--

 Summary: local framwork + io.ReadaheadPool is throwing Failed 
readahead on ifile EBADF: Bad file descriptor
 Key: YARN-4322
 URL: https://issues.apache.org/jira/browse/YARN-4322
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.1
 Environment: Linux 2.6.32.12-0.7-default x86_64
Reporter: Mohammad Shahid Khan


run the pi job with 100 map and 2 reduce

io.ReadaheadPool is throwing Failed readahead on ifile EBADF: Bad file 
descriptor 

stacktrace
{code}
15/10/30 16:47:23 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output 
of size: 24, inMemoryMapOutputs.size() -> 89, commitMemory -> 2112, usedMemory 
->2136
15/10/30 16:47:23 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native 
Method)
at 
org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
at 
org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
at 
org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
15/10/30 16:47:23 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native 
Method)
at 
org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
at 
org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
at 
org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
15/10/30 16:47:23 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle 
output of map attempt_local108678459_0001_m_86_0 decomp: 24 len: 28 to 
MEMORY
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982518#comment-14982518
 ] 

Hudson commented on YARN-4312:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #618 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/618/])
Add an entry of YARN-4312 to CHANGES.txt (ozawa: rev 
d21214ce33cb176926aa3ae5a9f4efe00f66480b)
* hadoop-yarn-project/CHANGES.txt


> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of 
> the test cases time out 
> 
>
> Key: YARN-4312
> URL: https://issues.apache.org/jira/browse/YARN-4312
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1, 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-4312-branch-2.6.01.patch, 
> YARN-4312-branch-2.7.01.patch
>
>
> These timeouts happen because we do ZK sync operation on RM startup after 
> YARN-3798 which delays RM startup a bit making the timeouts of 5 s. too small 
> for a couple of tests in TestSubmitApplicationWithRMHA.
> {noformat}
> testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.162 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)
> 
> testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.146 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> 

[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982517#comment-14982517
 ] 

Hudson commented on YARN-4320:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #618 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/618/])
YARN-4320. TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no 
(ozawa: rev ce31b22739512804da38cf87e0ce1059e3128da3)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* hadoop-yarn-project/CHANGES.txt


> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to 
> default port 8188
> ---
>
> Key: YARN-4320
> URL: https://issues.apache.org/jira/browse/YARN-4320
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 3.0.0, 2.8.0, 2.7.2
>
> Attachments: YARN-4320.01.patch
>
>
> {noformat}
> Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.256 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
>   Time elapsed: 35.764 sec  <<< ERROR!
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:719)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:507)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982540#comment-14982540
 ] 

Hudson commented on YARN-4320:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2492 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2492/])
YARN-4320. TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no 
(ozawa: rev ce31b22739512804da38cf87e0ce1059e3128da3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
* hadoop-yarn-project/CHANGES.txt


> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to 
> default port 8188
> ---
>
> Key: YARN-4320
> URL: https://issues.apache.org/jira/browse/YARN-4320
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 3.0.0, 2.8.0, 2.7.2
>
> Attachments: YARN-4320.01.patch
>
>
> {noformat}
> Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.256 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
>   Time elapsed: 35.764 sec  <<< ERROR!
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:719)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:507)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982486#comment-14982486
 ] 

Hudson commented on YARN-4320:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1341 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1341/])
YARN-4320. TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no 
(ozawa: rev ce31b22739512804da38cf87e0ce1059e3128da3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
* hadoop-yarn-project/CHANGES.txt


> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to 
> default port 8188
> ---
>
> Key: YARN-4320
> URL: https://issues.apache.org/jira/browse/YARN-4320
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 3.0.0, 2.8.0, 2.7.2
>
> Attachments: YARN-4320.01.patch
>
>
> {noformat}
> Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.256 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
>   Time elapsed: 35.764 sec  <<< ERROR!
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:719)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:507)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188

2015-10-30 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982594#comment-14982594
 ] 

Varun Saxena commented on YARN-4320:


Thanks [~ozawa] for the review and commit.

> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to 
> default port 8188
> ---
>
> Key: YARN-4320
> URL: https://issues.apache.org/jira/browse/YARN-4320
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 3.0.0, 2.8.0, 2.7.2
>
> Attachments: YARN-4320.01.patch
>
>
> {noformat}
> Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.256 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
>   Time elapsed: 35.764 sec  <<< ERROR!
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:719)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:507)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982478#comment-14982478
 ] 

Hudson commented on YARN-4320:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #606 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/606/])
YARN-4320. TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no 
(ozawa: rev ce31b22739512804da38cf87e0ce1059e3128da3)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java


> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to 
> default port 8188
> ---
>
> Key: YARN-4320
> URL: https://issues.apache.org/jira/browse/YARN-4320
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 3.0.0, 2.8.0, 2.7.2
>
> Attachments: YARN-4320.01.patch
>
>
> {noformat}
> Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.256 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
>   Time elapsed: 35.764 sec  <<< ERROR!
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:719)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:507)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982479#comment-14982479
 ] 

Hudson commented on YARN-4312:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #606 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/606/])
Add an entry of YARN-4312 to CHANGES.txt (ozawa: rev 
d21214ce33cb176926aa3ae5a9f4efe00f66480b)
* hadoop-yarn-project/CHANGES.txt


> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of 
> the test cases time out 
> 
>
> Key: YARN-4312
> URL: https://issues.apache.org/jira/browse/YARN-4312
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1, 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-4312-branch-2.6.01.patch, 
> YARN-4312-branch-2.7.01.patch
>
>
> These timeouts happen because we do ZK sync operation on RM startup after 
> YARN-3798 which delays RM startup a bit making the timeouts of 5 s. too small 
> for a couple of tests in TestSubmitApplicationWithRMHA.
> {noformat}
> testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.162 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)
> 
> testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.146 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> 

[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982511#comment-14982511
 ] 

Hudson commented on YARN-4312:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2548 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2548/])
Add an entry of YARN-4312 to CHANGES.txt (ozawa: rev 
d21214ce33cb176926aa3ae5a9f4efe00f66480b)
* hadoop-yarn-project/CHANGES.txt


> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of 
> the test cases time out 
> 
>
> Key: YARN-4312
> URL: https://issues.apache.org/jira/browse/YARN-4312
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1, 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-4312-branch-2.6.01.patch, 
> YARN-4312-branch-2.7.01.patch
>
>
> These timeouts happen because we do ZK sync operation on RM startup after 
> YARN-3798 which delays RM startup a bit making the timeouts of 5 s. too small 
> for a couple of tests in TestSubmitApplicationWithRMHA.
> {noformat}
> testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.162 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)
> 
> testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.146 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> 

[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982510#comment-14982510
 ] 

Hudson commented on YARN-4320:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2548 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2548/])
YARN-4320. TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no 
(ozawa: rev ce31b22739512804da38cf87e0ce1059e3128da3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
* hadoop-yarn-project/CHANGES.txt


> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to 
> default port 8188
> ---
>
> Key: YARN-4320
> URL: https://issues.apache.org/jira/browse/YARN-4320
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 3.0.0, 2.8.0, 2.7.2
>
> Attachments: YARN-4320.01.patch
>
>
> {noformat}
> Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.256 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
>   Time elapsed: 35.764 sec  <<< ERROR!
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:719)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:507)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3580) [JDK 8] TestClientRMService.testGetLabelsToNodes fails

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982541#comment-14982541
 ] 

Hudson commented on YARN-3580:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2492 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2492/])
Move YARN-3580 in CHANGES.txt from 2.8 to 2.7.2. (ozawa: rev 
d2e01f4ed87c3c41156ec9a68855f923f8c0adf9)
* hadoop-yarn-project/CHANGES.txt


> [JDK 8] TestClientRMService.testGetLabelsToNodes fails
> --
>
> Key: YARN-3580
> URL: https://issues.apache.org/jira/browse/YARN-3580
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.8.0
> Environment: JDK 8
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>  Labels: jdk8
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-3580.001.patch
>
>
> When using JDK 8, {{TestClientRMService.testGetLabelsToNodes}} fails:
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetLabelsToNodes(TestClientRMService.java:1499)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982543#comment-14982543
 ] 

Hudson commented on YARN-4312:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2492 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2492/])
Add an entry of YARN-4312 to CHANGES.txt (ozawa: rev 
d21214ce33cb176926aa3ae5a9f4efe00f66480b)
* hadoop-yarn-project/CHANGES.txt


> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of 
> the test cases time out 
> 
>
> Key: YARN-4312
> URL: https://issues.apache.org/jira/browse/YARN-4312
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1, 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-4312-branch-2.6.01.patch, 
> YARN-4312-branch-2.7.01.patch
>
>
> These timeouts happen because we do ZK sync operation on RM startup after 
> YARN-3798 which delays RM startup a bit making the timeouts of 5 s. too small 
> for a couple of tests in TestSubmitApplicationWithRMHA.
> {noformat}
> testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.162 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)
> 
> testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.146 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> 

[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982487#comment-14982487
 ] 

Hudson commented on YARN-4312:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1341 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1341/])
Add an entry of YARN-4312 to CHANGES.txt (ozawa: rev 
d21214ce33cb176926aa3ae5a9f4efe00f66480b)
* hadoop-yarn-project/CHANGES.txt


> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of 
> the test cases time out 
> 
>
> Key: YARN-4312
> URL: https://issues.apache.org/jira/browse/YARN-4312
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1, 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-4312-branch-2.6.01.patch, 
> YARN-4312-branch-2.7.01.patch
>
>
> These timeouts happen because we do ZK sync operation on RM startup after 
> YARN-3798 which delays RM startup a bit making the timeouts of 5 s. too small 
> for a couple of tests in TestSubmitApplicationWithRMHA.
> {noformat}
> testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.162 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)
> 
> testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.146 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> 

[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out

2015-10-30 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982595#comment-14982595
 ] 

Varun Saxena commented on YARN-4312:


Thanks [~ozawa] for the review and commit.

> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of 
> the test cases time out 
> 
>
> Key: YARN-4312
> URL: https://issues.apache.org/jira/browse/YARN-4312
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1, 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-4312-branch-2.6.01.patch, 
> YARN-4312-branch-2.7.01.patch
>
>
> These timeouts happen because we do ZK sync operation on RM startup after 
> YARN-3798 which delays RM startup a bit making the timeouts of 5 s. too small 
> for a couple of tests in TestSubmitApplicationWithRMHA.
> {noformat}
> testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.162 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)
> 
> testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.146 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> 

[jira] [Commented] (YARN-4321) Incessant retries if NoAuthException is thrown by Zookeeper in non HA mode

2015-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983556#comment-14983556
 ] 

Hadoop QA commented on YARN-4321:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 8m 2s 
{color} | {color:red} root in branch-2.7 failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} branch-2.7 passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} branch-2.7 passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} branch-2.7 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 16s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 in branch-2.7 cannot run convertXmlToText from findbugs {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} branch-2.7 passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} branch-2.7 passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 886 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 23s 
{color} | {color:red} The patch has 95 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 46s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 52m 16s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_79. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 40m 41s 
{color} | {color:red} Patch generated 67 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 160m 39s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_79 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 

[jira] [Commented] (YARN-4314) Adding container wait time as a metric at queue level and application level.

2015-10-30 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982718#comment-14982718
 ] 

Allen Wittenauer commented on YARN-4314:


bq. I feel adding timestamp to each resource request will be costly and all the 
existing applications will need to migrate to use this metric. 

The fact that the RM doesn't keep track of timestamps on containers at all is 
particularly annoying when one wants to implement certain types of scheduling 
policies (e.g., kill containers that are X old).  I think it's inevitable they 
are going to be required if not now, later.

> Adding container wait time as a metric at queue level and application level.
> 
>
> Key: YARN-4314
> URL: https://issues.apache.org/jira/browse/YARN-4314
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>
> There is a need for adding the container wait-time which can be tracked at 
> the queue and application level. 
> An application can have two kinds of wait times. One is AM wait time after 
> submission and another is total container wait time between AM asking for 
> containers and getting them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982721#comment-14982721
 ] 

Hudson commented on YARN-4312:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #555 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/555/])
Add an entry of YARN-4312 to CHANGES.txt (ozawa: rev 
d21214ce33cb176926aa3ae5a9f4efe00f66480b)
* hadoop-yarn-project/CHANGES.txt


> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of 
> the test cases time out 
> 
>
> Key: YARN-4312
> URL: https://issues.apache.org/jira/browse/YARN-4312
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1, 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-4312-branch-2.6.01.patch, 
> YARN-4312-branch-2.7.01.patch
>
>
> These timeouts happen because we do ZK sync operation on RM startup after 
> YARN-3798 which delays RM startup a bit making the timeouts of 5 s. too small 
> for a couple of tests in TestSubmitApplicationWithRMHA.
> {noformat}
> testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.162 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)
> 
> testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.146 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> 

[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188

2015-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982720#comment-14982720
 ] 

Hudson commented on YARN-4320:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #555 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/555/])
YARN-4320. TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no 
(ozawa: rev ce31b22739512804da38cf87e0ce1059e3128da3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
* hadoop-yarn-project/CHANGES.txt


> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to 
> default port 8188
> ---
>
> Key: YARN-4320
> URL: https://issues.apache.org/jira/browse/YARN-4320
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 3.0.0, 2.8.0, 2.7.2
>
> Attachments: YARN-4320.01.patch
>
>
> {noformat}
> Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.256 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
>   Time elapsed: 35.764 sec  <<< ERROR!
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:719)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:507)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4132) Nodemanagers should try harder to connect to the RM

2015-10-30 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4132:
---
Attachment: YARN-4132.4.patch

update .4 patch to fix checkstyle and broken unit test

> Nodemanagers should try harder to connect to the RM
> ---
>
> Key: YARN-4132
> URL: https://issues.apache.org/jira/browse/YARN-4132
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4132.2.patch, YARN-4132.3.patch, YARN-4132.4.patch, 
> YARN-4132.patch
>
>
> Being part of the cluster, nodemanagers should try very hard (and possibly 
> never give up) to connect to a resourcemanager. Minimally we should have a 
> separate config to set how aggressively a nodemanager will connect to the RM 
> separate from what clients will do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4163) Audit getQueueInfo and getApplications calls

2015-10-30 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4163:
---
Attachment: YARN-4163.2.patch

> Audit getQueueInfo and getApplications calls
> 
>
> Key: YARN-4163
> URL: https://issues.apache.org/jira/browse/YARN-4163
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4163.2.patch, YARN-4163.2.patch, YARN-4163.patch
>
>
> getQueueInfo and getApplications seem to sometimes cause spike of load but 
> not able to confirm due to they are not audit logged. This patch propose to 
> add them to audit log



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4314) Adding container wait time as a metric at queue level and application level.

2015-10-30 Thread Lavkesh Lahngir (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982221#comment-14982221
 ] 

Lavkesh Lahngir commented on YARN-4314:
---

 Basically for an application we can track the time taken between container 
request and allocation and add them to queue-metrics and application level 
metrics. 
Also we need to track the AM container wait time separately.. 
I Will update the details when I have figured out the exact code path. 
[~raju.bairishetti] : Would you like to comment something?

> Adding container wait time as a metric at queue level and application level.
> 
>
> Key: YARN-4314
> URL: https://issues.apache.org/jira/browse/YARN-4314
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>
> There is a need for adding the container wait-time which can be tracked at 
> the queue and application level. 
> An application can have two kinds of wait times. One is AM wait time after 
> submission and another is total container wait time between AM asking for 
> containers and getting them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4321) Incessant retries if NoAuthException is thrown by Zookeeper in non HA mode

2015-10-30 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4321:
---
Attachment: YARN-4321-branch-2.7.01.patch

> Incessant retries if NoAuthException is thrown by Zookeeper in non HA mode
> --
>
> Key: YARN-4321
> URL: https://issues.apache.org/jira/browse/YARN-4321
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4321-branch-2.7.01.patch
>
>
> This applies to only branch-2.7 or earlier code.
> When a {{NoAuthException}} is thrown in non HA mode(like in the scenario of 
> YARN-4127), RM incessantly keeps on retrying the ZK operation.
> {noformat}
> 2015-10-23 09:22:10,209 DEBUG [SyncThread:0] server.DataTree 
> (DataTree.java:processTxn(949)) - Ignoring processTxn failure hdr: -1 : 
> error: -102
> 2015-10-23 09:22:10,210 DEBUG [main-SendThread(127.0.0.1:11221)] 
> zookeeper.ClientCnxn (ClientCnxn.java:readResponse(818)) - Reading reply 
> sessionid:0x15092d1ebe10001, packet:: clientPath:null serverPath:null 
> finished:false header:: 7591,1  replyHeader:: 7591,7610,-102  request:: 
> '/rmstore/ZKRMStateRoot/RMAppRoot,,v{s{31,s{'world,'anyone}}},0  response::
> 2015-10-23 09:22:10,210 INFO  [ProcessThread(sid:0 cport:-1):] 
> server.PrepRequestProcessor (PrepRequestProcessor.java:pRequest(645)) - Got 
> user-level KeeperException when processing sessionid:0x15092d1ebe10001 
> type:create cxid:0x1da8 zxid:0x1dbb txntype:-1 reqpath:n/a Error Path:null 
> Error:KeeperErrorCode = NoAuth
> {noformat}
> This is because we do not handle NoAuthException properly in branch-2.7 code 
> when HA is not enabled.
> In {{ZKRMStateStore#runWithRetries}}, we have code as under. As can be seen 
> if HA is not enabled, we neither rethrow NoAuthException nor do we have any 
> logic to increment retries and back out if retries are maxed out.
> {code}
>  T runWithRetries() throws Exception {
>   int retry = 0;
>   while (true) {
> try {
>   return runWithCheck();
> } catch (KeeperException.NoAuthException nae) {
>   if (HAUtil.isHAEnabled(getConfig())) {
> // NoAuthException possibly means that this store is fenced due to
> // another RM becoming active. Even if not,
> // it is safer to assume we have been fenced
> throw new StoreFencedException();
>   }
> } catch (KeeperException ke) {
>   .
>}
>  }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4321) Incessant retries if NoAuthException is thrown by Zookeeper in non HA mode

2015-10-30 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983029#comment-14983029
 ] 

Varun Saxena commented on YARN-4321:


Straightforward fix.
I think we dont need to retry for NoAuthException as the exception is unlikely 
to change even after retries.

> Incessant retries if NoAuthException is thrown by Zookeeper in non HA mode
> --
>
> Key: YARN-4321
> URL: https://issues.apache.org/jira/browse/YARN-4321
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4321-branch-2.7.01.patch
>
>
> This applies to only branch-2.7 or earlier code.
> When a {{NoAuthException}} is thrown in non HA mode(like in the scenario of 
> YARN-4127), RM incessantly keeps on retrying the ZK operation.
> {noformat}
> 2015-10-23 09:22:10,209 DEBUG [SyncThread:0] server.DataTree 
> (DataTree.java:processTxn(949)) - Ignoring processTxn failure hdr: -1 : 
> error: -102
> 2015-10-23 09:22:10,210 DEBUG [main-SendThread(127.0.0.1:11221)] 
> zookeeper.ClientCnxn (ClientCnxn.java:readResponse(818)) - Reading reply 
> sessionid:0x15092d1ebe10001, packet:: clientPath:null serverPath:null 
> finished:false header:: 7591,1  replyHeader:: 7591,7610,-102  request:: 
> '/rmstore/ZKRMStateRoot/RMAppRoot,,v{s{31,s{'world,'anyone}}},0  response::
> 2015-10-23 09:22:10,210 INFO  [ProcessThread(sid:0 cport:-1):] 
> server.PrepRequestProcessor (PrepRequestProcessor.java:pRequest(645)) - Got 
> user-level KeeperException when processing sessionid:0x15092d1ebe10001 
> type:create cxid:0x1da8 zxid:0x1dbb txntype:-1 reqpath:n/a Error Path:null 
> Error:KeeperErrorCode = NoAuth
> {noformat}
> This is because we do not handle NoAuthException properly in branch-2.7 code 
> when HA is not enabled.
> In {{ZKRMStateStore#runWithRetries}}, we have code as under. As can be seen 
> if HA is not enabled, we neither rethrow NoAuthException nor do we have any 
> logic to increment retries and back out if retries are maxed out.
> {code}
>  T runWithRetries() throws Exception {
>   int retry = 0;
>   while (true) {
> try {
>   return runWithCheck();
> } catch (KeeperException.NoAuthException nae) {
>   if (HAUtil.isHAEnabled(getConfig())) {
> // NoAuthException possibly means that this store is fenced due to
> // another RM becoming active. Even if not,
> // it is safer to assume we have been fenced
> throw new StoreFencedException();
>   }
> } catch (KeeperException ke) {
>   .
>}
>  }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4127) RM fail with noAuth error if switched from failover mode to non-failover mode

2015-10-30 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983030#comment-14983030
 ] 

Varun Saxena commented on YARN-4127:


[~jianhe], raised YARN-4321 for this issue.

> RM fail with noAuth error if switched from failover mode to non-failover mode 
> --
>
> Key: YARN-4127
> URL: https://issues.apache.org/jira/browse/YARN-4127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Jian He
>Assignee: Varun Saxena
> Attachments: YARN-4127-branch-2.7.01.patch, YARN-4127.01.patch, 
> YARN-4127.02.patch
>
>
> The scenario is that RM failover was initially enabled, so the zkRootNodeAcl 
> is by default set with the *RM ID* in the ACL string 
> If RM failover is then switched to be disabled,  it cannot load data from ZK 
> and fail with noAuth error. After I reset the root node ACL, it again can 
> access.
> {code}
> 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
> {code}
>  the problem may be that in non-failover mode, RM doesn't use the *RM-ID* to 
> connect with ZK and thus fail with no Auth error.
> We should be able to switch failover on and off with no interruption to the 
> user. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)